Hi all,

I have a Palm handheld and use the excellent Plucker <http://www.plkr.org/> (written in Python) to spider webpages and format the results for viewing on the Palm.
One site I 'pluck' is the Daily Python URL <http://www.pythonware.com/daily/>. From the point of view of a daily custom 'newspaper', everything but the last day or two of URLs is so much cruft. (The cruft would be the full history of the last seven-ish days, the navigation links for www.pythonware.com, etc.)

Today, I wrote a script to parse the Daily URL and create a minimal local HTML page containing nothing but the last n items, the last n links, or the last n days' worth of links. (Which of these is used is a user option.) Then I pluck that, rather than the actual Daily URL site. Works great. :-) (If anyone on the list is a fellow plucker'er and would be interested in my script, I'm happy to share.)

In anticipation of wanting to do the same thing to other sites, I've spent a bit of time abstracting it, and I've made some real progress. But before I finish up, a voice in the back of my head is asking whether I'm re-inventing the wheel. To my shame, I've spent very little time exploring the available frameworks and modules in any domain, and almost none for web-related tasks.

So, does anyone know of any modules or frameworks that would make the sort of task I am describing easier?

The difficulty in making my routine general is that pretty much every site will need its own code for identifying what counts as a distinct item (such as a URL and its description in the Daily URL) and what counts as a distinct block of items (such as a day's worth of Daily URL items). I can't imagine there's a way around that, but if someone else has already done much of the work of setting up a general structure that can be tweaked for each site, that'd be good to know. (This doesn't feel like something that would be googleable.)

Thanks for any suggestions, and best to all,

Brian vdB

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
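For anyone weighing the same design, here is a minimal sketch of the "per-site item/block code plus shared trimming" structure described in the post, using only the standard library's `html.parser`. It assumes, hypothetically, a newest-first page where each day starts with an `<h3>` heading and every `<a href>` after it is one item; the real Daily URL markup may well differ. `DaySplitter`, `trim`, and `render` are illustrative names, not part of Plucker or any existing framework, and only the "days" and "links" modes are sketched (the items-vs-links distinction is left out).

```python
from html import escape
from html.parser import HTMLParser


class DaySplitter(HTMLParser):
    """Hypothetical per-site parser. ASSUMES each day starts with an <h3>
    heading and every <a href=...> after it is one item of that day.
    Links before the first <h3> (e.g. site navigation) are ignored."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of [heading, [(url, text), ...]]
        self._in_heading = False
        self._href = None      # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_heading = True
            self.blocks.append(["", []])
        elif tag == "a" and self.blocks:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_heading = False
        elif tag == "a" and self._href is not None:
            self.blocks[-1][1].append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_heading:
            self.blocks[-1][0] += data.strip()
        elif self._href is not None:
            self._text.append(data)


def trim(blocks, mode, n):
    """Keep the first n days or n links (page ASSUMED newest-first)."""
    if mode == "days":
        return blocks[:n]
    if mode == "links":
        kept, remaining = [], n
        for heading, links in blocks:
            if remaining <= 0:
                break
            taken = links[:remaining]
            kept.append((heading, taken))
            remaining -= len(taken)
        return kept
    raise ValueError("mode must be 'days' or 'links'")


def render(blocks, title="Trimmed page"):
    """Emit a minimal HTML page containing only the kept blocks."""
    parts = [f"<html><head><title>{escape(title)}</title></head><body>"]
    for heading, links in blocks:
        parts.append(f"<h3>{escape(heading)}</h3><ul>")
        for url, text in links:
            parts.append(f'<li><a href="{escape(url)}">{escape(text)}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up sample markup, standing in for a downloaded page.
    sample = ('<a href="/nav">Home</a>'
              '<h3>Monday</h3><a href="http://a.example/">A</a>'
              '<a href="http://b.example/">B</a>'
              '<h3>Sunday</h3><a href="http://c.example/">C</a>')
    parser = DaySplitter()
    parser.feed(sample)
    parser.close()
    print(render(trim(parser.blocks, "links", 2)))
```

The point of the split is that only the parser class is site-specific: a parser for another site just has to produce the same list of `(heading, [(url, text), ...])` blocks, and `trim` and `render` can be reused unchanged. The resulting HTML can then be written to a local file and plucked in place of the live site.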