Hi,
This will probably look trivial to you, but this was my problem:
On the internet, there are sites with TV guides. I wanted to download these guides in
Plucker format. A major problem was that these sites use large headers, images and
other clutter that I wanted to get rid of.
This was my solution:
1) I made a script that creates a home page, calls Plucker and puts the result in the
install queue. This is makegids.pl (the bottom part). The most interesting part for
you is in lines 246, 255 and 256, which call Plucker.
The environment variable set in line 255 (PLUCKER_FILTER_PAGE) functions as a
callback: "perl $0" runs the same script again, "filter" is an argument that is picked
up at the start of the script, and "$source" is an argument that specifies the kind of
filtering ("tvgids" is the only one that actually works). A third argument is added
when the callback is invoked: the name of the file that contains the downloaded HTML.
2) The command in PLUCKER_FILTER_PAGE is executed for every downloaded item. This
is done by modifying Spider.py a little bit (I'm a novice in Python; in fact, this
file was the first Python I ever saw, so I did the real work in Perl :-) ). The addition
is in lines 451-464: it writes the contents to a (temporary) file, calls the callback,
which may modify the file, and reads back the (modified) contents.
3) The filter tries to recognize the kind of page (in my case the interesting ones
were 'time table' pages and 'description' pages).
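
For readers who don't want to open the attachments, here is a rough sketch of the
callback mechanism from steps 1) and 2). It is not the code from Spider.py: the
function name run_page_filter is made up for illustration, and only the variable
name PLUCKER_FILTER_PAGE and the write / call / read-back sequence come from the
description above.

```python
import os
import shlex
import subprocess
import tempfile


def run_page_filter(html: bytes, env_var: str = "PLUCKER_FILTER_PAGE") -> bytes:
    """Pass one downloaded page through the external filter, if one is set.

    Hypothetical helper: the name and signature are assumptions, not the
    actual addition to Spider.py.
    """
    command = os.environ.get(env_var)
    if not command:
        return html  # no filter configured: leave the page untouched

    # Write the page to a temporary file that the filter can edit in place.
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(html)
        # The file name is appended as the last argument of the callback,
        # e.g. "perl makegids.pl filter tvgids /tmp/xyz.html".
        subprocess.run(shlex.split(command) + [path], check=True)
        # Read back whatever the filter left in the file.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)
```

With PLUCKER_FILTER_PAGE unset the page passes through unchanged, so the hook costs
nothing for people who don't use it. In this sketch a failing filter raises an
error (check=True); falling back to the unfiltered page would be an equally
reasonable choice.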
Well, I think this is all I did. I think this is a nice feature that could be very
handy in certain situations, and it's easy to add. Maybe it's a good idea for the
next version. By the way, can you tell me when that next version (with color support)
is coming? I have a color Palm (m515), and color pages would be a big plus.
agb
Attachments: makegids.pl, Spider.py
