Page filter

Anne-Gert Bultena (ELN) Tue, 25 Jun 2002 03:20:58 -0700

Hi,
 
This will probably look trivial to you, but this was my problem:
 
There are sites on the internet that publish the TV guide. I wanted to download these 
guides in Plucker format. A major problem was that these sites use large headers, 
images and other garbage that I wanted to get rid of.
 
This was my solution:
1) I made a script that creates a home page, calls plucker and puts the result in the 
install queue. This is makegids.pl (the bottom part). The most interesting part for 
you is in lines 246, 255 and 256, which call plucker.
The environment variable set in line 255 (PLUCKER_FILTER_PAGE) acts as a callback: 
"perl $0" runs the same script again, "filter" is an argument that is picked up at the 
start of the script, and "$source" is an argument that selects the kind of filtering 
("tvgids" is the only one that actually works). A third argument is added when the 
callback is invoked: the name of the file that holds the downloaded HTML.
2) The command in PLUCKER_FILTER_PAGE is executed for every downloaded document. This 
is done by modifying Spider.py a little bit (I'm a novice in Python; in fact, this 
file was the first Python I ever saw, so I did the real work in Perl :-) ). The 
addition is in lines 451-464: it writes the contents to a (temporary) file, calls the 
callback, which may modify the file, and reads back the (modified) contents.
3) The filter tries to recognize the kind of page (in my case the interesting ones 
were 'time table' pages and 'description' pages).
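
For anyone who does not want to open the attachments, the hook described in step 2 can 
be sketched roughly as follows. This is only an illustration in present-day Python, 
not the actual change to Spider.py: the function name run_page_filter is made up, and 
I assume the command in PLUCKER_FILTER_PAGE takes the file name as its last argument, 
as described in step 1.

```python
import os
import shlex
import subprocess
import tempfile


def run_page_filter(contents: bytes, env_var: str = "PLUCKER_FILTER_PAGE") -> bytes:
    """Pass downloaded contents through the external command named in env_var.

    Hypothetical helper for illustration; not the code in Spider.py.
    """
    command = os.environ.get(env_var)
    if not command:
        # No filter configured: leave the contents untouched.
        return contents

    # Write the downloaded contents to a temporary file the filter can edit.
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(contents)

        # The file name is appended as the last argument of the callback.
        subprocess.run(shlex.split(command) + [path], check=True)

        # Read back whatever the filter left in the file.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        # Always clean up the temporary file, even if the filter failed.
        os.remove(path)
```

With the variable set to something like "perl makegids.pl filter tvgids", every 
document passed to run_page_filter would be handed to the Perl script, which edits the 
file in place. If the variable is not set, nothing is changed.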
 
Well, I think this is all I did. I think this is a nice feature that could be very 
handy in certain situations, and it's easy to add. Maybe it's a good idea for the 
next version. By the way, can you tell me when that next version (with color support) 
is coming? I have a color Palm (m515) and color pages would be a big plus.

agb

makegids.pl
Description: Binary data

Spider.py
Description: Binary data
