On Sat, 2003-11-01 at 14:40, Florian Hacker wrote:
> Is there any way to postprocess downloaded HTML files before plucking them with
> Plucker Desktop?
> 
> There are many sites that contain useful information like weather forecasts or
> cinema schedules. Unfortunately these pages are so big and bloated that it's
> no use plucking them entirely, but it would be great to extract the contents of
> a specific HTML tag (like a specific table section). This could easily be done
> with Perl regexes, if there were a way to postprocess the fetched HTML before
> plucking.
> 
> Any suggestions?

        Well, it's not as easy as some Perl regexes, but it can be done with
JPluck and XSLT. I'm trying to use this combo to correct IBM
DeveloperWorks' stupid HTML, which marks titles with <span class="title">
instead of <h1>. The problem is that this HTML is not well-formed,
and doesn't even have a DOCTYPE declaration, so XML tools won't process
it. :-/
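
        For what it's worth, the regex approach from the original question
can be sketched in a few lines of Python. Everything below is made up for
illustration (the page markup and the "forecast" table id are hypothetical);
a real page would need its own pattern, and a regex only works as long as
the site's markup stays stable:

```python
import re

# Hypothetical downloaded page; in practice this is the fetched HTML file.
html = """<html><body>
<div class="navigation">lots of bloat...</div>
<table id="forecast">
<tr><td>Sat</td><td>12 C, sunny</td></tr>
<tr><td>Sun</td><td>9 C, rain</td></tr>
</table>
<div class="footer">more bloat...</div>
</body></html>"""

# Pull out just the table we care about.  Non-greedy .*? plus re.DOTALL
# lets the match span newlines but stop at the first closing </table>.
match = re.search(r'<table id="forecast">.*?</table>', html, re.DOTALL)

# Wrap the extracted fragment in a minimal page so Plucker gets valid HTML.
trimmed = "<html><body>%s</body></html>" % match.group(0)
print(trimmed)
```

        The same idea works as a Perl one-liner, of course; the missing
piece is just a hook in the fetch pipeline to run such a filter before
Plucker Desktop converts the page.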

[]'s
Daniel Serodio

_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list