Data Extraction from 100+ different sites...

Tony Mullins Tue, 11 Jun 2013 07:08:06 -0700

Hi,

I have 100+ different sites (and more may be added in the near future).
I have to crawl them and extract the information I need from each site,
so each site would have its own extraction rules (XPaths).
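
To make the requirement concrete, here is a rough sketch of what I mean by
per-site rules. Everything in it is an assumption for illustration: the host
names, field names and XPaths are made up, the class SiteRules is
hypothetical, and it uses the JDK's javax.xml.xpath on well-formed markup
rather than any Nutch API.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SiteRules {
    // Hypothetical rule table: host -> (field name -> XPath expression).
    static final Map<String, Map<String, String>> RULES = new LinkedHashMap<>();
    static {
        Map<String, String> shop = new LinkedHashMap<>();
        shop.put("title", "//h1[@class='product']");
        shop.put("price", "//span[@id='price']");
        RULES.put("shop.example.com", shop);

        Map<String, String> news = new LinkedHashMap<>();
        news.put("title", "//div[@id='headline']/h2");
        RULES.put("news.example.org", news);
    }

    // Applies the rules registered for the given host to a well-formed
    // XHTML string. Returns an empty map when the host has no rules.
    static Map<String, String> extract(String host, String xhtml) throws Exception {
        Map<String, String> out = new LinkedHashMap<>();
        Map<String, String> rules = RULES.get(host);
        if (rules == null) {
            return out;
        }
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // evaluate(String, Object) returns the string value of the first match.
            out.put(rule.getKey(), xpath.evaluate(rule.getValue(), doc).trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1 class='product'>Blue Widget</h1>"
                + "<span id='price'>9.99</span></body></html>";
        System.out.println(extract("shop.example.com", page));
    }
}
```

Real pages are rarely well-formed XML, so inside Nutch the rules would
presumably be applied to the DOM the HTML parser has already built.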


So far I have not found any built-in mechanism in Nutch that fulfills my
requirement, so I may have to write a custom HtmlParseFilter extension and
an IndexingFilter plugin.

I may also have to write 100+ switch cases in my plugin to handle the
extraction rules of each site...

Is this the best way to handle my requirement, or is there a better way?

Thanks for your support & help.

Tony.
