Hi,

I have 100+ different sites (and more may be added in the near future).
I have to crawl them and extract the information I need from each site, so
each site will have its own extraction rules (XPaths).

So far, I have not found a built-in mechanism in Nutch that meets this
requirement, so I may have to write a custom HtmlParseFilter extension and
an IndexingFilter plugin.

I may also have to write 100+ switch cases in my plugin to handle each
site's extraction rules.
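For illustration, here is a minimal sketch of a data-driven alternative to one switch case per site. It assumes the rules can be kept as data (a table mapping host to field name to XPath, which could be loaded from a config file) and evaluated with the JDK's own javax.xml.xpath API. The class and method names (SiteRules, addRule, extract) are hypothetical, and wiring this into an actual Nutch HtmlParseFilter is not shown.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical per-site rule table: host -> (field name -> XPath). */
public class SiteRules {
    private final Map<String, Map<String, String>> rulesByHost = new LinkedHashMap<>();

    /** Registers one extraction rule; in practice these could be read from a config file (assumption). */
    public void addRule(String host, String field, String xpath) {
        rulesByHost.computeIfAbsent(host, h -> new LinkedHashMap<>()).put(field, xpath);
    }

    /** Evaluates every XPath registered for the host; an unknown host yields an empty map. */
    public Map<String, String> extract(String host, Document doc) throws XPathExpressionException {
        Map<String, String> fields = new LinkedHashMap<>();
        Map<String, String> rules = rulesByHost.get(host);
        if (rules == null) {
            return fields;
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String value = (String) xpath.evaluate(rule.getValue(), doc, XPathConstants.STRING);
            fields.put(rule.getKey(), value.trim());
        }
        return fields;
    }

    /** Parses well-formed XHTML only; real-world HTML would first need an HTML-to-DOM parser. */
    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        SiteRules rules = new SiteRules();
        rules.addRule("www.example.com", "title", "//h1");
        rules.addRule("www.example.com", "price", "//span[@class='price']");
        Document doc = parse("<html><body><h1>Widget</h1><span class='price'>9.99</span></body></html>");
        System.out.println(rules.extract("www.example.com", doc)); // prints {title=Widget, price=9.99}
    }
}
```

With this shape, adding a new site means adding rows to the rule table rather than another case to the plugin code.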

Is this the best way to handle this requirement, or is there a better way?

Thanks for your support & help.

Tony.
