Hi, I have 100+ different sites (and more may be added in the near future). I have to crawl them and extract the information I need from each one, so each site would have its own extraction rules (XPaths).
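To illustrate what I mean by per-site rules, here is a minimal sketch in plain Java. It uses only JDK classes, not the Nutch plugin API. The class name `SiteRules`, the host-to-XPath map, and `example.com` are all hypothetical, and it assumes well-formed markup (real HTML would need a tolerant parser first).

```java
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SiteRules {
    // Hypothetical rule table: host -> (field name -> XPath expression).
    // In practice this could be loaded from a config file, one entry per site.
    private final Map<String, Map<String, String>> rulesByHost;

    public SiteRules(Map<String, Map<String, String>> rulesByHost) {
        this.rulesByHost = rulesByHost;
    }

    // Looks up the rules for the URL's host and evaluates each XPath on the page.
    // Returns an empty map when the host is missing or has no rules.
    public Map<String, String> extract(String url, Document doc) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        String host = URI.create(url).getHost();
        if (host == null || !rulesByHost.containsKey(host)) {
            return fields;
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> rule : rulesByHost.get(host).entrySet()) {
            // evaluate(String, Object) returns the string value of the first match.
            fields.put(rule.getKey(), xpath.evaluate(rule.getValue(), doc).trim());
        }
        return fields;
    }

    // Helper for the example only: parses a well-formed XML/XHTML string.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        SiteRules rules = new SiteRules(
                Map.of("example.com", Map.of("title", "/html/body/h1")));
        Document page = parse("<html><body><h1> Hello </h1></body></html>");
        System.out.println(rules.extract("http://example.com/page", page));
    }
}
```

With something like this, adding a site would mean adding one entry to the rule table rather than changing code.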
As far as I can see, Nutch has no built-in mechanism for this, so I may have to write a custom HtmlParseFilter extension and an IndexingFilter plugin. I may also have to write 100+ switch cases in my plugin to handle the extraction rules of each site. Is this the best way to handle my requirement, or is there a better way?

Thanks for your support and help.

Tony

