Hi, I want to crawl several forums with Nutch and extract the individual posts from the forum pages.
I have a few questions:

1. How can I implement a different custom parser for each domain? Do I need to add a separate plugin for each domain? If so, how will Nutch identify which parser to use for a particular domain?
2. How does de-duplication work in Nutch? If I modify the text column in HBase to suit my requirements, will that affect de-duplication?

Thanks and regards,
Ankit Gupta

