Hi,

I want to crawl several forums with Nutch and need to extract the
individual posts from the forum pages.

I have a few questions:

1. How can I implement a different custom parser for each domain? Do I
need to add a separate plugin for each domain? If so, how will Nutch
identify which parser to use for a particular domain?

2. How does de-duplication work in Nutch? If I modify the text column in
HBase to suit my requirements, will that affect de-duplication in any way?


Thanks and Regards,
Ankit Gupta
