Hello,

I have been experimenting with Apache Nutch version 1.11 for a few days. My use
case is to crawl a forum in local mode. The seed URL file contains just one entry:


http://www.flyertalk.com/forum/united-airlines-mileageplus/1736400-have-simple-question-about-united-airlines-mileageplus-ask-here-2016-a.html


My Nutch config is pasted at http://pasted.co/782e59ad


I issue the following commands:

1. nutch generate ftalk-db/ ftalk-db/segments/ -depth 5 -topN 500

2. nutch fetch ftalk-db/segments/20160125154244/


I am struggling to find out why Nutch keeps fetching the same page multiple times.
Instead of getting unique web pages at the end of the crawl, I get a lot of
duplicates.


Please suggest what I am doing wrong.


Thanks,

Hussain
