Run bin/nutch dedup segments dedup.tmp
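The four links are not identical. They differ in host (argosoft.org vs argosoft.com) and in the case of the path (RootPages vs rootpages), so the crawler indexed them as four separate URLs even though they serve the same page. The dedup step is what collapses them. As far as I know, Nutch's dedup treats documents with the same content MD5 hash, or the same URL, as duplicates and keeps only one. Here is a rough shell sketch of the content-hash idea. It is not Nutch's actual code: the function name dedup_by_content, the tab-separated URL/content input, and the placeholder page text are all made up for illustration, and it assumes GNU coreutils' md5sum.

```shell
# Hypothetical helper (not part of Nutch): reads "URL<TAB>content" lines
# and prints only the first URL seen for each distinct content digest.
dedup_by_content() {
  tab=$(printf '\t')
  while IFS=$tab read -r url content; do
    # MD5 of the page content; GNU md5sum prints "<hash>  -", keep the hash
    digest=$(printf '%s' "$content" | md5sum | cut -d' ' -f1)
    printf '%s\t%s\n' "$digest" "$url"
  done | awk -F'\t' '!seen[$1]++ { print $2 }'  # first URL per digest wins
}

# Example with placeholder content standing in for the real page body.
printf '%s\t%s\n' \
  'http://www.argosoft.org/RootPages/Default.aspx' 'ArGo Software Design Homepage' \
  'http://www.argosoft.com/rootpages/Default.aspx' 'ArGo Software Design Homepage' \
  'http://www.argosoft.com/RootPages/Default.aspx' 'ArGo Software Design Homepage' \
  'http://www.argosoft.org/rootpages/Default.aspx' 'ArGo Software Design Homepage' \
  | dedup_by_content
# prints only: http://www.argosoft.org/RootPages/Default.aspx
```

Because all four entries carry the same body, they hash to the same digest and only the first URL survives. If the copies differ even slightly (say, a timestamp in the HTML), their hashes differ and a pure content-hash check would keep all of them.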
Dima Mazmanov wrote:
Hi all! I'm running nutch-0.7.1.
Here are the results of my search:
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.org/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.com/rootpages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.com/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.org/rootpages/Default.aspx (Cached)
As you can see, the same result is shown multiple times.
Why is that? What is the difference between these links? I don't see any.
How can I avoid this problem?
Thanks,
Regards, Dima