Run the dedup command on your segments directory:
    bin/nutch dedup segments dedup.tmp
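The four URLs in the results quoted below are different strings (.org vs .com, and RootPages vs rootpages), so each one is fetched and indexed as a separate page even though the content is the same. As far as I know, the dedup step removes index entries whose content hashes to the same value. Here is a rough Python sketch of that idea only, not Nutch's actual code; the function name dedup_by_content and the page body are made up for illustration, and only the URLs come from the quoted message.

```python
import hashlib


def dedup_by_content(pages):
    """Keep the first URL seen for each distinct content hash.

    `pages` is a list of (url, content) pairs. This is an illustrative
    helper, not part of Nutch.
    """
    first_url_for_hash = {}
    for url, content in pages:
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        # setdefault leaves the earlier URL in place if the hash was seen before
        first_url_for_hash.setdefault(digest, url)
    return list(first_url_for_hash.values())


# Placeholder body (assumption): stands in for the identical page content.
body = "ArGo Software Design Homepage"

# The four URLs from the quoted search results.
pages = [
    ("http://www.argosoft.org/RootPages/Default.aspx", body),
    ("http://www.argosoft.com/rootpages/Default.aspx", body),
    ("http://www.argosoft.com/RootPages/Default.aspx", body),
    ("http://www.argosoft.org/rootpages/Default.aspx", body),
]

unique = dedup_by_content(pages)
print(unique)
```

With identical content under all four URLs, only one entry should survive, which is the behaviour you want in the search results.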
Dima Mazmanov wrote:
Hi all! I'm running nutch-0.7.1.
Here is the result of my search:
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.org/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.com/rootpages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.com/RootPages/Default.aspx (Cached)
ArGo Software Design Homepage [html] - 30.2 k - ... Look of our Web
Site Our web site has new look and ... link on the ...
http://www.argosoft.org/rootpages/Default.aspx (Cached)
As you can see, one result is shown multiple times.
Why is that? What is the difference between these links? I don't see any.
So, how can I avoid this problem?
Thanks, Regards, Dima
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general