So what filter settings do you use?
Something like this? +^http://([a-z0-9]*\.)*bbc.co.uk/
Then you will get both bbc.co.uk and www.bbc.co.uk, and
since this site is dynamic, the content might be different.
I have the same problem myself :-(
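
To make the point about the filter concrete, here is a rough sketch in Python rather than Nutch's own Java. It assumes the filter does a plain regex search with the pattern after the leading "+", and the test URLs are just examples I picked:

```python
import re

# Pattern from the filter line above, minus the leading '+' (the accept marker).
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*bbc.co.uk/")

def accepted(url: str) -> bool:
    """Return True if the URL matches the accept pattern."""
    return PATTERN.search(url) is not None

if __name__ == "__main__":
    # Example URLs chosen for illustration only.
    for url in (
        "http://bbc.co.uk/",
        "http://www.bbc.co.uk/",
        "http://news.bbc.co.uk/sport",
        "http://www.example.com/",
    ):
        print(url, accepted(url))
```

Both the bare host and any subdomain pass, so the same page can be fetched under two host names. Note also that the dots in "bbc.co.uk" are unescaped, so they match any character; writing bbc\.co\.uk would be stricter.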




-----------------------------------
Well, my script already contains this command:




   Run bin/nutch dedup segments dedup.tmp
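
A possible reason the dedup step leaves these pages alone, sketched in Python: if duplicates are detected by a digest of the page content (MD5 is assumed here) and the site is dynamic, two fetches that differ by a single byte produce different digests. The page bodies below are made up for illustration:

```python
import hashlib

def content_digest(body: bytes) -> str:
    """Hex MD5 digest of a page body (assumed stand-in for the dedup key)."""
    return hashlib.md5(body).hexdigest()

# Hypothetical bodies: the same page served twice, differing only in a timestamp.
page_a = b"<html><body>ArGo Software Design Homepage <!-- served 10:00:01 --></body></html>"
page_b = b"<html><body>ArGo Software Design Homepage <!-- served 10:00:02 --></body></html>"

if __name__ == "__main__":
    print(content_digest(page_a))
    print(content_digest(page_b))
    print("same content key:", content_digest(page_a) == content_digest(page_b))
```

If that is what happens here, the pages look identical in the search results but are not byte-identical, so a content-hash comparison does not treat them as duplicates.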


   Dima Mazmanov wrote:
       Hi all!! I'm running on nutch-0.7.1.

       Here is result of my search.

       ArGo Software Design Homepage [html] - 30.2 k - ... Look of our
       Web Site Our web site has new look and ... link on the ...
       http://www.argosoft.org/RootPages/Default.aspx (Cached)

       ArGo Software Design Homepage [html] - 30.2 k - ... Look of our
       Web Site Our web site has new look and ... link on the ...
       http://www.argosoft.com/rootpages/Default.aspx (Cached)

       ArGo Software Design Homepage [html] - 30.2 k - ... Look of our
       Web Site Our web site has new look and ... link on the ...
       http://www.argosoft.com/RootPages/Default.aspx (Cached)

       ArGo Software Design Homepage [html] - 30.2 k - ... Look of our
       Web Site Our web site has new look and ... link on the ...
       http://www.argosoft.org/rootpages/Default.aspx (Cached)

       As you can see, one result is shown multiple times.
       Why is that? What is the difference between these links? I don't see any.
       So, how can I avoid this problem?
       Thanks, Regards, Dima
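
On the question of what differs between the four links: they are four distinct strings, differing in the host (.org vs .com) and in the case of the path (RootPages vs rootpages). A small Python sketch that takes them apart, using only the URLs from the search results above:

```python
from urllib.parse import urlsplit

# The four URLs from the search results in the original message.
URLS = [
    "http://www.argosoft.org/RootPages/Default.aspx",
    "http://www.argosoft.com/rootpages/Default.aspx",
    "http://www.argosoft.com/RootPages/Default.aspx",
    "http://www.argosoft.org/rootpages/Default.aspx",
]

def host_and_path(url: str) -> tuple:
    """Split a URL into (host, path)."""
    parts = urlsplit(url)
    return parts.hostname, parts.path

hosts = {host_and_path(u)[0] for u in URLS}
paths = {host_and_path(u)[1] for u in URLS}
# Case-folded paths: only safe if the server treats paths case-insensitively (an assumption).
folded_paths = {host_and_path(u)[1].lower() for u in URLS}

if __name__ == "__main__":
    print("distinct URLs:", len(set(URLS)))
    print("hosts:", sorted(hosts))
    print("paths:", sorted(paths))
    print("paths after lowercasing:", sorted(folded_paths))
```

So a crawler that compares URLs as exact strings sees four different pages. Lowercasing the path would collapse the case variants, but that only makes sense if the server really ignores case in paths, and the .org and .com hosts would still count as separate sites.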

