Hi Folks,

The site we're crawling serves pages over both http and https, and links switch 
from one scheme to the other depending on the page. When this happens, I see two 
results that are almost identical, except that one URL is http and the other is 
https. Is there any way to remove those duplicates through normal Nutch config? 
Some pages only show up via https, so I can't just exclude all https URLs. 

Thanks,
Matt
