By default, the regex-urlfilter.txt file excludes URLs that contain query 
strings (i.e. those that include "?"). Could somebody explain the reason for 
excluding these URLs? Is there something risky about including them in a crawl? 
Is there anyone who is not excluding these URLs, and if so, how has it worked 
out? The reason I ask is that some of the domains I'm hoping to crawl use 
query strings for most of their pages.

Thanks,
Bryan
