By default, the regex-urlfilter.txt file excludes URLs that contain query strings (i.e. that include "?"). Could somebody explain the reason for excluding these URLs? Is there something risky about including them in a crawl? Is there anyone who is not excluding these URLs, and if so, how has it worked out? The reason I ask is that some of the domains I'm hoping to crawl use query strings for most of their pages.
Thanks, Bryan
