To answer my own question: crawl-urlfilter.txt contains an entry that makes Nutch ignore URLs with query strings by default. I commented that entry out, and those links are now followed.
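For anyone who lands here later, the effect of that change can be sketched roughly as follows. This is a simplified Python model of a first-match-wins regex URL filter, not Nutch's actual code. The rule `-[?*!@=]` is what I recall from a stock crawl-urlfilter.txt and may differ between versions; `example.com` and the helper names `parse_rules` and `accepts` are made up for illustration.

```python
import re

# Assumed excerpt of a stock crawl-urlfilter.txt (contents may vary by version).
# Lines starting with '-' reject, '+' accept, '#' are comments.
DEFAULT_RULES = r"""
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept hosts in the (hypothetical) domain example.com
+^http://([a-z0-9]*\.)*example.com/
"""

# The fix described above: comment out the query-string rule.
EDITED_RULES = DEFAULT_RULES.replace("-[?*!@=]", "#-[?*!@=]")


def parse_rules(text):
    """Turn filter-file text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption in this sketch: no matching rule means the URL is skipped


if __name__ == "__main__":
    url = "http://www.example.com/page?blah=something"
    print(accepts(parse_rules(DEFAULT_RULES), url))
    print(accepts(parse_rules(EDITED_RULES), url))
```

With the rule active, the `?` and `=` in the URL match the reject pattern before any accept rule is consulted, which is why such links were being dropped. Commenting the rule out lets the URL fall through to the accept rule for the domain.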

Chris Stephens wrote:
How do I get Nutch to follow URLs that contain a query string, such as ?blah=something, at the end of the URL? Nutch seems to ignore these, and I didn't find any configuration option to enable this. Does a plugin or something similar exist to facilitate following these types of links?




