You could open an issue about porting protocol-httpclient to HttpClient 4 and maybe submit a patch ;)
A dirty fix would be hacking protocol-http to send a cookie or HTTP auth credentials along with its requests.

> Is there an alternative to protocol-httpclient that can do basic auth? I am
> running into a wall right now trying to get Nutch to fetch anything past the
> seed URL of my site. The site requires auth, so I configured httpclient, which
> (according to the Apache logs) is correctly sending credentials when it gets a
> 401 auth challenge back from the server, but after getting '/', it quits
> with:
>
> Stopping at depth=1 - no more URLs to fetch.
>
> Running again stops at depth=0. The target page is an Apache mod_autoindex
> page with 15 or so directories listed, so it should not be hitting any limit
> since it is only fetching that one page (I turned off the
> db.ignore.internal.links option, even though I think I read it only applies
> to link scoring, not the crawldb). I thought it might be one of the regexp
> filters blocking, so I trimmed them down to +.*, still nothing. I pointed
> it at a server that does not require auth, and it spat out an
> "unzipBestEffort returned null" error, even though nothing on the page is
> a zip/gz/tgz and server compression is not on. I traced this to
> NUTCH-990, which is marked "won't fix", and everything pointing at
> upgrading to HttpClient 4 says it won't happen... so is there an
> alternative, or some way to get this working? Crawling the non-auth site
> with protocol-http works as expected: Nutch starts crawling the autoindex
> pages and I can watch from the console or the Apache access log.
>
> -T
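To sketch what that dirty fix amounts to: protocol-http builds its GET request by hand, so the hack is just appending a hard-coded Authorization (or Cookie) header where the request headers are assembled. The snippet below is a standalone illustration of the header you would inject, not actual Nutch code; the user/password values are placeholders, and where exactly the line goes inside protocol-http's HttpResponse is left to you.

```java
import java.util.Base64;

public class BasicAuthHeader {

    // HTTP Basic auth is just "Basic " + base64(user:password).
    // This is the value the hacked protocol-http would append to
    // every request, e.g. as "Authorization: <value>\r\n".
    static String basicAuth(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes());
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only
        System.out.println("Authorization: " + basicAuth("user", "pass"));
    }
}
```

Obviously this ties the credentials to one site for the whole crawl, which is why it's a hack and not a patch.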

