Nutch 1.14 uses HttpClient 3.x, which does not work with NTLM2. Not sure
if that's your case. To get auth to work, we had to migrate the
httpclient plugin to HttpClient 4.x.
This may have been done in Nutch 1.15.
On Fri., Apr. 26, 2019, 10:24 a.m. Larry.Santello wrote:
> Been reading a
+1
On Wed., Oct. 2, 2019, 1:55 p.m. Sebastian Nagel wrote:
> Hi Folks,
>
> A first candidate for the Nutch 1.16 release is available at:
>
> https://dist.apache.org/repos/dist/dev/nutch/1.16/
>
> The release candidate is a zip and tar.gz archive of the binary and
> sources in:
> https://g
The pages I'm crawling are dynamically generated (i.e. using
JavaScript), so I am using the `protocol-selenium` plugin
instead of `protocol-http`, as per
https://wiki.apache.org/nutch/AdvancedAjaxInteraction.
Problem:
protocol-selenium is using lib-selenium which unlike protocol
for " + key);
} catch (IllegalArgumentException e) {
  LOG.warn("Could not decode: " + key + ", it probably wasn't encoded in the first place..");
}
Commenting out the above resolves the issue, but I don't understand why
this workaround was added in the first place.