Yeah seriously - if NTLM auth (or HTTP Basic for that matter) is supported natively by Nutch, I'd love to read the documentation on it!
-- Jim

On 10/14/06, Tomi NA <[EMAIL PROTECTED]> wrote:
2006/10/14, Toufeeq Hussain <[EMAIL PROTECTED]>:
> From internal tests with ntlmaps + Nutch the conclusion we came to was
> that though it "kinda-works" it puts a huge load on the Nutch server
> as ntlmaps is a major memory-hog and the mixture of the two leads to
> performance issues. For a PoC this will do but for
> production-deployments I would not suggest one goes the ntlmaps way.
>
> An alternate would be to have a separate ntlmaps-server, a dedicated
> machine acting as the NTLM proxy for the Nutch-box which sits behind
> it.

I haven't noticed the added resource drain, but then again, I haven't really tested all that much: the constraints on the particular project where I implemented the approach weren't very strict. I'll keep my eye on the CPU usage.

> The right way would be to use the in-built authentication features of
> Nutch for Auth based crawling.

Nutch supports NTLM authentication? I see I've got some reading to catch up on...

t.n.a.
