On 9/8/06, Jim Wilson <[EMAIL PROTECTED]> wrote:
Dear Nutch User List,

I am desperately trying to index an Intranet with the following
characteristics:

1) Some sites require no authentication - these already work great!
2) Some sites require basic HTTP Authentication.
3) Some sites require NTLM Authentication.
4) No site requires both basic HTTP and NTLM Authentication (only one or the other).
5) The same Username/Password should work on all sites which require either
type of Authentication.
6) For sites requiring NTLM Authentication, the same Domain is always used.
7) If a site requires authentication but the Username/Password mentioned
above fails, the site doesn't matter and does not need to be fetched/indexed.

My question is this: How can I provide a default Username/Password/Domain
for Nutch to use when answering HTTP or NTLM challenges?

(I really hope all I need is a couple of <property> tags in my
nutch-site.xml, but I'm beginning to doubt it).

I love Nutch, and really want to use it.  Please help if you know the
answer.  Thanks!
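For comparison only: plain Java already has the "one default account for every challenge" shape asked about here, in `java.net.Authenticator`. Nutch's fetcher does its own HTTP handling through its protocol plugins and presumably does not consult this class, so the sketch below does not configure Nutch. The class name `DefaultCredentials` and the account `crawler`/`secret`/`CORP` are hypothetical placeholders. It answers Basic challenges with the plain user name and NTLM challenges with `DOMAIN\user`, which is one of the ways the JDK accepts an NTLM domain (the `http.auth.ntlm.domain` system property is another).

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper, not part of Nutch: answers every HTTP authentication
// challenge raised through java.net with one fixed Username/Password, and
// prefixes the fixed Domain when the server asks for NTLM.
class DefaultCredentials extends Authenticator {
    private final String username;
    private final char[] password;
    private final String domain;

    DefaultCredentials(String username, char[] password, String domain) {
        this.username = username;
        this.password = password.clone();
        this.domain = domain;
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        String scheme = getRequestingScheme(); // e.g. "Basic" or "NTLM"
        // For NTLM, embed the domain in the user name as DOMAIN\user.
        String user = (scheme != null && scheme.equalsIgnoreCase("ntlm"))
                ? domain + "\\" + username
                : username;
        return new PasswordAuthentication(user, password.clone());
    }

    public static void main(String[] args) {
        // Placeholder account; replace with the real intranet credentials.
        Authenticator.setDefault(
                new DefaultCredentials("crawler", "secret".toCharArray(), "CORP"));
        // Simulate a Basic challenge locally; no network traffic is involved.
        PasswordAuthentication pa = Authenticator.requestPasswordAuthentication(
                "intranet.example", null, 80, "http", "Intranet", "Basic");
        System.out.println(pa.getUserName());
    }
}
```

Point 7 would stay a caller-side policy under this assumption: a page that still answers 401 after these credentials have been offered is simply skipped.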

I'm also very interested in hearing more on the topic.
The only mention I have found of a solution to (part of) this problem is
http://www.dehora.net/journal/2005/11/nutch_with_basic_authentication.html

t.n.a.
