Yeah - I saw that page too. Looks like it's been done... but no mention of how to do it.
This page on the wiki seems to indicate that a person by the name of Ken Meltsner had at least a partial solution:

http://wiki.apache.org/nutch/TaskList?highlight=%28Authentication%29

Anyone? Does anybody know how to beat authentication?

Thanks in advance.

-- Jim

On 9/9/06, Tomi NA <[EMAIL PROTECTED]> wrote:
On 9/8/06, Jim Wilson <[EMAIL PROTECTED]> wrote:
> Dear Nutch User List,
>
> I am desperately trying to index an Intranet with the following
> characteristics
>
> 1) Some sites require no authentication - these already work great!
> 2) Some sites require basic HTTP Authentication.
> 3) Some sites require NTLM Authentication.
> 4) No sites require both HTTP and NTLM (only one or the other).
> 5) The same Username/Password should work on all sites which require either
> type of Authentication.
> 6) For sites requiring NTLM Authentication, the same Domain is always used.
> 7) If a site requires authentication, but the Username/Password mentioned
> above fails, the site doesn't matter and does not need fetched/indexed.
>
> My question is this: How can I provide a default Username/Password/Domain
> for Nutch to use when answering HTTP or NTLM challenges?
>
> (I really hope all I need is a couple of <property> tags in my
> nutch-site.xml, but I'm beginning to doubt it).
>
> I love Nutch, and really want to use it. Please help if you know the
> answer. Thanks!

I'm also very interested in hearing more on the topic.
The only mention of a solution to (a part of) this problem I found is
http://www.dehora.net/journal/2005/11/nutch_with_basic_authentication.html

t.n.a.
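
For anyone following the thread, here is a rough sketch of what answering a basic HTTP challenge with one default Username/Password involves at the protocol level. It is not Nutch code or Nutch configuration: the function names `basic_auth_header` and `challenge_scheme` are made up for illustration, and the NTLM case (a multi-step handshake that also needs the Domain) is deliberately not covered.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic auth.

    Hypothetical helper: the credentials are joined with a colon and
    Base64-encoded, which is what the Basic scheme requires.
    """
    token = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


def challenge_scheme(www_authenticate: str) -> str:
    """Return the lowercased auth scheme named in a WWW-Authenticate value.

    Hypothetical helper: the scheme is the first whitespace-delimited
    token, e.g. 'Basic realm="x"' gives 'basic' and 'NTLM' gives 'ntlm'.
    An empty header value gives an empty string.
    """
    parts = www_authenticate.strip().split(None, 1)
    return parts[0].lower() if parts else ""


if __name__ == "__main__":
    # Decide how to react to a 401 response based on the challenge type.
    challenge = 'Basic realm="intranet"'
    if challenge_scheme(challenge) == "basic":
        print("Authorization:", basic_auth_header("user", "secret"))
    else:
        print("Scheme not handled in this sketch:", challenge_scheme(challenge))
```

A crawler following point 7 above would send this header once in reply to a 401 and simply skip the URL if the server answers 401 again.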
