Could you please set the scheme to "NTLM" and realm to your domain? For
example, if you log into your Windows network as: EXAMPLE\admin, your realm
would be "EXAMPLE".
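
As a rough sketch only, assuming the host and port from your log line
(192.168.10.210:8090) and placeholder values for the username, password and
domain, your httpclient-auth.xml might then look something like this:

```xml
<auth-configuration>
  <!-- "user" and "password" are placeholders for your real credentials -->
  <credentials username="user" password="password">
    <!-- host/port are taken from your log; "EXAMPLE" stands in for your
         Windows domain and is an assumption, not your actual realm -->
    <authscope host="192.168.10.210" port="8090"
               scheme="NTLM" realm="EXAMPLE"/>
  </credentials>
</auth-configuration>
```

Replace "EXAMPLE" with the domain part of the account you log in with.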

It would also help if you deleted any existing hadoop.log file, ran a fresh
crawl, and attached the complete hadoop.log so that we can look at it
ourselves.

Regards,
Susam Pal

On Wed, Dec 15, 2010 at 10:47 PM, Claudio Martella <
[email protected]> wrote:

> Hi Susam,
>
> thanks for your answer.
>
> 1) Yes, I've overridden the plugin.includes property and added
> protocol-httpclient
> 2) doesn't apply to me
> 3) I have configured httpclient-auth.xml like in my last email.
> 4) Yes, the page is fetched
> 5) The only thing I see in the logs is what I pasted. There's no
> "Credentials - username ... set" line. This is tricky.
> 6) I saw what I showed in the last email about the selected credentials.
>
> Even if the webserver was expecting NTLM, why wouldn't it authenticate
> anyway?
>
> On 12/15/10 6:06 PM, Susam Pal wrote:
> > From the logs, it looks like your server requires NTLM authentication.
> > Could you please go through the "Need Help?" section of
> > http://wiki.apache.org/nutch/HttpAuthenticationSchemes and provide all
> > the information requested there?
> >
> > Regards,
> > Susam Pal
> >
> > On Wed, Dec 15, 2010 at 10:30 PM, Claudio Martella <
> > [email protected]> wrote:
> >
> >> Hello list,
> >>
> >> I'm trying to crawl an intranet site which is behind authentication. The
> >> webserver is behind Digest authentication.
> >> My plugin.includes has the protocol-httpclient specified and I have
> >> httpclient-auth.xml set like this:
> >>
> >> <auth-configuration>
> >> <credentials username="user" password="password">
> >> <default scheme="digest"/>
> >> </credentials>
> >> </auth-configuration>
> >>
> >> I've also tried without specifying the scheme. Here's what comes out of
> >> the httpclient logs:
> >>
> >> Supported authentication schemes in the order of preference: [ntlm,
> >> digest, basic]
> >> ntlm authentication scheme selected
> >> Using authentication scheme: ntlm
> >> Authorization challenge processed
> >> Supported authentication schemes in the order of preference: [ntlm,
> >> digest, basic]
> >> ntlm authentication scheme selected
> >> Using authentication scheme: ntlm
> >> Authorization challenge processed
> >>
> >> Here's a line from hadoop.log:
> >>
> >> 2010-12-15 17:51:29,853 INFO  httpclient.HttpMethodDirector - No
> >> credentials available for NTLM <any realm>@192.168.10.210:8090
> >>
> >> I've also tried an <authscope host="192.168.10.210" port="8090"
> >> scheme="digest"/> but nothing changes.
> >>
> >> Does anybody have an idea of what's going on? I'm using nutch 1.2 in
> >> standalone mode.
> >>
> >>
> >> Thanks
> >>
> >> --
> >> Claudio Martella
> >> Digital Technologies
> >> Unit Research & Development - Analyst
> >>
> >> TIS innovation park
> >> Via Siemens 19 | Siemensstr. 19
> >> 39100 Bolzano | 39100 Bozen
> >> Tel. +39 0471 068 123
> >> Fax  +39 0471 068 129
> >> [email protected] http://www.tis.bz.it
> >>
> >> Short information regarding use of personal data. According to Section 13
> >> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we
> >> process your personal data in order to fulfil contractual and fiscal
> >> obligations and also to send you information regarding our services and
> >> events. Your personal data are processed with and without electronic means
> >> and by respecting data subjects' rights, fundamental freedoms and dignity,
> >> particularly with regard to confidentiality, personal identity and the right
> >> to personal data protection. At any time and without formalities you can
> >> write an e-mail to [email protected] in order to object the processing of
> >> your personal data for the purpose of sending advertising materials and also
> >> to exercise the right to access personal data and other rights referred to
> >> in Section 7 of Decree 196/2003. The data controller is TIS Techno
> >> Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the
> >> complete information on the web site www.tis.bz.it.
> >>
> >>
> >>
>
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax  +39 0471 068 129
> [email protected] http://www.tis.bz.it
