Do you have the following lines in your conf/log4j.properties file?

log4j.logger.org.apache.nutch.protocol.httpclient=DEBUG,cmdstdout
log4j.logger.org.apache.commons.httpclient.auth=DEBUG,cmdstdout

We need DEBUG logging enabled for httpclient in this way. Could you
please add these lines and send me a new hadoop.log file?
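
For reference, the NTLM setup suggested earlier in this thread might look
roughly like the sketch below in httpclient-auth.xml. The username "admin"
and the realm "EXAMPLE" are taken from the EXAMPLE\admin illustration, the
host and port from the hadoop.log line quoted below, and the password is a
placeholder. All of these are assumptions to be replaced with your own values.

```xml
<!-- Sketch only: username, password and realm are placeholder assumptions. -->
<auth-configuration>
  <credentials username="admin" password="password">
    <!-- realm is the Windows domain, e.g. EXAMPLE if you log in as EXAMPLE\admin -->
    <authscope host="192.168.10.210" port="8090" realm="EXAMPLE" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```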

Regards,
Susam Pal

On Thu, Dec 16, 2010 at 5:14 AM, Claudio Martella
<[email protected]> wrote:
>
> Hi Susam,
>
> I am attaching a tar.gz of my hadoop.log, nutch-site.xml and
> httpclient-auth.xml.
>
> On 12/15/10 6:21 PM, Susam Pal wrote:
> > Could you please set the scheme to "NTLM" and realm to your domain? For
> > example, if you log into your Windows network as: EXAMPLE\admin, your realm
> > would be "EXAMPLE".
> >
> > It would help if you delete any existing hadoop.log file, perform a fresh
> > crawl and attach the complete hadoop.log file so that we can have a look at
> > the complete log file ourselves.
> >
> > Regards,
> > Susam Pal
> >
> > On Wed, Dec 15, 2010 at 10:47 PM, Claudio Martella <
> > [email protected]> wrote:
> >
> >> Hi Susam,
> >>
> >> Thanks for your answer.
> >>
> >> 1) Yes, I've overridden the plugin.includes property and added
> >> protocol-httpclient.
> >> 2) Doesn't apply to me.
> >> 3) I have configured httpclient-auth.xml as in my last email.
> >> 4) Yes, the page is fetched.
> >> 5) The only thing I see in the logs is what I pasted. There's no
> >> "Credentials - username ... set". This is tricky.
> >> 6) I saw what I showed in the last email about the selected credentials.
> >>
> >> Even if the webserver was expecting NTLM, why wouldn't it authenticate
> >> anyway?
> >>
> >> On 12/15/10 6:06 PM, Susam Pal wrote:
> >>> From the logs, it looks like your server requires NTLM authentication. Could
> >>> you please go through the "Need Help?" section of
> >>> http://wiki.apache.org/nutch/HttpAuthenticationSchemes and provide all the
> >>> information requested there?
> >>>
> >>> Regards,
> >>> Susam Pal
> >>>
> >>> On Wed, Dec 15, 2010 at 10:30 PM, Claudio Martella <
> >>> [email protected]> wrote:
> >>>
> >>>> Hello list,
> >>>>
> >>>> I'm trying to crawl an intranet site that requires authentication. The
> >>>> webserver uses Digest authentication.
> >>>> My plugin.includes property includes protocol-httpclient, and I have
> >>>> httpclient-auth.xml set like this:
> >>>>
> >>>> <auth-configuration>
> >>>> <credentials username="user" password="password">
> >>>> <default scheme="digest"/>
> >>>> </credentials>
> >>>> </auth-configuration>
> >>>>
> >>>> I've also tried without specifying the scheme. Here's what comes out of
> >>>> the httpclient logs:
> >>>>
> >>>> Supported authentication schemes in the order of preference: [ntlm,
> >>>> digest, basic]
> >>>> ntlm authentication scheme selected
> >>>> Using authentication scheme: ntlm
> >>>> Authorization challenge processed
> >>>> Supported authentication schemes in the order of preference: [ntlm,
> >>>> digest, basic]
> >>>> ntlm authentication scheme selected
> >>>> Using authentication scheme: ntlm
> >>>> Authorization challenge processed
> >>>>
> >>>> Here's a line from hadoop.log:
> >>>>
> >>>> 2010-12-15 17:51:29,853 INFO  httpclient.HttpMethodDirector - No
> >>>> credentials available for NTLM <any realm>@192.168.10.210:8090
> >>>>
> >>>> I've also tried adding an <authscope host="192.168.10.210" port="8090"
> >>>> scheme="digest"/> element, but nothing changes.
> >>>>
> >>>> Does anybody have an idea of what's going on? I'm using Nutch 1.2 in
> >>>> standalone mode.
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> --
> >>>> Claudio Martella
> >>>> Digital Technologies
> >>>> Unit Research & Development - Analyst
> >>>>
> >>>> TIS innovation park
> >>>> Via Siemens 19 | Siemensstr. 19
> >>>> 39100 Bolzano | 39100 Bozen
> >>>> Tel. +39 0471 068 123
> >>>> Fax  +39 0471 068 129
> >>>> [email protected] http://www.tis.bz.it
> >>>>
> >>>> Short information regarding use of personal data. According to Section 13
> >>>> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we
> >>>> process your personal data in order to fulfil contractual and fiscal
> >>>> obligations and also to send you information regarding our services and
> >>>> events. Your personal data are processed with and without electronic means
> >>>> and by respecting data subjects' rights, fundamental freedoms and dignity,
> >>>> particularly with regard to confidentiality, personal identity and the right
> >>>> to personal data protection. At any time and without formalities you can
> >>>> write an e-mail to [email protected] in order to object the processing of
> >>>> your personal data for the purpose of sending advertising materials and also
> >>>> to exercise the right to access personal data and other rights referred to
> >>>> in Section 7 of Decree 196/2003. The data controller is TIS Techno
> >>>> Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the
> >>>> complete information on the web site www.tis.bz.it.
> >>>>
> >>>>
> >>>>
> >>
> >> --
> >> Claudio Martella
>
>
> --
> Claudio Martella
>
