Do you have the following lines in your conf/log4j.properties file?

log4j.logger.org.apache.nutch.protocol.httpclient=DEBUG,cmdstdout
log4j.logger.org.apache.commons.httpclient.auth=DEBUG,cmdstdout

We need to enable the DEBUG logs for httpclient in this manner. Could you
please do this and send me a new hadoop.log file?

Regards,
Susam Pal

On Thu, Dec 16, 2010 at 5:14 AM, Claudio Martella <[email protected]> wrote:
>
> Hi Susam,
>
> I attach here a tar.gz of my hadoop.log, nutch-site.xml and
> httpclient-auth.xml.
>
> On 12/15/10 6:21 PM, Susam Pal wrote:
> > Could you please set the scheme to "NTLM" and realm to your domain? For
> > example, if you log into your Windows network as: EXAMPLE\admin, your realm
> > would be "EXAMPLE".
> >
> > It would help if you delete any existing hadoop.log file, perform a fresh
> > crawl and attach the complete hadoop.log file so that we can have a look at
> > the complete log file ourselves.
> >
> > Regards,
> > Susam Pal
> >
> > On Wed, Dec 15, 2010 at 10:47 PM, Claudio Martella <[email protected]> wrote:
> >
> >> Hi Susam,
> >>
> >> thanks for your answer.
> >>
> >> 1) yes I've overridden the plugin.includes property and added the
> >> protocol-httpclient
> >> 2) doesn't apply to me
> >> 3) I have configured httpclient-auth.xml like in my last email.
> >> 4) Yes, the page is fetched
> >> 5) The only thing I see in the logs is the thing I pasted. There's no
> >> "Credentials - username ... set". This is tricky.
> >> 6) I saw what I showed in the last email about the selected credentials.
> >>
> >> even if the webserver was expecting ntlm, why wouldn't it authenticate
> >> anyways?
> >>
> >> On 12/15/10 6:06 PM, Susam Pal wrote:
> >>> From the logs, it looks like your server requires NTLM authentication.
> >>> Could you please go through the "Need Help?" section of
> >>> http://wiki.apache.org/nutch/HttpAuthenticationSchemes and provide all
> >>> the information requested there?
> >>>
> >>> Regards,
> >>> Susam Pal
> >>>
> >>> On Wed, Dec 15, 2010 at 10:30 PM, Claudio Martella <[email protected]> wrote:
> >>>
> >>>> Hello list,
> >>>>
> >>>> I'm trying to crawl an intranet site which is behind authentication. The
> >>>> webserver is behind Digest authentication.
> >>>> My plugin.includes has the protocol-httpclient specified and I have
> >>>> httpclient-auth.xml set like this:
> >>>>
> >>>> <auth-configuration>
> >>>>   <credentials username="user" password="password">
> >>>>     <default scheme="digest"/>
> >>>>   </credentials>
> >>>> </auth-configuration>
> >>>>
> >>>> I've also tried without specifying the scheme. Here's what comes out of
> >>>> the httpclient logs:
> >>>>
> >>>> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> >>>> ntlm authentication scheme selected
> >>>> Using authentication scheme: ntlm
> >>>> Authorization challenge processed
> >>>> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> >>>> ntlm authentication scheme selected
> >>>> Using authentication scheme: ntlm
> >>>> Authorization challenge processed
> >>>>
> >>>> Here's a line from hadoop.log:
> >>>>
> >>>> 2010-12-15 17:51:29,853 INFO httpclient.HttpMethodDirector - No credentials available for NTLM <any realm>@192.168.10.210:8090
> >>>>
> >>>> I've also tried an <authscope host="192.168.10.210" port="8090"
> >>>> scheme="digest"/> but nothing changes.
> >>>>
> >>>> Does anybody have an idea of what's going on? I'm using nutch 1.2 in
> >>>> standalone mode.
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>> --
> >>>> Claudio Martella
> >>>> Digital Technologies
> >>>> Unit Research & Development - Analyst
> >>>>
> >>>> TIS innovation park
> >>>> Via Siemens 19 | Siemensstr. 19
> >>>> 39100 Bolzano | 39100 Bozen
> >>>> Tel. +39 0471 068 123
> >>>> Fax +39 0471 068 129
> >>>> [email protected] http://www.tis.bz.it
> >>>>
> >>>> Short information regarding use of personal data.
> >>>> According to Section 13 of Italian Legislative Decree no. 196 of
> >>>> 30 June 2003, we inform you that we process your personal data in order
> >>>> to fulfil contractual and fiscal obligations and also to send you
> >>>> information regarding our services and events. Your personal data are
> >>>> processed with and without electronic means and by respecting data
> >>>> subjects' rights, fundamental freedoms and dignity, particularly with
> >>>> regard to confidentiality, personal identity and the right to personal
> >>>> data protection. At any time and without formalities you can write an
> >>>> e-mail to [email protected] in order to object the processing of your
> >>>> personal data for the purpose of sending advertising materials and also
> >>>> to exercise the right to access personal data and other rights referred
> >>>> to in Section 7 of Decree 196/2003. The data controller is TIS Techno
> >>>> Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the
> >>>> complete information on the web site www.tis.bz.it.
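
For reference, here is a minimal sketch of what the suggestion in this thread (scheme set to "NTLM", realm set to the Windows domain) could look like in httpclient-auth.xml. The username, password and the realm EXAMPLE are placeholders taken from the examples above, not real values. Whether a default entry alone is enough for this server, or an authscope entry for 192.168.10.210:8090 is also needed, is an assumption to check against the HttpAuthenticationSchemes wiki page.

```xml
<!-- Sketch only: username, password and realm below are placeholders. -->
<auth-configuration>
  <credentials username="admin" password="password">
    <!-- For a Windows login of EXAMPLE\admin, the realm would be EXAMPLE. -->
    <default realm="EXAMPLE" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```

The "No credentials available for NTLM" line in hadoop.log appears consistent with credentials that were registered only for the digest scheme, so the point of this change is to make the configured scheme match the one the server actually selects.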

