Could you please set the scheme to "NTLM" and the realm to your domain? For example, if you log into your Windows network as EXAMPLE\admin, your realm would be "EXAMPLE".
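Here is a minimal sketch of what that could look like in httpclient-auth.xml. It assumes that the protocol-httpclient plugin reads `realm` and `scheme` attributes on `<authscope>` the same way it reads the `host`, `port` and `scheme` attributes you already tried. "user", "password" and "EXAMPLE" are placeholders. The host and port are taken from your hadoop.log line. The username should be only the account name ("admin" in the example above), because the domain goes into `realm`.

```xml
<auth-configuration>
  <credentials username="user" password="password">
    <!-- Placeholders: "user" is the bare account name, "EXAMPLE" is the Windows domain. -->
    <!-- Host and port copied from the "No credentials available for NTLM" log line. -->
    <authscope host="192.168.10.210" port="8090" realm="EXAMPLE" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```

If you would rather not tie the credentials to one host, you could replace the `<authscope>` element with `<default realm="EXAMPLE" scheme="NTLM"/>`. As far as I know, this would offer the same credentials to any host that sends an NTLM challenge.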
It would help if you could delete any existing hadoop.log file, perform a fresh crawl and attach the complete hadoop.log file so that we can look at the whole log ourselves.

Regards,
Susam Pal

On Wed, Dec 15, 2010 at 10:47 PM, Claudio Martella <[email protected]> wrote:

> Hi Susam,
>
> thanks for your answer.
>
> 1) Yes, I've overridden the plugin.includes property and added
> protocol-httpclient.
> 2) Doesn't apply to me.
> 3) I have configured httpclient-auth.xml as in my last email.
> 4) Yes, the page is fetched.
> 5) The only thing I see in the logs is what I pasted. There's no
> "Credentials - username ... set". This is tricky.
> 6) I saw what I showed in the last email about the selected credentials.
>
> Even if the webserver was expecting NTLM, why wouldn't it authenticate
> anyway?
>
> On 12/15/10 6:06 PM, Susam Pal wrote:
> > From the logs, it looks like your server requires NTLM authentication.
> > Could you please go through the "Need Help?" section of
> > http://wiki.apache.org/nutch/HttpAuthenticationSchemes and provide all
> > the information requested there?
> >
> > Regards,
> > Susam Pal
> >
> > On Wed, Dec 15, 2010 at 10:30 PM, Claudio Martella <
> > [email protected]> wrote:
> >
> >> Hello list,
> >>
> >> I'm trying to crawl an intranet site that is behind authentication. The
> >> webserver uses Digest authentication.
> >> My plugin.includes has protocol-httpclient specified and I have
> >> httpclient-auth.xml set like this:
> >>
> >> <auth-configuration>
> >>   <credentials username="user" password="password">
> >>     <default scheme="digest"/>
> >>   </credentials>
> >> </auth-configuration>
> >>
> >> I've also tried without specifying the scheme.
> >> Here's what comes out of the httpclient logs:
> >>
> >> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> >> ntlm authentication scheme selected
> >> Using authentication scheme: ntlm
> >> Authorization challenge processed
> >> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> >> ntlm authentication scheme selected
> >> Using authentication scheme: ntlm
> >> Authorization challenge processed
> >>
> >> Here's a line from hadoop.log:
> >>
> >> 2010-12-15 17:51:29,853 INFO httpclient.HttpMethodDirector - No credentials available for NTLM <any realm>@192.168.10.210:8090
> >>
> >> I've also tried an <authscope host="192.168.10.210" port="8090"
> >> scheme="digest"/> but nothing changes.
> >>
> >> Does anybody have an idea of what's going on? I'm using Nutch 1.2 in
> >> standalone mode.
> >>
> >>
> >> Thanks
> >>
> >> --
> >> Claudio Martella
> >> Digital Technologies
> >> Unit Research & Development - Analyst
> >>
> >> TIS innovation park
> >> Via Siemens 19 | Siemensstr. 19
> >> 39100 Bolzano | 39100 Bozen
> >> Tel. +39 0471 068 123
> >> Fax +39 0471 068 129
> >> [email protected] http://www.tis.bz.it
> >>
> >> Short information regarding use of personal data. According to Section 13
> >> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we
> >> process your personal data in order to fulfil contractual and fiscal
> >> obligations and also to send you information regarding our services and
> >> events. Your personal data are processed with and without electronic means
> >> and by respecting data subjects' rights, fundamental freedoms and dignity,
> >> particularly with regard to confidentiality, personal identity and the right
> >> to personal data protection.
> >> At any time and without formalities you can write an e-mail to
> >> [email protected] in order to object the processing of your personal data
> >> for the purpose of sending advertising materials and also to exercise the
> >> right to access personal data and other rights referred to in Section 7 of
> >> Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige,
> >> Siemens Street n. 19, Bolzano. You can find the complete information on
> >> the web site www.tis.bz.it.

