[Nutch Wiki] Update of HttpAuthenticationSchemes by susam

2010-03-15 Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The HttpAuthenticationSchemes page has been changed by susam.
http://wiki.apache.org/nutch/HttpAuthenticationSchemes?action=diff&rev1=18&rev2=19

--

  === Important Points ===
   1. For the authscope tag, the 'host' and 'port' attributes should always be 
specified. The 'realm' and 'scheme' attributes are optional and may be specified 
depending on your needs. If you are tempted to omit the 'host' and 'port' 
attributes because you want the credentials to be used for any host and any 
port for that realm/scheme, please use the 'default' tag instead. That is what 
the 'default' tag is meant for.
   1. An authentication scope should not be defined twice, as separate 
authscope tags under different credentials tags. If this is done by mistake, 
the credentials for the last defined authscope tag are used. This is because 
the XML parsing code reads the file from top to bottom and sets the credentials 
for each authentication scope; if the same authentication scope is encountered 
again, it is overwritten with the new credentials. However, one should not rely 
on this behavior, as it might change in future development.
-  1. Do not define multiple authscope tags with the same host, port but 
different realms if the server requires NTLM authentication. This means there 
should not be multiple tags with same host, port, scheme=NTLM but different 
realms. If you are omitting the scheme attribute and the server requires NTLM 
authentication, then there should not be multiple tags with same host, port but 
different realms. This is discussed more in the next section.
+  1. Do not define multiple authscope tags with the same host, port but 
different realms if the server requires NTLM authentication. This means there 
should not be multiple authscope tags with same host, port, scheme=NTLM but 
different realms. If you are omitting the scheme attribute and the server 
requires NTLM authentication, then there should not be multiple tags with same 
host, port but different realms. This is discussed more in the next section.
   1. If you are using the NTLM scheme, you should also set the 'http.agent.host' 
property in conf/nutch-site.xml.
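The "last definition wins" behavior from the second point can be modeled with a short Java sketch. It assumes a hypothetical file layout: a root 'auth-configuration' element holding 'credentials' elements with 'username'/'password' attributes, each containing 'authscope' children. The class and method names are illustrative. This is not Nutch's actual parsing code, and the 'default' tag is not modeled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AuthScopeParseSketch {

    // Hypothetical configuration: the same scope is (wrongly) defined twice.
    static final String CONFIG =
        "<auth-configuration>"
      + "  <credentials username=\"alice\" password=\"secret1\">"
      + "    <authscope host=\"intranet.example.com\" port=\"80\"/>"
      + "  </credentials>"
      + "  <credentials username=\"bob\" password=\"secret2\">"
      + "    <authscope host=\"intranet.example.com\" port=\"80\"/>"
      + "  </credentials>"
      + "</auth-configuration>";

    /** Builds a key from host, port, realm and scheme; omitted attributes become "ANY". */
    static String scopeKey(Element scope) {
        String realm = scope.getAttribute("realm");   // "" when the attribute is absent
        String scheme = scope.getAttribute("scheme");
        return scope.getAttribute("host") + ":" + scope.getAttribute("port") + "/"
            + (realm.isEmpty() ? "ANY" : realm) + "/"
            + (scheme.isEmpty() ? "ANY" : scheme);
    }

    /** Reads credentials top to bottom; a repeated scope overwrites the earlier entry. */
    static Map<String, String> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed authentication config", e);
        }
        Map<String, String> userByScope = new LinkedHashMap<>();
        NodeList creds = doc.getElementsByTagName("credentials");
        for (int i = 0; i < creds.getLength(); i++) {
            Element cred = (Element) creds.item(i);
            NodeList scopes = cred.getElementsByTagName("authscope");
            for (int j = 0; j < scopes.getLength(); j++) {
                // Map.put replaces any earlier value, so the last definition wins.
                userByScope.put(scopeKey((Element) scopes.item(j)), cred.getAttribute("username"));
            }
        }
        return userByScope;
    }

    public static void main(String[] args) {
        // prints {intranet.example.com:80/ANY/ANY=bob}
        System.out.println(parse(CONFIG));
    }
}
```

Because both tags describe the same scope, only one entry survives, and it holds the credentials from the later 'credentials' tag.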
  
  === A note on NTLM domains ===
  NTLM does not use the concept of realms. Therefore, multiple realms cannot 
be defined as different authentication scopes for the same web server when that 
server requires NTLM authentication. There should be exactly one authscope tag 
for the NTLM authentication scope of a particular web server, and the 
authentication domain should be specified as the value of its 'realm' attribute. 
NTLM authentication also requires the name or IP address of the host on which 
the crawler is running, so 'http.agent.host' should be set properly.
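The "one NTLM authscope per web server" rule can be checked mechanically. The Java sketch below is a hypothetical validator and is not part of Nutch. It assumes each authscope tag has already been read into a simple object, with an omitted realm or scheme stored as "". As the advice above recommends, a tag with no scheme is treated as possibly NTLM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NtlmScopeCheckSketch {

    /** Hypothetical in-memory form of one authscope tag. */
    static final class AuthScopeTag {
        final String host;
        final int port;
        final String realm;   // "" when the realm attribute is omitted
        final String scheme;  // "" when the scheme attribute is omitted

        AuthScopeTag(String host, int port, String realm, String scheme) {
            this.host = host;
            this.port = port;
            this.realm = realm;
            this.scheme = scheme;
        }
    }

    /** Reports each host:port that has NTLM-capable authscope tags with different realms. */
    static List<String> findNtlmConflicts(List<AuthScopeTag> tags) {
        Map<String, String> realmByHostPort = new HashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (AuthScopeTag tag : tags) {
            boolean mayBeNtlm = tag.scheme.isEmpty() || tag.scheme.equalsIgnoreCase("NTLM");
            if (!mayBeNtlm) {
                continue; // e.g. Basic or Digest scopes may legitimately use several realms
            }
            String hostPort = tag.host + ":" + tag.port;
            String previous = realmByHostPort.putIfAbsent(hostPort, tag.realm);
            if (previous != null && !previous.equals(tag.realm)) {
                conflicts.add(hostPort + " has realms " + previous + " and " + tag.realm);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<AuthScopeTag> tags = List.of(
            new AuthScopeTag("intranet.example.com", 80, "CORP", "NTLM"),
            new AuthScopeTag("intranet.example.com", 80, "SALES", ""),      // scheme omitted: checked too
            new AuthScopeTag("intranet.example.com", 80, "docs", "Basic")); // not NTLM: ignored
        // prints: intranet.example.com:80 has realms CORP and SALES
        findNtlmConflicts(tags).forEach(System.out::println);
    }
}
```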
  
  == Underlying HttpClient Library ==
- 'protocol-httpclient' is based on 
[[http://jakarta.apache.org/httpcomponents/httpclient-3.x/|Jakarta Commons 
HttpClient]]. Some servers support multiple schemes for authenticating users. 
Given that only one scheme may be used at a time for authenticating, it must 
choose which scheme to use. To accompish this, it uses an order of preference 
to select the correct authentication scheme. By default this order is: NTLM, 
Digest, Basic. For more information on the behavior during authentication, you 
might want to read the 
[[http://jakarta.apache.org/httpcomponents/httpclient-3.x/authentication.html|HttpClient
 Authentication Guide]].
+ 'protocol-httpclient' is based on 
[[http://hc.apache.org/httpclient-3.x/|Jakarta Commons HttpClient]]. Some 
servers support multiple schemes for authenticating users. Given that only one 
scheme may be used at a time for authenticating, it must choose which scheme to 
use. To accomplish this, it uses an order of preference to select the correct 
authentication scheme. By default this order is: NTLM, Digest, Basic. For more 
information on the behavior during authentication, you might want to read the 
[[http://hc.apache.org/httpclient-3.x/authentication.html|HttpClient 
Authentication Guide]].
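The order-of-preference rule can be illustrated with a small Java sketch. This is not HttpClient's implementation. It only picks the first scheme in a preference list that the server also offers. Treating scheme names as plain strings compared case-insensitively is an assumption of this sketch.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class SchemePreferenceSketch {

    // Default order described above: NTLM, then Digest, then Basic.
    static final List<String> DEFAULT_PREFERENCE = List.of("NTLM", "Digest", "Basic");

    /** Returns the first preferred scheme that the server offered, or empty if none match. */
    static Optional<String> choose(List<String> preference, List<String> offeredByServer) {
        Set<String> offered = offeredByServer.stream()
            .map(s -> s.toLowerCase(Locale.ROOT))
            .collect(Collectors.toSet());
        return preference.stream()
            .filter(s -> offered.contains(s.toLowerCase(Locale.ROOT)))
            .findFirst();
    }

    public static void main(String[] args) {
        // A server that challenges with both Basic and Digest: prints Digest
        System.out.println(choose(DEFAULT_PREFERENCE, List.of("Basic", "Digest")).orElse("none"));
    }
}
```

With the default order, Digest wins over Basic when both are offered. If the server offers none of the listed schemes, the result is empty.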
  
  == Need Help? ==
  If you need help, please feel free to post your question to the 
[[http://lucene.apache.org/nutch/mailing_lists.html#Users|nutch-user mailing 
list]]. The author of this work, Susam Pal, usually responds to mails related 
to authentication problems. The DEBUG logs may be required to troubleshoot the 
problem. You must enable the debug log for 'protocol-httpclient' before running 
the crawler. To enable debug log for 'protocol-httpclient', open 
'conf/log4j.properties' and add the following line:


[Nutch Wiki] Update of HttpAuthenticationSchemes by susam

2010-03-15 Apache Wiki

The HttpAuthenticationSchemes page has been changed by susam.
The comment on this change is: Added suggestion to enable debug for Jakarta 
Commons HttpClient.
http://wiki.apache.org/nutch/HttpAuthenticationSchemes?action=diff&rev1=19&rev2=20

--

  'protocol-httpclient' is based on 
[[http://hc.apache.org/httpclient-3.x/|Jakarta Commons HttpClient]]. Some 
servers support multiple schemes for authenticating users. Given that only one 
scheme may be used at a time for authenticating, it must choose which scheme to 
use. To accomplish this, it uses an order of preference to select the correct 
authentication scheme. By default this order is: NTLM, Digest, Basic. For more 
information on the behavior during authentication, you might want to read the 
[[http://hc.apache.org/httpclient-3.x/authentication.html|HttpClient 
Authentication Guide]].
  
  == Need Help? ==
- If you need help, please feel free to post your question to the 
[[http://lucene.apache.org/nutch/mailing_lists.html#Users|nutch-user mailing 
list]]. The author of this work, Susam Pal, usually responds to mails related 
to authentication problems. The DEBUG logs may be required to troubleshoot the 
problem. You must enable the debug log for 'protocol-httpclient' before running 
the crawler. To enable debug log for 'protocol-httpclient', open 
'conf/log4j.properties' and add the following line:
+ If you need help, please feel free to post your question to the 
[[http://lucene.apache.org/nutch/mailing_lists.html#Users|nutch-user mailing 
list]]. The author of this work, Susam Pal, usually responds to mails related 
to authentication problems. The DEBUG logs may be required to troubleshoot the 
problem. You must enable the debug log for 'protocol-httpclient' and Jakarta 
Commons !HttpClient before running the crawler. To enable debug log for 
'protocol-httpclient' and !HttpClient, open 'conf/log4j.properties' and add the 
following line:
  {{{
  log4j.logger.org.apache.nutch.protocol.httpclient=DEBUG,cmdstdout
+ log4j.logger.org.apache.commons.httpclient.auth=DEBUG,cmdstdout
  }}}
  
  It would be good to check the following things before asking for help.