Kranthi:

I would try removing the authscope tag from httpclient-auth.xml. My working file does not have an authscope tag, although in my case I'm not connecting to an alternate port and you are.

If that doesn't help: since you are crawling an intranet, do you have access to the HTTP server's log? Seeing it might help.
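For reference, with the authscope removed the file would reduce to something like the following. This is a sketch: the <auth-configuration> wrapper is assumed to be the root element of the stock conf/httpclient-auth.xml, and the username and password are the placeholders from your mail.

```xml
<?xml version="1.0"?>
<auth-configuration>
  <!-- Only a <default/> scope and no <authscope> element: these credentials
       are intended as the fallback for whichever host and port challenges
       the fetcher. -->
  <credentials username="xyz" password="xyz">
    <default/>
  </credentials>
</auth-configuration>
```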

\dmc


At 4:04 PM +0530 9/9/09, kranthi reddy wrote:
Hi all,

I am trying to crawl password-protected web pages on our intranet, and I don't know why the "401 Authentication Required" error keeps appearing. I have gone through previous mails from others on this, but the problem is still not resolved.

Below are the configuration files I have modified as described in http://wiki.apache.org/nutch/HttpAuthenticationSchemes:

My URL file contains a single URL, http://10.2.44.34:8088/xwiki/ (this URL is actually redirected to http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=CDsTIqqN).

httpclient-auth.xml:

                 <credentials username="xyz" password="xyz">
                 <default/>
                 <authscope host="10.2.44.34" port="8088"/>
                 </credentials>

nutch-default.xml:

                 <property>
                 <name>plugin.includes</name>
                 <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|zip)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
                 </property>
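Editing nutch-default.xml works, but the usual Nutch convention is to leave that file untouched and put overrides in conf/nutch-site.xml, whose values take precedence. A sketch of the same property as an override, with the value copied unchanged from above:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override placed in conf/nutch-site.xml so nutch-default.xml stays stock. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|zip)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```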

Output printed to terminal:

Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20090909151219
Fetcher: threads: 10
QueueFeeder finished: total 1 records.
fetching http://10.2.44.34:8088/xwiki/
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = iiith/Nutch-1.0 ([email protected])
protocol.plugin.check.blocking = false
protocol.plugin.check.robots = false
Credentials - username: superadmin; set as default for realm: ; scheme:
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1252489344874
  now           = 1252489344577
  0. http://10.2.44.34:8088/xwiki/bin/view/Main/
fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1252489345884
  now           = 1252489345578
  0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX; status code: 401; bytes received: 6739; Content-Length: 6739
401 Authentication Required
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
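The 'http.agent.name' warning at the top of that output is separate from the 401. Nutch asks for the crawler's agent name to be listed first in http.robots.agents, comma-separated in decreasing order of precedence, with the default "*" kept last. A sketch of the property (to go inside <configuration> in conf/nutch-site.xml), assuming http.agent.name is "iiith", as inferred from the http.agent string above:

```xml
<property>
  <name>http.robots.agents</name>
  <!-- Agent name first, "*" last. "iiith" is an assumed value inferred
       from the http.agent line; use your actual http.agent.name. -->
  <value>iiith,*</value>
</property>
```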



Log file:


2009-09-09 15:46:55,602 INFO  fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,657 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,657 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,691 INFO  httpclient.Http - http.proxy.host = null
2009-09-09 15:46:55,691 INFO  httpclient.Http - http.proxy.port = 8080
2009-09-09 15:46:55,691 INFO  httpclient.Http - http.timeout = 10000
2009-09-09 15:46:55,691 INFO  httpclient.Http - http.content.limit = -1
2009-09-09 15:46:55,691 INFO  httpclient.Http - http.agent = iiith/Nutch-1.0 ([email protected])
2009-09-09 15:46:55,691 INFO  httpclient.Http - protocol.plugin.check.blocking = false
2009-09-09 15:46:55,691 INFO  httpclient.Http - protocol.plugin.check.robots = false
2009-09-09 15:46:55,695 DEBUG httpclient.Http - Credentials - username: superadmin; set as default for realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
2009-09-09 15:46:55,942 DEBUG httpclient.Http - url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:55,943 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,946 INFO  httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:55,946 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:56,657 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   maxThreads    = 1
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   inProgress    = 0
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   crawlDelay    = 1000
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   minCrawlDelay = 0
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   nextFetchTime = 1252491417050
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   now           = 1252491416658
2009-09-09 15:46:56,658 INFO  fetcher.Fetcher -   0. http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 INFO  fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,056 INFO  httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:57,057 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:57,658 INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   maxThreads    = 1
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   inProgress    = 0
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   crawlDelay    = 1000
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   minCrawlDelay = 0
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   nextFetchTime = 1252491418057
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   now           = 1252491417659
2009-09-09 15:46:57,659 INFO  fetcher.Fetcher -   0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 INFO  fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,170 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:58,180 DEBUG httpclient.Http - 401 Authentication Required
2009-09-09 15:46:58,180 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2009-09-09 15:46:58,659 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2009-09-09 15:46:58,659 INFO  fetcher.Fetcher - -activeThreads=0


Thank you in advance,

bye,
Kranthi Reddy. B


--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
   David M. Cole                                            [email protected]
   Editor & Publisher, NewsInc. <http://newsinc.net>        V: (650) 557-2993
   Consultant: The Cole Group <http://colegroup.com/>       F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
