kranthi:
I would try removing the authscope tag from httpclient-auth.xml.
In my case I'm not using an alternate port (you are), but my
working file does not have an authscope tag.
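Here is a sketch of httpclient-auth.xml with the authscope element removed, so the credentials are supplied only through `<default/>`. The `<auth-configuration>` root element follows the layout described on the HttpAuthenticationSchemes wiki page. "xyz" is a placeholder for the real username and password.

```xml
<auth-configuration>
  <!-- Placeholder credentials; <default/> makes them apply to any host, port, realm and scheme. -->
  <credentials username="xyz" password="xyz">
    <default/>
    <!-- No <authscope host="..." port="..."/> element in this variant. -->
  </credentials>
</auth-configuration>
```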
If that doesn't help: since you are crawling an intranet, do you have
access to the HTTP server's log? Seeing it might help.
\dmc
At 4:04 PM +0530 9/9/09, kranthi reddy wrote:
Hi all,
I am trying to crawl password-protected web pages on our intranet.
I don't know why the "401 Authentication Required" error occurs.
I have gone through previous mails on this topic from others, but the
problem is still not resolved.
Below are the configuration files I have modified, as described in
http://wiki.apache.org/nutch/HttpAuthenticationSchemes:
My URL file contains a single URL, "http://10.2.44.34:8088/xwiki/" (this
URL is actually redirected to
"http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=CDsTIqqN").
httpclient-auth.xml:
<credentials username="xyz" password="xyz">
  <default/>
  <authscope host="10.2.44.34" port="8088"/>
</credentials>
nutch-default.xml:
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|zip)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
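Nutch reads conf/nutch-site.xml after nutch-default.xml, and properties set there override the defaults. The usual practice is to put local changes in nutch-site.xml rather than edit nutch-default.xml. Below is a minimal sketch of the same plugin.includes override. It assumes the standard conf/ directory layout and reuses the value shown above unchanged.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the plugin.includes value from nutch-default.xml; protocol-httpclient enables the HTTP auth support. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|zip)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```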
Output printed to terminal:
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20090909151219
Fetcher: threads: 10
QueueFeeder finished: total 1 records.
fetching http://10.2.44.34:8088/xwiki/
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = iiith/Nutch-1.0 ([email protected])
protocol.plugin.check.blocking = false
protocol.plugin.check.robots = false
Credentials - username: superadmin; set as default for realm: ; scheme:
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
maxThreads = 1
inProgress = 0
crawlDelay = 1000
minCrawlDelay = 0
nextFetchTime = 1252489344874
now = 1252489344577
0. http://10.2.44.34:8088/xwiki/bin/view/Main/
fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
maxThreads = 1
inProgress = 0
crawlDelay = 1000
minCrawlDelay = 0
nextFetchTime = 1252489345884
now = 1252489345578
0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX; status code: 401; bytes received: 6739; Content-Length: 6739
401 Authentication Required
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
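The 'http.agent.name' warning at the top of this output concerns how Nutch matches its own agent against robots.txt rules. It is separate from the 401. Below is a sketch of a conf/nutch-site.xml fragment that should satisfy the check. The agent name "iiith" is an assumption, inferred from the "http.agent = iiith/Nutch-1.0" line above; substitute whatever http.agent.name is actually configured.

```xml
<configuration>
  <!-- Assumed agent name, taken from the http.agent string in the fetch output. -->
  <property>
    <name>http.agent.name</name>
    <value>iiith</value>
  </property>
  <!-- Comma-separated list; the agent name must come first, followed by the wildcard. -->
  <property>
    <name>http.robots.agents</name>
    <value>iiith,*</value>
  </property>
</configuration>
```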
Log file:
2009-09-09 15:46:55,602 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,657 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,657 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,691 INFO httpclient.Http - http.proxy.host = null
2009-09-09 15:46:55,691 INFO httpclient.Http - http.proxy.port = 8080
2009-09-09 15:46:55,691 INFO httpclient.Http - http.timeout = 10000
2009-09-09 15:46:55,691 INFO httpclient.Http - http.content.limit = -1
2009-09-09 15:46:55,691 INFO httpclient.Http - http.agent = iiith/Nutch-1.0 ([email protected])
2009-09-09 15:46:55,691 INFO httpclient.Http - protocol.plugin.check.blocking = false
2009-09-09 15:46:55,691 INFO httpclient.Http - protocol.plugin.check.robots = false
2009-09-09 15:46:55,695 DEBUG httpclient.Http - Credentials - username: superadmin; set as default for realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
2009-09-09 15:46:55,942 DEBUG httpclient.Http - url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:55,943 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,946 INFO httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:55,946 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:56,657 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - maxThreads = 1
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - inProgress = 0
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - crawlDelay = 1000
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - minCrawlDelay = 0
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - nextFetchTime = 1252491417050
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - now = 1252491416658
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - 0. http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,056 INFO httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:57,057 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:57,658 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - maxThreads = 1
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - inProgress = 0
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - crawlDelay = 1000
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - minCrawlDelay = 0
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - nextFetchTime = 1252491418057
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - now = 1252491417659
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - 0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,170 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:58,180 DEBUG httpclient.Http - 401 Authentication Required
2009-09-09 15:46:58,180 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2009-09-09 15:46:58,659 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2009-09-09 15:46:58,659 INFO fetcher.Fetcher - -activeThreads=0
Thank you in advance,
bye,
Kranthi Reddy. B
--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
David M. Cole [email protected]
Editor & Publisher, NewsInc. <http://newsinc.net> V: (650) 557-2993
Consultant: The Cole Group <http://colegroup.com/> F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+