I have just tested NUTCH-559v0.4.patch against trunk revision 589654 and against the latest trunk revision <http://svn.apache.org/repos/asf/lucene/nutch/trunk/>. It works fine for me with revision 589654. Here are the contents of the relevant files, along with the commands I ran and their output:
$ cat /opt/apache-tomcat-6.0.13/conf/tomcat-users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <user username="tomcat" password="s3cret" roles="manager"/>
</tomcat-users>
$ cat urls/url
http://127.0.0.1:8080/manager/html/
$ tail -2 conf/crawl-urlfilter.txt
# skip everything else
+.
$ tail -n +153 conf/nutch-site.xml | head -4
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|mp3|oo|msexcel|mspowerpoint|msword|pdf|rss|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
$ tail -5 conf/httpclient-auth.xml
  <credentials username="tomcat" password="s3cret">
    <authscope host="127.0.0.1" port="8080"/>
  </credentials>
</auth-configuration>
$ bin/nutch crawl urls -dir crawl -depth 2 -topN 5 -threads 2
$ bin/nutch org.apache.nutch.searcher.NutchBean tomcat
Total hits: 5
 0 20071128205034/http://127.0.0.1:8080/TomcatROOT/
 ... Apache Tomcat Apache Tomcat Administration Status Tomcat Manager Documentation Release Notes Change ... configuring and using Tomcat dev@ ...
 1 20071128205034/http://127.0.0.1:8080/docs/
 ... made to Apache Tomcat. Status - Apache Tomcat development status. Developers - List of ... portion of Apache Tomcat ...
 2 20071128205034/http://127.0.0.1:8080/docs/html-manager-howto.html
 ... Apache Tomcat 6.0 - Tomcat Web Application Manager How To ... Specs. Apache Tomcat 6.0 Tomcat Web Application Manager How To ...
 3 20071128205034/http://127.0.0.1:8080/docs/manager-howto.html
 ... select a different one, Tomcat 6 defaults to an ... file stored at $CATALINA_HOME/conf/tomcat-users.xml , which can be ...
 4 20071128205025/http://127.0.0.1:8080/manager/html/
 ... Undeploy with idle ≥ minutes /manager Tomcat Manager Application true 6 Start ... to upload Server Information Tomcat Version JVM Version JVM Vendor ...
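For reference, here is a sketch of what a minimal but complete conf/httpclient-auth.xml for this setup could look like. The tail output of that file shows only its last lines, so this assumes the file contains nothing except this single credentials entry; the XML declaration and the opening <auth-configuration> line are reconstructed, not copied from my file.

```xml
<?xml version="1.0"?>
<!-- Assumed minimal file: only one credentials entry is defined. -->
<auth-configuration>
  <!-- Same user and password as in Tomcat's tomcat-users.xml above. -->
  <credentials username="tomcat" password="s3cret">
    <!-- Offer these credentials only to the server at 127.0.0.1:8080. -->
    <authscope host="127.0.0.1" port="8080"/>
  </credentials>
</auth-configuration>
```

Since the file is parsed as XML, it must contain at least a well-formed <auth-configuration> root element.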
With the latest trunk too, protocol-httpclient does its job properly and the crawl finishes successfully. The only problem is that I am unable to perform a search, because of a NullPointerException that is being discussed here:
http://www.mail-archive.com/[email protected]/msg10030.html

Regards,
Susam Pal
http://susam.in/

On Nov 28, 2007 6:20 PM, <[EMAIL PROTECTED]> wrote:
> I have tried to use Susam Pal's patch (NUTCH-559, NTLM, Basic and Digest
> Authentication schemes for web/proxy server) to crawl Tomcat's
> http://127.0.0.1:8080/manager/html/ without any success. I see [Fatal
> Error] :-1:-1: Premature end of file right after it tries fetching.
> Given that most people on the list seem to have successfully used his
> patch, I am very sure it is something I am doing wrong. Starting from the
> very basics, can anybody confirm whether applying the latest patch,
> NUTCH-559v0.4.patch, against the latest 1.0 trunk (not 0.9) is okay, or
> does it have to be applied against the 0.9 trunk? Is the Tomcat manager a
> bad example for testing authentication?
>
> Thanks
>
> Sully
