I have just uploaded NUTCH-559v0.5.patch to JIRA <https://issues.apache.org/jira/browse/NUTCH-559>. It also works fine with Tomcat Basic authentication. I tested it with the same configuration and commands that I mentioned in my previous mail.
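If the crawl still fails, it may help to confirm first that the server accepts the credentials independently of Nutch. Below is a minimal sketch in Python, not part of the patch, that builds a Basic Authorization header and requests the manager page with and without it. The function names `basic_auth_header` and `fetch_status` are hypothetical helpers made up for this illustration; the URL and credentials are the ones from my test setup.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def fetch_status(url, username=None, password=None):
    """Return the HTTP status code for url, optionally sending Basic credentials."""
    request = urllib.request.Request(url)
    if username is not None:
        request.add_header("Authorization", basic_auth_header(username, password))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # An HTTP error status (for example 401) is returned rather than raised.
        return error.code


if __name__ == "__main__":
    # Assumed values: the Tomcat manager URL and user from the test setup.
    url = "http://127.0.0.1:8080/manager/html/"
    print("without credentials:", fetch_status(url))
    print("with credentials:   ", fetch_status(url, "tomcat", "s3cret"))
```

A 401 without credentials and a 200 with them would suggest the Tomcat side is set up as expected, so any remaining problem is more likely in the Nutch configuration.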

Regards,
Susam Pal

On Nov 28, 2007 9:50 PM, Susam Pal <[EMAIL PROTECTED]> wrote:
> I have just tested NUTCH-559v0.4.patch against trunk revision 589654
> and the latest trunk revision.
> <http://svn.apache.org/repos/asf/lucene/nutch/trunk/> It works fine
> for me with revision 589654. Here are the contents of the relevant
> files, commands and output:-
>
> $ cat /opt/apache-tomcat-6.0.13/conf/tomcat-users.xml
> <?xml version='1.0' encoding='utf-8'?>
> <tomcat-users>
> <role rolename="manager"/>
> <user username="tomcat" password="s3cret" roles="manager"/>
> </tomcat-users>
>
> $ cat urls/url
> http://127.0.0.1:8080/manager/html/
>
> $ tail -2 conf/crawl-urlfilter.txt
> # skip everything else
> +.
>
> $ tail -n +153 conf/nutch-site.xml | head -4
> <property>
> <name>plugin.includes</name>
> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|mp3|oo|msexcel|mspowerpoint|msword|pdf|rss|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> <description>Regular expression naming plugin directory names to
>
> $ tail -5 conf/httpclient-auth.xml
> <credentials username="tomcat" password="s3cret">
> <authscope host="127.0.0.1" port="8080"/>
> </credentials>
> </auth-configuration>
>
> $ bin/nutch crawl urls -dir crawl -depth 2 -topN 5 -threads 2
> $ bin/nutch org.apache.nutch.searcher.NutchBean tomcat
> Total hits: 5
> 0 20071128205034/http://127.0.0.1:8080/TomcatROOT/
> ... Apache Tomcat Apache Tomcat Administration Status Tomcat Manager
> Documentation Release Notes Change ... configuring and using Tomcat
> dev@ ...
> 1 20071128205034/http://127.0.0.1:8080/docs/
> ... made to Apache Tomcat. Status - Apache Tomcat development status.
> Developers - List of ... portion of Apache Tomcat ...
> 2 20071128205034/http://127.0.0.1:8080/docs/html-manager-howto.html
> ... Apache Tomcat 6.0 - Tomcat Web Application Manager How To ...
> Specs. Apache Tomcat 6.0 Tomcat Web Application Manager How To ...
> 3 20071128205034/http://127.0.0.1:8080/docs/manager-howto.html
> ... select a different one, Tomcat 6 defaults to an ... file stored
> at $CATALINA_HOME/conf/tomcat-users.xml , which can be ...
> 4 20071128205025/http://127.0.0.1:8080/manager/html/
> ... Undeploy with idle ≥ minutes /manager Tomcat Manager
> Application true 6 Start ... to upload Server Information Tomcat
> Version JVM Version JVM Vendor ...
>
> With the latest trunk too, protocol-httpclient does its job properly
> and the crawl finishes successfully. I am only unable to perform a
> search because of a NullPointerException that is being discussed here:-
> http://www.mail-archive.com/[email protected]/msg10030.html
>
> Regards,
> Susam Pal
> http://susam.in/
>
>
> On Nov 28, 2007 6:20 PM, <[EMAIL PROTECTED]> wrote:
> > I have tried to use Susam Pal's patch (Nutch-559) NTLM, Basic and Digest
> > Authentication schemes for web/proxy server to be able to crawl Tomcat's
> > http://127.0.0.1:8080/manager/html/ without any success. I see [Fatal
> > Error] :-1:-1: Premature end of file right after it tries fetching.
> > Given that most people on the list seem to have successfully used his
> > patch I am very sure it something I am doing wrong. Starting from the
> > very basics can anybody confirm that just using the latest patch
> > NUTCH-559v0.4.patch versus the latest 1.0 trunk (not the 0.9) is okay or
> > does it have to be patched versus the 0.9 trunk? Is the Tomcat manager a
> > bad example for testing authentication?
> >
> > Thanks
> >
> > Sully
