I have just uploaded NUTCH-559v0.5.patch to JIRA
<https://issues.apache.org/jira/browse/NUTCH-559>. It also works with
Tomcat Basic authentication; I tested it with the same configuration
and commands that I described in my previous mail.
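The quoted mail below reports "[Fatal Error] :-1:-1: Premature end of file". That is the kind of message an XML parser prints when it is handed an empty or truncated document. One cheap check, which is a guess at the cause and not a confirmed diagnosis, is to see whether httpclient-auth.xml parses at all and whether an authscope actually matches the host and port of the seed URL. The Python sketch below is not part of Nutch or the patch and uses only the standard library. It parses a trimmed copy of the httpclient-auth.xml entries quoted below and builds the HTTP Basic Authorization header for the matching credentials. The helper names find_credentials and basic_auth_header are hypothetical. The exact host/port comparison is a simplifying assumption, and the plugin's own matching rules may differ.

```python
import base64
import xml.etree.ElementTree as ET

# Trimmed copy of the httpclient-auth.xml entries quoted in this thread.
# Only the <credentials>/<authscope> structure from that snippet is kept.
AUTH_XML = """<auth-configuration>
  <credentials username="tomcat" password="s3cret">
    <authscope host="127.0.0.1" port="8080"/>
  </credentials>
</auth-configuration>"""


def find_credentials(xml_text, host, port):
    """Return (username, password) for the first authscope matching host:port, else None.

    Hypothetical helper: uses an exact string match on host and port, which
    is an assumption of this sketch, not necessarily the plugin's rule.
    Raises ET.ParseError if xml_text is empty or truncated.
    """
    root = ET.fromstring(xml_text)
    for cred in root.iter("credentials"):
        for scope in cred.iter("authscope"):
            if scope.get("host") == host and scope.get("port") == str(port):
                return cred.get("username"), cred.get("password")
    return None


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic auth (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


creds = find_credentials(AUTH_XML, "127.0.0.1", 8080)
print(creds)
print(basic_auth_header(*creds))
```

Feeding it the real file instead (for example, open("conf/httpclient-auth.xml").read()) makes ET.fromstring raise xml.etree.ElementTree.ParseError if the file is empty or truncated. Under this sketch's exact comparison, a seed URL written as http://localhost:8080/ would not match an authscope of 127.0.0.1.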

Regards,
Susam Pal

On Nov 28, 2007 9:50 PM, Susam Pal <[EMAIL PROTECTED]> wrote:
> I have just tested NUTCH-559v0.4.patch against trunk revision 589654
> and against the latest trunk revision
> <http://svn.apache.org/repos/asf/lucene/nutch/trunk/>. It works fine
> for me with revision 589654. Here are the relevant files, the
> commands and their output:
>
> $ cat /opt/apache-tomcat-6.0.13/conf/tomcat-users.xml
> <?xml version='1.0' encoding='utf-8'?>
> <tomcat-users>
>   <role rolename="manager"/>
>   <user username="tomcat" password="s3cret" roles="manager"/>
> </tomcat-users>
>
> $ cat urls/url
> http://127.0.0.1:8080/manager/html/
>
> $ tail -2 conf/crawl-urlfilter.txt
> # skip everything else
> +.
>
> $ tail -n +153 conf/nutch-site.xml | head -4
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|mp3|oo|msexcel|mspowerpoint|msword|pdf|rss|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   <description>Regular expression naming plugin directory names to
>
> $ tail -5 conf/httpclient-auth.xml
> <credentials username="tomcat" password="s3cret">
>   <authscope host="127.0.0.1" port="8080"/>
> </credentials>
> </auth-configuration>
>
> $ bin/nutch crawl urls -dir crawl -depth 2 -topN 5 -threads 2
> $ bin/nutch org.apache.nutch.searcher.NutchBean tomcat
> Total hits: 5
>  0 20071128205034/http://127.0.0.1:8080/TomcatROOT/
>  ... Apache Tomcat Apache Tomcat Administration Status Tomcat Manager
>  Documentation Release Notes Change ... configuring and using Tomcat
> dev@ ...
>  1 20071128205034/http://127.0.0.1:8080/docs/
>  ... made to Apache Tomcat. Status - Apache Tomcat development status.
> Developers - List of ... portion of Apache Tomcat ...
>  2 20071128205034/http://127.0.0.1:8080/docs/html-manager-howto.html
>  ... Apache Tomcat 6.0 - Tomcat Web Application Manager How To ...
> Specs. Apache Tomcat 6.0 Tomcat Web Application Manager How To ...
>  3 20071128205034/http://127.0.0.1:8080/docs/manager-howto.html
>  ... select a different one, Tomcat 6 defaults to an ... file stored
> at $CATALINA_HOME/conf/tomcat-users.xml , which can be ...
>  4 20071128205025/http://127.0.0.1:8080/manager/html/
>  ... Undeploy      with idle ≥   minutes  /manager Tomcat Manager
> Application true 6  Start ... to upload   Server Information Tomcat
> Version JVM Version JVM Vendor ...
>
> With the latest trunk, protocol-httpclient also does its job properly
> and the crawl finishes successfully. The only problem is that I cannot
> run a search, because of a NullPointerException that is being
> discussed here:
> http://www.mail-archive.com/[email protected]/msg10030.html
>
> Regards,
> Susam Pal
> http://susam.in/
>
>
> On Nov 28, 2007 6:20 PM,  <[EMAIL PROTECTED]> wrote:
> > I have tried to use Susam Pal's patch (NUTCH-559, "NTLM, Basic and
> > Digest Authentication schemes for web/proxy server") to crawl Tomcat's
> > http://127.0.0.1:8080/manager/html/, without success. I see "[Fatal
> > Error] :-1:-1: Premature end of file" right after it starts fetching.
> > Since most people on the list seem to have used his patch
> > successfully, I am fairly sure it is something I am doing wrong.
> > Starting from the very basics, can anybody confirm that applying the
> > latest patch, NUTCH-559v0.4.patch, against the latest 1.0 trunk (not
> > 0.9) is okay, or does it have to be applied against 0.9 instead? Is
> > the Tomcat manager a bad example for testing authentication?
> >
> > Thanks
> >
> > Sully
> >
> >
>
