I have just tested NUTCH-559v0.4.patch against trunk revision 589654
and against the latest trunk revision from
<http://svn.apache.org/repos/asf/lucene/nutch/trunk/>. It works fine
for me with revision 589654. Here are the contents of the relevant
files, along with the commands and their output:

$ cat /opt/apache-tomcat-6.0.13/conf/tomcat-users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <user username="tomcat" password="s3cret" roles="manager"/>
</tomcat-users>

$ cat urls/url
http://127.0.0.1:8080/manager/html/

$ tail -2 conf/crawl-urlfilter.txt
# skip everything else
+.

$ tail -n +153 conf/nutch-site.xml | head -4
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|mp3|oo|msexcel|mspowerpoint|msword|pdf|rss|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to

$ tail -5 conf/httpclient-auth.xml
<credentials username="tomcat" password="s3cret">
  <authscope host="127.0.0.1" port="8080"/>
</credentials>
</auth-configuration>
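
Only the tail of the file is shown above. As a rough sketch, the whole
file would look something like the following. The XML declaration and
the opening <auth-configuration> tag are assumptions inferred from the
closing tag, not copied from the actual file:

```xml
<?xml version="1.0"?>
<!-- Assumed opening tag, inferred from the closing tag shown above -->
<auth-configuration>
  <!-- Credentials sent to any server matching the authscope below -->
  <credentials username="tomcat" password="s3cret">
    <authscope host="127.0.0.1" port="8080"/>
  </credentials>
</auth-configuration>
```

The username and password must match the entry in tomcat-users.xml.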

$ bin/nutch crawl urls -dir crawl -depth 2 -topN 5 -threads 2
$ bin/nutch org.apache.nutch.searcher.NutchBean tomcat
Total hits: 5
 0 20071128205034/http://127.0.0.1:8080/TomcatROOT/
 ... Apache Tomcat Apache Tomcat Administration Status Tomcat Manager
 Documentation Release Notes Change ... configuring and using Tomcat
dev@ ...
 1 20071128205034/http://127.0.0.1:8080/docs/
 ... made to Apache Tomcat. Status - Apache Tomcat development status.
Developers - List of ... portion of Apache Tomcat ...
 2 20071128205034/http://127.0.0.1:8080/docs/html-manager-howto.html
 ... Apache Tomcat 6.0 - Tomcat Web Application Manager How To ...
Specs. Apache Tomcat 6.0 Tomcat Web Application Manager How To ...
 3 20071128205034/http://127.0.0.1:8080/docs/manager-howto.html
 ... select a different one, Tomcat 6 defaults to an ... file stored
at $CATALINA_HOME/conf/tomcat-users.xml , which can be ...
 4 20071128205025/http://127.0.0.1:8080/manager/html/
 ... Undeploy      with idle ≥   minutes  /manager Tomcat Manager
Application true 6  Start ... to upload   Server Information Tomcat
Version JVM Version JVM Vendor ...

With the latest trunk, protocol-httpclient also does its job properly
and the crawl finishes successfully. However, I am unable to perform a
search there because of a NullPointerException that is being discussed
here:
http://www.mail-archive.com/[email protected]/msg10030.html
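
When the fetch fails, it may help to rule out the server side first by
checking the credentials independently of Nutch. Below is a minimal
sketch in Python, assuming the manager application is protected with
HTTP Basic authentication. The helper names basic_auth_header and
fetch_status are hypothetical and exist only for this illustration:

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_status(url, username, password):
    """Return the HTTP status code for url when sending Basic credentials.

    Assumption: the server accepts Basic authentication. A 401 here
    would suggest the problem is on the server side rather than in Nutch.
    """
    request = urllib.request.Request(
        url,
        headers={"Authorization": basic_auth_header(username, password)},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; report the code instead.
        return error.code
```

Calling fetch_status("http://127.0.0.1:8080/manager/html/", "tomcat",
"s3cret") with Tomcat running should return 200 if the credentials in
tomcat-users.xml are accepted, and 401 otherwise.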

Regards,
Susam Pal
http://susam.in/

On Nov 28, 2007 6:20 PM,  <[EMAIL PROTECTED]> wrote:
> I have tried to use Susam Pal's patch (Nutch-559) NTLM, Basic and Digest
> Authentication schemes for web/proxy server to be able to crawl Tomcat's
> http://127.0.0.1:8080/manager/html/ without any success. I see [Fatal
> Error] :-1:-1: Premature end of file right after it tries fetching.
> Given that most people on the list seem to have successfully used his
> patch I am very sure it is something I am doing wrong. Starting from the
> very basics can anybody confirm that just using the latest patch
> NUTCH-559v0.4.patch versus the latest 1.0 trunk (not the 0.9) is okay or
> does it have to be patched versus the 0.9 trunk? Is the Tomcat manager a
> bad example for testing authentication?
>
> Thanks
>
> Sully
>
>
