Does the following fix it?

<!-- This is so that NutchBean will work on the command line -->
<property>
  <name>searcher.dir</name>
  <value>/usr/tmp/13sites</value>
  <description>
    Path to root of crawl. This directory is searched (in order) for
    either the file search-servers.txt, containing a list of distributed
    search servers, or the directory "index" containing merged indexes,
    or the directory "segments" containing segment indexes.
  </description>
</property>
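To make the lookup order in that description concrete, here is a minimal Python sketch. The function name resolve_search_source and its return labels are hypothetical, and it follows only the order stated in the description, not NutchBean's actual code. The usage part recreates the directories of the 13sites crawl listed in this reply inside a temporary directory.

```python
import tempfile
from pathlib import Path


def resolve_search_source(searcher_dir: str) -> str:
    """Report which source a searcher.dir root would be searched with.

    Hypothetical helper: mirrors the order given in the searcher.dir
    description, not NutchBean's real implementation.
    """
    root = Path(searcher_dir)
    # 1. A search-servers.txt file lists distributed search servers.
    if (root / "search-servers.txt").is_file():
        return "distributed"
    # 2. Otherwise, a directory "index" containing merged indexes.
    if (root / "index").is_dir():
        return "index"
    # 3. Otherwise, a directory "segments" containing segment indexes.
    if (root / "segments").is_dir():
        return "segments"
    # None of the three was found under this root.
    return "none"


# Usage: rebuild the 13sites layout (directories only) in a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("segments", "crawldb", "linkdb", "indexes", "index"):
        (Path(tmp) / name).mkdir()
    print(resolve_search_source(tmp))  # prints "index"
```

Pointing the sketch at a directory that has none of the three entries returns "none", which is why searcher.dir has to name the crawl root rather than some other directory.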
I think you need to set searcher.dir to the root of your crawl directory (the one that contains your index), as I did in the example above. To be thorough, this is what 13sites looks like:

$ cd /usr/tmp/13sites/
$ ls -latr
total 14
drwxr-xr-x  12 kai  wheel   512 Jul  5 00:27 segments
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:21 crawldb
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:24 linkdb
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:33 indexes
drwxr-xr-x   7 kai  wheel   512 Jul  5 01:33 .
drwxr-xr-x   2 kai  wheel   512 Jul  5 01:33 index
drwxr-xr-x  19 kai  wheel  1024 Aug 14 07:20 ..

----- Original Message ----
From: Fabian López <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, August 14, 2007 5:11:52 AM
Subject: UBUNTU total hits 0

Hi,

After following the Nutch 0.8 tutorial, when I try to search with

bin/nutch org.apache.nutch.searcher.NutchBean apache

I receive "Total Hits:0". I have followed all the steps:

1. Create a directory with a flat file of root urls. For example, to crawl the nutch site you might start with a file named urls/nutch containing the url of just the Nutch home page. All other Nutch pages should be reachable from this page. The urls/nutch file would thus contain:

http://lucene.apache.org/nutch/

2. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the apache.org domain, the line should read:

+^http://([a-z0-9]*\.)*apache.org/

This will include any url in the domain apache.org.

3. Edit the file conf/nutch-site.xml, insert at minimum the following properties into it and edit in proper values for the properties....

Then I executed:

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

The only problem I can find is that a java.lang.NullPointerException occurs while fetching. My questions are:

1.- Is this the cause of the problem? How can I solve it?
2.- Is this also the reason why I always get HTTP Status 500, "No Context configured to process this request", at http://localhost:8080? (See also: HTTP Status 500 <http://www.mail-archive.com/[email protected]/msg09150.html>)

Thanks a lot,
Fabian
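The include rule from step 2 of the quoted tutorial can be checked outside Nutch before a crawl. A minimal Python sketch, assuming Python's re module treats this simple pattern the same way Nutch's URL filter does; the names APACHE_ORG and accepted are hypothetical.

```python
import re

# Step 2's rule without its leading "+": in crawl-urlfilter.txt the "+"
# marks the line as an include rule and is not part of the regex.
APACHE_ORG = re.compile(r"^http://([a-z0-9]*\.)*apache.org/")


def accepted(url: str) -> bool:
    """Return True if url matches the apache.org include pattern."""
    return APACHE_ORG.search(url) is not None


for url in ("http://lucene.apache.org/nutch/",
            "http://apache.org/",
            "http://www.example.com/"):
    print(url, accepted(url))
```

The first two URLs match and the third does not. Note that the unescaped "." in apache.org matches any character (a stricter rule would write apache\.org), and that https URLs or host names containing a hyphen do not match this pattern at all.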
