Re: UBUNTU total hits 0

Kai_testing Middleton Tue, 14 Aug 2007 10:21:22 -0700

Does the following fix it?

<!-- This is so that NutchBean will work on the command line -->
<property>
  <name>searcher.dir</name>
  <value>/usr/tmp/13sites</value>
  <description>
  Path to root of crawl.  This directory is searched (in
  order) for either the file search-servers.txt, containing a list of
  distributed search servers, or the directory "index" containing
  merged indexes, or the directory "segments" containing segment
  indexes.
  </description>
</property>


I think you need to set searcher.dir to the root directory of your crawl
(the one that contains your index), as I did in the example above.

To be thorough, this is what 13sites looks like:

$ cd /usr/tmp/13sites/
$ ls -latr
total 14
drwxr-xr-x  12 kai  wheel   512 Jul  5 00:27 segments
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:21 crawldb
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:24 linkdb
drwxr-xr-x   3 kai  wheel   512 Jul  5 01:33 indexes
drwxr-xr-x   7 kai  wheel   512 Jul  5 01:33 .
drwxr-xr-x   2 kai  wheel   512 Jul  5 01:33 index
drwxr-xr-x  19 kai  wheel  1024 Aug 14 07:20 ..
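
A quick way to sanity-check a candidate searcher.dir is to look for the
three things the property description above mentions, in the same order.
This is only a sketch: check_searcher_dir is a hypothetical helper name,
not part of Nutch, and it only tests that the file or directories exist,
not that the index actually holds documents.

```shell
# Hypothetical helper (not part of Nutch): report which of the entries
# named in the searcher.dir description is found first under a directory.
check_searcher_dir() {
  dir="$1"
  if [ -f "$dir/search-servers.txt" ]; then
    echo "search-servers.txt"      # list of distributed search servers
  elif [ -d "$dir/index" ]; then
    echo "index"                   # merged indexes
  elif [ -d "$dir/segments" ]; then
    echo "segments"                # per-segment indexes
  else
    echo "none"                    # nothing a searcher could use
    return 1
  fi
}
```

For my layout, `check_searcher_dir /usr/tmp/13sites` should report the
index directory. For a crawl run with `-dir crawl`, point it at that crawl
directory instead.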

----- Original Message ----
From: Fabian López <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, August 14, 2007 5:11:52 AM
Subject: UBUNTU total hits 0

Hi,
After following the Nutch 0.8 tutorial, when I try to search with

bin/nutch org.apache.nutch.searcher.NutchBean apache

I receive "Total Hits:0"

I have followed all the steps:


   1. Create a directory with a flat file of root urls. For example, to
   crawl the nutch site you might start with a file named urls/nutch
   containing the url of just the Nutch home page. All other Nutch pages
   should be reachable from this page. The urls/nutch file would thus
   contain:

   http://lucene.apache.org/nutch/

   2. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME
   with the name of the domain you wish to crawl. For example, if you
   wished to limit the crawl to the apache.org domain, the line should
   read:

   +^http://([a-z0-9]*\.)*apache.org/

   This will include any url in the domain apache.org.

   3. Edit the file conf/nutch-site.xml, insert at minimum the following
   properties into it and edit in proper values for the properties....

Then I executed:

bin/nutch crawl urls -dir crawl -depth 3 -topN 50

The only problem I can see is that during fetching there is a
java.lang.NullPointerException.
My questions are:

1.- Is this the cause of the problem? How can I solve it?
2.- Is this the reason why I always get an HTTP Status 500 at
http://localhost:8080? See:
No Context configured to process this request - HTTP Status 500
<http://www.mail-archive.com/[email protected]/msg09150.html>


Thanks a lot,
Fabian







      