In the config file nutch-site.xml, under the root webapp of Tomcat
(tomcat/webapps/root/web-inf/classes), go to the searcher properties and,
for searcher.dir, just type "crawl" or, if you have another name for this
directory, just ".". I hope this works for you, as I had the same
problem the first time I used the 0.8 version.
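
For reference, a minimal sketch of what that property could look like in
the webapp's nutch-site.xml. It assumes the crawl output directory is
named "crawl" (as in the crawl command quoted below) and that Tomcat is
started from the directory containing it, since a relative value is
resolved against the directory Tomcat was started from. Adjust the value
if your setup differs.

```xml
<!-- Sketch only: the relative value "crawl" assumes Tomcat is started
     from the directory that contains the crawl output directory. -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>crawl</value>
  </property>
</configuration>
```

An absolute path to the crawl directory, as in the configuration quoted
below, avoids depending on where Tomcat was started.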
----- Original Message -----
From: "rashmin babaria" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, May 07, 2007 2:35 PM
Subject: Re: Why nutch return 0 results?
Hi,
Start Tomcat after the crawl is completed. If the crawl has completed by
now, stop Tomcat and start it again. It might solve your problem.
-Rashmin.
On 5/7/07, openxu <[EMAIL PROTECTED]> wrote:
Hi, all!
I installed Nutch 0.9.
After starting Tomcat, I crawled a website as follows:
./nutch crawl urls -dir crawl -depth 2 -threads 2 -topN 4
But when I search at http://localhost:8080/, it returns 0 results.
Below are my configuration files.
Can you give me any hints?
Thanks in advance!
crawl-urlfilter.txt:
----------------------------------------------------------
+^http://([a-z0-9]*\.)*apache.org/
------------------------------------------------------------//end
urls:
------------------------------------------------------------
http://www.apache.org/
------------------------------------------------------------//end
/apache-tomcat-5.5.23/webapps/root/web-inf/classes/nutch-site.xml:
------------------------------------------------------------
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
  </property>
</configuration>
------------------------------------------------------------//end
/nutch-0.9/conf/nutch-site.xml:
------------------------------------------------------------
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch</value>
    <description>HTTP 'User-Agent' request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.
    NOTE: You should also check other related properties:
    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version
    and set their values appropriately.
    </description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>hello</value>
    <description>Further description of our bot- this text is used in
    the User-Agent header. It appears in parenthesis after the agent name.
    </description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>hello.com</value>
    <description>A URL to advertise in the User-Agent header. This will
    appear in parenthesis after the agent name. Custom dictates that this
    should be a URL of a page explaining the purpose and behavior of this
    crawler.
    </description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>[EMAIL PROTECTED]</value>
    <description>An email address to advertise in the HTTP 'From' request
    header and User-Agent header. A good practice is to mangle this
    address (e.g. 'info at example dot com') to avoid spamming.
    </description>
  </property>
</configuration>
------------------------------------------------------------//end
--
View this message in context:
http://www.nabble.com/Why-nutch-return-0-results--tf3703924.html#a10357955
Sent from the Nutch - User mailing list archive at Nabble.com.