Hi, start Tomcat only after the crawl has completed. If the crawl has finished by now, stop Tomcat and start it again. That might solve your problem.
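The restart can be sketched as a small shell helper, presumably needed because the search webapp reads the index under searcher.dir when it starts. This is only a sketch under assumptions: the function names are hypothetical, the Tomcat and crawl paths are copied from the quoted message and may differ on your machine, the check assumes the crawl writes an "index" or "indexes" directory under the crawl directory, and the 5-second pause is an arbitrary guess.

```shell
#!/bin/sh
# Paths taken from the quoted message; override via the environment if yours differ.
TOMCAT_HOME="${TOMCAT_HOME:-/apache-tomcat-5.5.23}"
CRAWL_DIR="${CRAWL_DIR:-/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl}"

# Hypothetical helper: succeeds only if the given crawl directory
# already contains an "index" or "indexes" subdirectory (assumed layout).
crawl_has_index() {
    [ -d "$1/index" ] || [ -d "$1/indexes" ]
}

# Hypothetical helper: restart Tomcat, but only once the crawl output exists.
restart_tomcat_if_indexed() {
    if crawl_has_index "$CRAWL_DIR"; then
        "$TOMCAT_HOME/bin/shutdown.sh"
        sleep 5   # arbitrary pause so Tomcat can stop before it is started again
        "$TOMCAT_HOME/bin/startup.sh"
    else
        echo "no index found under $CRAWL_DIR; run the crawl first" >&2
        return 1
    fi
}
```

After the crawl has finished, source the file and call restart_tomcat_if_indexed. If it reports that no index was found, the searcher.dir value and the directory the crawl actually wrote to are worth comparing.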
-Rashmin

On 5/7/07, openxu <[EMAIL PROTECTED]> wrote:
Hi all,

I installed Nutch 0.9. After starting Tomcat, I crawled a website as follows:

./nutch crawl urls -dir crawl -depth 2 -threads 2 -topN 4

But when I search at http://localhost:8080/, it returns 0 results. Below are my configuration files. Can you give me any hints?

Thanks in advance!

crawl-urlfilter.txt:
------------------------------------------------------------
+^http://([a-z0-9]*\.)*apache.org/
------------------------------------------------------------//end

urls:
------------------------------------------------------------
http://www.apache.org/
------------------------------------------------------------//end

/apache-tomcat-5.5.23/webapps/root/web-inf/classes/nutch-site.xml:
------------------------------------------------------------
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
  </property>
</configuration>
------------------------------------------------------------//end

/nutch-0.9/conf/nutch-site.xml:
------------------------------------------------------------
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch</value>
    <description>HTTP 'User-Agent' request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.

    NOTE: You should also check other related properties:

      http.robots.agents
      http.agent.description
      http.agent.url
      http.agent.email
      http.agent.version

    and set their values appropriately.
    </description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>hello</value>
    <description>Further description of our bot- this text is used in
    the User-Agent header. It appears in parenthesis after the agent name.
    </description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>hello.com</value>
    <description>A URL to advertise in the User-Agent header. This will
    appear in parenthesis after the agent name. Custom dictates that this
    should be a URL of a page explaining the purpose and behavior of this
    crawler.
    </description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>[EMAIL PROTECTED]</value>
    <description>An email address to advertise in the HTTP 'From' request
    header and User-Agent header. A good practice is to mangle this
    address (e.g. 'info at example dot com') to avoid spamming.
    </description>
  </property>
</configuration>
------------------------------------------------------------//end

--
View this message in context: http://www.nabble.com/Why-nutch-return-0-results--tf3703924.html#a10357955
