Hi,

Try putting 

+^http://localhost:8080/ instead of +^http://([a-z0-9]*\.)*apache.org/

in crawl-urlfilter.txt, and http://localhost:8080/ (the plain URL, without the +^ prefix) in the urls file.

Make sure that Tomcat is running. Hope that will solve the problem.
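
If it helps, here is a rough way to check from the shell whether a seed URL would get past a "+" rule in crawl-urlfilter.txt before running the crawl. This is only a sketch: url_passes is a made-up helper, not part of Nutch, and it uses grep -E rather than the Java regex engine Nutch uses, which I am assuming behaves the same for simple patterns like these.

```shell
#!/bin/sh
# url_passes is a hypothetical helper (not a Nutch command).
# $1: an include rule as written in crawl-urlfilter.txt, e.g. '+^http://localhost:8080/'
# $2: a seed URL as written in the urls file
# Returns 0 if the URL matches the rule's pattern, non-zero otherwise.
url_passes() {
    rule="$1"
    url="$2"
    pattern="${rule#?}"   # drop the leading '+' to get the bare regex
    printf '%s\n' "$url" | grep -Eq -- "$pattern"
}

# The new rule accepts the local seed URL.
if url_passes '+^http://localhost:8080/' 'http://localhost:8080/'; then
    echo "localhost rule: seed URL accepted"
fi

# The old apache.org rule does not accept it.
if ! url_passes '+^http://([a-z0-9]*\.)*apache.org/' 'http://localhost:8080/'; then
    echo "apache.org rule: seed URL rejected"
fi
```

A seed URL that no "+" rule accepts is not meant to be fetched, so this is worth checking whenever the filter and the urls file are edited separately.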

Cheers,
cha


openxu wrote:
> 
> Hi, all!
> I installed Nutch 0.9.
> After starting Tomcat, I crawled a website as follows:
> ./nutch crawl urls -dir crawl -depth 2 -threads 2 -topN 4 
> But when I search at http://localhost:8080/, it returns 0 results.
> Below are my configuration files.
> Could you give me any hints?
> Thanks in advance!
> crawl-urlfilter.txt:
> ----------------------------------------------------------
> +^http://([a-z0-9]*\.)*apache.org/
> ------------------------------------------------------------//end
> urls:
> ------------------------------------------------------------
> http://www.apache.org/
> ------------------------------------------------------------//end
> 
> /apache-tomcat-5.5.23/webapps/root/web-inf/classes/nutch-site.xml:
> ------------------------------------------------------------
> <configuration>
>   <property>
>     <name>searcher.dir</name>
>     <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
>   </property>
> </configuration>
> ------------------------------------------------------------//end
> 
> /nutch-0.9/conf/nutch-site.xml:
> ------------------------------------------------------------
> <configuration>
> <property>
>   <name>http.agent.name</name>
>   <value>nutch</value>
>   <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
>   please set this to a single word uniquely related to your organization.
> 
>   NOTE: You should also check other related properties:
> 
>       http.robots.agents
>       http.agent.description
>       http.agent.url
>       http.agent.email
>       http.agent.version
> 
>   and set their values appropriately.
> 
>   </description>
> </property>
> 
> <property>
>   <name>http.agent.description</name>
>   <value>hello</value>
>   <description>Further description of our bot- this text is used in
>   the User-Agent header.  It appears in parenthesis after the agent name.
>   </description>
> </property>
> 
> <property>
>   <name>http.agent.url</name>
>   <value>hello.com</value>
>   <description>A URL to advertise in the User-Agent header.  This will 
>    appear in parenthesis after the agent name. Custom dictates that this
>    should be a URL of a page explaining the purpose and behavior of this
>    crawler.
>   </description>
> </property>
> 
> <property>
>   <name>http.agent.email</name>
>   <value>[EMAIL PROTECTED]</value>
>   <description>An email address to advertise in the HTTP 'From' request
>    header and User-Agent header. A good practice is to mangle this
>    address (e.g. 'info at example dot com') to avoid spamming.
>   </description>
> </property>
> </configuration>
> ------------------------------------------------------------//end
> 

-- 
View this message in context: 
http://www.nabble.com/Why-nutch-return-0-results--tf3703924.html#a10358631
Sent from the Nutch - User mailing list archive at Nabble.com.

