Hi,

Try putting
+^http://localhost:8080/
instead of
+^http://([a-z0-9]*\.)*apache.org/
in crawl-urlfilter.txt and the urls file. Make sure that Tomcat is running. Hope that solves the problem.

Cheers,
cha

openxu wrote:
>
> Hi, all!
> I installed Nutch 0.9.
> After starting Tomcat, I crawled a website as follows:
> ./nutch crawl urls -dir crawl -depth 2 -threads 2 -topN 4
> But when I search at http://localhost:8080/, it returns 0 results.
> Below are my configuration files.
> Can you give me any hints?
> Thanks in advance!
>
> crawl-urlfilter.txt:
> ----------------------------------------------------------
> +^http://([a-z0-9]*\.)*apache.org/
> ------------------------------------------------------------//end
> urls:
> ------------------------------------------------------------
> http://www.apache.org/
> ------------------------------------------------------------//end
>
> /apache-tomcat-5.5.23/webapps/root/web-inf/classes/nutch-site.xml:
> ------------------------------------------------------------
> <configuration>
> <property>
>   <name>searcher.dir</name>
>   <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
> </property>
> </configuration>
> ------------------------------------------------------------//end
>
> /nutch-0.9/conf/nutch-site.xml:
> ------------------------------------------------------------
> <configuration>
> <property>
>   <name>http.agent.name</name>
>   <value>nutch</value>
>   <description>HTTP 'User-Agent' request header. MUST NOT be empty -
>   please set this to a single word uniquely related to your organization.
>
>   NOTE: You should also check other related properties:
>
>     http.robots.agents
>     http.agent.description
>     http.agent.url
>     http.agent.email
>     http.agent.version
>
>   and set their values appropriately.
>
>   </description>
> </property>
>
> <property>
>   <name>http.agent.description</name>
>   <value>hello</value>
>   <description>Further description of our bot- this text is used in
>   the User-Agent header. It appears in parenthesis after the agent name.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.url</name>
>   <value>hello.com</value>
>   <description>A URL to advertise in the User-Agent header. This will
>   appear in parenthesis after the agent name. Custom dictates that this
>   should be a URL of a page explaining the purpose and behavior of this
>   crawler.
>   </description>
> </property>
>
> <property>
>   <name>http.agent.email</name>
>   <value>[EMAIL PROTECTED]</value>
>   <description>An email address to advertise in the HTTP 'From' request
>   header and User-Agent header. A good practice is to mangle this
>   address (e.g. 'info at example dot com') to avoid spamming.
>   </description>
> </property>
> </configuration>
> ------------------------------------------------------------//end
>
> --
> View this message in context: http://www.nabble.com/Why-nutch-return-0-results--tf3703924.html#a10358631
> Sent from the Nutch - User mailing list archive at Nabble.com.

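For anyone who wants to check which URLs each of the two filter lines in this thread would accept before re-crawling, here is a rough Python sketch. It is only an approximation: Nutch evaluates these patterns with Java regular expressions, so Python's re module is a stand-in, and the helper names (parse_rules, accepts) are made up for illustration. It assumes that rules are tried in order, that the first matching +/- rule decides, and that a URL matching no rule is rejected.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments.

    Hypothetical helper for illustration, not part of Nutch.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        # True means "include", False means "exclude".
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first rule that matches the URL is a '+' rule.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# The two filter lines discussed in this thread.
ORIGINAL = r"+^http://([a-z0-9]*\.)*apache.org/"
SUGGESTED = "+^http://localhost:8080/"

if __name__ == "__main__":
    for name, text in (("original", ORIGINAL), ("suggested", SUGGESTED)):
        rules = parse_rules(text)
        for url in ("http://www.apache.org/", "http://localhost:8080/"):
            print(name, url, accepts(rules, url))
```

Under these assumptions, the original pattern accepts http://www.apache.org/ and rejects http://localhost:8080/, while the suggested pattern does the opposite, so the filter line and the seed URL in the urls file need to agree with each other.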