Hi there, I am a newcomer to the Nutch world.

After installing Nutch and Tomcat on my Linux box, I tried to crawl a single URL, using the command:

    bin/nutch crawl url2 -dir crawl2 -depth 3 >& crawl2.log

My url2 is a plain text file containing "http://www.nutch.org/", and I did change "urlfilter.txt".

But after crawling, I checked the crawl log, and it seems it didn't fetch anything:

    050717 083918 DONE indexing segment 20050717083916: total 0 records in 0.19 s (NaN rec/s).

And the search returns no results in the web UI.

Any suggestions would be very helpful.

Thanks,
Michael

FYI, I attached the catalina log file for the search hit:

> /home/fji/SE/tomcat4/work/Standalone/localhost/examples
> is unusable.
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.struts.util.LocalStrings',
> returnNull=true
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.struts.action.ActionResources',
> returnNull=true
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.webapp.admin.ApplicationResources',
> returnNull=true
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/admin
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/manager
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/_ is
> unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/tomcat-docs
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/webdav
> is unusable.
> Jul 17, 2005 8:41:59 AM
> org.apache.coyote.http11.Http11Protocol start
> INFO: Starting Coyote HTTP/1.1 on http-8888
> Jul 17, 2005 8:41:59 AM
> org.apache.jk.common.ChannelSocket init
> INFO: JK2: ajp13 listening on /0.0.0.0:8009
> Jul 17, 2005 8:41:59 AM org.apache.jk.server.JkMain
> start
> INFO: Jk running ID=0 time=1/110
> config=/home/fji/SE/tomcat4/conf/jk2.properties
> 050717 084206 parsing
> file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-default.xml
> 050717 084206 parsing
> file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-site.xml
> 050717 084206 Plugins: looking in:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/clustering-carrot2
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/creativecommons
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/index-basic/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.indexer.IndexingFilter
> class=org.apache.nutch.indexer.basic.BasicIndexingFilter
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/index-more
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/language-identifier
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/ontology
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-ext
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-html/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.parse.Parser
> class=org.apache.nutch.parse.html.HtmlParser
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-js/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.parse.Parser
> class=org.apache.nutch.parse.js.JSParseFilter
> 050717 084206 impl:
> point=org.apache.nutch.parse.HtmlParseFilter
> class=org.apache.nutch.parse.js.JSParseFilter
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-msword
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-pdf
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-text/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.parse.Parser
> class=org.apache.nutch.parse.text.TextParser
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-file
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-ftp
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-http
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-httpclient/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.protocol.Protocol
> class=org.apache.nutch.protocol.httpclient.Http
> 050717 084206 impl:
> point=org.apache.nutch.protocol.Protocol
> class=org.apache.nutch.protocol.httpclient.Http
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-basic/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.searcher.QueryFilter
> class=org.apache.nutch.searcher.basic.BasicQueryFilter
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-more
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-site/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.searcher.QueryFilter
> class=org.apache.nutch.searcher.site.SiteQueryFilter
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-url/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.searcher.QueryFilter
> class=org.apache.nutch.searcher.url.URLQueryFilter
> 050717 084206 not including:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/urlfilter-prefix
> 050717 084206 parsing:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
> 050717 084206 impl:
> point=org.apache.nutch.net.URLFilter
> class=org.apache.nutch.net.RegexURLFilter
> 050717 084206 11 creating new bean
> 050717 084206 11 opening segment indexes in
> /home/fji/SE/tomcat4/segments
> 050717 084207 11 query request from 127.0.0.1
> 050717 084207 11 query: commer
> 050717 084207 11 searching for 20 raw hits
> 050717 084207 11 total hits: 0
> Stopping service Tomcat-Standalone
>
>
> Jul 17, 2005 8:41:54 AM
> org.apache.coyote.http11.Http11Protocol init
> INFO: Initializing Coyote HTTP/1.1 on http-8888
> Starting service Tomcat-Standalone
> Apache Tomcat/4.1.31
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/examples
> is unusable.
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.struts.util.LocalStrings',
> returnNull=true
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.struts.action.ActionResources',
> returnNull=true
> Jul 17, 2005 8:41:56 AM
> org.apache.struts.util.PropertyMessageResources
> <init>
> INFO: Initializing,
> config='org.apache.webapp.admin.ApplicationResources',
> returnNull=true
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/admin
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/manager
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/_ is
> unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/tomcat-docs
> is unusable.
> The scratchDir you specified:
> /home/fji/SE/tomcat4/work/Standalone/localhost/webdav
> is unusable.
> Jul 17, 2005 8:41:59 AM
> org.apache.coyote.http11.Http11Protocol start
> INFO: Starting Coyote HTTP/1.1 on http-8888
> Jul 17, 2005 8:41:59 AM
> org.apache.jk.common.ChannelSocket init
> INFO: JK2: ajp13 listening on /0.0.0.0:8009
> Jul 17, 2005 8:41:59 AM org.apache.jk.server.JkMain
> start
> INFO: Jk running ID=0 time=1/110
> config=/home/fji/SE/tomcat4/conf/jk2.properties
> 050717 084206 parsing
> file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-default.xml
> 050717 084206 parsing
> file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-site.xml
> 050717 084206 Plugins: looking in:
> /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins
>
=== message truncated ===

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
