Hi there, 

I am a newcomer to the Nutch world.

After installing Nutch and Tomcat on my Linux box, I
tried to crawl a single URL.

I used this command:
"
bin/nutch crawl url2 -dir crawl2 -depth 3 >& crawl2.log
"

My url2 is a plain text file containing
"http://www.nutch.org/", and I did change
"urlfilter.txt".
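
To make the filter step concrete, here is a minimal sketch of the kind of check a regex URL filter performs. It assumes that rules are read in order, that each rule is "+regex" (accept) or "-regex" (reject), and that the first matching rule decides. The rules below are illustrative only, not the contents of the actual urlfilter.txt, and grep -E only approximates the Java regex syntax Nutch uses.

```shell
#!/bin/sh
# Sketch: emulate a first-match-wins URL filter.
# The rules in the here-document are hypothetical examples.
check_url() {
    url=$1
    while IFS= read -r rule; do
        case $rule in
            ''|'#'*) continue ;;          # skip blank lines and comments
        esac
        sign=${rule%"${rule#?}"}          # first character: + or -
        regex=${rule#?}                   # remainder: the pattern
        if printf '%s\n' "$url" | grep -Eq -- "$regex"; then
            [ "$sign" = "+" ] && echo accepted || echo rejected
            return 0
        fi
    done <<'EOF'
# skip URLs containing query-like characters
-[?*!@=]
# accept hosts under nutch.org
+^http://([a-z0-9]*\.)*nutch\.org/
# reject everything else
-.
EOF
    echo rejected                         # no rule matched at all
}

check_url 'http://www.nutch.org/'
check_url 'http://www.example.com/'
```

If the seed URL comes out as rejected under the real rules, the crawl would have nothing to fetch, which would be consistent with an empty index.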

But after crawling, I checked crawl2.log, and it seems
nothing was fetched.

"
050717 083918 DONE indexing segment 20050717083916: total 0 records in 0.19 s (NaN rec/s).
"

The search in the web UI also returns no results.

Any suggestions would be very helpful.

Thanks,

Michael

FYI, I have attached the Catalina log for the search
request:

> /home/fji/SE/tomcat4/work/Standalone/localhost/examples is unusable.
> Jul 17, 2005 8:41:56 AM org.apache.struts.util.PropertyMessageResources <init>
> INFO: Initializing, config='org.apache.struts.util.LocalStrings', returnNull=true
> Jul 17, 2005 8:41:56 AM org.apache.struts.util.PropertyMessageResources <init>
> INFO: Initializing, config='org.apache.struts.action.ActionResources', returnNull=true
> Jul 17, 2005 8:41:56 AM org.apache.struts.util.PropertyMessageResources <init>
> INFO: Initializing, config='org.apache.webapp.admin.ApplicationResources', returnNull=true
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/admin is unusable.
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/manager is unusable.
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/_ is unusable.
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/tomcat-docs is unusable.
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/webdav is unusable.
> Jul 17, 2005 8:41:59 AM org.apache.coyote.http11.Http11Protocol start
> INFO: Starting Coyote HTTP/1.1 on http-8888
> Jul 17, 2005 8:41:59 AM org.apache.jk.common.ChannelSocket init
> INFO: JK2: ajp13 listening on /0.0.0.0:8009
> Jul 17, 2005 8:41:59 AM org.apache.jk.server.JkMain start
> INFO: Jk running ID=0 time=1/110 config=/home/fji/SE/tomcat4/conf/jk2.properties
> 050717 084206 parsing file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-default.xml
> 050717 084206 parsing file:/home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/nutch-site.xml
> 050717 084206 Plugins: looking in: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/clustering-carrot2
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/creativecommons
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/index-basic/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/index-more
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/language-identifier
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/ontology
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-ext
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-html/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-js/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter
> 050717 084206 impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-msword
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-pdf
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/parse-text/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-file
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-ftp
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-http
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/protocol-httpclient/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
> 050717 084206 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-basic/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-more
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-site/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/query-url/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
> 050717 084206 not including: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/urlfilter-prefix
> 050717 084206 parsing: /home/fji/SE/tomcat4/webapps/ROOT/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
> 050717 084206 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
> 050717 084206 11 creating new bean
> 050717 084206 11 opening segment indexes in /home/fji/SE/tomcat4/segments
> 050717 084207 11 query request from 127.0.0.1
> 050717 084207 11 query: commer
> 050717 084207 11 searching for 20 raw hits
> 050717 084207 11 total hits: 0
> Stopping service Tomcat-Standalone
> 
> 
> Jul 17, 2005 8:41:54 AM org.apache.coyote.http11.Http11Protocol init
> INFO: Initializing Coyote HTTP/1.1 on http-8888
> Starting service Tomcat-Standalone
> Apache Tomcat/4.1.31
> The scratchDir you specified: /home/fji/SE/tomcat4/work/Standalone/localhost/examples is unusable.
=== message truncated ===





_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
