Dear all,

Just to update: I have solved my problem. It turns out we also need to edit conf/crawl-urlfilter.txt, in addition to conf/regex-urlfilter.txt.
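For anyone who hits the same "No URLs to fetch" message: the filter files use one regex rule per line, and the default accept rule must be widened to cover your seed domain. A minimal sketch (example.com is a placeholder for whatever domain your seed URLs use):

```
# conf/crawl-urlfilter.txt (mirror the same change in conf/regex-urlfilter.txt)

# accept URLs under your seed domain
+^http://([a-z0-9]*\.)*example.com/

# reject everything else
-.
```

If the accept rule does not match your seed URLs, the Generator filters everything out and selects 0 records for fetching.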
Can we amend this page http://wiki.apache.org/nutch/NutchTutorialPre1.3 to mention this? I am sure many others encounter the same problem as me.

________________________________
From: Kelvin <[email protected]>
To: "'[email protected]'" <[email protected]>
Sent: Saturday, 16 July 2011 2:32 PM
Subject: Cannot crawl problem

Dear all,

I was able to get nutch 1.2 working previously. I have now done a clean install of nutch 1.2, strictly following the instructions here:

http://wiki.apache.org/nutch/NutchTutorialPre1.3

But now I have encountered the problem below. Why is that? Do we need to set up Tomcat in order to get nutch crawling working? Previously I set up both Tomcat and nutch together, but now I would like to run nutch only. Thank you for your kind help.

depth 3 -topN 50
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2011-07-16 14:30:29
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-16 14:30:32, elapsed: 00:00:02
Generator: starting at 2011-07-16 14:30:32
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl

