Dear all,

Just to update: I have solved my problem. It turns out we also need to edit conf/crawl-urlfilter.txt, in addition to conf/regex-urlfilter.txt.
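For anyone who hits the same "No URLs to fetch" message: the filter files use one regex rule per line, and the default accept rule must be widened to cover your seed domain. A minimal sketch (example.com is a placeholder for whatever domain your seed URLs use):

```
# conf/crawl-urlfilter.txt (mirror the same change in conf/regex-urlfilter.txt)

# accept URLs under your seed domain
+^http://([a-z0-9]*\.)*example.com/

# reject everything else
-.
```

If the accept rule does not match your seed URLs, the Generator filters everything out and selects 0 records for fetching.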
Can we amend this page http://wiki.apache.org/nutch/NutchTutorialPre1.3 to mention this? I am sure many others encounter the same problem as me.

________________________________
From: Kelvin <[email protected]>
To: "'[email protected]'" <[email protected]>
Sent: Saturday, 16 July 2011 2:32 PM
Subject: Cannot crawl problem

Dear all,

I was able to get nutch 1.2 working previously. I have now done a clean install of nutch 1.2, strictly following the instructions here:

http://wiki.apache.org/nutch/NutchTutorialPre1.3

But now I have encountered the problem below. Why is that? Do we need to set up Tomcat in order to get nutch crawling working? Previously I set up both Tomcat and nutch together, but now I would like to run nutch only. Thank you for your kind help.

depth 3 -topN 50
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2011-07-16 14:30:29
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-16 14:30:32, elapsed: 00:00:02
Generator: starting at 2011-07-16 14:30:32
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl

