Hi,

We have a number of archived tutorials which exist here [1]; please feel free to add to the wiki as you see fit. If you do not already have a username and write permissions, you can sign up on the wiki front page.

Thanks

[1] http://wiki.apache.org/nutch/Archive%20and%20Legacy#Nutch_.3C1.3_Tutorials

On Sat, Jul 16, 2011 at 11:48 AM, Kelvin <[email protected]> wrote:
> Dear all,
>
> Just to update, I have solved my problem. Apparently, we also need to edit
> the file conf/crawl-urlfilter.txt, besides conf/regex-urlfilter.txt.
>
> Can we amend this page: http://wiki.apache.org/nutch/NutchTutorialPre1.3
> I am sure many others encounter the same problem as me.
>
> ________________________________
> From: Kelvin <[email protected]>
> To: "'[email protected]'" <[email protected]>
> Sent: Saturday, 16 July 2011 2:32 PM
> Subject: Cannot crawl problem
>
> Dear all,
>
> I was able to get Nutch 1.2 working previously. I have done a clean install
> of Nutch 1.2 now, and I strictly followed the instructions below:
> http://wiki.apache.org/nutch/NutchTutorialPre1.3
>
> But now I have encountered the problem below. Why is that? Do we need to
> set up Tomcat in order to get Nutch crawling working? Previously I set up
> both Tomcat and Nutch together, but now I would like to have Nutch only.
>
> Thank you for your kind help.
>
> depth 3 -topN 50
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> indexer=lucene
> topN = 50
> Injector: starting at 2011-07-16 14:30:29
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-16 14:30:32, elapsed: 00:00:02
> Generator: starting at 2011-07-16 14:30:32
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl

--
Lewis
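
For anyone who lands here with the same "Generator: 0 records selected for fetching" output: the one-step crawl command in Nutch 1.2 applies conf/crawl-urlfilter.txt, whose default rule set ends by skipping everything, so every seed URL is rejected until you add an accept pattern for your own domain. A minimal sketch of the relevant lines, following the pre-1.3 tutorial (MY.DOMAIN.NAME is the tutorial placeholder; substitute your own host):

  # conf/crawl-urlfilter.txt
  # skip URLs containing characters that usually indicate queries, anchors, etc.
  -[?*!@=]
  # accept everything under your own domain, e.g. +^http://([a-z0-9]*\.)*apache.org/
  +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
  # skip everything else
  -.

If you edit conf/regex-urlfilter.txt as well, note that its default final rule is typically "+." (accept everything else) rather than "-.", which is why crawl-urlfilter.txt is usually the file that bites.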


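For completeness, the log above corresponds to an invocation along these lines (the seed file name and URL are illustrative; only the urls/ directory and the -dir, -depth, and -topN values come from the log):

  # seed list: one URL per line, in a file under the urls/ directory
  mkdir urls
  echo 'http://www.example.com/' > urls/seed.txt

  # one-step crawl matching the log: crawl dir "crawl", depth 3, topN 50
  bin/nutch crawl urls -dir crawl -depth 3 -topN 50

The seed URLs must match the accept pattern in crawl-urlfilter.txt, or you will see the same "0 records selected for fetching" result.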