Did you inject the URLs as the first step? Copy the file containing your URL list into a directory (e.g. urls), then inject the URLs using the inject command,
e.g. bin/nutch inject db urls
where urls is the name of the directory containing the URL list file.

Hope this helps.

Thanks,
Bhawna

On Sun, Mar 6, 2011 at 7:18 PM, chidu r wrote:
> Hi all
>
> I am trying to set up Nutch 1.2 on Hadoop and used the instructions at
> http://wiki.apache.org/nutch/NutchHadoopTutorial; they have been very useful.
>
> However, I find that when I execute the command:
>
> $bin/nutch crawl urls -dir crawl -depth 4 -topN 50
>
> the crawler stops at the generator stage with the message:
> 2011-03-06 17:23:49,538 WARN crawl.Generator - Generator: 0 records
> selected for fetching, exiting ...
>
> I have configured the following plugins in nutch-site.xml:
>
> protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)
>
> I am not using crawl-urlfilter.txt or regex-urlfilter.txt to filter URLs. I
> initiated the crawl with 10 seed URLs from popular sites on the internet.
>
> Any pointers to what I am missing here?
>
> regards
> Chidu
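
For reference, the inject step suggested at the top of this reply could look roughly like the sketch below. The seed URLs, the file name seed.txt, and the crawl db path crawl/crawldb (chosen to match the -dir crawl option in the original command) are assumptions for illustration, not details taken from the thread.

```shell
# Sketch only: seed URLs, seed.txt, and crawl/crawldb are placeholders.
set -e

# 1. Put the seed list in its own directory ("urls" here), one URL per line.
mkdir -p urls
printf '%s\n' 'http://example.com/' 'http://example.org/' > urls/seed.txt

# 2. Inject that directory into the crawl db, then print crawl db statistics
#    to see whether any URLs were actually stored.
#    Guarded so the script does nothing harmful outside a Nutch install.
if [ -x bin/nutch ]; then
  bin/nutch inject crawl/crawldb urls
  bin/nutch readdb crawl/crawldb -stats
else
  echo "bin/nutch not found; run this from the Nutch install directory"
fi
```

If the readdb statistics report zero URLs, the seeds were probably rejected at inject time (for example by a URL filter) or the urls directory was not found. On a Hadoop deployment the urls directory likely has to be copied into HDFS before injecting; the tutorial linked in the original message covers that step.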

