After setup, you should put the URLs you want to crawl into HDFS with this command:

  $ bin/hadoop dfs -put urls urls
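For example (a minimal sketch; the directory and file names here are just placeholders), creating a seed list locally and copying it into HDFS could look like:

  $ mkdir urls
  $ echo 'http://www.yahoo.com/' > urls/seed.txt
  $ bin/hadoop dfs -put urls urls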
Maybe that's something you forgot to do, and I hope it helps :)

----- Original Message -----
From: "Meryl Silverburgh" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, April 07, 2007 3:08 AM
Subject: Trying to setup Nutch

> Hi,
>
> I am trying to set up Nutch.
> I set up one site in my urls file:
> http://www.yahoo.com
>
> And then I start the crawl using this command:
> $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
>
> But I get this "No URLs to fetch" message. Can you please tell me what
> I am missing?
> $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 1
> topN = 5
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20070406140513
> Generator: filtering: false
> Generator: topN: 5
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
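One more thing worth checking if the seed file is in place: "Generator: 0 records selected for fetching" is also what you get when the URL filters reject the seed. Assuming the crawl command is reading the default filter file, conf/crawl-urlfilter.txt, it needs an accept rule that matches your seed domain, something like:

  +^http://([a-z0-9]*\.)*yahoo.com/

(The pattern above is an example; adjust it to your own domain. Depending on your URL normalizer settings, writing the seed with a trailing slash, http://www.yahoo.com/, is the safer form so that a pattern like this one matches.)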
