After setup, you should put the URLs you want to crawl into HDFS with this command:

  $ bin/hadoop dfs -put urls urls
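For example (a minimal sketch; the directory and file names here are just placeholders), creating a seed list locally and copying it into HDFS could look like:

  $ mkdir urls
  $ echo 'http://www.yahoo.com/' > urls/seed.txt
  $ bin/hadoop dfs -put urls urls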
Maybe that's something you forgot to do, and I hope it helps :)

----- Original Message -----
From: "Meryl Silverburgh" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, April 07, 2007 3:08 AM
Subject: Trying to setup Nutch

> Hi,
>
> I am trying to set up Nutch.
> I set up one site in my urls file:
> http://www.yahoo.com
>
> And then I start the crawl using this command:
> $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
>
> But I get this "No URLs to fetch" message. Can you please tell me what
> I am missing?
> $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 1
> topN = 5
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20070406140513
> Generator: filtering: false
> Generator: topN: 5
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
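One more thing worth checking if the seed file is in place: "Generator: 0 records selected for fetching" is also what you get when the URL filters reject the seed. Assuming the crawl command is reading the default filter file, conf/crawl-urlfilter.txt, it needs an accept rule that matches your seed domain, something like:

  +^http://([a-z0-9]*\.)*yahoo.com/

(The pattern above is an example; adjust it to your own domain. Depending on your URL normalizer settings, writing the seed with a trailing slash, http://www.yahoo.com/, is the safer form so that a pattern like this one matches.)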
