Have you checked your crawl-urlfilter.txt file?
Make sure you have replaced MY.DOMAIN.NAME in it with the domain you want to accept (here, yahoo.com).
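
Here is a rough illustration of why the Generator can end up with 0 records. crawl-urlfilter.txt is read top to bottom. Each non-comment line is a regular expression prefixed with '+' (accept) or '-' (reject), the first matching line decides, and a URL that matches no line is ignored. If the stock accept line still points at MY.DOMAIN.NAME, http://www.yahoo.com/ falls through to the final "skip everything else" rule and is filtered out, which would fit the "No URLs to fetch" message. The sketch below is a hypothetical helper, not part of Nutch. check_url, the temporary rules file and its trimmed-down rules are assumptions for illustration. grep -E stands in for Java regexes, which agree on these simple patterns. The seed is written with a trailing slash because the sketch does no URL normalization and the accept pattern ends in '/'.

```sh
#!/bin/sh
# Hypothetical helper (not part of Nutch) that mimics the documented
# crawl-urlfilter.txt rule: the first '+'/'-' regex matching the URL
# decides, and a URL matching no rule is rejected.
# grep -E (POSIX ERE) stands in for Java regexes here.
check_url() {
  rules=$1
  url=$2
  while IFS= read -r line; do
    case $line in
      '' | '#'*) continue ;;            # skip blank lines and comments
    esac
    pattern=${line#?}                   # regex without its leading sign
    sign=${line%"$pattern"}             # '+' (accept) or '-' (reject)
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      if [ "$sign" = "+" ]; then echo ACCEPT; else echo REJECT; fi
      return 0
    fi
  done < "$rules"
  echo REJECT                           # no rule matched
}

# Trimmed-down rules modeled on the stock file, placeholder not yet replaced.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.
EOF

echo "before: $(check_url "$conf" http://www.yahoo.com/)"

# Replace the placeholder with the domain to crawl (portable, no sed -i).
sed 's/MY\.DOMAIN\.NAME/yahoo.com/' "$conf" > "$conf.new" && mv "$conf.new" "$conf"

echo "after: $(check_url "$conf" http://www.yahoo.com/)"
rm -f "$conf"
```

Running it prints "before: REJECT" and then "after: ACCEPT". The same sed substitution can be applied to the real conf/crawl-urlfilter.txt before rerunning the crawl.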

----- Original Message -----
From: "Meryl Silverburgh" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, April 07, 2007 8:54 AM
Subject: Re: Trying to setup Nutch

> On 4/6/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> After setup, you should put the URLs you want to crawl into HDFS with this
>> command:
>> $ bin/hadoop dfs -put urls urls
>>
>> Maybe that's something you forgot to do. I hope it helps :)
>>
>
> I tried your command, but I get this error:
> $ bin/hadoop dfs -put urls urls
> put: Target urls already exists
>
>
> I just have 1 line in my file 'urls':
> $ more urls
> http://www.yahoo.com
>
> Thanks for any help.
>
>
>> ----- Original Message -----
>> From: "Meryl Silverburgh" <[EMAIL PROTECTED]>
>> To: <[email protected]>
>> Sent: Saturday, April 07, 2007 3:08 AM
>> Subject: Trying to setup Nutch
>>
>> > Hi,
>> >
>> > I am trying to set up Nutch.
>> > I set up 1 site in my urls file:
>> > http://www.yahoo.com
>> >
>> > Then I started the crawl using this command:
>> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
>> >
>> > But I get "No URLs to fetch". Can you please tell me what I am
>> > missing?
>> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
>> > crawl started in: crawl
>> > rootUrlDir = urls
>> > threads = 10
>> > depth = 1
>> > topN = 5
>> > Injector: starting
>> > Injector: crawlDb: crawl/crawldb
>> > Injector: urlDir: urls
>> > Injector: Converting injected urls to crawl db entries.
>> > Injector: Merging injected urls into crawl db.
>> > Injector: done
>> > Generator: Selecting best-scoring urls due for fetch.
>> > Generator: starting
>> > Generator: segment: crawl/segments/20070406140513
>> > Generator: filtering: false
>> > Generator: topN: 5
>> > Generator: jobtracker is 'local', generating exactly one partition.
>> > Generator: 0 records selected for fetching, exiting ...
>> > Stopping at depth=0 - no more URLs to fetch.
>> > No URLs to fetch - check your seed list and URL filters.
>> > crawl finished: crawl
>> >
>>
> 
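
About "put: Target urls already exists": this error suggests the seed list is already in the DFS, so the put step has probably been done. If the seeds are edited later (for example, to add more URLs), the DFS copy has to be replaced, because put will not overwrite an existing target. The following is a minimal sketch. It assumes that, in a fresh working directory, the seeds are kept as a local directory named urls holding one plain-text file with one URL per line. The file name seed.txt is hypothetical. The two Hadoop commands are left as comments because they need a configured Hadoop installation. dfs -rmr is the recursive remove in Hadoop's file system shell of this era.

```sh
#!/bin/sh
# Hypothetical seed layout: a local directory "urls" holding one
# plain-text file (name is illustrative) with one URL per line.
mkdir -p urls
printf '%s\n' 'http://www.yahoo.com/' > urls/seed.txt

# Show the seed list that would be uploaded.
cat urls/seed.txt

# "dfs -put" refuses to overwrite an existing target, which is what
# "put: Target urls already exists" reports. To refresh the DFS copy
# after editing the seeds, remove it first. These need a working Hadoop
# setup, so they are left commented out here:
# bin/hadoop dfs -rmr urls
# bin/hadoop dfs -put urls urls
```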

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general