Hello, Gal. You wrote on 22 February 2006 at 14:11:37:
> :) a bit misleading....
> First: Hadoop is the evolution of the "Nutch Distributed File System".
> It is based on Google's file system. It enables one to keep all data in a
> distributed file system, which is very suitable for Nutch.
> Where you see "bin/nutch NDFS -ls", write instead "bin/hadoop dfs -ls".
> Now to create the seeds:
> Create the urls.txt file in a folder called seeds, i.e. seeds/urls.txt
> bin/hadoop dfs -put seeds seeds
> This will copy the seeds folder into the Hadoop file system,
> and now:
> bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log
> Happy crawling.
> Gal.
>
> On Wed, 2006-02-22 at 01:05 -0800, Foong Yie wrote:
>> Matt,
>>
>> as the tutorial states:
>>
>> bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
>>
>> The urls file is a .txt, right? I created it and put it inside c:/nutch-0.7.1
>>
>> Stephanie

Thanks a lot!!! I'll try it.
One more thing: do I have to download and compile the Hadoop sources?

-- 
Best regards,
Nutch    mailto:[EMAIL PROTECTED]
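For reference, a minimal end-to-end sketch of the steps Gal describes above, assuming the bin/hadoop and bin/nutch scripts are run from the Nutch install directory (the seed URL below is only a placeholder):

  # Create the seed list locally: one URL per line in seeds/urls.txt
  # (placeholder URL, replace with your own)
  mkdir -p seeds
  echo "http://lucene.apache.org/nutch/" > seeds/urls.txt

  # Copy the seeds folder into the Hadoop distributed file system
  bin/hadoop dfs -put seeds seeds

  # List DFS contents (this replaces the old "bin/nutch NDFS -ls" form)
  bin/hadoop dfs -ls

  # Crawl the seeds to depth 3, sending output to crawl.log
  bin/nutch crawl seeds -dir crawled -depth 3 >& crawl.log

Note that ">&" is csh-style redirection; in bash the equivalent is "> crawl.log 2>&1".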
