>  Whole web?  What for, if not a secret?

Tinkering and perhaps more.  I used Nutch back in the day, but dang, you
guys have come a long way!

>  Suggestion: don't run things as root.

I know :)

>  Have you formatted the filesystem?

Yes, I formatted the file system as per a tutorial I found online:
bin/hadoop namenode -format
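
(For the record, the full sequence I followed was roughly the below; the
start script is from memory of the tutorial, so it's a sketch, not gospel:)

# format the DFS once, with the daemons stopped, then bring them up
bin/hadoop namenode -format
bin/start-all.sh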

>  Can you run bin/hadoop fs -ls /user/root/crawl ?
>
[EMAIL PROTECTED] search]# bin/hadoop fs -ls /usr/root/crawl
Found 0 items

Doesn't look so good...
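
(Rereading this, I notice I listed /usr/root/crawl while the suggestion
said /user/root/crawl; the corrected check would be the line below, though
with nothing injected yet I'd guess it comes back empty either way:)

bin/hadoop fs -ls /user/root/crawl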

>  Oh, if you have not injected any URLs, there is nothing to crawl in your 
> crawldb.
>  Run bin/nutch and you will see "inject" as one of the options.
>
bin/hadoop dfs -put urls urls
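
(Following the suggestion above, I'm guessing the next step is to inject
those URLs into the crawldb, something like:)

bin/nutch inject crawl/crawldb urls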

I did a dfs -ls and the urls directory appears there.  For whole-web
indexing I was used to:

bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s2=`ls -d crawl/segments/2* | tail -1`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb crawl/crawldb $s2

With Hadoop, what changes?  Do I just point everything at the distributed
filesystem?
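
My guess is something like the sketch below, with all the paths living on
the DFS.  The awk bit is an assumption on my part: that the segment path is
the last whitespace-separated field of the fs -ls output, and that the
listing comes back name-sorted the way the local ls was:

bin/nutch generate crawl/crawldb crawl/segments -topN 1000
# pick the newest segment out of the DFS listing instead of the local fs
s2=`bin/hadoop fs -ls crawl/segments | tail -1 | awk '{print $NF}'`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb crawl/crawldb $s2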

Thanks a ton!

Jason
