Rock on... that was it. Thanks a ton. You guys are rocking this project.
Thanks!!

Jason

On Mon, Apr 21, 2008 at 6:23 PM, <[EMAIL PROTECTED]> wrote:
> Jason, you only put a file in HDFS with that -put. You did not inject it
> into crawldb, and that's what you need to do with bin/nutch inject ....
> After that, run generate, fetch2, updatedb...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Jason Boss <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Monday, April 21, 2008 8:36:55 PM
> > Subject: Re: hadoop
> >
> > > Whole web? What for, if not a secret?
> >
> > Tinkering and perhaps more. I used Nutch back in the day, but dang, you
> > guys have come a long way!
> >
> > > Suggestion: don't run things as root.
> >
> > I know :)
> >
> > > Have you formatted the filesystem?
> >
> > Yes, I formatted the filesystem as per a tutorial I found online:
> > bin/hadoop namenode -format
> >
> > > Can you run bin/hadoop fs -ls /user/root/crawl ?
> >
> > [EMAIL PROTECTED] search]# bin/hadoop fs -ls /usr/root/crawl
> > Found 0 items
> >
> > Doesn't look so good...
> >
> > > Oh, if you have not injected any URLs, there is nothing to crawl in your
> > > crawldb.
> > > Run bin/nutch and you will see "inject" as one of the options.
> >
> > bin/hadoop dfs -put urls urls
> >
> > I did a dfs -ls and it appears there. For whole-web indexing I was used to:
> >
> > bin/nutch generate crawl/crawldb crawl/segments -topN 1000
> > s2=`ls -d crawl/segments/2* | tail -1`
> > echo $s2
> > bin/nutch fetch $s2
> > bin/nutch updatedb crawl/crawldb $s2
> >
> > With Hadoop, what changes? Do I just point to the virtual filesystem?
> >
> > Thanks a ton!
> >
> > Jason
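For anyone hitting the same wall later: putting Otis's advice together with the whole-web sequence quoted above, the Hadoop version of the loop looks roughly like the sketch below. This is only a sketch, not an official recipe -- the paths urls, crawl/crawldb, and crawl/segments are example relative HDFS paths (resolved under the user's home directory in HDFS), the -topN value is just the number from the thread, and fetch2 was the newer fetcher command in that era of Nutch, so some builds may want plain fetch instead.

# Inject the seed URL list (already copied into HDFS with "dfs -put urls urls")
# into the crawldb, so generate has something to work from:
bin/nutch inject crawl/crawldb urls

# Generate a fetch list, pick the newest segment out of HDFS (not the local disk),
# fetch it, and fold the results back into the crawldb:
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
# The listing format of dfs -ls differs between Hadoop versions; here the path is
# assumed to be the first column, in newer releases it is the last column instead.
s2=`bin/hadoop dfs -ls crawl/segments | grep segments/2 | tail -1 | awk '{print $1}'`
echo $s2
bin/nutch fetch2 $s2
bin/nutch updatedb crawl/crawldb $s2

In other words, the answer to "do I just point to the virtual filesystem?" is essentially yes: once Nutch runs on top of Hadoop, the crawl/* paths live in HDFS rather than on local disk, only the injection step and the way the segment name is listed change, and the rest of the commands stay the same.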
