Rock on... that was it. Thanks a ton. You guys are rocking this project.
Thanks!!

Jason

On Mon, Apr 21, 2008 at 6:23 PM, <[EMAIL PROTECTED]> wrote:
> Jason, you only put a file in HDFS with that -put. You did not inject it
> into crawldb, and that's what you need to do with bin/nutch inject ....
> After that, run generate, fetch2, updatedb...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Jason Boss <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Monday, April 21, 2008 8:36:55 PM
> > Subject: Re: hadoop
> >
> > > Whole web? What for, if not a secret?
> >
> > Tinkering and perhaps more. I used Nutch back in the day, but dang, you
> > guys have come a long way!
> >
> > > Suggestion: don't run things as root.
> >
> > I know :)
> >
> > > Have you formatted the filesystem?
> >
> > Yes, I formatted the filesystem as per a tutorial I found online:
> > bin/hadoop namenode -format
> >
> > > Can you run bin/hadoop fs -ls /user/root/crawl ?
> >
> > [EMAIL PROTECTED] search]# bin/hadoop fs -ls /usr/root/crawl
> > Found 0 items
> >
> > Doesn't look so good...
> >
> > > Oh, if you have not injected any URLs, there is nothing to crawl in your
> > > crawldb.
> > > Run bin/nutch and you will see "inject" as one of the options.
> >
> > bin/hadoop dfs -put urls urls
> >
> > I did a dfs -ls and it appears there. For whole-web indexing I was used to:
> >
> > bin/nutch generate crawl/crawldb crawl/segments -topN 1000
> > s2=`ls -d crawl/segments/2* | tail -1`
> > echo $s2
> > bin/nutch fetch $s2
> > bin/nutch updatedb crawl/crawldb $s2
> >
> > With Hadoop, what changes? Do I just point to the virtual filesystem?
> >
> > Thanks a ton!
> >
> > Jason
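For anyone hitting the same wall later: putting Otis's advice together with the whole-web sequence quoted above, the Hadoop version of the loop looks roughly like the sketch below. This is only a sketch, not an official recipe -- the paths urls, crawl/crawldb, and crawl/segments are example relative HDFS paths (resolved under the user's home directory in HDFS), the -topN value is just the number from the thread, and fetch2 was the newer fetcher command in that era of Nutch, so some builds may want plain fetch instead.

# Inject the seed URL list (already copied into HDFS with "dfs -put urls urls")
# into the crawldb, so generate has something to work from:
bin/nutch inject crawl/crawldb urls

# Generate a fetch list, pick the newest segment out of HDFS (not the local disk),
# fetch it, and fold the results back into the crawldb:
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
# The listing format of dfs -ls differs between Hadoop versions; here the path is
# assumed to be the first column, in newer releases it is the last column instead.
s2=`bin/hadoop dfs -ls crawl/segments | grep segments/2 | tail -1 | awk '{print $1}'`
echo $s2
bin/nutch fetch2 $s2
bin/nutch updatedb crawl/crawldb $s2

In other words, the answer to "do I just point to the virtual filesystem?" is essentially yes: once Nutch runs on top of Hadoop, the crawl/* paths live in HDFS rather than on local disk, only the injection step and the way the segment name is listed change, and the rest of the commands stay the same.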
