> Whole web? What for, if not a secret?

Tinkering, and perhaps more. I used Nutch back in the day, but dang, you guys have come a long way!
> Suggestion: don't run things as root.

I know :)

> Have you formatted the filesystem?

Yes, I formatted the filesystem as per a tutorial I found online:

bin/hadoop namenode -format

> Can you run bin/hadoop fs -ls /user/root/crawl ?

[EMAIL PROTECTED] search]# bin/hadoop fs -ls /usr/root/crawl
Found 0 items

Doesn't look so good...

> Oh, if you have not injected any URLs, there is nothing to crawl in your
> crawldb.
> Run bin/nutch and you will see "inject" as one of the options.

bin/hadoop dfs -put urls urls

I did a dfs -ls and it appears there. For whole-web indexing I was used to:

bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s2=`ls -d crawl/segments/2* | tail -1`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb crawl/crawldb $s2

With Hadoop, what changes? Do I just point to the virtual file system?

Thanks a ton!
Jason
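For anyone following the thread: the inject step the reply points to never actually runs above, which is why the crawldb listing came back empty. A minimal sketch, assuming the urls directory uploaded with dfs -put and a crawldb at crawl/crawldb:

# seed the crawldb with the URL list already uploaded to DFS
bin/nutch inject crawl/crawldb urls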
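And on Jason's question about what changes under Hadoop: the nutch commands themselves should stay the same, with relative paths resolving inside DFS (under /user/root by default), but the local ls -d trick for picking the newest segment breaks, since the segments no longer sit on the local disk. A rough sketch, assuming dfs -ls prints the segment path in the first column (the output format varies between Hadoop versions, so the awk field may need adjusting):

bin/nutch generate crawl/crawldb crawl/segments -topN 1000
# segments now live in DFS, so list them through hadoop instead of ls
s2=`bin/hadoop dfs -ls crawl/segments | tail -1 | awk '{print $1}'`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb crawl/crawldb $s2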
