So, whether we do the crawling on multiple nodes or run it on a single node with the regular bin/generate, bin/fetcher series of commands, in the end we get a single crawldb on the local system where Nutch is supposed to run. Am I right?
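By "the regular series of commands" I mean roughly the cycle below. This is just a sketch; the crawl/ paths are placeholders and the exact arguments may differ between Nutch versions:

  bin/nutch inject crawl/crawldb urls
  bin/nutch generate crawl/crawldb crawl/segments
  s=`ls -d crawl/segments/* | tail -1`    # newest segment
  bin/nutch fetch $s
  bin/nutch updatedb crawl/crawldb $s
  # repeat generate/fetch/updatedb for more depth, then:
  bin/nutch invertlinks crawl/linkdb crawl/segments/*
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*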
On Dec 16, 2007 11:47 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> If you are talking about the clustering plugin, that is about grouping
> (hopefully) related documents in the search results.
>
> Running the crawler and other Nutch processes on multiple nodes is Nutch
> and Hadoop running the map-reduce paradigm. Moving the final indexes and
> databases to local file systems for searching is simply best practice.
>
> Dennis Kubes
>
>
> Bent Hugh wrote:
> > I am a little confused. In the Nutch wiki there are chapters on
> > clustering. I have never tried them though. So what is clustering
> > about? Is it running the crawler on multiple nodes and creating
> > a crawldb on multiple nodes? And then finally merging all these on a
> > local system and running the Nutch web GUI from that?
> >
> >
> > On Dec 16, 2007 10:17 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> >> Technically you can, but the speed for most search applications would be
> >> unacceptable. Searching of indexes is best done on local file systems
> >> for speed.
> >>
> >> Dennis Kubes
> >>
> >>
> >> hzhong wrote:
> >>> Hello,
> >>>
> >>> Why can't we search on the Hadoop DFS?
> >>>
> >>> Thanks
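Regarding the "moving final indexes and databases to local file systems" step Dennis describes, my understanding is that it is just a copy out of DFS onto the machine that runs the search web app. Something like the following should do it; the paths are only examples and the dfs commands may vary with the Hadoop version:

  # copy the completed index, databases and segments out of HDFS
  bin/hadoop dfs -copyToLocal crawl/indexes  /data/local/crawl/indexes
  bin/hadoop dfs -copyToLocal crawl/crawldb  /data/local/crawl/crawldb
  bin/hadoop dfs -copyToLocal crawl/linkdb   /data/local/crawl/linkdb
  bin/hadoop dfs -copyToLocal crawl/segments /data/local/crawl/segments
  # then point searcher.dir in nutch-site.xml at /data/local/crawl for the web GUI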
