So, whether we do the crawling on multiple nodes or run it on a single
node with the regular bin/generate, bin/fetcher series of commands, we
end up with a single crawldb on the local system where Nutch is
supposed to run. Am I right?
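
On a single node I have in mind roughly the following cycle (directory
names are only examples, and the exact invocations may differ between
Nutch versions, so please correct me if I have it wrong):

# inject seed URLs into a fresh crawldb
bin/nutch inject crawl/crawldb urls

# one generate/fetch/update round (repeat as needed)
bin/nutch generate crawl/crawldb crawl/segments
segment=`ls -d crawl/segments/2* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb crawl/crawldb $segment

# after the last round, build the linkdb and the indexes
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

So whether that runs locally or as MapReduce jobs on a cluster, there
is still just one crawl/crawldb at the end, which is what I wanted to
confirm.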

On Dec 16, 2007 11:47 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> If you are talking about the clustering plugin, that is about grouping
> (hopefully) related documents in the search results.
>
> Running the crawler and other Nutch processes on multiple nodes is Nutch
> and Hadoop running the MapReduce paradigm.  Moving the final indexes and
> databases to local file systems for searching is simply best practice.
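
In practice, "moving to local file systems" here is just a DFS copy onto
the search machine, something like the following (the local paths are
only examples, and the exact option names may differ a bit between
Hadoop versions, e.g. -get vs -copyToLocal):

# pull the finished databases and indexes out of HDFS onto the search box
mkdir -p /local/search/crawl
bin/hadoop dfs -copyToLocal crawl/crawldb   /local/search/crawl/crawldb
bin/hadoop dfs -copyToLocal crawl/linkdb    /local/search/crawl/linkdb
bin/hadoop dfs -copyToLocal crawl/segments  /local/search/crawl/segments
bin/hadoop dfs -copyToLocal crawl/indexes   /local/search/crawl/indexes
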
>
> Dennis Kubes
>
>
> Bent Hugh wrote:
> > I am a little confused. The Nutch wiki has chapters on clustering,
> > though I have never tried them. So what is clustering about? Is it
> > running the crawler on multiple nodes and creating a crawldb on each
> > node, and then merging all of these on a local system and running the
> > Nutch web GUI from that?
> >
> >
> > On Dec 16, 2007 10:17 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> >> Technically you can, but the speed for most search applications would
> >> be unacceptable. Searching indexes is best done on local file systems
> >> for speed.
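
Once the crawl data is on a local disk, the search webapp is pointed at
it with the searcher.dir property, as far as I remember. A minimal
sketch, assuming the data was copied to /local/search/crawl (merge this
into your existing conf/nutch-site.xml by hand if you already have one,
since the command below overwrites it):

# write a minimal nutch-site.xml on the search node (example path only)
cat > conf/nutch-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/local/search/crawl</value>
  </property>
</configuration>
EOF
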
> >>
> >> Dennis Kubes
> >>
> >>
> >> hzhong wrote:
> >>> Hello,
> >>>
> >>> Why can't we search on the Hadoop DFS?
> >>>
> >>> Thanks
>
