Thanks for your reply.

I have copied only the segments directory but the searcher returns 0
hits.

Do I need to copy the linkdb and the index folders as well?

Thanks.
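(For reference, the copy under discussion, pulling a complete index together with its segments and linkdb out of NDFS onto a search node's local disk, would look roughly like the sketch below. Treat it as pseudocode: the exact `bin/nutch ndfs` option names and the crawl directory layout are assumptions and may differ in your build.)

```
# Sketch: copy everything the searcher needs out of NDFS to local disk.
# Paths and the -get option are assumptions; check `bin/nutch ndfs` usage.
bin/nutch ndfs -get /user/crawl/index    /local/search/index
bin/nutch ndfs -get /user/crawl/segments /local/search/segments
bin/nutch ndfs -get /user/crawl/linkdb   /local/search/linkdb
# Then point the searcher's searcher.dir property at /local/search.
```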

On Sun, 2006-01-29 at 23:18 +0100, Dominik Friedrich wrote:
> Gal Nitzan wrote:
> > 1. If NDFS is too slow and all data must be copied to HD FS why use it
> > in the first place?
> >   
> NDFS is more or less part of the map/reduce system. It's needed because 
> you have to store a large amount of data in a way that all tasktrackers 
> can access it. Another reason is the reliability of the map/reduce 
> system. With the default settings, each block in NDFS is replicated 
> on three different machines, so the system can still run jobs when 
> machines fail. The tasktrackers copy their small chunk of data to 
> local disk for fast access while running a task, and the results are 
> later copied back into NDFS.
> 
> When you want to search the data you need fast access to the index and 
> also to the segments used in that index. This is why you want to copy 
> that data out of NDFS onto the local disks of the search nodes.
> > 2. If using NDFS and HD don't you get 4 copies of the same data?
> >   
> Yes, and when running map/reduce jobs you also get a lot of temporary 
> data. As said before, the redundancy is needed for reliability, and it 
> can also improve the performance of the map/reduce system.
> > 3. Assuming the data is 3 TB, how do you split the data to be read by
> > the searcher when not using NDFS?
> >   
> You can create multiple indexes and use multiple search servers. You 
> copy each of these indexes, together with its segments, to one of the 
> search servers. See for example 
> http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever 
> for more details.
> 
> best regards,
> Dominik
> 
> 
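(A note on the multiple-search-server setup mentioned above: Nutch's distributed search front end reads its list of back-end search servers from a plain-text file in the conf directory, one host/port pair per line. The file name `search-servers.txt`, the host names, and the port below are illustrative assumptions; check your version's documentation.)

```
# conf/search-servers.txt -- one search server per line: host port
search1.example.com 9999
search2.example.com 9999
search3.example.com 9999
```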

