Thanks for your reply.

I have copied only the segments directory but the searcher returns 0
hits.

Do I need to copy the linkdb and the index folders as well?

Thanks.

On Sun, 2006-01-29 at 23:18 +0100, Dominik Friedrich wrote:
> Gal Nitzan wrote:
> > 1. If NDFS is too slow and all data must be copied to HD FS why use it
> > in the first place?
> >   
> NDFS is more or less part of the map/reduce system. It's needed because 
> you have to store a large amount of data in a way that all tasktrackers 
> can access it. Another reason is the reliability of the map/reduce 
> system. With the default settings each block of the NDFS is replicated 
> on three different machines. When machines fail the system is still able 
> to run jobs. The tasktrackers copy the small chunk of data to their local 
> disk to have fast access when running a task and later the results are 
> copied back into the NDFS.
> 
> When you want to search the data you need fast access to the index and 
> also to the segments used in that index. This is why you want to copy 
> that data out of the NDFS onto the local disk of the search nodes.
> > 2. If using NDFS and HD don't you get 4 copies of the same data?
> >   
> Yes, and when running map/reduce jobs you also get a lot of temporary 
> data. As said before, the redundancy is needed for reliability and it 
> can also increase the performance of the map/reduce system.
> > 3. Assuming the data is 3 TB, how do you split the data to be read by
> > the searcher when not using NDFS?
> >   
> You can create multiple indexes and use multiple search servers. You 
> copy each of these indexes with its segments to one of the search servers. 
> See for example 
> http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever 
> for more details.
> 
> best regards,
> Dominik
> 
> 




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
