Gal Nitzan wrote:
> 1. If NDFS is too slow and all data must be copied to the local FS, why
> use it in the first place?
NDFS is more or less part of the map/reduce system. It is needed because
you have to store a large amount of data in a way that all tasktrackers
can access it. Another reason is the reliability of the map/reduce
system. With the default settings each block in NDFS is replicated on
three different machines, so when machines fail the system is still able
to run jobs. The tasktrackers copy the small chunks of data to their
local disks for fast access while running a task, and afterwards the
results are copied back into NDFS.
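
To illustrate the "all tasktrackers can access it" part: the nodes find
the shared filesystem through their configuration. A minimal sketch of
the relevant entries in nutch-site.xml could look like the following
(hostnames and ports are made up, and the exact property names may differ
between versions, so please check your nutch-default.xml):

  <property>
    <name>fs.default.name</name>
    <value>namenode.example.com:9000</value>
    <description>All nodes point at the same NDFS namenode; set this
    to "local" to use the local filesystem instead.</description>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
    <description>Where the tasktrackers contact the jobtracker.</description>
  </property>

Because every tasktracker reads the same fs.default.name, they all see
the same NDFS and can fetch the blocks they need for their tasks.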
When you want to search the data you need fast access to the index and
also to the segments used by that index. This is why you want to copy
that data out of NDFS onto the local disks of the search nodes.
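
The copying itself can be done with the NDFS shell. A rough example
(the paths are invented, and you should run bin/nutch ndfs without
arguments to see which commands your version actually supports):

  bin/nutch ndfs -ls /user/crawl
  bin/nutch ndfs -get /user/crawl/indexes /local/search/indexes
  bin/nutch ndfs -get /user/crawl/segments /local/search/segments

After that the search server only touches the local copies, so query
latency does not depend on NDFS at all.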
> 2. If using NDFS and HD, don't you get 4 copies of the same data?
Yes, and when running map/reduce jobs you also get a lot of temporary
data on top of that. As said before, the redundancy is needed for
reliability, and it can also increase the performance of the map/reduce
system.
> 3. Assuming the data is 3 TB, how do you split the data to be read by
> the searcher when not using NDFS?
You can create multiple indexes and use multiple search servers. You
copy each of these indexes, together with its segments, to one of the
search servers. See for example
http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever
for more details.
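
As a rough sketch of how the pieces fit together (hostnames, ports and
paths below are invented, the wiki page describes the real setup): on
every search node you put one index together with its segments into a
local directory and start a search server there, e.g.

  bin/nutch server 9999 /local/search/crawl

On the machine running the web frontend you point searcher.dir in
nutch-site.xml at a directory containing a search-servers.txt file that
lists one host and port per line:

  searchnode1 9999
  searchnode2 9999
  searchnode3 9999

The frontend then queries all listed servers and merges the results, so
each node only has to hold a fraction of the 3 TB.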
best regards,
Dominik