Thanks for your reply. I have copied only the segments directory, but the searcher returns 0 hits.
Do I need to copy the linkdb and the index folders as well? Thanks.

On Sun, 2006-01-29 at 23:18 +0100, Dominik Friedrich wrote:
> Gal Nitzan schrieb:
> > 1. If NDFS is too slow and all data must be copied to HD FS why use it
> > in the first place?
> >
> NDFS is more or less part of the map/reduce system. It's needed because
> you have to store a large amount of data in a way that all tasktrackers
> can access it. Another reason is the reliability of the map/reduce
> system. With the default settings each block of the NDFS is replicated
> on three different machines. When machines fail, the system is still able
> to run jobs. The tasktrackers copy the small chunk of data to their local
> disk to have fast access when running a task, and later the results are
> copied back into the NDFS.
>
> When you want to search the data you need fast access to the index and
> also to the segments used in that index. This is why you want to copy
> that data out of the NDFS onto the local disks of the search nodes.
>
> > 2. If using NDFS and HD don't you get 4 copies of the same data?
> >
> Yes, and when running map/reduce jobs you also get a lot of temporary
> data. As said before, the redundancy is needed for reliability, and it
> can also increase the performance of the map/reduce system.
>
> > 3. Assuming the data is 3 TB, how do you split the data to be read by
> > the searcher when not using NDFS?
> >
> You can create multiple indexes and use multiple search servers. You
> copy each of these indexes with its segments to one of the search servers.
> See for example
> http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever
> for more details.
>
> best regards,
> Dominik
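
For what it's worth, the copy-out and distributed-search steps Dominik describes look roughly like the sketch below on a Nutch setup of that era. All paths, hostnames, and ports here are made up, and the exact NDFS shell options vary between Nutch versions, so treat this as an illustration rather than exact commands:

    # copy index, segments, and linkdb out of NDFS onto the search
    # node's local disk (paths are hypothetical)
    bin/nutch ndfs -get /user/crawl/index /local/search/index
    bin/nutch ndfs -get /user/crawl/segments /local/search/segments
    bin/nutch ndfs -get /user/crawl/linkdb /local/search/linkdb

    # search-servers.txt on the web front-end: one "host port"
    # line per search node
    search1.example.com 9999
    search2.example.com 9999

    # on each search node, serve its local copy of the data
    bin/nutch server 9999 /local/search

The point is simply that each search node holds one complete index (with its segments and linkdb) on local disk, and the front-end merges hits from all of them.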
