You wouldn't want to use the DFS for searching. You would want to use
the DFS/MapReduce for creating the index and slicing it into segments
of, say, 1-2 million pages each. Those individual index segments would
then be moved to local file systems, with a search server running
against each one and searching that specific part of the index. You
would then have the search client (usually a website) sit in front of
the search servers, using the searchservers.txt file to specify which
search servers it connects to. The search client aggregates the results
from the multiple index search servers and returns the combined results
to the user.
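To make the wiring a bit more concrete: the search servers file is just
a list of host/port pairs, one search server per line. The hostnames and
ports below are only placeholders, and you should check the file name
and exact format against the conf directory of your Nutch version:

    # one search server per line: host, then port
    search01 9999
    search02 9999
    search03 9999

The aggregation step on the client side is conceptually just a merge of
each server's top hits by score. Here is a minimal, hypothetical sketch
of that merge in Java; the Hit class and method names are only
illustrations, not Nutch's actual searcher classes:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    /** Hypothetical client-side merge of per-server search results. */
    public class HitMerger {

      /** One hit as returned by a single search server (illustrative). */
      public static class Hit {
        final String host;   // which search server returned the hit
        final int docId;     // document id within that server's segment
        final float score;   // relevance score reported by that server
        Hit(String host, int docId, float score) {
          this.host = host; this.docId = docId; this.score = score;
        }
      }

      /** Keep the global top-n hits across all servers, best score first. */
      public static List<Hit> merge(List<List<Hit>> perServerHits, int n) {
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> top = new PriorityQueue<Hit>(
            n, Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perServerHits) {
          for (Hit h : hits) {
            top.offer(h);
            if (top.size() > n) {
              top.poll();   // drop the weakest hit
            }
          }
        }
        List<Hit> merged = new ArrayList<Hit>(top);
        merged.sort((a, b) -> Float.compare(b.score, a.score));  // best first
        return merged;
      }
    }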
We are currently using 1 million pages per index segment, although
others on the list have said they have gotten up to 2 million pages
without problems. Beyond that, queries tend to slow down because of the
time it takes to read an individual index segment. We have been running
an individual server for each index segment, but we are currently
experimenting with a single search server with many small disks (say
10 x 20G), each disk holding an index segment. I don't know yet whether
that will work, though.
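Roughly, the layout I have in mind for that experiment is one index
segment per spindle, all opened locally by the single search server
process, the idea being to keep per-segment read times reasonable even
though everything is served from one box. The device names and paths
below are just placeholders:

    /disk01/index/part-00001
    /disk02/index/part-00002
    ...
    /disk10/index/part-00010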
Dennis
Murat Ali Bayir wrote:
Hi everybody,
Does a system with one DFS (crawl, parse, index, search, etc. all on
one DFS) have performance problems on the search side? What if two DFS
instances were used: one for the search part (getting summaries, etc.)
and the other for the remaining Nutch operations (fetch, parse, index,
etc.)? Or are there alternative architectures for systems that perform
all of the Nutch functions concurrently on one DFS?