You wouldn't want to use the DFS for searching. You would want to use
the DFS/MapReduce for creating the index and slicing it into segments
of, say, 1-2 million pages each. Those individual index segments would
then be moved to local file systems, with a search server running
against each one and searching that specific part of the index. You
would then have the search client (usually a website) sit in front of
the search servers, using the searchservers.txt file to specify which
search servers it connects to. The search client aggregates the results
from the multiple index search servers and returns the combined results
to the user.
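To make the wiring a bit more concrete: the search servers file is just
a list of host/port pairs, one search server per line. The hostnames and
ports below are only placeholders, and you should check the file name
and exact format against the conf directory of your Nutch version:

    # one search server per line: host, then port
    search01 9999
    search02 9999
    search03 9999

The aggregation step on the client side is conceptually just a merge of
each server's top hits by score. Here is a minimal, hypothetical sketch
of that merge in Java; the Hit class and method names are only
illustrations, not Nutch's actual searcher classes:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    /** Hypothetical client-side merge of per-server search results. */
    public class HitMerger {

      /** One hit as returned by a single search server (illustrative). */
      public static class Hit {
        final String host;   // which search server returned the hit
        final int docId;     // document id within that server's segment
        final float score;   // relevance score reported by that server
        Hit(String host, int docId, float score) {
          this.host = host; this.docId = docId; this.score = score;
        }
      }

      /** Keep the global top-n hits across all servers, best score first. */
      public static List<Hit> merge(List<List<Hit>> perServerHits, int n) {
        // Min-heap on score: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> top = new PriorityQueue<Hit>(
            n, Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perServerHits) {
          for (Hit h : hits) {
            top.offer(h);
            if (top.size() > n) {
              top.poll();   // drop the weakest hit
            }
          }
        }
        List<Hit> merged = new ArrayList<Hit>(top);
        merged.sort((a, b) -> Float.compare(b.score, a.score));  // best first
        return merged;
      }
    }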
We are currently using 1 million pages per index segment, although
others on the list have said they have gotten up to 2 million pages
without problems. Beyond that, queries tend to slow down because of the
time it takes to read an individual index segment. We have been running
an individual server for each index segment, but we are currently
experimenting with a single search server with many small disks (say
10 x 20G), each disk holding an index segment. I don't know yet whether
that will work, though.
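Roughly, the layout I have in mind for that experiment is one index
segment per spindle, all opened locally by the single search server
process, the idea being to keep per-segment read times reasonable even
though everything is served from one box. The device names and paths
below are just placeholders:

    /disk01/index/part-00001
    /disk02/index/part-00002
    ...
    /disk10/index/part-00010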
Dennis
Murat Ali Bayir wrote:
Hi everybody,
Does a system with one DFS (crawl, parse, index, search, etc. all on
one DFS) have performance problems on the search side? What if two DFS
instances were used: one for the search part (getting summaries, etc.)
and the other for the remaining Nutch operations (fetch, parse, index,
etc.)? Or are there alternative architectures for systems that perform
all of the Nutch functions concurrently on one DFS?