Paul and TL,
I was wondering if you could detail how you have your clusters configured,
hardware-wise, i.e. how many servers are used for each purpose, especially with
regard to how your storage is configured.

We tested search on a 20 million page index on a dual-core 64-bit machine
with 8 GB of RAM, with the Nutch data stored on another server and accessed
over Linux NFS, and its performance was terrible. It looks like the bottleneck
was NFS, so I was wondering how you have your storage set up. Are you using
NDFS, or is it split up over multiple servers? We are trying to build a
system that can handle at least 50 million pages, so we would appreciate any
advice on the best way to configure the servers. Originally we were
thinking 3 servers (1 for crawling and indexing and 2 for search) would be
enough for an index of that size.

Thanks,
Murray   

-----Original Message-----
From: Paul Harrison [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 14, 2005 7:40 PM
To: [email protected]
Subject: RE: Nutch Search Speed Concern

I too would love to hear some answers on this one. We have a 100 million
page implementation on 5 machines with 4 GB of RAM and 2 SATA drives of 250 GB
each. One thing I have noticed is that Lucene seems to do some sort of
caching: if you repeat a search, the results come back quite quickly. I too
have noticed that different terms have different response times, and that
the problem gets worse as the number of terms in the query grows. I have also
noticed that distributed search has problems. The main search machine waits
for the other machines to serve up their results before it will respond. So
it appears that your search is only as fast as your slowest responding
machine, or as the timeout (whichever comes first). If anyone has any
suggestions on tuning the distributed search, or general suggestions on
speeding up retrieval times with a large set, I am all ears.
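
To make the "slowest machine or timeout" behavior concrete, here is a minimal
Python sketch of a scatter-gather search. It is not Nutch's actual code: the
names search_shard and distributed_search, the shard delays, and the timeout
value are all assumptions made up for illustration. The front end sends the
query to every shard in parallel and returns whatever has arrived when the
timeout expires.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_shard(name, delay_s, hits):
    """Hypothetical stand-in for one search server: sleep, then return hits."""
    time.sleep(delay_s)
    return hits


def distributed_search(shards, timeout_s):
    """Query all shards in parallel and merge what arrives before the timeout.

    shards maps a shard name to (simulated delay in seconds, list of hits).
    Returns (merged hits, names of shards that missed the timeout, elapsed s).
    """
    pool = ThreadPoolExecutor(max_workers=len(shards))
    start = time.monotonic()
    futures = {
        pool.submit(search_shard, name, delay, hits): name
        for name, (delay, hits) in shards.items()
    }
    # Blocks until every shard has answered or the timeout expires.
    done, not_done = wait(futures, timeout=timeout_s)
    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)  # do not block the caller on stragglers
    merged = sorted(hit for f in done for hit in f.result())
    late = sorted(futures[f] for f in not_done)
    return merged, late, elapsed


if __name__ == "__main__":
    shards = {
        "node1": (0.01, ["doc1"]),
        "node2": (0.02, ["doc2"]),
        "node3": (0.60, ["doc3"]),  # the slow machine
    }
    merged, late, elapsed = distributed_search(shards, timeout_s=0.2)
    print(merged, late, round(elapsed, 2))
```

With a layout like this, the response time is set by the slowest shard when
every shard answers in time, and by the timeout otherwise, at the cost of
dropping the late shard's results.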

Thanks,

Paul  

-----Original Message-----
From: TL [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 13, 2005 12:15 PM
To: [email protected]
Subject: Nutch Search Speed Concern

Search Speed

What are the most important factors in nutch/lucene's search speed?

I've been testing Nutch's search speed on a search pool with about 100M
records (separated evenly into 30 segments), and discovered that certain
search terms have a significantly higher search time than others.
Some searches take 30 ms while others take upwards of 3000 ms.

At first, there seemed to be a direct relationship between the total number
of results from a given query and the time it took to complete. But after
further testing, that relationship did not hold true for all cases. There
seem to be other factors that directly affect the speed of a search.
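
One way to check the relationship between result count and search time is to
record both for each query. Below is a minimal Python sketch of such a timing
harness. It does not call Nutch: the search argument is a hypothetical
callable that runs one query and returns its hit count, and time_queries and
fake_search are names invented for this example. Reporting the first run
separately from the median of repeated runs helps tell a cold first search
apart from warmed-up ones.

```python
import time
from statistics import median


def time_queries(search, queries, repeats=5):
    """Time each query several times; return rows sorted slowest first.

    search is an assumed callable: search(query) -> number of hits.
    """
    rows = []
    for query in queries:
        timings_ms = []
        hits = 0
        for _ in range(repeats):
            start = time.perf_counter()
            hits = search(query)
            timings_ms.append((time.perf_counter() - start) * 1000.0)
        rows.append({
            "query": query,
            "hits": hits,
            "first_ms": timings_ms[0],        # first, possibly cold, run
            "median_ms": median(timings_ms),  # typical run
        })
    return sorted(rows, key=lambda r: r["median_ms"], reverse=True)


if __name__ == "__main__":
    def fake_search(query):
        # Made-up backend: pretend longer queries are slower.
        time.sleep(0.005 * len(query.split()))
        return 1000 * len(query)

    for row in time_queries(fake_search, ["nutch", "open source search engine"]):
        print(row)
```

Comparing the hits column against median_ms across many queries shows
whether result count alone explains the slow cases.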

Has anyone else encountered this issue, or have some insight into the impact
of certain factors on search speed?

Thanks.

- T


                
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
