Paul and TL, I was wondering if you could detail how your clusters are configured hardware-wise, i.e. how many servers are used for each purpose, especially with regard to how your storage is configured.
We tested search for a 20 million page index on a dual-core 64-bit machine with 8 GB of RAM, with the Nutch data stored on another server and accessed over Linux NFS, and its performance was terrible. It looks like the bottleneck was NFS, so I was wondering how you had your storage set up. Are you using NDFS, or is it split up over multiple servers?

We are trying to build a system that could handle at least 50 million pages, so we would appreciate any advice on the best way to configure the servers. Originally we were thinking that 3 servers, 1 for crawling and indexing and 2 for search, would be enough for that size of index.

Thanks,
Murray

-----Original Message-----
From: Paul Harrison [mailto:[EMAIL PROTECTED]
Sent: Friday, October 14, 2005 7:40 PM
To: [email protected]
Subject: RE: Nutch Search Speed Concern

I too would love to hear some answers on this one. We have a 100 million page implementation on 5 machines, 4 GB of RAM, and 2 SATA drives of 250 GB each.

Part of what I have noticed is that Lucene does some sort of strange caching, in that if you repeat a search, the results come back quite quickly. I too have noticed that different terms have different search response times, and that the problem gets worse with the number of terms in the query.

I have also noticed that distributed search has problems. The main search machine waits on the other machines to serve up their results before it will respond. So it appears that your search is only as fast as your slowest responding machine or the timeout, whichever comes first.

If anyone has any suggestions on tuning the distributed search, or general suggestions on speeding up retrieval times with a large set, I am all ears.

Thanks,
Paul

-----Original Message-----
From: TL [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 13, 2005 12:15 PM
To: [email protected]
Subject: Nutch Search Speed Concern

Search Speed

What are the most important factors in Nutch/Lucene's search speed?
I've been testing Nutch's search speed on a search pool with about 100M records (separated evenly into 30 segments), and discovered that certain search terms have a significantly higher search time than others. Some searches take 30 ms while others take upwards of 3000 ms.

At first, there seemed to be a direct relationship between the total number of results from a given query and the time it took to complete. But after further testing, that relationship did not hold true for all cases. There seem to be other factors that directly affect the speed of a search.

Has anyone else encountered this issue? Or does anyone have some insight into the impact of certain factors on search speed?

Thanks.
- T

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
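
Paul's observation about distributed search (the front end answers only once every back end has replied or the timeout has expired) can be illustrated with a toy simulation. This is only a sketch in Python and is not Nutch code: `fake_node_search`, `distributed_search`, the per-node latencies and the timeout value are all hypothetical stand-ins chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_node_search(node_id, latency_s):
    """Hypothetical stand-in for one search node: sleep, then return a hit."""
    time.sleep(latency_s)
    return [f"node{node_id}-hit"]


def distributed_search(latencies_s, timeout_s):
    """Fan a query out to every node and merge what arrives before the timeout.

    Returns (merged_hits, elapsed_seconds, nodes_that_answered).
    """
    pool = ThreadPoolExecutor(max_workers=len(latencies_s))
    start = time.monotonic()
    futures = [
        pool.submit(fake_node_search, i, latency)
        for i, latency in enumerate(latencies_s)
    ]
    # Blocks until every node has answered OR the timeout expires,
    # whichever comes first.
    done, _not_done = wait(futures, timeout=timeout_s)
    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)  # do not block on the stragglers
    merged = sorted(hit for f in done for hit in f.result())
    return merged, elapsed, len(done)


if __name__ == "__main__":
    # Two fast nodes and one slow node (assumed latencies, in seconds).
    hits, elapsed, answered = distributed_search([0.01, 0.02, 0.30], timeout_s=0.10)
    print(f"{answered} nodes answered in {elapsed:.2f} s: {hits}")
```

In this model a single slow node holds the whole response until the timeout, and its hits are dropped from the merged result. When all nodes are fast, the response time is roughly that of the slowest one.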
