Hey Paul and Murray,

We're running a similar setup. We have two machines
running distributed search, and we're seeing decent
performance gains from it.
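
A caveat on those gains, based on what Paul describes further down: if the front end waits for every node, the response time is set by the slowest node or by the timeout, whichever comes first. Here is a minimal Python sketch of that model. The function name, node names and latencies are invented for illustration, and this is not Nutch's actual distributed search code.

```python
from typing import Dict, List, Tuple


def fan_out_latency(
    node_latency_ms: Dict[str, float], timeout_ms: float
) -> Tuple[float, List[str]]:
    """Model a front end that queries all nodes in parallel.

    The front end answers once every node has replied or the timeout
    expires, whichever comes first. Returns the modelled response time
    and the (sorted) names of the nodes whose results arrived in time.
    """
    if not node_latency_ms:
        return 0.0, []
    # Nodes slower than the timeout are left out of the merged result.
    answered = sorted(
        name for name, ms in node_latency_ms.items() if ms <= timeout_ms
    )
    slowest = max(node_latency_ms.values())
    return min(slowest, timeout_ms), answered


# Hypothetical latencies: two fast nodes and one straggler.
nodes = {"node-a": 40.0, "node-b": 55.0, "node-c": 3900.0}
no_cutoff = fan_out_latency(nodes, timeout_ms=10000.0)
with_cutoff = fan_out_latency(nodes, timeout_ms=1000.0)
```

In this model a lower timeout caps the response time at the cost of dropping the straggler's results, so it trades completeness for speed rather than fixing the slow node.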

Our machines have similar specs to Paul's: Pentium 4 CPUs,
4 GB of RAM, SATA drives.

NFS would be a performance killer. 

But my problem only occurs for specific searches. Some
search terms return in under 50 ms, while others take
up to 4000 ms.
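
One possible explanation, offered as a guess rather than a diagnosis: the cost of a query tends to follow how many postings the engine has to walk for its terms, not how many hits come back. Below is a minimal Python sketch of that idea using a toy inverted index. It is not Lucene's code (Lucene can skip through postings, so the real cost is lower than this naive sum), and `make_docs`, `build_index` and `and_query` are names made up for the illustration.

```python
from typing import Dict, List, Tuple


def make_docs(n: int) -> List[str]:
    """Toy corpus: 'the' in every doc, 'page' in even docs, 'nutch' in doc 7."""
    docs = []
    for i in range(n):
        words = ["the"]
        if i % 2 == 0:
            words.append("page")
        if i == 7:
            words.append("nutch")
        docs.append(" ".join(words))
    return docs


def build_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the ascending list of ids of docs that contain it."""
    index: Dict[str, List[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return index


def and_query(
    index: Dict[str, List[int]], terms: List[str]
) -> Tuple[List[int], int]:
    """Naive AND query: return (matching doc ids, postings read).

    'Postings read' is the summed length of the posting lists involved,
    a rough stand-in for the work done, independent of the hit count.
    """
    postings = [index.get(term, []) for term in terms]
    postings_read = sum(len(p) for p in postings)
    if not postings:
        return [], 0
    hits = set(postings[0])
    for p in postings[1:]:
        hits &= set(p)
    return sorted(hits), postings_read


index = build_index(make_docs(1000))
rare = and_query(index, ["nutch"])
common = and_query(index, ["the", "page"])
mixed = and_query(index, ["the", "page", "nutch"])
```

With these 1000 toy documents, the query for the rare term reads 1 posting, while the three-term query reads 1501 postings and returns no hits at all. Hit count alone would not predict which one is slow, which may be why the result-count relationship did not hold in every case.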


--- Paul Harrison <[EMAIL PROTECTED]> wrote:

> Murray,
> 
> We are running on the following:
> 
> 5 Pentium 4 3.2 GHz machines, 4 GB of RAM each, 1 40
> GB OS drive and 2 SATA
> 250 GB data drives each.  We are running the latest
> version of Fedora and
> have the data drives set up with ReiserFS.  We are
> running JDK 1.5 and Tomcat
> 5.5.
> 
> On a small set of 20 million I don't see much of a
> performance degradation;
> especially if it is all on one machine.  Where
> things get bad is in the
> distributed search.  We are actually contemplating
> rewriting the distributed
> search code.
> 
> Thanks,
> 
> Paul
> 
> -----Original Message-----
> From: Murray Hunter
> [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 17, 2005 9:11 AM
> To: [email protected]
> Subject: RE: Nutch Search Speed Concern
> 
> Paul and TL, 
> I was wondering if you could detail how you have
> your clusters configured,
> hardware-wise, i.e. how many servers are used for each
> purpose, especially with
> regard to how your storage is configured.  
> 
> We tested search for a 20 million page index on a
> dual core 64 bit machine
> with 8 GB of RAM using storage of the Nutch data on
> another server through
> Linux NFS, and its performance was terrible. It
> looks like the bottleneck
> was NFS, so I was wondering how you had your storage
> set up.  Are you using
> NDFS, or is it split up over multiple servers?  We
> are trying to build a
> system that could handle at least 50 million pages,
> so would appreciate any
> advice on the best way to configure the servers.
>  Originally we were
> thinking 3 servers, 1 for crawling and indexing and
> 2 for search servers
> would be enough for that size of index.
> 
> Thanks,
> Murray   
> 
> -----Original Message-----
> From: Paul Harrison [mailto:[EMAIL PROTECTED] 
> Sent: Friday, October 14, 2005 7:40 PM
> To: [email protected]
> Subject: RE: Nutch Search Speed Concern
> 
> I too would love to hear some answers on this one. 
> We have a 100 million
> page implementation on 5 machines, 4 GB of RAM, and
> 2 SATA drives of 250 GB
> each.  Part of what I have noticed is that Lucene
> does some sort of strange
> caching, in that if you repeat a search the
> results come back
> quite quickly.  I too have noticed that different
> terms have different
> search responses and that the problem gets worse
> with the number of terms in
> the query.  I have also noticed that distributed
> search has problems.  The
> main search machine waits on other machines to serve
> up their results before
> it will respond.  So it appears that your search is
> only as fast as your
> slowest responding machine or whenever the timeout
> hits (whichever comes
> first).  If anyone has any suggestions on tuning the
> distributed search or
> general suggestions on speeding up retrieval times
> with a large set, I am
> all ears.
> 
> Thanks,
> 
> Paul  
> 
> -----Original Message-----
> From: TL [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 13, 2005 12:15 PM
> To: [email protected]
> Subject: Nutch Search Speed Concern
> 
> Search Speed
> 
> What are the most important factors in
> nutch/lucene's search speed?
> 
> I've been testing Nutch's search speed on a search
> pool with about 100M
> records (separated evenly into 30 segments), and
> discovered that certain
> search terms have a significantly higher search time
> than others.
> Some searches take 30 ms while others take upwards
> of 3000 ms. 
> 
> At first, there seemed to be a direct relationship
> between the total number
> of results from a given query and the time it took to
> complete. But after
> further testing, that relationship did not hold true
> for all cases. There
> seem to be other factors that directly affect the
> speed of a search.
> 
> Has anyone else encountered this issue? Or does anyone
> have some insight into the impact
> of certain factors on search speed? 
> 
> Thanks.
> 
> - T
> 
> 
>               
> 
> 



        
                
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
