Any way you could post your conf/nutch-site.xml and
walk through your crawl process a bit?

Thanks,
Earl

--- Paul Harrison <[EMAIL PROTECTED]> wrote:

> Murray,
> 
> We are running on the following:
> 
> Five Pentium 4 3.2 GHz machines, each with 4 GB of RAM, one 40 GB
> OS drive, and two 250 GB SATA data drives.  We are running the
> latest version of Fedora and have the data drives set up with
> ReiserFS.  We are running JDK 1.5 and Tomcat 5.5.
> 
> On a small set of 20 million pages I don't see much performance
> degradation, especially if it is all on one machine.  Where things
> get bad is in the distributed search.  We are actually
> contemplating rewriting the distributed search code.
> 
> Thanks,
> 
> Paul
> 
> -----Original Message-----
> From: Murray Hunter
> [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 17, 2005 9:11 AM
> To: [email protected]
> Subject: RE: Nutch Search Speed Concern
> 
> Paul and TL,
> I was wondering if you could detail how you have your clusters
> configured hardware-wise, i.e. how many servers are used for each
> purpose, especially with regard to how your storage is configured.
> 
> We tested search for a 20 million page index on a dual-core 64-bit
> machine with 8 GB of RAM, with the Nutch data stored on another
> server and accessed through Linux NFS, and its performance was
> terrible.  It looks like the bottleneck was NFS, so I was wondering
> how you had your storage set up.  Are you using NDFS, or is it
> split up over multiple servers?  We are trying to build a system
> that could handle at least 50 million pages, so we would appreciate
> any advice on the best way to configure the servers.  Originally we
> were thinking 3 servers, 1 for crawling and indexing and 2 for
> search, would be enough for that size of index.
> 
> Thanks,
> Murray   
> 
> -----Original Message-----
> From: Paul Harrison [mailto:[EMAIL PROTECTED] 
> Sent: Friday, October 14, 2005 7:40 PM
> To: [email protected]
> Subject: RE: Nutch Search Speed Concern
> 
> I too would love to hear some answers on this one.  We have a
> 100 million page implementation on 5 machines, each with 4 GB of
> RAM and two 250 GB SATA drives.  Part of what I have noticed is
> that Lucene does some sort of caching: if you repeat a search,
> the results come back quite quickly.  I too have noticed that
> different terms have different search response times and that the
> problem gets worse with the number of terms in the query.  I have
> also noticed that distributed search has problems.  The main search
> machine waits on the other machines to serve up their results
> before it will respond.  So it appears that your search is only as
> fast as your slowest responding machine or the timeout (whichever
> comes first).  If anyone has any suggestions on tuning the
> distributed search or general suggestions on speeding up retrieval
> times with a large set, I am all ears.
> 
> Thanks,
> 
> Paul  
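
A rough sketch of the behaviour Paul describes above, where the front
end waits for every search server or for the timeout, whichever comes
first.  This is illustrative Python only, not Nutch's actual
distributed search code; make_backend, scatter_gather, the scores and
the delays are all made-up names and values.

```python
import concurrent.futures as cf
import time


def make_backend(name, delay_s):
    """Hypothetical search server: sleeps delay_s, then returns one hit."""
    def search(query):
        time.sleep(delay_s)
        return [(1.0, f"{name}:{query}")]  # (score, document id)
    return search


def scatter_gather(backends, query, timeout_s):
    """Query all backends in parallel and merge what arrives in time.

    Returns (hits sorted by score, number of backends that missed the
    timeout). Total latency is min(slowest backend, timeout_s).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(backend, query) for backend in backends]
    done, not_done = cf.wait(futures, timeout=timeout_s)
    hits = []
    for future in done:
        hits.extend(future.result())
    pool.shutdown(wait=False)  # do not block on the stragglers
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits, len(not_done)


if __name__ == "__main__":
    servers = [make_backend("s1", 0.01), make_backend("s2", 0.3)]
    for timeout_s in (1.0, 0.1):
        start = time.perf_counter()
        hits, missed = scatter_gather(servers, "nutch", timeout_s)
        elapsed = time.perf_counter() - start
        print(f"timeout={timeout_s}s hits={len(hits)} "
              f"missed={missed} elapsed={elapsed:.2f}s")
```

With a generous timeout the response takes as long as the slowest
server; with a short one it returns on time but drops that server's
results, which is the trade-off any tuning of the timeout has to make.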
> 
> -----Original Message-----
> From: TL [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 13, 2005 12:15 PM
> To: [email protected]
> Subject: Nutch Search Speed Concern
> 
> Search Speed
> 
> What are the most important factors in
> nutch/lucene's search speed?
> 
> I've been testing Nutch's search speed on a search pool with about
> 100M records (separated evenly into 30 segments), and discovered
> that certain search terms have a significantly higher search time
> than others.  Some searches take 30 ms while others take upwards
> of 3000 ms.
> 
> At first, there seemed to be a direct relationship between the
> total number of results from a given query and the time it took to
> complete.  But after further testing, that relationship did not
> hold true for all cases.  There seem to be other factors that
> directly affect the speed of a search.
> 
> Has anyone else encountered this issue?  Or does anyone have some
> insight into the impact of certain factors on search speed?
> 
> Thanks.
> 
> - T
> 
> 
>               
> 
> 



                


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
