Re: Search performance for large indexes (100M docs)

2009-01-16 Thread Alex Basa
with the number of threads setting? I have mine at 10 but it's hardly making a dent on the blade server running at max. I was thinking of upping it to 20. --- On Fri, 1/16/09, Mark Bennett mbenn...@ideaeng.com wrote: From: Mark Bennett mbenn...@ideaeng.com Subject: Re: Search performance for large indexes

RE: Search performance for large indexes (100M docs)

2009-01-15 Thread VishalS
...@yahoo.com] Sent: Wednesday, January 14, 2009 9:31 AM To: nutch-user@lucene.apache.org Subject: Re: Search performance for large indexes (100M docs) Vishal, Re 2. - I don't think it's quite true. RAM is still much faster than SSDs. Also, which version of Lucene are you using? Make sure you're

Re: Search performance for large indexes (100M docs)

2009-01-15 Thread Laurent Laborde
Fast storage? You need this: http://www.hyperossystems.co.uk/ Takes 8x 240-pin DDR2 DIMMs from 2GB - 8GB. 2x SATA2 interface ports. CD-ROM drive form factor. 175MB/s Read rate. 145MB/s Write rate. 40,000 IOPS. Hard disks do 200-300 IOPS (File Inputs or Outputs Per Second). 10 microsecond disk

Re: Search performance for large indexes (100M docs)

2009-01-15 Thread Sean Dean
: Re: Search performance for large indexes (100M docs) Fast storage? You need this: http://www.hyperossystems.co.uk/ Takes 8x 240-pin DDR2 DIMMs from 2GB - 8GB. 2x SATA2 interface ports. CD-ROM drive form factor. 175MB/s Read rate. 145MB/s Write rate. 40,000 IOPS. Hard disks do 200-300 IOPS (File

Re: Search performance for large indexes (100M docs)

2009-01-14 Thread buddha1021
it. From: Dennis Kubes ku...@apache.org To: nutch-user@lucene.apache.org Sent: Thursday, January 8, 2009 10:22:09 PM Subject: Re: Search performance for large indexes (100M docs) buddha1021 wrote: hi dennis: in your opinion, which is the most

Re: Search performance for large indexes (100M docs)

2009-01-14 Thread Dennis Kubes
, January 12, 2009 6:49:58 AM Subject: RE: Search performance for large indexes (100M docs) Hi, Thanks for the responses - I have received replies from Otis, Dennis, Sean and Jay Pound (sorry if I forgot someone). To summarize what I understood from these replies: 1. The indices *have

Re: Search performance for large indexes (100M docs)

2009-01-14 Thread Andrzej Bialecki
Dennis Kubes wrote: Otis Gospodnetic wrote: Vishal, Re 2. - I don't think it's quite true. RAM is still much faster than SSDs. I am going to agree with Otis here. And, until very recently RAM was still cheaper than SSDs. That has almost changed but now they are coming out with cheap

Re: Search performance for large indexes (100M docs)

2009-01-14 Thread Sean Dean
into consideration. From: buddha1021 buddha1...@yahoo.cn To: nutch-user@lucene.apache.org Sent: Wednesday, January 14, 2009 10:12:58 AM Subject: Re: Search performance for large indexes (100M docs) how many pages do the 32G SLC SSD indexes contain? 20 million? I don't

Re: Search performance for large indexes (100M docs)

2009-01-14 Thread buddha1021
can't place multiple desktops or towers at the datacenter. You would only need 5 machines to search 100 million pages, although this isn't taking speed into consideration.

RE: Search performance for large indexes (100M docs)

2009-01-13 Thread VishalS
It's worth looking into SSDs to store the indices. This would probably help speed up search performance; it is cheaper than RAM and gives almost similar performance. 3. Jay mentioned that with Nutch 0.7, the hard drives were a bottleneck for him. He got over the issue by using multiple

Re: Search performance for large indexes (100M docs)

2009-01-13 Thread Otis Gospodnetic
that way. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: VishalS vish...@rediff.co.in To: VishalS vish...@rediff.co.in; nutch-user@lucene.apache.org Sent: Monday, January 12, 2009 6:49:58 AM Subject: RE: Search performance for large indexes

Re: Search performance for large indexes (100M docs)

2009-01-13 Thread Marc Boucher
@lucene.apache.org Sent: Monday, January 12, 2009 6:49:58 AM Subject: RE: Search performance for large indexes (100M docs) Hi, Thanks for the responses - I have received replies from Otis, Dennis, Sean and Jay Pound (sorry if I forgot someone). To summarize what I understood from these replies

Re: Search performance for large indexes (100M docs)

2009-01-10 Thread Sean Dean
: Friday, January 9, 2009 10:22:45 AM Subject: Re: Search performance for large indexes (100M docs) Check java-user archives on markmail.org and search for Toke and SSD to see SSD benchmarks done by Toke a few months back. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Search performance for large indexes (100M docs)

2009-01-10 Thread Dennis Kubes
RAM. From: Otis Gospodnetic ogjunk-nu...@yahoo.com To: nutch-user@lucene.apache.org Sent: Friday, January 9, 2009 10:22:45 AM Subject: Re: Search performance for large indexes (100M docs) Check java-user archives on markmail.org and search for Toke and SSD

Re: Search performance for large indexes (100M docs)

2009-01-09 Thread Otis Gospodnetic
: Thursday, January 8, 2009 11:46:20 PM Subject: Re: Search performance for large indexes (100M docs) You might also want to research using single-level cell (SLC) SSDs instead of bulking up on RAM. Google has started using them in new search servers to save power but also speed up search I/O

Re: Search performance for large indexes (100M docs)

2009-01-09 Thread ianwong
in the number of threads, the Java stack size allocation. If there's anyone else who has had experience working with large indices, I would love to get in touch and exchange notes. Regards, -Vishal.

Re: Search performance for large indexes (100M docs)

2009-01-09 Thread Dennis Kubes
Essentially you would create a tmpfs (ramdisk) and put the indexes in the tmpfs. Assuming your indexes were in a folder called indexes.dist, you would use something like this:
mount -t tmpfs -o size=7516192768 none /your/indexes
rsync --progress -aptv /your/indexes.dist/* /your/indexes/
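A quick way to size the tmpfs before mounting it is to compare the on-disk index with the memory actually available on the box; the sketch below uses the placeholder paths from the message above, and the size=7516192768 in the mount command works out to exactly 7 GiB (7 * 1024^3 bytes):
# total bytes used by the on-disk indexes
du -sb /your/indexes.dist
# memory currently free on the machine, in bytes
free -b
# when done, release the ramdisk again (its contents are lost)
umount /your/indexes
The tmpfs has to be at least as large as the index it holds, and whatever it uses is RAM the JVM can no longer get.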

Re: Search performance for large indexes (100M docs)

2009-01-08 Thread buddha1021
experience working with large indices, I would love to get in touch and exchange notes. Regards, -Vishal.

Re: Search performance for large indexes (100M docs)

2009-01-08 Thread Dennis Kubes
buddha1021 wrote: hi dennis: in your opinion, which is the most important reason for the fast search speed of google: 1. google's programme (code) is excellent, or Yes. They are performance fanatics (literally). But there is only so much you are going to be able to optimize code,

Re: Search performance for large indexes (100M docs)

2009-01-08 Thread Sean Dean
, and my god I'm going to try it. From: Dennis Kubes ku...@apache.org To: nutch-user@lucene.apache.org Sent: Thursday, January 8, 2009 10:22:09 PM Subject: Re: Search performance for large indexes (100M docs) buddha1021 wrote: hi dennis: in your opinion

Search performance for large indexes (100M docs)

2009-01-06 Thread VishalS
Hi, I am experimenting with a system with around 120 million documents. The index is split into sub-indices of ~10M documents - each such index is being searched by a single machine. The results are being aggregated using the DistributedSearcher client. I am seeing a lot of performance
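For reference, a rough sketch of how a sharded setup like this was typically wired with Nutch's distributed search of that era (host names, ports and paths are placeholders, and the exact command syntax should be checked against the Nutch version in use):
# on each search node: serve one ~10M-document sub-index
bin/nutch server 9999 /data/crawl-shard-01
# on the front end: searcher.dir points at a directory containing a
# search-servers.txt that lists the shard servers, one "host port" per line
cat /data/search/search-servers.txt
node01 9999
node02 9999
node03 9999
The DistributedSearcher client then fans each query out to every listed server and merges the partial results.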

Re: Search performance for large indexes (100M docs)

2009-01-06 Thread Dennis Kubes
Take a look at the mailing lists for keeping the indexes in memory. When you get to the sizes you are talking about, the way you get sub-second response times is by: 1) Keeping the indexes in RAM 2) Aggressive caching Dennis VishalS wrote: Hi, I am experimenting with a system with
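Short of a dedicated ramdisk like the tmpfs approach shown above, one low-tech way to keep an index effectively in RAM is to let the operating system's page cache hold it, for example by streaming every index file once after startup. A minimal sketch, with a placeholder path:
# warm the OS page cache by reading every index file once
find /your/indexes -type f -exec cat {} + > /dev/null
# rough check of how much RAM is now being used as cache
free -m
This only helps if the machine has enough spare RAM to hold the whole index; under memory pressure the cached pages are simply evicted again.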

Re: document segement size and search performance ?

2008-06-04 Thread Andrzej Bialecki
wuqi wrote: Hi, As we all know, parse_text in the segment will be used by the searcher to generate snippets, and I want to know, of the two configurations below, which should be faster for the searcher to retrieve parse_text: 1. 50 segments * 10,000 pages/segment 2. 5 segments * 100,000 pages/segment

document segement size and search performance ?

2008-06-03 Thread wuqi
Hi, As we all know, parse_text in the segment will be used by the searcher to generate snippets, and I want to know, of the two configurations below, which should be faster for the searcher to retrieve parse_text: 1. 50 segments * 10,000 pages/segment 2. 5 segments * 100,000 pages/segment If we have more

Re: search performance

2006-12-29 Thread shrinivas patwardhan
Thank you Sean Dean for your quick reply ... well, I am running Nutch on Ubuntu 5.01 and JDK 1.5; there are some apps running in the background but they don't take up that much memory. Secondly, I can understand about the first search .. but the other searches following it also take time even

Re: search performance

2006-12-29 Thread Sean Dean
experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2. Disable all non-required system and user
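For tip 1, assuming the Nutch search webapp runs inside Tomcat, passing -server (together with explicit heap limits) usually looks something like the sketch below; the heap sizes here are placeholders, not values from the message:
# set before starting Tomcat; catalina.sh picks JAVA_OPTS up automatically
export JAVA_OPTS="-server -Xms512m -Xmx1024m"
$CATALINA_HOME/bin/startup.sh
On server-class hardware recent JVMs select the server VM by default anyway, so making it explicit mainly guards against surprises on smaller boxes.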

Re: search performance

2006-12-29 Thread shrinivas patwardhan
Thank you Sean .. will do the same and let you know .. if the performance is not up to the mark .. thanks a lot. Thanks Regards Shrinivas

Re: search performance

2006-12-29 Thread RP
million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2

Re: search performance

2006-12-29 Thread Insurance Squared Inc.
search speed. My recommendation would be to get more RAM; another 512MB should support a 1.5 million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just

Re: search performance

2006-12-29 Thread Michael Wechner
even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2. Disable all non-required system and user applications. 3. Download or install the newest stable

Re: search performance

2006-12-29 Thread Insurance Squared Inc.
recommendation would be to get more RAM; another 512MB should support a 1.5 million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more

Re: search performance

2006-12-29 Thread Michael Wechner
your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2. Disable all non-required system and user applications. 3

Re: search performance

2006-12-29 Thread Insurance Squared Inc.
page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2. Disable all non-required system and user applications. 3. Download

Re: search performance

2006-12-29 Thread Michael Wechner
million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance. Here are a few other tips, just in case you can't get any more RAM at this time: 1. Make sure you're passing -server via JAVA_OPTS. 2

search performance

2006-12-28 Thread shrinivas patwardhan
Hello .. I am using Nutch to study its various components. I injected the DMOZ index file and fetched around 1.5 million pages. When I fire a query in the web interface it takes a long time to display the results. I am just trying to run it on a single machine. My configuration is: 2.4 GHz

nutch 0.7.0 search performance measurement

2006-03-05 Thread Stefan Groschupf
Hi, for people who found this interesting: I have published some measurement values I took a long time ago. http://www.find23.net/Web-Site/blog/A712F01B-4BB1-4FC6-AE95-E64988FBCC79.html All time-related values are in milliseconds. Don't take the values too seriously, however; at least they