with the number of threads setting? I have mine at 10, but it is hardly making a dent on the blade server running at max. I was thinking of upping it to 20.
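(Before upping the thread count, it may be worth confirming where the time actually goes; a quick check with standard tools, nothing Nutch-specific assumed:)
# watch CPU and disk while queries run; if the CPUs sit mostly idle
# while the disk stays busy, extra search threads won't help much,
# since random index I/O is the bottleneck
top
iostat -x 5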
--- On Fri, 1/16/09, Mark Bennett mbenn...@ideaeng.com wrote:
From: Mark Bennett mbenn...@ideaeng.com
To: nutch-user@lucene.apache.org
Subject: Re: Search performance for large indexes (100M docs)
Vishal,
Re 2. - I don't think it's quite true. RAM is still much faster than SSDs.
Also, which version of Lucene are you using? Make sure you're
Fast storage? You need this:
http://www.hyperossystems.co.uk/
Takes 8x 240-pin DDR2 DIMMs, from 2GB to 8GB.
2x SATA2 interface ports.
CD-ROM drive form factor.
175MB/s read rate.
145MB/s write rate.
40,000 IOPS; hard disks do 200-300 IOPS (I/O operations per second).
10 microsecond access time.
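(For context, the IOPS figure is the headline number here: 40,000 versus roughly 250 for a hard disk works out to about 160x more random reads per second, and index lookups are mostly random reads.)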
Dennis Kubes wrote:
Otis Gospodnetic wrote:
Vishal,
Re 2. - I don't think it's quite true. RAM is still much faster than
SSDs.
I am going to agree with Otis here. And, until very recently, RAM was still cheaper than SSDs. That has almost changed, but now they are coming out with cheap
From: buddha1021 buddha1...@yahoo.cn
To: nutch-user@lucene.apache.org
Sent: Wednesday, January 14, 2009 10:12:58 AM
Subject: Re: Search performance for large indexes (100M docs)
how many pages do the 32G SLC SSD indexes contain? 20 million?
... can't place multiple desktops or towers at the datacenter.
You would only need 5 machines to search 100 million pages, although this
isn't taking speed into consideration.
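(The arithmetic behind that figure: at roughly 20 million pages per 32GB index, 5 machines x 20M = 100M pages; serving them quickly, or with any redundancy, would take more machines.)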
- Original Message
From: VishalS vish...@rediff.co.in
To: VishalS vish...@rediff.co.in; nutch-user@lucene.apache.org
Sent: Monday, January 12, 2009 6:49:58 AM
Subject: RE: Search performance for large indexes (100M docs)
Hi,
Thanks for the responses - I have received replies from Otis, Dennis, Sean and Jay Pound (sorry if I forgot someone). To summarize what I understood from these replies:
1. The indices *have* to be kept in RAM to get subsecond response times.
2. It's worth looking into SSDs to store the indices. This would probably help speed up search performance, is cheaper compared to RAM, and gives almost similar performance.
3. Jay mentioned that with Nutch 0.7, the hard drives were a bottleneck for him. He got over the issue by using multiple
From: Otis Gospodnetic ogjunk-nu...@yahoo.com
To: nutch-user@lucene.apache.org
Sent: Friday, January 9, 2009 10:22:45 AM
Subject: Re: Search performance for large indexes (100M docs)
Check java-user archives on markmail.org and search for Toke and SSD to see SSD benchmarks done by Toke a few months back.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Sent: Thursday, January 8, 2009 11:46:20 PM
Subject: Re: Search performance for large indexes (100M docs)
You might also want to research using single-level cell (SLC) SSDs instead of bulking up on RAM. Google has started using them in new search servers to save power but also speed up search I/O.
in the number of threads, the Java stack size allocation.
If there's anyone else who has had experience working with large indices, I would love to get in touch and exchange notes.
Regards,
-Vishal.
Essentially you would create a tmpfs (ramdisk) and put the indexes in the tmpfs. Assuming your indexes were in a folder called indexes.dist, you would use something like this:
# size=7516192768 bytes is 7 GiB (7 * 1024^3); size the mount to fit your index
mount -t tmpfs -o size=7516192768 none /your/indexes
# copy the on-disk indexes into the ramdisk, preserving permissions and times
rsync --progress -aptv /your/indexes.dist/* /your/indexes/
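(Two practical caveats, not from the original post: tmpfs pages can be swapped out under memory pressure, and the mount does not survive a reboot, so the copy has to be redone at startup. A quick way to sanity-check the mount:)
# confirm the ramdisk is mounted with the expected size and usage
df -h /your/indexes
# re-run the rsync after every reboot (or script it at boot),
# since a fresh tmpfs starts out empty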
experience working with large indices,
I
would love to get in touch and exchange notes.
Regards,
-Vishal.
--
View this message in context:
http://www.nabble.com/Search-performance-for-large-indexes-%28%3E100M-docs%29-tp21315030p21364933.html
Sent from the Nutch - User
From: Dennis Kubes ku...@apache.org
To: nutch-user@lucene.apache.org
Sent: Thursday, January 8, 2009 10:22:09 PM
Subject: Re: Search performance for large indexes (100M docs)
buddha1021 wrote:
hi dennis:
in your opinion, which is the most important reason for the fast search speed of google:
1. google's program (code) is excellent, or
Yes. They are performance fanatics (literally). But there is only so much you are going to be able to optimize code,
Hi,
I am experimenting with a system with around 120 million documents. The
index is split into sub-indices of ~10M documents - each such index is being
searched by a single machine. The results are being aggregated using the
DistributedSearcher client. I am seeing a lot of performance
Take a look at the mailing lists for keeping the indexes in memory.
When you get to the sizes you are talking about, the way you get subsecond response times is by:
1) Keeping the indexes in RAM
2) Aggressive caching
Dennis
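(For anyone reading along, a minimal sketch of the shard-per-machine setup Vishal describes; the host names, port, and paths below are hypothetical, while bin/nutch server and conf/search-servers.txt are the stock Nutch 0.8/0.9-era distributed-search mechanism:)
# on each search node, serve one ~10M-document sub-index
bin/nutch server 9999 /data/crawl-shard01
# on the front end, list every shard in conf/search-servers.txt so the
# DistributedSearcher client fans queries out and merges the results:
#   shard01.example.com 9999
#   shard02.example.com 9999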
wuqi wrote:
Hi,
As we all know, parse_text in the segment will be used by the searcher to generate snippets, and I want to know, with the two conditions below, which should be faster for the searcher to retrieve parse_text:
1. 50 segments * 10,000 pages/segment
2. 5 segments * 100,000 pages/segment
If we have more
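(Both layouts hold the same 500,000 pages, since 50 * 10,000 = 5 * 100,000, so the comparison is purely about per-segment overhead; each segment is a separate set of files, so 50 segments generally mean more open files and more seeks per snippet fetch than 5 larger ones.)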
Thank you Sean Dean for your quick reply.
Well, I am running Nutch on Ubuntu 5.01 and JDK 1.5.
There are some apps running in the background, but they don't take up that much memory.
Secondly, I can understand about the first search, but the other searches following it also take time even
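(One way to test whether those follow-up searches are stuck on cold disk reads; the crawl path below is an assumption:)
# pre-read the whole index into the OS page cache, then query again;
# if searches are still slow, cold disk I/O is not the bottleneck
find /path/to/crawl/indexes -type f -exec cat {} + > /dev/null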
My recommendation would be to get more RAM: another 512MB should support a 1.5 million page index running at the speeds you experienced during your 3000 page trials. If you can get even more, then you're only helping system (search) performance.
Here are a few other tips, just in case you can't get any more RAM at this time:
1. Make sure you're passing -server via JAVA_OPTS.
2. Disable all non-required system and user applications.
3. Download or install the newest stable
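(For tip 1, a minimal sketch; where you set this depends on how you launch the search webapp, e.g. Tomcat's environment, and the heap sizes here are illustrative rather than recommendations:)
# force the HotSpot server compiler and pin the heap for steadier search latency
export JAVA_OPTS="-server -Xms512m -Xmx512m"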
Thank you Sean, will do the same and let you know if the performance is not up to the mark.
Thanks a lot.
Thanks & Regards,
Shrinivas
Hello,
I am using Nutch to study its various components. I injected the DMOZ file and fetched around 1.5 million pages.
When I fire a query in the web interface, it takes a long time to display the results. I am just trying to run it on a single machine.
My configuration is: 2.4 GHz
Hi,
For people who found that interesting: I had published some measurement values I did a long time ago.
http://www.find23.net/Web-Site/blog/A712F01B-4BB1-4FC6-AE95-E64988FBCC79.html
All time-related values are in milliseconds.
Don't take the values too seriously; however, at least they