On 3/19/2013 2:31 PM, Brian Hurt wrote:
> Which is the problem: you might think that 60ms unique key accesses
> (what I'm seeing) is more than good enough, and for most use cases,
> you'd be right. But it's not unusual for a single web-page hit to
> generate many dozens, if not low hundreds, of calls to get document by
> id. At which point, 60ms hits pile up fast.
I have to concur with Jack's assessment that 60ms may indicate a general
performance issue, possibly caused by not having enough memory in your
server.
I've got a distributed index with 77 million documents in it, seven
shards, total index size about 85GB. It's running Solr 4.2.
I tried some uncached unique id queries on it. This search kicks off
seven shard searches against two servers, collates the results, then
returns them to the browser. The results came back with a QTime of 7-8
milliseconds. When I try a different uncached query against one of the
shard servers directly (14GB index size), the QTime value is zero.
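For anyone who wants to repeat this kind of measurement, here is a minimal
Python sketch of a unique-id lookup that reads QTime from the JSON response
header. The base URL, the core name, and the key field name "id" are
assumptions for illustration; substitute whatever your schema and servers
use. Keep in mind that QTime only covers the time Solr spent processing the
query, not network transfer or response writing.

```python
import json
from urllib.parse import urlencode


def unique_key_query_url(base_url, doc_id, key_field="id"):
    """Build a /select URL that looks up one document by its unique key.

    key_field="id" is an assumed field name; use your schema's uniqueKey.
    """
    # Quote the value so characters such as ':' or '-' are not parsed as syntax.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": '%s:"%s"' % (key_field, escaped),
        "rows": 1,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def qtime_ms(response_body):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_body)["responseHeader"]["QTime"]


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    url = unique_key_query_url("http://localhost:8983/solr/collection1", "doc-42")
    print(url)
    # To run it against a live server:
    #   from urllib.request import urlopen
    #   print(qtime_ms(urlopen(url).read().decode("utf-8")))
```

Use a different id for each run if you want uncached numbers, since a
repeated query may be answered from Solr's caches.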
I have this performance level because I have plenty of extra RAM, which
lets the OS cache the index files effectively. Each server has half the
index (over 40GB on disk) and 64GB of RAM. Of that 64GB, 6GB is
allocated to Solr. If we say the OS takes up 1GB (which it most likely
does not), that leaves 57GB for the OS disk cache. Java's garbage collector
is highly tuned in my setup, because without that tuning I experience very
long GC pauses.
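The memory arithmetic above can be written out as a short Python sketch,
which may help when sizing a different server. The function names are
made up for illustration, and the 42.5GB per-server figure is only an
estimate (half of "about 85GB").

```python
def os_disk_cache_gb(total_ram_gb, jvm_heap_gb, os_overhead_gb=1):
    """RAM left over for the OS disk cache after the Solr heap and the OS."""
    return total_ram_gb - jvm_heap_gb - os_overhead_gb


def cached_fraction(index_on_disk_gb, cache_gb):
    """Rough upper bound on how much of the index can sit in the disk cache."""
    return min(1.0, cache_gb / index_on_disk_gb)


if __name__ == "__main__":
    cache = os_disk_cache_gb(total_ram_gb=64, jvm_heap_gb=6, os_overhead_gb=1)
    print("disk cache available (GB):", cache)
    # 42.5GB is an estimate of one server's share of the index.
    print("fraction of index cacheable:", cached_fraction(42.5, cache))
```

When the fraction is well below 1.0, many lookups will have to go to disk,
which is one plausible explanation for latencies in the tens of
milliseconds.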
Here's some additional info that may or may not be useful to you:
The BloomFilter postings format for Lucene is rumored to offer amazing
performance improvements for unique key lookups.
An obstacle: Solr does not currently have an out-of-the-box way to
actually use it. A high-level solution has been proposed, but no code
has been written yet. The following issue describes the current state:
https://issues.apache.org/jira/browse/SOLR-3950
You could always write your own custom postings format instead of
waiting for someone (most likely me) to figure out how to go about
including it directly in Solr. If you do this, I hope you'll be able to
attach your code to the issue so everyone benefits.
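For background on why a Bloom filter suits unique key lookups: a given key
lives in at most one index segment, so a cheap "definitely not here" test
per segment lets most segments be skipped without touching their term
dictionaries. The Python sketch below is a toy illustration of that idea
only. It is not Lucene's implementation, and the class, the function, and
the dict-based "segments" are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def find_document(segments, doc_id):
    """Look up doc_id across (bloom, term_dict) pairs standing in for segments.

    Returns (document_or_None, number_of_term_dictionary_lookups).
    """
    lookups = 0
    for bloom, term_dict in segments:
        if not bloom.might_contain(doc_id):
            continue  # fast fail: this segment cannot hold the key
        lookups += 1  # the expensive step in a real index
        if doc_id in term_dict:
            return term_dict[doc_id], lookups
    return None, lookups


def build_segment(docs):
    """Build one simulated segment from a {doc_id: document} dict."""
    bloom = BloomFilter()
    for doc_id in docs:
        bloom.add(doc_id)
    return bloom, dict(docs)
```

With many segments, most lookups end up consulting one term dictionary
instead of all of them, which is where the expected speedup would come
from.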
Thanks,
Shawn