60 ms does seem excessive for the simplest possible access: a lookup by the
unique key field value. Something is clearly wrong at that level. Is this on
decent hardware?
Try a query with &debugQuery=true and look at the "timing" section and see
what component(s) are eating up the lion's share of that 60 ms. Is it the
query component or something else like faceting or highlighting?
Or, are you returning a lot of field values?
Or, are you using a lot of filters that are relatively unique (and hence
frequently recomputed)?
Are you doing a lot of updating while querying (and hence invalidating
caches)?
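To make the debugQuery check above concrete, here is a minimal sketch that
ranks the components in the "timing" section by time spent. It assumes the
response was requested as JSON and that the timing data has the usual
"prepare"/"process" layout with a "time" entry per component. The function
name and the sample numbers are invented for illustration and are not
measurements from this thread.

```python
def slowest_components(response, phase="process"):
    """Return (component, milliseconds) pairs for one timing phase, slowest first.

    `response` is a Solr response already parsed from JSON into a dict.
    The assumed layout is response["debug"]["timing"][phase][component]["time"].
    """
    timing = response.get("debug", {}).get("timing", {})
    phase_timing = timing.get(phase, {})
    pairs = [
        (name, entry["time"])
        for name, entry in phase_timing.items()
        # Skip the phase's own total, which is a bare number rather than a dict.
        if isinstance(entry, dict) and "time" in entry
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response fragment; the numbers are made up.
sample_response = {
    "debug": {
        "timing": {
            "time": 60.0,
            "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0}},
            "process": {
                "time": 59.0,
                "query": {"time": 6.0},
                "facet": {"time": 12.0},
                "highlight": {"time": 41.0},
            },
        }
    }
}

for name, ms in slowest_components(sample_response):
    print(f"{name}: {ms} ms")
```

If the query component itself accounts for most of the time, the problem is
in the lookup; if faceting or highlighting dominates, those components can
simply be left off for get-by-id requests.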
-- Jack Krupansky
-----Original Message-----
From: Brian Hurt
Sent: Tuesday, March 19, 2013 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Help getting a document by unique ID
On Mon, Mar 18, 2013 at 7:08 PM, Jack Krupansky <j...@basetechnology.com>
wrote:
> Hmmm... if query by your unique key field is killing your performance, maybe
> you have some larger problem to address.
This is almost certainly true. I'm well outside the use cases
targeted by Solr/Lucene, and it's a testament to the quality of the
product that it works at all. Among other things, I'm implementing a
graph database on top of Solr (it being easier to build a graph
database on top of Solr than it is to implement Solr on top of a graph
database).
Which is the problem: you might think that 60 ms unique key accesses
(what I'm seeing) are more than good enough, and for most use cases
you'd be right. But it's not unusual for a single web-page hit to
generate many dozens, if not low hundreds, of calls to get a document by
id. At that point, 60 ms hits pile up fast.
The current plan is to just cache the documents as files in the local
file system (or possibly other systems), and have the get document
calls go there instead, while complicated searches still go to Solr.
Fortunately, this isn't complicated.
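A minimal sketch of the kind of file cache described above, assuming
documents can be serialized as JSON and that ids consist only of digits.
The class name and the `fetch` callback (standing in for whatever actually
queries Solr) are hypothetical, and invalidating entries when a document is
updated is left to the caller.

```python
import json
import os
import tempfile


class FileDocumentCache:
    """Read-through cache: one JSON file per document id in a local directory."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical callback: doc_id -> dict (e.g. a Solr lookup)
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, doc_id):
        doc_id = str(doc_id)
        # Assumption: ids are purely numeric, so they are safe to use as file names.
        if not doc_id.isdigit():
            raise ValueError(f"unexpected document id: {doc_id!r}")
        return os.path.join(self.cache_dir, doc_id + ".json")

    def get(self, doc_id):
        path = self._path(doc_id)
        try:
            with open(path, encoding="utf-8") as f:
                return json.load(f)  # cache hit: no call to the backend
        except FileNotFoundError:
            pass
        doc = self.fetch(doc_id)  # cache miss: go to the backend once
        # Write to a temporary file and rename it, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(doc, f)
        os.replace(tmp_path, path)
        return doc

    def invalidate(self, doc_id):
        """Drop a cached document, e.g. after it has been updated in the index."""
        try:
            os.remove(self._path(doc_id))
        except FileNotFoundError:
            pass
```

Usage would look like `cache = FileDocumentCache("/some/dir", fetch_from_solr)`
followed by `cache.get("100000001")`, where `fetch_from_solr` is whatever
function performs the real lookup.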
> How bad is it? Are you using the
> string field type? How long are your ids?
My ids start at 100 million and go up like a kite from there, hence the
string representation.
> The only thing the real-time GET API gives you is more immediate access to
> recently added, uncommitted data. Accessing older, committed data will be no
> faster. But if accessing that recent data is what you are after, real-time
> GET may do the trick.
OK, so this is good to know. This answers question #1: GET isn't the
function I should be calling. Thanks.
Brian