Hi Hoss,
Thanks for this.
The TermsComponent approach, if I understand it correctly, will be
problematic. I need to present not only the next X call numbers in
sequence, but also other fields from those documents (e.g. title,
author). I assume the TermsComponent approach will only give me the
next X call number values, not the documents themselves.
It sounds like Glen Newton's suggestion of mapping the call numbers to
float values is the most likely solution.
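I'm picturing something along the lines of a separate sort field in
schema.xml that gets the computed float at index time, and then sorting
and range-querying on that field instead of the raw call number. Just a
sketch on my part (the callnum_sort name is made up, and the sfloat type
is the one from the example schema -- not necessarily what Glen had in
mind):

    <!-- hypothetical sort-key field; the display call number stays
         in its own stored field -->
    <field name="callnum_sort" type="sfloat" indexed="true" stored="false"/>

...and then queries like
q=callnum_sort:[42.0+TO+*]&sort=callnum_sort+asc&rows=10 against it.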
I know it sounds ridiculous to do all this for a "call number browse,"
but our faculty have explicitly asked for it. Humanities scholars in
particular know the call numbers that are of interest to them, and they
browse the stacks that way (the ML 1500s are opera, V35 is Verdi ...).
They are using the research methods that have been successful for their
entire careers. Plus, library materials are moving to off-site,
high-density storage, so the only way for them to browse all materials
by call number, regardless of location, is online. I doubt they'll find
this feature as useful as they expect, but it behooves us to give the
users what they ask for.
So yeah, our user needs are perhaps a little outside of your
expectations. :-)
- Naomi
On Nov 29, 2008, at 2:58 PM, Chris Hostetter wrote:
: The results are correct. But the response time sucks.
:
: Reading the docs about caches, I thought I could populate the query
: result cache with an autowarming query and the response time would be
: okay. But that hasn't worked. (See excerpts from my solrConfig file
: below.)
:
: A repeated query is very fast, implying caching happens for a
: particular starting point ("42" above).
:
: Is there a way to populate the cache with the ENTIRE sorted list of
: values for the field, so any arbitrary starting point will get results
: from the cache, rather than grabbing all results from (x) to the end,
: then sorting all these results, then returning the first 10?
There are two "caches" that come into play for something like this...

The first is a low-level Lucene cache called the "FieldCache" that is
completely hidden from you (and, for the most part, from Solr). Anytime
you sort on a field it gets built, and it is reused for all sorts on
that field. My original concern was that it wasn't getting warmed on
"newSearcher" (because you have to be explicit about that).
The second cache is the queryResultCache, which caches a "window" of an
ordered list of documents based on a query and a sort. You can see this
cache in your Solr stats, and yes: these two requests result in
different cache keys for the queryResultCache...

q=yourField:[42+TO+*]&sort=yourField+asc&rows=10
q=yourField:[52+TO+*]&sort=yourField+asc&rows=10

...BUT! ... the two queries below will result in the same cache key, and
the second will be a cache hit, provided a sufficient value for
"queryResultWindowSize" ...

q=yourField:[42+TO+*]&sort=yourField+asc&rows=10
q=yourField:[42+TO+*]&sort=yourField+asc&rows=10&start=10
So perhaps the key to your problem is just to make sure that once the
user gives you an id to start with, you "scroll" by increasing the
start param (not altering the id) ... the first query might be "slow",
but every query after that should be a cache hit. (Depending on your
page size and how far you expect people to scroll, you should consider
increasing queryResultWindowSize.)
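That's a single setting in solrConfig; something like the line below,
sized to cover as many pages as you realistically expect people to
click through...

    <queryResultWindowSize>100</queryResultWindowSize>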
But as Yonik said: the new TermsComponent may actually be a better
option for you -- doing two requests for every page (the first to get
the N terms in your id field starting with your input, the second to do
a query for docs matching any of those N ids) might actually be faster,
even though there likely won't be any cache hits.
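Roughly, the two requests per page would look something like the
following (parameter names are from the TermsComponent as it currently
exists, and this assumes it's registered under a /terms handler the way
the example configs do it -- double check against your version)...

    the first, to grab the next 10 terms in your field at or after
    the user's input:

      /solr/terms?terms=true&terms.fl=yourField&terms.lower=42&terms.lower.incl=true&terms.limit=10

    ...and the second, to fetch the matching docs (title, author, etc)
    in call number order, using whatever terms came back:

      q=yourField:(42 OR 47 OR ...)&sort=yourField+asc&rows=10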
My opinion: Your use case sounds like a waste of effort. I can't
imagine anyone using a library catalog system ever wanting to look up a
call number and then scroll through all possible books with similar
call numbers -- it seems much more likely that I'd want to look at
other books with similar authors, or keywords, or tags ... all things
that are actually *easier* to do with Solr. (But then again: I don't
work in a library. I trust that you know something I don't about what
your users want.)
-Hoss
Naomi Dushay
[EMAIL PROTECTED]