Hi Hoss,
Thanks for this.
The TermsComponent approach, if I understand it correctly, will be
problematic. I need to present not only the next X call numbers in
sequence, but also other fields from those documents (e.g. title,
author). I assume the TermsComponent approach will only give me the
next X call number values, not the documents themselves.
It sounds like Glen Newton's suggestion of mapping the call numbers to
float values is the most likely solution.
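I'm picturing something along the lines of a separate sort field in
schema.xml that gets the computed float at index time, and then sorting
and range-querying on that field instead of the raw call number. Just a
sketch on my part (the callnum_sort name is made up, and the sfloat type
is the one from the example schema -- not necessarily what Glen had in
mind):

    <!-- hypothetical sort-key field; the display call number stays
         in its own stored field -->
    <field name="callnum_sort" type="sfloat" indexed="true" stored="false"/>

...and then queries like
q=callnum_sort:[42.0+TO+*]&sort=callnum_sort+asc&rows=10 against it.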
I know it sounds ridiculous to do all this for a "call number browse,"
but our faculty have explicitly asked for it. Humanities scholars in
particular know the call numbers that are of interest to them, and they
browse the stacks that way (the ML 1500s are opera, V35 is Verdi ...).
They are using the research methods that have been successful for their
entire careers. Plus, library materials are moving to off-site,
high-density storage, so the only way for them to browse all materials
by call number, regardless of location, is online. I doubt they'll find
this feature as useful as they expect, but it behooves us to give the
users what they ask for.
So yeah, our user needs are perhaps a little outside of your
expectations. :-)
- Naomi
On Nov 29, 2008, at 2:58 PM, Chris Hostetter wrote:
: The results are correct. But the response time sucks.
:
: Reading the docs about caches, I thought I could populate the query
: result cache with an autowarming query and the response time would be
: okay. But that hasn't worked. (See excerpts from my solrConfig file
: below.)
:
: A repeated query is very fast, implying caching happens for a
: particular starting point ("42" above).
:
: Is there a way to populate the cache with the ENTIRE sorted list of
: values for the field, so any arbitrary starting point will get results
: from the cache, rather than grabbing all results from (x) to the end,
: then sorting all these results, then returning the first 10?
There are two "caches" that come into play for something like this...

The first is a low-level Lucene cache called the "FieldCache" that is
completely hidden from you (and, for the most part, from Solr). Anytime
you sort on a field it gets built, and it is reused for all sorts on
that field. My original concern was that it wasn't getting warmed on
"newSearcher" (because you have to be explicit about that).
The second cache is the queryResultCache, which caches a "window" of an
ordered list of documents based on a query and a sort. You can see this
cache in your Solr stats, and yes: these two requests result in
different cache keys for the queryResultCache...

q=yourField:[42+TO+*]&sort=yourField+asc&rows=10
q=yourField:[52+TO+*]&sort=yourField+asc&rows=10

...BUT! ... the two queries below will result in the same cache key, and
the second will be a cache hit, provided a sufficient value for
"queryResultWindowSize" ...

q=yourField:[42+TO+*]&sort=yourField+asc&rows=10
q=yourField:[42+TO+*]&sort=yourField+asc&rows=10&start=10
So perhaps the key to your problem is just to make sure that once the
user gives you an id to start with, you "scroll" by increasing the
start param (not altering the id) ... the first query might be "slow",
but every query after that should be a cache hit. (Depending on your
page size and how far you expect people to scroll, you should consider
increasing queryResultWindowSize.)
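That's a single setting in solrConfig; something like the line below,
sized to cover as many pages as you realistically expect people to
click through...

    <queryResultWindowSize>100</queryResultWindowSize>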
But as Yonik said: the new TermsComponent may actually be a better
option for you -- doing two requests for every page (the first to get
the N terms in your id field starting with your input, the second to do
a query for docs matching any of those N ids) might actually be faster,
even though there likely won't be any cache hits.
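Roughly, the two requests per page would look something like the
following (parameter names are from the TermsComponent as it currently
exists, and this assumes it's registered under a /terms handler the way
the example configs do it -- double check against your version)...

    the first, to grab the next 10 terms in your field at or after
    the user's input:

      /solr/terms?terms=true&terms.fl=yourField&terms.lower=42&terms.lower.incl=true&terms.limit=10

    ...and the second, to fetch the matching docs (title, author, etc)
    in call number order, using whatever terms came back:

      q=yourField:(42 OR 47 OR ...)&sort=yourField+asc&rows=10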
My opinion: Your use case sounds like a waste of effort. I can't
imagine anyone using a library catalog system ever wanting to look up a
call number and then scroll through all possible books with similar
call numbers -- it seems much more likely that I'd want to look at
other books with similar authors, or keywords, or tags ... all things
that are actually *easier* to do with Solr. (But then again: I don't
work in a library. I trust that you know something I don't about what
your users want.)
-Hoss
Naomi Dushay
[EMAIL PROTECTED]