Do you want to page through all items, or through the results of a
query (like all hits for "civil war" in call number order)?

If you want the former, then a text search engine is really
the wrong tool. This problem only requires indexed sequential
file formats (like B-trees), something that worked quite well
30 or 40 years ago, even before relational databases were invented.

Text search engines, like Lucene/Solr, have sorting and traversal
as a secondary feature. Their primary feature is relevance ranking.

With only 8M items, I'd be inclined to put them in a big array
sorted by call number, and use binary search. Sounds dumb, but
it is really fast: finding any starting point among 8M entries
takes about 23 comparisons. The entries would be a simple pair,
call number and key.
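
A minimal sketch of that idea in Python, not a definitive implementation. It assumes the call numbers have already been normalized into strings that sort correctly (like the shelfkey field in the config below). The sample entries, the document keys, and the `page_from` helper are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical data: (sortable call number, document key) pairs.
# Sorted once at load time; with 8M entries this is the only expensive step.
entries = sorted([
    ("A123 B34 1970", "doc-17"),
    ("A123 B4 1985", "doc-02"),
    ("A45 C2 1999", "doc-31"),
    ("B10 D7 2001", "doc-08"),
    ("B200 A1 1960", "doc-25"),
])

# Parallel list of just the call numbers, so bisect compares plain strings.
keys = [call_number for call_number, _ in entries]


def page_from(start_key, rows=10):
    """Return up to `rows` entries whose call number is >= start_key."""
    i = bisect_left(keys, start_key)  # O(log n) binary search
    return entries[i:i + rows]


if __name__ == "__main__":
    for call_number, doc_key in page_from("A2", rows=2):
        print(call_number, doc_key)
```

Paging forward is then just a matter of passing the last call number seen (or keeping the array index) as the next starting point, and the document keys can be used to fetch the full records from wherever they live.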

wunder

On 11/28/08 4:41 PM, "Naomi Dushay" <[EMAIL PROTECTED]> wrote:

> Gosh,  I'm sorry to be so unclear.  Hmm.  Trying to clarify below:
> 
> On Nov 28, 2008, at 3:52 PM, Chris Hostetter wrote:
> 
>> Having read through this thread, i'm not sure i understand what exactly
>> the problem is.  my naive understanding is...
>> 
>> 1) you want to sort by a field
>> 2) you want to be able to "paginate" through all docs in order of this
>> field.
>> 3) you want to be able to start your pagination at any arbitrary
>> value for this field.
>> 
>> so (assuming the field is a simple number for now) you could use
>> something like
>> 
>>   q=yourField:[42 TO *]&sort=yourField+asc&rows=10&start=0
>> 
>> where "42" is the arbitrary ID someone wants to start at.
>> 
> 
> perfect.  This is the query I'm using.
> 
> The results are correct.  But the response time sucks.
> 
> Reading the docs about caches, I thought I could populate the query
> result cache with an autowarming query and the response time would be
> okay.  But that hasn't worked.  (See excerpts from my solrConfig file
> below.)
> 
> A repeated query is very fast, implying caching happens for a
> particular starting point ("42" above).
> 
> Is there a way to populate the cache with the ENTIRE sorted list of
> values for the field, so any arbitrary starting point will get results
> from the cache, rather than grabbing all results from (x) to the end,
> then sorting all these results, then returning the first 10?
> 
> 
>> This sentence below seems to imply that you have a solution which
>> produces
>> correct results, but doesn't produce results quickly...
> 
> right.
> 
>> : I have a performance problem and I haven't thought of a clever way
>> around it.
>> 
>> ...however this line seems to suggest that you're having trouble
>> getting at least 10 results from any query (?)
>> 
>> : Call numbers are squirrelly, so we can't predict the string that
>> will
>> : appropriately grab at least 10 subsequent documents.  They are
>> certainly not
>> : consecutive!
>> :
>> : so from
>> : A123 B34 1970
>> :
>> : we're unable to predict if any of these will return at least 10
>> results:
> 
> I was trying to express that I couldn't do this:
> 
> myfield:[X TO Y]
> 
> because I can't algorithmically compute Y.
> 
> Glen Newton suggested a workaround, whereby I represent my
> squirrelly, but sortable, field values as floating point numbers, and
> then I can compute Y.
> 
>> ...but i'm not sure what exactly that means.  for any given field,
>> there
>> is always going to be some values X such that myField:[X TO *] won't
>> return at least 10 docs ... these are the last values in the index in
>> order -- surely it's okay for your app to have an "end" state when you
>> run out of data? :)
> 
> yes.  Understood.  This is not an issue.
> 
>> Oh, and BTW...
>> 
>> : numbers in sort order".  I have also mucked about with the cache
>> : initialization, but that's not working either:
>> :
>> :     <listener event="firstSearcher"
>> class="solr.QuerySenderListener">
>> 
>> ...make sure you also do a newSearcher listener that does the same
>> thing, otherwise your FieldCache (used for sorting) may not be warmed
>> when commits happen
> 
> Yup yup yup.
> 
> from solrconfig:
> 
>      <filterCache
>        class="solr.LRUCache"
>        size="20000000"
>        initialSize="10000000"
>        autowarmCount="500000"/>
> 
>      <queryResultCache
>        class="solr.LRUCache"
>        size="10000000"
>        initialSize="5000000"
>        autowarmCount="5000000"/>
> 
> 
>      <listener event="newSearcher" class="solr.QuerySenderListener">
>        <arr name="queries">
> <!-- populate query result cache for sorted queries -->
>          <lst>
> <str name="q">shelfkey:[0 TO *]</str>
> <str name="sort">shelfkey asc</str>
>          </lst>
>        </arr>
>      </listener>
> 
>      <listener event="firstSearcher" class="solr.QuerySenderListener">
>        <arr name="queries">
> <!-- populate query result cache for sorted queries -->
>          <lst>
> <str name="q">shelfkey:[0 TO *]</str>
> <str name="sort">shelfkey asc</str>
>          </lst>
> 
> 
