Re: solr as nosql - pulling all docs vs deep paging limitations

Mikhail Khludnev Tue, 17 Dec 2013 13:30:32 -0800

Hoss,

What about misusing Solr for queries like SELECT * FROM ... WHERE ...? I'm
sure you've been asked about that many times.
What if the client doesn't need to rank results at all, but just requests an
unordered, filtered result like they are used to in an RDBMS?
Do you feel it will never be considered a reasonable use case for Solr? Or
is there a well-known approach for dealing with it?



On Tue, Dec 17, 2013 at 10:16 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> : Then I remembered we currently don't allow deep paging in our current
> : search indexes as performance declines the deeper you go.  Is this still
> : the case?
>
> Coincidentally, I'm working on a new cursor-based API to make this much
> more feasible as we speak...
>
> https://issues.apache.org/jira/browse/SOLR-5463
>
> I did some simple perf testing of the strawman approach and posted the
> results last week...
>
>
> http://searchhub.org/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> ...current iterations on the patch are to eliminate the
> strawman code to improve performance even more and beef up the test
> cases.
>
> : If so, is there another approach to make all the data in a collection
> : easily available for retrieval?  The only thing I can think of is to
>         ...
> : Then I was thinking we could have a field with an incrementing numeric
> : value which could be used to perform range queries as a substitute for
> : paging through everything.  Ie queries like 'IncrementalField:[1 TO
> : 100]' 'IncrementalField:[101 TO 200]' but this would be difficult to
> : maintain as we update the index unless we reindex the entire collection
> : every time we update any docs at all.
>
> As I mentioned in the blog above, as long as you have a uniqueKey field
> that supports range queries, bulk exporting of all documents is fairly
> trivial: sort on your uniqueKey field and use an fq that also filters on
> your uniqueKey field. Modify the fq each time to change the lower bound
> to match the highest ID you got on the previous "page".
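
The uniqueKey paging described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the uniqueKey field
is a string field named "id", the `fetch_page` callable is a hypothetical
stand-in for whatever HTTP client sends the parameters to Solr and returns
the list of docs, and the exclusive lower bound uses the mixed-bracket
range syntax `{... TO *]`.

```python
from typing import Callable, Dict, Iterator, List, Optional

Doc = Dict[str, object]
# Hypothetical transport: takes Solr request params, returns the docs list.
FetchPage = Callable[[Dict[str, object]], List[Doc]]


def build_params(query: str, key_field: str, rows: int,
                 last_key: Optional[str] = None) -> Dict[str, object]:
    """Build request params for one "page", sorted by the uniqueKey field."""
    params: Dict[str, object] = {
        "q": query,
        "sort": f"{key_field} asc",
        "rows": rows,
    }
    if last_key is not None:
        # Quote the previous highest ID and use an exclusive lower bound
        # so the last document of the previous page is not returned again.
        escaped = last_key.replace("\\", "\\\\").replace('"', '\\"')
        params["fq"] = f'{key_field}:{{"{escaped}" TO *]'
    return params


def export_all(fetch_page: FetchPage, query: str,
               key_field: str = "id", rows: int = 1000) -> Iterator[Doc]:
    """Yield every matching doc by repeatedly raising the fq lower bound."""
    last_key: Optional[str] = None
    while True:
        docs = fetch_page(build_params(query, key_field, rows, last_key))
        if not docs:
            return
        yield from docs
        last_key = str(docs[-1][key_field])
        if len(docs) < rows:
            # A short page means there is nothing left to fetch.
            return
```

Each request starts at row 0, so the cost per page should stay roughly
constant instead of growing with the offset, but the documents arrive in
uniqueKey order, not score order.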
>
> This approach works really well in simple cases where you want to "fetch
> all" documents matching a query and then process/sort them by some other
> criteria on the client -- but it's not viable if it's important to you
> that the documents come back from solr in score order before your client
> gets them because you want to "stop fetching" once some criteria is met in
> your client.  Example: you have billions of documents matching a query,
> you want to fetch all sorted by score desc and crunch them on your client
> to compute some stats, and once your client side stat crunching tells you
> you have enough results (which might be after the 1000th result, or might
> be after the millionth result) then you want to stop.
>
> SOLR-5463 will help even in that latter case.  The bulk of the patch should
> be easy to use in the next day or so (having other people try it out and
> test it in their applications would be *very* helpful) and hopefully show up
> in Solr 4.7
>
> -Hoss
> http://www.lucidworks.com/
>
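
For comparison, a client loop over a cursor-based API such as the one
proposed in SOLR-5463 might look like the sketch below. The parameter
names are assumptions not stated in this thread: they follow the form the
feature is documented with in later Solr releases (`cursorMark` on the
request, `nextCursorMark` in the response, and a sort that ends with the
uniqueKey field as a tie-breaker). The `fetch` callable and its simplified
return shape are hypothetical.

```python
from typing import Callable, Dict, Iterator

# Hypothetical transport: takes request params and returns a dict with
# "docs" (list of documents) and "nextCursorMark" (opaque string).
Fetch = Callable[[Dict[str, object]], Dict[str, object]]


def iterate_with_cursor(fetch: Fetch, query: str, sort: str,
                        rows: int = 1000) -> Iterator[dict]:
    """Yield docs in sort order until the cursor stops advancing."""
    cursor = "*"  # assumed starting value for the first request
    while True:
        response = fetch({
            "q": query,
            "sort": sort,  # e.g. "score desc, id asc"
            "rows": rows,
            "cursorMark": cursor,
        })
        yield from response["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor signals the end of the result set.
            return
        cursor = next_cursor
```

Because this is a generator, a client that only needs results until some
condition is met can simply stop iterating, which is the "stop fetching"
case mentioned in the quoted message.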



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
