Re: Sorting memory-efficiently by any numeric field (dates too?)

Erick Erickson Tue, 12 Nov 2013 16:03:01 -0800

Yonik:

Of course I'm not really up on the details of sorting, but aren't there
various control structures that are allocated for a sort but not for
scoring? I'm thinking of long[maxDoc] type structures in addition to
the actual values in the FieldCache.
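
To make the distinction concrete, here is a rough Python sketch. It is not
Lucene or Solr code, and the names top_n_by_heap and per_doc_array_bytes are
made up for illustration. It assumes the top-N collection behaves like a
bounded min-heap, and contrasts that with a hypothetical long[maxDoc]-style
array that holds one 8-byte slot per document.

```python
import heapq


def top_n_by_heap(values, n):
    """Return the doc ids of the n largest values, best first.

    Only n (value, doc_id) pairs are held at any time, so the extra
    memory is O(n) no matter how many documents are scanned.
    """
    if n <= 0:
        return []
    heap = []  # min-heap; heap[0] is the weakest entry currently kept
    for doc_id, value in enumerate(values):
        entry = (value, doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new entry beats the weakest kept entry, so swap it in.
            heapq.heapreplace(heap, entry)
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]


def per_doc_array_bytes(max_doc, bytes_per_slot=8):
    """Size of a hypothetical per-document array with one slot per doc.

    bytes_per_slot=8 assumes a 64-bit long per document.
    """
    return max_doc * bytes_per_slot
```

With these assumptions the heap stays at N entries, while the per-document
array grows linearly with maxDoc (about 8 MB per million documents), which is
the kind of allocation I'm asking about.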


I've been thinking about docValues for this as well, which may make it
all moot anyway.

Erick


On Tue, Nov 12, 2013 at 4:16 PM, Yonik Seeley <yo...@heliosearch.com> wrote:

> For a reasonable top-N, the space efficiency should still be the same
> as it is really just dominated by the FieldCache representation (is it
> in-memory or disk-docvalue based).  Directly sorting on that numeric
> field vs deriving a score from the field and sorting on that shouldn't
> really be that different.
>
> -Yonik
> http://heliosearch.com -- making solr shine
>
>
> On Tue, Nov 12, 2013 at 12:00 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
> > Before I go and pat myself on the back, what do people think about this
> > trick? The base problem is "Is there a space-efficient way to return the
> > top N documents, sorted by a numeric field". The numeric field includes
> > dates.
> >
> > It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If
> > we could return the numeric field in question as the score of a document
> > it should work without allocating the internal arrays for holding all the
> > timestamps.
> >
> > So what about something like this?
> > /select?q={!boost b=manufacturedate_dt}text:*
> > and reverse order by
> > /select?q={!boost b=div(1,manufacturedate_dt)}text:*
> >
> > It works on the test data. So let's assume that we're space constrained.
> > It _seems_ like this would only allocate enough space for the top N
> > documents in the result set which is insignificant in terms of memory
> > consumption for a large number of documents in a core. Any obvious
> > problems that people see?
> >
> > I see a couple of shortcomings:
> >
> > 1> You only get one field. Unless you can create a really clever
> > function that incorporates all the values in multiple fields, this is
> > going to be hard to use with more than one field.
> >
> > 2> The boost syntax doesn't allow for a *:*, so you have to specify an
> > existing field. If there happen to be documents that don't have anything
> > in the field, you'll miss them.
> >
> > 3> I'm not sure what the performance issues are, especially in the case
> > where _every_ document scores better than the current top-N.
> >
> > Erick
>
