On Thu, Oct 27, 2011 at 1:06 PM, Ryan Zezeski <[email protected]> wrote:

> This is indeed the case.  Even in the case of an intersection, Search will
> run all sub-queries to completion and then combine them at a coordinator
> based on the query plan.  If any of the sub-queries returns a large number
> of results then latency will start to suffer and timeouts may occur.
>  However, all hope is not lost.  Search has a notion of "inline fields."
>  These fields are analyzed just like any other field but unlike a normal
> field the results are stored alongside _every_ term entry in the index for
> efficient access.
>

Thanks.  This is indeed what I did, at least for searching using the Solr
API.  Since the MR search input mechanism does not allow for a filter, but our
key includes the field we want to index on, I implemented the filter as a
reduce phase, run before any map phases, that filters out keys.  I am currently
evaluating the performance of both options.
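
To illustrate the logic of such a filtering reduce phase, here is a minimal
sketch.  It is written in Python purely for illustration, not as a Riak phase
function, and it assumes a hypothetical key layout of "<timestamp>_<id>"; the
function name and key format are assumptions, not our actual code.

```python
from typing import Iterable, List, Tuple

# A MapReduce input is modeled here as a (bucket, key) pair.
BucketKey = Tuple[str, str]


def filter_keys_by_timestamp(
    inputs: Iterable[BucketKey], lo: int, hi: int
) -> List[BucketKey]:
    """Keep only inputs whose key-embedded timestamp lies in [lo, hi].

    Assumes (hypothetically) that every key looks like "<timestamp>_<id>",
    where <timestamp> is an integer number of seconds.
    """
    kept = []
    for bucket, key in inputs:
        # The timestamp is the part of the key before the first underscore.
        timestamp = int(key.split("_", 1)[0])
        if lo <= timestamp <= hi:
            kept.append((bucket, key))
    return kept


inputs = [
    ("events", "1319700000_a"),
    ("events", "1319740000_b"),
    ("events", "1319800000_c"),
]
print(filter_keys_by_timestamp(inputs, 1319730000, 1319750000))
```

Because the filter only inspects keys, it discards inputs before any map
phase has to load the corresponding objects.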

It would appear the Solr API is faster most of the time, but because the
sort option is currently applied after the rows option, I had to resort
to using the undocumented presort option to sort by key, which in our use
case includes the field we want to filter on (timestamp), and then apply start
and rows.  Alas, since presort only sorts in ascending order and start does
not take negative indices, if I want to get the most recent objects I have
to perform the query twice: once to get the result count, then again to
get the actual results I want after I've computed the offset into the
results.  In such cases, MR may be faster.
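
The two-query workaround can be sketched as follows.  This is a minimal
illustration in Python, not a real client: `query` is a hypothetical callable
standing in for a search request with start and rows, and `fake_backend` is an
in-memory stand-in used only to make the example runnable.  It also assumes
that a request for zero rows still reports the total match count.

```python
from typing import Callable, List, Tuple

# Hypothetical query callable: (start, rows) -> (total_matches, keys),
# with keys returned in ascending order, as presort by key would give.
QueryFn = Callable[[int, int], Tuple[int, List[str]]]


def most_recent(query: QueryFn, rows: int) -> List[str]:
    """Return up to `rows` of the newest keys, newest first."""
    # First query: fetch no rows, just learn how many results match.
    # (Assumption: the backend reports the total even when rows is 0.)
    total, _ = query(0, 0)
    # Offset of the last `rows` entries in the ascending result list.
    start = max(0, total - rows)
    # Second query: fetch the tail of the ascending list.
    _, keys = query(start, rows)
    # Reverse so the most recent key comes first.
    return list(reversed(keys))


def fake_backend(keys: List[str]) -> QueryFn:
    """In-memory stand-in for the search backend, for illustration only."""
    ordered = sorted(keys)

    def query(start: int, rows: int) -> Tuple[int, List[str]]:
        return len(ordered), ordered[start:start + rows]

    return query


backend = fake_backend(["20111027-c", "20111025-a", "20111026-b", "20111028-d"])
print(most_recent(backend, 2))
```

Note that the result count can change between the two requests if objects
are written in the meantime, so the computed offset is only approximate on a
live system.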



> If you don't have #1 then I would say it's a sign you probably want to use
> secondary indices because #2 is basically a form of tagging and that's what
> secondary indices were built for.
>

I look forward to the evolution of secondary indices, but right now they are
too basic for our needs.  Many of our queries are compound, and while the
intersection of all subqueries results in a set of manageable size, any
single subquery's result set can be rather large.  Thus, there is no natural
secondary key that will return a low-cardinality result set that can be
filtered further through MR.

Elias
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
