Matt, comments inline

On Tue, Dec 11, 2012 at 3:35 AM, Matt Painter <[email protected]> wrote:
>
>
> Apart from a single default value, is it possible for Riak Search to
> search for a keyword across all fields in a document without having to
> specify the field up front as a prefix in one's search term?
>

A search always runs against a field, but a default field may be specified
in the schema [1].  That field is searched when the query doesn't name one.
There is no way to search against all fields at once; each query term is
always matched against exactly one field.


> I'm guessing that one solution could be a post-commit hook which
> recursively iterates over all fields and squashes them into a secondary
> default value field - but since I know even less about Erlang and am just
> starting out with Riak, I thought it prudent to see if there was a more
> straightforward solution...
>

Your use case immediately makes me think of Solr copy fields.  You index
each value under its own field, but all values also get copied into a
catch-all field so that all content may be searched easily.  However, a
match on the catch-all field doesn't tell you which field it came from.  Riak Search
doesn't have copy field functionality.  You'd have to concatenate all the
data into a field on your application side.  The new search solution I've
been working on, Yokozuna, uses Solr underneath and therefore does support
copy fields [2].

You could create a pre-commit hook to do this field-squashing but I think
you would be better off doing it in your application.  To do it via a hook
you'd have to make sure it runs before the search hook (I can't remember if
you can force a specific order of pre-commit hooks).  It would also have an
effect on your write latencies as more pre-processing would have to be
done.  Finally, you would have to write Erlang.
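
A rough sketch of what I mean by doing the squashing in your application,
in Python.  This is only an illustration under my own assumptions: the
field name "all_text" and the helper names are made up, and you would
still need to set that field as the default field in your schema.  It
touches no Riak API; it only prepares the JSON before you store it.

```python
import json

# Hypothetical catch-all field name; it would be the schema's default field.
CATCH_ALL_FIELD = "all_text"


def collect_values(node):
    """Yield every scalar value found anywhere inside a decoded JSON value."""
    if isinstance(node, dict):
        for value in node.values():
            yield from collect_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item)
    elif node is not None:
        yield str(node)


def add_catch_all(metadata):
    """Return a copy of metadata with all values squashed into one extra field."""
    squashed = dict(metadata)
    squashed[CATCH_ALL_FIELD] = " ".join(collect_values(metadata))
    return squashed


doc = add_catch_all(
    {"type": "picture", "tags": ["funny", "cat"], "exif": {"iso": 200}}
)
print(json.dumps(doc))
```

The original fields are left in place, so field-specific queries keep
working; only the catch-all field loses the information about where a
value came from.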


> The use case is this:
>
> We are providing an object + metadata store for users to deposit files and
> any number of related fragments of structured JSON metadata. We are not
> enforcing any metadata schema - and therefore can't know up-front any field
> names - but would like the ability for a dumb keyword search from a website
> to return references to the records they have deposited in
> Riak. Essentially, providing a Google-like interface.
>
> (As a side question, Is Riak Search mature enough for these type of very
> generic searches? I know that it's "inspired by" Lucene and "Lucene-like",
> but I don't know how many of Lucene's goodies are present - or is it just a
> case of invoking analysers provided by Lucene for things like stemming, and
> all will be pretty much equivalent for most situations?)
>

There are no "goodies present" _at all_.  Riak Search is an in-house
implementation, written completely in Erlang.  Its only connection to
Lucene/Solr is a superficial interface that looks very much like
Lucene/Solr.  E.g. you mention stemming: there is no stemming support in
Riak Search, and it would be a non-trivial addition.  This is one of the big
reasons Yokozuna is being written [2].  The world of search is vast and
complicated; it's best to start with a proven solution and build from that.

Riak Search generally starts causing pain when you have searches that match
tens of thousands of documents.  The runtime is proportional to the size of
the result set.  In fact, Riak Search has a hard-coded upper limit that fails
queries that match 100K or more documents (although it does the work to get
100K results and then drops it all on the floor, so you still use
resources/time).  For example, if a lot of your files were pictures and
were tagged with something like {"type":"picture"} then a search for
"picture" is probably going to cause issues.  Things really start to hurt
when you do conjunction queries with multiple large result sets, e.g.
"funny AND picture".  Once again, this is not the case with Yokozuna, which
in my benchmarking thus far has shown flat latencies regardless of result
set size.
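
To illustrate why conjunctions over large result sets hurt, here is a toy
model in Python.  It is a generic sorted-list intersection as found in
textbook inverted indexes, not Riak Search's actual code, and the function
name and doc ID lists are made up for the example.  The point is only that
the work is driven by how many entries each term matches, not by how many
documents survive the AND.

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of doc IDs.

    Returns (matches, entries_read), where entries_read counts how far the
    merge had to advance through both lists.
    """
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, i + j


# Hypothetical doc IDs matching each term.
funny = list(range(0, 100000, 2))
picture = list(range(0, 100000, 5))

matches, entries_read = intersect_sorted(funny, picture)
print(len(matches), entries_read)
```

Even when few documents match both terms, the merge still has to walk most
of both lists.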

-Z

[1]:
http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Schema/#Defining-a-Schema

[2]: https://github.com/rzezeski/yokozuna
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
