Re: Riak Search

gordyt Tue, 16 Oct 2012 10:01:24 -0700

Pavel, as an alternative to re-writing the objects to get them indexed, you 
can run what I call a map operation with side effects.


You define an Erlang map-phase function as follows:


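%% Nothing to index for a key whose object was not found.
map_reindex({error,notfound}, _, _) ->
    [];
%% Call the Riak Search precommit hook for its indexing side effect and
%% return no results to the mapred job.
map_reindex(RiakObject, _, _) ->
    riak_search_kv_hook:precommit(RiakObject),
    [].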


Compile that into a module that is on the code path of every node, then run 
it against all of the keys in the bucket by posting a mapred job like this:

{
    "inputs": "<your-bucket>",
    "query": [
        {
            "map": {
                "function": "map_reindex", 
                "language": "erlang",
                "module": "<your-module>"
            }
        }
    ],
    "timeout": <your-timeout>
}
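
If it helps, here is a minimal Python sketch of building and posting that job 
over HTTP. It assumes the Riak HTTP interface is reachable at a base URL such 
as http://127.0.0.1:8098 and accepts mapred jobs at /mapred; the helper names 
build_reindex_job and post_reindex_job and the bucket, module, and timeout 
values are made up for illustration, so adjust them to your setup.

```python
import json
import urllib.request


def build_reindex_job(bucket, module, timeout_ms):
    """Build a map-only job that runs map_reindex over every key in a bucket."""
    return {
        "inputs": bucket,
        "query": [
            {
                "map": {
                    "language": "erlang",
                    "module": module,  # hypothetical: the module holding map_reindex/3
                    "function": "map_reindex",
                }
            }
        ],
        "timeout": timeout_ms,  # assumption: the job timeout is given in milliseconds
    }


def post_reindex_job(base_url, job):
    """POST the job as JSON to <base_url>/mapred and return the decoded reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/mapred",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running node, so it is left commented out):
# post_reindex_job("http://127.0.0.1:8098",
#                  build_reindex_job("my_bucket", "my_reindex_module", 3600000))
```

Building the job is kept separate from posting it so you can print and check 
the JSON before sending it to a production cluster.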


We have used this technique to re-index rather large clusters and it runs 
quickly because you are doing it in parallel across all of the nodes in the 
cluster.

-- gordon



On Oct 16, 2012, at 07:44 , Ryan Zezeski <[email protected]> wrote:

> 
> 
> On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <[email protected]> wrote:
> 
> 1) Does enabling search have any impact on read latency/throughput?
> 
> If you are reading and searching at the same time there is a good chance it 
> will.  It will cause more disk seeks.
>  
> 2) Does enabling search have any impact on RAM usage?
> 
> Yes, the index engine behind Riak Search makes heavy usage of Erlang ETS 
> tables.  Each partition has an in-memory buffer as well as an in-memory 
> offset table for every segment.  It also uses a temporary ETS table for every 
> write to store posting data.  The ETS system limit can even become an issue 
> in overload scenarios.
>  
> 3) In production we have no search enabled. What is the best way to 
>     enable search without stopping production? I thought about something like:
>     1) Enable search one node at a time.
> 
> You could change the app env dynamically but that's only half the problem.  
> The other half is then starting the Riak Search application.  I think 
> application:start(merge_index) followed by application:start(riak_search) 
> should work but I'm not 100% sure and this has not been tested.  You'll also 
> want to make sure to edit all app.configs so that it is persistent.
> 
>  
>     2) Run a nightly script that goes over all keys and writes them back
>         with the proper MIME type.
> 
> Yes, you'll want to install the commit hook on the buckets you wish to index.
> Then you'll want to do a streaming list-keys or bucket map-reduce and 
> re-write the data.
> 
>  
> 4) If we see that search overhead is something we can't handle, is there a
>     simple way to disable it without stopping production?
> 
> I think the best course of action in this case would be to disable the commit 
> hook.  But you would have to keep track of anything written during this time 
> and re-write it after re-installing the hook.  If you don't then you'll have 
> to re-index everything because you don't know what you missed.
> 
> 5) In what case would we need repair? It is said to be needed on replica
>     loss, but if I understand correctly we have 3 replicas on different
>     nodes, don't we? If it happens, how difficult and how long would it be
>     for a large cluster (about 100 nodes)?
> 
> Repair is on a per partition basis.  Number of nodes doesn't come into play.  
> Repair is very specific in that it requires the adjacent partitions to be in 
> a good, convergent state.  If they aren't then repair isn't much help. 
> 
> A lot of these entropy issues go away in Yokozuna.  Repairing indexes is done 
> automatically, in the background, in an efficient manner.  There is no need 
> to re-write data or run manual repair commands.
> 
> -Z
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
