Pavel, as an alternative to rewriting the objects to get them indexed, you can
run what I call a map operation with side effects.
Define an Erlang map-phase function like this, in a module that is on the code
path of every node:
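    %% Skip inputs whose object was not found.
    map_reindex({error, notfound}, _, _) ->
        [];
    %% Run the Riak Search precommit hook for its side effect (indexing the
    %% object) and emit nothing to the next phase.
    map_reindex(RiakObject, _, _) ->
        riak_search_kv_hook:precommit(RiakObject),
        [].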
Then run it against all of the keys in the bucket by posting a MapReduce job
like this:
    {
      "inputs": "<your-bucket>",
      "query": [
        {
          "map": {
            "function": "map_reindex",
            "language": "erlang",
            "module": "<your-module>"
          }
        }
      ],
      "timeout": <your-timeout>
    }
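If you would rather submit the job from Erlang than over HTTP, something along
these lines should be roughly equivalent. Treat it as a sketch rather than a
tested recipe: it assumes the riak-erlang-client (riakc) is on your code path
and that a node is listening for protocol buffers on 127.0.0.1:8087. The module
name reindex_job and the run/3 helper are made up for illustration.

```erlang
%% Sketch only: assumes riakc (riak-erlang-client) is available and a Riak
%% node accepts protocol buffers connections on 127.0.0.1:8087.
-module(reindex_job).
-export([run/3]).

%% Bucket  :: binary(), the bucket to re-index
%% Module  :: atom(), the module that exports map_reindex/3
%% Timeout :: job timeout in milliseconds
run(Bucket, Module, Timeout) ->
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    %% A single map phase, no static argument; 'true' keeps the phase's
    %% (empty) output in the result.
    Query = [{map, {modfun, Module, map_reindex}, undefined, true}],
    Result = riakc_pb_socket:mapred_bucket(Pid, Bucket, Query, Timeout),
    riakc_pb_socket:stop(Pid),
    Result.
```

The map function returns [] for every object, so the result carries no useful
data; the indexing happens as a side effect on the nodes that run the map phase.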
We have used this technique to re-index rather large clusters and it runs
quickly because you are doing it in parallel across all of the nodes in the
cluster.
-- gordon
On Oct 16, 2012, at 07:44 , Ryan Zezeski <[email protected]> wrote:
>
>
> On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <[email protected]> wrote:
>
> 1) Does enabling search have any impact on read latency/throughput?
>
> If you are reading and searching at the same time there is a good chance it
> will. It will cause more disk seeks.
>
> 2) Does enabling search have any impact on RAM usage?
>
> Yes, the index engine behind Riak Search makes heavy use of Erlang ETS
> tables. Each partition has an in-memory buffer as well as an in-memory
> offset table for every segment. It also uses a temporary ETS table for every
> write to store posting data. The ETS system limit can even become an issue
> in overload scenarios.
>
> 3) In production we have no search enabled. What is the best way to
> enable search without stopping production? I thought about something like:
> 1) Enable search node by node.
>
> You could change the app env dynamically but that's only half the problem.
> The other half is then starting the Riak Search application. I think
> application:start(merge_index) followed by application:start(riak_search)
> should work but I'm not 100% sure and this has not been tested. You'll also
> want to make sure to edit all app.configs so that it is persistent.
>
>
> 2) Run a nightly script that goes over all keys and writes them back
> with the proper MIME type.
>
> Yes, you'll want to install the commit hook on the buckets you wish to index.
> Then you'll want to do a streaming list-keys or bucket map-reduce and
> re-write the data.
>
>
> 4) If we see that the search overhead is something we can't handle, is there
> a simple way to disable it without stopping production?
>
> I think the best course of action in this case would be to disable the commit
> hook. But you would have to keep track of anything written during this time
> and re-write it after re-installing the hook. If you don't then you'll have
> to re-index everything because you don't know what you missed.
>
> 5) In what case would we need repair? It is said to be needed on replica
> loss, but if I understand correctly we have 3 replicas on different nodes,
> don't we? If it happens, how difficult and how long would it be for a large
> cluster (about 100 nodes)?
>
> Repair is on a per partition basis. Number of nodes doesn't come into play.
> Repair is very specific in that it requires the adjacent partitions to be in
> a good, convergent state. If they aren't then repair isn't much help.
>
> A lot of these entropy issues go away in Yokozuna. Repairing indexes is done
> automatically, in the background, in an efficient manner. There is no need
> to re-write data or run manual repair commands.
>
> -Z
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com