Re: Deleting items from search index increases disk usage

Ryan Zezeski Fri, 02 Nov 2012 10:10:05 -0700

Jeremy,

On Fri, Nov 2, 2012 at 12:31 PM, Jeremy Raymond <[email protected]> wrote:


> I cycled through the compaction on another node. Again after 3 rounds
> compaction has stopped. On one node the merge index is 26 GB on the other
> 21 GB. So it looks like I've hit the 5 segment compaction no-op condition
> on both nodes.


I concur.  This condition seems arbitrary to me and I'm not sure there is a
good reason for it to exist.  But it's there, and the only way we could
remove it for you is to hot-load a new beam onto your nodes.


> What would account for the difference in merge_index size? Shouldn't these
> be relatively the same? There must still be tombstones in there...
>

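Riak Search uses term-based partitioning, so all postings for a given term
land on the partitions that own that term.  If some of your terms are much
more frequent than others, the partitions holding them will be larger,
which would account for some of the difference.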
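
To illustrate why term-based partitioning can leave partitions unevenly
sized, here is a minimal toy sketch in Python.  It is not Riak's actual
ring or hashing code: the hash function, the partition count and the term
frequencies are all assumptions made up for the example.

```python
import hashlib
from collections import Counter


def partition_for(term, num_partitions):
    """Map a term to a partition with a stable hash.

    Toy stand-in for a consistent-hash ring; NOT Riak's real placement logic.
    """
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


def postings_per_partition(term_counts, num_partitions):
    """Sum the number of postings that end up on each partition."""
    load = Counter({p: 0 for p in range(num_partitions)})
    for term, count in term_counts.items():
        # Every posting for a term goes to the same partition.
        load[partition_for(term, num_partitions)] += count
    return load


# Hypothetical, heavily skewed term frequencies.
term_counts = {
    "the": 1000000,
    "riak": 50000,
    "search": 40000,
    "index": 30000,
    "tombstone": 500,
}

load = postings_per_partition(term_counts, 4)
for partition, postings in sorted(load.items()):
    print(partition, postings)
```

Whichever partition receives the most frequent term carries at least that
term's full posting count, no matter how evenly the remaining terms spread.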


>
> On my production cluster the merge_index is ~44GB. I estimate that
> approximately 90 - 95% of the index data belongs to the bucket I no longer
> want indexed. Manually deleting items from the index then manually
> triggering compaction doesn't look like it will scale. Will this workflow
> work to re-build the search index? I need to keep the cluster available for
> writes while doing this:
>
> 1. In a rolling fashion, disable Riak Search one node at a time.
> 2. Delete the contents of the merge_index on each node.
> 3. In a rolling fashion, re-enable Riak Search on each node.
> 4. Reindex the items to be included in the search index.
>

No.  Instead of disabling Riak Search, take the nodes down one at a time:
stop a node, remove its merge index data, and restart it.  After you have
done this for all nodes, re-index your data.


>
> This should do the trick right? Do I need to disable search before
> clearing out the merge_index folders or would disabling the search index on
> the buckets via search-cmd be enough (and then re-enabling) before
> re-indexing?
>

Again, don't bother disabling search.  The key is to take the nodes down,
because merge index caches data in memory.

Actually, I thought of another way to achieve the same result without
taking the nodes down.  If you have a non-production cluster, testing this
there first would be a good precaution.  I'm 99% sure this should work
without issue.

1. Make sure no index writes are incoming; do this either at your client or
by uninstalling all search hooks

For each node:

2. Get a list of the MI Pids like in the manual compaction example
3. For each MI Pid call merge_index:drop(MIPid)
3a. Verify the data files were removed on disk

After performing steps 2 & 3 on each node:

4. Re-write the objects you want indexed (of course remember to re-install
the hooks if you removed them in step 1)
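
For step 3a, a small script can make the on-disk check less error-prone
than eyeballing a directory listing.  The sketch below is only an
illustration: the merge_index path is an assumption (use whatever data
directory your nodes are configured with), and the function names are
hypothetical.

```python
import os


def remaining_files(data_dir):
    """Return (relative_path, size_bytes) for every regular file under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            found.append((os.path.relpath(path, data_dir), os.path.getsize(path)))
    return sorted(found)


def report(data_dir):
    """Summarize how many files and bytes remain under data_dir."""
    files = remaining_files(data_dir)
    return {
        "file_count": len(files),
        "total_bytes": sum(size for _path, size in files),
    }


if __name__ == "__main__":
    # Assumed location of the merge index data; adjust for your install.
    print(report("/var/lib/riak/merge_index"))
```

Running it before and after the drop on each node gives you a simple
before/after comparison of file count and total bytes.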

-Z

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
