Jeremy,

On Fri, Nov 2, 2012 at 12:31 PM, Jeremy Raymond <[email protected]> wrote:
> I cycled through the compaction on another node. Again after 3 rounds
> compaction has stopped. On one node the merge index is 26 GB on the other
> 21 GB. So it looks like I've hit the 5 segment compaction no-op condition
> on both nodes.

I concur. This condition seems arbitrary to me and I'm not sure if there is a good reason for it to exist. But it's there, and the only way we could remove it for you is to hot-load a new beam.

> What would account for the difference in merge_index size? Shouldn't these
> be relatively the same? There must still be tombstones in there...

Riak Search uses term-based partitioning. It could be that you have some terms that are more frequent than others, which would account for some of the difference.

> On my production cluster the merge_index is ~44GB. I estimate that
> approximately 90 - 95% of the index data belongs to the bucket I no longer
> want indexed. Manually deleting items from the index then manually
> triggering compaction doesn't look like it will scale. Will this workflow
> work to re-build the search index. I need to keep the cluster available for
> writes while doing this:
>
> 1. In a rolling fashion, disable Riak Search one node at a time.
> 2. Delete the contents of the merge_index on each node.
> 3. In a rolling fashion, re-enable Riak Search on each node.
> 4. Reindex the items to be included in the search index.

No. Instead of disabling Riak Search, you'll want to take the nodes down one at a time, remove the merge index data, and restart. After doing this for all nodes, re-index your data.

> This should do the trick right? Do I need to disable search before
> clearing out the merge_index folders or would disabling the search index on
> the buckets via search-cmd be enough (and then re-enabling) before
> re-indexing?

Again, don't bother disabling search. The key is to take the nodes down, because merge index caches data in memory.
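To illustrate the point about term-based partitioning and uneven merge_index sizes, here is a rough Python sketch. It is a simplification under stated assumptions, not how Riak Search is implemented: the ring size, the index and field names, the term counts, and the modulo placement are all made up for illustration (the real system places terms on the riak_core ring). The only idea it shows is that every posting for a term lands wherever that term hashes, so one very common term can make one partition much larger than the rest.

```python
import hashlib
from collections import Counter
from typing import Dict

NUM_PARTITIONS = 64  # hypothetical ring size, chosen only for this sketch


def partition_for(index: str, field: str, term: str,
                  num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an (index, field, term) triple to a partition number.

    Assumption: a plain SHA-1 hash taken modulo the partition count.
    This stands in for the real ring placement and is not the actual algorithm.
    """
    key = "{}/{}/{}".format(index, field, term).encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest, "big") % num_partitions


def postings_per_partition(term_counts: Dict[str, int],
                           index: str = "docs", field: str = "body",
                           num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Sum postings per partition; all postings of a term share one partition."""
    load = Counter()
    for term, count in term_counts.items():
        load[partition_for(index, field, term, num_partitions)] += count
    return load


if __name__ == "__main__":
    # Made-up frequencies: one very common term plus many rare ones.
    term_counts = {"the": 100000}
    term_counts.update({"rare{}".format(i): 10 for i in range(1000)})
    load = postings_per_partition(term_counts)
    heaviest, postings = load.most_common(1)[0]
    print("heaviest partition {} holds {} of {} postings".format(
        heaviest, postings, sum(load.values())))
```

With frequencies like these, the partition that owns the common term dominates, which is the kind of skew that could explain a 26 GB versus 21 GB difference between nodes.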
Actually, I thought of another way to achieve the same result without taking the nodes down. If you have a non-production cluster to test this on, that would be a good precaution. I'm 99% sure this should work without issue.

1. Make sure no index writes are incoming. Do this either at your client, or uninstall all search hooks.

For each node:

2. Get a list of the MI Pids like in the manual compaction example.
3. For each MI Pid call merge_index:drop(MIPid).
3a. Verify the data files were removed on disk.

After performing steps 2 & 3 on each node:

4. Re-write the objects you want indexed (of course, remember to re-install the hooks if you removed them in step 1).

-Z
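For the on-disk check in step 3a, something like the following Python sketch could help. It only sums file sizes under each partition directory of the merge_index data directory, so you can compare the totals before and after the drop. The default path `/var/lib/riak/merge_index` is an assumption about a packaged install; pass your own data directory as the first argument. I have not verified exactly which files remain after a drop, so expect the totals to fall sharply rather than necessarily to zero.

```python
import os
import sys
from typing import Dict


def partition_sizes(merge_index_root: str) -> Dict[str, int]:
    """Return total bytes of files under each immediate subdirectory of the root."""
    sizes = {}
    for entry in sorted(os.listdir(merge_index_root)):
        part_dir = os.path.join(merge_index_root, entry)
        if not os.path.isdir(part_dir):
            continue  # ignore stray files that sit directly in the root
        total = 0
        for dirpath, _dirnames, filenames in os.walk(part_dir):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # The file may vanish while a drop is still in progress.
                    continue
        sizes[entry] = total
    return sizes


if __name__ == "__main__":
    # Assumed default location; override with the first command-line argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/riak/merge_index"
    if not os.path.isdir(root):
        print("no such directory: {}".format(root))
    else:
        for partition, size in partition_sizes(root).items():
            print("{}\t{} bytes".format(partition, size))
```

Run it once before step 3 and once after, on each node, and compare the per-partition totals.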
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
