If you want to avoid overwhelming the cluster, there's an easy trick you can 
try.

In the RDBMS world, we frequently batch requests and issue 5,000 or 10,000 
updates at once. We then pause for a second or so after each commit so the 
transaction log has time to catch up. This is really so that the checkpoint 
flushing dirty pages from memory doesn't run into I/O contention with the 
log writer.
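
The same batch-then-pause pattern applies here. A minimal Python sketch (the 
batch sizes match the numbers above; `update_fn` stands in for whatever 
read/write call you're actually making against the cluster):

```python
import time

BATCH_SIZE = 5000
PAUSE_SECONDS = 1.0  # breathing room after each batch

def batched(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def apply_batched_updates(items, update_fn, pause=PAUSE_SECONDS):
    # Issue one batch of updates, then sleep so background work
    # (checkpoints, handoff, compaction) isn't starved for I/O.
    for batch in batched(items, BATCH_SIZE):
        for item in batch:
            update_fn(item)
        time.sleep(pause)
```

Tune the batch size and pause to whatever load your cluster tolerates.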

Updating by time instead of key means a double I/O hit for the lookup, but it 
lets you easily control the order of updates. I say go for it.
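
The hourly-chunk idea from the original question could be sketched like this 
(Python; `fetch_keys_in_window` and `reindex` are placeholders for your actual 
2i range query and update logic):

```python
import time
from datetime import datetime, timedelta

def hourly_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(hours=1), end)
        yield cursor, nxt
        cursor = nxt

def reindex_by_time(start, end, fetch_keys_in_window, reindex, pause=1.0):
    # Work through the bucket one hour at a time so the load stays
    # bounded and concurrent mapred queries aren't starved.
    for w_start, w_end in hourly_windows(start, end):
        for key in fetch_keys_in_window(w_start, w_end):
            reindex(key)
        time.sleep(pause)
```

Since the items are roughly evenly distributed in time, each window should be 
a similarly sized, predictable unit of work.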

---
Jeremiah Peschka, Managing Director, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Feb 28, 2012, at 7:34 AM, Jeremy Raymond <[email protected]> wrote:

> Hi,
> 
> I need to reindex a bucket with ~4 million items. If I do a
> streaming list-keys using the Erlang client and then read/write the
> items as the keys come in, it puts too much load on the cluster, and
> other mapred queries that get run time out. I already have a date-based
> index on the items and was thinking of getting items in hourly
> chunks and updating them in batches that way, as the times should be
> relatively evenly distributed. I can then better control the
> flow and load of the reindexing operations.
> 
> Anyone have any better ideas or use any other strategies when having to 
> reindex?
> 
> --
> Jeremy
> 
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
