Sadly, this is not unexpected on EC2, especially if your storage is on EBS.
In a BoF session at the Surge Conference last year, Brendan Gregg measured
about 1.5 seconds of I/O silence to EBS on an OmniOS instance running Riak
under load (using DTrace, of course). Granted, that is just one case, but I
think the symptoms you see point to a similar problem. Others have told me
they have seen large iowait/delays even on ephemeral storage.
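
If you want to confirm the stalls yourself, here's a minimal probe sketch
(my own, not from Brendan's session; the file path and threshold are made
up) that fsyncs a tiny file in a loop and logs any write that takes
suspiciously long. Point it at the EBS-backed volume and run it alongside
your normal load:

    import os
    import time

    PATH = "/var/tmp/stall_probe"  # hypothetical; put this on the EBS volume
    THRESHOLD = 1.0                # seconds; stalls like the ~1.5s one show up
    INTERVAL = 0.1                 # seconds between probes

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while True:
            start = time.time()
            os.write(fd, b"x")
            os.fsync(fd)           # force the write through to the device
            elapsed = time.time() - start
            if elapsed > THRESHOLD:
                print("%s: write+fsync took %.2fs" % (time.ctime(), elapsed))
            time.sleep(INTERVAL)
    finally:
        os.close(fd)

If it logs multi-second writes at the same moments your puts time out,
that's your smoking gun.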

Now, for Riak's part. That "local put coordination" feature (sometimes
called vnode_vclocks), while increasing latency in some cases, also improves
convergence and reduces the appearance of spurious siblings. It also removes
the need for clients to specify a client identifier, instead using the
coordinating vnode's identifier in the vector clock. Which vnode does the
coordination is somewhat randomized, so roughly 1 out of every N put
requests will be coordinated by that slow node (see the sketch below).
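
To make that concrete, here's an illustrative sketch (not Riak's actual
code; the preflist and vnode names are invented) of why a randomly chosen
coordinator puts the slow vnode on the critical path for about 1/N of
requests:

    import random

    N = 3                                         # n_val: replicas per key
    preflist = ["vnode_a", "vnode_b", "vnode_c"]  # the N vnodes for some key
    slow_vnode = "vnode_c"                        # lives on the stalling instance

    trials = 100000
    slow = 0
    for _ in range(trials):
        # coordinator choice is roughly random across the preflist
        if random.choice(preflist) == slow_vnode:
            slow += 1

    print("slow vnode coordinated %.1f%% of puts (expect ~%.1f%%)"
          % (100.0 * slow / trials, 100.0 / N))

So with n_val=3, about a third of your puts will block on the slow
instance's disk, which would line up with the intermittent timeouts you're
seeing.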

I hope that helps explain the whys... I'm not sure how I would fix your
problem in general, other than the typical "get bigger instances" solution.


On Wed, Jun 26, 2013 at 7:47 AM, Andreas Hasselberg <
[email protected]> wrote:

> Hi,
>
> We are having problems with the I/O on one of our Amazon instances being
> very slow for short periods of time. When this happens we get some put
> timeouts even though all the other Amazon instances seem to be fine. I
> have done some investigation here:
> https://gist.github.com/anha0825/5866757. Could anyone confirm this or
> point me in a better direction?
>
> Thanks!
> Andreas


-- 
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
