If you're jumping between 5-25% iowait, I'd add nodes. Also, tuning pdflush (the vm.dirty_* sysctls) will help with the jumpy iowait.

Lucid by default lets up to 20% of your RAM fill with dirty pages before flushing, so if you have 10 GB of RAM, your system could be trying to flush 2 GB of data at once, causing huge iowait spikes.
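If you want to experiment with that, here's a rough sketch (the numbers are just illustrative, not a recommendation; check what makes sense for your workload and kernel):

# see how much dirty data writeback is currently allowed to accumulate
sysctl vm.dirty_ratio vm.dirty_background_ratio

# start background writeback earlier and lower the foreground ceiling
sudo sysctl -w vm.dirty_background_ratio=2
sudo sysctl -w vm.dirty_ratio=10

# persist across reboots
echo "vm.dirty_background_ratio = 2" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_ratio = 10" | sudo tee -a /etc/sysctl.conf

Lower ratios mean smaller, more frequent flushes, which trades a bit of peak write throughput for steadier iowait.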
Sean Carey
@densone

On Thursday, December 6, 2012 at 11:49, Ken Perkins wrote:

> We're around ~20% swapping, IO wait in the 5-25% range, depending on the machine.
>
> We're running Lucid, with the deadline scheduler. I'm strongly biased towards adding a few more nodes, but I'm not married to it :)
>
> On Wed, Dec 5, 2012 at 9:22 PM, Sean Carey <[email protected]> wrote:
> > So Ken,
> > Fair amount? < 5% or > 20%
> >
> > If there's iowait and memory issues, adding nodes could alleviate that. If there's almost no iowait or minimal iowait, adding memory will help. Also, tuning vm.dirty on Linux might get you more memory and less iowait, or at least more consistent iowait.
> >
> > Which Linux distro are you on and which scheduler are you using?
> >
> > -Sean
> >
> > On Thursday, December 6, 2012 at 12:15 AM, Ken Perkins wrote:
> > > VMs, not the same host; Rackspace has VM affinity to protect against that. We do see a fair amount of IO wait.
> > >
> > > Rackspace has a new affinity-based SSD block device service that I plan to evaluate, but I'm not ready for that in production.
> > >
> > > On Wed, Dec 5, 2012 at 7:45 PM, Sean Carey <[email protected]> wrote:
> > > > Ken,
> > > > Are your VMs on different bare metal? Could they potentially be on the same bare metal?
> > > >
> > > > Are you seeing any IO contention?
> > > >
> > > > Sean Carey
> > > > @densone
> > > >
> > > > On Wednesday, December 5, 2012 at 20:41, Ken Perkins wrote:
> > > > > Yes, we're thrashing on all of the boxes, due to disk access when looking through merge_index. It's not noisy neighbors, given how consistent the thrashing is. We had a box with a corrupted index (we had to remove merge_index and rebuild) and that machine instantly went to 0% thrashing. So we have a pretty good indication of the source.
> > > > >
> > > > > The cost for 10 8GB VMs is roughly equivalent to 5 16GB ones.
> > > > >
> > > > > Thanks for your input Michael!
> > > > >
> > > > > Ken
> > > > >
> > > > > On Wed, Dec 5, 2012 at 4:47 PM, Michael Johnson <[email protected]> wrote:
> > > > > > There are a lot of things that go into this, but I would tend to suggest that in a hosted VM scenario, upping the RAM is likely the right solution.
> > > > > >
> > > > > > You mention thrashing, but where is that thrashing coming from? I assume all the boxes are thrashing and not just one or two of them? Is it due to swapping or is it just the raw disk access? Maybe you're logging too aggressively?
> > > > > >
> > > > > > Perhaps you are suffering from a bad-neighbor effect. If this is the case, increasing the amount of RAM will likely put you on a physical host with fewer customers, and thus you would be less likely to have a bad neighbor.
> > > > > >
> > > > > > Cost-wise in the VM world, you might be better off adding a few nodes rather than increasing the RAM in your existing VMs.
> > > > > >
> > > > > > But then we are talking VMs, and thus it should be fairly painless to experiment. I would try adding RAM first and, if that doesn't work, add a few nodes. Someone else may have a different opinion, but that is my two cents.
> > > > > >
> > > > > > On Wed, Dec 5, 2012 at 4:33 PM, Ken Perkins <[email protected]> wrote:
> > > > > > > Hello all,
> > > > > > >
> > > > > > > We're seeing enough thrashing and low memory on our production ring that we've decided to upgrade our hardware. The real question is whether we should scale up or out.
> > > > > > >
> > > > > > > Currently our ring is 512 partitions. We know that it's a sub-optimal size, but we can't easily solve that now. We're currently running a search-heavy app on 5 8GB VMs. I'm debating between moving the VMs up to 16GB, or adding a few more 8GB VMs.
> > > > > > >
> > > > > > > Some of the talk in #riak has pushed me towards adding more machines (thus lowering the per-node number of partitions), but I wanted to do a quick sanity check here with folks that it's better than scaling up my current machines.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Ken Perkins
> > > > > > > clipboard.com
