If you're jumping between 5-25% iowait, I'd add nodes. Also, tuning pdflush (the vm.dirty_* sysctls) will help with the jumpy iowait.

Lucid by default lets up to 20% of your RAM fill with dirty pages before flushing, so if you have 10 GB of RAM, your system could be trying to flush 2 GB of data at once, causing huge iowait spikes.
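If you want to experiment with that, here's a rough sketch (the numbers are just illustrative, not a recommendation; check what makes sense for your workload and kernel):

# see how much dirty data writeback is currently allowed to accumulate
sysctl vm.dirty_ratio vm.dirty_background_ratio

# start background writeback earlier and lower the foreground ceiling
sudo sysctl -w vm.dirty_background_ratio=2
sudo sysctl -w vm.dirty_ratio=10

# persist across reboots
echo "vm.dirty_background_ratio = 2" | sudo tee -a /etc/sysctl.conf
echo "vm.dirty_ratio = 10" | sudo tee -a /etc/sysctl.conf

Lower ratios mean smaller, more frequent flushes, which trades a bit of peak write throughput for steadier iowait.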
Sean Carey
@densone

On Thursday, December 6, 2012 at 11:49, Ken Perkins wrote:

> We're around ~20% swapping, IO wait in the 5-25% range, depending on the machine.
>
> We're running Lucid, with the deadline scheduler. I'm strongly biased towards adding a few more nodes, but I'm not married to it :)
>
> On Wed, Dec 5, 2012 at 9:22 PM, Sean Carey <[email protected]> wrote:
> > So Ken,
> > Fair amount? < 5% or > 20%
> >
> > If there's iowait and memory issues, adding nodes could alleviate that. If there's almost no iowait or minimal iowait, adding memory will help. Also, tuning vm.dirty on Linux might get you more memory and less iowait, or at least more consistent iowait.
> >
> > Which Linux distro are you on and which scheduler are you using?
> >
> > -Sean
> >
> > On Thursday, December 6, 2012 at 12:15 AM, Ken Perkins wrote:
> > > VMs, not the same host; Rackspace has VM affinity to protect against that. We do see a fair amount of IO wait.
> > >
> > > Rackspace has a new affinity-based SSD block device service that I plan to evaluate, but I'm not ready for that in production.
> > >
> > > On Wed, Dec 5, 2012 at 7:45 PM, Sean Carey <[email protected]> wrote:
> > > > Ken,
> > > > Are your VMs on different bare metal? Could they potentially be on the same bare metal?
> > > >
> > > > Are you seeing any IO contention?
> > > >
> > > > Sean Carey
> > > > @densone
> > > >
> > > > On Wednesday, December 5, 2012 at 20:41, Ken Perkins wrote:
> > > > > Yes, we're thrashing on all of the boxes, due to disk access when looking through merge_index. It's not noisy neighbors, given how consistent the thrashing is. We had a box with a corrupted index (we had to remove merge_index and rebuild) and that machine instantly went to 0% thrashing. So we have a pretty good indication of the source.
> > > > >
> > > > > The cost for 10 8GB VMs is roughly equivalent to 5 16GB ones.
> > > > >
> > > > > Thanks for your input Michael!
> > > > >
> > > > > Ken
> > > > >
> > > > > On Wed, Dec 5, 2012 at 4:47 PM, Michael Johnson <[email protected]> wrote:
> > > > > > There are a lot of things that go into this, but I would tend to suggest that in a hosted VM scenario, upping the RAM is likely the right solution.
> > > > > >
> > > > > > You mention thrashing, but where is that thrashing coming from? I assume all the boxes are thrashing and not just one or two of them? Is it due to swapping or is it just the raw disk access? Maybe you're logging too aggressively?
> > > > > >
> > > > > > Perhaps you are suffering from a bad-neighbor effect. If this is the case, increasing the amount of RAM will likely put you on a physical host with fewer customers, and thus you would be less likely to have a bad neighbor.
> > > > > >
> > > > > > Cost-wise in the VM world, you might be better off adding a few nodes rather than increasing the RAM in your existing VMs.
> > > > > >
> > > > > > But then we are talking VMs, and thus it should be fairly painless to experiment. I would try adding RAM first and, if that doesn't work, add a few nodes. Someone else may have a different opinion, but that is my two cents.
> > > > > >
> > > > > > On Wed, Dec 5, 2012 at 4:33 PM, Ken Perkins <[email protected]> wrote:
> > > > > > > Hello all,
> > > > > > >
> > > > > > > We're seeing enough thrashing and low memory on our production ring that we've decided to upgrade our hardware. The real question is whether we should scale up or out.
> > > > > > >
> > > > > > > Currently our ring is 512 partitions. We know that it's a sub-optimal size, but we can't easily solve that now. We're currently running a search-heavy app on 5 8GB VMs. I'm debating between moving the VMs up to 16GB, or adding a few more 8GB VMs.
> > > > > > >
> > > > > > > Some of the talk in #riak has pushed me towards adding more machines (thus lowering the per-node number of partitions), but I wanted to do a quick sanity check here with folks that it's better than scaling up my current machines.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Ken Perkins
> > > > > > > clipboard.com
