Hi Matthew,

it took around 11 hours for the first node to finish the compaction. The
second node has been running for 12 hours already and is still compacting.
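
For reference, here's a rough Python sketch of how the per-partition
sst_* level sizes can be watched while the compaction runs (the leveldb
path is an assumption based on the default platform_data_dir; adjust it
to yours):

import os, time

LEVELDB_DIR = "/var/lib/riak/leveldb"  # assumption: platform_data_dir/leveldb

def dir_size(path):
    # total size in bytes of all files below `path`
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # files can disappear while compaction is running
    return total

while True:
    for partition in sorted(os.listdir(LEVELDB_DIR)):
        pdir = os.path.join(LEVELDB_DIR, partition)
        if not os.path.isdir(pdir):
            continue
        levels = {d: dir_size(os.path.join(pdir, d)) // (1024 * 1024)
                  for d in sorted(os.listdir(pdir)) if d.startswith("sst_")}
        print(partition, levels, "(sizes in MB)")
    time.sleep(60)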

Besides that, I'm wondering why the put FSM times on the new 1.4.2 host
are much higher (after the compaction) than on an old 1.3.1 host. Both
are running in the cluster right now, and another node is doing the
compaction/upgrade while still in the cluster, but it isn't reachable
directly because it has been taken out of the load balancer:

1.4.2:

node_put_fsm_time_mean : 2208050
node_put_fsm_time_median : 39231
node_put_fsm_time_95 : 17400382
node_put_fsm_time_99 : 50965752
node_put_fsm_time_100 : 59537762
node_put_fsm_active : 5
node_put_fsm_active_60s : 364
node_put_fsm_in_rate : 5
node_put_fsm_out_rate : 3
node_put_fsm_rejected : 0
node_put_fsm_rejected_60s : 0
node_put_fsm_rejected_total : 0


1.3.1:

node_put_fsm_time_mean : 5036
node_put_fsm_time_median : 1614
node_put_fsm_time_95 : 8789
node_put_fsm_time_99 : 38258
node_put_fsm_time_100 : 384372
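
(The same counters are also exposed on each node's HTTP /stats endpoint;
here is a rough Python sketch for pulling them side by side. The
hostnames below are placeholders.)

import json
from urllib.request import urlopen

# placeholder URLs; point these at the 1.4.2 and the 1.3.1 node
NODES = {"1.4.2": "http://riak-142-node:8098/stats",
         "1.3.1": "http://riak-131-node:8098/stats"}

KEYS = ["node_put_fsm_time_mean", "node_put_fsm_time_median",
        "node_put_fsm_time_95", "node_put_fsm_time_99",
        "node_put_fsm_time_100"]

for label, url in NODES.items():
    stats = json.load(urlopen(url))
    print(label + ":")
    for key in KEYS:
        # put FSM times are reported in microseconds
        print("  %s : %s" % (key, stats.get(key)))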


Any clue why this is happening, or whether it's expected?

Cheers
Simon

On Tue, 10 Dec 2013 17:21:07 +0100
Simon Effenberg <[email protected]> wrote:

> Hi Matthew,
> 
> thanks!.. that answers my questions!
> 
> Cheers
> Simon
> 
> On Tue, 10 Dec 2013 11:08:32 -0500
> Matthew Von-Maszewski <[email protected]> wrote:
> 
> > 2i is not my expertise, so I had to discuss your concerns with another Basho 
> > developer.  He says:
> > 
> > Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format. 
> >  You must wait for all nodes to update if you desire to use the new 2i 
> > query.  The 2i data will properly write/update on both 1.3 and 1.4 machines 
> > during the migration.
> > 
> > Does that answer your question?
> > 
> > 
> > And yes, you might see available disk space increase during the upgrade 
> > compactions if your dataset contains numerous delete "tombstones".  The 
> > Riak 2.0 code includes a new feature called "aggressive delete" for 
> > leveldb.  This feature is more proactive in pushing delete tombstones 
> > through the levels to free up disk space much more quickly (especially if 
> > you perform block deletes every now and then).
> > 
> > Matthew
> > 
> > 
> > On Dec 10, 2013, at 10:44 AM, Simon Effenberg <[email protected]> 
> > wrote:
> > 
> > > Hi Matthew,
> > > 
> > > see inline..
> > > 
> > > On Tue, 10 Dec 2013 10:38:03 -0500
> > > Matthew Von-Maszewski <[email protected]> wrote:
> > > 
> > >> The sad truth is that you are not the first to see this problem.  And 
> > >> yes, it has to do with your 950GB per node dataset.  And no, nothing to 
> > >> do but sit through it at this time.
> > >> 
> > >> While I did extensive testing around upgrade times before shipping 1.4, 
> > >> apparently there are data configurations I did not anticipate.  You are 
> > >> likely seeing a cascade where a shift of one file from level-1 to 
> > >> level-2 is causing a shift of another file from level-2 to level-3, 
> > >> which causes a level-3 file to shift to level-4, etc … then the next 
> > >> file shifts from level-1.
> > >> 
> > >> The bright side of this pain is that you will end up with better write 
> > >> throughput once all the compaction ends.
> > > 
> > > I'll have to deal with that.. but my problem now is: if I'm doing this
> > > node by node, it looks like 2i searches aren't possible while 1.3 and
> > > 1.4 nodes exist in the cluster. Is there any problem that would lead me
> > > into a 2i repair marathon, or can I simply wait for some hours for each
> > > node until all merges are done before I upgrade the next one? (2i
> > > searches can fail for some time.. the APP has no problem with
> > > that, but are new inserts with 2i indices processed successfully or do
> > > I have to do the 2i repair?)
> > > 
> > > /s
> > > 
> > > One other good thing: the saved disk space is an advantage ;)..
> > > 
> > > 
> > >> 
> > >> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but 
> > >> that is not going to help you today.
> > >> 
> > >> Matthew
> > >> 
> > >> On Dec 10, 2013, at 10:26 AM, Simon Effenberg 
> > >> <[email protected]> wrote:
> > >> 
> > >>> Hi @list,
> > >>> 
> > >>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
> > >>> upgrading the first node (out of 12), this node seems to be doing many
> > >>> merges. The sst_* directories change in size "rapidly" and the node is
> > >>> at 100% disk utilization all the time.
> > >>> 
> > >>> I know that there is something like that:
> > >>> 
> > >>> "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
> > >>> will initiate an automatic conversion that could pause the startup of
> > >>> each node by 3 to 7 minutes. The leveldb data in "level #1" is being
> > >>> adjusted such that "level #1" can operate as an overlapped data level
> > >>> instead of as a sorted data level. The conversion is simply the
> > >>> reduction of the number of files in "level #1" to being less than eight
> > >>> via normal compaction of data from "level #1" into "level #2". This is
> > >>> a one time conversion."
> > >>> 
> > >>> but what I see looks much more invasive than what is explained there,
> > >>> or it may not have anything to do with the merges I'm (probably) seeing.
> > >>> 
> > >>> Is this "normal" behavior or could I do anything about it?
> > >>> 
> > >>> At the moment I'm stuck in the upgrade procedure because this high
> > >>> IO load would probably lead to high response times.
> > >>> 
> > >>> Also we have a lot of data (per node ~950 GB).
> > >>> 
> > >>> Cheers
> > >>> Simon
> > >>> 


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon:     + 49-(0)30-8109 - 7173
Fax:     + 49-(0)30-8109 - 7131

Mail:     [email protected]
Web:    www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
