Re: Upgrade from 1.3.1 to 1.4.2 => high IO

Simon Effenberg Wed, 11 Dec 2013 04:11:16 -0800

Hi Matthew

Memory: 23999 MB


ring_creation_size, 256
max_open_files, 100
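
A back-of-the-envelope sketch in Python of what these settings could mean for leveldb's open-file memory on one node. It assumes the 256 vnodes are spread evenly over the 12 nodes, that max_open_files applies per vnode, and that each open file costs about 4 MiB. The per-file figure is an assumption for illustration only, not something measured on this cluster, and the function name is hypothetical:

```python
import math

def open_file_memory_estimate(ring_size, node_count, max_open_files,
                              bytes_per_file=4 * 2**20):
    """Rough per-node estimate of memory tied to leveldb open files.

    bytes_per_file is an ASSUMED average cost per open file (4 MiB here);
    replace it with a measured value if one is available.
    """
    # Vnodes are assumed to be spread evenly; round up for the worst case.
    vnodes_per_node = math.ceil(ring_size / node_count)
    per_vnode_bytes = max_open_files * bytes_per_file
    return vnodes_per_node, vnodes_per_node * per_vnode_bytes

# Values from this thread: ring_creation_size 256, 12 nodes, max_open_files 100.
vnodes, total_bytes = open_file_memory_estimate(256, 12, 100)
print(f"{vnodes} vnodes per node, about {total_bytes / 2**20:.0f} MiB for open files")
```

With these inputs the estimate is 22 vnodes and 8800 MiB per node, which would fit into 23999 MB, but only if the 4 MiB assumption holds and cache_size is small.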

riak-admin status:

memory_total : 276001360
memory_processes : 191506322
memory_processes_used : 191439568
memory_system : 84495038
memory_atom : 686993
memory_atom_used : 686560
memory_binary : 21965352
memory_code : 11332732
memory_ets : 10823528
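
The Erlang VM figures above are reported in bytes. A minimal Python sketch, assuming the `name : value` layout shown above (the helper name `parse_status` is hypothetical), converts them to MiB and checks that memory_processes plus memory_system adds up to memory_total:

```python
def parse_status(text):
    """Parse 'name : value' lines into a dict of ints; skip anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name : value' line
        value = value.strip()
        if value.isdigit():
            stats[name.strip()] = int(value)
    return stats

# The memory counters quoted in this message.
STATUS = """\
memory_total : 276001360
memory_processes : 191506322
memory_processes_used : 191439568
memory_system : 84495038
memory_atom : 686993
memory_atom_used : 686560
memory_binary : 21965352
memory_code : 11332732
memory_ets : 10823528
"""

stats = parse_status(STATUS)
for name, value in stats.items():
    print(f"{name:24s} {value / 2**20:8.1f} MiB")

# Sanity check: processes + system should equal the reported total.
processes_plus_system = stats["memory_processes"] + stats["memory_system"]
print("total matches:", processes_plus_system == stats["memory_total"])
```

The VM itself accounts for only about 263 MiB here. As far as I know, leveldb's native allocations are not included in these counters, so they say little about how much RAM leveldb is using.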

Thanks for looking!

Cheers
Simon



On Wed, 11 Dec 2013 06:44:42 -0500
Matthew Von-Maszewski <[email protected]> wrote:

> I need to ask other developers as they arrive for the new day.  Does not make 
> sense to me.
> 
> How many nodes do you have?  How much RAM do you have in each node?  What are 
> your settings for max_open_files and cache_size in the app.config file?  
> Maybe this is as simple as leveldb using too much RAM in 1.4.  The memory 
> accounting for max_open_files changed in 1.4.
> 
> Matthew Von-Maszewski
> 
> 
> On Dec 11, 2013, at 6:28, Simon Effenberg <[email protected]> wrote:
> 
> > Hi Matthew,
> > 
> > It took around 11 hours for the first node to finish the compaction. The
> > second node has already been running for 12 hours and is still compacting.
> > 
> > Besides that, I am puzzled because the fsm_put time on the new 1.4.2 host
> > is much higher (after the compaction) than on an old 1.3.1 host. Both are
> > serving in the cluster right now; a third node is doing the
> > compaction/upgrade while it is in the cluster but not directly
> > accessible because it is out of the load balancer:
> > 
> > 1.4.2:
> > 
> > node_put_fsm_time_mean : 2208050
> > node_put_fsm_time_median : 39231
> > node_put_fsm_time_95 : 17400382
> > node_put_fsm_time_99 : 50965752
> > node_put_fsm_time_100 : 59537762
> > node_put_fsm_active : 5
> > node_put_fsm_active_60s : 364
> > node_put_fsm_in_rate : 5
> > node_put_fsm_out_rate : 3
> > node_put_fsm_rejected : 0
> > node_put_fsm_rejected_60s : 0
> > node_put_fsm_rejected_total : 0
> > 
> > 
> > 1.3.1:
> > 
> > node_put_fsm_time_mean : 5036
> > node_put_fsm_time_median : 1614
> > node_put_fsm_time_95 : 8789
> > node_put_fsm_time_99 : 38258
> > node_put_fsm_time_100 : 384372
> > 
> > 
> > Any clue why this might be?
> > 
> > Cheers
> > Simon
> > 
> > On Tue, 10 Dec 2013 17:21:07 +0100
> > Simon Effenberg <[email protected]> wrote:
> > 
> >> Hi Matthew,
> >> 
> >> thanks!.. that answers my questions!
> >> 
> >> Cheers
> >> Simon
> >> 
> >> On Tue, 10 Dec 2013 11:08:32 -0500
> >> Matthew Von-Maszewski <[email protected]> wrote:
> >> 
> >>> 2i is not my expertise, so I had to discuss your concerns with another 
> >>> Basho developer.  He says:
> >>> 
> >>> Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk 
> >>> format.  You must wait for all nodes to update if you desire to use the 
> >>> new 2i query.  The 2i data will properly write/update on both 1.3 and 1.4 
> >>> machines during the migration.
> >>> 
> >>> Does that answer your question?
> >>> 
> >>> 
> >>> And yes, you might see available disk space increase during the upgrade 
> >>> compactions if your dataset contains numerous delete "tombstones".  The 
> >>> Riak 2.0 code includes a new feature called "aggressive delete" for 
> >>> leveldb.  This feature is more proactive in pushing delete tombstones 
> >>> through the levels to free up disk space much more quickly (especially if 
> >>> you perform block deletes every now and then).
> >>> 
> >>> Matthew
> >>> 
> >>> 
> >>> On Dec 10, 2013, at 10:44 AM, Simon Effenberg <[email protected]> 
> >>> wrote:
> >>> 
> >>>> Hi Matthew,
> >>>> 
> >>>> see inline..
> >>>> 
> >>>> On Tue, 10 Dec 2013 10:38:03 -0500
> >>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>> 
> >>>>> The sad truth is that you are not the first to see this problem.  And 
> >>>>> yes, it has to do with your 950GB per node dataset.  And no, nothing to 
> >>>>> do but sit through it at this time.
> >>>>> 
> >>>>> While I did extensive testing around upgrade times before shipping 1.4, 
> >>>>> apparently there are data configurations I did not anticipate.  You are 
> >>>>> likely seeing a cascade where a shift of one file from level-1 to 
> >>>>> level-2 is causing a shift of another file from level-2 to level-3, 
> >>>>> which causes a level-3 file to shift to level-4, etc … then the next 
> >>>>> file shifts from level-1.
> >>>>> 
> >>>>> The bright side of this pain is that you will end up with better write 
> >>>>> throughput once all the compaction ends.
> >>>> 
> >>>> I have to deal with that, but my problem now is that if I do this
> >>>> node by node, it looks like 2i searches aren't possible while both 1.3
> >>>> and 1.4 nodes exist in the cluster. Is there any problem that would lead
> >>>> me into a 2i repair marathon, or can I simply wait some hours for each
> >>>> node until all merges are done before I upgrade the next one? (2i
> >>>> searches can fail for some time; the app has no problem with that. But
> >>>> are new inserts with 2i indices processed successfully, or do I have
> >>>> to do the 2i repair?)
> >>>> 
> >>>> /s
> >>>> 
> >>>> One other good thing: saving disk space is an advantage ;)
> >>>> 
> >>>> 
> >>>>> 
> >>>>> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but 
> >>>>> that is not going to help you today.
> >>>>> 
> >>>>> Matthew
> >>>>> 
> >>>>> On Dec 10, 2013, at 10:26 AM, Simon Effenberg 
> >>>>> <[email protected]> wrote:
> >>>>> 
> >>>>>> Hi @list,
> >>>>>> 
> >>>>>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After
> >>>>>> upgrading the first node (out of 12), this node seems to do many merges.
> >>>>>> The sst_* directories change in size "rapidly" and the node shows
> >>>>>> a disk utilization of 100% all the time.
> >>>>>> 
> >>>>>> I know that there is something like that:
> >>>>>> 
> >>>>>> "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
> >>>>>> will initiate an automatic conversion that could pause the startup of
> >>>>>> each node by 3 to 7 minutes. The leveldb data in "level #1" is being
> >>>>>> adjusted such that "level #1" can operate as an overlapped data level
> >>>>>> instead of as a sorted data level. The conversion is simply the
> >>>>>> reduction of the number of files in "level #1" to being less than eight
> >>>>>> via normal compaction of data from "level #1" into "level #2". This is
> >>>>>> a one time conversion."
> >>>>>> 
> >>>>>> but what I see looks much more invasive than described here, or else
> >>>>>> it has nothing to do with the merges I am (probably) seeing.
> >>>>>> 
> >>>>>> Is this "normal" behavior or could I do anything about it?
> >>>>>> 
> >>>>>> At the moment I'm stuck with the upgrade procedure because this high
> >>>>>> IO load would probably lead to high response times.
> >>>>>> 
> >>>>>> Also we have a lot of data (per node ~950 GB).
> >>>>>> 
> >>>>>> Cheers
> >>>>>> Simon
> >>>>>> 
> >>>>>> _______________________________________________
> >>>>>> riak-users mailing list
> >>>>>> [email protected]
> >>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >>>>> 
> >>>> 
> >>>> 
> >>>> -- 
> >>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
> >>>> Fon:     + 49-(0)30-8109 - 7173
> >>>> Fax:     + 49-(0)30-8109 - 7131
> >>>> 
> >>>> Mail:     [email protected]
> >>>> Web:    www.mobile.de
> >>>> 
> >>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
> >>>> 
> >>>> 
> >>>> Geschäftsführer: Malte Krüger
> >>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
> >>>> Sitz der Gesellschaft: Kleinmachnow 
> >>> 
> >> 
> > 

