Cool.. it gave me an exception (** exception error: undefined shell command profit/0),
but it worked and now I have new data.. thanks a lot!

Cheers
Simon
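A note on that profit/0 exception: profit() is just a tongue-in-cheek placeholder, not a real shell function, which is why the shell complains after the kill has already succeeded. A minimal sketch of the part that does the actual work, run from a console attached to the node (assuming riak attach):

    %% $ riak attach
    %% Kill the stats calculation supervisor; it is restarted
    %% automatically, without the stuck data.
    exit(whereis(riak_core_stat_calc_sup), kill).
    %% Detach with Ctrl-D; do NOT use q(), which stops the whole node.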
On Wed, 11 Dec 2013 17:05:29 -0500
Matthew Von-Maszewski <[email protected]> wrote:

> One of the core developers says that the following line should stop the stats process. It will then be automatically restarted, without the stuck data.
>
> exit(whereis(riak_core_stat_calc_sup), kill), profit().
>
> Matthew
>
> On Dec 11, 2013, at 4:50 PM, Simon Effenberg <[email protected]> wrote:
>
> > So I think I have no real chance to get good numbers. I can see a little bit through the app monitoring, but I'm not sure I can see real differences from the 100 -> 170 open_files increase.
> >
> > I will try to change the value on the already migrated nodes as well, to see if this improves the things I can see..
> >
> > Any other ideas?
> >
> > Cheers
> > Simon
> >
> > On Wed, 11 Dec 2013 15:37:03 -0500
> > Matthew Von-Maszewski <[email protected]> wrote:
> >
> >> The real Riak developers have suggested this might be your problem with stats being stuck:
> >>
> >> https://github.com/basho/riak_core/pull/467
> >>
> >> The fix is included in the upcoming 1.4.4 maintenance release (which is overdue, so I am not going to bother guessing when it will actually arrive).
> >>
> >> Matthew
> >>
> >> On Dec 11, 2013, at 2:47 PM, Simon Effenberg <[email protected]> wrote:
> >>
> >>> I will do..
> >>>
> >>> but one other thing:
> >>>
> >>> Every 10.0s: sudo riak-admin status | grep put_fsm
> >>>
> >>> node_put_fsm_time_mean : 2208050
> >>> node_put_fsm_time_median : 39231
> >>> node_put_fsm_time_95 : 17400382
> >>> node_put_fsm_time_99 : 50965752
> >>> node_put_fsm_time_100 : 59537762
> >>> node_put_fsm_active : 5
> >>> node_put_fsm_active_60s : 364
> >>> node_put_fsm_in_rate : 5
> >>> node_put_fsm_out_rate : 3
> >>> node_put_fsm_rejected : 0
> >>> node_put_fsm_rejected_60s : 0
> >>> node_put_fsm_rejected_total : 0
> >>>
> >>> this is not changing at all (watched every 10s).. so maybe my expectations are _wrong_?! I will start searching around for a "status" bug, or whether I'm looking in the wrong place... maybe there is no problem while I'm searching for one?! But I see that at least the app has some issues on GET and PUT (more on PUT).. so I would like to know how fast things are.. but "status" isn't working.. aaaaargh...
> >>>
> >>> Cheers
> >>> Simon
> >>>
> >>> On Wed, 11 Dec 2013 14:32:07 -0500
> >>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>
> >>>> An additional thought: if increasing max_open_files does NOT help, try removing +S 4:4 from vm.args. Typically the +S setting helps leveldb, but one other user mentioned that the new sorted 2i queries needed more CPU in the Erlang layer.
> >>>>
> >>>> Summary:
> >>>> - try increasing max_open_files to 170
> >>>> - helps: try setting sst_block_size to 32768 in app.config
> >>>> - does not help: try removing +S from vm.args
> >>>>
> >>>> Matthew
> >>>>
> >>>> On Dec 11, 2013, at 1:58 PM, Simon Effenberg <[email protected]> wrote:
> >>>>
> >>>>> Hi Matthew,
> >>>>>
> >>>>> On Wed, 11 Dec 2013 18:38:49 +0100
> >>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>
> >>>>>> Simon,
> >>>>>>
> >>>>>> I have plugged your various values into the attached spreadsheet. I assumed a vnode count that allows for one of your twelve servers to die (256 ring size / 11 servers).
> >>>>>
> >>>>> Great, thanks!
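A side note on the numbers above: the node_put_fsm_time_* statistics are reported in microseconds, which puts the scale of the problem into perspective. A quick conversion, as an Erlang-shell sketch using the values from the output above:

    %% node_put_fsm_time_* values are microseconds
    Mean142 = 2208050 / 1000000.   %% ~2.2 s mean PUT time on the 1.4.2 node
    P95142  = 17400382 / 1000000.  %% ~17.4 s 95th percentile
    Mean131 = 5036 / 1000000.      %% ~5 ms mean on a 1.3.1 node (see further down the thread)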
> >>>>>
> >>>>>> The spreadsheet suggests you can safely raise your max_open_files from 100 to 170. I would suggest doing this for the next server you upgrade. If part of your problem is file cache thrashing, you should see an improvement.
> >>>>>
> >>>>> I will try this out.. starting the next server in 3-4 hours.
> >>>>>
> >>>>>> Only if max_open_files helps should you then consider adding {sst_block_size, 32768} to the eleveldb portion of app.config. This setting, given your value sizes, would likely halve the size of the metadata held in the file cache. It only impacts the files newly compacted during the upgrade, and would gradually free up space in the file cache while slowing down the file cache thrashing.
> >>>>>
> >>>>> So I'll do this on the node after next, if the next server is fine.
> >>>>>
> >>>>>> What build/packaging of Riak do you use, or do you build from source?
> >>>>>
> >>>>> Using the Debian packages from the Basho site..
> >>>>>
> >>>>> I'm really wondering why the "put" performance is that bad. Here are the changes which were introduced/changed only on the newly upgraded servers:
> >>>>>
> >>>>> + fsm_limit => 50000,
> >>>>> --- our '+P' is set to 262144, so more than 3x fsm_limit, which was stated somewhere
> >>>>> + # after finishing the upgrade this should be switched to v1 !!!
> >>>>> + object_format => '__atom_v0',
> >>>>>
> >>>>> - '-env ERL_MAX_ETS_TABLES' => 8192,
> >>>>> + '-env ERL_MAX_ETS_TABLES' => 256000, # old package used 8192 but 1.4.2 raised it to this high number
> >>>>> + '-env ERL_MAX_PORTS' => 64000,
> >>>>> + # Treat error_logger warnings as warnings
> >>>>> + '+W' => 'w',
> >>>>> + # Tweak GC to run more often
> >>>>> + '-env ERL_FULLSWEEP_AFTER' => 0,
> >>>>> + # Force the erlang VM to use SMP
> >>>>> + '-smp' => 'enable',
> >>>>> + #################################
> >>>>>
> >>>>> Cheers
> >>>>> Simon
> >>>>>
> >>>>>> Matthew
> >>>>>>
> >>>>>> On Dec 11, 2013, at 9:48 AM, Simon Effenberg <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Matthew,
> >>>>>>>
> >>>>>>> thanks for all your time and work.. see inline for answers..
> >>>>>>>
> >>>>>>> On Wed, 11 Dec 2013 09:17:32 -0500
> >>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> The real Riak developers have arrived on-line for the day. They are telling me that all of your problems are likely due to the extended upgrade times, and yes, there is a known issue with handoff between 1.3 and 1.4. They also say everything should calm down after all nodes are upgraded.
> >>>>>>>>
> >>>>>>>> I will review your system settings now and see if there is something that might make the other machines upgrade quicker. So three more questions:
> >>>>>>>>
> >>>>>>>> - what is the average size of your keys?
> >>>>>>>
> >>>>>>> bucket names are between 5 and 15 characters (only ~10 buckets).. key names are normally something like 26iesj:hovh7egz
> >>>>>>>
> >>>>>>>> - what is the average size of your values (data stored)?
> >>>>>>>
> >>>>>>> I have to guess..
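For reference, the two leveldb suggestions combined would look roughly like this in the eleveldb section of app.config. This is a sketch assuming otherwise default settings; the data_root path is just the usual Debian-package location:

    %% app.config -- eleveldb section (sketch), sits inside the
    %% top-level list of the config file
    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},
        %% step 1: raise the per-vnode file cache limit (was 100)
        {max_open_files, 170},
        %% step 2, only if step 1 helps: halves the metadata kept
        %% in the file cache for newly compacted .sst files
        {sst_block_size, 32768}
    ]},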
> >>>>>>> but the mean is (from Riak) 12kb, the 95th percentile is at 75kb, and in theory we have a limit of 1MB (then it will be split up). But sometimes, thanks to siblings (we have two buckets with allow_mult), we also have some 7MB at MAX; this will be reduced again (it's caused by a new feature in our app which has too many parallel writes within 15ms).
> >>>>>>>
> >>>>>>>> - in regular use, are your keys accessed randomly across their entire range, or do they contain a date component which clusters older, less used keys?
> >>>>>>>
> >>>>>>> normally we don't search but retrieve keys by key name.. and we have data which is up to 6 months old, and normally we access mostly new/active/hot data and not all the old ones.. besides this, we have a job doing a 2i query every 5 mins and another one doing this maybe once an hour.. both don't work while the upgrade is ongoing (2i isn't working).
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>> Simon
> >>>>>>>
> >>>>>>>> Matthew
> >>>>>>>>
> >>>>>>>> On Dec 11, 2013, at 8:43 AM, Simon Effenberg <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Oh, and at the moment they are waiting for some handoffs, and I see errors in the logfiles:
> >>>>>>>>>
> >>>>>>>>> 2013-12-11 13:41:47.948 UTC [error] <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff transfer of riak_kv_vnode from '[email protected]' 468137243207554840987117797979434404733540892672
> >>>>>>>>>
> >>>>>>>>> but I remember that somebody else had this as well, and if I recall correctly it disappeared after the full upgrade was done.. but at the moment it's hard to think about upgrading everything at once.. (~12 hours of 100% disk utilization on all 12 nodes would lead to really slow puts/gets)
> >>>>>>>>>
> >>>>>>>>> What can I do?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>> PS: transfers output:
> >>>>>>>>>
> >>>>>>>>> '[email protected]' waiting to handoff 17 partitions
> >>>>>>>>> '[email protected]' waiting to handoff 19 partitions
> >>>>>>>>>
> >>>>>>>>> (these are the 1.4.2 nodes)
> >>>>>>>>>
> >>>>>>>>> On Wed, 11 Dec 2013 14:39:58 +0100
> >>>>>>>>> Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Also some side notes:
> >>>>>>>>>>
> >>>>>>>>>> "top" looks even better on the new 1.4.2 machines than on the 1.3.1 ones.. IO utilization of the disk is mostly the same (about 33%)..
> >>>>>>>>>>
> >>>>>>>>>> but
> >>>>>>>>>>
> >>>>>>>>>> 95th percentile of response time for GET (avg over all nodes):
> >>>>>>>>>> before upgrade: 29ms
> >>>>>>>>>> after upgrade: almost the same
> >>>>>>>>>>
> >>>>>>>>>> 95th percentile of response time for PUT (avg over all nodes):
> >>>>>>>>>> before upgrade: 60ms
> >>>>>>>>>> after upgrade: 1548ms, but only because 2 of the 12 nodes are on 1.4.2 and are really slow (17000ms)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Simon
> >>>>>>>>>>
> >>>>>>>>>> On Wed, 11 Dec 2013 13:45:56 +0100
> >>>>>>>>>> Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Sorry, I forgot half of it..
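On the stuck handoffs: Riak 1.4 also grew a riak-admin transfer-limit command, which can be used to throttle handoff concurrency while a node is still busy compacting. A sketch, reusing the node name from the transfers output above (check your own node names; this only applies on the already-upgraded 1.4.2 nodes):

    # show the current handoff concurrency limits
    riak-admin transfer-limit
    # throttle a single busy node to one concurrent transfer
    riak-admin transfer-limit [email protected] 1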
> >>>>>>>>>>>
> >>>>>>>>>>> seffenberg@kriak46-1:~$ free -m
> >>>>>>>>>>>              total    used    free  shared  buffers  cached
> >>>>>>>>>>> Mem:         23999   23759     239       0      184   16183
> >>>>>>>>>>> -/+ buffers/cache:    7391   16607
> >>>>>>>>>>> Swap:            0       0       0
> >>>>>>>>>>>
> >>>>>>>>>>> We have 12 servers..
> >>>>>>>>>>> datadir on the compacted servers (1.4.2): ~765 GB
> >>>>>>>>>>>
> >>>>>>>>>>> AAE is enabled.
> >>>>>>>>>>>
> >>>>>>>>>>> I attached app.config and vm.args.
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers
> >>>>>>>>>>> Simon
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 11 Dec 2013 07:33:31 -0500
> >>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Ok, I am now suspecting that your servers are either using swap space (which is slow) or your leveldb file cache is thrashing (opening and closing multiple files per request).
> >>>>>>>>>>>>
> >>>>>>>>>>>> How many servers do you have, and do you use Riak's active anti-entropy feature? I am going to plug all of this into a spreadsheet.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Matthew Von-Maszewski
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Dec 11, 2013, at 7:09, Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Matthew,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Memory: 23999 MB
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ring_creation_size: 256
> >>>>>>>>>>>>> max_open_files: 100
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> riak-admin status:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> memory_total : 276001360
> >>>>>>>>>>>>> memory_processes : 191506322
> >>>>>>>>>>>>> memory_processes_used : 191439568
> >>>>>>>>>>>>> memory_system : 84495038
> >>>>>>>>>>>>> memory_atom : 686993
> >>>>>>>>>>>>> memory_atom_used : 686560
> >>>>>>>>>>>>> memory_binary : 21965352
> >>>>>>>>>>>>> memory_code : 11332732
> >>>>>>>>>>>>> memory_ets : 10823528
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for looking!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>> Simon
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, 11 Dec 2013 06:44:42 -0500
> >>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I need to ask other developers as they arrive for the new day. It does not make sense to me.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> How many nodes do you have? How much RAM do you have in each node? What are your settings for max_open_files and cache_size in the app.config file? Maybe this is as simple as leveldb using too much RAM in 1.4. The memory accounting for max_open_files changed in 1.4.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Matthew Von-Maszewski
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Dec 11, 2013, at 6:28, Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Matthew,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> it took around 11 hours for the first node to finish the compaction. The second node has already been running for 12 hours and is still compacting.
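Since max_open_files and RAM keep coming up: the spreadsheet math can be approximated by hand. A rough Erlang-shell sketch; the ~4 MB of file-cache memory per open .sst file is an assumed rule of thumb for Basho's leveldb here, not a measured value:

    %% Rough per-node leveldb file-cache estimate (sketch only).
    RingSize = 256,
    Servers = 11,                       %% 12 servers, planning for 1 down
    VnodesPerNode = (RingSize + Servers - 1) div Servers,   %% = 24
    MaxOpenFiles = 170,
    FileCacheMB = VnodesPerNode * MaxOpenFiles * 4.
    %% 24 * 170 * 4 = 16320 MB, against ~24 GB RAM per server,
    %% which is roughly why 170 still fits while leaving headroom.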
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Besides that, I wonder why the put_fsm time on the new 1.4.2 host is much higher (after the compaction) than on an old 1.3.1 host (both are running in the cluster right now, and another one is doing the compaction/upgrade while it is in the cluster but not directly accessible because it is out of the load balancer):
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1.4.2:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> node_put_fsm_time_mean : 2208050
> >>>>>>>>>>>>>>> node_put_fsm_time_median : 39231
> >>>>>>>>>>>>>>> node_put_fsm_time_95 : 17400382
> >>>>>>>>>>>>>>> node_put_fsm_time_99 : 50965752
> >>>>>>>>>>>>>>> node_put_fsm_time_100 : 59537762
> >>>>>>>>>>>>>>> node_put_fsm_active : 5
> >>>>>>>>>>>>>>> node_put_fsm_active_60s : 364
> >>>>>>>>>>>>>>> node_put_fsm_in_rate : 5
> >>>>>>>>>>>>>>> node_put_fsm_out_rate : 3
> >>>>>>>>>>>>>>> node_put_fsm_rejected : 0
> >>>>>>>>>>>>>>> node_put_fsm_rejected_60s : 0
> >>>>>>>>>>>>>>> node_put_fsm_rejected_total : 0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1.3.1:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> node_put_fsm_time_mean : 5036
> >>>>>>>>>>>>>>> node_put_fsm_time_median : 1614
> >>>>>>>>>>>>>>> node_put_fsm_time_95 : 8789
> >>>>>>>>>>>>>>> node_put_fsm_time_99 : 38258
> >>>>>>>>>>>>>>> node_put_fsm_time_100 : 384372
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> any clue why this could/should be?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>> Simon
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, 10 Dec 2013 17:21:07 +0100
> >>>>>>>>>>>>>>> Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Matthew,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> thanks!.. that answers my questions!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>> Simon
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, 10 Dec 2013 11:08:32 -0500
> >>>>>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2i is not my expertise, so I had to discuss your concerns with another Basho developer. He says:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format. You must wait for all nodes to update if you desire to use the new 2i query. The 2i data will properly write/update on both 1.3 and 1.4 machines during the migration.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Does that answer your question?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> And yes, you might see available disk space increase during the upgrade compactions if your dataset contains numerous delete "tombstones". The Riak 2.0 code includes a new feature called "aggressive delete" for leveldb. This feature is more proactive in pushing delete tombstones through the levels to free up disk space much more quickly (especially if you perform block deletes every now and then).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Matthew
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Dec 10, 2013, at 10:44 AM, Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Matthew,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> see inline..
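To make the mixed-version caveat concrete: during the migration only the 2i reads break; writes keep indexing correctly on both 1.3 and 1.4 nodes. A sketch of the kind of 2i HTTP query the periodic jobs presumably run (hostname, bucket, and index name are made up for illustration; 8098 is the default HTTP port):

    # exact-match 2i query
    curl 'http://kriak46-1:8098/buckets/mybucket/index/field1_bin/someval'
    # range query; 1.4 additionally brings sorted results and
    # pagination (max_results) for these
    curl 'http://kriak46-1:8098/buckets/mybucket/index/field1_bin/aaa/mmm'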
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, 10 Dec 2013 10:38:03 -0500
> >>>>>>>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The sad truth is that you are not the first to see this problem. And yes, it has to do with your 950GB per node dataset. And no, there is nothing to do but sit through it at this time.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> While I did extensive testing around upgrade times before shipping 1.4, apparently there are data configurations I did not anticipate. You are likely seeing a cascade where a shift of one file from level-1 to level-2 is causing a shift of another file from level-2 to level-3, which causes a level-3 file to shift to level-4, etc ... then the next file shifts from level-1.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The bright side of this pain is that you will end up with better write throughput once all the compaction ends.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I have to deal with that.. but my problem is now: if I'm doing this node by node, it looks like 2i searches aren't possible while 1.3 and 1.4 nodes coexist in the cluster. Is there any problem which leads me to a 2i repair marathon, or could I simply wait for some hours for each node until all merges are done before I upgrade the next one? (2i searches can fail for some time.. the app isn't having problems with that, but are new inserts with 2i indices processed successfully, or do I have to do the 2i repair?)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> /s
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> one other good thing: saving disk space is one advantage ;)..
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that is not going to help you today.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Matthew
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Dec 10, 2013, at 10:26 AM, Simon Effenberg <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi @list,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after upgrading the first node (out of 12), this node seems to do many merges. The sst_* directories change in size "rapidly", and the node has a disk utilization of 100% all the time.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I know that there is something like this:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset will initiate an automatic conversion that could pause the startup of each node by 3 to 7 minutes.
> >>>>>>>>>>>>>>>>>>>> The leveldb data in "level #1" is being adjusted such that "level #1" can operate as an overlapped data level instead of as a sorted data level. The conversion is simply the reduction of the number of files in "level #1" to fewer than eight, via normal compaction of data from "level #1" into "level #2". This is a one time conversion."
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> but it looks much more invasive than explained here, or it doesn't have anything to do with the (probably seen) merges.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Is this "normal" behavior, or could I do anything about it?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> At the moment I'm stuck with the upgrade procedure, because this high IO load would probably lead to high response times.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also, we have a lot of data (~950 GB per node).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>> Simon
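On the 2i repair question further up: since 2i data keeps being written correctly on both versions during the migration, a repair marathon should not normally be needed. If an index does end up inconsistent, 1.4's riak-admin offers a repair-2i subcommand that rebuilds 2i data from the stored objects. A sketch (the partition ID is just the one from the handoff log above, used as an example):

    # repair 2i data on all partitions owned by this node
    riak-admin repair-2i
    # or limit the repair to specific partitions
    riak-admin repair-2i 468137243207554840987117797979434404733540892672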
--
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon: +49-(0)30-8109-7173
Fax: +49-(0)30-8109-7131

Mail: [email protected]
Web: www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany

Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
