One of the core developers says that the following line should stop the stats 
process.  It will then be restarted automatically, without the stuck data.

exit(whereis(riak_core_stat_calc_sup), kill).
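
(For illustration, a rough sketch of one way to run that: attach to the node's
Erlang shell, evaluate the expression, then detach again. The node name in the
prompt below is just a placeholder.)

    $ riak attach
    ([email protected])1> exit(whereis(riak_core_stat_calc_sup), kill).
    true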

Matthew

On Dec 11, 2013, at 4:50 PM, Simon Effenberg <[email protected]> wrote:

> So I think I have no real chance to get good numbers. I can see a
> little bit through the app monitoring but I'm not sure if I can see
> real differences from the 100 -> 170 open_files increase.
> 
> I will try to change the value on the already migrated nodes as well to
> see if this improves the stuff I can see..
> 
> Any other ideas?
> 
> Cheers
> Simon
> 
> On Wed, 11 Dec 2013 15:37:03 -0500
> Matthew Von-Maszewski <[email protected]> wrote:
> 
>> The real Riak developers have suggested this might be your problem with 
>> stats being stuck:
>> 
>> https://github.com/basho/riak_core/pull/467
>> 
>> The fix is included in the upcoming 1.4.4 maintenance release (which is 
>> overdue so I am not going to bother guessing when it will actually arrive).
>> 
>> Matthew
>> 
>> On Dec 11, 2013, at 2:47 PM, Simon Effenberg <[email protected]> 
>> wrote:
>> 
>>> I will do..
>>> 
>>> but one other thing:
>>> 
>>> $ watch -n 10 'sudo riak-admin status | grep put_fsm'
>>> Every 10.0s: sudo riak-admin status | grep put_fsm
>>> node_put_fsm_time_mean : 2208050
>>> node_put_fsm_time_median : 39231
>>> node_put_fsm_time_95 : 17400382
>>> node_put_fsm_time_99 : 50965752
>>> node_put_fsm_time_100 : 59537762
>>> node_put_fsm_active : 5
>>> node_put_fsm_active_60s : 364
>>> node_put_fsm_in_rate : 5
>>> node_put_fsm_out_rate : 3
>>> node_put_fsm_rejected : 0
>>> node_put_fsm_rejected_60s : 0
>>> node_put_fsm_rejected_total : 0
>>> 
>>> this is not changing at all.. so maybe my expectations are _wrong_?! I
>>> will start looking into whether there is a "status" bug or whether I'm
>>> looking in the wrong place... maybe I'm searching for a problem that
>>> isn't there?! But I can see that the app at least has some issues on GET
>>> and PUT (more on PUT).. so I would like to know how fast things are..
>>> but "status" isn't working.. aaaaargh...
>>> 
>>> Cheers
>>> Simon
>>> 
>>> 
>>> On Wed, 11 Dec 2013 14:32:07 -0500
>>> Matthew Von-Maszewski <[email protected]> wrote:
>>> 
>>>> An additional thought:  if increasing max_open_files does NOT help, try
>>>> removing +S 4:4 from vm.args.  Typically the +S setting helps leveldb, but
>>>> one other user mentioned that the new sorted 2i queries need more CPU in
>>>> the Erlang layer.
>>>> 
>>>> Summary:
>>>> - try increasing max_open_files to 170
>>>> - if that helps:  also try setting sst_block_size to 32768 in app.config
>>>> - if it does not help:  try removing +S from vm.args
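>>>> 
>>>> (For illustration only: a rough sketch of where those settings would live,
>>>> assuming the stock app.config / vm.args layout. Merge the values into your
>>>> existing eleveldb section rather than copying this verbatim.)
>>>> 
>>>>   %% app.config, eleveldb section
>>>>   {eleveldb, [
>>>>       {max_open_files, 170},     %% raised from 100
>>>>       {sst_block_size, 32768}    %% only if raising max_open_files helped
>>>>   ]},
>>>> 
>>>>   # vm.args: if neither change helps, remove (or comment out) this line
>>>>   # +S 4:4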
>>>> 
>>>> Matthew
>>>> 
>>>> On Dec 11, 2013, at 1:58 PM, Simon Effenberg <[email protected]> 
>>>> wrote:
>>>> 
>>>>> Hi Matthew,
>>>>> 
>>>>> On Wed, 11 Dec 2013 18:38:49 +0100
>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>> 
>>>>>> Simon,
>>>>>> 
>>>>>> I have plugged your various values into the attached spreadsheet.  I 
>>>>>> assumed a vnode count to allow for one of your twelve servers to die 
>>>>>> (256 ring size / 11 servers).
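>>>>>> (That is, worst case roughly 256 / 11 ≈ 24 vnodes per surviving server,
>>>>>> versus 256 / 12 ≈ 22 with all twelve up.)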
>>>>> 
>>>>> Great, thanks!
>>>>> 
>>>>>> 
>>>>>> The spreadsheet suggests you can safely raise your max_open_files from 
>>>>>> 100 to 170.  I would suggest doing this for the next server you upgrade. 
>>>>>>  If part of your problem is file cache thrashing, you should see an 
>>>>>> improvement.
>>>>> 
>>>>> I will try this out.. starting the next server in 3-4 hours.
>>>>> 
>>>>>> 
>>>>>> Only if max_open_files helps should you then consider adding
>>>>>> {sst_block_size, 32767} to the eleveldb portion of app.config.  This
>>>>>> setting, given your value sizes, would likely halve the size of the
>>>>>> metadata held in the file cache.  It only impacts the files newly
>>>>>> compacted during the upgrade, and would gradually free up space in the
>>>>>> file cache while slowing down the file cache thrashing.
>>>>> 
>>>>> So I'll do this on the server after next if the next server is fine.
>>>>> 
>>>>>> 
>>>>>> What build/packaging of Riak do you use, or do you build from source?
>>>>> 
>>>>> Using the Debian packages from the Basho site..
>>>>> 
>>>>> I'm really wondering why the "put" performance is so bad.
>>>>> Here are the changes which were introduced/changed only on the newly
>>>>> upgraded servers:
>>>>> 
>>>>> 
>>>>> +        fsm_limit                 => 50000,
>>>>> --- our '+P' is set to 262144, i.e. more than 3x fsm_limit, which was
>>>>> --- the recommendation stated somewhere
>>>>> +        # after finishing the upgrade this should be switched to v1 !!!
>>>>> +        object_format             => '__atom_v0',
>>>>> 
>>>>> -      '-env ERL_MAX_ETS_TABLES' => 8192,
>>>>> +      '-env ERL_MAX_ETS_TABLES'  => 256000, # old package used 8192
>>>>> but 1.4.2 raised it to this high number
>>>>> +      '-env ERL_MAX_PORTS'       => 64000,
>>>>> +      # Treat error_logger warnings as warnings
>>>>> +      '+W'                       => 'w',
>>>>> +      # Tweak GC to run more often
>>>>> +      '-env ERL_FULLSWEEP_AFTER' => 0,
>>>>> +      # Force the erlang VM to use SMP
>>>>> +      '-smp'                     => 'enable',
>>>>> +      #################################
>>>>> 
>>>>> Cheers
>>>>> Simon
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Matthew
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Dec 11, 2013, at 9:48 AM, Simon Effenberg <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Matthew,
>>>>>>> 
>>>>>>> thanks for all your time and work.. see inline for answers..
>>>>>>> 
>>>>>>> On Wed, 11 Dec 2013 09:17:32 -0500
>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>> 
>>>>>>>> The real Riak developers have arrived on-line for the day.  They are 
>>>>>>>> telling me that all of your problems are likely due to the extended 
>>>>>>>> upgrade times, and yes there is a known issue with handoff between 1.3 
>>>>>>>> and 1.4.  They also say everything should calm down after all nodes 
>>>>>>>> are upgraded.
>>>>>>>> 
>>>>>>>> I will review your system settings now and see if there is something 
>>>>>>>> that might make the other machines upgrade quicker.  So three more 
>>>>>>>> questions:
>>>>>>>> 
>>>>>>>> - what is the average size of your keys
>>>>>>> 
>>>>>>> bucket names are between 5 and 15 characters (only ~ 10 buckets)..
>>>>>>> key names are normally something like 26iesj:hovh7egz
>>>>>>> 
>>>>>>>> 
>>>>>>>> - what is the average size of your value (data stored)
>>>>>>> 
>>>>>>> I have to guess.. but the mean (from Riak) is 12kb while the 95th
>>>>>>> percentile is at 75kb, and in theory we have a limit of 1MB (then it
>>>>>>> will be split up). But sometimes, thanks to siblings (we have two
>>>>>>> buckets with allow_mult), we also see up to 7MB in MAX; this will be
>>>>>>> reduced again (it's a new feature in our app which does too many
>>>>>>> parallel writes within 15ms).
>>>>>>> 
>>>>>>>> 
>>>>>>>> - in regular use, are your keys accessed randomly across their entire 
>>>>>>>> range, or do they contain a date component which clusters older, less 
>>>>>>>> used keys
>>>>>>> 
>>>>>>> normally we don't search but retrieve keys by key name.. we have
>>>>>>> data which is up to 6 months old, and we mostly access
>>>>>>> new/active/hot data rather than the old stuff.. besides this we have a
>>>>>>> job doing a 2i query every 5mins and another one doing this maybe once
>>>>>>> an hour.. neither works while the upgrade is ongoing (2i isn't
>>>>>>> working).
>>>>>>> 
>>>>>>> Cheers
>>>>>>> Simon
>>>>>>> 
>>>>>>>> 
>>>>>>>> Matthew
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Dec 11, 2013, at 8:43 AM, Simon Effenberg 
>>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Oh and at the moment they are waiting for some handoffs and I see
>>>>>>>>> errors in logfiles:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013-12-11 13:41:47.948 UTC [error]
>>>>>>>>> <0.7157.24>@riak_core_handoff_sender:start_fold:269 hinted_handoff
>>>>>>>>> transfer of riak_kv_vnode from '[email protected]'
>>>>>>>>> 468137243207554840987117797979434404733540892672
>>>>>>>>> 
>>>>>>>>> but I remember that somebody else had this as well and if I recall
>>>>>>>>> correctly it disappeared after the full upgrade was done.. but at the
>>>>>>>>> moment it's hard to think about upgrading everything at once..
>>>>>>>>> (~12 hours of 100% disk utilization on all 12 nodes would lead to
>>>>>>>>> really slow puts/gets)
>>>>>>>>> 
>>>>>>>>> What can I do?
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> Simon
>>>>>>>>> 
>>>>>>>>> PS: transfers output:
>>>>>>>>> 
>>>>>>>>> '[email protected]' waiting to handoff 17 partitions
>>>>>>>>> '[email protected]' waiting to handoff 19 partitions
>>>>>>>>> 
>>>>>>>>> (these are the 1.4.2 nodes)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, 11 Dec 2013 14:39:58 +0100
>>>>>>>>> Simon Effenberg <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Also some side notes:
>>>>>>>>>> 
>>>>>>>>>> "top" is even better on new 1.4.2 than on 1.3.1 machines.. IO
>>>>>>>>>> utilization of disk is mostly the same (round about 33%)..
>>>>>>>>>> 
>>>>>>>>>> but
>>>>>>>>>> 
>>>>>>>>>> 95th percentile of response time for get (avg over all nodes):
>>>>>>>>>> before upgrade: 29ms
>>>>>>>>>> after upgrade: almost the same
>>>>>>>>>> 
>>>>>>>>>> 95th percentile of response time for put (avg over all nodes):
>>>>>>>>>> before upgrade: 60ms
>>>>>>>>>> after upgrade: 1548ms
>>>>>>>>>> but this is only because 2 of the 12 nodes are
>>>>>>>>>> on 1.4.2 and are really slow (17000ms)
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Simon
>>>>>>>>>> 
>>>>>>>>>> On Wed, 11 Dec 2013 13:45:56 +0100
>>>>>>>>>> Simon Effenberg <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Sorry I forgot the half of it..
>>>>>>>>>>> 
>>>>>>>>>>> seffenberg@kriak46-1:~$ free -m
>>>>>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>>>>>> Mem:         23999      23759        239          0        184      16183
>>>>>>>>>>> -/+ buffers/cache:       7391      16607
>>>>>>>>>>> Swap:            0          0          0
>>>>>>>>>>> 
>>>>>>>>>>> We have 12 servers..
>>>>>>>>>>> datadir on the compacted servers (1.4.2) ~ 765 GB
>>>>>>>>>>> 
>>>>>>>>>>> AAE is enabled.
>>>>>>>>>>> 
>>>>>>>>>>> I attached app.config and vm.args.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers
>>>>>>>>>>> Simon
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, 11 Dec 2013 07:33:31 -0500
>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Ok, I am now suspecting that your servers are either using swap 
>>>>>>>>>>>> space (which is slow) or your leveldb file cache is thrashing 
>>>>>>>>>>>> (opening and closing multiple files per request).
>>>>>>>>>>>> 
>>>>>>>>>>>> How many servers do you have and do you use Riak's active 
>>>>>>>>>>>> anti-entropy feature?  I am going to plug all of this into a 
>>>>>>>>>>>> spreadsheet.
>>>>>>>>>>>> 
>>>>>>>>>>>> Matthew Von-Maszewski
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Dec 11, 2013, at 7:09, Simon Effenberg 
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Matthew
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Memory: 23999 MB
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ring_creation_size, 256
>>>>>>>>>>>>> max_open_files, 100
>>>>>>>>>>>>> 
>>>>>>>>>>>>> riak-admin status:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> memory_total : 276001360
>>>>>>>>>>>>> memory_processes : 191506322
>>>>>>>>>>>>> memory_processes_used : 191439568
>>>>>>>>>>>>> memory_system : 84495038
>>>>>>>>>>>>> memory_atom : 686993
>>>>>>>>>>>>> memory_atom_used : 686560
>>>>>>>>>>>>> memory_binary : 21965352
>>>>>>>>>>>>> memory_code : 11332732
>>>>>>>>>>>>> memory_ets : 10823528
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for looking!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>> Simon
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, 11 Dec 2013 06:44:42 -0500
>>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I need to ask other developers as they arrive for the new day.  
>>>>>>>>>>>>>> Does not make sense to me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> How many nodes do you have?  How much RAM do you have in each 
>>>>>>>>>>>>>> node?  What are your settings for max_open_files and cache_size 
>>>>>>>>>>>>>> in the app.config file?  Maybe this is as simple as leveldb 
>>>>>>>>>>>>>> using too much RAM in 1.4.  The memory accounting for 
>>>>>>>>>>>>>> max_open_files changed in 1.4.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Matthew Von-Maszewski
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Dec 11, 2013, at 6:28, Simon Effenberg 
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Matthew,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> it took around 11 hours for the first node to finish the compaction.
>>>>>>>>>>>>>>> The second node has already been running for 12 hours and is still
>>>>>>>>>>>>>>> doing compaction.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Besides that, I wonder why the fsm_put time on the new 1.4.2 host is
>>>>>>>>>>>>>>> much higher (after the compaction) than on an old 1.3.1 node (both are
>>>>>>>>>>>>>>> running in the cluster right now, and another one is doing the
>>>>>>>>>>>>>>> compaction/upgrade while it is in the cluster but not directly
>>>>>>>>>>>>>>> accessible because it is out of the load balancer):
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1.4.2:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> node_put_fsm_time_mean : 2208050
>>>>>>>>>>>>>>> node_put_fsm_time_median : 39231
>>>>>>>>>>>>>>> node_put_fsm_time_95 : 17400382
>>>>>>>>>>>>>>> node_put_fsm_time_99 : 50965752
>>>>>>>>>>>>>>> node_put_fsm_time_100 : 59537762
>>>>>>>>>>>>>>> node_put_fsm_active : 5
>>>>>>>>>>>>>>> node_put_fsm_active_60s : 364
>>>>>>>>>>>>>>> node_put_fsm_in_rate : 5
>>>>>>>>>>>>>>> node_put_fsm_out_rate : 3
>>>>>>>>>>>>>>> node_put_fsm_rejected : 0
>>>>>>>>>>>>>>> node_put_fsm_rejected_60s : 0
>>>>>>>>>>>>>>> node_put_fsm_rejected_total : 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1.3.1:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> node_put_fsm_time_mean : 5036
>>>>>>>>>>>>>>> node_put_fsm_time_median : 1614
>>>>>>>>>>>>>>> node_put_fsm_time_95 : 8789
>>>>>>>>>>>>>>> node_put_fsm_time_99 : 38258
>>>>>>>>>>>>>>> node_put_fsm_time_100 : 384372
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> any clue why this could/should be?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>> Simon
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 10 Dec 2013 17:21:07 +0100
>>>>>>>>>>>>>>> Simon Effenberg <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Matthew,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> thanks!.. that answers my questions!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>> Simon
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, 10 Dec 2013 11:08:32 -0500
>>>>>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2i is not my expertise, so I had to discuss your concerns with 
>>>>>>>>>>>>>>>>> another Basho developer.  He says:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Between 1.3 and 1.4, the 2i query did change but not the 2i 
>>>>>>>>>>>>>>>>> on-disk format.  You must wait for all nodes to update if you 
>>>>>>>>>>>>>>>>> desire to use the new 2i query.  The 2i data will properly 
>>>>>>>>>>>>>>>>> write/update on both 1.3 and 1.4 machines during the 
>>>>>>>>>>>>>>>>> migration.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Does that answer your question?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> And yes, you might see available disk space increase during 
>>>>>>>>>>>>>>>>> the upgrade compactions if your dataset contains numerous 
>>>>>>>>>>>>>>>>> delete "tombstones".  The Riak 2.0 code includes a new 
>>>>>>>>>>>>>>>>> feature called "aggressive delete" for leveldb.  This feature 
>>>>>>>>>>>>>>>>> is more proactive in pushing delete tombstones through the 
>>>>>>>>>>>>>>>>> levels to free up disk space much more quickly (especially if 
>>>>>>>>>>>>>>>>> you perform block deletes every now and then).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Matthew
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Dec 10, 2013, at 10:44 AM, Simon Effenberg 
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Matthew,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> see inline..
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, 10 Dec 2013 10:38:03 -0500
>>>>>>>>>>>>>>>>>> Matthew Von-Maszewski <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The sad truth is that you are not the first to see this 
>>>>>>>>>>>>>>>>>>> problem.  And yes, it has to do with your 950GB per node 
>>>>>>>>>>>>>>>>>>> dataset.  And no, nothing to do but sit through it at this 
>>>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> While I did extensive testing around upgrade times before 
>>>>>>>>>>>>>>>>>>> shipping 1.4, apparently there are data configurations I 
>>>>>>>>>>>>>>>>>>> did not anticipate.  You are likely seeing a cascade where 
>>>>>>>>>>>>>>>>>>> a shift of one file from level-1 to level-2 is causing a 
>>>>>>>>>>>>>>>>>>> shift of another file from level-2 to level-3, which causes 
>>>>>>>>>>>>>>>>>>> a level-3 file to shift to level-4, etc … then the next 
>>>>>>>>>>>>>>>>>>> file shifts from level-1.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The bright side of this pain is that you will end up with 
>>>>>>>>>>>>>>>>>>> better write throughput once all the compaction ends.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I have to deal with that.. but my problem now is that if I'm
>>>>>>>>>>>>>>>>>> doing this node by node, it looks like 2i searches aren't
>>>>>>>>>>>>>>>>>> possible while 1.3 and 1.4 nodes coexist in the cluster. Is
>>>>>>>>>>>>>>>>>> there any problem that would lead me to a 2i repair marathon,
>>>>>>>>>>>>>>>>>> or could I simply wait a few hours for each node until all
>>>>>>>>>>>>>>>>>> merges are done before I upgrade the next one? (2i searches
>>>>>>>>>>>>>>>>>> can fail for some time.. the APP has no problem with that,
>>>>>>>>>>>>>>>>>> but are new inserts with 2i indices processed successfully,
>>>>>>>>>>>>>>>>>> or do I have to do the 2i repair?)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> /s
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> one other good thing: saving disk space is one advantage ;)..
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Riak 2.0's leveldb has code to prevent/reduce compaction 
>>>>>>>>>>>>>>>>>>> cascades, but that is not going to help you today.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Matthew
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Dec 10, 2013, at 10:26 AM, Simon Effenberg 
>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi @list,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 
>>>>>>>>>>>>>>>>>>>> .. after
>>>>>>>>>>>>>>>>>>>> upgrading the first node (out of 12) this node seems to do 
>>>>>>>>>>>>>>>>>>>> many merges.
>>>>>>>>>>>>>>>>>>>> The sst_* directories change in size "rapidly" and the node has
>>>>>>>>>>>>>>>>>>>> a disk utilization of 100% all the time.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I know that there is something like that:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> "The first execution of 1.4.0 leveldb using a 1.3.x or 
>>>>>>>>>>>>>>>>>>>> 1.2.x dataset
>>>>>>>>>>>>>>>>>>>> will initiate an automatic conversion that could pause the 
>>>>>>>>>>>>>>>>>>>> startup of
>>>>>>>>>>>>>>>>>>>> each node by 3 to 7 minutes. The leveldb data in "level 
>>>>>>>>>>>>>>>>>>>> #1" is being
>>>>>>>>>>>>>>>>>>>> adjusted such that "level #1" can operate as an overlapped 
>>>>>>>>>>>>>>>>>>>> data level
>>>>>>>>>>>>>>>>>>>> instead of as a sorted data level. The conversion is 
>>>>>>>>>>>>>>>>>>>> simply the
>>>>>>>>>>>>>>>>>>>> reduction of the number of files in "level #1" to being 
>>>>>>>>>>>>>>>>>>>> less than eight
>>>>>>>>>>>>>>>>>>>> via normal compaction of data from "level #1" into "level 
>>>>>>>>>>>>>>>>>>>> #2". This is
>>>>>>>>>>>>>>>>>>>> a one time conversion."
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> but it looks much more invasive than explained here, or it may not
>>>>>>>>>>>>>>>>>>>> have anything to do with the merges I'm (probably) seeing.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Is this "normal" behavior or could I do anything about it?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> At the moment I'm stuck with the upgrade procedure because this high
>>>>>>>>>>>>>>>>>>>> IO load would probably lead to high response times.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Also we have a lot of data (per node ~950 GB).
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>> Simon
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> riak-users mailing list
>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international 
>>>>>>>>>>>>>>>>>> GmbH
>>>>>>>>>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> riak-users mailing list
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>>> 
>>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>>> 
>>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>>> 
>>>>>>>>>> Mail:     [email protected]
>>>>>>>>>> Web:    www.mobile.de
>>>>>>>>>> 
>>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>>>> 
>>>>>>>>> Mail:     [email protected]
>>>>>>>>> Web:    www.mobile.de
>>>>>>>>> 
>>>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>>> 
>>>>>>> Mail:     [email protected]
>>>>>>> Web:    www.mobile.de
>>>>>>> 
>>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>>> 
>>>>>>> 
>>>>>>> Geschäftsführer: Malte Krüger
>>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>>> Sitz der Gesellschaft: Kleinmachnow
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>> 
>>>>> Mail:     [email protected]
>>>>> Web:    www.mobile.de
>>>>> 
>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>> 
>>>>> 
>>>>> Geschäftsführer: Malte Krüger
>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>> Sitz der Gesellschaft: Kleinmachnow 
>>>> 
>>> 
>>> 
>>> -- 
>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>> Fon:     + 49-(0)30-8109 - 7173
>>> Fax:     + 49-(0)30-8109 - 7131
>>> 
>>> Mail:     [email protected]
>>> Web:    www.mobile.de
>>> 
>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>> 
>>> 
>>> Geschäftsführer: Malte Krüger
>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>> Sitz der Gesellschaft: Kleinmachnow 
>> 
> 
> 
> -- 
> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
> Fon:     + 49-(0)30-8109 - 7173
> Fax:     + 49-(0)30-8109 - 7131
> 
> Mail:     [email protected]
> Web:    www.mobile.de
> 
> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
> 
> 
> Geschäftsführer: Malte Krüger
> HRB Nr.: 18517 P, Amtsgericht Potsdam
> Sitz der Gesellschaft: Kleinmachnow 


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
