Re: Upgrade from 1.3.1 to 1.4.2 => high IO

Simon Effenberg Tue, 10 Dec 2013 08:23:28 -0800

Hi Matthew,

Thanks! That answers my questions!


Cheers
Simon

On Tue, 10 Dec 2013 11:08:32 -0500
Matthew Von-Maszewski <[email protected]> wrote:

> 2i is not my expertise, so I had to discuss your concerns with another Basho 
> developer.  He says:
> 
> Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format.  
> You must wait for all nodes to update if you desire to use the new 2i query.  
> The 2i data will properly write/update on both 1.3 and 1.4 machines during 
> the migration.
> 
> Does that answer your question?
> 
> 
> And yes, you might see available disk space increase during the upgrade 
> compactions if your dataset contains numerous delete "tombstones".  The Riak 
> 2.0 code includes a new feature called "aggressive delete" for leveldb.  This 
> feature is more proactive in pushing delete tombstones through the levels to 
> free up disk space much more quickly (especially if you perform block deletes 
> every now and then).
> 
> Matthew
> 
> 
> On Dec 10, 2013, at 10:44 AM, Simon Effenberg <[email protected]> 
> wrote:
> 
> > Hi Matthew,
> > 
> > see inline..
> > 
> > On Tue, 10 Dec 2013 10:38:03 -0500
> > Matthew Von-Maszewski <[email protected]> wrote:
> > 
> >> The sad truth is that you are not the first to see this problem.  And yes, 
> >> it has to do with your 950GB per node dataset.  And no, nothing to do but 
> >> sit through it at this time.
> >> 
> >> While I did extensive testing around upgrade times before shipping 1.4, 
> >> apparently there are data configurations I did not anticipate.  You are 
> >> likely seeing a cascade where a shift of one file from level-1 to level-2 
> >> is causing a shift of another file from level-2 to level-3, which causes a 
> >> level-3 file to shift to level-4, etc … then the next file shifts from 
> >> level-1.
> >> 
> >> The bright side of this pain is that you will end up with better write 
> >> throughput once all the compaction ends.
> > 
> > I have to deal with that, but my problem now is this: if I upgrade
> > node by node, it looks like 2i searches aren't possible while 1.3 and
> > 1.4 nodes coexist in the cluster. Is there any problem that would force
> > me into a 2i repair marathon, or can I simply wait some hours on each
> > node until all merges are done before I upgrade the next one? (2i
> > searches can fail for a while; the app can cope with that. But are new
> > inserts with 2i indices processed successfully, or do I have to run a
> > 2i repair?)
> > 
> > /s
> > 
> > One other good thing: saving disk space is one advantage ;)
> > 
> > 
> >> 
> >> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but 
> >> that is not going to help you today.
> >> 
> >> Matthew
> >> 
> >> On Dec 10, 2013, at 10:26 AM, Simon Effenberg <[email protected]> 
> >> wrote:
> >> 
> >>> Hi @list,
> >>> 
> >>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After
> >>> upgrading the first node (out of 12), this node seems to do many merges.
> >>> The sst_* directories change in size "rapidly" and the node's disk
> >>> utilization is at 100% all the time.
> >>> 
> >>> I know about the following note:
> >>> 
> >>> "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
> >>> will initiate an automatic conversion that could pause the startup of
> >>> each node by 3 to 7 minutes. The leveldb data in "level #1" is being
> >>> adjusted such that "level #1" can operate as an overlapped data level
> >>> instead of as a sorted data level. The conversion is simply the
> >>> reduction of the number of files in "level #1" to being less than eight
> >>> via normal compaction of data from "level #1" into "level #2". This is
> >>> a one time conversion."
> >>> 
> >>> but it looks much more invasive than explained here, or it has nothing
> >>> to do with the merges I'm (probably) seeing.
> >>> 
> >>> Is this "normal" behavior or could I do anything about it?
> >>> 
> >>> At the moment I'm stuck with the upgrade procedure, because this high
> >>> IO load would probably lead to high response times.
> >>> 
> >>> Also, we have a lot of data (~950 GB per node).
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> _______________________________________________
> >>> riak-users mailing list
> >>> [email protected]
> >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >> 
> > 
> > 
> > -- 
> > Simon Effenberg | Site Ops Engineer | mobile.international GmbH
> > Fon:     + 49-(0)30-8109 - 7173
> > Fax:     + 49-(0)30-8109 - 7131
> > 
> > Mail:     [email protected]
> > Web:    www.mobile.de
> > 
> > Marktplatz 1 | 14532 Europarc Dreilinden | Germany
> > 
> > 
> > Geschäftsführer: Malte Krüger
> > HRB Nr.: 18517 P, Amtsgericht Potsdam
> > Sitz der Gesellschaft: Kleinmachnow 
> 
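Matthew's description of a compaction cascade (one file moving from level-1 to level-2 forces a level-2 file into level-3, which forces a level-3 file into level-4, and so on) can be illustrated with a toy model. This is a hypothetical sketch, not leveldb or Basho code: it tracks only a file count per level and uses made-up per-level file limits (`limits`) in place of leveldb's real size-based thresholds. The function name `push_file` and all numbers are assumptions for illustration.

```python
"""Toy model of a leveled-compaction cascade.

Hypothetical illustration only: real leveldb compacts by key range and uses
size-based level limits; here each level is reduced to a file count.
"""
from typing import List


def push_file(levels: List[int], limits: List[int], src: int) -> List[int]:
    """Move one file from level `src` to level `src + 1`.

    If the receiving level now exceeds its (hypothetical) file limit, it in
    turn pushes one file further down, and so on. The last level has no
    limit. `levels` is modified in place; the return value lists the source
    level of every file move performed, in order.
    """
    moves: List[int] = []
    level = src
    while level + 1 < len(levels):
        levels[level] -= 1
        levels[level + 1] += 1
        moves.append(level)
        receiver = level + 1
        # Stop when the receiving level is the last one or is back within its limit.
        if receiver == len(levels) - 1 or levels[receiver] <= limits[receiver]:
            break
        level = receiver
    return moves


if __name__ == "__main__":
    limits = [4, 8, 10, 100, 1000, 10**9]  # hypothetical per-level file limits

    # Every level below level-1 is already at its limit: one push cascades down.
    full = [0, 9, 10, 100, 1000, 5]
    print(push_file(full, limits, src=1))   # [1, 2, 3, 4]
    print(full)                             # [0, 8, 10, 100, 1000, 6]

    # Level-2 has headroom: the same push stops after a single move.
    roomy = [0, 9, 5, 100, 1000, 5]
    print(push_file(roomy, limits, src=1))  # [1]
```

In this model the cost of a single push depends on how close every lower level already is to its limit, which is one way to read Matthew's point that a large, densely filled dataset is what turns one file move into a long chain of compactions.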

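The quoted 1.4.0 note describes the one-time conversion as compacting data from "level #1" into "level #2" until "level #1" holds fewer than eight files. A hypothetical sketch of that loop follows, assuming each .sst file is reduced to a (smallest_key, largest_key) range, that the lowest key range in level #1 is compacted first, and that each compaction writes its merged output back as a single level #2 file (real leveldb splits output by size and tracks bytes, not just ranges). The names `compact_into_next`, `convert_level1` and `file_limit` are made up for illustration.

```python
"""Toy sketch of the 1.3 -> 1.4 level #1 conversion (hypothetical model)."""
from typing import List, Tuple

KeyRange = Tuple[int, int]  # (smallest_key, largest_key) of one hypothetical file


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """True if the two key ranges share at least one key."""
    return a[0] <= b[1] and b[0] <= a[1]


def compact_into_next(victim: KeyRange, next_level: List[KeyRange]) -> List[KeyRange]:
    """Merge `victim` with every overlapping file of the next level.

    Toy assumption: the merged output is one file spanning all inputs.
    """
    inputs = [f for f in next_level if overlaps(f, victim)]
    rest = [f for f in next_level if not overlaps(f, victim)]
    lo = min([victim[0]] + [f[0] for f in inputs])
    hi = max([victim[1]] + [f[1] for f in inputs])
    return sorted(rest + [(lo, hi)])


def convert_level1(level1: List[KeyRange], level2: List[KeyRange], file_limit: int = 8):
    """Compact level #1 files into level #2 until level #1 has < file_limit files."""
    level1, level2 = sorted(level1), sorted(level2)
    compactions = 0
    while len(level1) >= file_limit:
        victim = level1.pop(0)  # toy choice: lowest key range first
        level2 = compact_into_next(victim, level2)
        compactions += 1
    return level1, level2, compactions


if __name__ == "__main__":
    l1 = [(i * 10, i * 10 + 9) for i in range(12)]  # 12 non-overlapping files
    l2 = [(0, 15), (16, 30), (100, 149)]
    new1, new2, n = convert_level1(l1, l2)
    print(n, len(new1), new2)  # 5 7 [(0, 39), (40, 49), (100, 149)]
```

In this toy model each compaction rewrites the chosen level #1 file plus every level #2 file whose key range overlaps it, so the amount of data rewritten grows with how much already sits in level #2, even though the loop itself stops as soon as the file count drops below the limit.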
