One downside of using synthetic versions is that you won't be able to use TTL, which gives you automatic purging of stale data for free. Have you already thought about how to purge old data?
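To make the TTL point concrete, here is a minimal sketch in plain Python (not HBase client code). It assumes the TTL check compares a cell's version, read as epoch milliseconds, against the current time, so a client-side counter such as 1, 2, 3 looks decades old. The names `is_expired` and `versions_to_purge` are hypothetical helpers for illustration only.

```python
import time
from typing import List


def is_expired(version_ms: int, ttl_seconds: int, now_ms: int) -> bool:
    """Simplified model of a TTL check: the cell is stale once its
    version, interpreted as epoch milliseconds, is older than the TTL."""
    return version_ms < now_ms - ttl_seconds * 1000


def versions_to_purge(versions: List[int], keep_latest: int) -> List[int]:
    """Hypothetical manual purge: return every version except the
    newest `keep_latest` ones, in ascending order."""
    newest_first = sorted(versions, reverse=True)
    return sorted(newest_first[keep_latest:])


now_ms = int(time.time() * 1000)
ONE_DAY = 24 * 60 * 60  # TTL in seconds

# Wall-clock version written one hour ago: still inside the TTL window.
print(is_expired(now_ms - 3_600_000, ONE_DAY, now_ms))

# Synthetic version 3: read as 3 ms after the epoch, so it counts as expired.
print(is_expired(3, ONE_DAY, now_ms))

# Without TTL, the client has to choose which versions to delete itself.
print(versions_to_purge([1, 2, 3, 4, 5], keep_latest=2))
```

With synthetic versions the purge therefore has to be driven by the client, along the lines of the second helper. Capping the column family's maximum number of versions is another way to bound growth, if a count-based limit fits the data.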
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Sagar Naik [[email protected]]
Sent: Friday, January 24, 2014 10:46 AM
To: [email protected]; Dhaval Shah
Subject: Re: HBase Design : Column name v/s Version

Thanks for clarifying,
I will be using custom version numbers (auto incrementing on the client side) and not timestamps.
Two clients do not update the same row

-Sagar

On 1/24/14 10:33 AM, "Dhaval Shah" <[email protected]> wrote:

>I am talking about schema 2. Schema 1 would definitely work. Schema 2 can
>have the version collisions if you decide to use timestamps as versions
>
>Regards,
>
>Dhaval
>
>
>----- Original Message -----
>From: Sagar Naik <[email protected]>
>To: "[email protected]" <[email protected]>; Dhaval Shah
><[email protected]>
>Cc:
>Sent: Friday, 24 January 2014 1:07 PM
>Subject: Re: HBase Design : Column name v/s Version
>
>I am not sure I understand you correctly.
>I assume you are talking abt schema 1.
>In this case I m appending the version number to the column name.
>
>The column_names are different (data_1/data_2) for value_1 and value_2
>respectively.
>
>
>-Sagar
>
>
>On 1/24/14 9:47 AM, "Dhaval Shah" <[email protected]> wrote:
>
>>Versions in HBase are timestamps by default. If you intend to continue
>>using the timestamps, what will happen when someone writes value_1 and
>>value_2 at the exact same time?
>>
>>Regards,
>>
>>Dhaval
>>
>>
>>----- Original Message -----
>>From: Sagar Naik <[email protected]>
>>To: "[email protected]" <[email protected]>
>>Cc:
>>Sent: Friday, 24 January 2014 12:27 PM
>>Subject: HBase Design : Column name v/s Version
>>
>>Hi,
>>
>>I have a choice to maintain to data either in column values or as
>>versioned data.
>>This data is not a versioned copy per se.
>>
>>The access pattern on this get all the data every time
>>
>>So the schema choices are :
>>Schema 1:
>>1. column_name/qualifier => data_1. column_value => value_1
>>1.a. column_name/qualifier => data_2. column_value => value_2,value_2.a
>>
>>1.b. column_name/qualifier => data_3. column_value => value_3
>>
>>To get all the values for "data", I will have to use ColumnPrefixFilter
>>with prefix set "data"
>>
>>Schema 2:
>>2. column_name/qualifier => data. version=> 1, column_value => value_1
>>
>>2.a. column_name/qualifier => data. version=> 2, column_value =>
>>value_2,value_2.a
>>
>>2.b. column_name/qualifier => data. version=> 3, column_value => value_3
>>To get all the values for "data" , I will do a simple get operation to
>>get
>>all the versions.
>>
>>Number of versions can go from: 10 to 100K
>>
>>Get operation perf should beat the Filter perf.
>>Comparing 100K values will be costly as the # versions increase.
>>
>>I would like to know if there are drawbacks in going the version route.
>>
>>
>>
>>
>>-Sagar
>>
>
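The two schemas in the original question, and the collision Dhaval raised, can be sketched with a toy in-memory row. This is plain Python, not the HBase client API: `Row`, `put`, `get_by_prefix`, and `get_all_versions` are hypothetical names, and the model assumes only that a cell is addressed by (qualifier, version) and that a second write to the same coordinates replaces the first.

```python
from typing import Dict, List, Tuple

# Toy model of one row: (qualifier, version) -> value.
Row = Dict[Tuple[str, int], str]


def put(row: Row, qualifier: str, version: int, value: str) -> None:
    """Write a cell; identical (qualifier, version) coordinates overwrite."""
    row[(qualifier, version)] = value


def get_by_prefix(row: Row, prefix: str) -> List[str]:
    """Schema 1 read: newest value of every qualifier starting with prefix."""
    latest: Dict[str, Tuple[int, str]] = {}
    for (qualifier, version), value in row.items():
        if qualifier.startswith(prefix):
            if qualifier not in latest or version > latest[qualifier][0]:
                latest[qualifier] = (version, value)
    return [latest[q][1] for q in sorted(latest)]


def get_all_versions(row: Row, qualifier: str) -> List[str]:
    """Schema 2 read: every version of one qualifier, newest first."""
    cells = [(v, val) for (q, v), val in row.items() if q == qualifier]
    return [val for _, val in sorted(cells, reverse=True)]


# Schema 1: the counter lives in the qualifier, so equal timestamps are harmless.
schema1: Row = {}
put(schema1, "data_1", 1000, "value_1")
put(schema1, "data_2", 1000, "value_2,value_2.a")
put(schema1, "data_3", 1000, "value_3")
print(get_by_prefix(schema1, "data"))

# Schema 2 with a client-side counter as the version: all three cells survive.
schema2: Row = {}
put(schema2, "data", 1, "value_1")
put(schema2, "data", 2, "value_2,value_2.a")
put(schema2, "data", 3, "value_3")
print(get_all_versions(schema2, "data"))

# Schema 2 with timestamps as versions: two writes in the same millisecond collide.
collided: Row = {}
put(collided, "data", 1000, "value_1")
put(collided, "data", 1000, "value_2")
print(get_all_versions(collided, "data"))
```

In this model the last read returns a single value, which is the loss Dhaval warned about. The client-side counter avoids it as long as no two writers reuse a version number for the same row.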
