RE: HBase Design : Column name v/s Version

Vladimir Rodionov Fri, 24 Jan 2014 11:26:11 -0800

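One downside of using synthetic versions is that you won't be able to use TTL,
which gives you automatic purging of stale data for free: TTL expiry is based
on the cell timestamp, and a synthetic version is stored in its place.
Have you already thought about how to purge old data?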
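
To illustrate the TTL point, here is a minimal sketch. It assumes the TTL check
compares each cell's version, read as epoch milliseconds, against the current
wall-clock time. The class and method names are hypothetical and only model the
rule; this is not HBase code or the HBase API.

```java
// Hypothetical, simplified model of a TTL expiry rule; not HBase code.
class TtlSketch {
    // A cell counts as expired when its version, read as epoch millis,
    // is older than (now - ttl).
    static boolean isExpired(long versionMs, long nowMs, long ttlMs) {
        return versionMs < nowMs - ttlMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDayMs = 24L * 60 * 60 * 1000;

        // Timestamp-based version written one hour ago: inside the TTL window.
        System.out.println("recent expired: "
                + isExpired(now - 60L * 60 * 1000, now, oneDayMs));

        // Synthetic versions 1, 2, 3 look like instants in January 1970,
        // so a one-day TTL treats every one of them as already expired.
        for (long v : new long[] {1L, 2L, 3L}) {
            System.out.println("version " + v + " expired: "
                    + isExpired(v, now, oneDayMs));
        }
    }
}
```

Under this assumption, setting a TTL on a column family that holds synthetic
versions would expire the data right away, so purging has to be handled by
explicit deletes or some other client-side scheme.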


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Sagar Naik [[email protected]]
Sent: Friday, January 24, 2014 10:46 AM
To: [email protected]; Dhaval Shah
Subject: Re: HBase Design : Column name v/s Version

Thanks for clarifying,

I will be using custom version numbers (auto incrementing on the client
side) and not timestamps.
No two clients update the same row.


-Sagar

On 1/24/14 10:33 AM, "Dhaval Shah" <[email protected]> wrote:

>I am talking about schema 2. Schema 1 would definitely work. Schema 2 can
>have version collisions if you decide to use timestamps as versions.
>
>Regards,
>
>Dhaval
>
>
>----- Original Message -----
>From: Sagar Naik <[email protected]>
>To: "[email protected]" <[email protected]>; Dhaval Shah
><[email protected]>
>Cc:
>Sent: Friday, 24 January 2014 1:07 PM
>Subject: Re: HBase Design : Column name v/s Version
>
>I am not sure I understand you correctly.
>I assume you are talking about schema 1.
>In this case I am appending the version number to the column name.
>
>The column_names are different (data_1/data_2) for value_1 and value_2
>respectively.
>
>
>-Sagar
>
>
>On 1/24/14 9:47 AM, "Dhaval Shah" <[email protected]> wrote:
>
>>Versions in HBase are timestamps by default. If you intend to continue
>>using the timestamps, what will happen when someone writes value_1 and
>>value_2 at the exact same time?
>>
>>Regards,
>>
>>Dhaval
>>
>>
>>----- Original Message -----
>>From: Sagar Naik <[email protected]>
>>To: "[email protected]" <[email protected]>
>>Cc:
>>Sent: Friday, 24 January 2014 12:27 PM
>>Subject: HBase Design : Column name v/s Version
>>
>>Hi,
>>
>>I have a choice to maintain the data either in column values or as
>>versioned data.
>>This data is not a versioned copy per se.
>>
>>The access pattern on this is to get all the data every time.
>>
>>So the schema choices are :
>>Schema 1:
>>1. column_name/qualifier => data_1. column_value => value_1
>>1.a. column_name/qualifier => data_2. column_value => value_2,value_2.a
>>
>>1.b. column_name/qualifier => data_3. column_value => value_3
>>
>>To get all the values for "data", I will have to use ColumnPrefixFilter
>>with the prefix set to "data".
>>
>>Schema 2:
>>2. column_name/qualifier => data. version=> 1, column_value => value_1
>>
>>2.a. column_name/qualifier => data. version=> 2, column_value =>
>>value_2,value_2.a
>>
>>2.b. column_name/qualifier => data. version=> 3, column_value => value_3
>>To get all the values for "data", I will do a simple get operation to get
>>all the versions.
>>
>>Number of versions can go from: 10 to 100K
>>
>>Get operation perf should beat the Filter perf.
>>Comparing 100K values will be costly as the number of versions increases.
>>
>>I would like to know if there are drawbacks in going the version route.
>>
>>
>>
>>
>>-Sagar
>>
>

