Re: HBase Key Design : Doubt

Jean-Marc Spaggiari Thu, 11 Oct 2012 06:31:53 -0700

No, you're right.

But if you just want to keep "500" as the value, you just have to set
the maximum number of versions to 1 for that column family...
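
To make the per-cell versioning concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: `ToyTable`, `put` and `get` are hypothetical names, timestamps are passed in by hand, and old versions are pruned eagerly on write, whereas HBase only drops them physically at compaction time.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of HBase cell versioning (NOT the HBase client API)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        # A cell is identified by (row, family, qualifier); each cell maps
        # timestamp -> value, so versions belong to the cell, not to the row.
        self.cells = defaultdict(dict)

    def put(self, row, family, qualifier, value, ts):
        versions = self.cells[(row, family, qualifier)]
        versions[ts] = value
        # Assumption: prune eagerly, keeping only the newest max_versions.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row, family, qualifier, versions=1):
        """Return up to `versions` values, newest timestamp first."""
        stored = self.cells.get((row, family, qualifier), {})
        newest_first = sorted(stored.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]


# Same puts as in the question, with 3 versions allowed.
t = ToyTable(max_versions=3)
t.put("R1", "A", "qualf1", 100, ts=1)
t.put("R1", "B", "qualf2", 200, ts=2)
t.put("R1", "A", "qualf1", 500, ts=3)

latest = t.get("R1", "A", "qualf1")               # newest version only
history = t.get("R1", "A", "qualf1", versions=3)  # older versions on request
other = t.get("R1", "B", "qualf2", versions=3)    # B:qualf2 is a separate cell

# With max versions = 1, only the latest put survives.
single = ToyTable(max_versions=1)
single.put("R1", "A", "qualf1", 100, ts=1)
single.put("R1", "A", "qualf1", 500, ts=3)
only = single.get("R1", "A", "qualf1", versions=3)
```

In this model the second put on A:qualf1 adds a version to that one cell and leaves B:qualf2 untouched, and a plain get returns only the newest value unless more versions are asked for.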


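If you just want to keep 100, then you can insert with a reversed
timestamp (for example Long.MAX_VALUE minus the current time), so each
later insert gets a smaller timestamp and the last cell inserted will be
hidden by the previous one.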
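
A rough sketch of the reversed-timestamp idea, again in plain Python rather than the HBase API. The helper names `reversed_ts` and `visible_value` are made up for illustration, and the only rule assumed is that a read returns the version with the highest timestamp.

```python
LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def reversed_ts(ts_millis):
    """Map a later wall-clock time to a smaller cell timestamp."""
    return LONG_MAX - ts_millis


def visible_value(versions):
    """versions: list of (timestamp, value); the highest timestamp wins."""
    return max(versions, key=lambda tv: tv[0])[1]


day1 = 1349913600000        # an arbitrary epoch-millisecond timestamp
day2 = day1 + 86_400_000    # one day later

# Normal timestamps: the later put (500) is the visible one.
normal = [(day1, 100), (day2, 500)]
newest_wins = visible_value(normal)

# Reversed timestamps: the earlier put (100) keeps the highest timestamp,
# so the later put (500) is hidden behind it.
flipped = [(reversed_ts(day1), 100), (reversed_ts(day2), 500)]
first_wins = visible_value(flipped)
```

With max versions set to 1, the hidden later cell would also be the one that gets discarded, so the first value written is the one that is kept.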

JM

2012/10/11, Narayanan K <[email protected]>:
> Hi,
>
> I have 2 column families A and B in table T1.
>
> put 'T1', 'R1', 'A:qualf1', 100
> put 'T1', 'R1', 'B:qualf2', 200
>
> As per my understanding the above is one row and one single version each
> for the 2 column families.
>
> If I do a put 'T1', 'R1', 'A:qualf1', 500, then there is another version
> for the rowkey pertaining to the combination {R1, A, qualf1}
>
> Please correct me if I am wrong.
>
> Regards,
> Narayanan
>
> On Thu, Oct 11, 2012 at 1:02 AM, Doug Meil
> <[email protected]> wrote:
>
>>
>> Correct.
>>
>> If you do 2 Puts for row key A-B-C-D on different days, the second Put
>> logically replaces the first and the earlier Put becomes a previous
>> version.  Unless you specifically want older versions, you won't get them
>> in either Gets or Scans.
>>
>> Definitely want to read this...
>>
>> http://hbase.apache.org/book.html#datamodel
>>
>> See this for more information about the internal KeyValue structure.
>>
>> http://hbase.apache.org/book.html#regions.arch
>> 9.7.5.4. KeyValue
>>
>>
>> Older versions are kept around as long as the table descriptor says so
>> (e.g., max versions).  See the StoreFile and Compactions entries in the
>> RefGuide for more information on the internals.
>>
>>
>>
>>
>> On 10/10/12 3:24 PM, "Jerry Lam" <[email protected]> wrote:
>>
>> >Correct me if I'm wrong: the version applies to the individual cell (i.e.
>> >row key, column family and column qualifier), not to (row key, column
>> >family).
>> >
>> >
>> >On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K <[email protected]>
>> >wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have a use case wherein I need to find the unique count of some things
>> >> in HBase across dates.
>> >>
>> >> Say, on 1st Oct, A-B-C-D appeared, hence I insert a row with rowkey:
>> >> A-B-C-D.
>> >> On 2nd Oct, I get the same value A-B-C-D and I don't want to redundantly
>> >> store the row again with a new rowkey for 2nd Oct, i.e. I do not want to
>> >> have 20121001-A-B-C-D and 20121002-A-B-C-D as 2 rowkeys in the table.
>> >>
>> >> E.g.: if I have 1st Oct and 2nd Oct as 2 column families and the number
>> >> of versions is set to 1, only 1 row will be present for both dates,
>> >> having rowkey A-B-C-D.
>> >> Hence if I need to find the unique number of times A-B-C-D appeared
>> >> during Oct 1 and Oct 2, I just need to take the rowcount of the row
>> >> A-B-C-D by filtering over the 2 column families.
>> >> Similarly, if we have 10 date column families and I need to scan for
>> >> only 2 dates, then it scans only those store files belonging to the
>> >> specified column families. This will make scanning faster.
>> >>
>> >> But here the design problem is that I can't add more column families to
>> >> the table each day.
>> >>
>> >> I would need to store data every day, and I read that HBase doesn't work
>> >> well with more than 3 column families.
>> >>
>> >> The other option is to have one single column family and store dates as
>> >> qualifiers: date:d1, date:d2.... But here, if there are 30 date
>> >> qualifiers under the date column family, scanning a single date
>> >> qualifier, or maybe a range of 2-3 dates, will have to scan through the
>> >> entire data of all d1 to d30 qualifiers in the date column family, which
>> >> would be slower compared to having separate column families for each
>> >> date.
>> >>
>> >> Please share your thoughts on this. Also any alternate design
>> >> suggestions you might have.
>> >>
>> >> Regards,
>> >> Narayanan
>> >>
>>
>>
>>
>
