Hi,
I have 2 column families A and B in table T1.
put 'T1', 'R1', 'A:qualf1', 100
put 'T1', 'R1', 'B:qualf2', 200
As per my understanding, the above is one row, with a single version in each
of the 2 column families.
If I then do put 'T1', 'R1', 'A:qualf1', 500, there is another version of
the cell identified by the combination {R1, A, qualf1}.
Please correct me if I am wrong.
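The sketch below is a toy in-memory model of the data model described above. It is not the HBase client API. A cell is keyed by (row, family, qualifier), and each cell keeps its own version list. The class name `ToyTable` and the integer counter used in place of the server-assigned timestamp are illustrative assumptions.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of HBase's logical layout (NOT the HBase API)."""

    def __init__(self):
        # (row, family, qualifier) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)
        # Stand-in for the timestamp HBase would assign to each Put
        self.clock = 0

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        self.clock += 1
        # A new version is added to this one cell only
        self.cells[(row, family, qualifier)].insert(0, (self.clock, value))

    def rows(self):
        return {row for (row, _, _) in self.cells}

    def versions(self, row, column):
        family, qualifier = column.split(":", 1)
        # .get() does not create an entry in the defaultdict
        return [v for (_, v) in self.cells.get((row, family, qualifier), [])]


t1 = ToyTable()
t1.put("R1", "A:qualf1", 100)
t1.put("R1", "B:qualf2", 200)
print(len(t1.rows()), len(t1.cells))  # 1 2  (one row, two cells)
t1.put("R1", "A:qualf1", 500)
print(t1.versions("R1", "A:qualf1"))  # [500, 100]
print(t1.versions("R1", "B:qualf2"))  # [200]
```

The third Put adds a second version to {R1, A, qualf1} only. The cell in family B still has one version.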
Regards,
Narayanan
On Thu, Oct 11, 2012 at 1:02 AM, Doug Meil <[email protected]> wrote:
>
> Correct.
>
> If you do 2 Puts to the same cell (row key A-B-C-D, same column) on
> different days, the second Put logically replaces the first, and the
> earlier Put becomes a previous version. Unless you specifically request
> older versions, you won't get them from either Gets or Scans.
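Here is a minimal sketch of that read behaviour, again as a toy model rather than HBase code. The cell `d:seen` and the readable timestamps are hypothetical; real timestamps default to epoch milliseconds. In the HBase shell, older versions are requested with something like `get 'T1', 'R1', {COLUMN => 'A:qualf1', VERSIONS => 2}`.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str]     # (row key, "family:qualifier")
Version = Tuple[int, str]  # (timestamp, value)


def put(store: Dict[Cell, List[Version]], row: str, column: str,
        ts: int, value: str) -> None:
    """Add a version to one cell, keeping versions sorted newest first."""
    versions = store.setdefault((row, column), [])
    versions.append((ts, value))
    versions.sort(key=lambda v: v[0], reverse=True)


def get(store: Dict[Cell, List[Version]], row: str, column: str,
        max_versions: int = 1) -> List[Version]:
    """Mimic Get semantics: newest version only, unless more are requested."""
    return store.get((row, column), [])[:max_versions]


store: Dict[Cell, List[Version]] = {}
# Two Puts to the same hypothetical cell on different days
put(store, "A-B-C-D", "d:seen", 20121001, "day1")
put(store, "A-B-C-D", "d:seen", 20121002, "day2")

latest = get(store, "A-B-C-D", "d:seen")                   # newest only
history = get(store, "A-B-C-D", "d:seen", max_versions=2)  # both versions
print(latest)
print(history)
```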
>
> You'll definitely want to read this...
>
> http://hbase.apache.org/book.html#datamodel
>
> See this for more information about the internal KeyValue structure.
>
> http://hbase.apache.org/book.html#regions.arch
> (section 9.7.5.4, KeyValue)
>
>
> Older versions are kept around for as long as the column family settings
> in the table descriptor allow (e.g., max versions). See the StoreFile and
> Compactions entries in the RefGuide for more information on the internals.
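The next sketch models only the retention rule, and it is a toy stand-in rather than HBase code. Versions beyond a family's max-versions setting are not returned by reads, and they are eventually dropped when a compaction rewrites the store files. The function name `retain` is hypothetical.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp, value)


def retain(versions: List[Version], max_versions: int) -> List[Version]:
    """Keep the newest max_versions versions of a single cell.

    Toy model of max-versions retention; not HBase's compaction code.
    """
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:max_versions]


cell = [(1, "a"), (3, "c"), (2, "b")]
print(retain(cell, 1))  # [(3, 'c')]
print(retain(cell, 2))  # [(3, 'c'), (2, 'b')]
```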
>
> On 10/10/12 3:24 PM, "Jerry Lam" <[email protected]> wrote:
>
> >Correct me if I'm wrong: a version applies to an individual cell (i.e.,
> >row key, column family and column qualifier), not to (row key, column family).
> >
> >
> >On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I have a use case where I need to find the unique count of some things
> >> in HBase across dates.
> >>
> >> Say, on 1st Oct, A-B-C-D appeared, so I insert a row with row key
> >> A-B-C-D. On 2nd Oct, I get the same value A-B-C-D, and I don't want to
> >> redundantly store the row again under a new row key for 2nd Oct,
> >> i.e., I don't want to have 20121001-A-B-C-D and 20121002-A-B-C-D as 2
> >> row keys in the table.
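The sketch below contrasts the two row-key choices. It uses plain Python sets to count the distinct keys each scheme would create; the helper `row_keys` is hypothetical.

```python
from typing import Iterable, Set, Tuple

# Each event is (date, value), using the values from the thread
events = [("20121001", "A-B-C-D"), ("20121002", "A-B-C-D")]


def row_keys(events: Iterable[Tuple[str, str]], date_prefixed: bool) -> Set[str]:
    """Distinct row keys that writing these events would create."""
    if date_prefixed:
        # Scheme to avoid: one row per (date, value) pair
        return {f"{date}-{value}" for date, value in events}
    # Value-only key: a repeat on a later day lands on the same row
    return {value for _, value in events}


print(sorted(row_keys(events, date_prefixed=True)))
print(sorted(row_keys(events, date_prefixed=False)))
```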
> >>
> >> E.g., if I have 1st Oct and 2nd Oct as 2 column families, and the
> >> number of versions is set to 1, only 1 row, with row key A-B-C-D, will be
> >> present for both dates.
> >> Hence, if I need to find the unique number of times A-B-C-D appeared
> >> during Oct 1 and Oct 2, I just need to take the row count for A-B-C-D,
> >> filtering over the 2 column families.
> >> Similarly, if we have 10 date column families and I need to scan only
> >> 2 dates, the scan reads only the store files of the specified column
> >> families. This will make scanning faster.
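A rough model of the family-per-date idea is sketched below. It treats each column family as its own store, holding the row keys that have data for that date. A query reads only the requested families, loosely mirroring a Java client Scan restricted with `Scan.addFamily`. The family names such as `d20121001` and both helper functions are hypothetical, and the data is invented for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical family-per-date layout: family name -> row keys with data in it
stores: Dict[str, Set[str]] = {
    "d20121001": {"A-B-C-D", "E-F-G-H"},
    "d20121002": {"A-B-C-D"},
    "d20121003": {"W-X-Y-Z"},
}


def count_unique(stores: Dict[str, Set[str]], families: Iterable[str]) -> int:
    """Distinct row keys with data in any requested family.

    Only the requested families' stores are touched.
    """
    seen: Set[str] = set()
    for family in families:
        seen |= stores.get(family, set())
    return len(seen)


def days_seen(stores: Dict[str, Set[str]], row: str, families: Iterable[str]) -> int:
    """Number of requested date families in which this row has data."""
    return sum(1 for f in families if row in stores.get(f, set()))


oct_1_2 = ["d20121001", "d20121002"]
print(count_unique(stores, oct_1_2))          # 2
print(days_seen(stores, "A-B-C-D", oct_1_2))  # 2
```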
> >>
> >> But the design problem here is that I can't add more column families to
> >> the table each day.
> >>
> >> I would need to store data every day, and I read that HBase doesn't work
> >> well with more than 3 column families.
> >>
> >> The other option is to have one single column family and store dates as
> >> qualifiers: date:d1, date:d2, ... But if there are 30 date qualifiers
> >> under the date column family, scanning a single date qualifier, or maybe
> >> a range of 2-3 dates, will have to read through the data of all the d1
> >> to d30 qualifiers in the date column family, which would be slower than
> >> having a separate column family for each date.
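The sketch below concerns the single-family alternative. Within a row and family, HBase keeps cells sorted by qualifier bytes, so a date range maps to a contiguous run of qualifiers. That only holds if qualifiers sort in date order; unpadded names like d1..d30 do not. The sketch assumes zero-padded yyyymmdd qualifiers, which is a hypothetical naming choice. It only illustrates the ordering and says nothing about how much I/O a real scan does. HBase also has a `ColumnRangeFilter` for selecting a qualifier range; check its documentation for your version.

```python
from bisect import bisect_left, bisect_right
from typing import List

# Unpadded qualifiers do not sort chronologically
print(sorted(["d1", "d2", "d10"]))  # ['d1', 'd10', 'd2']

# Hypothetical zero-padded yyyymmdd qualifiers for one row, in byte order
row_qualifiers = sorted(f"201210{day:02d}" for day in range(1, 31))


def qualifier_range(sorted_qualifiers: List[str], start: str, stop: str) -> List[str]:
    """Qualifiers in [start, stop], found by binary search over sorted names."""
    lo = bisect_left(sorted_qualifiers, start)
    hi = bisect_right(sorted_qualifiers, stop)
    return sorted_qualifiers[lo:hi]


selected = qualifier_range(row_qualifiers, "20121002", "20121004")
print(selected)  # ['20121002', '20121003', '20121004']
```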
> >>
> >> Please share your thoughts on this, along with any alternative design
> >> suggestions you might have.
> >>
> >> Regards,
> >> Narayanan
> >>