I think you are confusing a few things; I'll try to clear this up inline.

J-D
On Fri, Jun 10, 2011 at 8:27 PM, Sam Seigal <[email protected]> wrote:
> Hi All,
>
> I had a question about a certain kind of query I would like to do in hbase.
>
> I am storing records in HBase that transition from an initial state "A" to
> an end state "B" .
>
> Initially, the record I will store will look like the following ->
>
> t1 rowid:columnFamily:A <value>
>
> when I get a notification that the state has changed, I will write the
> following value ->
>
> t2:rowid:columnFamily:B <value>
>
> I basically end up with two versions of the same row.

This is not how it works in HBase. Here you end up with one row that has two
columns, since A and B are different qualifiers.

>
> Now, I want to query for all the records that have NOT transitioned to state
> B yet.
>
> Is it possible to express a query in HBase where one can say "retrieve only
> row Id values where there exists a column qualifier A but not B" ?

This is called a secondary index, which HBase doesn't support out of the box.
Google for that and you should see a bunch of discussions on the subject.

>
> How can I do this ?
>
> I tried doing the following through the hbase shell. I had the following
> values stores:
>
> t1:rowid:cf:A
> t2:rowid:cf:B
>
> I did a query for "rowid" with VERSIONS => 1. However, this gives me both A
> and B qualifier values. I am only interested in values that have not yet
> transitioned to B.

Yep, since A and B are separate columns with one version each, they both get
returned.

>
> Is there a way to query HBase only for the highest timestamp regardless of
> the value of the column qualifier ? In the above example, the highest
> timestamp for "rowid" is t2 with column qualifier B, but I get t1 and t2
> both back.

You would have to filter the qualifiers yourself. If you write multiple times
to the same qualifier, though, then VERSIONS => 1 does return only the latest
version.
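To make the difference between "two qualifiers" and "two versions of one qualifier" concrete, here is a toy sketch. It is NOT the HBase client API: StateTransitionSketch, put, latestPerColumn and rowsWithAButNotB are hypothetical names, and the class just keeps row -> qualifier -> timestamp -> value in memory to mimic the semantics described above. It shows that a VERSIONS => 1 style read returns the newest version of every column, and that finding "A but not B" rows means checking each row's columns on the client side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy, in-memory model of ONE column family. NOT the HBase client API:
// the class and method names here are hypothetical, for illustration only.
// Layout: row key -> qualifier -> (timestamp -> value).
class StateTransitionSketch {
    private final Map<String, Map<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    // Roughly what a put does: each distinct qualifier is its own column,
    // and each column keeps its own set of timestamped versions.
    void put(String row, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<>())
            .put(ts, value);
    }

    // Roughly what a get with VERSIONS => 1 returns: the newest version of
    // EVERY column in the row, not just the single newest cell of the row.
    Map<String, String> latestPerColumn(String row) {
        Map<String, String> result = new TreeMap<>();
        Map<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return result;
        }
        for (Map.Entry<String, NavigableMap<Long, String>> col : columns.entrySet()) {
            // lastEntry() is the highest timestamp, i.e. the newest version.
            result.put(col.getKey(), col.getValue().lastEntry().getValue());
        }
        return result;
    }

    // Client-side filtering over every row: keep rows that have column A
    // but no column B, i.e. records that have not transitioned yet.
    List<String> rowsWithAButNotB() {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Map<String, NavigableMap<Long, String>>> e : rows.entrySet()) {
            Map<String, NavigableMap<Long, String>> columns = e.getValue();
            if (columns.containsKey("A") && !columns.containsKey("B")) {
                pending.add(e.getKey());
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        StateTransitionSketch table = new StateTransitionSketch();

        // Two different qualifiers: two columns with one version each.
        table.put("row1", "A", 1L, "v1");
        table.put("row1", "B", 2L, "v2");
        table.put("row2", "A", 1L, "v1"); // row2 has not reached B yet
        System.out.println(table.latestPerColumn("row1")); // {A=v1, B=v2}
        System.out.println(table.rowsWithAButNotB());      // [row2]

        // One qualifier written twice: the newest version wins.
        table.put("row3", "state", 1L, "A");
        table.put("row3", "state", 2L, "B");
        System.out.println(table.latestPerColumn("row3")); // {state=B}
    }
}
```

Against a real table, rowsWithAButNotB corresponds to scanning the rows and doing the check in your own code, which touches every row; that cost is why secondary indexes come up for this kind of query. The last part of main (the hypothetical "state" qualifier) shows the single-qualifier layout: a read with VERSIONS => 1 then gives only the newest state, but finding every row whose newest state is still A would still need a scan or an index.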
