Hi,

I want to find the columns that have not been updated for more than a
specific period of time.

So I want to scan the columns with a time range.
HBase's normal behavior is to return the latest value within that time
range, which is not what I want.

As far as I understand, HBase should work like this: if you set the
maximum number of versions for a column family to 1, it retains only the
last value that was put into the cell.

What I found is different.

If I run the following commands in the HBase shell:

    create 't1', {NAME => 'c1', VERSIONS => 1}
    put 't1', 'r1', 'c1', 'One', 1000
    put 't1', 'r1', 'c1', 'Two', 2000
    put 't1', 'r1', 'c1', 'Three', 3000
    get 't1', 'r1'
    get 't1', 'r1' , {TIMERANGE => [0,1500]}

the result is this:

    get 't1', 'r1'
    COLUMN                     CELL
     c1:                       timestamp=3000, value=Three
    1 row(s) in 0.0780 seconds

    get 't1', 'r1' , {TIMERANGE => [0,1500]}
    COLUMN                     CELL
     c1:                       timestamp=1000, value=One
    1 row(s) in 0.1390 seconds

Why does the second query return a value even though I've set the maximum
number of versions to 1?
I expected HBase to 'know' only about the latest value ('Three') and thus
to return an empty result in the above example.
What is the correct way to get what I'm looking for?

My current workaround is that I simply retrieve the latest value for all my
columns and filter them in my application code.
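
A minimal sketch of what such application-side filtering might look like.
It is plain Java and does not talk to HBase: the map stands in for the latest
timestamp per column as read from the table, and the class name, method name,
and the second column ('c2:') are hypothetical and only there for illustration.
The cutoff is treated as exclusive, matching the upper bound of TIMERANGE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the HBase API.
public class StaleColumnFilter {

    // Returns the columns whose latest timestamp is strictly older than cutoff.
    public static List<String> findStale(Map<String, Long> latestTimestamps, long cutoff) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : latestTimestamps.entrySet()) {
            if (entry.getValue() < cutoff) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Latest timestamp per column, as a plain get/scan would return it.
        Map<String, Long> latest = new LinkedHashMap<>();
        latest.put("c1:", 3000L); // last write in the shell example
        latest.put("c2:", 1000L); // hypothetical column that was never updated again

        // Only 'c2:' is older than the cutoff, so this prints [c2:]
        System.out.println(findStale(latest, 1500L));
    }
}
```

With the data from the shell example, 'c1:' is not reported because its latest
timestamp (3000) is newer than the cutoff, which is the result I am after.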

The HBase version I currently have installed is 0.94.6-cdh4.4.0.

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
