Before the second get command was executed, was there a compaction on the server side?
You can find out by going to the region server hosting row 'r1' and checking the server log.

Cheers

On Sat, Dec 7, 2013 at 12:05 AM, Niels Basjes <[email protected]> wrote:

> Hi,
>
> I have the desire to find the columns that have not been updated for more
> than a specific time period.
>
> So I want to do a scan against the columns with a timerange.
> The normal behavior of HBase is that you then get the latest value in that
> time range (which is not what I want).
>
> As far as I understand the way HBase should work is that if you set the
> maximum number of versions for the values in a column family to '1' it
> should retain only the last value that was put into the cell.
>
> What I found is different.
>
> If I do the following commands into the hbase shell
>
> create 't1', {NAME => 'c1', VERSIONS => 1}
> put 't1', 'r1', 'c1', 'One', 1000
> put 't1', 'r1', 'c1', 'Two', 2000
> put 't1', 'r1', 'c1', 'Three', 3000
> get 't1', 'r1'
> get 't1', 'r1' , {TIMERANGE => [0,1500]}
>
> the result is this:
>
> get 't1', 'r1'
> COLUMN CELL
> c1: timestamp=3000, value=Three
> 1 row(s) in 0.0780 seconds
>
> get 't1', 'r1' , {TIMERANGE => [0,1500]}
> COLUMN CELL
> c1: timestamp=1000, value=One
> 1 row(s) in 0.1390 seconds
>
> Why does the second query return a value even though I've set the max
> versions to only 1?
> I expect that it only 'knows' about the latest value ('Three') and thus
> should return an empty result in the above example.
> What is the correct way to obtain what I'm looking for?
>
> My current workaround is that I simply retrieve the latest value for all my
> columns and filter them in my application code.
>
> The HBase version I currently have installed here is HBase 0.94.6-cdh4.4.0
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
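
One way to test the compaction theory directly from the shell is to force a flush and a major compaction on the table and then repeat the time-range query. This is only a sketch, assuming the table 't1' from the quoted commands and a test setup where triggering a compaction by hand is acceptable. If older versions were still being served because they had not yet been physically removed, the second get would be expected to come back empty afterwards, but I have not verified this on 0.94.6-cdh4.4.0.

```
# Write the in-memory edits for 't1' out to store files
flush 't1'

# Request a major compaction. The request is asynchronous, so wait for the
# region server log to report that it has finished before re-running the query.
major_compact 't1'

# Repeat the time-range query from the original message
get 't1', 'r1', {TIMERANGE => [0, 1500]}
```

If the result does not change after the compaction has completed, then compaction is probably not the explanation and the server log is the next place to look.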
