What is the maximum number of versions (VERSIONS) for table 't1'? When you issue the 'describe' command, you should see something similar to the following:
VERSIONS => '1'

On Mon, Dec 9, 2013 at 4:47 PM, Niels Basjes <[email protected]> wrote:

> Hi,
>
> When I first started learning about HBase I compared the logic of setting
> new values to something that is similar to the way a tool like Subversion
> works: When you set a new value you don't overwrite the old one, you simply
> create a new version.
> Just like subversion you can then at a later moment retrieve the old value
> that way the situation at an earlier date.
>
> (The only real variation to the SVN model is that HBase only retains the
> last N versions of a cell.)
>
> There is however one situation where this comparison really fails: When you
> do a delete on a cell.
> If you want to retrieve the state of a thing from subversion and in the
> current version this thing has been deleted then you can still get it back.
> With HBase however if you delete a cell you place a tombstone at a specific
> time and as such internally the older values are still present.
>
> But when you try to retrieve such an older value then you still get an
> empty result back (i.e. no such cell).
> The direct consequence of the currently implemented model is that an
> application can never retrieve the correct state of a row at an older
> timestamp if a delete on any cell has occurred.
>
> Example:
>
> I create a table with one row:
>
> > create 't1', 'cf'
> > put 't1', 'rowid', 'cf:1', 'One', 1000
> > put 't1', 'rowid', 'cf:2', 'Two', 2000
> > put 't1', 'rowid', 'cf:3', 'Three', 3000
> > get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
> COLUMN      CELL
> cf:1        timestamp=1000, value=One
> cf:2        timestamp=2000, value=Two
> cf:3        timestamp=3000, value=Three
> 3 row(s) in 0.0150 seconds
>
> Then the delete of a cell at a later timestamp:
>
> > delete 't1', 'rowid', 'cf:1', 4000
>
> Now if I retrieve the row at time 3500 I would find it logical that I would
> still see the same values as I would above.
> This is however the reality:
>
> > get 't1', 'rowid' , {TIMERANGE => [0,3500]}
>
> COLUMN      CELL
> cf:2        timestamp=2000, value=Two
> cf:3        timestamp=3000, value=Three
> 2 row(s) in 0.0120 seconds
>
>
> Why has it been designed/implemented like this?
> What is the logic behind this model?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
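
To make the behaviour in the quoted example concrete, here is a minimal Python sketch of it. This is a toy model, not HBase code: the `Put` and `DeleteMarker` classes, the `get` helper, and the `apply_range_to_markers` flag are all hypothetical names introduced for illustration. It assumes that the shell `delete` with a timestamp places a marker that masks every version of the column at or before that timestamp, and that the time range is half-open, [min, max). The flag switches between the observed result (the marker masks older cells even when it lies outside the requested range) and the Subversion-style result that the quoted message expected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Put:
    column: str
    timestamp: int
    value: str


@dataclass(frozen=True)
class DeleteMarker:
    column: str
    timestamp: int  # assumed to mask all versions of the column at or before this time


def get(puts, deletes, min_ts, max_ts, apply_range_to_markers=False):
    """Return {column: (timestamp, value)} with the newest visible version
    of each column whose timestamp falls in [min_ts, max_ts).

    apply_range_to_markers=False models the observed behaviour: a delete
    marker masks older versions even if the marker is outside the range.
    apply_range_to_markers=True models the expectation in the quoted
    message: markers outside the requested range are ignored.
    """
    result = {}
    # Iterate oldest to newest so the newest visible version wins.
    for put in sorted(puts, key=lambda p: p.timestamp):
        if not (min_ts <= put.timestamp < max_ts):
            continue
        masked = False
        for marker in deletes:
            in_range = min_ts <= marker.timestamp < max_ts
            counts = in_range or not apply_range_to_markers
            if counts and marker.column == put.column and marker.timestamp >= put.timestamp:
                masked = True
                break
        if not masked:
            result[put.column] = (put.timestamp, put.value)
    return result


# The same data as the quoted shell session.
puts = [
    Put("cf:1", 1000, "One"),
    Put("cf:2", 2000, "Two"),
    Put("cf:3", 3000, "Three"),
]
tombstone = DeleteMarker("cf:1", 4000)

before_delete = get(puts, [], 0, 3500)
observed = get(puts, [tombstone], 0, 3500)
expected = get(puts, [tombstone], 0, 3500, apply_range_to_markers=True)

print(sorted(before_delete))
print(sorted(observed))
print(sorted(expected))
```

In this model, `before_delete` and `expected` contain all three columns, while `observed` contains only cf:2 and cf:3, which matches the two outputs in the quoted session.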
