Thanks for clarifying this; now I know why my code didn't work as expected.
For now I think that creating a simple custom Filter for my situation is the most efficient workaround.

Niels Basjes


On Sat, Dec 7, 2013 at 3:26 AM, lars hofhansl <[email protected]> wrote:

> Filed https://issues.apache.org/jira/browse/HBASE-10102
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>; hbase-dev <[email protected]>
> Sent: Friday, December 6, 2013 5:31 PM
> Subject: Re: HBase returns old values even with max versions = 1
>
> + dev list
>
> Specifically:
>
> Currently the workflow in ScanQueryMatcher is something like this:
>
> 1. <versions> = min(<CF versions>, <scan versions>)
> 2. filter by timerange
> 3. filter out columns (i.e. columns not specified in the scan)
> 4. apply custom filters
> 5. filter by <versions>
>
> Every KV is passed through this filtering process.
>
> What we should do is this:
>
> 1. filter by <CF versions>
> 2. filter by timerange
> 3. filter out columns (i.e. columns not specified in the scan)
> 4. apply custom filters
> 5. filter by <scan versions>
>
> The trick will be doing that efficiently.
>
> -- Lars
>
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, December 6, 2013 5:10 PM
> Subject: Re: HBase returns old values even with max versions = 1
>
> The old versions can still be around until a flush and/or compaction.
>
> During a user-level scan, HBase first filters by timerange and then counts
> the versions.
> I agree, this is counterintuitive in this case. In other cases people
> want to first limit by timerange, and then get x versions back.
> We might need to start to distinguish between the number of versions
> configured for the column family and the number of versions configured for
> the scan.
>
> Mind filing a JIRA? We can discuss solutions there.
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Niels Basjes <[email protected]>
> To: user <[email protected]>
> Sent: Friday, December 6, 2013 8:05 AM
> Subject: HBase returns old values even with max versions = 1
>
> Hi,
>
> I'd like to find the columns that have not been updated for more
> than a specific time period.
>
> So I want to do a scan against the columns with a timerange.
> The normal behavior of HBase is that you then get the latest value in that
> time range (which is not what I want).
>
> As far as I understand, if you set the maximum number of versions for the
> values in a column family to '1', HBase should retain only the last value
> that was put into the cell.
>
> What I found is different.
>
> If I run the following commands in the HBase shell
>
> create 't1', {NAME => 'c1', VERSIONS => 1}
> put 't1', 'r1', 'c1', 'One', 1000
> put 't1', 'r1', 'c1', 'Two', 2000
> put 't1', 'r1', 'c1', 'Three', 3000
> get 't1', 'r1'
> get 't1', 'r1', {TIMERANGE => [0,1500]}
>
> the result is this:
>
> get 't1', 'r1'
> COLUMN                CELL
>  c1:                  timestamp=3000, value=Three
> 1 row(s) in 0.0780 seconds
>
> get 't1', 'r1', {TIMERANGE => [0,1500]}
> COLUMN                CELL
>  c1:                  timestamp=1000, value=One
> 1 row(s) in 0.1390 seconds
>
> Why does the second query return a value even though I've set the max
> versions to only 1?
> I expect that it only 'knows' about the latest value ('Three') and thus
> should return an empty result in the above example.
> What is the correct way to obtain what I'm looking for?
>
> My current workaround is that I simply retrieve the latest value for all my
> columns and filter them in my application code.
>
> The HBase version I currently have installed here is HBase 0.94.6-cdh4.4.0
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes
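The two orderings Lars lists for ScanQueryMatcher can be compared with a small, self-contained Java model. This is a simplified sketch, not HBase code: the class and method names are hypothetical, steps 3 and 4 (column selection and custom filters) are left out, a single column's versions are assumed to arrive newest first (as they do while superseded versions have not yet been removed by a flush and/or compaction), and the time range is treated as half-open [min, max), as TIMERANGE is in the shell. With the three puts from the shell session, the current ordering returns the cell written at 1000, while the proposed ordering returns nothing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of the two version/time-range orderings (hypothetical names,
 * not HBase code). Column selection and custom filters are omitted. Cells of one
 * column are listed newest first, as a scanner sees them while superseded
 * versions still exist before a flush and/or compaction.
 */
public class VersionOrderModel {

    record Cell(long timestamp, String value) {}

    /** True if ts lies in the half-open range [minTs, maxTs). */
    static boolean inRange(long ts, long minTs, long maxTs) {
        return ts >= minTs && ts < maxTs;
    }

    /** Current order: versions = min(CF, scan); filter by time range; then count versions. */
    static List<Cell> currentOrder(List<Cell> newestFirst, int cfVersions, int scanVersions,
                                   long minTs, long maxTs) {
        int versions = Math.min(cfVersions, scanVersions);
        List<Cell> out = new ArrayList<>();
        for (Cell c : newestFirst) {
            if (!inRange(c.timestamp(), minTs, maxTs)) continue; // time range applied first
            if (out.size() >= versions) break;                   // versions counted afterwards
            out.add(c);
        }
        return out;
    }

    /** Proposed order: CF versions first; then time range; then scan versions. */
    static List<Cell> proposedOrder(List<Cell> newestFirst, int cfVersions, int scanVersions,
                                    long minTs, long maxTs) {
        List<Cell> out = new ArrayList<>();
        int position = 0;
        for (Cell c : newestFirst) {
            if (position++ >= cfVersions) break;                 // beyond CF VERSIONS: logically gone
            if (!inRange(c.timestamp(), minTs, maxTs)) continue; // then the time range
            if (out.size() >= scanVersions) break;               // then the scan's own limit
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        // The three puts from the shell session, newest first, with VERSIONS => 1.
        List<Cell> c1 = List.of(new Cell(3000, "Three"), new Cell(2000, "Two"), new Cell(1000, "One"));
        // Current order yields the cell written at 1000 ("One"), matching the shell output.
        System.out.println("current:  " + currentOrder(c1, 1, 1, 0, 1500));
        // Proposed order yields an empty list, the result that was expected.
        System.out.println("proposed: " + proposedOrder(c1, 1, 1, 0, 1500));
    }
}
```

Without a time range (for example [0, Long.MAX_VALUE)) both orderings return only "Three", which is why the difference only shows up once a TIMERANGE excludes the newest version.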

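The workaround of keeping only the latest value per column and then checking its age can be sketched the same way. Because custom filters run before the version limit is applied, a filter (or client loop) that simply skipped every cell newer than the cutoff could still let an older, not-yet-compacted version through; the sketch therefore judges each column by the first cell it sees for that column. It assumes cells arrive grouped by column with the newest timestamp first, as in an HBase scan. The names are hypothetical, the column "c2" is invented for illustration, and wiring this logic into an actual HBase Filter subclass (whose methods differ between HBase versions) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: report columns whose newest cell is older than a cutoff.
 * Cells are assumed to arrive grouped by column, newest timestamp first.
 */
public class StaleColumnFinder {

    record Cell(String column, long timestamp, String value) {}

    static List<String> staleColumns(List<Cell> scanOrder, long cutoff) {
        List<String> stale = new ArrayList<>();
        String currentColumn = null;
        for (Cell c : scanOrder) {
            if (c.column().equals(currentColumn)) {
                continue; // older version of a column already judged by its newest cell
            }
            currentColumn = c.column();
            if (c.timestamp() < cutoff) {
                stale.add(c.column()); // newest write predates the cutoff
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                // c1: the three puts from the shell session (old versions not yet compacted)
                new Cell("c1", 3000, "Three"),
                new Cell("c1", 2000, "Two"),
                new Cell("c1", 1000, "One"),
                // c2: an invented column whose only write is at 1200
                new Cell("c2", 1200, "x"));
        System.out.println(staleColumns(cells, 1500)); // prints [c2]
    }
}
```

A naive variant that skipped every cell newer than the cutoff would instead report c1 (through its version at 1000), reproducing the surprising result from the shell session.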