Re: Finding the latest updated rows

Michael Segel Tue, 21 Jan 2014 04:15:27 -0800

Using the timestamp to find the last updated row is going to cause problems...

1) The timestamp will have to be the first portion of your composite row key; 
otherwise you still end up performing a full table scan.
2) Hotspotting will occur, because row keys that lead with the current time 
all land in the same region.
3) Does your row key change if you insert columns into an existing row? 
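
To make points 1 and 2 concrete, here is a minimal sketch in plain Python (no HBase client involved). It assumes the reverse-timestamp idiom of Long.MAX_VALUE minus the timestamp as the leading key component; the function name reverse_ts_key and the sample row ids are hypothetical.

```python
import os
import struct

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, used by the reverse-timestamp idiom


def reverse_ts_key(ts_millis: int, entity_id: str) -> bytes:
    """Build a composite row key whose leading component is a reverse timestamp."""
    reverse_ts = LONG_MAX - ts_millis
    # Big-endian 8-byte encoding keeps byte-wise sort order equal to numeric order.
    return struct.pack(">q", reverse_ts) + entity_id.encode("utf-8")


# Point 1: with the timestamp first, sorted key order is newest-first, so a scan
# can stop after the first few rows instead of reading the whole table.
updates = [(1390300000000, "row-a"), (1390300005000, "row-b"), (1390300002000, "row-c")]
keys = sorted(reverse_ts_key(ts, row_id) for ts, row_id in updates)
newest_first = [k[8:].decode("utf-8") for k in keys]  # ['row-b', 'row-c', 'row-a']

# Point 2: keys written close together in time share a long common prefix, so
# they sort next to each other and end up in the same region (the hotspot).
burst = [reverse_ts_key(1390300000000 + i, f"row-{i}") for i in range(1000)]
shared_prefix_len = len(os.path.commonprefix(burst))
```

The same layout that makes the newest rows cheap to find is what concentrates the write load, which is the trade-off behind points 1 and 2.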

A better cheat would be to create a metadata (audit) table that has a row for 
each table; whenever you insert into the base table, you also update the audit 
table. It would be a very small table, and because the coprocessor model is 
flawed... you could run into deadlocking issues when you attempt to maintain 
this table. 
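
A rough sketch of the audit-table idea, using plain Python dicts as stand-ins for the two HBase tables. The names base_table, audit_table and put are hypothetical, and nothing here talks to a real cluster; it only shows the double write.

```python
import time

base_table = {}   # stand-in for the base table: row key -> {column: value}
audit_table = {}  # stand-in for the small audit table: one entry per base table


def put(table_name, row_key, columns, now_millis=None):
    """Write to the base table, then record the write in the audit table."""
    ts = int(time.time() * 1000) if now_millis is None else now_millis
    base_table.setdefault(row_key, {}).update(columns)
    # Second write. In this sketch it always succeeds; against real tables the
    # two writes are separate operations, which is where the maintenance
    # problems mentioned above come from.
    audit_table[table_name] = {"last_row": row_key, "last_ts": ts}
    return ts


put("events", "row-a", {"cf:x": "1"}, now_millis=1000)
put("events", "row-b", {"cf:x": "2"}, now_millis=2000)
put("events", "row-a", {"cf:y": "3"}, now_millis=3000)  # column added to an existing row

latest = audit_table["events"]  # {'last_row': 'row-a', 'last_ts': 3000}
```

Finding the latest update is then a single-row read of the audit table rather than a scan of the base table.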

Or you may want to consider keeping the last-update marker in ZooKeeper and 
then flushing it to a table or something similar. 
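
One way to read the ZooKeeper suggestion is "buffer the marker somewhere cheap, flush it to a table now and then". The sketch below only illustrates that buffering pattern: a plain dict stands in for ZooKeeper, another dict stands in for the audit table, and the class name, the flush_every parameter and the flush policy are all assumptions, not anything ZooKeeper or HBase provides.

```python
class LastUpdateTracker:
    """Buffer the newest (row key, timestamp) per table and flush it in batches."""

    def __init__(self, flush_every):
        self.flush_every = flush_every
        self.pending = {}      # stand-in for the marker kept in ZooKeeper
        self.audit_table = {}  # stand-in for the table the marker is flushed to
        self.writes_since_flush = 0
        self.flush_count = 0

    def record(self, table_name, row_key, ts):
        current = self.pending.get(table_name)
        # Keep only the newest update seen for each table.
        if current is None or ts >= current[1]:
            self.pending[table_name] = (row_key, ts)
        self.writes_since_flush += 1
        if self.writes_since_flush >= self.flush_every:
            self.flush()

    def flush(self):
        # One table write per batch instead of one per insert.
        self.audit_table.update(self.pending)
        self.pending.clear()
        self.writes_since_flush = 0
        self.flush_count += 1


tracker = LastUpdateTracker(flush_every=3)
for i, row in enumerate(["row-a", "row-b", "row-c", "row-d"]):
    tracker.record("events", row, 1000 + i)
```

After the four calls above, one flush has happened and the fourth update is still only in the buffer, so a reader of the table alone would see slightly stale data until the next flush.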

On Jan 21, 2014, at 1:55 AM, Joshi, Rekha <[email protected]> wrote:

> Hi William,
> 
> The timestamp part of rowkey schema design caters to this. It is usually
> efficient, but your SLA may differ.
> 
> http://hbase.apache.org/book.html#reverse.timestamp
> 
> http://hbase.apache.org/book.html#schema.casestudies
> 
> http://hbase.apache.org/book.html#timeseries
> 
> 
> Thanks
> Rekha
> 
> On 21/01/14 9:36 AM, "William Kang" <[email protected]> wrote:
> 
>> Hi,
>> In HBase, the time stamp is set for each column, not for the entire row.
>> If somehow I want to find the latest updated (put new row, or update only
>> certain columns in some rows, etc) rows, is there an efficient way to do
>> it?
>> 
>> Many thanks.
>> 
>> 
>> William
> 
> 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
