Best? That's pretty subjective.

How do you plan to access the data?
Since you don't want to overwrite the data, you can't really rely on the
timestamps.
(Or is the updated data meant to be a replacement?)

Depending on the data size and structure, you could append to the existing
column (record) in the same column family, or you could create a new column
and insert the data there.
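
To make the two options concrete, here is a rough sketch in plain Java. It
does not use the HBase client API at all: a TreeMap stands in for one row
(column qualifier -> value), and the class name, the method names, the ","
separator and the "_yyyyMMdd" qualifier suffix are all made up for
illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class RowUpdateSketch {

    // Option 1: append to the existing column. This is a read-modify-write:
    // the old value is read and the new data is concatenated onto it.
    // The "," separator is an assumption for this sketch.
    public static void appendToColumn(Map<String, String> row,
                                      String qualifier, String value) {
        row.merge(qualifier, value, (old, added) -> old + "," + added);
    }

    // Option 2: write each load into its own column, here named
    // <base>_<yyyyMMdd> (a hypothetical naming scheme). Nothing is
    // overwritten, and the columns sort by date within the row.
    public static void putDatedColumn(Map<String, String> row, String base,
                                      String loadDate, String value) {
        row.put(base + "_" + loadDate, value);
    }

    public static void main(String[] args) {
        // A TreeMap keeps qualifiers sorted, as a stand-in for one row.
        Map<String, String> appended = new TreeMap<>();
        appendToColumn(appended, "events", "a");
        appendToColumn(appended, "events", "b");
        System.out.println(appended);

        Map<String, String> dated = new TreeMap<>();
        putDatedColumn(dated, "events", "20101101", "b");
        putDatedColumn(dated, "events", "20101031", "a");
        System.out.println(dated);
    }
}
```

With option 1 a reader gets everything from one cell but has to parse it;
with option 2 a reader can pick a single day's column or scan all of them.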

I'm not sure which would be best; it depends on how you want to access the
data.

> Date: Mon, 1 Nov 2010 02:28:31 -0700
> Subject: Best strategy for row updates
> From: [email protected]
> To: [email protected]
> 
> We are populating some HBase tables from daily data streams that are
> stored in Hive.  When we see a row key that's already in the table,
> the data should be appended to that row's record.  What is the best
> way to achieve this?..  Should we be using the Java API?..  Rely on
> HBase cell timestamping?..  Create compound keys (row_id+date) and
> periodically run a separate MR job to coalesce all the data belonging
> to the same row_id?..
> 
> Any pointers greatly appreciated!
> 
> --Leo
                                          
