Best? That's pretty subjective. How are you planning on accessing the data? Since you don't want to overwrite the data, you can't really rely on the timestamps. (Or is the updated data a replacement?)
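To illustrate the timestamp concern: HBase keeps only a bounded number of versions per cell (the limit is set per column family), so if each daily load is stored as a new version of the same cell, the oldest loads eventually drop off. The following is a toy in-memory model in Python, not the HBase API; the `VersionedCell` class and the limit of 3 are made up for illustration.

```python
class VersionedCell:
    """Toy model of one cell that retains only the newest max_versions values."""

    def __init__(self, max_versions=3):
        # Hypothetical limit; in HBase the real value comes from the column family settings.
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        # Store the new version, keep newest first, then trim anything past the limit.
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def values(self):
        return [value for _, value in self.versions]


cell = VersionedCell(max_versions=3)
for day, payload in enumerate(["mon", "tue", "wed", "thu"], start=1):
    cell.put(day, payload)

print(cell.values())  # ['thu', 'wed', 'tue'] -- the "mon" load is gone
```

So timestamps alone only work as an "append" if the version limit is guaranteed to exceed the number of loads per row, which is hard to promise for an open-ended daily stream.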
Depending on the data size and structure, you could append to the same column family and column (record), or you could create a new column and insert the data there. I'm not sure which would be best; it depends on how you want to access the data.

> Date: Mon, 1 Nov 2010 02:28:31 -0700
> Subject: Best strategy for row updates
> From: [email protected]
> To: [email protected]
>
> We are populating some HBase tables from daily data streams that are
> stored in Hive. When we see a row key that's already in the table,
> the data should be appended to that row's record. What is the best
> way to achieve this?.. Should we be using the Java API?.. Rely on
> HBase cell timestamping?.. Create compound keys (row_id+date) and
> periodically run a separate MR job to coalesce all the data belonging
> to the same row_id?..
>
> Any pointers greatly appreciated!
>
> --Leo
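As a rough sketch of the two layouts suggested in the reply (append to one existing column vs. add a new column per load), here is a small Python model that uses nested dicts in place of a real table. Nothing here is HBase client code; the function names, the `"data"` qualifier, the comma separator, and the date-as-qualifier scheme are all assumptions for illustration.

```python
from collections import defaultdict

# Model: table[row_key][column_qualifier] -> value, all within one column family.


def append_to_column(table, row_key, qualifier, chunk, sep=","):
    """Option 1: read the current value, append the new chunk, write it back."""
    current = table[row_key].get(qualifier)
    table[row_key][qualifier] = chunk if current is None else current + sep + chunk


def add_dated_column(table, row_key, date, chunk):
    """Option 2: store each load in its own column, using the date as the qualifier."""
    table[row_key][date] = chunk


appended = defaultdict(dict)
append_to_column(appended, "row1", "data", "a")
append_to_column(appended, "row1", "data", "b")

dated = defaultdict(dict)
add_dated_column(dated, "row1", "2010-10-31", "a")
add_dated_column(dated, "row1", "2010-11-01", "b")

print(appended["row1"])  # {'data': 'a,b'}
print(dated["row1"])     # {'2010-10-31': 'a', '2010-11-01': 'b'}
```

In this model, option 1 needs a read before every write and always returns the whole history as one value, while option 2 is write-only and lets you fetch a single day's data, at the cost of rows that grow wider with each load. Which trade-off is better comes back to how the data will be read.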
