I am using HBase on EMR with S3 as the data store. I found that when the EMR
cluster crashes, the data in the memstore and in the WAL is not recovered. EMR
stores data on EBS, and when the cluster crashes the EBS volumes are gone.
Therefore we decided to put the WAL on S3. But writes to S3 are done
asynchronously in large chunks, so if the cluster crashes while WAL entries are
still in the cache, they are not recovered. We want to use HBase as durable
storage, and we would like to know how to resolve this:
   - How do we keep track of the last updated record in HBase?
   - How do we know when flushing data from the memstore has failed?
   - How should we configure and tune HBase to achieve durability?
