Hello,

We are looking into HBase replication as a way to separate our client-facing
HBase cluster from the one we need to run analytics against (likely heavy MR
jobs plus potentially big scans).

1. How long does it take for edits to be propagated to a slave cluster?

As far as I understand from the HBase Replication page
(http://hbase.apache.org/replication.html), each region server holds a
separate buffer that accumulates data (edits from the edit log that should be
replicated) before sending it to the slave cluster's region servers (RSs). So,
basically, data is sent to the slave cluster when:
* the buffer is full. Is there an option to configure its size (as a way to
affect flushing frequency)?
* this "working thread" reaches the end of the edit log. Does the thread
process the edit log periodically, or does it watch the edit log for changes
and act "immediately"? If the former, what is the default interval between
runs, and can it be configured?
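To make question 1 concrete, here is a toy Python model of how I understand the
mechanism. Everything in it (the class name, `capacity_bytes`, measuring the
buffer in bytes) is my own hypothetical illustration, not HBase's actual code
or configuration; it only shows the two flush triggers described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicationBufferModel:
    """Hypothetical toy model, NOT HBase code: a per-region-server buffer that
    ships edits to the slave cluster when it fills up or when the reader
    reaches the end of the edit log."""
    capacity_bytes: int  # hypothetical buffer size limit
    pending: List[bytes] = field(default_factory=list)
    pending_bytes: int = 0
    shipped_batches: List[List[bytes]] = field(default_factory=list)

    def _ship(self) -> None:
        # Stand-in for sending one batch to the slave cluster's region servers.
        if self.pending:
            self.shipped_batches.append(self.pending)
            self.pending = []
            self.pending_bytes = 0

    def read_log(self, edits: List[bytes]) -> None:
        for edit in edits:
            self.pending.append(edit)
            self.pending_bytes += len(edit)
            if self.pending_bytes >= self.capacity_bytes:
                self._ship()  # trigger 1: buffer is full
        self._ship()  # trigger 2: end of the edit log reached


# Usage: with a 10-byte buffer, the first three edits fill it and are shipped
# together; the last edit is shipped when the end of the log is reached.
model = ReplicationBufferModel(capacity_bytes=10)
model.read_log([b"aaaa", b"bbbb", b"cccc", b"dd"])
print([len(batch) for batch in model.shipped_batches])
```

In this model, propagation delay depends on how soon `read_log` runs again after
new edits arrive, which is exactly the part I'm unsure about in the real
implementation.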

2. How reliable is replication?

It looks like networking issues that make the slave cluster unreachable are
handled gracefully by the replication mechanism. Presumably this also covers
the slave cluster going down for some reason. Are there any scenarios in which
replication can break?

3. Replicating an existing (and possibly big) cluster after the fact.

What are the options for replicating all existing data to a new (and empty)
slave cluster when replication wasn't configured from the start, and then
continuing to replicate from that point on? It seems this might not be
possible, because edit logs on the master cluster get cleaned up.

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
