Hello,

We are looking into HBase replication to separate our client-facing HBase cluster from the one we need to run analytics against (likely heavy MR jobs plus potentially big scans).
1. How long does it take for edits to be propagated to a slave cluster?
   As far as I understand from the HBase Replication page (http://hbase.apache.org/replication.html), each region server holds a separate buffer that accumulates data (the edits from the edit log that should be replicated) before sending it to the slave cluster's RSs. So basically data is sent to the slave cluster when:
   * the buffer is full. Is there an option to configure its size (as a way to affect flushing frequency)?
   * the end of the edit log is reached by this "working thread". Does the thread process the edit log periodically, or does it watch the edit log for changes and act "immediately"? If the former, what is the default interval between runs? Can it be configured?

2. How reliable is replication?
   It looks like networking issues that make the slave cluster unreachable are handled gracefully by the replication mechanism. It sounds like this should also cover the slave cluster going down for some reason. Are there any scenarios in which replication can break?

3. Replication of an existing (and possibly big) cluster after the fact.
   What are the options for replicating all existing data to a new (and empty) slave cluster if replication wasn't configured from the start, and then continuing to replicate from that point? It seems this might not be possible because edit logs on the master cluster get cleaned?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
