Joep Rottinghuis commented on YARN-4062:

While discussing flush and compaction with [~vrushalic], I realized that there 
might be a complication with cross-DC replication.

Potentially, the RegionServers (RS) in two different datacenters might decide to 
flush/compact values for the same row at the same time. We need to think through 
what happens if they make different decisions (for example, because one DC has 
more recent information that has not been replicated yet, such as an app 
completion). Even if the order and the decisions are deterministic, we need to 
consider what happens if two regions modify the same row.
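
To make the divergence concrete, here is a toy model (hypothetical, not the 
YARN-4062 code). It assumes a flow run row holds one metric cell per application 
and that compaction folds the cells of apps the local DC believes have completed 
into a single summed cell; the `SUM_FINAL` qualifier and the class name are made 
up for illustration. The same deterministic function, fed with different views 
of app completion, produces different rows in the two DCs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy model (hypothetical): compaction folds the cells of apps that this DC
 * believes have completed into one summed cell and keeps the rest as-is.
 */
public final class DivergentCompaction {

  /** Compacts a row using this DC's current view of which apps are complete. */
  public static Map<String, Long> compact(Map<String, Long> perAppValues,
                                          Set<String> completedApps) {
    Map<String, Long> compacted = new TreeMap<>();
    long finalSum = 0;
    boolean anyCompleted = false;
    for (Map.Entry<String, Long> cell : perAppValues.entrySet()) {
      if (completedApps.contains(cell.getKey())) {
        finalSum += cell.getValue(); // fold a finished app into the summary cell
        anyCompleted = true;
      } else {
        compacted.put(cell.getKey(), cell.getValue()); // running app: keep its cell
      }
    }
    if (anyCompleted) {
      compacted.put("SUM_FINAL", finalSum); // hypothetical qualifier for folded values
    }
    return compacted;
  }

  public static void main(String[] args) {
    Map<String, Long> row = new TreeMap<>(Map.of("app1", 10L, "app2", 5L));
    // DC A has seen app2 complete; that event has not yet replicated to DC B.
    Map<String, Long> dcA = compact(row, Set.of("app2"));
    Map<String, Long> dcB = compact(row, Set.of());
    System.out.println("DC A: " + dcA);
    System.out.println("DC B: " + dcB);
    // Same deterministic function, different inputs: the two copies now differ.
    System.out.println("diverged: " + !dcA.equals(dcB));
  }
}
```

Replication would then have to reconcile two physically different versions of 
the same row, which is exactly the case the ownership idea tries to avoid.
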
With hRaven, we have been able to make master-master replication work because we 
were guaranteed that every row is "owned" by, and therefore manipulated in, only 
one DC.

Perhaps we can do the same here: flushes and compactions would happen only in 
the HBase cluster located in the datacenter that owns the row, for example, only 
if the row key starts with the same datacenter as the one where the coprocessor 
runs. This would ensure that each row is flushed/compacted in only one DC, and 
the other DCs would be followers.
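
A minimal sketch of the ownership check, assuming (hypothetically) that the flow 
run row key begins with the cluster/datacenter id followed by a `!` separator, 
and treating the key as UTF-8 text for readability; real row keys may encode 
their fields differently. `RowOwnership` and `isOwnedLocally` are illustrative 
names only. Including the separator in the prefix keeps `dc1` from matching 
rows owned by `dc10`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper: decides whether the local HBase cluster "owns" a row
 * and may therefore apply flush/compaction-time processing to it.
 */
public final class RowOwnership {
  private final byte[] localPrefix;

  public RowOwnership(String localClusterId) {
    // Assumed row key layout: "<clusterId>!<user>!<flow>!<run>".
    this.localPrefix = (localClusterId + "!").getBytes(StandardCharsets.UTF_8);
  }

  /** Returns true only if rowKey starts with "<localClusterId>!". */
  public boolean isOwnedLocally(byte[] rowKey) {
    return rowKey.length >= localPrefix.length
        && Arrays.equals(rowKey, 0, localPrefix.length,
                         localPrefix, 0, localPrefix.length);
  }

  public static void main(String[] args) {
    RowOwnership dc1 = new RowOwnership("dc1");
    byte[] local = "dc1!alice!wordcount!1002".getBytes(StandardCharsets.UTF_8);
    byte[] remote = "dc2!alice!wordcount!1002".getBytes(StandardCharsets.UTF_8);
    byte[] lookalike = "dc10!bob!etl!7".getBytes(StandardCharsets.UTF_8);
    System.out.println(dc1.isOwnedLocally(local));     // true: compact here
    System.out.println(dc1.isOwnedLocally(remote));    // false: follow dc2
    System.out.println(dc1.isOwnedLocally(lookalike)); // false: separator blocks "dc10"
  }
}
```

Inside an actual coprocessor, HBase's `Bytes.startsWith` could replace the range 
comparison; rows that fail the check would simply pass through the flush/compaction 
scanner unmodified.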

This would have to be configurable, and disabled for installations with a single 
HBase instance that is written to remotely by multiple datacenters; otherwise, no 
compaction would happen at all (which would perhaps still be functionally 
correct, even if not optimal for space usage).
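
One way the switch could look, sketched with `java.util.Properties` so it stays 
self-contained. Both property names and the `CompactionGate` class are 
hypothetical, not real YARN settings. Defaulting to "disabled" is an assumption 
chosen so that a single shared HBase instance keeps compacting every row unless 
the owned-rows-only mode is explicitly turned on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

/** Hypothetical configurable gate for flush/compaction-time processing. */
public final class CompactionGate {
  // Hypothetical: when true, only rows owned by the local DC are compacted.
  public static final String OWNED_ROWS_ONLY_KEY = "example.flowrun.compaction.owned-rows-only";
  // Hypothetical: id of the datacenter this HBase cluster runs in.
  public static final String LOCAL_CLUSTER_KEY = "example.flowrun.local-cluster-id";

  private final boolean ownedRowsOnly;
  private final byte[] localPrefix;

  public CompactionGate(Properties conf) {
    // Disabled by default: a single instance fed by several DCs compacts everything.
    this.ownedRowsOnly = Boolean.parseBoolean(conf.getProperty(OWNED_ROWS_ONLY_KEY, "false"));
    String cluster = conf.getProperty(LOCAL_CLUSTER_KEY, "");
    if (ownedRowsOnly && cluster.isEmpty()) {
      throw new IllegalArgumentException(LOCAL_CLUSTER_KEY + " must be set when "
          + OWNED_ROWS_ONLY_KEY + " is true");
    }
    this.localPrefix = (cluster + "!").getBytes(StandardCharsets.UTF_8);
  }

  /** Whether flush/compaction-time processing should be applied to this row. */
  public boolean shouldProcess(byte[] rowKey) {
    if (!ownedRowsOnly) {
      return true; // feature off: every row is eligible
    }
    // Feature on: only rows whose key starts with "<localClusterId>!".
    return rowKey.length >= localPrefix.length
        && Arrays.equals(rowKey, 0, localPrefix.length,
                         localPrefix, 0, localPrefix.length);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(OWNED_ROWS_ONLY_KEY, "true");
    conf.setProperty(LOCAL_CLUSTER_KEY, "dc1");
    CompactionGate gate = new CompactionGate(conf);
    System.out.println(gate.shouldProcess("dc1!alice!wordcount!1".getBytes(StandardCharsets.UTF_8)));
    System.out.println(gate.shouldProcess("dc2!alice!wordcount!1".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Failing fast when the mode is on but no local cluster id is set avoids the silent 
"nothing is ever compacted" case described above.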

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> -------------------------------------------------------------------------------------------
>                 Key: YARN-4062
>                 URL: https://issues.apache.org/jira/browse/YARN-4062
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
> As part of YARN-3901, a coprocessor and scanner are being added for storing into 
> the flow_run table. It also needs flush and compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the flushing 
> and compaction stages.

This message was sent by Atlassian JIRA
