[
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790987#comment-14790987
]
Joep Rottinghuis commented on YARN-4062:
----------------------------------------
While discussing flush and compaction with [~vrushalic], I realized that there
might be a complication with cross-DC replication.
The region servers (RS) in two different datacenters might decide to
flush/compact values for the same row at the same time. We need to think
through what happens if they make different decisions (for example, because one
DC has later information, such as app completion, that hasn't been replicated
across yet). Even if the order and the decisions are deterministic, we need to
consider what happens if two regions modify the same row.
With hRaven we have been able to make master-master replication work because we
were guaranteed that every row is "owned" and therefore manipulated only
locally.
Perhaps we can do the same here, where flushes and compactions happen only in
the HBase cluster located in the datacenter where the row is owned. For example,
a row would be processed only if its rowkey starts with the datacenter in which
the coprocessor runs. This would ensure that each row is flushed/compacted in
only one DC, and the other DCs would be followers.
This would have to be configurable, and disabled for installations with a
single HBase instance that is written to remotely by multiple datacenters.
Otherwise no compaction will happen at all (which would perhaps still be
functionally correct, even if not optimal for space usage).
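To make the idea concrete, here is a minimal sketch of the ownership check in
plain Java. It is not taken from any patch: the class name RowOwnershipPolicy,
the method shouldProcessLocally, the on/off flag, and the "!" separator after
the datacenter prefix are all assumptions made for illustration, and the real
rowkey layout may differ.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of YARN or HBase): decides whether the local
 * cluster should flush/compact a row, based on the row key's datacenter prefix.
 */
final class RowOwnershipPolicy {
    private final byte[] localDcPrefix;
    private final boolean ownershipCheckEnabled;

    RowOwnershipPolicy(String localDc, boolean ownershipCheckEnabled) {
        // Assumed row key layout: "<datacenter>!<rest of key>".
        this.localDcPrefix = (localDc + "!").getBytes(StandardCharsets.UTF_8);
        this.ownershipCheckEnabled = ownershipCheckEnabled;
    }

    /** Returns true if this cluster should flush/compact the given row. */
    boolean shouldProcessLocally(byte[] rowKey) {
        if (!ownershipCheckEnabled) {
            // Single shared HBase instance: every row is processed locally.
            return true;
        }
        if (rowKey == null || rowKey.length < localDcPrefix.length) {
            return false;
        }
        // The row is owned locally only if it starts with the local DC prefix.
        for (int i = 0; i < localDcPrefix.length; i++) {
            if (rowKey[i] != localDcPrefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A coprocessor could consult such a check before rewriting cells during flush
or compaction and pass the cells through unchanged when it returns false.
Including the separator in the prefix keeps "dc1" from also matching rows
owned by "dc10".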
> Add the flush and compaction functionality via coprocessors and scanners for
> flow run table
> -------------------------------------------------------------------------------------------
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing
> into the flow_run table. It also needs flush and compaction processing in the
> coprocessor, and perhaps a new scanner to deal with the data during the
> flushing and compaction stages.