[
https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868965#comment-15868965
]
Jonathan Hung commented on YARN-5946:
-------------------------------------
Thanks [~leftnoteasy] for the comments.
bq. It is actually means "last confirmed" transaction id, correct? I found in
the step 5 it get increased even if update failed.
It is the txnid for which all logs with a lesser txnid do not need to be
replayed on recovery. A log reaches that state in one of two ways: the refresh
succeeded and the log has been persisted to the store, or the refresh failed
and the mutation has been deemed invalid (which is why the txnid is incremented
even if the update fails). So in this case perhaps confirmMutation(long id)
should be confirmMutation(long id, boolean isValid).
bq. So I suggest to persist a transaction-id in addition to "last good"
configuration to table-1.
Sure. I think this is implementation dependent, but in general we can have a
configuration entry with key="transaction.id" or something similar.
bq. Who will generate "id" for each logItem?
I think the YarnConfigurationStore should maintain the current id and generate
new ones, returning each new id from the logMutation call. So when MCM receives
a mutation, it will log it and get back an incremented id, then try to refresh,
and then call confirmMutation(id, true/false).
Here the YarnConfigurationStore can keep an in-memory map of id to LogMutation,
so it can quickly store the LogMutation into table1 if
confirmMutation(id, true) is called.
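To make the id handling concrete, here is a minimal in-memory sketch, not the actual patch. The class names, the Map-based "table1", the boolean return value of confirmMutation, and the shape of LogMutation are assumptions for illustration only; the "transaction.id" key is the one suggested above. It also assumes mutations are confirmed in the order they were logged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the LogMutation type discussed in this thread.
final class LogMutation {
    final Map<String, String> updates;

    LogMutation(Map<String, String> updates) {
        this.updates = new LinkedHashMap<>(updates);
    }
}

// Sketch only: the store owns the id counter and an in-memory id -> mutation map.
final class InMemoryStoreSketch {
    private long currentId = 0;
    // Logged but not yet confirmed mutations, keyed by their id.
    private final Map<Long, LogMutation> pending = new LinkedHashMap<>();
    // Plays the role of table1: last confirmed conf plus the txnid entry.
    private final Map<String, String> table1 = new LinkedHashMap<>();

    // Logs the mutation and returns the newly generated id.
    synchronized long logMutation(LogMutation mutation) {
        long id = ++currentId;
        pending.put(id, mutation);
        return id;
    }

    // Advances the txnid either way; persists the mutation only if it is valid.
    // Returns false if the id is unknown (an assumption, not part of the proposal).
    synchronized boolean confirmMutation(long id, boolean isValid) {
        LogMutation mutation = pending.remove(id);
        if (mutation == null) {
            return false;
        }
        if (isValid) {
            table1.putAll(mutation.updates);
        }
        table1.put("transaction.id", Long.toString(id));
        return true;
    }

    // Returns a copy of what is stored in table1.
    synchronized Map<String, String> retrieve() {
        return new LinkedHashMap<>(table1);
    }
}
```

With this shape, a failed refresh still moves "transaction.id" forward, so the invalid mutation is never replayed.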
bq. YarnConfigurationStore#retrieve, does it mean get from table-1 or get from
table-1/2/3 (which described by your "for the failover case ..." in your
previous comment)? I would prefer the latter one.
On failover MCM would call retrieve (which returns a "conf") and
getPendingMutations, apply each pendingMutation one by one to "conf", and call
confirmMutation(pendingMutation.id, true/false) depending on whether the
refresh is successful. So YarnConfigurationStore#retrieve on its own returns
what is in table1, which may not have all logs applied, but MCM will
reconstruct the updated configuration from getPendingMutations. So I am not
sure retrieveLatestConf (the third API in the previous comment) is necessary.
Since MCM stores an in-memory configuration, YarnConfigurationStore#retrieve
and getPendingMutations should only be called once, on failover.
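The failover sequence above could look roughly like the following sketch. Everything here is hypothetical: PendingMutation, Confirmer, and the Predicate standing in for "try to refresh with this conf" are illustration names, and the configuration is modeled as a plain Map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical pending log entry: the id returned by logMutation plus its updates.
final class PendingMutation {
    final long id;
    final Map<String, String> updates;

    PendingMutation(long id, Map<String, String> updates) {
        this.id = id;
        this.updates = new LinkedHashMap<>(updates);
    }
}

// Hypothetical callback standing in for YarnConfigurationStore#confirmMutation.
interface Confirmer {
    void confirmMutation(long id, boolean isValid);
}

final class FailoverRecoverySketch {
    // Rebuilds the in-memory conf from the retrieved conf and the pending log.
    // 'refresh' is an assumed stand-in for the scheduler refresh; it returns
    // true if the candidate conf is accepted.
    static Map<String, String> recover(Map<String, String> retrieved,
                                       List<PendingMutation> pending,
                                       Predicate<Map<String, String>> refresh,
                                       Confirmer confirmer) {
        Map<String, String> conf = new LinkedHashMap<>(retrieved);
        for (PendingMutation mutation : pending) {
            // Apply the mutation to a copy so a failed refresh leaves conf untouched.
            Map<String, String> candidate = new LinkedHashMap<>(conf);
            candidate.putAll(mutation.updates);
            boolean ok = refresh.test(candidate);
            if (ok) {
                conf = candidate;
            }
            // Confirm either way so this entry is not replayed again.
            confirmer.confirmMutation(mutation.id, ok);
        }
        return conf;
    }
}
```

The returned map is what MCM would keep in memory afterwards, which is why neither retrieve nor getPendingMutations needs to be called again until the next failover.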
So my proposal is:
{noformat}
1) initialize(Configuration conf, Map<String, String> schedConf);
2) retrieve, which returns the conf stored in table1
3) logMutation to save the new mutation in table2 and return its id
4) confirmMutation(long id, boolean isValid) to increment the txnid stored in
table1, and persist the logged mutation if isValid==true
5) List<LogMutation> getPendingMutations() for getting unconfirmed
mutations
{noformat}
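Written out as a Java interface, the five calls might look like the sketch below. This is only a guess at the shape, not the patch itself: the type parameters C and M stand in for Configuration and LogMutation so the snippet compiles on its own, and the return types of retrieve, logMutation and confirmMutation are assumptions.

```java
import java.util.List;
import java.util.Map;

// Sketch of the proposed API. C stands in for Configuration, M for LogMutation.
interface YarnConfigurationStoreSketch<C, M> {

    // 1) Set up the store with the initial scheduler configuration.
    void initialize(C conf, Map<String, String> schedConf);

    // 2) Return the conf stored in table1 (may lag behind pending mutations).
    Map<String, String> retrieve();

    // 3) Save the new mutation in table2 and return its generated id.
    long logMutation(M mutation);

    // 4) Increment the txnid in table1; persist the mutation only if isValid.
    boolean confirmMutation(long id, boolean isValid);

    // 5) Return mutations that were logged but not yet confirmed.
    List<M> getPendingMutations();
}
```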
I think we can add getConfirmedConfHistory in a later patch.
If no concerns with this approach, will upload patch. Thanks!
> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
> Key: YARN-5946
> URL: https://issues.apache.org/jira/browse/YARN-5946
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing
> store.