[
https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866427#comment-15866427
]
Wangda Tan commented on YARN-5946:
----------------------------------
Thanks [~jhung] for writing this up, it is very clear to me.
One thing to confirm:
bq. and a table3 with the "last good" transaction id (initialized at 0)
It actually means the "last confirmed" transaction id, correct? I noticed that in
step 5 it gets incremented even if the update failed.
And one minor suggestion about the persisted data:
bq. If success, MCM stores the mutation in table1 and increments the txn id in
table3 (both of these are done together atomically)
I think Derby may support this, but I'm not sure it is common across different
storage backends (for example, atomically updating two HDFS files, or two ZK
nodes, etc.). So I suggest persisting a transaction id in table-1 alongside the
"last good" configuration. Then even if the write to table3 fails, we can still
recover the latest config, together with its transaction id, from table-1.
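To illustrate the recovery idea, here is a minimal Java sketch. All names in it ({{VersionedConf}}, {{RecoverySketch}}, {{recoverLastConfirmedId}}) are hypothetical and not part of the patch, and it assumes table3 holds a single long value:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table-1 record: the config plus the txn id that produced it. */
final class VersionedConf {
    final long txnId;
    final Map<String, String> conf;

    VersionedConf(long txnId, Map<String, String> conf) {
        this.txnId = txnId;
        this.conf = new HashMap<>(conf); // defensive copy
    }
}

final class RecoverySketch {
    /**
     * Picks the txn id to resume from after a restart.
     * If the table-1 write succeeded but the table3 write was lost, table-1
     * carries the newer id. If table3 was advanced without a config change,
     * table3 carries the newer id. Taking the maximum covers both cases.
     */
    static long recoverLastConfirmedId(VersionedConf table1, long table3Id) {
        return Math.max(table1.txnId, table3Id);
    }
}
```

With this layout the two writes no longer need to be atomic: table-1 alone is enough to rebuild both the config and the id it corresponds to.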
For the API, some suggestions to hide internal implementation details:
1) Do we really want {{Collection<String> removes}} as part of logItem? I
think setting a key to an empty value is equivalent to removing the key,
correct? I would prefer not to add the {{removes}} field.
2) Who will generate the "id" for each logItem? I also suggest making it a long
instead of an int.
3) YarnConfigurationStore#retrieve: does it read from table-1 only, or from
table-1/2/3 (as described by "for the failover case ..." in your
previous comment)? I would prefer the latter.
4) readPersistedId/getMutations look like internal implementation details to
me. Would it be better to replace them with
{{List<LogMutation> getPendingMutations()}}?
In summary, I think the following APIs will be sufficient:
{code}
1) initialize(Configuration conf, Map<String, String> schedConf);
2) retrieveLatestConfirmedConf, which returns the latest *good* configuration.
This will be called during recovery.
3) retrieveLatestConf, which returns the latest *not yet confirmed*
configuration. This will be used by the scheduler to try to reinitialize.
4) logMutation, to save a new mutation; the result of retrieveLatestConf is
updated accordingly.
5) confirmMutation(long id), to confirm the mutation; the result of
retrieveLatestConfirmedConf is updated accordingly.
6) List<LogMutation> getPendingMutations(), which will be called during
recovery.
7) Optional but may be useful: List<Map<String, String>>
getConfirmedConfHistory(long fromId). Admins can use this API to retrieve the
config history.
{code}
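As a rough Java sketch of how items 1-6 could fit together (not a proposal for the actual patch): the names {{ConfStoreSketch}} and {{InMemoryConfStoreSketch}} and the {{LogMutation}} fields are hypothetical, the Hadoop {{Configuration}} argument is left out to keep the example self-contained, and it assumes that the store assigns ids and that mutations are confirmed in the order they were logged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mutation record; the field names are assumptions. */
final class LogMutation {
    final long id;
    final Map<String, String> updates;

    LogMutation(long id, Map<String, String> updates) {
        this.id = id;
        this.updates = new HashMap<>(updates);
    }
}

/** Sketch of the API from items 1-6 (Configuration argument omitted). */
interface ConfStoreSketch {
    void initialize(Map<String, String> schedConf);
    Map<String, String> retrieveLatestConfirmedConf();
    Map<String, String> retrieveLatestConf();
    long logMutation(Map<String, String> updates);
    void confirmMutation(long id);
    List<LogMutation> getPendingMutations();
}

final class InMemoryConfStoreSketch implements ConfStoreSketch {
    private final Map<String, String> confirmed = new HashMap<>();
    // LinkedHashMap keeps pending mutations in the order they were logged.
    private final Map<Long, LogMutation> pending = new LinkedHashMap<>();
    private long nextId = 1;

    public void initialize(Map<String, String> schedConf) {
        confirmed.clear();
        confirmed.putAll(schedConf);
        pending.clear();
    }

    public Map<String, String> retrieveLatestConfirmedConf() {
        return new HashMap<>(confirmed);
    }

    public Map<String, String> retrieveLatestConf() {
        // Confirmed config with every pending mutation applied on top.
        Map<String, String> latest = new HashMap<>(confirmed);
        for (LogMutation m : pending.values()) {
            apply(latest, m.updates);
        }
        return latest;
    }

    public long logMutation(Map<String, String> updates) {
        long id = nextId++; // assumption: the store generates the id
        pending.put(id, new LogMutation(id, updates));
        return id;
    }

    public void confirmMutation(long id) {
        LogMutation m = pending.remove(id);
        if (m == null) {
            throw new IllegalArgumentException("No pending mutation " + id);
        }
        apply(confirmed, m.updates);
    }

    public List<LogMutation> getPendingMutations() {
        return new ArrayList<>(pending.values());
    }

    // A null or empty value removes the key, so no separate "removes" field.
    private static void apply(Map<String, String> target,
                              Map<String, String> updates) {
        for (Map.Entry<String, String> e : updates.entrySet()) {
            if (e.getValue() == null || e.getValue().isEmpty()) {
                target.remove(e.getKey());
            } else {
                target.put(e.getKey(), e.getValue());
            }
        }
    }
}
```

In this sketch the scheduler would call {{retrieveLatestConf}} after {{logMutation}} to try reinitializing, and {{confirmMutation}} only once that succeeds. How a failed mutation gets dropped is left open here.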
Please let me know your thoughts.
> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
> Key: YARN-5946
> URL: https://issues.apache.org/jira/browse/YARN-5946
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing
> store.