[
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061559#comment-14061559
]
Junping Du commented on YARN-2045:
----------------------------------
[~jlowe], thanks for all good comments. Please see my reply below:
bq. for tracking protobuf schema changes for any protobufs passed via the
NMStateStoreService interface directly. That's not quite the same concept as
the layout of the state store itself (i.e.: implementation-specific schema).
For example, if leveldb decided to rework the way it uses keys in the database
that would be a schema change not reflected in this version, correct?
I agree the concept is not quite the same but I tend to handle them both
together as either of change (protobuf schema or layout schema) will bring
difficulty/risky for NMStateStoreService to load old version of data. If
incompatible changes happen at protobuf schema, it cannot deserialize object
from loaded data, or it cannot search/load data correctly if changes happens at
layout. However, these "incompatible" changes can be defined with flexibility
in future. If we still want to support backward compatibility for some changes
in layout, we can have some extra code to handle special version layout - like
what HDFS was doing and we don't have to increase the major number of version.
bq. I just found it odd that the state store implementation code is checking
this value yet it's exposed in a state store interface which is not
implementation specific. That's what felt wrong to me.
Are you talking about the interfaces
(loadVersion/storeVersion/getCurrentVersion) exposed in NMStateStoreService? I
think these interfaces are already implemented specifically in each state store
implementations and can be leveraged by version check specifically.
checkVersion() is not exposed in interface as this method is only necessary for
some implementation of state store. Do I miss something here?
bq. For this kind of schema version, the code to marshal data between
compatible versions should be in a common place, not in each state store
implementation, and either all state stores end up supporting the compatible
versions or they don't.
IMO, the code to marshal data between compatible versions (if necessary) can be
implementation-specific. It can happen in state store initialize stage with
detecting version and doing translating work. Some pseudo code example to do
this in future I can imagine is like below:
{code}
version = loadVersion();
if (version == specificCompatibleOldVersion) {
object[] o1 = loadO1WithOldLogic();
object[] o2 = loadO2WithOldLogic()
...
}
persistObjectsBack(o1, o2, ...).
storeNewVersionNumber();
{code}
This may not be necessary for protobuf object compatibility as old object
should have reasonable default value after deserialize in new code. But could
be necessary for layout changes that we want to support backward compatibility.
Thoughts?
bq. We could just have NMDBSchemaVersion class wrap the raw protobuf and have
the same easy-to-use interface without all the extra PBImpl code. And as a
bonus, NMDBSchemaVersion's hashCode and equals methods can delegate to the raw
protobuf methods to remove even more code.
I can understand your point here. Are you suggesting NMDBSchemaVersion to play
as PBImpl directly to include raw protobuf or something else? Do we have
example like this before?
> Data persisted in NM should be versioned
> ----------------------------------------
>
> Key: YARN-2045
> URL: https://issues.apache.org/jira/browse/YARN-2045
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.4.1
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: YARN-2045-v2.patch, YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM related
> data, include:
> - NodeManager local LevelDB state
> - NodeManager directory structure
--
This message was sent by Atlassian JIRA
(v6.2#6252)