[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

Junping Du (JIRA) Mon, 14 Jul 2014 19:22:18 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061559#comment-14061559
 ]


Junping Du commented on YARN-2045:
----------------------------------

[~jlowe], thanks for all good comments. Please see my reply below:
bq.  for tracking protobuf schema changes for any protobufs passed via the 
NMStateStoreService interface directly. That's not quite the same concept as 
the layout of the state store itself (i.e.: implementation-specific schema). 
For example, if leveldb decided to rework the way it uses keys in the database 
that would be a schema change not reflected in this version, correct?
I agree the concept is not quite the same but I tend to handle them both 
together as either of change (protobuf schema or layout schema) will bring 
difficulty/risky for NMStateStoreService to load old version of data. If 
incompatible changes happen at protobuf schema, it cannot deserialize object 
from loaded data, or it cannot search/load data correctly if changes happens at 
layout. However, these "incompatible" changes can be defined with flexibility 
in future. If we still want to support backward compatibility for some changes 
in layout, we can have some extra code to handle special version layout - like 
what HDFS was doing and we don't have to increase the major number of version. 

bq.  I just found it odd that the state store implementation code is checking 
this value yet it's exposed in a state store interface which is not 
implementation specific. That's what felt wrong to me.
Are you talking about the interfaces 
(loadVersion/storeVersion/getCurrentVersion) exposed in NMStateStoreService? I 
think these interfaces are already implemented specifically in each state store 
implementations and can be leveraged by version check specifically. 
checkVersion() is not exposed in interface as this method is only necessary for 
some implementation of state store. Do I miss something here?

bq.  For this kind of schema version, the code to marshal data between 
compatible versions should be in a common place, not in each state store 
implementation, and either all state stores end up supporting the compatible 
versions or they don't.
IMO, the code to marshal data between compatible versions (if necessary) can be 
implementation-specific. It can happen in state store initialize stage with 
detecting version and doing translating work. Some pseudo code example to do 
this in future I can imagine is like below:
{code}
version = loadVersion();
if (version == specificCompatibleOldVersion) {
  object[] o1 = loadO1WithOldLogic();
  object[] o2 = loadO2WithOldLogic()
  ...
}
persistObjectsBack(o1, o2, ...).
storeNewVersionNumber();
{code}
This may not be necessary for protobuf object compatibility as old object 
should have reasonable default value after deserialize in new code. But could 
be necessary for layout changes that we want to support backward compatibility. 
Thoughts?

bq. We could just have NMDBSchemaVersion class wrap the raw protobuf and have 
the same easy-to-use interface without all the extra PBImpl code. And as a 
bonus, NMDBSchemaVersion's hashCode and equals methods can delegate to the raw 
protobuf methods to remove even more code.
I can understand your point here. Are you suggesting NMDBSchemaVersion to play 
as PBImpl directly to include raw protobuf or something else? Do we have 
example like this before?

> Data persisted in NM should be versioned
> ----------------------------------------
>
>                 Key: YARN-2045
>                 URL: https://issues.apache.org/jira/browse/YARN-2045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-2045-v2.patch, YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM related 
> data, include:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

Reply via email to