[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

Jason Lowe (JIRA) Tue, 15 Jul 2014 08:40:46 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062212#comment-14062212
 ]


Jason Lowe commented on YARN-2045:
----------------------------------

bq. I agree the concept is not quite the same but I tend to handle them both 
together as either of change (protobuf schema or layout schema) will bring 
difficulty/risky for NMStateStoreService to load old version of data.

I think lumping them together and handling them in the implementation-specific 
code is fine, but if the implementation is handling all the details then why is 
it exposed in the interface?  I think the most telling point is that in the 
proposed patch no common code actually uses the interfaces that were added.  
Each implementation does its own version setting, its own compatibility check, 
and I assume its own marshaling in the future if necessary.  The interfaces 
aren't called by common code.  Maybe I'm not seeing the future use case of 
these methods?  I guess it could be useful for common code to do 
logging/reporting of the persisted/current versions or maybe to do a very 
simplistic incompatibility check (e.g.: assume different major numbers means 
incompatible), although arguably the implementation could simply log these 
numbers as it initializes and is already doing an implementation-specific 
compatibility check.

However I'm particularly doubtful of the storeVersion method as it seems like 
the only way to safely convert versions in the general sense is with 
implementation-specific code.  Using the conversion pseudo-code above as an 
example, if we crash halfway through the conversion of a series of objects then 
we have a mix of old and new data on the next restart but the stored version 
number is still old (or vice-versa if we store the new version first then 
convert).  In an implementation-specific approach it may be possible to make 
the conversion atomic, e.g.: using a batch write for the entire conversion in 
leveldb.  Therefore it makes more sense to me that an implementation should be 
responsible for deciding when and how to update the persisted schema version.  
I would expect implementations to do this sort of conversion during 
initialization and potentially the old persisted version would never be seen 
since it would already be converted.  Do you have an example where using the 
storeVersion method in the interface via implementation-independent code would 
be more appropriate and therefore the storeVersion method in the interface is 
necessary?

To summarize, I can see exposing the ability to get the persisted and current 
state store versions in the interface for logging, etc.  However I don't see 
how implementation-independent code can properly update the version via the 
interface.  We're lumping both interface and implementation-specific schema 
changes in the same version number, and it isn't possible to do an update of 
multiple store objects atomically via the current interface.

bq. Are you suggesting NMDBSchemaVersion to play as PBImpl directly to include 
raw protobuf or something else?

Sort of a subset of what the PBImpl is doing.  I was thinking of having 
NMDBSchemaVersion wrap the protobuf but in a read-only way (i.e.: no set 
methods, no builder stuff).  If one wants to change the version number, create 
a new protobuf.  PBImpls tend to get into trouble because they can be written, 
and it's simpler to treat the protobufs as immutable as they were intended.  
Another approach would be to simply have some static helper util methods that 
take two protobufs to do the compatible checks, etc.  Although I don't think we 
can really implement a useful isCompatibleTo check in implementaion-independent 
code since the version numbers encodes implementation-specific schema 
information.


Anyway I didn't mean to drag out this change for too long.  I'm wondering about 
these interfaces since I'm a strong believer that interfaces should be minimal 
and necessary, and I'm having difficulty seeing how these interfaces are really 
going to be used.  However I'm probably in the minority on these methods.  If 
people feel strongly that these interfaces are necessary and useful then go 
ahead and add them.  It seems to me that these interfaces will either never be 
called or only called for trivial reasons (e.g.: logging).  However I don't 
think having them is going to break anything or be an unreasonable burden on an 
implementation, rather just extra baggage that state store implementations have 
to expose.  As for the PBImpl, it's mostly a nit.  If you really would rather 
keep it in I guess that's fine.  We should be able to remove it later if we 
realize we don't have a use for it.  The main change I think has to be made is 
the leveldb schema check should handle the original method for storing the 
schema.  Two ways to handle that are either explicitly check for the "1.0" 
string before trying to parse the value as a protobuf or store the new version 
type under a different key and check for the old schema key if the new schema 
key is missing.

> Data persisted in NM should be versioned
> ----------------------------------------
>
>                 Key: YARN-2045
>                 URL: https://issues.apache.org/jira/browse/YARN-2045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-2045-v2.patch, YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM related 
> data, include:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

Reply via email to