[jira] [Commented] (OOZIE-346) GH-558: Serialization/deserialization of WorkflowInstance

Hadoop QA (JIRA) Fri, 09 Sep 2011 19:32:22 -0700

    [ https://issues.apache.org/jira/browse/OOZIE-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101855#comment-13101855 ]


Hadoop QA commented on OOZIE-346:
---------------------------------

topher-zicornell remarked:
Hi,

Erm... I'm not convinced that reason is strong enough to glom it away in a 
blob.  But, let's put that aside for the moment.  It's a blob, like image data 
or something.  Ok.

Regarding JSON serialization: on the surface it looks like a simple and 
straightforward little trick.  But we used this heavily in a recent project 
and ran into some difficulties.  So, not to persuade or dissuade, but just to 
comment:

1) To support container objects which reference interfaces, additional metadata 
will need to be provided describing the implementation class which should be 
used to unmarshal the object.  That can be provided through some additional 
code-level configuration, or XML-based configuration, or even as metadata 
embedded in the JSON object itself.  Hopefully your JSON library will take care 
of that for you.  (Ours didn't.  Arg.)

2) Similarly, to unmarshal standard generic containers (lists, maps, etc.), 
you'll need the same type of metadata describing the element types they hold.  
If your container is declared as an interface (often the case), you'll need 
the same type of provisions as in 1.

3) The code working with the unmarshalled objects must be tolerant of the 
structure from previous versions.  When a field is added in 3.2, the code 
should be adjusted to Do The Right Thing if that field is missing in order to 
support pre-3.2 items.

4) Extra fields aren't really a problem as long as your JSON library knows to 
ignore them.  That's usually a configuration thing.

5) When annotating a POJO for serialization, you'll want to make sure not to 
accidentally pull in transient data.  I've seen cases where tracking lists 
significantly bloated serializations and were useless once the object was 
reconstituted.  The bloat can cause overhead issues with the database, or at 
least, did in our case.
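
As a rough illustration of items 3 through 5, here is a minimal Python sketch 
using only the standard library.  The class WorkflowRecord, its fields, and 
the "transient" metadata key are hypothetical names made up for this example; 
they are not Oozie's WorkflowInstance or any particular JSON library's API.  
Missing fields fall back to defaults, unknown fields are ignored, and fields 
marked transient are never written out.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class WorkflowRecord:
    # Hypothetical record type, for illustration only.
    job_id: str
    status: str = "PREP"
    # Imagine this field was added in a later release: older payloads lack it,
    # so it needs a sensible default (item 3).
    priority: int = 0
    # Bookkeeping that is useless after reload; flagged so it is skipped (item 5).
    visited: list = field(default_factory=list, metadata={"transient": True})


def _persistent_names(cls):
    """Names of the fields that should be written to storage."""
    return [f.name for f in fields(cls) if not f.metadata.get("transient")]


def to_json(rec):
    """Serialize only the persistent fields."""
    data = {name: getattr(rec, name) for name in _persistent_names(type(rec))}
    return json.dumps(data, sort_keys=True)


def from_json(text):
    """Rebuild a record, ignoring unknown keys (item 4) and letting
    missing keys fall back to the dataclass defaults (item 3)."""
    raw = json.loads(text)
    known = set(_persistent_names(WorkflowRecord))
    return WorkflowRecord(**{k: v for k, v in raw.items() if k in known})
```

With this shape, a payload written before "priority" existed still loads, and 
a payload carrying a field this version has never heard of loads as well.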

Items 1 & 2 can be a major headache to get right.
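
To make items 1 and 2 a bit more concrete, here is a small sketch of the 
"metadata embedded in the JSON object itself" option, again in plain Python.  
The "@type" key, the REGISTRY dict, and the StartNode/ActionNode classes are 
all assumptions invented for the example; a real JSON library may offer its 
own mechanism for this.

```python
import json

# Hypothetical registry mapping a type tag to a concrete implementation class.
REGISTRY = {}


def register(cls):
    """Class decorator that records the class under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class StartNode:
    def __init__(self, name):
        self.name = name


@register
class ActionNode:
    def __init__(self, name, command):
        self.name = name
        self.command = command


def encode(obj):
    """Write the implementation name next to the object's fields."""
    return {"@type": type(obj).__name__, **vars(obj)}


def decode(raw):
    """Look up the concrete class from the embedded tag and rebuild it."""
    data = dict(raw)
    cls = REGISTRY[data.pop("@type")]  # raises KeyError for an unknown tag
    return cls(**data)


def dump_nodes(nodes):
    """Serialize a list whose elements share only an abstract 'node' type."""
    return json.dumps([encode(n) for n in nodes])


def load_nodes(text):
    return [decode(item) for item in json.loads(text)]
```

Without the tag, the reader of the list has no way to know which concrete 
class each element should become, which is the headache described above.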

The versioning thing might be valuable depending on the circumstances and 
implementation, but it could just as easily become a cumbersome, frustrating 
necessity.  I've seen that kind of scheme work well, and I've seen it flop 
badly.  That might merit more discussion.  How would you envision the migration 
step being triggered?  
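
For the sake of discussion, one possible shape for such a scheme is to 
migrate lazily, at load time, by chaining small per-version upgrade steps.  
This is only a sketch under stated assumptions: the "version" key, the 
version numbers, the field names, and the rule that untagged payloads are 
treated as version 1 are all hypothetical, not something Oozie does.

```python
import json

CURRENT_VERSION = 3  # hypothetical current schema version


def _v1_to_v2(doc):
    """Hypothetical step: version 2 introduced a 'priority' field."""
    doc = dict(doc)
    doc.setdefault("priority", 0)
    doc["version"] = 2
    return doc


def _v2_to_v3(doc):
    """Hypothetical step: version 3 renamed 'status' to 'state'."""
    doc = dict(doc)
    doc["state"] = doc.pop("status", "PREP")
    doc["version"] = 3
    return doc


# Each entry upgrades a document from the keyed version to the next one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def load(text):
    """Parse a payload and upgrade it step by step to CURRENT_VERSION."""
    doc = json.loads(text)
    version = doc.get("version", 1)  # assumption: untagged payloads are v1
    if version > CURRENT_VERSION:
        raise ValueError("payload is newer than this code: %r" % version)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["version"]
    return doc
```

The cost is visible here too: every supported old version needs its own step 
kept around and tested for as long as such payloads may still be in the table.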

.  Topher

> GH-558: Serialization/deserialization of WorkflowInstance
> ---------------------------------------------------------
>
>                 Key: OOZIE-346
>                 URL: https://issues.apache.org/jira/browse/OOZIE-346
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Hadoop QA
>
> The Oozie team at Yahoo has recently experienced multiple production issues 
> after upgrading to a new Oozie version, attributed to modifications of the 
> Workflow tables' structure.
> More specifically, we added a new field to the workflow table. Hence, for 
> example, if a user submits a WF job in an earlier Oozie version and the job 
> is still active after the upgrade, Oozie fails to deserialize the WFInstance 
> object. In other words, the object was originally serialized using the old 
> structure, whereas Oozie tries to deserialize it using the new structure after 
> the upgrade, and therefore throws an exception.
> Some observations that came up from our internal discussion:
> 1. Is it required to store the blob in the table? Can't we create the 
> object from the other fields of the table? I know it might not be that 
> straightforward. However, other options might be worse than this.
> 2. If we want to keep the blob, new fields should be added at the end 
> during serialization. However, if some fields are removed, how could we handle 
> that? It might not be a flexible approach.
> 3. During serialization, we could write some type of version at the beginning, 
> which would help to deserialize the object. This might make the code very 
> ugly, depending on how many old versions we would like to support.
> 4. Since it is a very well-known problem, there should be some standard 
> procedure. However, it might not be easy either.
> Anyway, these are just initial thoughts. We haven't come to any 
> conclusion yet.
> Please feel free to comment.
> Thanks,
> Mohammad

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
