[ https://issues.apache.org/jira/browse/OOZIE-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101855#comment-13101855 ]
Hadoop QA commented on OOZIE-346:
---------------------------------
topher-zicornell remarked:
Hi,
Erm... I'm not convinced that reason is strong enough to glom it away in a
blob. But, let's put that aside for the moment. It's a blob, like image data
or something. Ok.
Regarding JSON serializations. On the surface it looks like a simple and
straightforward little trick. But we used this heavily in a recent project
and ran into some difficulties. So, not to persuade or dissuade, but just to
comment:
1) To support container objects which reference interfaces, additional metadata
will need to be provided describing the implementation class which should be
used to unmarshal the object. That can be provided through some additional
code-level configuration, or XML-based configuration, or even as metadata
embedded in the JSON object itself. Hopefully your JSON library will take care
of that for you. (Ours didn't. Arg.)
2) Similarly, to unmarshal standard generic containers (lists, maps, etc),
you'll need the same kind of metadata describing the contained element types.
If the container itself is declared as an interface (often the case), you'll
need the same type of provisions as in 1.
3) The code working with the unmarshalled objects must be tolerant of the
structure from previous versions. When a field is added in 3.2, the code
should be adjusted to Do The Right Thing if that field is missing in order to
support pre-3.2 items.
4) Extra fields aren't really a problem as long as your JSON library knows to
ignore them. That's usually a configuration thing.
5) When annotating a POJO for serialization, you'll want to make sure not to
accidentally pull in transient data. I've seen cases where tracking lists
significantly bloated serializations and were useless once the object was
reconstituted. The bloat can cause overhead issues with the database, or at
least it did in our case.
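To make items 3 through 5 concrete, here is a minimal sketch, in Python only for brevity. The WorkflowState class, its fields, and the "transient" metadata flag are all hypothetical and not taken from Oozie; a real JSON library would normally offer equivalent switches.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class WorkflowState:
    # Hypothetical stand-in for a serialized workflow object.
    job_id: str
    status: str = "PREP"
    # Pretend this field was added in 3.2. The default lets pre-3.2
    # serializations load without it (item 3).
    priority: int = 0
    # Transient tracking data that should never be persisted (item 5).
    visited_nodes: list = field(default_factory=list,
                                metadata={"transient": True})


def persistent_names(cls):
    """Names of the fields that belong in the serialized form."""
    return {f.name for f in fields(cls) if not f.metadata.get("transient")}


def to_json(state):
    # Serialize only the non-transient fields.
    names = persistent_names(type(state))
    return json.dumps({name: getattr(state, name) for name in names})


def from_json(text):
    raw = json.loads(text)
    names = persistent_names(WorkflowState)
    # Unknown keys are dropped instead of raising an error (item 4).
    return WorkflowState(**{k: v for k, v in raw.items() if k in names})
```

With this, `from_json('{"job_id": "wf-1", "status": "RUNNING"}')` loads an older record with `priority` defaulted, and a record carrying an extra key loads with that key ignored.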
Items 1 & 2 can be a major headache to get right.
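A rough sketch of the "metadata embedded in the JSON object itself" option, again in Python for brevity. The action classes, the registry, and the "@type" key are hypothetical names chosen for illustration; in Java the tag would select the implementation class behind an interface-typed field or list element.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ShellAction:
    command: str


@dataclass
class EmailAction:
    to: str
    subject: str


# Registry mapping a type tag to the concrete class to instantiate.
ACTION_TYPES = {"shell": ShellAction, "email": EmailAction}
TAG_FOR_CLASS = {cls: tag for tag, cls in ACTION_TYPES.items()}


def dump_actions(actions):
    # Each element carries its own tag, so a mixed list can be rebuilt.
    return json.dumps(
        [{"@type": TAG_FOR_CLASS[type(a)], **asdict(a)} for a in actions]
    )


def load_actions(text):
    result = []
    for item in json.loads(text):
        item = dict(item)
        cls = ACTION_TYPES[item.pop("@type")]  # KeyError on an unknown tag
        result.append(cls(**item))
    return result
```

Using a short tag with a registry, instead of a fully qualified class name, keeps stored data readable after a class is renamed or moved, at the cost of maintaining the registry.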
The versioning thing might be valuable depending on the circumstances and
implementation, but it could just as easily become a cumbersome, frustrating
necessity. I've seen that kind of scheme work well, and I've seen it flop
badly. That might merit more discussion. How would you envision the migration
step being triggered?
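Purely to make the question concrete, one possible trigger is lazy migration at read time: each record carries a version number and is stepped forward when it is loaded. This is a sketch under assumptions, not a proposal for the actual Oozie schema; the version numbers and the two migration steps are invented.

```python
import json

CURRENT_VERSION = 3


def _v1_to_v2(doc):
    # Hypothetical change: version 2 added a "priority" field.
    doc["priority"] = 0
    return doc


def _v2_to_v3(doc):
    # Hypothetical change: version 3 renamed "state" to "status".
    doc["status"] = doc.pop("state")
    return doc


# Each entry upgrades a document from that version to the next one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def load(text):
    doc = json.loads(text)
    # Records written before versioning existed are treated as version 1.
    version = doc.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError("record written by a newer release: %d" % version)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc["version"] = version
    return doc
```

The alternative trigger would be a one-time batch pass at upgrade, which keeps the read path simple but has to touch every stored record before the new release starts serving.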
. Topher
> GH-558: Serialization/deserialization of WorkflowInstance
> ---------------------------------------------------------
>
> Key: OOZIE-346
> URL: https://issues.apache.org/jira/browse/OOZIE-346
> Project: Oozie
> Issue Type: Bug
> Reporter: Hadoop QA
>
> The Oozie team at Yahoo has recently experienced multiple production issues
> after upgrading to a new Oozie version, attributed to modifications of the
> Workflow tables' structure.
> More specifically, we added a new field to the workflow table. Hence, for
> example, if a user submits a WF job in an earlier Oozie version and the job
> is still active after the upgrade, Oozie fails to deserialize the WFInstance
> object. In other words, the object was originally serialized using the old
> structure, whereas Oozie tries to deserialize it using the new structure after
> the upgrade. Therefore it throws an exception.
> Some observations that came up from our internal discussion:
> 1. Is it required to store the blob in the table? Can't we create the
> object from the other fields of the table? I know it might not be that
> straightforward. However, other options might be worse than this.
> 2. If we want to keep the blob, the new field(s) should be added at the end
> during serialization. However if some fields are removed, how could we handle
> that? Might not be a flexible idea.
> 3. During serialization, we could write some kind of version tag at the
> beginning, which would help to deserialize the object. This might make the
> code very ugly depending on how many old versions we would like to support.
> 4. Since it is a very well-known problem, there should be some standard
> procedure. However, it might not be easy either.
> Anyway, these are just initial thoughts. We haven't come to any
> conclusion yet.
> Please feel free to comment.
> Thanks,
> Mohammad