JoshRosen commented on PR #36885:
URL: https://github.com/apache/spark/pull/36885#issuecomment-1159547579

   > Can I ask a newbie questions about version compatibility / optional field 
handling? The existing code is capable of being lenient in some cases and 
ignoring certain extra/missing fields. Does the translation from Json4s -> 
Jackson preserve that behavior?
   
   Yes, it should.
   
   There are two different types of compatibility that we need to consider:
   
   - **Compatibility for reading JSON produced by old versions of Spark:** 
     - When we add a new field to an existing event type, we need the read path 
to treat that field as optional. Both the old and new code achieve this via the 
`jsonOption` function, which allows us to control behavior when a field is 
missing at read time. 
     - There are [some existing tests for 
this](https://github.com/apache/spark/blob/a859dd25019715165ddb0defe3ddfd8e3cba866e/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala#L239-L241)
 in `JsonProtocolSuite`. These tests remove fields from the JSON (simulating 
JSON generated by old versions that don't emit those fields) and tests that 
events still parse.
     - There are also some older eventlogs which are used by history server 
tests: 
https://github.com/apache/spark/tree/master/core/src/test/resources/spark-events
     - As long as we've been consistent about marking new fields with 
`jsonOption`, then I think that the existing tests + code review to ensure that 
we've preserved `jsonOption` should give us pretty high confidence in terms of 
our ability to parse old logs.
   - **Writing event JSON which is readable by old HistoryServer versions**:
     - `JsonProtocolSuite` has tests which serialize events and check that 
Json4s parses the generated JSON into the same tree as an expected JSON 
response (stored in a string literal); see 
[`assertJsonStringEquals`](https://github.com/apache/spark/blob/a859dd25019715165ddb0defe3ddfd8e3cba866e/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala#L918-L929).
 Right now this PR is replacing that use of Json4s with Jackson, but I will 
update this to check that the trees are considered equal by _both_ Jackson and 
Json4s (and will leave a comment in the code explaining why we have this 
compatibility test).
     - A stronger guarantee would be to check that the Json4s and Jackson 
versions of `JsonProtocol` produce character-for-character identical output. I 
am going to experiment with running a test where I copy the existing `Json4s` 
version of `JsonProtocol` to a class named `OldJsonProtocol` then generate 
random events (likely via ScalaCheck) and check that the Json output is exactly 
identical. If this test passes in all cases then we have strong assurance that 
our new output will be compatible with old history servers. I'm not planning to 
check this test in because I don't want to have two versions of JsonProtocol 
that need to be kept in sync when making updates (and I think the risk of 
future changes introducing breakage _which wouldn't be caught by another test_ 
is small).
   
   I think (but will confirm) that our Json4s code path is using Jackson under 
the hood (behind the Json4s interfaces), which gives me further confidence that 
things should be compatible; if the underlying json parsers were different then 
we might have to worry about corner-cases in how numbers are parsed, etc., but 
hopefully they're using the same low-level reader.
   
   I'll post an update here once I've had a chance to run those more extensive 
compatibility experiments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to