JoshRosen commented on PR #36885: URL: https://github.com/apache/spark/pull/36885#issuecomment-1167855113
I've made some significant progress in compatibility testing: in a separate branch, I copied the old version of the code into an `OldJsonProtocol.scala` class, wrote a set of Scalacheck generators for generating random SparkListener event instances, and performed the following test: - Given a random event, serialize it to JSON with both the old and new implementations. Verify that the resulting JSON is _character-for-character identical_ between both implementations. This should guarantee compatibility with old versions of the History Server (and other consumers of the JSON). - Given the JSON, use both the old and new implementations to deserialize the event and verify that the result is equal to the original event. My code for this test can be found in https://github.com/apache/spark/commit/b895b78d66f868dce9d55b276b771f57c061fc9a . I don't want to commit this code because it's a bit messy and I don't think it makes sense to keep both the old and new versions of JsonProtocol checked into the repo: if we can more-or-less exhaustively verify that the new implementation is equivalent to the old one then I don't think it makes sense to keep both around and add a flag, etc. These tests pass with hundreds of examples of each event, giving us high confidence that the two implementations behave the same. I _did_ manage to find some pre-existing bugs in `JsonProtocol` but am _not_ planning to fix them in this PR (although I'm planning to submit a separate PR containing fixes for these issues): - `TaskResultRequest` loses precision for values < 0.5. The `amount` is a floating point number which is either between 0 and 0.5 or is a positive integer, but the read path assumes it is an integer. - ExecutorResourceRequset integer overflows for values larger than an integer because the write path writes longs but the read path assumes integers. - The `OFF_HEAP` storage level is not handled properly: the `offHeap` field isn't included in the JSON, so this StorageLevel cannot be round-tripped through JSON. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
