JoshRosen commented on PR #36885:
URL: https://github.com/apache/spark/pull/36885#issuecomment-1167855113

   I've made some significant progress in compatibility testing: in a separate 
branch, I copied the old version of the code into an `OldJsonProtocol.scala` 
class, wrote a set of Scalacheck generators for generating random SparkListener 
event instances, and performed the following test:
   
   - Given a random event, serialize it to JSON with both the old and new 
implementations. Verify that the resulting JSON is _character-for-character 
identical_ between both implementations. This should guarantee compatibility 
with old versions of the History Server (and other consumers of the JSON).
   - Given the JSON, use both the old and new implementations to deserialize 
the event and verify that the result is equal to the original event.
   
   My code for this test can be found in 
https://github.com/apache/spark/commit/b895b78d66f868dce9d55b276b771f57c061fc9a 
. I don't want to commit this code because it's a bit messy and I don't think 
it makes sense to keep both the old and new versions of JsonProtocol checked 
into the repo: if we can more-or-less exhaustively verify that the new 
implementation is equivalent to the old one then I don't think it makes sense 
to keep both around and add a flag, etc. 
   
   These tests pass with hundreds of examples of each event, giving us high 
confidence that the two implementations behave the same. 
   
   I _did_ manage to find some pre-existing bugs in `JsonProtocol` but am _not_ 
planning to fix them in this PR (although I'm planning to submit a separate PR 
containing fixes for these issues):
   
   - `TaskResultRequest` loses precision for values < 0.5. The `amount` is a 
floating point number which is either between 0 and 0.5 or is a positive 
integer, but the read path assumes it is an integer.
   - ExecutorResourceRequset integer overflows for values larger than an 
integer because the write path writes longs but the read path assumes integers.
   - The `OFF_HEAP` storage level is not handled properly: the `offHeap` field 
isn't included in the JSON, so this StorageLevel cannot be round-tripped 
through JSON.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to