JoshRosen commented on PR #36885:
URL: https://github.com/apache/spark/pull/36885#issuecomment-1167840285

   > @JoshRosen Would you consider using 
[jsoniter-scala](https://github.com/plokhotnyuk/jsoniter-scala) instead of 
Jackson?
   
   @plokhotnyuk, I don't think that `jsoniter-scala` will easily address our 
current use-case: it looks like that library is focused on object mapping but 
Spark's `JsonProtocol` does not currently use object mapping to serialize its 
JSON events. I think one reason for this is that 
`JsonProtocol` events have fields which aren't Scala classes or POJOs and thus 
can't easily be object mapped. For example, `taskMetricsFromJson` extracts 
fields from the JSON and then calls "increment metric" setter methods on a `new 
TaskMetrics` instance.
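
   To illustrate the pattern, here is a minimal sketch. It is not Spark's actual code: `SimpleTaskMetrics`, its `inc*` setters, the JSON key names, and the plain `Map` standing in for a parsed JSON object are all hypothetical and chosen only to keep the example dependency-free.

```scala
// Hypothetical stand-in for a metrics class that only exposes mutator methods,
// so an object mapper cannot populate it through a constructor or public fields.
final class SimpleTaskMetrics {
  private var _executorRunTime: Long = 0L
  private var _resultSize: Long = 0L
  def executorRunTime: Long = _executorRunTime
  def resultSize: Long = _resultSize
  def incExecutorRunTime(v: Long): Unit = _executorRunTime += v
  def incResultSize(v: Long): Unit = _resultSize += v
}

object ManualDecoding {
  // Assumption: the parsed JSON object is modelled as a Map of numeric fields.
  // Each field is extracted by hand and applied through a setter; keys that
  // are absent simply leave the corresponding metric at zero.
  def simpleTaskMetricsFromJson(json: Map[String, Long]): SimpleTaskMetrics = {
    val metrics = new SimpleTaskMetrics
    json.get("Executor Run Time").foreach(metrics.incExecutorRunTime)
    json.get("Result Size").foreach(metrics.incResultSize)
    metrics
  }
}
```

   An object mapper has no constructor parameters or public fields to bind to here, which is why the extraction is written out by hand.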
   
   If we wanted to use object mapping, we could define a separate set of case 
classes for events' JSON representations and could have a helper function for 
translating from those intermediate classes into the actual 
`SparkListenerEvent` classes. We'd have to be careful to properly handle all of 
the backwards- and forwards-compatibility constraints, including supplying 
default values for missing fields.
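
   A rough sketch of what that could look like, under stated assumptions: `AppStartEvent` stands in for a real `SparkListenerEvent` subclass, `AppStartJson` is a hypothetical intermediate class that an object mapper would populate, and the fallback values are invented for illustration.

```scala
// Hypothetical event class standing in for a real SparkListenerEvent subclass.
final case class AppStartEvent(appName: String, user: String, startTimeMs: Long)

// Hypothetical intermediate class mirroring the JSON shape. Option fields
// model keys that logs written by other versions may omit.
final case class AppStartJson(
    appName: String,
    user: Option[String],
    timestamp: Option[Long])

object EventTranslation {
  // Helper that translates the intermediate representation into the event,
  // supplying defaults (assumed values) for fields missing from the JSON.
  def toEvent(j: AppStartJson): AppStartEvent =
    AppStartEvent(
      appName = j.appName,
      user = j.user.getOrElse("<unknown>"),
      startTimeMs = j.timestamp.getOrElse(-1L))
}
```

   Keeping the defaults in one translation function would make the compatibility rules explicit, at the cost of maintaining a second set of classes alongside the events.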
   
   We'd also have to weigh the costs and benefits of adding another external 
dependency to Spark: each library that Spark depends on can cause version 
conflicts if users also depend on that same library in their own code.
   
   I think it's certainly possible that we could re-architect this code in 
order to let an object mapping library do more of the heavy lifting, but I 
don't want to make that change in this PR: my short-term goal is to land 
an easy-to-understand, easy-to-verify patch in order to achieve large 
performance improvements over the JSON4s-based status quo. I would be open to 
reviewing a patch that does the refactoring work needed to use object mapping 
libraries in case that results in significant performance improvements or code 
simplification, but I don't have the time to develop such a patch myself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


