JoshRosen commented on PR #36885: URL: https://github.com/apache/spark/pull/36885#issuecomment-1167840285
> @JoshRosen Would you consider using [jsoniter-scala](https://github.com/plokhotnyuk/jsoniter-scala) instead of Jackson?

@plokhotnyuk, I don't think that `jsoniter-scala` will easily address our current use-case: it looks like that library is focused on object mapping, but Spark's `JsonProtocol` does not currently use object mapping to serialize its JSON events.

I think that one of the reasons for this is that `JsonProtocol` events have fields which aren't Scala case classes or POJOs and thus can't easily be object mapped. For example, `taskMetricsFromJson` extracts fields from the JSON and then calls "increment metric" setter methods on a `new TaskMetrics` instance.

If we wanted to use object mapping, we could define a separate set of case classes for the events' JSON representations and add a helper function for translating from those intermediate classes into the actual `SparkListenerEvent` classes. We'd have to be careful to properly handle all of the backwards- and forwards-compatibility constraints, including supplying default values for missing fields.

We'd also have to weigh the costs and benefits of adding another external dependency to Spark: Spark's library dependencies can cause conflicts if users also depend on those same libraries in their own code.

I think it's certainly possible that we could re-architect this code to let an object mapping library do more of the heavy lifting, but I don't want to make that change in this PR: my short-term goal is to land an easy-to-understand, easy-to-verify patch that achieves large performance improvements over the JSON4s-based status quo.

I would be open to reviewing a patch that does the refactoring work needed to use object mapping libraries if that results in significant performance improvements or code simplification, but I don't have the time to develop such a patch myself.
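To make the intermediate-class idea above more concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `ToyTaskMetrics` is a stand-in for a mutable metrics class with "increment" setters (it is not Spark's real `TaskMetrics`), `TaskMetricsJson` is an invented intermediate case class, and `fromFields` takes an already-parsed `Map` in place of whatever an object mapping library would produce. The field names are illustrative only.

```scala
// Hypothetical stand-in for a mutable metrics class that is only
// populated through "increment" setters. NOT Spark's real TaskMetrics.
final class ToyTaskMetrics {
  private var _executorRunTime: Long = 0L
  private var _resultSize: Long = 0L
  def executorRunTime: Long = _executorRunTime
  def resultSize: Long = _resultSize
  def incExecutorRunTime(v: Long): Unit = _executorRunTime += v
  def incResultSize(v: Long): Unit = _resultSize += v
}

// Hypothetical intermediate class mirroring the JSON representation.
// Default values cover fields that older logs may not contain.
final case class TaskMetricsJson(
    executorRunTime: Long = 0L,
    resultSize: Long = 0L)

object TaskMetricsJson {
  // Stand-in for the object mapper: builds the intermediate class from
  // already-parsed fields. Missing keys fall back to defaults and
  // unknown keys are ignored.
  def fromFields(fields: Map[String, Long]): TaskMetricsJson =
    TaskMetricsJson(
      executorRunTime = fields.getOrElse("Executor Run Time", 0L),
      resultSize = fields.getOrElse("Result Size", 0L))

  // Helper that translates the intermediate class into the real
  // (here: toy) metrics object via its increment setters.
  def toMetrics(json: TaskMetricsJson): ToyTaskMetrics = {
    val metrics = new ToyTaskMetrics
    metrics.incExecutorRunTime(json.executorRunTime)
    metrics.incResultSize(json.resultSize)
    metrics
  }
}
```

In this sketch a missing field falls back to its default and an unrecognized field is simply ignored, which is roughly the compatibility behavior we would need to preserve. A real version would need one such intermediate class and translation helper per event type, which is the refactoring work I'd rather keep out of this PR.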
