Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10061#discussion_r63822163
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -96,6 +100,7 @@ private[spark] object JsonProtocol {
executorMetricsUpdateToJson(metricsUpdate)
case blockUpdated: SparkListenerBlockUpdated =>
throw new MatchError(blockUpdated) // TODO(ekl) implement this
+ case _ => parse(mapper.writeValueAsString(event))
--- End diff --
> Events are a public API, and they should be carefully crafted, since
changing them affects user applications (including event logs). If there is
unnecessary information in the event, then it's a bug in the event definition,
not here.
Yea. I totally agree. However, my concern is that having this line at here
will make the developer harder to spot issues during the development. Since the
serialization works automatically, we are not making a self-review on what will
be serialized and what methods will be called during serialization a mandatory
step, which makes the auditing work much harder. Although it introduces more
work to the developer to make every event explicitly handled, when we review
the pull request, we can clearly know what will be serialized and how a event
is serialized when a pull request is submitted. What do you think?
btw, if I am missing any context, please let me know :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]