JoshRosen commented on code in PR #39763:
URL: https://github.com/apache/spark/pull/39763#discussion_r1088589402


##########
core/src/main/scala/org/apache/spark/util/JsonProtocol.scala:
##########
@@ -39,6 +40,16 @@ import org.apache.spark.scheduler.cluster.ExecutorInfo
 import org.apache.spark.storage._
 import org.apache.spark.util.Utils.weakIntern
 
+/**
+ * Helper class for passing configuration options to JsonProtocol.
+ * We use this instead of passing SparkConf directly because it lets us avoid
+ * repeated re-parsing of configuration values on each read.
+ */
+private[spark] class JsonProtocolOptions(conf: SparkConf) {

Review Comment:
   This pattern is a bit weird, so I'd like to explain it in greater length 
here:
   
   To date, JsonProtocol has just been a big object with no configuration 
options of its own. All other event logging configurations take effect at 
higher levels, such as EventLoggingListener. We cannot implement this 
particular change at those levels, though: we shouldn't mutate the mutable 
TaskInfo or StageInfo objects referenced by the listener events and adding 
`copy()` methods to them would be expensive.
   
   JsonProtocol is currently implemented as a global object, so there's no 
straightforward way to make it easily configurable and testable. If I did 
something like "get the active SparkContext's configuration and read the 
configuration value" on each call then this would destroy performance because 
configuration reading is not cheap.
   
   One approach would be to refactor this so that JsonProtocol is no longer a 
singleton: we could give it a proper constructor which accepts a SparkConf. 
That's a much larger refactoring than I want to undertake right now, though, 
and it would involve significant updates in a bunch of test code.
   
   The approach taken here is to thread through an `JsonProtocolOptions` class 
which holds the configuration values. Right now it only holds a single value, 
but this is done as future-proofing in case we add additional configurability. 
The lone caller in EventLoggingListener constructs the `JsonProtocolOptions` 
using SparkConf, so the configuration is only read from the conf once (avoiding 
conf reader performance overheads).
   
   This is a more C-style approach than the usual OOP approach that we 
generally take, but I think it's okay in this limited internal context.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to