Github user markgrover commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15971#discussion_r89240387
  
    --- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
    @@ -231,6 +233,19 @@ private[spark] class EventLoggingListener(
         }
       }
     
    +
    +  private def redactEvent(event: SparkListenerEnvironmentUpdate): 
SparkListenerEnvironmentUpdate = {
    --- End diff --
    
    Good question. I thought about this quite a bit when making this change. I 
should have posted that decision in the PR description, apologies for not 
providing that context. The way I see it - there are three places where 
redaction can be done:
    1. Right at the source of SparkListenerEnvironmentUpdate ([here, in 
SparkContext.scala](https://github.com/apache/spark/blob/39a1d30636857715247c82d551b200e1c331ad69/core/src/main/scala/org/apache/spark/SparkContext.scala#L2214)).
    2. [In 
JsonProtocol.scala](https://github.com/apache/spark/blob/39a1d30636857715247c82d551b200e1c331ad69/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L167)
 when converting the event to JSON.
    3. In 
[EventLogginListener.scala](https://github.com/apache/spark/blob/39a1d30636857715247c82d551b200e1c331ad69/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L167)
 (where it is right now), when being persisted to disk.
    
    A user could write a custom listener that listened to the environment 
updates and did something useful with them. And, I didn't want to impose 
redaction on them. They could be using it to create a clone of their 
environment, for example and may need to the same sensitive properties. So, I 
ruled out 1.
    
    And, JsonProtocol seemed like a generic utility to convert events to JSON. 
While I could be selective about only redacting events of 
`SparkListenerEnvironmentUpdate` type, I didn't want to assume that everyone 
translating the environment update to JSON should only be seeing redacted 
configuration. So, that made me rule out 2.
    
    I decided that it was best redact "closest to disk", which made me put the 
redaction code where I did - in EventLoggingListener. Hope that makes sense, 
happy to hear your thoughts if you think otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to