peter-toth opened a new pull request, #56139:
URL: https://github.com/apache/spark/pull/56139

   ### What changes were proposed in this pull request?
   
   The standalone Worker UI serves `GET /json/`, which returns 
`JsonProtocol.writeWorkerState(...)` and includes each executor's 
`ApplicationDescription.command` rendered as `Command.toString`. Since 
`Command` is a case class, that string contains the full `environment` map and 
`javaOpts` sequence.
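
   As a minimal illustration of why `toString` leaks these fields, the sketch below uses a simplified, hypothetical stand-in for `Command` (the real class has more fields) with made-up values. It only shows that a case class's generated `toString` prints every constructor field.

```scala
object CommandToStringDemo {
  // Hypothetical, reduced stand-in for Spark's Command; field list shortened for illustration.
  final case class Command(
      mainClass: String,
      environment: Map[String, String],
      javaOpts: Seq[String])

  // Made-up values standing in for real secrets.
  val cmd: Command = Command(
    mainClass = "org.example.Main",
    environment = Map("DB_PASSWORD" -> "hunter2"),
    javaOpts = Seq("-Dspark.ssl.keyStorePassword=changeit"))

  // The compiler-generated toString includes the environment map and javaOpts verbatim.
  val rendered: String = cmd.toString

  def main(args: Array[String]): Unit = println(rendered)
}
```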
   
   This PR redacts `Command.environment` via `Utils.redact(conf, ...)` and 
`Command.javaOpts` via `Utils.redactCommandLineArgs(conf, ...)` before calling 
`toString`, reusing the same redaction APIs already applied to the launch 
command in `ExecutorRunner` logging (`ExecutorRunner.scala:162-164`). The 
`command` field stays a string 
rendered from a redacted `Command.copy(...)`, so the JSON schema is unchanged.
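
   The redact-then-render approach can be sketched roughly as follows. This is not the PR's code: `Command` is a reduced stand-in, `redactEnv` and `redactOpts` are hypothetical helpers that only approximate `Utils.redact` and `Utils.redactCommandLineArgs` (they check keys only), and the pattern is a hard-coded assumption in place of a `spark.redaction.regex` lookup.

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Replacement text as quoted in this PR description.
  val Replacement = "*********(redacted)"

  // Assumed pattern for this sketch; the real value is read from spark.redaction.regex.
  val Pattern: Regex = "(?i)secret|password|token".r

  // Hypothetical, reduced stand-in for Spark's Command.
  final case class Command(
      mainClass: String,
      environment: Map[String, String],
      javaOpts: Seq[String])

  // Mask the value of every environment entry whose key matches the pattern.
  def redactEnv(env: Map[String, String]): Map[String, String] =
    env.map { case (k, v) =>
      if (Pattern.findFirstIn(k).isDefined) k -> Replacement else k -> v
    }

  private val KeyValueOpt: Regex = "(.+?)=(.+)".r

  // Mask the value part of key=value options whose key matches the pattern.
  def redactOpts(opts: Seq[String]): Seq[String] = opts.map {
    case KeyValueOpt(k, _) if Pattern.findFirstIn(k).isDefined => s"$k=$Replacement"
    case other => other
  }

  // Render from a redacted copy so the output keeps the Command.toString shape.
  def render(cmd: Command): String =
    cmd.copy(
      environment = redactEnv(cmd.environment),
      javaOpts = redactOpts(cmd.javaOpts)).toString
}
```

   Because only a copy is redacted, the original `Command` used to launch the executor is left untouched in this sketch.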
   
   `ExecutorRunner.conf` is promoted to `val` so `JsonProtocol.writeExecutorRunner` 
can plumb the worker's `SparkConf` into 
`writeApplicationDescription` for `spark.redaction.regex` lookups.
   
   ### Why are the changes needed?
   
   `environment` and `javaOpts` routinely carry secrets: JDBC passwords, AWS 
credentials, SSL keystore passwords, Hadoop credential store passwords, 
`spark.executorEnv.*` values, etc. `ExecutorRunner` already redacts the same 
content when writing the launch command to logs, but `JsonProtocol` emits it 
unredacted over the Worker UI HTTP endpoint. The Worker UI listens on port 8081 
with no authentication by default, so any caller with network access to the 
worker can read the secrets with a single `curl worker:8081/json/ | jq 
'.executors[].appdesc.command'`. The fix removes the inconsistency between 
log-path and HTTP-path redaction.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. The `command` field returned by `GET /json/` on the standalone Worker 
UI now has secret-bearing values in `environment` and `javaOpts` replaced with 
`*********(redacted)` when they match `spark.redaction.regex` (the default 
pattern matches keys like `secret`, `password`, `token`, etc.). The JSON schema 
is unchanged -- `command` remains a single string in `Command.toString` format 
-- so existing tooling that parses this endpoint continues to work; only the 
sensitive values that were previously leaked are now masked.
   
   ### How was this patch tested?
   
   - Added `SPARK-57098: secrets in executor command are redacted in worker 
JSON endpoint` to `JsonProtocolSuite`, covering both environment-variable and 
`-D` java-opt secret carriers, asserting that secret values are scrubbed 
while non-sensitive values (`JAVA_HOME`, `-Xmx2g`) pass through.
   - `build/sbt 'core/testOnly org.apache.spark.deploy.JsonProtocolSuite'` -- 
11/11 tests pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

