cloud-fan opened a new pull request, #56630:
URL: https://github.com/apache/spark/pull/56630
### What changes were proposed in this pull request?
`JDBCOptions.getRedactUrl()` is used to surface the JDBC URL in logs and in the
`FAILED_JDBC.*` error messages (e.g. `FAILED_JDBC.CONNECTION`). Today it is implemented as:
```scala
def getRedactUrl(): String =
  Utils.redact(SQLConf.get.stringRedactionPattern, url)
```
`SQLConf.get.stringRedactionPattern` (`spark.sql.redaction.string.regex`) is **unset by default**,
so `Utils.redact(None, url)` returns the URL unchanged. JDBC URLs routinely embed credentials,
either as userinfo in the authority (`jdbc:mysql://user:password@host/db`) or as connection
properties (`...?password=secret`, `...;token=abc`), so by default those secrets are printed
verbatim in error messages and logs.
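The no-op default described above can be illustrated with a minimal, self-contained sketch. The `redact` function here is a hypothetical stand-in that mimics the `Option[Regex]` shape of `Utils.redact`; it is not Spark's implementation, and the `(?i)password` pattern is only an example value, not a Spark default.

```scala
import scala.util.matching.Regex

object DefaultRedactionSketch {
  // Replacement text quoted in this PR description.
  val Replacement = "*********(redacted)"

  // Hypothetical stand-in for Utils.redact(Option[Regex], String):
  // with no pattern configured, the text comes back unchanged.
  def redact(pattern: Option[Regex], text: String): String = pattern match {
    case Some(r) if text != null && text.nonEmpty =>
      r.replaceAllIn(text, Regex.quoteReplacement(Replacement))
    case _ => text
  }

  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://user:password@host/db"
    // No pattern configured (the default): the password is still visible.
    println(redact(None, url))
    // An example pattern (an assumption for illustration only).
    println(redact(Some("(?i)password".r), url))
  }
}
```

With `None`, the URL is returned byte for byte, which is why a default configuration offers no protection for embedded credentials.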
This PR makes `getRedactUrl()` redact credentials unconditionally, independent of the optional
`spark.sql.redaction.string.regex`:
- A new `JDBCOptions.redactUrl(url, regex)` helper redacts the password in the authority's
  userinfo (keeping the username) and the values of sensitive connection properties (`password`,
  `pwd`, `token`, `secret`, `accessKey`, `credential`, …), then still applies the user-configured
  `regex` on top, preserving existing behavior.
- `getRedactUrl()` now delegates to this helper.
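The behavior listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the object name, both regular expressions, the substring-based key matching, and the exact keyword set are hypothetical (the list in the description ends with "…", so the real set may differ).

```scala
import scala.util.matching.Regex

object JdbcUrlRedactionSketch {
  val Replacement = "*********(redacted)"

  // Assumed keyword list, compared case-insensitively against property names.
  private val SensitiveKeys =
    Seq("password", "pwd", "token", "secret", "accesskey", "credential")

  // `//user:password@` in the authority: groups are (prefix incl. user)(password)(@).
  private val UserInfo: Regex = "(//[^/:@?;#]+:)([^@/?;#]+)(@)".r

  // `key=value` pairs introduced by `?`, `&` or `;`.
  private val Property: Regex = "([?&;])([^=&;?#]+)=([^&;#]*)".r

  def redactUrl(url: String, regex: Option[Regex]): String = {
    if (url == null || url.isEmpty) {
      url
    } else {
      // Step 1: keep the username, redact the userinfo password.
      val step1 = UserInfo.replaceAllIn(url, m =>
        Regex.quoteReplacement(m.group(1) + Replacement + m.group(3)))
      // Step 2: redact values of properties whose name looks sensitive.
      val step2 = Property.replaceAllIn(step1, m => {
        val key = m.group(2)
        val sensitive = SensitiveKeys.exists(k => key.toLowerCase.contains(k))
        val value = if (sensitive) Replacement else m.group(3)
        Regex.quoteReplacement(s"${m.group(1)}$key=$value")
      })
      // Step 3: still apply the optional user-configured pattern on top.
      regex.fold(step2)(_.replaceAllIn(step2, Regex.quoteReplacement(Replacement)))
    }
  }

  def main(args: Array[String]): Unit = {
    println(redactUrl("jdbc:mysql://user:password@host/db", None))
    println(redactUrl("jdbc:postgresql://host:5432/db?user=alice&Password=secret", None))
    println(redactUrl("jdbc:sqlserver://host:1433;databaseName=db;token=abc", None))
  }
}
```

In this sketch a `host:port` authority is not mistaken for userinfo, because the userinfo pattern only matches when an `@` follows the colon-separated pair before any `/`, `?`, `;` or `#`.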
Non-sensitive parts of the URL (scheme, host, port, database, and non-secret properties) are
kept, so error messages remain useful for debugging while no longer leaking credentials.
### Why are the changes needed?
`FAILED_JDBC.*` errors and connection logs can currently leak JDBC credentials to anyone who can
read query error messages or driver logs, because the default redaction is a no-op. Redacting at
the single `getRedactUrl()` chokepoint fixes every `FAILED_JDBC.*` call site at once.
### Does this PR introduce _any_ user-facing change?
Yes. When a JDBC URL contains embedded credentials, `FAILED_JDBC.*` error messages (and any log
line built from `getRedactUrl()`) now show the credentials replaced with `*********(redacted)`
instead of in clear text. URLs without embedded credentials are unchanged.
### How was this patch tested?
New unit test `redactUrl redacts credentials embedded in a JDBC URL` in `JdbcUtilsSuite`,
covering: credential-free URLs (passthrough), userinfo password redaction (username kept),
bare-username passthrough, sensitive connection properties across `&`/`;` separators and mixed
case, non-sensitive properties kept, combined userinfo + property redaction, and null/empty
inputs.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8