sarutak opened a new pull request, #56361: URL: https://github.com/apache/spark/pull/56361
### What changes were proposed in this pull request?
This PR changes `SparkSQLDriver.scala` to redact a query before calling `setJobDescription`.

### Why are the changes needed?
In the current implementation, when a query is executed through `SparkSQLDriver`, redaction is done in `SQLExecution.scala`, so the description in the table at the top of the `/SQL/execution` page is redacted.

<img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />

But the description in the table on the `/jobs` page and the one in the table at the bottom of the `/SQL/execution` page are not redacted.

<img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117" />
<img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac" />

NOTE: Even after this PR is merged, when a job description is set manually using `sc.setJobDescription`, the descriptions displayed on the `/jobs` page and at the bottom of the `/SQL/execution` page are not redacted, though the one at the top of the `/SQL/execution` page is redacted.
```
$ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
scala> val s = "SELECT * FROM (SELECT 'secret=1')"
scala> sc.setJobDescription(s)
scala> sql(s).show()
+--------+
|secret=1|
+--------+
|secret=1|
+--------+
```

**description in `/jobs` page**
<img width="555" height="226" alt="jobs-page-not-redacted" src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d" />

**description in `/SQL/execution` (top)**
<img width="913" height="203" alt="sql-execution-page-redacted" src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da" />

**description in `/SQL/execution` (bottom)**
<img width="536" height="292" alt="sql-execution-page-not-redacted" src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1" />

This is consistent with the previous behavior and is not a regression. There is no simple way to redact these descriptions, and doing so is out of scope for this PR.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Added a new test and confirmed that the test `SQL execution description should respect spark.sql.redaction.string.regex`, added in #56358, passes. Also confirmed that descriptions are redacted in the UI.

```
$ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
spark-sql (default)> CREATE TABLE test1(secret string);
spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
```

<img width="607" height="213" alt="jobs-page-after-2" src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462" />
<img width="589" height="274" alt="sql-execution-page-after-2" src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0" />

### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Claude

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
