sarutak opened a new pull request, #56361:
URL: https://github.com/apache/spark/pull/56361

   ### What changes were proposed in this pull request?
   This PR changes `SparkSQLDriver.scala` to redact the query text before passing it to `setJobDescription`.
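
As a rough illustration of the idea (not the actual patch), the sketch below redacts a query string with a regex before it would be used as a job description. `RedactionSketch`, `redact`, and `ReplacementText` are hypothetical names introduced here, and the placeholder string is an assumption. The real change would presumably rely on Spark's own redaction helper and the pattern configured via `spark.sql.redaction.string.regex`.

```scala
import scala.util.matching.Regex

object RedactionSketch {
  // Placeholder shown instead of the matched text (assumed value for this sketch).
  val ReplacementText = "*********(redacted)"

  // Replace every match of the optional pattern; with no pattern, return the text unchanged.
  def redact(pattern: Option[Regex], text: String): String =
    pattern match {
      case Some(r) if text != null && text.nonEmpty =>
        r.replaceAllIn(text, Regex.quoteReplacement(ReplacementText))
      case _ => text
    }

  def main(args: Array[String]): Unit = {
    // Same pattern as in the manual test in this PR description.
    val pattern = Some("secret.*=.*".r)
    val query = "SELECT * FROM test1 WHERE secret=1"

    // Redact first, then use the result as the description
    // (in Spark this is where sc.setJobDescription would be called).
    val description = redact(pattern, query)
    println(description) // prints: SELECT * FROM test1 WHERE *********(redacted)
  }
}
```

The point of the ordering is that the raw query never reaches the job description, so any page that renders the description only sees the redacted form.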
   
   ### Why are the changes needed?
   In the current implementation, when a query is executed through `SparkSQLDriver`, redaction happens in `SQLExecution.scala`, so only the description in the table at the top of the `/SQL/execution` page is redacted.
   <img width="1083" height="349" alt="sql-execution-page-top-table" src="https://github.com/user-attachments/assets/b06fb255-2b46-473d-9046-1b2d578e3bda" />
   
   However, the description in the table on the `/jobs` page and the one in the table at the bottom of the `/SQL/execution` page are not redacted.
   <img width="525" height="692" alt="jobs-page-before" src="https://github.com/user-attachments/assets/31c88b98-779b-4305-bf71-58f19a1d7117" />
   <img width="515" height="274" alt="sql-execution-page-before" src="https://github.com/user-attachments/assets/012be251-f642-4ded-8f77-32f811b05cac" />
   
   NOTE:
   Even after this PR is merged, when a job description is set manually using `sc.setJobDescription`, the descriptions displayed on the `/jobs` page and at the bottom of the `/SQL/execution` page are not redacted, though the one at the top of the `/SQL/execution` page is.
   
   ```
   $ bin/spark-shell -c spark.sql.redaction.string.regex="secret.*=.*"
   scala> val s = "SELECT * FROM (SELECT 'secret=1')"
   scala> sc.setJobDescription(s)
   scala> sql(s).show()
   +--------+
   |secret=1|
   +--------+
   |secret=1|
   +--------+
   ```
   
   **description in `/jobs` page**
   <img width="555" height="226" alt="jobs-page-not-redacted" src="https://github.com/user-attachments/assets/b4e084ad-b648-4ba6-b049-ef42f570398d" />
   **description in `/SQL/execution` (top)**
   <img width="913" height="203" alt="sql-execution-page-redacted" src="https://github.com/user-attachments/assets/91e745f0-aa7f-4618-98e9-5b4b117415da" />
   **description in `/SQL/execution` (bottom)**
   <img width="536" height="292" alt="sql-execution-page-not-redacted" src="https://github.com/user-attachments/assets/761aad76-0d1b-49af-9e03-58510cd474d1" />
   
   This is consistent with the previous behavior and is not a regression. There is no simple way to redact manually set descriptions, and doing so is out of scope for this PR.
   
   ### Does this PR introduce _any_ user-facing change?
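   Yes. For queries executed through `SparkSQLDriver`, the descriptions on the `/jobs` page and at the bottom of the `/SQL/execution` page are now redacted.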
   
   ### How was this patch tested?
   Added a new test and confirmed that the test `SQL execution description should respect spark.sql.redaction.string.regex`, added in #56358, passes.
   Also confirmed that descriptions are redacted in the UI.
   ```
   $ bin/spark-sql --conf spark.sql.redaction.string.regex="secret.*=.*"
   spark-sql (default)> CREATE TABLE test1(secret string);
   spark-sql (default)> SELECT * FROM test1 WHERE secret=1;
   ```
   <img width="607" height="213" alt="jobs-page-after-2" src="https://github.com/user-attachments/assets/62646cfc-67c3-46b5-a9f9-695b1f874462" />
   <img width="589" height="274" alt="sql-execution-page-after-2" src="https://github.com/user-attachments/assets/597db0da-58fb-4275-b6aa-7e8b301f15d0" />
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Kiro CLI / Claude
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

