sarutak opened a new pull request, #56452:
URL: https://github.com/apache/spark/pull/56452

   ### What changes were proposed in this pull request?
   
   This PR backports #56406 to `branch-4.0`.
   It replaces the `show()`-based doctest for `sql_keywords()` with a 
`.columns` check.
   
   ### Why are the changes needed?
   
   SPARK-57133 (#56247) added 7 new non-reserved keywords (BIN, WIDTH, ALIGN, 
etc.), which changed the top-20 row output of `sql_keywords().show()` and 
consequently the column width in the formatted output. This broke the 
`pyspark-connect-old-client` CI job, which runs `branch-4.0` client tests 
against a `master` server. The `branch-4.0` doctest still expects the old 
column width.
   
   https://github.com/sarutak/spark/actions/runs/27188469096/job/80265105548
   
   ```
   **********************************************************************
   File "/__w/spark/spark-4.0/python/pyspark/sql/connect/tvf.py", line ?, in pyspark.sql.connect.tvf.TableValuedFunction.sql_keywords
   Failed example:
       spark.tvf.sql_keywords().show()
   Expected:
       +-------------+--------+
       |      keyword|reserved|
       +-------------+--------+
       ...
       +-------------+--------+...
   Got:
       +----------+--------+
       |   keyword|reserved|
       +----------+--------+
       |       ADD|   false|
       |     AFTER|   false|
       | AGGREGATE|   false|
       |     ALIGN|   false|
       |       ALL|   false|
       |     ALTER|   false|
       |    ALWAYS|   false|
       |   ANALYZE|   false|
       |       AND|   false|
       |      ANTI|   false|
       |       ANY|   false|
       | ANY_VALUE|   false|
       |    APPROX|   false|
       |   ARCHIVE|   false|
       |     ARRAY|   false|
       |        AS|   false|
       |       ASC|   false|
       |ASENSITIVE|   false|
       |        AT|   false|
       |    ATOMIC|   false|
       +----------+--------+
       only showing top 20 rows
   **********************************************************************
      1 of   1 in pyspark.sql.connect.tvf.TableValuedFunction.sql_keywords
   ***Test Failed*** 1 failures.
   ```
   
   The `show()` output is inherently fragile for this TVF because a keyword 
addition can change which rows appear in the top 20 and, with them, the 
column width. Since a dedicated unittest (`test_sql_keywords` in 
`test_tvf.py`) already verifies the full output via `assertDataFrameEqual`, 
the doctest only needs to confirm that the method returns a DataFrame with 
the expected columns. Using `.columns` achieves this without being sensitive 
to keyword list changes.
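
To illustrate the fragility described above, here is a minimal sketch in plain Python. `render_table` is a hypothetical, simplified stand-in for the grid that `show()` prints, not Spark's actual implementation, and the two row sets are illustrative assumptions chosen only to reproduce the two column widths seen in the failure log.

```python
def render_table(columns, rows):
    """Render rows as a show()-like ASCII grid (simplified stand-in, not Spark code)."""
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(col)] + [len(str(row[i])) for row in rows])
        for i, col in enumerate(columns)
    ]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(cells):
        # Right-align every cell to its column width.
        return "|" + "|".join(str(c).rjust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([border, fmt(columns), border] + [fmt(r) for r in rows] + [border])


columns = ["keyword", "reserved"]

# Hypothetical row sets: the longest visible keyword differs between them.
old_rows = [("ADD", "false"), ("AUTHORIZATION", "false")]
new_rows = [("ADD", "false"), ("ASENSITIVE", "false")]

old_header = render_table(columns, old_rows).splitlines()[0]
new_header = render_table(columns, new_rows).splitlines()[0]

print(old_header)  # +-------------+--------+
print(new_header)  # +----------+--------+
print(columns)     # ['keyword', 'reserved'] in both cases
```

The border line depends on the longest keyword among the displayed rows, so a doctest that pins it breaks whenever that set shifts. The column names do not depend on the rows at all, which is why a `.columns` check stays stable.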
   
   ### Does this PR introduce *any* user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The existing `test_sql_keywords` unittest continues to pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Kiro CLI / Claude
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

