sarutak opened a new pull request, #56406:
URL: https://github.com/apache/spark/pull/56406

   ### What changes were proposed in this pull request?
   
   Replace the `show()`-based doctest for `sql_keywords()` with a `.columns` 
check.
   
   ### Why are the changes needed?
   
   SPARK-57133 added 7 new non-reserved keywords (BIN, WIDTH, ALIGN, etc.), 
which changed the first 20 rows printed by `sql_keywords().show()` and, as a 
result, the width of the `keyword` column in the formatted output. This broke 
the `pyspark-connect-old-client` CI job, which runs `branch-4.0` client tests 
against a `master` server. The `branch-4.0` doctest still expects the old 
column width, so its header lines no longer match.
   
   https://github.com/sarutak/spark/actions/runs/27188469096/job/80265105548
   
   ```
   **********************************************************************
   File "/__w/spark/spark-4.0/python/pyspark/sql/connect/tvf.py", line ?, in 
pyspark.sql.connect.tvf.TableValuedFunction.sql_keywords
   Failed example:
       spark.tvf.sql_keywords().show()
   Expected:
       +-------------+--------+
       |      keyword|reserved|
       +-------------+--------+
       ...
       +-------------+--------+...
   Got:
       +----------+--------+
       |   keyword|reserved|
       +----------+--------+
       |       ADD|   false|
       |     AFTER|   false|
       | AGGREGATE|   false|
       |     ALIGN|   false|
       |       ALL|   false|
       |     ALTER|   false|
       |    ALWAYS|   false|
       |   ANALYZE|   false|
       |       AND|   false|
       |      ANTI|   false|
       |       ANY|   false|
       | ANY_VALUE|   false|
       |    APPROX|   false|
       |   ARCHIVE|   false|
       |     ARRAY|   false|
       |        AS|   false|
       |       ASC|   false|
       |ASENSITIVE|   false|
       |        AT|   false|
       |    ATOMIC|   false|
       +----------+--------+
       only showing top 20 rows
   **********************************************************************
      1 of   1 in pyspark.sql.connect.tvf.TableValuedFunction.sql_keywords
   ***Test Failed*** 1 failures.
   ```
   
   The `show()` output is inherently fragile for this TVF because a keyword 
addition can change both the rows shown and the column width, and the doctest 
matches the header border literally. Since a dedicated unittest 
(`test_sql_keywords` in `test_tvf.py`) already verifies the full output via 
`assertDataFrameEqual`, the doctest only needs to confirm that the method 
returns a valid DataFrame. Using `.columns` achieves this without being 
sensitive to keyword list changes.
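
   The fragility can be reproduced without Spark. The sketch below is only an 
illustration: `render_show` is a hypothetical stand-in that mimics the 
right-aligned table layout of `show()`, and the two keyword lists are made-up 
samples, not the real keyword set. It uses the standard library's 
`doctest.OutputChecker` to show that the `...` in the expected output does not 
help once the border width changes, while a check on column names is unaffected.

```python
import doctest

COLUMNS = ["keyword", "reserved"]


def render_show(rows, limit=3):
    """Hypothetical stand-in for show(): each column is as wide as the
    longest cell among the header and the rows actually displayed."""
    shown = [tuple(str(c) for c in row) for row in rows[:limit]]
    widths = [
        max([len(col)] + [len(r[i]) for r in shown])
        for i, col in enumerate(COLUMNS)
    ]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(cells):
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    lines = [border, fmt(COLUMNS), border]
    lines += [fmt(r) for r in shown]
    lines.append(border)
    if len(rows) > limit:
        lines.append(f"only showing top {limit} rows")
    return "\n".join(lines) + "\n"


# Made-up keyword lists: adding ALIGN pushes the long keyword
# AUTHORIZATION out of the displayed rows, so the column gets narrower.
old_rows = [(k, "false") for k in ["ADD", "ALL", "AUTHORIZATION", "BEGIN"]]
new_rows = [(k, "false") for k in ["ADD", "ALIGN", "ALL", "AUTHORIZATION", "BEGIN"]]

# Expected text in the style of the failing doctest: the header and border
# are literal, and only the rows are elided with "...".
WANT = (
    "+-------------+--------+\n"
    "|      keyword|reserved|\n"
    "+-------------+--------+\n"
    "...\n"
    "+-------------+--------+...\n"
)

checker = doctest.OutputChecker()
old_out = render_show(old_rows)
new_out = render_show(new_rows)

# The old output matches; the new output fails on the very first line.
print(checker.check_output(WANT, old_out, doctest.ELLIPSIS))
print(checker.check_output(WANT, new_out, doctest.ELLIPSIS))

# A column-name check does not depend on which rows are displayed.
print(COLUMNS == ["keyword", "reserved"])
```

   In this sketch the ellipsis covers only the row lines, which is why a 
narrower `keyword` column is enough to fail the comparison.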
   
   Currently the only branch affected in CI is `branch-4.0` (via 
`pyspark-connect-old-client`), but this change is made on `master` for 
consistency and will be backported to older branches.
   
   ### Does this PR introduce *any* user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The existing `test_sql_keywords` unittest continues to pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Kiro CLI / Claude
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


