zhengruifeng opened a new pull request, #46895:
URL: https://github.com/apache/spark/pull/46895
### What changes were proposed in this pull request?
Enable the doctests in `pyspark.sql.connect.column`.
### Why are the changes needed?
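To improve test coverage. Before this change, a broken doctest in `Column` did not cause `pyspark.sql.connect.column` to fail.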
### Does this PR introduce _any_ user-facing change?
No, this is a test-only change.
### How was this patch tested?
Manually. I deliberately broke a doctest in `Column` and found that
`pyspark.sql.classic.column` failed as expected, while
`pyspark.sql.connect.column` still passed:
```
(spark_dev_312) ➜ spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.classic.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.classic.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.classic.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/4bdd14b8-92ba-43ba-a7fb-655e6769aeb9/python3__pyspark.sql.classic.column__i2_c1zct.log)
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py", line 385, in pyspark.sql.column.Column.contains
Failed example:
    df.filter(df.name.contains('o')).collect()
Differences (ndiff with -expected +actual):
    - [Row(age=5, name='Bobx')]
    ?                      -
    + [Row(age=5, name='Bob')]
**********************************************************************
1 of 2 in pyspark.sql.column.Column.contains
***Test Failed*** 1 failures.
Had test failures in pyspark.sql.classic.column with python3; see logs.
(spark_dev_312) ➜ spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.connect.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/2acaff3c-ef1d-41eb-b63e-509f3e0192c0/python3__pyspark.sql.connect.column__66td62h9.log)
Finished test(python3): pyspark.sql.connect.column (3s)
Tests passed in 3 seconds
```
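One plausible reading of the green run above is that the doctest runner attempted no examples at all. The sketch below is only an illustration of that failure mode, not the PySpark test code: `real_mod` and `reexport_mod` are hypothetical in-memory modules, and the re-export is an assumption about how such a situation can arise. It shows that `doctest.testmod` skips functions defined in another module, so a broken example goes unnoticed.

```python
import doctest
import types

# Hypothetical module whose function carries a deliberately broken doctest.
SOURCE = '''
def contains():
    """
    >>> "Bob"
    'Bobx'
    """
'''

real_mod = types.ModuleType("real_mod")
exec(SOURCE, real_mod.__dict__)

# Hypothetical module that only re-exports the function defined elsewhere.
reexport_mod = types.ModuleType("reexport_mod")
reexport_mod.contains = real_mod.contains

# The defining module reports the broken example.
real_result = doctest.testmod(real_mod, verbose=False)

# The re-exporting module attempts nothing, so it "passes" vacuously.
reexport_result = doctest.testmod(reexport_mod, verbose=False)

print(real_result.failed, real_result.attempted)
print(reexport_result.failed, reexport_result.attempted)
```

Checking the `attempted` count, and not only `failed`, is one way to notice a run that collected nothing.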
After this PR, `pyspark.sql.connect.column` fails as expected:
```
(spark_dev_312) ➜ spark git:(master) ✗ python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.connect.column'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.connect.column']
python3 python_implementation is CPython
python3 version is: Python 3.12.2
Starting test(python3): pyspark.sql.connect.column (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/390ff7ae-7683-425c-b0d2-ee336e1ad452/python3__pyspark.sql.connect.column__f69b3smc.log)
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
org.apache.spark.SparkSQLException: [INVALID_CURSOR.DISCONNECTED] The cursor is invalid. The cursor has been disconnected by the server. SQLSTATE: HY109
    at org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender.execute(ExecuteGrpcResponseSender.scala:281)
    at org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender$$anon$1.run(ExecuteGrpcResponseSender.scala:101)
**********************************************************************
File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/column.py", line 385, in pyspark.sql.column.Column.contains
Failed example:
    df.filter(df.name.contains('o')).collect()
Expected:
    [Row(age=5, name='Bobx')]
Got:
    [Row(age=5, name='Bob')]
**********************************************************************
1 of 2 in pyspark.sql.column.Column.contains
***Test Failed*** 1 failures.
Had test failures in pyspark.sql.connect.column with python3; see logs.
```
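The manual check used here (break an expected output, then confirm the runner reports a failure) can be sketched with the standard `doctest` module alone. This is a minimal illustration under stated assumptions: `DOCSTRING` and `run_docstring` are hypothetical names, and nothing here comes from the PySpark test harness.

```python
import doctest

# Hypothetical docstring with one correct example.
DOCSTRING = """
>>> sorted({"Bob", "Alice"})
['Alice', 'Bob']
"""


def run_docstring(docstring):
    """Run the examples in a docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


# The untouched docstring passes.
original = run_docstring(DOCSTRING)

# Deliberately corrupt the expected output, as in the manual check.
broken = run_docstring(DOCSTRING.replace("'Bob'", "'Bobx'"))

print(original.failed, original.attempted)
print(broken.failed, broken.attempted)
```

If the corrupted docstring did not produce a failure, that would suggest the examples are not being executed.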
### Was this patch authored or co-authored using generative AI tooling?
No