ueshin commented on code in PR #42595:
URL: https://github.com/apache/spark/pull/42595#discussion_r1308015934
##########
python/pyspark/sql/udtf.py:
##########
@@ -70,9 +90,25 @@ class AnalyzeResult:
----------
schema : :class:`StructType`
The schema that the Python UDTF will return.
+ with_single_partition : bool
+ If true, the UDTF is specifying for Catalyst to repartition all rows of the input TABLE
+ argument to one collection for consumption by exactly one instance of the corresponding
+ UDTF class.
+ partition_by : Sequence[PartitioningColumn]
+ If non-empty, this is a sequence of columns that the UDTF is specifying for Catalyst to
+ partition the input TABLE argument by. In this case, calls to the UDTF may not include any
+ explicit PARTITION BY clause, in which case Catalyst will return an error. This option is
+ mutually exclusive with 'with_single_partition'.
+ order_by: Sequence[OrderingColumn]
+ If non-empty, this is a sequence of columns that the UDTF is specifying for Catalyst to
+ sort the input TABLE argument by. Note that the 'partition_by' list must also be non-empty
+ in this case.
"""
schema: StructType
+ with_single_partition: bool = False
+ partition_by: Sequence[PartitioningColumn] = ()
+ order_by: Sequence[OrderingColumn] = ()
Review Comment:
`()` shouldn't be used as a default value of the fields.
```py
from dataclasses import field
```
then
```py
partition_by: Sequence[PartitioningColumn] = field(default_factory=tuple)  # tuple or list
order_by: Sequence[OrderingColumn] = field(default_factory=tuple)
```
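A minimal, self-contained sketch of the suggestion above, assuming simplified stand-ins rather than the real pyspark classes: `PartitioningColumn` here is a hypothetical one-field placeholder, and `AnalyzeResult` omits `schema` and `order_by`. It only illustrates how `field(default_factory=tuple)` behaves, and that `dataclasses` refuses a bare `[]` default, which is why the factory form is needed if a list is preferred over a tuple.

```py
from dataclasses import dataclass, field
from typing import Sequence


@dataclass(frozen=True)
class PartitioningColumn:
    # Hypothetical stand-in for the class in the PR; only a name is modeled.
    name: str


@dataclass
class AnalyzeResult:
    # Simplified: the class under review also has `schema` and `order_by`.
    with_single_partition: bool = False
    # The factory is called once per instance to build the default value.
    partition_by: Sequence[PartitioningColumn] = field(default_factory=tuple)


def list_default_is_rejected() -> bool:
    """Return True if dataclasses refuses a bare `[]` field default."""
    try:
        @dataclass
        class Bad:
            partition_by: Sequence[PartitioningColumn] = []
    except ValueError:
        # Raised while the decorator processes the class body.
        return True
    return False


default_result = AnalyzeResult()
explicit_result = AnalyzeResult(partition_by=[PartitioningColumn("a")])
```

Swapping `default_factory=tuple` for `default_factory=list` would give each instance its own empty list instead of an empty tuple; which one suits the UDTF API better is a judgment call for the PR author.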
##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:
##########
@@ -643,11 +643,17 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
s"$testCaseName - ${udf.prettyName}", absPath, resultFile, udf)
}
} else if (file.getAbsolutePath.startsWith(s"$inputFilePath${File.separator}udtf")) {
- Seq(TestPythonUDTF("udtf")).map { udtf =>
- UDTFTestCase(
- s"$testCaseName - ${udtf.prettyName}", absPath, resultFile, udtf
- )
- }
+ val udtfs = Seq(
+ TestPythonUDTF("udtf"),
+ TestPythonUDTFCountSumLast,
+ TestPythonUDTFWithSinglePartition,
+ TestPythonUDTFPartitionBy,
+ TestPythonUDTFInvalidPartitionByAndWithSinglePartition,
+ TestPythonUDTFInvalidOrderByWithoutPartitionBy
+ )
+ Seq(UDTFTestCase(
+ s"$testCaseName - Python UDTFs", absPath, resultFile, udtfs
+ ))
Review Comment:
I guess `UDTFTestCase` is supposed to wrap a single test UDTF? If so, keep one test case per UDTF:
```scala
Seq(TestPythonUDTF("udtf"), ...).map { udtf =>
UDTFTestCase(
s"$testCaseName - ${udtf.prettyName}", absPath, resultFile, udtf
)
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]