ueshin commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726491351
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
else:
raise TypeError("numPartitions should be an int or Column")
- def repartitionByRange(self, numPartitions, *cols):
+ @overload
+ def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ @overload
+ def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+ ...
+
+ def repartitionByRange( # type: ignore[misc]
+ self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"
Review comment:
At runtime, when we call something like `sdf.repartitionByRange('col1', 'col2', ...)`, `numPartitions` will be a `ColumnOrName`, because the first positional argument binds to it.
In fact, the function body checks whether it is a `str` or a `Column`:
https://github.com/apache/spark/blob/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122/python/pyspark/sql/dataframe.py#L1089-L1091
Even though this signature won't be shown externally, we need it so that `mypy` checks the function body properly.
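To illustrate the point, here is a minimal, self-contained sketch of the pattern. It is not the PySpark code: `Column`, `Frame`, `Plan`, and `_name` are hypothetical stand-ins, and the method only returns what it was asked to do. The two `@overload` signatures are what callers see, while the implementation signature uses the wider `Union` so that the `isinstance` branches in the body can type-check.

```python
from typing import List, Optional, Tuple, Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, name: str) -> None:
        self.name = name


ColumnOrName = Union[Column, str]
# Hypothetical result type: (requested partition count or None, column names).
Plan = Tuple[Optional[int], List[str]]


def _name(col: ColumnOrName) -> str:
    """Return the column name for either a Column or a plain string."""
    return col.name if isinstance(col, Column) else col


class Frame:
    """Hypothetical stand-in for DataFrame; it does no real repartitioning."""

    # Public signatures: these are what callers and type checkers see.
    @overload
    def repartitionByRange(self, numPartitions: int, *cols: ColumnOrName) -> Plan:
        ...

    @overload
    def repartitionByRange(self, *cols: ColumnOrName) -> Plan:
        ...

    # Implementation signature: the first positional argument may be an int
    # or a column, so it is annotated with the union of both.
    def repartitionByRange(
        self, numPartitions: Union[int, ColumnOrName], *cols: ColumnOrName
    ) -> Plan:
        if isinstance(numPartitions, int):
            if len(cols) == 0:
                raise ValueError("At least one column must be specified.")
            return numPartitions, [_name(c) for c in cols]
        if isinstance(numPartitions, (str, Column)):
            # The first argument was really a column, so prepend it.
            return None, [_name(c) for c in (numPartitions,) + cols]
        raise TypeError("numPartitions should be an int, str or Column")


frame = Frame()
with_count = frame.repartitionByRange(4, "col1", Column("col2"))
without_count = frame.repartitionByRange("col1", "col2")
```

If the implementation annotated `numPartitions` as plain `int`, a type checker would have no reason to accept the `str`/`Column` branch as reachable, which is the concern raised above.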