[GitHub] [spark] ueshin commented on a change in pull request #34225: [SPARK-36885][PYTHON] Inline type hints for pyspark.sql.dataframe

GitBox Mon, 11 Oct 2021 11:31:28 -0700


ueshin commented on a change in pull request #34225:
URL: https://github.com/apache/spark/pull/34225#discussion_r726491351




##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -967,7 +1022,17 @@ def repartition(self, numPartitions, *cols):
         else:
             raise TypeError("numPartitions should be an int or Column")
 
-    def repartitionByRange(self, numPartitions, *cols):
+    @overload
+    def repartitionByRange(self, numPartitions: int, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    @overload
+    def repartitionByRange(self, *cols: "ColumnOrName") -> "DataFrame":
+        ...
+
+    def repartitionByRange(  # type: ignore[misc]
+        self, numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName"

Review comment:
   At runtime, when we call something like `sdf.repartitionByRange('col1', 'col2', ...)`, `numPartitions` will be a `ColumnOrName`.
   In fact, the implementation checks whether it is a `str` or a `Column`:
   
https://github.com/apache/spark/blob/f366fc8f0e4ed9a5a1d810d128d11d5224bdd122/python/pyspark/sql/dataframe.py#L1089-L1091
   
   Even though this signature won't be shown externally, we need it so that `mypy` checks the function body properly.
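
   A minimal, self-contained sketch of the pattern discussed here, not the actual PySpark code: `Column` and `Frame` are hypothetical stand-ins for `pyspark.sql.Column` and `DataFrame`, and `last_call` is an illustrative attribute that only records the arguments. The two `@overload` signatures are what callers see; the implementation signature widens `numPartitions` to `Union[int, ColumnOrName]` so that the `isinstance` branches in the body type-check. The `# type: ignore[misc]` is carried over from the diff as-is.

```python
from typing import Optional, Tuple, Union, overload


class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

    def __init__(self, name: str) -> None:
        self.name = name


ColumnOrName = Union[Column, str]


class Frame:
    """Hypothetical stand-in for DataFrame; it only records the last call."""

    def __init__(self) -> None:
        self.last_call: Tuple[Optional[int], Tuple[ColumnOrName, ...]] = (None, ())

    # Public signature 1: explicit partition count followed by columns.
    @overload
    def repartitionByRange(self, numPartitions: int, *cols: ColumnOrName) -> "Frame":
        ...

    # Public signature 2: columns only.
    @overload
    def repartitionByRange(self, *cols: ColumnOrName) -> "Frame":
        ...

    # Implementation signature: the first positional argument may be either
    # an int or a column, so it is annotated with the union of both.
    def repartitionByRange(  # type: ignore[misc]
        self, numPartitions: Union[int, ColumnOrName], *cols: ColumnOrName
    ) -> "Frame":
        if isinstance(numPartitions, int):
            if len(cols) == 0:
                raise ValueError("At least one partition-by expression must be specified.")
            self.last_call = (numPartitions, cols)
        elif isinstance(numPartitions, (str, Column)):
            # The first argument was really a column: fold it into cols.
            cols = (numPartitions,) + cols
            self.last_call = (None, cols)
        else:
            raise TypeError("numPartitions should be an int, string or Column")
        return self
```

   With this sketch, `Frame().repartitionByRange('col1', 'col2')` takes the second branch, because `'col1'` arrives in the `numPartitions` slot. If the implementation annotated `numPartitions` as plain `int`, a type checker would have no reason to accept the `str`/`Column` branch in the body.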






