Re: [PR] [SPARK-46167][PS] Add axis implementation to DataFrame.rank [spark]

via GitHub Tue, 27 Jan 2026 22:14:59 -0800


gaogaotiantian commented on code in PR #54009:
URL: https://github.com/apache/spark/pull/54009#discussion_r2735019871



##########
python/pyspark/pandas/frame.py:
##########
@@ -11361,9 +11361,13 @@ def _result_aggregated(
         # dtype: bool
         return first_series(DataFrame(internal))
 
-    # TODO(SPARK-46167): add axis, pct, na_option parameter
+    # TODO(SPARK-46167): add pct, na_option parameter
     def rank(
-        self, method: str = "average", ascending: bool = True, numeric_only: bool = False
+        self,
+        method: str = "average",
+        ascending: bool = True,
+        numeric_only: bool = False,
+        axis: Axis = 0,

Review Comment:
   We need to decide where `axis` should go. `pandas` puts it first, while this 
PR puts it last, so a user who passes arguments positionally would get a 
different result than in `pandas`. On the other hand, users who already pass 
arguments positionally to our `rank` would see their existing code break if we 
moved `axis` to the front.
   
   As a side note, `pandas` is moving towards keyword-only APIs quite eagerly. 
We could consider doing the same here so that users cannot pass the wrong 
argument by position.
   
   We are already incompatible with `pandas` here, so this might be a good 
chance to fix that and take the breaking change early.
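
To make the positional-argument concern concrete, here is a minimal sketch, not the actual implementation. The three functions are hypothetical stand-ins that only mirror parameter order: `pandas_style_rank` follows the order documented for `pandas` 2.x `DataFrame.rank`, `pr_style_rank` follows the signature in this diff, and `keyword_only_rank` is one possible keyword-only variant (an assumption, not something the PR proposes verbatim). None of them ranks anything; the helper `bound_arguments` just reports how Python binds a call.

```python
import inspect


# Hypothetical stand-ins: they only mirror parameter order and do no ranking.
def pandas_style_rank(self, axis=0, method="average", numeric_only=False,
                      na_option="keep", ascending=True, pct=False):
    ...


def pr_style_rank(self, method="average", ascending=True,
                  numeric_only=False, axis=0):
    ...


# Assumed keyword-only variant: everything after `self` must be named.
def keyword_only_rank(self, *, axis=0, method="average", ascending=True,
                      numeric_only=False):
    ...


def bound_arguments(func, *args, **kwargs):
    """Return the parameter -> value mapping Python would use for this call."""
    bound = inspect.signature(func).bind(None, *args, **kwargs)  # None stands in for self
    bound.apply_defaults()
    result = dict(bound.arguments)
    result.pop("self")
    return result


# The same positional call, rank(1), binds to different parameters.
pandas_call = bound_arguments(pandas_style_rank, 1)
pr_call = bound_arguments(pr_style_rank, 1)
print("pandas order:", pandas_call["axis"], pandas_call["method"])
print("PR order:    ", pr_call["axis"], pr_call["method"])

# With keyword-only parameters the ambiguous call is rejected up front.
try:
    bound_arguments(keyword_only_rank, 1)
except TypeError as exc:
    print("keyword-only rejects positional use:", exc)

keyword_call = bound_arguments(keyword_only_rank, axis=1)
print("keyword-only:", keyword_call["axis"], keyword_call["method"])
```

In the first signature the positional `1` is taken as `axis`; in the second it silently becomes `method`. The keyword-only variant turns that silent mismatch into an immediate `TypeError`, at the cost of breaking any existing positional callers.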



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

