edgan8 commented on a change in pull request #33625:
URL: https://github.com/apache/spark/pull/33625#discussion_r767972928



##########
File path: python/pyspark/pandas/frame.py
##########
@@ -3459,6 +3458,109 @@ def mask(
         cond_inversed = cond._apply_series_op(lambda psser: ~psser)
         return self.where(cond_inversed, other)
 
+    # TODO: Support axis as 1 or 'columns'
+    def mode(self, axis: Axis, numeric_only: bool = False, dropna: bool = True) -> "DataFrame":
+        """
+        Get the mode(s) of each element along the selected axis.
+
+        The mode of a set of values is the value that appears most often.
+        It can be multiple values.
+
+        Notes
+        -----
+        The current implementation of mode requires joins multiple times

Review comment:
       The implementation of mode would still be expensive, since you need
to do a group-by per column, but at least you would avoid the join when there
are many modes.

##########
File path: python/pyspark/pandas/frame.py
##########
@@ -3459,6 +3458,109 @@ def mask(
         cond_inversed = cond._apply_series_op(lambda psser: ~psser)
         return self.where(cond_inversed, other)
 
+    # TODO: Support axis as 1 or 'columns'
+    def mode(self, axis: Axis, numeric_only: bool = False, dropna: bool = True) -> "DataFrame":
+        """
+        Get the mode(s) of each element along the selected axis.
+
+        The mode of a set of values is the value that appears most often.
+        It can be multiple values.
+
+        Notes
+        -----
+        The current implementation of mode requires joins multiple times

Review comment:
       If the join is expensive, is there a way to just add/concatenate all
of the columns to the same DataFrame without joining? Since you don't care
about the order of the modes anyway, the columns are more or less independent.
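
To make the suggestion concrete, here is a minimal sketch of the idea in plain Python rather than the pandas-on-Spark API. It assumes each column can be counted independently (the counting step stands in for a per-column group-by) and that the per-column results can be laid side by side by position, padding shorter columns with None. The names `column_modes` and `frame_modes` are hypothetical and do not come from the PR.

```python
from collections import Counter
from itertools import zip_longest


def column_modes(values, dropna=True):
    """Return the mode(s) of a single column, found by counting its values once."""
    if dropna:
        values = [v for v in values if v is not None]
    counts = Counter(values)  # stands in for a per-column group-by/count
    if not counts:
        return []
    top = max(counts.values())
    # Sort for a stable result; None (if kept) is placed last.
    return sorted(
        (v for v, c in counts.items() if c == top),
        key=lambda v: (v is None, v),
    )


def frame_modes(columns, dropna=True):
    """Compute modes per column and place them side by side, with no join."""
    per_column = {name: column_modes(vals, dropna) for name, vals in columns.items()}
    # Columns are independent, so rows are formed purely by position.
    rows = zip_longest(*per_column.values(), fillvalue=None)
    return [dict(zip(per_column.keys(), row)) for row in rows]


data = {"a": [1, 1, 2, 2, 3], "b": [5, 5, 5, 6, None]}
result = frame_modes(data)
```

In this sketch column `a` has two modes and column `b` has one, so the shorter column is padded. Whether a positional concatenation like this is cheap to express in Spark is a separate question that the sketch does not answer.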




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
