Re: [PR] [SPARK-46165][PS] Add support for pandas.DataFrame.all axis=1 [spark]

via GitHub Mon, 12 Jan 2026 13:39:01 -0800


devin-petersohn commented on code in PR #53507:
URL: https://github.com/apache/spark/pull/53507#discussion_r2683986135



##########
python/pyspark/pandas/frame.py:
##########
@@ -11118,28 +11120,58 @@ def all(
         dtype: bool
         """
         axis = validate_axis(axis)
-        if axis != 0:
-            raise NotImplementedError('axis should be either 0 or "index" currently.')
-
         column_labels = self._internal.column_labels
         if bool_only:
             column_labels = self._bool_column_labels(column_labels)
         if len(column_labels) == 0:
             return ps.Series([], dtype=bool)
+        if axis == 0:
+            applied: List[PySparkColumn] = []
+            for label in column_labels:
+                scol = self._internal.spark_column_for(label)
 
-        applied: List[PySparkColumn] = []
-        for label in column_labels:
-            scol = self._internal.spark_column_for(label)
+                if isinstance(self._internal.spark_type_for(label), NumericType) or skipna:
+                    # np.nan takes no effect to the result; None takes no effect if `skipna`
+                    all_col = F.min(F.coalesce(scol.cast("boolean"), F.lit(True)))
+                else:
+                    # Take None as False when not `skipna`
+                    all_col = F.min(
+                        F.when(scol.isNull(), F.lit(False)).otherwise(scol.cast("boolean"))
+                    )
+                applied.append(F.when(all_col.isNull(), True).otherwise(all_col))
 
-            if isinstance(self._internal.spark_type_for(label), NumericType) or skipna:
-                # np.nan takes no effect to the result; None takes no effect if `skipna`
-                all_col = F.min(F.coalesce(scol.cast("boolean"), F.lit(True)))
-            else:
-                # Take None as False when not `skipna`
-                all_col = F.min(F.when(scol.isNull(), F.lit(False)).otherwise(scol.cast("boolean")))
-            applied.append(F.when(all_col.isNull(), True).otherwise(all_col))
+            return self._result_aggregated(column_labels, applied)
+        elif axis == 1:
+            from pyspark.pandas.series import first_series
 
-        return self._result_aggregated(column_labels, applied)
+            sdf = self._internal.spark_frame.select(
+                *self._internal_frame.index_spark_columns,
+                F.least(
+                    *[
+                        F.coalesce(
+                            self._internal.spark_column_for(label).cast("boolean"),
+                            # pandas treats all NA values as True in `all()`
+                            F.lit(True),

Review Comment:
   I thought so too, and my first commit had this, but the tests kept failing. 
It turns out the pandas docs state that NA values are treated as True: 
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.all.html
   
   <img width="380" height="372" alt="Screenshot 2026-01-12 at 3 34 54 PM" src="https://github.com/user-attachments/assets/9f9752ed-7a85-4630-a839-c03905c3a396" />
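
   A minimal plain-Python sketch of the row-wise semantics discussed here, assuming each row is a list of values in which `None` stands for a Spark null. `row_all` is a hypothetical helper for illustration only; it is not part of pyspark.pandas and does not call Spark:

   ```python
   from typing import Any, Sequence


   def row_all(values: Sequence[Any]) -> bool:
       """Model of least(coalesce(cast(col as boolean), True), ...) for one row.

       Hypothetical helper: None plays the role of a Spark null and is
       replaced by True before the minimum is taken across the row.
       """
       # coalesce(..., True): a missing value cannot make the row False
       as_bool = [True if v is None else bool(v) for v in values]
       # least(...) over booleans is False as soon as any entry is False
       return min(as_bool, default=True)


   print(row_all([1, None, 2.0]))   # missing value is ignored
   print(row_all([1, 0, None]))     # the 0 makes the row False
   print(row_all([None, None]))     # an all-missing row yields True
   ```

   Under these assumptions a missing value never flips a row to False, which matches the "NA values are treated as True" wording in the pandas docs linked above.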
   
   




