zhengruifeng commented on code in PR #52393:
URL: https://github.com/apache/spark/pull/52393#discussion_r2373755031


##########
python/pyspark/pandas/frame.py:
##########
@@ -11201,8 +11208,14 @@ def any(self, axis: Axis = 0, bool_only: Optional[bool] = None) -> "Series":
         applied: List[PySparkColumn] = []
         for label in column_labels:
             scol = self._internal.spark_column_for(label)
-            any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False)))
-            applied.append(F.when(any_col.isNull(), False).otherwise(any_col))
+            if skipna:
+                # When skipna=True, nulls count as False
+                any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False)))
+                applied.append(F.when(any_col.isNull(), False).otherwise(any_col))
+            else:
+                # When skipna=False, nulls count as True
+                any_col = F.max(scol.cast("boolean"))

Review Comment:
   I still feel it is a bit weird to use the function `max` here, since it relies on the ordering of `null`/`true`/`false`. Maybe we can use the function `some`/`bool_or` instead.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
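For context on the reviewer's suggestion, here is a minimal pure-Python sketch (not the PR's actual implementation) of how the two SQL aggregates behave on a nullable boolean column, assuming Spark's usual null-skipping aggregate semantics where `max` orders `false < true` and `bool_or`/`some` returns true iff any non-null value is true. On booleans the two agree; the suggestion is about making the intent explicit rather than relying on ordering.

```python
from typing import List, Optional

def sql_max_bool(values: List[Optional[bool]]) -> Optional[bool]:
    # Sketch of SQL max over booleans: nulls are skipped by the
    # aggregate, and the result rests on the ordering false < true.
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def sql_bool_or(values: List[Optional[bool]]) -> Optional[bool]:
    # Sketch of SQL bool_or / some: true if any non-null value is
    # true, false if all non-null values are false, null if every
    # value is null. No reliance on an ordering of booleans.
    non_null = [v for v in values if v is not None]
    return any(non_null) if non_null else None

# Both aggregates coincide on boolean input:
for col in ([True, None, False], [False, None], [None, None], [False, False]):
    assert sql_max_bool(col) == sql_bool_or(col)
```

The equivalence is why the change is a readability question rather than a behavioral one: `bool_or` states "is any value true?" directly, while `max` encodes the same answer through the comparison order of booleans.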
########## python/pyspark/pandas/frame.py: ########## @@ -11201,8 +11208,14 @@ def any(self, axis: Axis = 0, bool_only: Optional[bool] = None) -> "Series": applied: List[PySparkColumn] = [] for label in column_labels: scol = self._internal.spark_column_for(label) - any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False))) - applied.append(F.when(any_col.isNull(), False).otherwise(any_col)) + if skipna: + # When skipna=True, nulls count as False + any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False))) + applied.append(F.when(any_col.isNull(), False).otherwise(any_col)) + else: + # When skipna=False, nulls count as True + any_col = F.max(scol.cast("boolean")) Review Comment: I still feel it is a bit weird to use function `max` here, which relies on the ordering of `null/true/false`. maybe we can use function `some/bool_or` instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org