zhengruifeng commented on code in PR #52393:
URL: https://github.com/apache/spark/pull/52393#discussion_r2373755031


##########
python/pyspark/pandas/frame.py:
##########
@@ -11201,8 +11208,14 @@ def any(self, axis: Axis = 0, bool_only: Optional[bool] = None) -> "Series":
         applied: List[PySparkColumn] = []
         for label in column_labels:
             scol = self._internal.spark_column_for(label)
-            any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False)))
-            applied.append(F.when(any_col.isNull(), False).otherwise(any_col))
+            if skipna:
+                # When skipna=True, nulls count as False
+                any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False)))
+                applied.append(F.when(any_col.isNull(), False).otherwise(any_col))
+            else:
+                # When skipna=False, nulls count as True
+                any_col = F.max(scol.cast("boolean"))

Review Comment:
   I still feel it is a bit weird to use the function `max` here, since it relies on the ordering of `null`/`true`/`false`. Maybe we can use the function `some`/`bool_or` instead.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
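For context on the reviewer's suggestion, here is a minimal pure-Python sketch (not the PR's actual implementation) of how the two SQL aggregates behave on a nullable boolean column, assuming Spark's usual null-skipping aggregate semantics where `max` orders `false < true` and `bool_or`/`some` returns true iff any non-null value is true. On booleans the two agree; the suggestion is about making the intent explicit rather than relying on ordering.

```python
from typing import List, Optional

def sql_max_bool(values: List[Optional[bool]]) -> Optional[bool]:
    # Sketch of SQL max over booleans: nulls are skipped by the
    # aggregate, and the result rests on the ordering false < true.
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def sql_bool_or(values: List[Optional[bool]]) -> Optional[bool]:
    # Sketch of SQL bool_or / some: true if any non-null value is
    # true, false if all non-null values are false, null if every
    # value is null. No reliance on an ordering of booleans.
    non_null = [v for v in values if v is not None]
    return any(non_null) if non_null else None

# Both aggregates coincide on boolean input:
for col in ([True, None, False], [False, None], [None, None], [False, False]):
    assert sql_max_bool(col) == sql_bool_or(col)
```

The equivalence is why the change is a readability question rather than a behavioral one: `bool_or` states "is any value true?" directly, while `max` encodes the same answer through the comparison order of booleans.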
########## python/pyspark/pandas/frame.py: ########## @@ -11201,8 +11208,14 @@ def any(self, axis: Axis = 0, bool_only: Optional[bool] = None) -> "Series": applied: List[PySparkColumn] = [] for label in column_labels: scol = self._internal.spark_column_for(label) - any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False))) - applied.append(F.when(any_col.isNull(), False).otherwise(any_col)) + if skipna: + # When skipna=True, nulls count as False + any_col = F.max(F.coalesce(scol.cast("boolean"), F.lit(False))) + applied.append(F.when(any_col.isNull(), False).otherwise(any_col)) + else: + # When skipna=False, nulls count as True + any_col = F.max(scol.cast("boolean")) Review Comment: I still feel it is a bit weird to use function `max` here, which relies on the ordering of `null/true/false`. maybe we can use function `some/bool_or` instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org