cloud-fan commented on code in PR #52046:
URL: https://github.com/apache/spark/pull/52046#discussion_r2303024565
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##########
```diff
@@ -95,24 +95,82 @@ package object expressions {
     StructType(attrs.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata)))
   }

-  // It's possible that `attrs` is a linked list, which can lead to bad O(n) loops when
-  // accessing attributes by their ordinals. To avoid this performance penalty, convert the input
-  // to an array.
-  @transient private lazy val attrsArray = attrs.toArray
+  // Compute min and max expression IDs in a single pass
+  @transient private lazy val minMaxExprId: (Long, Long) = {
+    if (attrs.isEmpty) {
+      (0L, -1L)
+    } else {
+      var min = Long.MaxValue
+      var max = Long.MinValue
+      attrs.foreach { attr =>
+        val id = attr.exprId.id
```

Review Comment:
   One possible reason that it breaks shuffle reuse: previously we used `ExprId` as the hash map key, but now we completely ignore `ExprId#jvmId`. We should record the jvmId and have `def indexOf` return -1 directly if the jvmId is different.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
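[Editor's note] The reviewer's suggestion (record the jvmId and make `indexOf` bail out with -1 on a mismatch) could be sketched roughly as follows. This is a standalone illustration, not the actual Catalyst code: `ExprId`, `Attribute`, and `AttrSeq` here are simplified stand-ins for the real Spark classes, and the id-to-ordinal map replaces the min/max-range lookup in the PR.

```scala
import java.util.UUID

// Simplified stand-ins for the Catalyst types (hypothetical, for illustration only).
case class ExprId(id: Long, jvmId: UUID)
case class Attribute(name: String, exprId: ExprId)

class AttrSeq(attrs: Seq[Attribute]) {
  // Record the jvmId once; attributes in one sequence are assumed to share it.
  private val jvmId: Option[UUID] = attrs.headOption.map(_.exprId.jvmId)

  // Map the numeric part of each ExprId to its ordinal in `attrs`.
  private val byId: Map[Long, Int] = attrs.map(_.exprId.id).zipWithIndex.toMap

  def indexOf(exprId: ExprId): Int = {
    // Per the review comment: if the jvmId differs, return -1 directly
    // instead of matching on the numeric id alone.
    if (!jvmId.contains(exprId.jvmId)) -1
    else byId.getOrElse(exprId.id, -1)
  }
}
```

With this shape, two `ExprId`s that share a numeric id but come from different JVMs no longer resolve to the same attribute, which is the behavior the `ExprId`-keyed hash map gave before the change.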