arnavgupta03 commented on code in PR #52046:
URL: https://github.com/apache/spark/pull/52046#discussion_r2294202755


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala:
##########
@@ -95,24 +95,82 @@ package object expressions  {
       StructType(attrs.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata)))
     }
 
-    // It's possible that `attrs` is a linked list, which can lead to bad O(n) loops when
-    // accessing attributes by their ordinals. To avoid this performance penalty, convert the input
-    // to an array.
-    @transient private lazy val attrsArray = attrs.toArray
+    // Compute min and max expression IDs in a single pass
+    @transient private lazy val minMaxExprId: (Long, Long) = {
+      if (attrs.isEmpty) {
+        (0L, -1L)
+      } else {
+        var min = Long.MaxValue
+        var max = Long.MinValue
+        attrs.foreach { attr =>
+          val id = attr.exprId.id
+          if (id < min) min = id
+          if (id > max) max = id
+        }
+        (min, max)
+      }
+    }
+
+    // Extract as primitive fields to avoid boxing on access
+    @transient private lazy val minExprId: Long = minMaxExprId._1
+    @transient private lazy val maxExprId: Long = minMaxExprId._2
+
+    // Create a directly indexed array with the min and max expression
+    // IDs as an offset.
+    @transient private lazy val ordinalArrays: (Array[Int], Array[Attribute]) = {
+      if (attrs.isEmpty) {
+        (Array.empty[Int], Array.empty[Attribute])
+      } else if (
+        maxExprId - minExprId > Int.MaxValue ||  // prevent overflow
+          maxExprId - minExprId > 2 * attrs.length  // in case of sparse ExprIds
+      ) {
+        (Array.empty[Int], attrs.toArray)

Review Comment:
   In the fallback case, we might still need `attrsArray` for `AttributeSeq.apply`.
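
   To make the concern concrete, here is a minimal sketch with simplified stand-in types. `Attr`, `AttrSeqSketch`, and `ordinalByOffset` are hypothetical names for illustration only, not Spark's classes, and the linear scan in the fallback branch is an assumption made to keep the example short. The point is that the array backing `apply(ordinal)` is built unconditionally, so ordinal access works the same way whether or not the dense lookup table is used.

```scala
// Simplified stand-in for an attribute; NOT Spark's Attribute class.
final case class Attr(name: String, exprId: Long)

// Hypothetical sketch of an attribute sequence with an offset-indexed lookup table.
final class AttrSeqSketch(attrs: Seq[Attr]) {
  // Built regardless of the lookup strategy, so apply(ordinal) never walks a linked list.
  private val attrsArray: Array[Attr] = attrs.toArray

  private val minId: Long = if (attrs.isEmpty) 0L else attrs.map(_.exprId).min
  private val maxId: Long = if (attrs.isEmpty) -1L else attrs.map(_.exprId).max

  // Dense table mapping (exprId - minId) to an ordinal; empty when the IDs are too sparse.
  private val ordinalByOffset: Array[Int] = {
    val range = maxId - minId
    if (attrs.isEmpty || range < 0 || range > 2L * attrs.length) {
      Array.empty[Int] // fallback case: no dense table
    } else {
      val table = Array.fill((range + 1).toInt)(-1)
      attrsArray.indices.foreach { i =>
        val slot = (attrsArray(i).exprId - minId).toInt
        if (table(slot) == -1) table(slot) = i // keep the first match
      }
      table
    }
  }

  /** Returns the attribute at the given ordinal, in both the dense and fallback cases. */
  def apply(ordinal: Int): Attr = attrsArray(ordinal)

  /** Returns the ordinal of the first attribute with this ID, or -1 if there is none. */
  def indexOf(exprId: Long): Int =
    if (ordinalByOffset.nonEmpty) {
      if (exprId < minId || exprId > maxId) -1
      else ordinalByOffset((exprId - minId).toInt)
    } else {
      attrsArray.indexWhere(_.exprId == exprId) // assumed fallback: linear scan
    }
}
```

   With IDs 10, 11, 13 the dense table is used; with IDs 1 and 1000 the sketch falls back, and `apply(1)` still resolves through `attrsArray` in both cases.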



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

