[PR] [SPARK-55411][SQL][4.0] SPJ may throw ArrayIndexOutOfBoundsException when join keys are less than cluster keys [spark]

via GitHub Tue, 10 Feb 2026 18:37:56 -0800


pan3793 opened a new pull request, #54260:
URL: https://github.com/apache/spark/pull/54260


   Backport https://github.com/apache/spark/issues/54182 to branch-4.0
   
   ### What changes were proposed in this pull request?
   
   Fix a `java.lang.ArrayIndexOutOfBoundsException` thrown when `spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled=true` and the join keys are fewer than the cluster keys. The `expression` passed to `KeyGroupedPartitioning#project` was the projected partition expression; it should be the full partition expression, and this PR corrects that.
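   
   The failure mode can be illustrated with a minimal standalone sketch. It assumes that projection selects expressions by position and that the positions are computed against the full partition expression list. `ProjectSketch`, its `project` helper, and the example expressions are hypothetical stand-ins, not the actual Spark code or signature.
   
   ```scala
   object ProjectSketch {
     // Hypothetical stand-in for the projection step: select the expressions
     // at the given positions. This is not the real Spark signature.
     def project(expressions: Seq[String], positions: Seq[Int]): Seq[String] =
       positions.map(i => expressions(i))

     def main(args: Array[String]): Unit = {
       // Assumed example: a table clustered by two keys, joined on the second only.
       val fullPartitionExprs = Vector("bucket(4, id)", "days(ts)")
       val joinKeyPositions = Seq(1)

       // Positions index into the full list, so this lookup is in bounds.
       val projected = project(fullPartitionExprs, joinKeyPositions)
       println(projected)

       // Reusing the same positions against the already projected list
       // (length 1) asks for index 1 and fails with an out-of-bounds error.
       val second = scala.util.Try(project(projected, joinKeyPositions))
       println(second.isFailure)
     }
   }
   ```
   
   In this sketch the second call fails because index 1 is looked up in a one-element sequence, which mirrors the `Index 1 out of bounds for length 1` message in the stack trace.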
   
   Also fix an issue in the test code: change the result computed by `BucketTransform` in `InMemoryBaseTable.scala` to match `BucketFunctions` in `transformFunctions.scala` (thanks peter-toth for pointing this out!).
   
   ### Why are the changes needed?
   
   It's a bug fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Some queries that failed when `spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled=true` now run normally.
   
   ### How was this patch tested?
   
   A new UT is added. It previously failed with `ArrayIndexOutOfBoundsException` and now passes.
   
   ```
   $ build/sbt "sql/testOnly *KeyGroupedPartitioningSuite -- -z SPARK=55411"
   ...
   [info] - bug *** FAILED *** (1 second, 884 milliseconds)
   [info]   java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
   [info]   at scala.collection.immutable.ArraySeq$ofRef.apply(ArraySeq.scala:331)
   [info]   at org.apache.spark.sql.catalyst.plans.physical.KeyGroupedPartitioning$.$anonfun$project$1(partitioning.scala:471)
   [info]   at org.apache.spark.sql.catalyst.plans.physical.KeyGroupedPartitioning$.$anonfun$project$1$adapted(partitioning.scala:471)
   [info]   at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75)
   [info]   at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
   [info]   at org.apache.spark.sql.catalyst.plans.physical.KeyGroupedPartitioning$.project(partitioning.scala:471)
   [info]   at org.apache.spark.sql.execution.KeyGroupedPartitionedScan.$anonfun$getOutputKeyGroupedPartitioning$5(KeyGroupedPartitionedScan.scala:58)
   ...
   ```
   
   UTs affected by the change to the `bucket()` calculation logic are updated accordingly.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

