Emil Ejbyfeldt created SPARK-45592:
--------------------------------------

Summary: AQE and InMemoryTableScanExec correctness bug
Key: SPARK-45592
URL: https://issues.apache.org/jira/browse/SPARK-45592
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.0
Reporter: Emil Ejbyfeldt
In the following query, join.count() should return 1000000:

{code:scala}
import org.apache.spark.sql.functions.{col, min}
import org.apache.spark.storage.StorageLevel

// Run in spark-shell, where spark.implicits._ (needed for .map and $"...") is already in scope
val df = spark.range(0, 1000000, 1, 5).map(l => (l, l))
val ee = df.select($"_1".as("src"), $"_2".as("dst"))
  .persist(StorageLevel.MEMORY_AND_DISK)
ee.count()

val minNbrs1 = ee
  .groupBy("src").agg(min(col("dst")).as("min_number"))
  .persist(StorageLevel.MEMORY_AND_DISK)

val join = ee.join(minNbrs1, "src")
join.count()
{code}

However, on Spark 3.5.0 a correctness bug causes it to return {{104800}} or some other smaller value.
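To see why 1000000 is the expected answer, here is a minimal sketch that models the same data and query with plain Scala collections instead of Spark. It is only an illustration: `ExpectedJoinCount` and `expectedCount` are hypothetical names, and `groupMapReduce` assumes Scala 2.13. Every src value is distinct, so each group in minNbrs1 has exactly one row, and the inner join on src keeps every row of ee.

```scala
// Plain Scala model of the repro query (no Spark involved).
// Assumes the same data as spark.range(0, n).map(l => (l, l)).
object ExpectedJoinCount {
  def expectedCount(n: Long): Long = {
    // ee: rows (src, dst) with src == dst for src in [0, n); a lazy view avoids materializing all rows
    val ee = (0L until n).view.map(l => (l, l))

    // minNbrs1: minimum dst per src (Scala 2.13 groupMapReduce)
    val minNbrs1: Map[Long, Long] =
      ee.groupMapReduce(_._1)(_._2)((a, b) => math.min(a, b))

    // Inner join on src: each ee row matches at most one minNbrs1 key,
    // so the joined row count equals the number of ee rows whose src is present
    ee.count { case (src, _) => minNbrs1.contains(src) }.toLong
  }

  def main(args: Array[String]): Unit =
    println(expectedCount(1000000L))
}
```

Running `main` prints 1000000, which is the count the Spark query is expected to return as well.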