virrrat commented on PR #47516:
URL: https://github.com/apache/spark/pull/47516#issuecomment-2265784261
> Do you have a simple repro (end-to-end query) to trigger this bug?
Could you please use the reproducer below? It is a join between two tables
that shuffles data, and it can be run in spark-shell.
```
import scala.util.Random
// Generates a random 30-character alphanumeric string.
def randString(): String = Random.alphanumeric.take(30).mkString
// Two RDDs of integer keys, 100 partitions each.
val x = sc.parallelize(0 until 100000, 100)
val y = sc.parallelize(100000 until 2000000, 100)
val a = x.map(i => (i, randString()))
val b = y.map(i => (i, randString()))
val df1 = spark.createDataFrame(a).toDF("col1", "col2")
val df2 = spark.createDataFrame(b).toDF("col3", "col4")
df1.createOrReplaceTempView("t1")
df2.createOrReplaceTempView("t2")
// Equi-join on the key columns; this is the query that shuffles data.
spark.sql("select * from t1, t2 where t1.col1 = t2.col3").collect
```
I am attaching screenshots. In Spark `3.5.0`, the data shown in the Spark UI
is incorrect and does not match the History Server. In Spark `3.3.2`, the
data shown in the Spark UI is correct.
`3.5.0` Spark UI:
[spark_ui_350.pdf](https://github.com/user-attachments/files/16473062/spark_ui_350.pdf)
`3.5.0` History Server:
[history_server_350.pdf](https://github.com/user-attachments/files/16473065/history_server_350.pdf)
`3.3.2` Spark UI:
[spark_ui_332.pdf](https://github.com/user-attachments/files/16473066/spark_ui_332.pdf)