Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

via GitHub Fri, 02 Aug 2024 09:49:00 -0700


virrrat commented on PR #47516:
URL: https://github.com/apache/spark/pull/47516#issuecomment-2265784261


   >  Do you have a simple repro (end-to-end query) to trigger this bug?
   
   Can you please use the reproducer below? It is a join between two tables 
   that shuffles data, and it can be run in `spark-shell`.
   
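   ```scala
   import scala.util.Random
   
   // Random 30-character alphanumeric payload; method-call syntax avoids
   // postfix operators, which Scala 2.13 rejects without scala.language.postfixOps
   def randString(): String = Random.alphanumeric.take(30).mkString
   
   // Two RDDs with 100 partitions each (`sc` and `spark` are provided by spark-shell)
   val x = sc.parallelize(0 until 100000, 100)
   val y = sc.parallelize(100000 until 2000000, 100)
   
   val a = x.map(i => (i, randString()))
   val b = y.map(i => (i, randString()))
   
   val df1 = spark.createDataFrame(a).toDF("col1", "col2")
   val df2 = spark.createDataFrame(b).toDF("col3", "col4")
   
   df1.createOrReplaceTempView("t1")
   df2.createOrReplaceTempView("t2")
   
   spark.sql("select * from t1, t2 where t1.col1 = t2.col3").collect()
   ```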
   
   Attaching screenshots. In Spark `3.5.0`, the SQL operator metrics shown in 
   the Spark UI are incorrect, and they don't match the values shown in the 
   history server. In Spark `3.3.2`, the Spark UI shows the correct values.
   
   - `3.5.0` Spark UI: [spark_ui_350.pdf](https://github.com/user-attachments/files/16473062/spark_ui_350.pdf)
   - `3.5.0` History Server: [history_server_350.pdf](https://github.com/user-attachments/files/16473065/history_server_350.pdf)
   - `3.3.2` Spark UI: [spark_ui_332.pdf](https://github.com/user-attachments/files/16473066/spark_ui_332.pdf)
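   
   For context on what "filtering out invalid accumulator values" protects against, here is a simplified sketch of how per-task values of a size or timing metric could be summarized as `total (min, med, max)`. It assumes, as a hypothetical model and not Spark's actual code, that a task which never updated the metric reports a negative sentinel such as `-1`; `MetricSummarySketch`, `Summary`, and `summarize` are illustrative names, not Spark APIs.
   
   ```scala
   // Hypothetical, simplified model of summarizing per-task metric values.
   // Not Spark's implementation; names here are illustrative only.
   object MetricSummarySketch {
     final case class Summary(total: Long, min: Long, med: Long, max: Long)
   
     // Assumption: a negative value marks a task that never updated the metric.
     def summarize(values: Seq[Long]): Option[Summary] = {
       // Drop invalid (never-updated) task values before aggregating.
       val valid = values.filter(_ >= 0).sorted
       if (valid.isEmpty) None
       else Some(Summary(valid.sum, valid.head, valid(valid.length / 2), valid.last))
     }
   
     def main(args: Array[String]): Unit = {
       // Three tasks produced data; two never updated the metric (-1).
       val withSentinel = Seq(100L, -1L, 300L, -1L, 200L)
       // Same tasks, but the untouched ones are counted as 0 instead of -1.
       val withZeros = withSentinel.map(v => if (v < 0) 0L else v)
   
       println(summarize(withSentinel)) // Some(Summary(600,100,200,300))
       println(summarize(withZeros))    // Some(Summary(600,0,100,300))
     }
   }
   ```
   
   In this toy model the total is the same either way, but counting untouched tasks as `0` pulls the min and median down. It only illustrates why the filter matters; it is not a diagnosis of the `3.5.0` regression.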
   


