warrenzhu25 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r598044400
##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
    stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
}
- def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+ def hasInput(stageData: StageData): Boolean = {
+ stageData.inputBytes > 0 || stageData.inputRecords > 0
Review comment:
I think the root cause is here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala#L96:
```scala
private class MetricsHandler extends Logging with Serializable {
  private val inputMetrics = TaskContext.get().taskMetrics().inputMetrics
  private val startingBytesRead = inputMetrics.bytesRead
  // Callback that reports bytes read through the Hadoop FileSystem on this
  // thread; it sees nothing for sources that don't go through Hadoop.
  private val getBytesRead = SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()

  def updateMetrics(numRows: Int, force: Boolean = false): Unit = {
    // recordsRead is incremented unconditionally for every source...
    inputMetrics.incRecordsRead(numRows)
    val shouldUpdateBytesRead =
      inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0
    if (shouldUpdateBytesRead || force) {
      // ...but bytesRead is only ever set from the Hadoop FS callback.
      inputMetrics.setBytesRead(startingBytesRead + getBytesRead())
    }
  }
}
```
`DataSourceRDD` increments `recordsRead` for every source, but it only updates `bytesRead` through the Hadoop FS read callback, so a scan that doesn't go through a Hadoop `FileSystem` reports records with zero bytes. That's why `hasInput` should also check `inputRecords`, as the sketch below illustrates.
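A minimal sketch of the effect (the names here are made up for illustration, not actual Spark code):
```scala
object HasInputSketch extends App {
  // Hypothetical stand-in for the relevant StageData fields.
  case class StageInput(inputBytes: Long, inputRecords: Long)

  // Old check: hides the Input column whenever no bytes were recorded.
  def hasInputOld(s: StageInput): Boolean = s.inputBytes > 0

  // Check from this PR: records alone are enough to show the column.
  def hasInputNew(s: StageInput): Boolean =
    s.inputBytes > 0 || s.inputRecords > 0

  // A DSv2 scan over a non-Hadoop source: MetricsHandler incremented
  // recordsRead, but bytesRead stayed 0 because the FS callback saw nothing.
  val inMemoryScan = StageInput(inputBytes = 0L, inputRecords = 1000L)

  assert(!hasInputOld(inMemoryScan)) // Input column wrongly hidden
  assert(hasInputNew(inMemoryScan))  // Input column shown
}
```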