warrenzhu25 commented on a change in pull request #31869:
URL: https://github.com/apache/spark/pull/31869#discussion_r598044400
##########
File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
##########
@@ -786,9 +786,13 @@ private[spark] object ApiHelper {
    stageData.accumulatorUpdates.exists { acc => acc.name != null && acc.value != null }
}
- def hasInput(stageData: StageData): Boolean = stageData.inputBytes > 0
+ def hasInput(stageData: StageData): Boolean = {
+ stageData.inputBytes > 0 || stageData.inputRecords > 0
Review comment:
I think the root cause is here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala#L96:
```scala
private class MetricsHandler extends Logging with Serializable {
  private val inputMetrics = TaskContext.get().taskMetrics().inputMetrics
  private val startingBytesRead = inputMetrics.bytesRead
  // Callback that reports bytes read through the Hadoop FileSystem on this
  // thread; it sees nothing for sources that don't go through Hadoop.
  private val getBytesRead = SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()

  def updateMetrics(numRows: Int, force: Boolean = false): Unit = {
    // recordsRead is incremented unconditionally for every source...
    inputMetrics.incRecordsRead(numRows)
    val shouldUpdateBytesRead =
      inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0
    if (shouldUpdateBytesRead || force) {
      // ...but bytesRead is only ever set from the Hadoop FS callback.
      inputMetrics.setBytesRead(startingBytesRead + getBytesRead())
    }
  }
}
```
`DataSourceRDD` increments `recordsRead` for every source, but it only updates `bytesRead` through the Hadoop FS read callback, so a scan that doesn't go through a Hadoop `FileSystem` reports records with zero bytes. That's why `hasInput` should also check `inputRecords`, as the sketch below illustrates.
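A minimal sketch of the effect (the names here are made up for illustration, not actual Spark code):
```scala
object HasInputSketch extends App {
  // Hypothetical stand-in for the relevant StageData fields.
  case class StageInput(inputBytes: Long, inputRecords: Long)

  // Old check: hides the Input column whenever no bytes were recorded.
  def hasInputOld(s: StageInput): Boolean = s.inputBytes > 0

  // Check from this PR: records alone are enough to show the column.
  def hasInputNew(s: StageInput): Boolean =
    s.inputBytes > 0 || s.inputRecords > 0

  // A DSv2 scan over a non-Hadoop source: MetricsHandler incremented
  // recordsRead, but bytesRead stayed 0 because the FS callback saw nothing.
  val inMemoryScan = StageInput(inputBytes = 0L, inputRecords = 1000L)

  assert(!hasInputOld(inMemoryScan)) // Input column wrongly hidden
  assert(hasInputNew(inMemoryScan))  // Input column shown
}
```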