xuanyuanking commented on a change in pull request #23327: [SPARK-26222][SQL] 
Track file listing time
URL: https://github.com/apache/spark/pull/23327#discussion_r242646521
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
 ##########
 @@ -325,8 +327,10 @@ case class FileSourceScanExec(
   override lazy val metrics =
     Map("numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
       "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"),
-      "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time"),
-      "scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time"))
+      "scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time"),
+      "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time (ms)"),
+      "fileListingStart" -> SQLMetrics.createTimestampMetric(sparkContext, "file listing start"),
 Review comment:
   There's an existing problem here: metrics like metadataTime and the start/end timestamps are driver-only — they are created, updated, and displayed entirely on the driver side. The current implementation wastes a small amount of effort sending these metrics to the executors and aggregating them back. I'll address this in a separate JIRA and PR, since this work is common to both SPARK-26222 and SPARK-26223.
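
   To illustrate the point above, here is a minimal, self-contained Scala sketch (not Spark's actual implementation — `DriverMetric` and `timed` are hypothetical names) of a driver-only timing metric: the value is measured, stored, and read entirely on the driver, so nothing needs to be serialized to executors or aggregated back.

```scala
// Illustrative sketch of a driver-only metric; names are hypothetical,
// not part of Spark's SQLMetrics API.
object DriverOnlyMetricSketch {
  // Holder for a metric value (in ms) that lives only on the driver.
  final class DriverMetric(val name: String) {
    private var valueMs: Long = -1L
    def set(ms: Long): Unit = valueMs = ms // updated on the driver only
    def value: Long = valueMs
  }

  // Time a driver-side action (e.g. file listing) and record the duration.
  def timed[T](metric: DriverMetric)(body: => T): T = {
    val start = System.nanoTime()
    try body
    finally metric.set((System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val metadataTime = new DriverMetric("metadata time (ms)")
    // Stand-in for a file-listing call on the driver.
    val files = timed(metadataTime) { Seq("part-0000", "part-0001") }
    println(s"${metadataTime.name}: listed ${files.size} files in ${metadataTime.value} ms")
  }
}
```

   The design point is that such a metric never leaves the driver, which is why round-tripping it through executor-side metric accumulation is pure overhead.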

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services