Hi,

I'm wondering why the metrics are repeated in FileSourceScanExec.metrics [1]. FileSourceScanExec is a ColumnarBatchScan [2], so it already inherits the two metrics numOutputRows and scanTime from ColumnarBatchScan.metrics [3].
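If I read the linked lines correctly, the two definitions look roughly like this (paraphrasing from master, so the exact metric descriptions may differ):

    // ColumnarBatchScan.metrics [3]
    override lazy val metrics = Map(
      "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
      "scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time"))

    // FileSourceScanExec.metrics [1] re-declares numOutputRows and scanTime
    override lazy val metrics = Map(
      "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
      "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"),
      "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time (ms)"),
      "scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time"))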
Shouldn't FileSourceScanExec.metrics then reuse the inherited metrics instead, along these lines:

    override lazy val metrics = super.metrics ++ Map(
      "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"),
      "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time (ms)"))

That would keep the two shared metrics defined in one place only. I'd like to send a pull request with a fix if no one objects. Anyone?

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L315-L319
[2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L164
[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala#L38-L40

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski