LantaoJin commented on a change in pull request #23327: [SPARK-26222][SQL]
Track file listing time
URL: https://github.com/apache/spark/pull/23327#discussion_r335477121
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala
##########
@@ -82,4 +83,16 @@ trait FileIndex {
* to update the metrics.
*/
def metadataOpsTimeNs: Option[Long] = None
+
+  /**
+   * Returns the latest phase summary of file listing in the current FileIndex. We should also
+   * clean the phase summary, because with a cached plan we should not report the old phase
+   * summary.
+   * This interface is only overridden in [[InMemoryFileIndex]] and [[CatalogFileIndex]]. We do
+   * not override it in [[PartitioningAwareFileIndex]], because all of its subclasses used in
+   * scan nodes already track file listing time.
+   *
+   * @return An optional phase summary recording the start and end timestamps of file listing.
+   */
+  def getAndCleanFileListingPhaseSummary: Option[PhaseSummary] = None
Review comment:
Just FYI: after this method is patched in, the current Delta Lake `df.show` will throw an
`AbstractMethodError`:
```
java.lang.AbstractMethodError
  at org.apache.spark.sql.execution.FileSourceScanExec.fileListingPhaseSummary$lzycompute(DataSourceScanExec.scala:248)
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]