xuanyuanking commented on a change in pull request #23327: [SPARK-26222][SQL]
Track file listing time
URL: https://github.com/apache/spark/pull/23327#discussion_r243938558
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala
##########
@@ -56,11 +60,17 @@ class CatalogFileIndex(
override def listFiles(
partitionFilters: Seq[Expression], dataFilters: Seq[Expression]):
Seq[PartitionDirectory] = {
- filterPartitions(partitionFilters).listFiles(Nil, dataFilters)
+ val (partitions, phase) = createPhaseSummary {
Review comment:
Thank you, Wenchen, for your review.
It is not that hard, but it needs more code changes in the caller classes. I
chose this approach to keep the change small and the code clean. The current
approach only needs to call `createPhaseSummary` here and in
`InMemoryFileIndex.refresh0`; if we move it to the caller side, we need to
track more places that call `listFiles` and `new InMemoryFileIndex`.
Another problem is how to pass the phase to ScanExec. If we agree to still keep
it in FileIndex, the caller side will need a setter for the phaseSummary.
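
To make the trade-off concrete, here is a minimal sketch of the kind of wrapper the diff calls. Only the name `createPhaseSummary` and the `(result, phase)` return shape come from the diff above; the `PhaseSummary` fields, the `PhaseTiming` object, and the use of `System.nanoTime` are assumptions for illustration, not the PR's actual definitions.

```scala
// Hypothetical stand-in for the PR's phase record; the field names are assumed.
final case class PhaseSummary(startNs: Long, endNs: Long) {
  def durationNs: Long = endNs - startNs
}

object PhaseTiming {
  // Runs `body` once and returns its result together with the timing of that
  // run. The call shape matches `val (partitions, phase) = createPhaseSummary { ... }`
  // in the diff; the implementation here is only a sketch.
  def createPhaseSummary[T](body: => T): (T, PhaseSummary) = {
    val start = System.nanoTime()
    val result = body
    val end = System.nanoTime()
    (result, PhaseSummary(start, end))
  }
}
```

With a helper like this, the timing stays at the one place where the listing happens, for example `val (files, phase) = PhaseTiming.createPhaseSummary { Seq("part-0", "part-1") }`. Moving it to the caller side would mean wrapping every call site in the same way.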