[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

GitBox Thu, 15 Aug 2019 08:30:13 -0700

wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521683954
 
 
   > @wangyum do you mean CommandUtils.getSizeInBytesFallBackToHdfs is very slow if there are many files?
   
   `CommandUtils.getSizeInBytesFallBackToHdfs` is not very slow.
   I have no idea why [PartitioningAwareFileIndex.sizeInBytes](https://github.com/apache/spark/blob/b276788d57b270d455ef6a7c5ed6cf8a74885dde/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L103) is faster than `CommandUtils.getSizeInBytesFallBackToHdfs`.
   It may be related to the cluster load; I plan to rerun the test on an idle cluster tomorrow.


