wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523029568
@cloud-fan I added a test for an external partitioned table:
https://github.com/apache/spark/pull/24715/
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-522441286
retest this please
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521943871
retest this please
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521867590
I switched to an idle cluster,
[PartitioningAwareFileIndex.sizeInBytes](https://github.com/apache/spa
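For context, `PartitioningAwareFileIndex.sizeInBytes` works off file statuses the index has already listed, so it reduces to an in-memory sum with no filesystem calls. A minimal sketch of that idea (the `LeafFile` type, `allFiles` helper, and `SimpleFileIndex` class here are simplified stand-ins for illustration, not Spark's actual classes):

```scala
// Simplified stand-in for Hadoop's FileStatus: just a path and a length.
final case class LeafFile(path: String, len: Long)

// Hypothetical index holding already-listed leaf files, mimicking the
// shape of PartitioningAwareFileIndex: sizeInBytes is a pure in-memory
// sum over cached file statuses, so it stays cheap regardless of
// cluster load.
final class SimpleFileIndex(files: Seq[LeafFile]) {
  def allFiles(): Seq[LeafFile] = files
  def sizeInBytes: Long = allFiles().map(_.len).sum
}

object SizeDemo {
  def main(args: Array[String]): Unit = {
    val idx = new SimpleFileIndex(
      Seq(LeafFile("part-00000", 128L), LeafFile("part-00001", 256L)))
    println(idx.sizeInBytes) // prints 384
  }
}
```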
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521683954
> @wangyum do you mean `CommandUtils.getSizeInBytesFallBackToHdfs` is very
slow if there are many files?
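The slowness is plausible: unlike a sum over an in-memory file index, falling back to HDFS means a live metadata call against the table location (and, for a partitioned table, potentially one per partition directory). A hedged sketch of that shape using Hadoop's `FileSystem.getContentSummary` (the function name and the `defaultSize` parameter here are illustrative, not Spark's exact signature):

```scala
import java.io.IOException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative sketch of a size-estimation fallback: ask the filesystem
// for the total bytes under the table (or partition) location, and fall
// back to a configured default if the listing fails. Each call to
// getContentSummary walks the directory tree on the NameNode, which is
// why this gets slow when a table has many files or partitions.
def sizeFallBackToFs(conf: Configuration, location: Path, defaultSize: Long): Long =
  try {
    location.getFileSystem(conf).getContentSummary(location).getLength
  } catch {
    case _: IOException => defaultSize
  }
```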
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521611355
I did some benchmarking.
Prepare data:
```scala
spark.range(1).repartition(1).w
```
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-506934789
retest this please