cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523010988
Does anyone have an answer for
https://github.com/apache/spark/pull/24715#issuecomment-522414331 ?
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-522414331
Actually, can we really fall back to the HDFS size for partitioned tables? The
partitions can be
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-522412152
> Then shouldn't we update the table statistics after it computes first
time, and next time
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521868936
Let's discuss the perf problem later and focus on the bug fix first.
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521626600
@wangyum do you mean `CommandUtils.getSizeInBytesFallBackToHdfs` is very
slow if there are many
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521528661
The idea LGTM. Can you rebase this PR?