[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-523010988 Does anyone has an answer for https://github.com/apache/spark/pull/24715#issuecomment-522414331 ?

[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-18 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-522414331 Actually, can we really fallback to HDFS size for partitioned tables? The partitions can be

[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-18 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-522412152 > Then shouldn't we update the table statistics after it computes first time, and next time

[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521868936 let's discuss the perf problem later, and focus on the bug fix first.

[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521626600 @wangyum do you mean `CommandUtils.getSizeInBytesFallBackToHdfs` is very slow if there are many

[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521528661 The idea LGTM, can you rebase this PR?