[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-523029568 @cloud-fan I added a test for an external partitioned table: https://github.com/apache/spark/pull/24715/

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-19 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-522441286 retest this please

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-16 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521943871 retest this please

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521867590 I switched to an idle cluster, [PartitioningAwareFileIndex.sizeInBytes](https://github.com/apache/spa

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521683954 > @wangyum do you mean CommandUtils.getSizeInBytesFallBackToHdfs is very slow if there are many files

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-521611355 I ran some benchmarks. Prepare data: ```scala spark.range(1).repartition(1).w

[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-06-29 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation URL: https://github.com/apache/spark/pull/24715#issuecomment-506934789 retest this please