[GitHub] [spark] cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-14 Thread GitBox
cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-574507348 In general, shuffles are added by Spark and Spark can pick the best # of partitions. However, for user-specified shuffles …
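The situation the comment describes can be illustrated with a toy model (plain Python, not Spark's implementation; the `hash_partition` helper and its partition counts are hypothetical): a hash join already partitions its output on the join key, so a user-specified repartition on that same key with the same partition count is a shuffle in which no row actually moves.

```python
# Toy model of a redundant user-specified shuffle. This is NOT Spark code;
# it only mimics hash partitioning to show why the extra shuffle is a no-op
# when the key and partition count already match the join's output.

def hash_partition(rows, key, num_partitions):
    """Group rows into partitions by hashing the given key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts

rows = [{"k": i, "v": i * 10} for i in range(8)]

# Partitioning as produced by a hash-based join on key "k":
after_join = hash_partition(rows, "k", 4)

# A repartition on the same key with the same partition count would send
# every row back to the partition it is already in:
for pid, part in enumerate(after_join):
    for row in part:
        assert hash(row["k"]) % 4 == pid
print("redundant shuffle: no row would move")
```

The trade-off raised in the comment is that when the user picks the partition count explicitly, Spark can no longer choose a better one, so eliminating the shuffle is only clearly safe when the partitioning genuinely matches.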

[GitHub] [spark] cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-13 Thread GitBox
cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-573548617 can we try option 1? Seems like we need to do some experiments here. I'm not sure which option is better without seeing …

[GitHub] [spark] cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-08 Thread GitBox
cloud-fan commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-572348283 an issue is that at the logical phase we don't know the physical partitioning/sorting info (e.g. for a sort-merge join, SMJ), so we can't optimize …
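The point about the logical phase can be sketched as a toy physical-plan rule (illustrative Python only; `PhysicalNode`, `output_partitioning`, and the rule name are hypothetical, not Spark's API): only physical operators expose their actual output partitioning, so a rule that compares a repartition's requirement against its child's partitioning can only run once the physical plan exists.

```python
# Hypothetical sketch of a physical-phase rule that removes a repartition
# whose child already produces the required partitioning. Names are
# illustrative; this is not Spark's optimizer.

from dataclasses import dataclass, field

@dataclass
class PhysicalNode:
    name: str
    output_partitioning: tuple          # e.g. ("hash", ("k",), 4)
    children: list = field(default_factory=list)

def remove_redundant_repartition(plan):
    """Drop a Repartition node whose child already matches its partitioning."""
    plan.children = [remove_redundant_repartition(c) for c in plan.children]
    if plan.name == "Repartition" and plan.children:
        child = plan.children[0]
        if child.output_partitioning == plan.output_partitioning:
            return child                # the shuffle is a no-op; splice it out
    return plan

# A sort-merge join hash-partitions its output on the join key, but that
# fact is only visible on the physical node, not in the logical plan:
smj = PhysicalNode("SortMergeJoin", ("hash", ("k",), 4))
plan = PhysicalNode("Repartition", ("hash", ("k",), 4), [smj])
print(remove_redundant_repartition(plan).name)  # SortMergeJoin
```

This is why the comment frames it as an issue for the logical phase: a logical `Repartition` node has no child `output_partitioning` to compare against, so the comparison above simply cannot be made there.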