spark git commit: [SPARK-22682][SQL] HashExpression does not need to create global variables

2017-12-04 Thread wenchen
Repository: spark
Updated Branches: refs/heads/master 295df746e -> a8af4da12

[SPARK-22682][SQL] HashExpression does not need to create global variables

## What changes were proposed in this pull request?

It turns out that `HashExpression` can pass around some values via parameter when …
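To make the parameter-passing idea concrete, here is a minimal Scala sketch (not Spark's actual generated code; `mix`, `WithGlobalState`, and `hashRow` are invented names): the running hash is threaded through as a value instead of living in a mutable field.

```scala
// Minimal sketch only, not Spark's generated code: a running hash can be
// threaded through as a parameter/return value instead of being kept in a
// mutable field of the generated class.
object HashSketch {
  // Hypothetical mixing step; the real logic lives in Spark's hash helpers.
  private def mix(hash: Int, value: Int): Int = hash * 31 + value

  // "Global variable" style: the intermediate state escapes into a field.
  class WithGlobalState {
    var hash: Int = 42
    def update(value: Int): Unit = { hash = mix(hash, value) }
  }

  // Parameter-passing style: no mutable state outside the method.
  def hashRow(values: Seq[Int], seed: Int = 42): Int =
    values.foldLeft(seed)(mix)

  def main(args: Array[String]): Unit = {
    val withField = new WithGlobalState
    Seq(1, 2, 3).foreach(withField.update)
    // Both styles compute the same value; only the second avoids the field.
    assert(withField.hash == hashRow(Seq(1, 2, 3)))
    println(hashRow(Seq(1, 2, 3)))
  }
}
```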

spark git commit: [SPARK-22677][SQL] cleanup whole stage codegen for hash aggregate

2017-12-04 Thread wenchen
Repository: spark
Updated Branches: refs/heads/master 3887b7eef -> 295df746e

[SPARK-22677][SQL] cleanup whole stage codegen for hash aggregate

## What changes were proposed in this pull request?

The `HashAggregateExec` whole stage codegen path is a little messy and hard to understand, this …
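The hash-aggregate whole-stage-codegen path being cleaned up here can be observed from the outside without relying on any internals; a small sketch, assuming a local SparkSession (`HashAggCodegenPeek` and the column name are invented):

```scala
import org.apache.spark.sql.SparkSession

object HashAggCodegenPeek {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hashagg-codegen").getOrCreate()
    import spark.implicits._

    // A simple grouping query that goes through HashAggregateExec.
    val agg = spark.range(1000).groupBy(($"id" % 10).as("bucket")).count()

    // Operators fused by whole-stage codegen, including HashAggregate,
    // show up with a '*' prefix in the physical plan.
    agg.explain()

    spark.stop()
  }
}
```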

spark git commit: [SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

2017-12-04 Thread lixiao
Repository: spark
Updated Branches: refs/heads/master 1d5597b40 -> 3887b7eef

[SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

## What changes were proposed in this pull request?

Repartitioning by an empty set of expressions is currently possible, even though it is a case …
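A sketch of the degenerate case against the public Dataset API, assuming a local SparkSession; the data and names are invented, and the post-fix behavior (reject versus tolerate) is deliberately left unasserted:

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

object EmptyRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("empty-repartition").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Normal case: repartition by an explicit expression.
    df.repartition($"id").explain()

    // Degenerate case this change is about: an empty list of partitioning
    // expressions. Depending on the Spark version this either produces a plan
    // with no expressions or is rejected up front, hence the Try.
    val outcome = Try(df.repartition().explain())
    println(s"repartition() with no expressions: $outcome")

    spark.stop()
  }
}
```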

spark git commit: [SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case

2017-12-04 Thread lixiao
Repository: spark
Updated Branches: refs/heads/master e1dd03e42 -> 1d5597b40

[SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case

## What changes were proposed in this pull request?

This PR improves documentation for not using zero `numRows` statistics and simplifies …

spark git commit: [SPARK-22372][CORE, YARN] Make cluster submission use SparkApplication.

2017-12-04 Thread vanzin
Repository: spark
Updated Branches: refs/heads/master f81401e1c -> e1dd03e42

[SPARK-22372][CORE, YARN] Make cluster submission use SparkApplication.

The main goal of this change is to allow multiple cluster-mode submissions from the same JVM, without having them end up with mixed …
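A hedged sketch of the shape of such an entry point; `AppEntryPoint`, `WordCountApp`, and `LauncherSketch` are invented names, and the real trait is internal to Spark and may differ:

```scala
// Minimal sketch of the idea, not the actual org.apache.spark.deploy API:
// instead of invoking a main() that reads global system properties, the
// launcher hands each application its own SparkConf, so two cluster-mode
// submissions in one JVM do not share mutable global state.
import org.apache.spark.SparkConf

// Hypothetical interface standing in for Spark's internal trait.
trait AppEntryPoint {
  def start(args: Array[String], conf: SparkConf): Unit
}

// An app that only reads the conf it was given, never System.getProperties.
class WordCountApp extends AppEntryPoint {
  override def start(args: Array[String], conf: SparkConf): Unit =
    println(s"Running ${conf.get("spark.app.name", "unnamed")} with ${args.length} args")
}

object LauncherSketch {
  def main(args: Array[String]): Unit = {
    // Two submissions from the same JVM, each with an isolated conf.
    val confA = new SparkConf(loadDefaults = false).setAppName("job-a")
    val confB = new SparkConf(loadDefaults = false).setAppName("job-b")
    new WordCountApp().start(Array("a"), confA)
    new WordCountApp().start(Array("b", "c"), confB)
  }
}
```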

spark git commit: [SPARK-22162] Executors and the driver should use consistent JobIDs in the RDD commit protocol

2017-12-04 Thread vanzin
Repository: spark
Updated Branches: refs/heads/master 3927bb9b4 -> f81401e1c

[SPARK-22162] Executors and the driver should use consistent JobIDs in the RDD commit protocol

I have modified SparkHadoopWriter so that executors and the driver always use consistent JobIDs during the Hadoop …
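A minimal sketch of the underlying idea, assuming Hadoop's `mapreduce` classes are on the classpath; `jobTrackerId`, `hadoopJobId`, and `taskAttemptId` are illustrative helpers, not Spark's actual ones:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}

object CommitIdSketch {
  // The driver picks the jobTrackerId once and ships it to executors,
  // so both sides reconstruct exactly the same JobID.
  def jobTrackerId(startTime: Long): String =
    new SimpleDateFormat("yyyyMMddHHmmss", Locale.US).format(new Date(startTime))

  def hadoopJobId(trackerId: String, sparkJobId: Int): JobID =
    new JobID(trackerId, sparkJobId)

  def taskAttemptId(trackerId: String, sparkJobId: Int, partition: Int, attempt: Int): TaskAttemptID =
    new TaskAttemptID(new TaskID(hadoopJobId(trackerId, sparkJobId), TaskType.MAP, partition), attempt)

  def main(args: Array[String]): Unit = {
    val trackerId  = jobTrackerId(System.currentTimeMillis())
    val onDriver   = hadoopJobId(trackerId, sparkJobId = 7)
    val onExecutor = hadoopJobId(trackerId, sparkJobId = 7)
    assert(onDriver == onExecutor)   // same inputs, same JobID on both sides
    println(taskAttemptId(trackerId, 7, partition = 0, attempt = 0))
  }
}
```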

spark git commit: [SPARK-22473][FOLLOWUP][TEST] Remove deprecated Date functions

2017-12-04 Thread srowen
Repository: spark
Updated Branches: refs/heads/master 4131ad03f -> 3927bb9b4

[SPARK-22473][FOLLOWUP][TEST] Remove deprecated Date functions

## What changes were proposed in this pull request?

#19696 replaced the deprecated usages of `Date` and `Waiter`, but a few methods were missed. The …
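For context, a small self-contained example of the kind of replacement this concerns, with invented values: a deprecated `java.util.Date` constructor versus its `java.time` equivalent.

```scala
import java.sql.Timestamp
import java.time.LocalDateTime

object DateCleanupSketch {
  def main(args: Array[String]): Unit = {
    // Deprecated since Java 1.1 (year is offset from 1900, month is 0-based):
    // val ts = new java.util.Date(117, 11, 4, 10, 30, 0)

    // The java.time replacement is explicit about fields and not deprecated:
    val ts = Timestamp.valueOf(LocalDateTime.of(2017, 12, 4, 10, 30, 0))
    println(ts)
  }
}
```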