spark git commit: [SPARK-14359] Create built-in functions for typed aggregates in Java

2016-04-04 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7db56244f -> 064623014 [SPARK-14359] Create built-in functions for typed aggregates in Java ## What changes were proposed in this pull request? This adds the corresponding Java static functions for built-in typed aggregates already

spark git commit: [SPARK-14368][PYSPARK] Support python.spark.worker.memory with upper-case unit.

2016-04-04 Thread sarutak
Repository: spark Updated Branches: refs/heads/branch-1.6 91530b09e -> 285cb9c66 [SPARK-14368][PYSPARK] Support python.spark.worker.memory with upper-case unit. ## What changes were proposed in this pull request? This fix tries to address the issue in PySpark where

spark git commit: [SPARK-14386][ML] Changed spark.ml ensemble trees methods to return concrete types

2016-04-04 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master ba24d1ee9 -> 8f50574ab [SPARK-14386][ML] Changed spark.ml ensemble trees methods to return concrete types ## What changes were proposed in this pull request? In spark.ml, GBT and RandomForest expose the trait DecisionTreeModel in the

spark git commit: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7201f033c -> ba24d1ee9 [SPARK-14287] isStreaming method for Dataset With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will

spark git commit: [SPARK-12425][STREAMING] DStream union optimisation

2016-04-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master a172e11cb -> 7201f033c [SPARK-12425][STREAMING] DStream union optimisation Use PartitionerAwareUnionRDD when possible for optimizing shuffling and preserving the partitioner. Author: Guillaume Poulin Closes

spark git commit: [SPARK-14366] Remove sbt-idea plugin

2016-04-04 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 24d7d2e45 -> a172e11cb [SPARK-14366] Remove sbt-idea plugin ## What changes were proposed in this pull request? Remove sbt-idea plugin as importing sbt project provides much better support. Author: Luciano Resende

spark git commit: [SPARK-13579][BUILD] Stop building the main Spark assembly.

2016-04-04 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 400b2f863 -> 24d7d2e45 [SPARK-13579][BUILD] Stop building the main Spark assembly. This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and

spark git commit: [SPARK-14259] [SQL] Merging small files together based on the cost of opening

2016-04-04 Thread davies
Repository: spark Updated Branches: refs/heads/master cc70f1741 -> 400b2f863 [SPARK-14259] [SQL] Merging small files together based on the cost of opening ## What changes were proposed in this pull request? This PR basically redoes the work in #12068 but with a different model, which

spark git commit: [SPARK-14334] [SQL] add toLocalIterator for Dataset/DataFrame

2016-04-04 Thread davies
Repository: spark Updated Branches: refs/heads/master 714390470 -> cc70f1741 [SPARK-14334] [SQL] add toLocalIterator for Dataset/DataFrame ## What changes were proposed in this pull request? RDD.toLocalIterator() could be used to fetch one partition at a time to reduce the memory usage.

spark git commit: [SPARK-11327][MESOS] Backport dispatcher does not respect all args f…

2016-04-04 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.6 f12f11e57 -> 91530b09e [SPARK-11327][MESOS] Backport dispatcher does not respect all args f… Backport for https://github.com/apache/spark/pull/10370 andrewor14 Author: Jo Voordeckers Closes #12101 from

spark git commit: [SPARK-14358] Change SparkListener from a trait to an abstract class

2016-04-04 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 27dad6f65 -> 714390470 [SPARK-14358] Change SparkListener from a trait to an abstract class ## What changes were proposed in this pull request? Scala traits are difficult to maintain binary compatibility on, and as a result we had to

spark git commit: [SPARK-14364][SPARK] HeartbeatReceiver object should be private

2016-04-04 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 5743c6476 -> 27dad6f65 [SPARK-14364][SPARK] HeartbeatReceiver object should be private ## What changes were proposed in this pull request? It was a mistake that the HeartbeatReceiver object was made public in Spark 1.x. ## How was this patch

spark git commit: [SPARK-12981] [SQL] extract Python UDF in physical plan

2016-04-04 Thread davies
Repository: spark Updated Branches: refs/heads/master 855ed44ed -> 5743c6476 [SPARK-12981] [SQL] extract Python UDF in physical plan ## What changes were proposed in this pull request? Currently we extract Python UDFs into a special logical plan EvaluatePython in the analyzer, but

spark git commit: [SPARK-13784][ML] Persistence for RandomForestClassifier, RandomForestRegressor

2016-04-04 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 745425332 -> 89f3befab [SPARK-13784][ML] Persistence for RandomForestClassifier, RandomForestRegressor ## What changes were proposed in this pull request? **Main change**: Added save/load for RandomForestClassifier, RandomForestRegressor

spark git commit: [SPARK-14137] [SQL] Cleanup hash join

2016-04-04 Thread davies
Repository: spark Updated Branches: refs/heads/master 0340b3d27 -> 745425332 [SPARK-14137] [SQL] Cleanup hash join ## What changes were proposed in this pull request? This PR does some cleanup on HashedRelation and HashJoin: 1) Merge HashedRelation and UniqueHashedRelation together 2)

spark git commit: [SPARK-14360][SQL] QueryExecution.debug.codegen() to dump codegen

2016-04-04 Thread hvanhovell
Repository: spark Updated Branches: refs/heads/master 76f3c735a -> 0340b3d27 [SPARK-14360][SQL] QueryExecution.debug.codegen() to dump codegen ## What changes were proposed in this pull request? We recently added the ability to dump the generated code for a given query. However, the method