spark git commit: [SPARK-10740] [SQL] handle nondeterministic expressions correctly for set operations

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1ca5e2e0b -> 5017c685f [SPARK-10740] [SQL] handle nondeterministic expressions correctly for set operations https://issues.apache.org/jira/browse/SPARK-10740 Author: Wenchen Fan Closes #8858 from cloud-fan/non-deter.

spark git commit: [SPARK-10740] [SQL] handle nondeterministic expressions correctly for set operations

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 c3112a92f -> 54334d378 [SPARK-10740] [SQL] handle nondeterministic expressions correctly for set operations https://issues.apache.org/jira/browse/SPARK-10740 Author: Wenchen Fan Closes #8858 from

spark git commit: [SPARK-10704] Rename HashShuffleReader to BlockStoreShuffleReader

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 22d40159e -> 1ca5e2e0b [SPARK-10704] Rename HashShuffleReader to BlockStoreShuffleReader The current shuffle code has an interface named ShuffleReader with only one implementation, HashShuffleReader. This naming is confusing, since the

spark git commit: [SPARK-10577] [PYSPARK] DataFrame hint for broadcast join

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master bf20d6c9f -> 0180b849d [SPARK-10577] [PYSPARK] DataFrame hint for broadcast join https://issues.apache.org/jira/browse/SPARK-10577 Author: Jian Feng Closes #8801 from Jianfeng-chs/master. Project:

spark git commit: [Minor] style fix for previous commit f24316e

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master f24316e6d -> fd61b0048 [Minor] style fix for previous commit f24316e Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd61b004 Tree:

spark git commit: [SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value information for spark.mesos.constraints parameter.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master f3b727c80 -> 0bd0e5bed [SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value information for spark.mesos.constraints parameter. Author: Akash Mishra Closes #8816 from SleepyThread/constraint-fix.

spark git commit: [SPARK-10419] [SQL] Adding SQLServer support for datetimeoffset types to JdbcDialects

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0180b849d -> 781b21ba2 [SPARK-10419] [SQL] Adding SQLServer support for datetimeoffset types to JdbcDialects Reading from Microsoft SQL Server over jdbc fails when the table contains datetimeoffset types. This patch registers a

spark git commit: [SPARK-8567] [SQL] Increase the timeout of o.a.s.sql.hive.HiveSparkSubmitSuite to 5 minutes.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 d0e6e5312 -> 03215e3e8 [SPARK-8567] [SQL] Increase the timeout of o.a.s.sql.hive.HiveSparkSubmitSuite to 5 minutes. https://issues.apache.org/jira/browse/SPARK-8567 Looks like "SPARK-8368: includes jars passed in through --jars" is

spark git commit: [SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job description in streaming jobs

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 f83b6e625 -> d0e6e5312 [SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job description in streaming jobs **Note that this PR is only for branch 1.5. See #8781 for the solution for Spark master.** The job group,

spark git commit: [SPARK-10458] [SPARK CORE] Added isStopped() method in SparkContext

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 1fcefef06 -> f24316e6d [SPARK-10458] [SPARK CORE] Added isStopped() method in SparkContext Added isStopped() method in SparkContext Author: Madhusudanan Kandasamy Closes #8749 from kmadhugit/SPARK-10458.

spark git commit: [SQL] [MINOR] map -> foreach.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 03215e3e8 -> a2b0fee7b [SQL] [MINOR] map -> foreach. DataFrame.explain should use foreach to print the explain content. Author: Reynold Xin Closes #8862 from rxin/map-foreach. (cherry picked from commit

spark git commit: [SQL] [MINOR] map -> foreach.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 4da32bc0e -> f3b727c80 [SQL] [MINOR] map -> foreach. DataFrame.explain should use foreach to print the explain content. Author: Reynold Xin Closes #8862 from rxin/map-foreach. Project:

spark git commit: [SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value information for spark.mesos.constraints parameter.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 a2b0fee7b -> 646155e6e [SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value information for spark.mesos.constraints parameter. Author: Akash Mishra Closes #8816 from

spark git commit: [SPARK-9821] [PYSPARK] pyspark-reduceByKey-should-take-a-custom-partitioner

2015-09-22 Thread davies
Repository: spark Updated Branches: refs/heads/master c986e933a -> 1cd674157 [SPARK-9821] [PYSPARK] pyspark-reduceByKey-should-take-a-custom-partitioner from the issue: In Scala, I can supply a custom partitioner to reduceByKey (and other aggregation/repartitioning methods like
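As a rough illustration of the request quoted above, the following single-process Python sketch shows what it means for a keyed reduce to accept a caller-supplied partitioner. It does not use PySpark and is not the code from this commit; `reduce_by_key`, `first_letter_partitioner`, and the sample data are hypothetical names made up for the example.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func, num_partitions, partition_func=hash):
    """Toy stand-in for a keyed reduce with a pluggable partitioner (not Spark)."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        # The partitioner alone decides which partition a key lands in.
        index = partition_func(key) % num_partitions
        partitions[index][key].append(value)
    # Combine the values of each key inside its partition.
    return [{k: reduce(func, vs) for k, vs in part.items()} for part in partitions]


def first_letter_partitioner(key):
    # Hypothetical custom partitioner: route keys by their first character.
    return ord(key[0])


pairs = [("apple", 1), ("avocado", 2), ("banana", 3), ("apple", 4)]
result = reduce_by_key(pairs, lambda a, b: a + b, 2, first_letter_partitioner)
print(result)
```

With this partitioner, every key starting with the same letter ends up in the same partition, which is the kind of placement control a default hash partitioner does not offer.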

spark git commit: [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.5 bb8e481bc -> f83b6e625 [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file Remove ._SUCCESS.crc hidden file that may cause problems in distribution tar archive, and is not used

spark git commit: [SPARK-8567] [SQL] Increase the timeout of o.a.s.sql.hive.HiveSparkSubmitSuite to 5 minutes.

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master fd61b0048 -> 4da32bc0e [SPARK-8567] [SQL] Increase the timeout of o.a.s.sql.hive.HiveSparkSubmitSuite to 5 minutes. https://issues.apache.org/jira/browse/SPARK-8567 Looks like "SPARK-8368: includes jars passed in through --jars" is

spark git commit: [SPARK-10446][SQL] Support to specify join type when calling join with usingColumns

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 781b21ba2 -> 1fcefef06 [SPARK-10446][SQL] Support to specify join type when calling join with usingColumns JIRA: https://issues.apache.org/jira/browse/SPARK-10446 Currently the method `join(right: DataFrame, usingColumns: Seq[String])`

spark git commit: [SPARK-10750] [ML] ML Param validate should print better error information

2015-09-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master f4a3c4e34 -> 7104ee0e5 [SPARK-10750] [ML] ML Param validate should print better error information Currently, when you set an illegal value for params of array type (such as IntArrayParam, DoubleArrayParam, StringArrayParam), it will throw

spark git commit: [SPARK-10593] [SQL] fix resolve output of Generate

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 646155e6e -> c3112a92f [SPARK-10593] [SQL] fix resolve output of Generate The output of Generate should not be resolved as Reference. Author: Davies Liu Closes #8755 from davies/view. (cherry picked from

spark git commit: [SPARK-9962] [ML] Decision Tree training: prevNodeIdsForInstances.unpersist() at end of training

2015-09-22 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 870b8a2ed -> f4a3c4e34 [SPARK-9962] [ML] Decision Tree training: prevNodeIdsForInstances.unpersist() at end of training NodeIdCache: prevNodeIdsForInstances.unpersist() needs to be called at end of training. Author: Holden Karau

spark git commit: [SPARK-9585] Delete the input format caching because some input format are non thread safe

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7104ee0e5 -> 2ea0f2e11 [SPARK-9585] Delete the input format caching because some input format are non thread safe If we cache the InputFormat, all tasks on the same executor will share it. Some InputFormats are thread safe, but some are
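The sharing problem described in this message can be sketched in plain Python. This is only an analogy under stated assumptions, not Spark or Hadoop code: `StatefulFormat`, `cached_format`, `fresh_format`, and `distinct_instances` are hypothetical names, and threads stand in for tasks running on one executor.

```python
import threading


class StatefulFormat:
    """Hypothetical stand-in for an input format that keeps mutable per-read state."""

    def __init__(self):
        self.current_split = None


_cache = {}
_cache_lock = threading.Lock()


def cached_format(name):
    # One instance per name, handed to every task that asks for it.
    with _cache_lock:
        if name not in _cache:
            _cache[name] = StatefulFormat()
        return _cache[name]


def fresh_format(name):
    # A new instance per call, so no state is shared between tasks.
    return StatefulFormat()


def distinct_instances(factory, n_tasks=4):
    """Run n_tasks threads and count how many distinct objects they received."""
    received = []

    def task():
        received.append(factory("text"))  # keep a reference so ids stay unique

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len({id(obj) for obj in received})


shared_count = distinct_instances(cached_format)
fresh_count = distinct_instances(fresh_format)
print(shared_count, fresh_count)
```

With the cache, all four threads receive the same object, so any mutable state inside it is exposed to concurrent access. Creating an instance per task avoids the sharing at the cost of repeated construction.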

spark git commit: [SPARK-10672] [SQL] Do not fail when we cannot save the metadata of a data source table in a hive compatible way

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 54334d378 -> d83dcc9a0 [SPARK-10672] [SQL] Do not fail when we cannot save the metadata of a data source table in a hive compatible way https://issues.apache.org/jira/browse/SPARK-10672 With changes in this PR, we will fall back to

spark git commit: [SPARK-10672] [SQL] Do not fail when we cannot save the metadata of a data source table in a hive compatible way

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5017c685f -> 2204cdb28 [SPARK-10672] [SQL] Do not fail when we cannot save the metadata of a data source table in a hive compatible way https://issues.apache.org/jira/browse/SPARK-10672 With changes in this PR, we will fall back to same

spark git commit: [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator (branch-1.3 backport)

2015-09-22 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 64730a3de -> e54525f4a [SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in OutputCommitCoordinator (branch-1.3 backport) This is a backport of #8544 to `branch-1.3` for inclusion in 1.3.2. Author: Josh Rosen

spark git commit: [SPARK-10640] History server fails to parse TaskCommitDenied

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 118ebd405 -> 26187ab74 [SPARK-10640] History server fails to parse TaskCommitDenied ... simply because the code is missing! Author: Andrew Or Closes #8828 from andrewor14/task-end-reason-json. Conflicts:

spark git commit: [SPARK-10310] [SQL] Fixes script transformation field/line delimiters

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 61d4c07f4 -> 84f81e035 [SPARK-10310] [SQL] Fixes script transformation field/line delimiters **Please attribute this PR to `Zhichao Li`.** This PR is based on PR #8476 authored by zhichao-li. It fixes SPARK-10310 by adding field

spark git commit: [SPARK-10737] [SQL] When using UnsafeRows, SortMergeJoin may return wrong results

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 d83dcc9a0 -> 6b1e5c2db [SPARK-10737] [SQL] When using UnsafeRows, SortMergeJoin may return wrong results https://issues.apache.org/jira/browse/SPARK-10737 Author: Yin Huai Closes #8854 from yhuai/SMJBug.

spark git commit: [SPARK-10737] [SQL] When using UnsafeRows, SortMergeJoin may return wrong results

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2204cdb28 -> 5aea987c9 [SPARK-10737] [SQL] When using UnsafeRows, SortMergeJoin may return wrong results https://issues.apache.org/jira/browse/SPARK-10737 Author: Yin Huai Closes #8854 from yhuai/SMJBug.

spark git commit: [SPARK-10714] [SPARK-8632] [SPARK-10685] [SQL] Refactor Python UDF handling

2015-09-22 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 5aea987c9 -> a96ba40f7 [SPARK-10714] [SPARK-8632] [SPARK-10685] [SQL] Refactor Python UDF handling This patch refactors Python UDF handling: 1. Extract the per-partition Python UDF calling logic from PythonRDD into a PythonRunner.

spark git commit: [SPARK-10714] [SPARK-8632] [SPARK-10685] [SQL] Refactor Python UDF handling

2015-09-22 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.5 6b1e5c2db -> 3339916ef [SPARK-10714] [SPARK-8632] [SPARK-10685] [SQL] Refactor Python UDF handling This patch refactors Python UDF handling: 1. Extract the per-partition Python UDF calling logic from PythonRDD into a PythonRunner.

spark git commit: [SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method.

2015-09-22 Thread meng
Repository: spark Updated Branches: refs/heads/master 84f81e035 -> 558e9c7e6 [SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method. The Scala example under the "Example: Pipeline" heading in this document initializes the "test" variable to a DataFrame. Because test is already

spark git commit: [SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method.

2015-09-22 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.5 73d062184 -> 7f07cc6d0 [SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method. The Scala example under the "Example: Pipeline" heading in this document initializes the "test" variable to a DataFrame. Because test is

spark git commit: [SPARK-10652] [SPARK-10742] [STREAMING] Set meaningful job descriptions for all streaming jobs

2015-09-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 558e9c7e6 -> 5548a2547 [SPARK-10652] [SPARK-10742] [STREAMING] Set meaningful job descriptions for all streaming jobs Here is the screenshot after adding the job descriptions to threads that run receivers and the scheduler thread running

spark git commit: [SPARK-10652] [SPARK-10742] [STREAMING] Set meaningful job descriptions for all streaming jobs

2015-09-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 7f07cc6d0 -> 8a23ef59b [SPARK-10652] [SPARK-10742] [STREAMING] Set meaningful job descriptions for all streaming jobs Here is the screenshot after adding the job descriptions to threads that run receivers and the scheduler thread

spark git commit: [SPARK-10310] [SQL] Fixes script transformation field/line delimiters

2015-09-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 26187ab74 -> 73d062184 [SPARK-10310] [SQL] Fixes script transformation field/line delimiters **Please attribute this PR to `Zhichao Li`.** This PR is based on PR #8476 authored by zhichao-li. It fixes SPARK-10310 by adding field

spark git commit: [SPARK-10640] History server fails to parse TaskCommitDenied

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master a96ba40f7 -> 61d4c07f4 [SPARK-10640] History server fails to parse TaskCommitDenied ... simply because the code is missing! Author: Andrew Or Closes #8828 from andrewor14/task-end-reason-json. Project:

spark git commit: [SPARK-10640] History server fails to parse TaskCommitDenied

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 3339916ef -> 5ffd0841e [SPARK-10640] History server fails to parse TaskCommitDenied ... simply because the code is missing! Author: Andrew Or Closes #8828 from andrewor14/task-end-reason-json. Conflicts:

spark git commit: Revert "[SPARK-10640] History server fails to parse TaskCommitDenied"

2015-09-22 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.5 5ffd0841e -> 118ebd405 Revert "[SPARK-10640] History server fails to parse TaskCommitDenied" This reverts commit 5ffd0841e016301807b0a008af7c3346e9f59e7a. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file

2015-09-22 Thread rxin
Repository: spark Updated Branches: refs/heads/master 1cd674157 -> bf20d6c9f [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file Remove ._SUCCESS.crc hidden file that may cause problems in distribution tar archive, and is not used Author:

spark git commit: [SPARK-10706] [MLLIB] Add java wrapper for random vector rdd

2015-09-22 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7278f792a -> 870b8a2ed [SPARK-10706] [MLLIB] Add java wrapper for random vector rdd Add java wrapper for random vector rdd holdenk srowen Author: Meihua Wu Closes #8841 from rotationsymmetry/SPARK-10706. Project:

spark git commit: [SPARK-10718] [BUILD] Update License on conf files and corresponding excludes file update

2015-09-22 Thread srowen
Repository: spark Updated Branches: refs/heads/master 0bd0e5bed -> 7278f792a [SPARK-10718] [BUILD] Update License on conf files and corresponding excludes file update Update License on conf files and corresponding excludes file update Author: Rekha Joshi Author: Joshi