spark git commit: [MINOR][DOC] Minor typo fixes

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master cabd54d93 -> 4f83e442b [MINOR][DOC] Minor typo fixes ## What changes were proposed in this pull request? Minor typo fixes ## How was this patch tested? local build Author: Zheng RuiFeng Closes #12755 from

spark git commit: [SPARK-14829][MLLIB] Deprecate GLM APIs using SGD

2016-04-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 769a909d1 -> cabd54d93 [SPARK-14829][MLLIB] Deprecate GLM APIs using SGD ## What changes were proposed in this pull request? According to the [SPARK-14829](https://issues.apache.org/jira/browse/SPARK-14829), deprecate API of

spark git commit: [SPARK-7264][ML] Parallel lapply for sparkR

2016-04-28 Thread meng
Repository: spark Updated Branches: refs/heads/master 4607f6e7f -> 769a909d1 [SPARK-7264][ML] Parallel lapply for sparkR ## What changes were proposed in this pull request? This PR adds a new function in SparkR called `sparkLapply(list, function)`. This function implements a distributed
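
For readers unfamiliar with `lapply` semantics, the sketch below is a local, thread-based analog in Python: it applies a function to every element of a list and returns the results in input order. It is not the SparkR implementation (the preview above is truncated before it describes one), and `parallel_lapply` and `max_workers` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_lapply(items, func, max_workers=4):
    """Apply func to each item concurrently; results keep input order.

    Hypothetical helper: a local stand-in for a distributed lapply,
    not the API added by the commit.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(func, items))


squares = parallel_lapply([1, 2, 3], lambda x: x * x)
print(squares)  # [1, 4, 9]
```

A distributed version would ship `func` and slices of `items` to remote workers instead of local threads; the ordering contract is the part this sketch is meant to show.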

spark git commit: [SPARK-14991][SQL] Remove HiveNativeCommand

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6f9a18fe3 -> 4607f6e7f [SPARK-14991][SQL] Remove HiveNativeCommand ## What changes were proposed in this pull request? This patch removes HiveNativeCommand, so we can continue to remove the dependency on Hive. This pull request also

spark git commit: [HOTFIX][CORE] fix a concurrence issue in NewAccumulator

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9c7c42bc6 -> 6f9a18fe3 [HOTFIX][CORE] fix a concurrence issue in NewAccumulator ## What changes were proposed in this pull request? `AccumulatorContext` is not thread-safe, that's why all of its methods are synchronized. However, there
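
The preview above is cut off before it names the actual race, so the following is only a minimal sketch of the general pattern it mentions: a registry that is not thread-safe by itself and therefore guards every method with the same lock. It is written in Python rather than Scala, and `SynchronizedRegistry`, `register`, `get`, `size` and `register_many` are hypothetical names, not Spark's `AccumulatorContext` API.

```python
import threading


class SynchronizedRegistry:
    """Hypothetical registry in which every method holds one shared lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self._entries = {}

    def register(self, value):
        # Allocating the id and storing the entry happen under one lock,
        # so two threads can never receive the same id.
        with self._lock:
            entry_id = self._next_id
            self._next_id += 1
            self._entries[entry_id] = value
            return entry_id

    def get(self, entry_id):
        with self._lock:
            return self._entries.get(entry_id)

    def size(self):
        with self._lock:
            return len(self._entries)


def register_many(registry, count):
    """Register `count` values; intended to run in several threads at once."""
    for i in range(count):
        registry.register(i)
```

Locking each method protects individual calls only. A caller that combines two calls (for example, check `get` and then `register`) still needs its own coordination, which is the usual way such designs end up with concurrency bugs.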

spark git commit: Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local"

2016-04-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2398e3d69 -> 9c7c42bc6 Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local" This reverts commit dae538a4d7c36191c1feb02ba87ffc624ab960dc. Project:

spark git commit: [SPARK-14836][YARN] Zip all the jars before uploading to distributed cache

2016-04-28 Thread vanzin
Repository: spark Updated Branches: refs/heads/master 4f4721a21 -> 2398e3d69 [SPARK-14836][YARN] Zip all the jars before uploading to distributed cache ## What changes were proposed in this pull request? Currently if neither `spark.yarn.jars` nor `spark.yarn.archive` is set (by default),

spark git commit: [SPARK-14862][ML] Updated Classifiers to not require labelCol metadata

2016-04-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master dae538a4d -> 4f4721a21 [SPARK-14862][ML] Updated Classifiers to not require labelCol metadata ## What changes were proposed in this pull request? Updated Classifier, DecisionTreeClassifier, RandomForestClassifier, GBTClassifier to not

spark git commit: [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local

2016-04-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 78c8aaf84 -> dae538a4d [SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in

spark git commit: [SPARK-14555] Second cut of Python API for Structured Streaming

2016-04-28 Thread tdas
Repository: spark Updated Branches: refs/heads/master d584a2b8a -> 78c8aaf84 [SPARK-14555] Second cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This PR adds Python APIs for: - `ContinuousQueryManager` - `ContinuousQueryException` The

spark git commit: [SPARK-12810][PYSPARK] PySpark CrossValidatorModel should support avgMetrics

2016-04-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 0ee5419b6 -> d584a2b8a [SPARK-12810][PYSPARK] PySpark CrossValidatorModel should support avgMetrics ## What changes were proposed in this pull request? support avgMetrics in CrossValidatorModel with Python ## How was this patch tested?

spark git commit: [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema

2016-04-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d5ab42ceb -> 0ee5419b6 [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema ## What changes were proposed in this pull request? The FileCatalog object gets created even if the

spark git commit: [SPARK-14916][MLLIB] A more friendly tostring for FreqItemset in mllib.fpm

2016-04-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master 5ee72454d -> d5ab42ceb [SPARK-14916][MLLIB] A more friendly tostring for FreqItemset in mllib.fpm ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-14916 FreqItemset as the result of

spark git commit: [SPARK-14965][SQL] Indicate an exception is thrown for a missing struct field

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.6 f4af6a8b3 -> 8ac0ce6dd [SPARK-14965][SQL] Indicate an exception is thrown for a missing struct field ## What changes were proposed in this pull request? Fix to ScalaDoc for StructType. ## How was this patch tested? Built locally.

spark git commit: [SPARK-14965][SQL] Indicate an exception is thrown for a missing struct field

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 89addd40a -> 12c360c05 [SPARK-14965][SQL] Indicate an exception is thrown for a missing struct field ## What changes were proposed in this pull request? Fix to ScalaDoc for StructType. ## How was this patch tested? Built locally.

spark git commit: [SPARK-14852][ML] refactored GLM summary into training, non-training summaries

2016-04-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 12c360c05 -> 5ee72454d [SPARK-14852][ML] refactored GLM summary into training, non-training summaries ## What changes were proposed in this pull request? This splits GeneralizedLinearRegressionSummary into 2 summary types: *

spark git commit: [SPARK-14945][PYTHON] SparkSession Python API

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5743352a2 -> 89addd40a [SPARK-14945][PYTHON] SparkSession Python API ## What changes were proposed in this pull request? ``` Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ /

spark git commit: [SPARK-14935][CORE] DistributedSuite "local-cluster format" shouldn't actually launch clusters

2016-04-28 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master bed0b0020 -> 5743352a2 [SPARK-14935][CORE] DistributedSuite "local-cluster format" shouldn't actually launch clusters https://issues.apache.org/jira/browse/SPARK-14935 In DistributedSuite, the "local-cluster format" test actually

spark git commit: [SPARK-14882][DOCS] Clarify that Spark can be cross-built for other Scala versions

2016-04-28 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8b44bd52f -> bed0b0020 [SPARK-14882][DOCS] Clarify that Spark can be cross-built for other Scala versions ## What changes were proposed in this pull request? Add simple clarification that Spark can be cross-built for other Scala

spark git commit: [SPARK-6735][YARN] Add window based executor failure tracking mechanism for long running service

2016-04-28 Thread tgraves
Repository: spark Updated Branches: refs/heads/master 9e785079b -> 8b44bd52f [SPARK-6735][YARN] Add window based executor failure tracking mechanism for long running service This work is based on twinkle-sachdeva's proposal. In parallel to such mechanism for AM failures, here add similar
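
As a rough illustration of what "window based" failure tracking means, the sketch below counts only failures that fall inside a sliding time window, so old failures in a long-running service eventually stop counting against the limit. It is a Python sketch under assumptions, not the YARN code from the commit: `WindowedFailureTracker`, `max_failures` and `window_seconds` are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class WindowedFailureTracker:
    """Hypothetical tracker: counts failures within the last window_seconds."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._timestamps = deque()  # failure times, oldest first

    def record_failure(self, now):
        self._timestamps.append(now)

    def recent_failures(self, now):
        # Discard failures that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

    def should_give_up(self, now):
        # Give up only when failures inside the window exceed the limit.
        return self.recent_failures(now) > self.max_failures


tracker = WindowedFailureTracker(max_failures=2, window_seconds=60)
for ts in (0, 10, 20):
    tracker.record_failure(ts)
print(tracker.should_give_up(now=20))   # True: 3 failures within 60 s
print(tracker.should_give_up(now=100))  # False: all failures have expired
```

Without a window, the three failures above would count forever; with one, a service that has run cleanly for longer than the window starts again from zero.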

spark git commit: [SPARK-12235][SPARKR] Enhance mutate() to support replace existing columns.

2016-04-28 Thread shivaram
Repository: spark Updated Branches: refs/heads/master 23256be0d -> 9e785079b [SPARK-12235][SPARKR] Enhance mutate() to support replace existing columns. Make the behavior of mutate more consistent with that in dplyr, besides support for replacing existing columns. 1. Throw error message when

spark git commit: [SPARK-14576][WEB UI] Spark console should display Web UI url

2016-04-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7c6937a88 -> 23256be0d [SPARK-14576][WEB UI] Spark console should display Web UI url ## What changes were proposed in this pull request? This is a proposal to print the Spark Driver UI link when spark-shell is launched. ## How was this

spark git commit: [SPARK-14487][SQL] User Defined Type registration without SQLUserDefinedType annotation

2016-04-28 Thread meng
Repository: spark Updated Branches: refs/heads/master bf5496dbd -> 7c6937a88 [SPARK-14487][SQL] User Defined Type registration without SQLUserDefinedType annotation ## What changes were proposed in this pull request? Currently we use `SQLUserDefinedType` annotation to register UDTs for user

[3/3] spark git commit: [SPARK-14654][CORE] New accumulator API

2016-04-28 Thread rxin
[SPARK-14654][CORE] New accumulator API ## What changes were proposed in this pull request? This PR introduces a new accumulator API which is much simpler than before: 1. the type hierarchy is simplified, now we only have an `Accumulator` class 2. Combine `initialValue` and `zeroValue`

[2/3] spark git commit: [SPARK-14654][CORE] New accumulator API

2016-04-28 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/bf5496db/core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala