spark git commit: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square feature selection

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 6f341310b - c081b21b1 [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square feature selection The following is implemented: 1) generic traits for feature selection and filtering 2) trait for feature selection of LabeledPoint with discrete data 3)

spark git commit: [SPARK-5530] Add executor container to executorIdToContainer

2015-02-02 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 3f941b68a - 62a93a169 [SPARK-5530] Add executor container to executorIdToContainer When the killExecutor method is called, it only takes the else branch, because no value is ever put into the variable executorIdToContainer. Author: Xutingjun

[2/2] spark git commit: Make sure only owner can read / write to directories created for the job.

2015-02-02 Thread joshrosen
Make sure only owner can read / write to directories created for the job. Whenever a directory is created by the utility method, immediately restrict its permissions so that only the owner has access to its contents. Signed-off-by: Josh Rosen joshro...@databricks.com Project:
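
The approach described in this commit message (create the directory, then immediately restrict it to the owner) can be illustrated roughly as follows. This is a minimal Python sketch, not the code from the commit: the function name `create_owner_only_dir` and the directory name `job-dir` are hypothetical, and it assumes a POSIX file system where mode `0700` means owner-only access.

```python
import os
import stat
import tempfile


def create_owner_only_dir(parent: str, name: str) -> str:
    """Create parent/name and restrict it so only the owner can access it.

    Hypothetical helper for illustration; not Spark's utility method.
    """
    path = os.path.join(parent, name)
    # Request owner-only permissions at creation time. The requested mode is
    # masked by the process umask, which can only remove bits, never add them.
    os.makedirs(path, mode=0o700, exist_ok=True)
    # Set the mode explicitly as well, so the result does not depend on the
    # umask or on the directory having existed before with a looser mode.
    os.chmod(path, stat.S_IRWXU)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        job_dir = create_owner_only_dir(parent, "job-dir")
        mode = stat.S_IMODE(os.stat(job_dir).st_mode)
        print(f"{job_dir} has mode {oct(mode)}")
```

On Windows, `os.chmod` only toggles the read-only flag, so a sketch like this does not give owner-only semantics there; platform-specific handling would be needed.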

spark git commit: SPARK-5425: Use synchronised methods in system properties to create SparkConf

2015-02-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master bff65b5cc - 5a5526164 SPARK-5425: Use synchronised methods in system properties to create SparkConf SPARK-5425: Fixed usages of system properties This patch fixes a few problems caused by the fact that the Scala wrapper over system

spark git commit: SPARK-4585. Spark dynamic executor allocation should use minExecutors as...

2015-02-02 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master c081b21b1 - b2047b55c SPARK-4585. Spark dynamic executor allocation should use minExecutors as... ... initial number Author: Sandy Ryza sa...@cloudera.com Closes #4051 from sryza/sandy-spark-4585 and squashes the following commits:

[1/2] spark git commit: [SPARK-5212][SQL] Add support of schema-less, custom field delimiter and SerDe for HiveQL transform

2015-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 62a93a169 - 683e93824 http://git-wip-us.apache.org/repos/asf/spark/blob/683e9382/sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala -- diff --git

spark git commit: [HOTFIX] Add jetty references to build for YARN module.

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/master e908322cd - 2321dd1ef [HOTFIX] Add jetty references to build for YARN module. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2321dd1e Tree:

spark git commit: [SPARK-5514] DataFrame.collect should call executeCollect

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master dca6faa29 - 8aa3cfff6 [SPARK-5514] DataFrame.collect should call executeCollect Author: Reynold Xin r...@databricks.com Closes #4313 from rxin/SPARK-5514 and squashes the following commits: e34e91b [Reynold Xin] [SPARK-5514]

spark git commit: [SPARK-5534] [graphx] Graph getStorageLevel fix

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 8aa3cfff6 - f133dece5 [SPARK-5534] [graphx] Graph getStorageLevel fix This fixes getStorageLevel for EdgeRDDImpl and VertexRDDImpl (and therefore for Graph). See code example on JIRA which failed before but works with this patch:

[1/2] spark git commit: Spark 3883: SSL support for HttpServer and Akka

2015-02-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master ef65cf09b - cfea30037 http://git-wip-us.apache.org/repos/asf/spark/blob/cfea3003/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index

spark git commit: [SPARK-5461] [graphx] Add isCheckpointed, getCheckpointedFiles methods to Graph

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 5a5526164 - 842d00032 [SPARK-5461] [graphx] Add isCheckpointed, getCheckpointedFiles methods to Graph Added the 2 methods to Graph and GraphImpl. Both make calls to the underlying vertex and edge RDDs. This is needed for another PR (for

spark git commit: [SPARK-5513][MLLIB] Add nonnegative option to ml's ALS

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 1646f89d9 - 46d50f151 [SPARK-5513][MLLIB] Add nonnegative option to ml's ALS This PR ports the NNLS solver to the new ALS implementation. CC: coderxiang Author: Xiangrui Meng m...@databricks.com Closes #4302 from mengxr/SPARK-5513 and

spark git commit: [SPARK-5540] hide ALS.solveLeastSquares

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master f133dece5 - ef65cf09b [SPARK-5540] hide ALS.solveLeastSquares This method survived the code review and it has been there since v1.1.0. It exposes jblas types. Let's remove it from the public API. I think no one calls it directly.

spark git commit: Revert [SPARK-4508] [SQL] build native date type to conform behavior to Hive

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/master cfea30037 - eccb9fbb2 Revert [SPARK-4508] [SQL] build native date type to conform behavior to Hive This reverts commit 1646f89d967913ee1f231d9606f8502d13c25804. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: SPARK-5500. Document that feeding hadoopFile into a shuffle operation wi...

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 842d00032 - 830934976 SPARK-5500. Document that feeding hadoopFile into a shuffle operation wi... ...ll cause problems Author: Sandy Ryza sa...@cloudera.com Closes #4293 from sryza/sandy-spark-5500 and squashes the following commits:

spark git commit: [SPARK-2309][MLlib] Multinomial Logistic Regression

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 46d50f151 - b1aa8fe98 [SPARK-2309][MLlib] Multinomial Logistic Regression #1379 was automatically closed by asfgit, and GitHub cannot reopen it once it's closed, so this will be the new PR. Binary Logistic Regression can be extended to

spark git commit: [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name)

2015-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 b978c9fee - 54864403c [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name) override the MetastoreRelation's sameresult method only compare

spark git commit: Revert [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name)

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 54864403c - 88e0f2d5c Revert [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name) This reverts commit 54864403c4f132d9c1380c015122a849dd44dff8.

[1/2] spark git commit: Preparing Spark release v1.2.1-rc3

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 a64c7a87c - 591cd8393 Preparing Spark release v1.2.1-rc3 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6eaf77d Tree:

[2/2] spark git commit: Revert Preparing Spark release v1.2.1-rc2

2015-02-02 Thread pwendell
Revert Preparing Spark release v1.2.1-rc2 This reverts commit b77f87673d1f9f03d4c83cf583158227c551359b. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a64c7a87 Tree:

[1/2] spark git commit: Revert Preparing development version 1.2.2-SNAPSHOT

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 88e0f2d5c - a64c7a87c Revert Preparing development version 1.2.2-SNAPSHOT This reverts commit 0a16abadc59082b7d3a24d7f3625236658632813. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

Git Push Summary

2015-02-02 Thread pwendell
Repository: spark Updated Tags: refs/tags/v1.2.1-rc3 [created] b6eaf77d4

spark git commit: [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name)

2015-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master b1aa8fe98 - dca6faa29 [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the MetastoreRelation's sameresult method only compare databasename and table name) override the MetastoreRelation's sameresult method only compare

[2/2] spark git commit: [SQL] Improve DataFrame API error reporting

2015-02-02 Thread rxin
[SQL] Improve DataFrame API error reporting 1. Throw UnsupportedOperationException if a Column is not computable. 2. Perform eager analysis on DataFrame so we can catch errors when they happen (not when an action is run). Author: Reynold Xin r...@databricks.com Author: Davies Liu

[1/2] spark git commit: [SQL] Improve DataFrame API error reporting

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master eccb9fbb2 - 554403fd9 http://git-wip-us.apache.org/repos/asf/spark/blob/554403fd/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala -- diff --git

spark git commit: SPARK-3996: Add jetty servlet and continuations.

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 0ef38f5fa - 7930d2bef SPARK-3996: Add jetty servlet and continuations. These are needed transitively from the other Jetty libraries we include. It was not picked up by unit tests because we disable the UI. Author: Patrick Wendell

spark git commit: [Doc] Minor: Fixes several formatting issues

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7930d2bef - 60f67e7a1 [Doc] Minor: Fixes several formatting issues Fixes several minor formatting issues in the [Continuous Compilation] [1] section. [1]: http://spark.apache.org/docs/latest/building-spark.html#continuous-compilation

spark git commit: [SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs

2015-02-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master eb0da6c4b - c31c36c4a [SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs This was https://github.com/apache/spark/pull/2676 https://issues.apache.org/jira/browse/SPARK-3778 This affects if someone is trying

spark git commit: [SPARK-5219][Core] Add locks to avoid scheduling race conditions

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 60f67e7a1 - c306555f4 [SPARK-5219][Core] Add locks to avoid scheduling race conditions Author: zsxwing zsxw...@gmail.com Closes #4019 from zsxwing/SPARK-5219 and squashes the following commits: 36a8b4e [zsxwing] Add locks to avoid race

spark git commit: [SPARK-4979][MLLIB] Streaming logistic regression

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master c306555f4 - eb0da6c4b [SPARK-4979][MLLIB] Streaming logistic regression This adds support for streaming logistic regression with stochastic gradient descent, in the same manner as the existing implementation of streaming linear

spark git commit: [SPARK-5012][MLLib][PySpark]Python API for Gaussian Mixture Model

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master c31c36c4a - 50a1a874e [SPARK-5012][MLLib][PySpark]Python API for Gaussian Mixture Model Python API for the Gaussian Mixture Model clustering algorithm in MLLib. Author: FlytxtRnD meethu.mat...@flytxt.com Closes #4059 from

spark git commit: [SPARK-5414] Add SparkFirehoseListener class for consuming all SparkListener events

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 13531dd97 - b8ebebeaa [SPARK-5414] Add SparkFirehoseListener class for consuming all SparkListener events There isn't a good way to write a SparkListener that receives all SparkListener events and which will be future-compatible (e.g. it

spark git commit: [SPARK-5536] replace old ALS implementation by the new one

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master b8ebebeaa - 0cc7b88c9 [SPARK-5536] replace old ALS implementation by the new one The only issue is that `analyzeBlock` is removed, which was marked as a developer API. I didn't change other tests in the ALSSuite under `spark.mllib` to

spark git commit: [SPARK-1405] [mllib] Latent Dirichlet Allocation (LDA) using EM

2015-02-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 0cc7b88c9 - 980764f3c [SPARK-1405] [mllib] Latent Dirichlet Allocation (LDA) using EM **This PR introduces an API + simple implementation for Latent Dirichlet Allocation (LDA).** The [design doc for this

spark git commit: [SPARK-5501][SPARK-5420][SQL] Write support for the data source API

2015-02-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 50a1a874e - 13531dd97 [SPARK-5501][SPARK-5420][SQL] Write support for the data source API This PR aims to support `INSERT INTO/OVERWRITE TABLE tableName` and `CREATE TABLE tableName AS SELECT` for the data source API (partitioned tables

spark git commit: [Docs] Fix Building Spark link text

2015-02-02 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master f5e63751f - 3f941b68a [Docs] Fix Building Spark link text Author: Nicholas Chammas nicholas.cham...@gmail.com Closes #4312 from nchammas/patch-2 and squashes the following commits: 9d943aa [Nicholas Chammas] [Docs] Fix Building Spark

spark git commit: Disabling Utils.chmod700 for Windows

2015-02-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 00746a5c9 - b978c9fee Disabling Utils.chmod700 for Windows This patch makes Spark 1.2.1rc2 work again on Windows. Without it you get the following log output on creating a Spark context: INFO org.apache.spark.SparkEnv:59 - Registering

spark git commit: [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python

2015-02-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master 554403fd9 - 0561c4544 [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python This PR brings the Python API for Spark Streaming Kafka data source. ``` class KafkaUtils(__builtin__.object) | Static methods defined

spark git commit: [SPARK-5472][SQL] A JDBC data source for Spark SQL.

2015-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1bcd46574 - 8f471a66d [SPARK-5472][SQL] A JDBC data source for Spark SQL. This pull request contains a Spark SQL data source that can pull data from, and can put data into, a JDBC database. I have tested both read and write support with

spark git commit: SPARK-5492. Thread statistics can break with older Hadoop versions

2015-02-02 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 63dfe21dc - 6f341310b SPARK-5492. Thread statistics can break with older Hadoop versions Author: Sandy Ryza sa...@cloudera.com Closes #4305 from sryza/sandy-spark-5492 and squashes the following commits: b7d4497 [Sandy Ryza] SPARK-5492.