spark git commit: ALS implicit: added missing parameter alpha in doc string

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/master c6e0c2ab1 -> cedc3b5aa ALS implicit: added missing parameter alpha in doc string Author: Felix Maximilian Möller felixmaximilian.moel...@immobilienscout24.de Closes #3343 from felixmaximilian/fix-documentation and squashes the following
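
For context, a minimal PySpark sketch of where the alpha parameter surfaces in the implicit-feedback ALS API; the data and values below are hypothetical:

```
# Hypothetical sketch: implicit-feedback ALS with the alpha parameter the
# doc string was missing. alpha scales the confidence attached to
# observed interactions.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext(appName="als-implicit-example")
# (user, product, strength) triples standing in for implicit feedback
ratings = sc.parallelize([(1, 1, 4.0), (1, 2, 1.0), (2, 1, 2.0)])
model = ALS.trainImplicit(ratings, rank=10, iterations=5, alpha=0.01)
print(model.predict(2, 2))
```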

spark git commit: ALS implicit: added missing parameter alpha in doc string

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 2d3a5a504 -> 4f0477d6f ALS implicit: added missing parameter alpha in doc string Author: Felix Maximilian Möller felixmaximilian.moel...@immobilienscout24.de Closes #3343 from felixmaximilian/fix-documentation and squashes the

spark git commit: [SPARK-4435] [MLlib] [PySpark] improve classification

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/master cedc3b5aa -> 8fbf72b79 [SPARK-4435] [MLlib] [PySpark] improve classification This PR adds setThreshold() and clearThreshold() for LogisticRegressionModel and SVMModel, and also supports an RDD of vectors in LogisticRegressionModel.predict(),
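
A hedged usage sketch of the API this PR describes, with hypothetical training data:

```
# Hypothetical sketch of the threshold API: setThreshold()/clearThreshold()
# on LogisticRegressionModel, plus predict() over an RDD of vectors.
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="threshold-example")
data = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                       LabeledPoint(1.0, [1.0, 0.0])])
model = LogisticRegressionWithSGD.train(data, iterations=10)
model.setThreshold(0.3)   # scores >= 0.3 now map to label 1
model.clearThreshold()    # predict() returns raw scores instead of labels
preds = model.predict(data.map(lambda p: p.features))  # RDD of vectors
print(preds.collect())
```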

spark git commit: [SPARK-4435] [MLlib] [PySpark] improve classification

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 4f0477d6f -> a28902f25 [SPARK-4435] [MLlib] [PySpark] improve classification This PR adds setThreshold() and clearThreshold() for LogisticRegressionModel and SVMModel, and also supports an RDD of vectors in LogisticRegressionModel.predict(),

spark git commit: [SPARK-4396] allow lookup by index in Python's Rating

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 a28902f25 -> 48d601f0b [SPARK-4396] allow lookup by index in Python's Rating In PySpark, ALS can take an RDD of (user, product, rating) tuples as input. However, model.predict outputs an RDD of Rating. So on the input side, users can

spark git commit: [SPARK-4396] allow lookup by index in Python's Rating

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/master 8fbf72b79 -> b54c6ab3c [SPARK-4396] allow lookup by index in Python's Rating In PySpark, ALS can take an RDD of (user, product, rating) tuples as input. However, model.predict outputs an RDD of Rating. So on the input side, users can use
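
A short sketch of what lookup by index means here; the field values are hypothetical, and the same indexing applies to the Rating objects that model.predict returns:

```
# Hypothetical sketch: Rating now behaves like a (user, product, rating)
# tuple, so positional indexing and field access are interchangeable.
from pyspark.mllib.recommendation import Rating

r = Rating(1, 2, 5.0)
assert r[0] == r.user == 1
assert r[1] == r.product == 2
assert r[2] == r.rating == 5.0
```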

spark git commit: [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)

2014-11-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.1 aa9ebdaa2 -> 91b5fa824 [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1) This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks:

spark git commit: [SQL] Support partitioned parquet tables that have the key in both the directory and the file

2014-11-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master b54c6ab3c -> 90d72ec85 [SQL] Support partitioned parquet tables that have the key in both the directory and the file Author: Michael Armbrust mich...@databricks.com Closes #3272 from marmbrus/keyInPartitionedTable and squashes the

spark git commit: [SQL] Support partitioned parquet tables that have the key in both the directory and the file

2014-11-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 48d601f0b -> 047b45800 [SQL] Support partitioned parquet tables that have the key in both the directory and the file Author: Michael Armbrust mich...@databricks.com Closes #3272 from marmbrus/keyInPartitionedTable and squashes the

spark git commit: [SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name.

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 90d72ec85 -> bfebfd8b2 [SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name. This PR adds a regression test for SPARK-4434. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #3326 from

spark git commit: [SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name.

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 047b45800 -> 9e9111845 [SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name. This PR adds a regression test for SPARK-4434. Author: Kousuke Saruta saru...@oss.nttdata.co.jp Closes #3326 from

spark git commit: [SPARK-4404] remove sys.exit() in shutdown hook

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master bfebfd8b2 -> 80f317788 [SPARK-4404] remove sys.exit() in shutdown hook If SparkSubmit dies first, the bootstrapper will be blocked by the shutdown hook; sys.exit() in a shutdown hook can cause a deadlock. cc andrewor14 Author:

spark git commit: [SPARK-4017] show progress bar in console

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 2d26c6248 -> 04b1bdbae [SPARK-4017] show progress bar in console The progress bar will look like this:

spark git commit: [SPARK-4017] show progress bar in console

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 80f317788 -> e34f38ff1 [SPARK-4017] show progress bar in console The progress bar will look like this:

spark git commit: [SPARK-4463] Add (de)select all button for add'l metrics.

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master e34f38ff1 -> 010bc86e4 [SPARK-4463] Add (de)select all button for add'l metrics. This commit removes the behavior where, when a user clicks "Show additional metrics" on the stage page, all of the additional metrics are automatically selected;

spark git commit: [SPARK-4463] Add (de)select all button for add'l metrics.

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.2 04b1bdbae -> a93d64c8c [SPARK-4463] Add (de)select all button for add'l metrics. This commit removes the behavior where, when a user clicks "Show additional metrics" on the stage page, all of the additional metrics are automatically

spark git commit: [SPARK-4306] [MLlib] Python API for LogisticRegressionWithLBFGS

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 a93d64c8c -> 4ae78abe6 [SPARK-4306] [MLlib] Python API for LogisticRegressionWithLBFGS ``` class LogisticRegressionWithLBFGS | train(cls, data, iterations=100, initialWeights=None, corrections=10, tolerance=0.0001, regParam=0.01,
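
A hedged usage sketch of the signature excerpted above, with hypothetical data:

```
# Hypothetical sketch of the new Python API for L-BFGS-based logistic
# regression; parameters mirror the signature quoted in the commit message.
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="lbfgs-example")
data = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                       LabeledPoint(1.0, [1.0, 0.0])])
model = LogisticRegressionWithLBFGS.train(data, iterations=100,
                                          corrections=10, tolerance=0.0001)
print(model.predict([1.0, 0.0]))
```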

spark git commit: [SPARK-3721] [PySpark] broadcast objects larger than 2G

2014-11-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 4ae78abe6 -> bb7a173d9 [SPARK-3721] [PySpark] broadcast objects larger than 2G This patch will bring support for broadcasting objects larger than 2G. None of pickle, zlib, FrameSerializer, or Array[Byte] can support objects larger

spark git commit: [SPARK-3721] [PySpark] broadcast objects larger than 2G

2014-11-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master d2e29516f -> 4a377aff2 [SPARK-3721] [PySpark] broadcast objects larger than 2G This patch will bring support for broadcasting objects larger than 2G. None of pickle, zlib, FrameSerializer, or Array[Byte] can support objects larger than
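
A scaled-down sketch of the pattern this patch enables; a real use case would broadcast an object whose serialized size exceeds 2G, which this example shrinks so it actually runs:

```
# Hypothetical sketch: broadcast a large object and read it on executors.
# The 64 MB buffer stands in for a multi-gigabyte object; the mechanics
# are the same once >2G payloads are supported.
from pyspark import SparkContext

sc = SparkContext(appName="large-broadcast-example")
big = bytes(bytearray(64 * 1024 * 1024))
b = sc.broadcast(big)
print(sc.parallelize(range(4)).map(lambda _: len(b.value)).collect())
```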

spark git commit: [SPARK-4433] fix a race condition in zipWithIndex

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 bb7a173d9 -> bf76164f1 [SPARK-4433] fix a race condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in

spark git commit: [SPARK-4433] fix a race condition in zipWithIndex

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/master 4a377aff2 -> bb4604615 [SPARK-4433] fix a race condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in

spark git commit: [SPARK-4433] fix a race condition in zipWithIndex

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 91b5fa824 -> ae9b1f690 [SPARK-4433] fix a race condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in

spark git commit: [SPARK-4433] fix a race condition in zipWithIndex

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.0 66d9070bd -> f3c8ad044 [SPARK-4433] fix a race condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in
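
For readers working in Python, the equivalent chain looks like this; the race itself lives in the Scala-side ZippedWithIndexRDD, so this is a reference sketch rather than a guaranteed reproduction:

```
# Reference sketch of the reported pattern, translated to PySpark.
from pyspark import SparkContext

sc = SparkContext(appName="zipwithindex-example")
print(sc.parallelize(range(1, 11)).zipWithIndex().repartition(10).count())
```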

spark git commit: [SPARK-4327] [PySpark] Python API for RDD.randomSplit()

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/master bb4604615 -> 7f22fa81e [SPARK-4327] [PySpark] Python API for RDD.randomSplit() ``` pyspark.RDD.randomSplit(self, weights, seed=None) Randomly splits this RDD with the provided weights. :param weights: weights for splits, will be
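
A brief usage sketch of the quoted API; the weights and seed are arbitrary:

```
# Hypothetical sketch: split an RDD 60/40 with a fixed seed.
from pyspark import SparkContext

sc = SparkContext(appName="randomsplit-example")
train, test = sc.parallelize(range(500)).randomSplit([0.6, 0.4], seed=17)
print(train.count(), test.count())  # roughly a 60/40 split
```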

spark git commit: [SPARK-4327] [PySpark] Python API for RDD.randomSplit()

2014-11-18 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 bf76164f1 -> 70d9e3871 [SPARK-4327] [PySpark] Python API for RDD.randomSplit() ``` pyspark.RDD.randomSplit(self, weights, seed=None) Randomly splits this RDD with the provided weights. :param weights: weights for splits, will

spark git commit: [SPARK-4468][SQL] Backports #3334 to branch-1.1

2014-11-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 ae9b1f690 -> f9739b9c8 [SPARK-4468][SQL] Backports #3334 to branch-1.1

spark git commit: [SPARK-4468][SQL] Fixes Parquet filter creation for inequality predicates with literals on the left hand side

2014-11-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7f22fa81e -> 423baea95 [SPARK-4468][SQL] Fixes Parquet filter creation for inequality predicates with literals on the left hand side For expressions like `10 < someVar`, we should create an `Operators.Gt` filter, but right now an
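
A sketch of the predicate shape in question; the table and column names below are hypothetical:

```
# Hypothetical sketch: with the literal on the left-hand side,
# `10 < someVar` must compile to a greater-than (Operators.Gt) Parquet
# filter on someVar, not a less-than one.
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="parquet-filter-example")
sqlContext = SQLContext(sc)
rows = sc.parallelize([Row(someVar=i) for i in range(20)])
sqlContext.inferSchema(rows).saveAsParquetFile("t.parquet")
sqlContext.parquetFile("t.parquet").registerTempTable("t")
print(sqlContext.sql("SELECT * FROM t WHERE 10 < someVar").count())
```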

spark git commit: [SPARK-4468][SQL] Fixes Parquet filter creation for inequality predicates with literals on the left hand side

2014-11-18 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 70d9e3871 -> 790c8741e [SPARK-4468][SQL] Fixes Parquet filter creation for inequality predicates with literals on the left hand side For expressions like `10 < someVar`, we should create an `Operators.Gt` filter, but right now an

spark git commit: [SPARK-4380] Log more precise number of bytes spilled (1.1)

2014-11-18 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.1 f9739b9c8 -> e22a75923 [SPARK-4380] Log more precise number of bytes spilled (1.1) This is the branch-1.1 version of #3243. Author: Andrew Or and...@databricks.com Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the

spark git commit: Bumping version to 1.3.0-SNAPSHOT.

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 423baea95 -> 397d3aae5 Bumping version to 1.3.0-SNAPSHOT. Author: Marcelo Vanzin van...@cloudera.com Closes #3277 from vanzin/version-1.3 and squashes the following commits: 7c3c396 [Marcelo Vanzin] Added temp repo to sbt build. 5f404ff

spark git commit: [SPARK-4441] Close Tachyon client when TachyonBlockManager is shut down

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 397d3aae5 -> 67e9876b3 [SPARK-4441] Close Tachyon client when TachyonBlockManager is shut down Currently the Tachyon client is not closed when TachyonBlockManager is shut down, which causes some resources in Tachyon to go unreclaimed. Author:

spark git commit: [SPARK-4441] Close Tachyon client when TachyonBlockManager is shut down

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 790c8741e -> d1d6de630 [SPARK-4441] Close Tachyon client when TachyonBlockManager is shut down Currently the Tachyon client is not closed when TachyonBlockManager is shut down, which causes some resources in Tachyon to go unreclaimed. Author:

spark git commit: [Spark-4432] close InStream after the block is accessed

2014-11-18 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 d1d6de630 -> e0a20994f [Spark-4432] close InStream after the block is accessed The InStream is not closed after data is read from Tachyon, which leaves the blocks in Tachyon locked after being accessed. Author: Mingfei mingfei@intel.com