spark git commit: [SPARK-10851] [SPARKR] Exception not failing R applications (in yarn cluster mode)

2015-09-30 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 16fd2a2f4 -> c7b29ae64 [SPARK-10851] [SPARKR] Exception not failing R applications (in yarn cluster mode) The YARN backend doesn't like when user code calls System.exit, since it cannot know the exit status and thus cannot set an

spark git commit: [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row.

2015-09-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master c7b29ae64 -> 03cca5dce [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row. Author: Reynold Xin Closes #8900 from rxin/SPARK-10770-1. Project:

spark git commit: [SPARK-10736] [ML] Use 1 for all ratings if $(ratingCol) = ""

2015-09-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 4d5a005b0 -> 2931e89f0 [SPARK-10736] [ML] Use 1 for all ratings if $(ratingCol) = "" For some implicit datasets, ratings may not exist in the training data. In this case, we can assume all observed pairs to be positive and treat their

spark git commit: [SPARK-10811] [SQL] Eliminates unnecessary byte array copying

2015-09-30 Thread lian
Repository: spark Updated Branches: refs/heads/master c1ad373f2 -> 4d5a005b0 [SPARK-10811] [SQL] Eliminates unnecessary byte array copying When reading Parquet string and binary-backed decimal values, Parquet `Binary.getBytes` always returns a copied byte array, which is unnecessary. Since

spark git commit: [SPARK-9617] [SQL] Implement json_tuple

2015-09-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 03cca5dce -> 89ea0041a [SPARK-9617] [SQL] Implement json_tuple This is an implementation of Hive's `json_tuple` function using Jackson Streaming. Author: Nathan Howell Closes #7946 from NathanHowell/SPARK-9617.
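For readers unfamiliar with `json_tuple`, the following is a minimal Python sketch of the semantics the function is generally understood to have in Hive: given a JSON string and a list of top-level field names, it returns one string per field. This is not the committed Spark code and does not use Jackson Streaming; it parses the whole document with Python's `json` module. The null handling (nulls for invalid input or missing fields) and the re-serialization of non-string values are assumptions made for illustration.

```python
import json
from typing import Optional, Tuple


def json_tuple(json_str: Optional[str], *fields: str) -> Tuple[Optional[str], ...]:
    """Illustrative stand-in for json_tuple: one string (or None) per requested field."""
    nulls = (None,) * len(fields)
    if json_str is None:
        return nulls
    try:
        obj = json.loads(json_str)
    except ValueError:
        # Assumption: unparseable input yields a null for every field.
        return nulls
    if not isinstance(obj, dict):
        # Only top-level JSON objects have named fields to extract.
        return nulls

    out = []
    for name in fields:
        value = obj.get(name)
        if value is None:
            # Missing field or explicit JSON null.
            out.append(None)
        elif isinstance(value, str):
            out.append(value)
        else:
            # Assumption: numbers, booleans, arrays and objects come back as JSON text.
            out.append(json.dumps(value, separators=(",", ":")))
    return tuple(out)
```

Calling `json_tuple('{"a":"x","b":1}', "a", "b", "c")` in this sketch returns a 3-tuple in which the last element is `None`, because `c` is not present in the input.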

[1/2] spark git commit: [SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface.

2015-09-30 Thread davies
Repository: spark Updated Branches: refs/heads/master 2931e89f0 -> 16fd2a2f4 http://git-wip-us.apache.org/repos/asf/spark/blob/16fd2a2f/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala

[2/2] spark git commit: [SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface.

2015-09-30 Thread davies
[SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface. This PR implements a HyperLogLog based Approximate Count Distinct function using the new UDAF interface. The implementation is inspired by the ClearSpring HyperLogLog implementation and should produce the same results.
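The idea behind a HyperLogLog-based approximate count distinct can be sketched in a few lines of Python. This is a hedged illustration of the basic HyperLogLog estimator with a linear-counting fallback for small cardinalities; it is not the code from this commit, does not use Spark's UDAF interface, and omits the bias-correction tables of HyperLogLog++. The class name, the precision parameter `p`, and the use of SHA-1 as the hash function are all assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Basic HyperLogLog sketch (illustrative; not HyperLogLog++)."""

    def __init__(self, p: int = 12):
        if not 7 <= p <= 16:
            raise ValueError("p must be in [7, 16] for this sketch")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Standard alpha constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: the first 8 bytes of SHA-1 stand in for a 64-bit hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                     # top p bits select a register
        rest = h & ((1 << width) - 1)        # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

With `p = 12` the sketch keeps 4096 registers, and the expected relative error is roughly `1.04 / sqrt(4096)`, about 1.6%. Adding the same value more than once never changes a register, which is what makes the structure suitable for counting distinct values.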