spark git commit: [SPARK-11313][SQL] implement cogroup on DataSets (support 2 datasets)

2015-10-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5f1cee6f1 -> 075ce4914 [SPARK-11313][SQL] implement cogroup on DataSets (support 2 datasets) A simpler version of https://github.com/apache/spark/pull/9279, only support 2 datasets. Author: Wenchen Fan Closes
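
For readers unfamiliar with the operation named in this commit, the following is a minimal sketch of what cogrouping two datasets generally means: rows from both sides are grouped by key, and a function receives the two groups for each key. It is plain Python, not the Dataset API added by the patch; the names `cogroup`, `key_left`, `key_right`, `func`, `orders`, and `refunds` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def cogroup(left, right, key_left, key_right, func):
    """Group two iterables by key and call func(key, left_rows, right_rows) per key.

    Illustrative only: a key that is present on one side still reaches func,
    with an empty list for the other side.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key_left(row)][0].append(row)
    for row in right:
        groups[key_right(row)][1].append(row)

    out = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        left_rows, right_rows = groups[key]
        out.extend(func(key, left_rows, right_rows))
    return out

# Hypothetical sample data: (key, amount) pairs.
orders = [("a", 1), ("b", 2), ("a", 3)]
refunds = [("a", 10), ("c", 30)]

result = cogroup(
    orders,
    refunds,
    key_left=lambda r: r[0],
    key_right=lambda r: r[0],
    func=lambda k, ls, rs: [(k, sum(v for _, v in ls), sum(v for _, v in rs))],
)
print(result)
```

Unlike a join, the function sees every row for a key from both inputs at once, which is why keys that appear in only one dataset still produce output here.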

spark git commit: [SPARK-11303][SQL] filter should not be pushed down into sample

2015-10-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.5 86ee81e5c -> 3bd596de4 [SPARK-11303][SQL] filter should not be pushed down into sample When sampling and then filtering a DataFrame, the SQL optimizer pushes the filter down into the sample and produces a wrong result. This is due to the
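
The excerpt is cut off before it states the cause, so the sketch below does not try to reproduce Spark's behavior. It only illustrates, under a stated assumption, one general reason why sampling and filtering need not commute: if a sampler's decision depends on a row's position in its input, filtering first changes which rows it keeps. The position-based sampler `sample_every_other` and the predicate `keep_at_least_3` are hypothetical stand-ins, not Spark code.

```python
def sample_every_other(rows):
    # Toy sampler (assumption): keeps the rows at even positions of its input.
    return [row for i, row in enumerate(rows) if i % 2 == 0]

def keep_at_least_3(rows):
    # Toy filter predicate: keep values >= 3.
    return [row for row in rows if row >= 3]

rows = list(range(10))

# What the user asked for: sample first, then filter the sampled rows.
sample_then_filter = keep_at_least_3(sample_every_other(rows))

# What a pushdown would compute: filter first, then sample the filtered rows.
filter_then_sample = sample_every_other(keep_at_least_3(rows))

print(sample_then_filter)
print(filter_then_sample)
```

In this toy setup the two orders select different rows, so rewriting one into the other would change the query's result.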

spark git commit: [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/master d9c603989 -> 826e1e304 [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related Python
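
As background on the term "root-sigma-inverse", the block below states a standard identity for a multivariate Gaussian with a positive-definite covariance matrix. It is not taken from the patch; the symbols ($\Sigma$, $U$, $D$, $\mu$, $k$) are assumptions introduced here because the excerpt does not define any.

```latex
% Assumption: \Sigma is symmetric positive definite with eigendecomposition
\Sigma = U D U^{\top}, \qquad A := D^{-1/2} U^{\top}
\quad\Longrightarrow\quad A^{\top} A = U D^{-1} U^{\top} = \Sigma^{-1}.

% The quadratic form in the density then reduces to a squared norm:
(x - \mu)^{\top} \Sigma^{-1} (x - \mu) = \lVert A (x - \mu) \rVert_2^{2},

% so the log-density in k dimensions is
\log p(x) = -\tfrac{1}{2} \left( k \log(2\pi) + \log\det\Sigma
            + \lVert A (x - \mu) \rVert_2^{2} \right).
```

Any error in the matrix $A$ therefore flows directly into the computed density, which is consistent with the commit title's "returns incorrect answer in some cases".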

spark git commit: [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.4 ad112c94d -> ef42ce613 [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related

spark git commit: [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.5 abb0ca7a9 -> 86ee81e5c [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related

spark git commit: [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.3 25203d9d0 -> 0ce148533 [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix related

spark git commit: [MINOR][ML] fix compile warns

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/master 826e1e304 -> 82c1c5772 [MINOR][ML] fix compile warns This fixes some compile time warnings. Author: Xiangrui Meng Closes #9319 from mengxr/mllib-compile-warn-20151027. Project:

spark git commit: [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance

2015-10-28 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 82c1c5772 -> 5f1cee6f1 [SPARK-11332] [ML] Refactored to use ml.feature.Instance instead of WeightedLeastSquare.Instance WeightedLeastSquares now uses the common Instance class in ml.feature instead of a private one. Author: Nakul Jindal

spark git commit: [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/master fba9e9545 -> f92b7b98e [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver [SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) has provided ```WeightedLeastSquares``` solver("normal") in

spark git commit: Typo in mllib-evaluation-metrics.md

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.5 3bd596de4 -> 9e3197aaa Typo in mllib-evaluation-metrics.md Recall by threshold snippet was using "precisionByThreshold" Author: Mageswaran.D Closes #9333 from
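
To show why calling the precision method in a snippet meant to report recall matters, here is a small plain-Python sketch that computes both quantities per threshold. It is not MLlib's implementation; the function `precision_recall_by_threshold` and the data in `scored` are hypothetical.

```python
def precision_recall_by_threshold(scored):
    """Return (threshold, precision, recall) tuples, highest threshold first.

    scored: list of (score, label) pairs, where label 1 is positive and 0 is
    negative. A row is predicted positive when its score >= threshold.
    """
    total_pos = sum(label for _, label in scored)
    out = []
    for t in sorted({score for score, _ in scored}, reverse=True):
        predicted = [label for score, label in scored if score >= t]
        true_pos = sum(predicted)
        precision = true_pos / len(predicted)
        recall = true_pos / total_pos if total_pos else 0.0
        out.append((t, precision, recall))
    return out

# Hypothetical scores and labels.
scored = [(0.9, 1), (0.8, 0), (0.6, 1), (0.3, 0)]
curve = precision_recall_by_threshold(scored)
for t, p, r in curve:
    print(f"threshold={t} precision={p:.3f} recall={r:.3f}")
```

On this data the two curves move in different directions as the threshold drops, so a recall example that actually reads precision values would report misleading numbers.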

spark git commit: Typo in mllib-evaluation-metrics.md

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/master 075ce4914 -> fd9e345ce Typo in mllib-evaluation-metrics.md Recall by threshold snippet was using "precisionByThreshold" Author: Mageswaran.D Closes #9333 from Mageswaran1989/Typo_in_mllib-evaluation-metrics.md.

spark git commit: [SPARK-11369][ML][R] SparkR glm should support setting standardize

2015-10-28 Thread meng
Repository: spark Updated Branches: refs/heads/master fd9e345ce -> fba9e9545 [SPARK-11369][ML][R] SparkR glm should support setting standardize SparkR glm currently supports: ```formula, family = c("gaussian", "binomial"), data, lambda = 0, alpha = 0``` We should also support setting

spark git commit: [SPARK-11377] [SQL] withNewChildren should not convert StructType to Seq

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f92b7b98e -> 032748bb9 [SPARK-11377] [SQL] withNewChildren should not convert StructType to Seq This is minor, but I ran into it while writing Datasets, and while it wasn't needed for the final solution, it was super confusing so we should

spark git commit: [SPARK-11322] [PYSPARK] Keep full stack trace in captured exception

2015-10-28 Thread davies
Repository: spark Updated Branches: refs/heads/master 0cb7662d8 -> 3dfa4ea52 [SPARK-11322] [PYSPARK] Keep full stack trace in captured exception JIRA: https://issues.apache.org/jira/browse/SPARK-11322 As reported by JoshRosen in

spark git commit: [SPARK-11376][SQL] Removes duplicated `mutableRow` field

2015-10-28 Thread lian
Repository: spark Updated Branches: refs/heads/master 20dfd4674 -> e5b89978e [SPARK-11376][SQL] Removes duplicated `mutableRow` field This PR fixes a mistake in the code generated by `GenerateColumnAccessor`. Interestingly, although the code is illegal in Java (the class has two fields with

spark git commit: [SPARK-11292] [SQL] Python API for text data source

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 032748bb9 -> 5aa052191 [SPARK-11292] [SQL] Python API for text data source Adds DataFrameReader.text and DataFrameWriter.text. Author: Reynold Xin Closes #9259 from rxin/SPARK-11292. Project:

spark git commit: [SPARK-11363] [SQL] LeftSemiJoin should be LeftSemi in SparkStrategies

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5aa052191 -> 20dfd4674 [SPARK-11363] [SQL] LeftSemiJoin should be LeftSemi in SparkStrategies JIRA: https://issues.apache.org/jira/browse/SPARK-11363 In SparkStrategies some places use LeftSemiJoin. It should be LeftSemi. cc