spark git commit: [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference

2016-06-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 890baaca5 -> 6ecedf39b [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference ## What changes were proposed in this pull request? When fitting ```LinearRegressionModel```(by "l-bfgs" solver) and ``

spark git commit: [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference

2016-06-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 9e16f23e7 -> e21a9ddef [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference ## What changes were proposed in this pull request? When fitting ```LinearRegressionModel```(by "l-bfgs" solver) and

spark git commit: [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __str__ method similar to Scala API

2016-06-10 Thread yliang
Repository: spark Updated Branches: refs/heads/master 254bc8c34 -> 7d7a0a5e0 [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __str__ method similar to Scala API ## What changes were proposed in this pull request? Adding __str__ to RFormula and model that will show the set formula param

spark git commit: [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __str__ method similar to Scala API

2016-06-10 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 8b6742a37 -> 80b8711b3 [SPARK-15738][PYSPARK][ML] Adding Pyspark ml RFormula __str__ method similar to Scala API ## What changes were proposed in this pull request? Adding __str__ to RFormula and model that will show the set formula pa

spark git commit: [SPARK-15945][MLLIB] Conversion between old/new vector columns in a DataFrame (Scala/Java)

2016-06-14 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 b75542603 -> f277cdf78 [SPARK-15945][MLLIB] Conversion between old/new vector columns in a DataFrame (Scala/Java) ## What changes were proposed in this pull request? This PR provides conversion utils between old/new vector columns in

spark git commit: [SPARK-15945][MLLIB] Conversion between old/new vector columns in a DataFrame (Scala/Java)

2016-06-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 42a28caf1 -> 63e0aebe2 [SPARK-15945][MLLIB] Conversion between old/new vector columns in a DataFrame (Scala/Java) ## What changes were proposed in this pull request? This PR provides conversion utils between old/new vector columns in a D

spark git commit: [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression

2016-06-16 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 b3678eb7e -> 68e7a25cc [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic r

spark git commit: [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression

2016-06-16 Thread yliang
Repository: spark Updated Branches: refs/heads/master d9c6628c4 -> 9040d83bc [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic regre

spark git commit: [SPARK-15946][MLLIB] Conversion between old/new vector columns in a DataFrame (Python)

2016-06-17 Thread yliang
Repository: spark Updated Branches: refs/heads/master af2a4b082 -> edb23f9e4 [SPARK-15946][MLLIB] Conversion between old/new vector columns in a DataFrame (Python) ## What changes were proposed in this pull request? This PR implements python wrappers for #13662 to convert old/new vector colu

spark git commit: [SPARK-15946][MLLIB] Conversion between old/new vector columns in a DataFrame (Python)

2016-06-17 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 f0de45cb1 -> 0a8fd2eb8 [SPARK-15946][MLLIB] Conversion between old/new vector columns in a DataFrame (Python) ## What changes were proposed in this pull request? This PR implements python wrappers for #13662 to convert old/new vector

spark git commit: [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python)

2016-06-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master f6b497fcd -> e158478a9 [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python) ## What changes were proposed in this pull request? This PR implements python wrappers for #13888 to convert old/new mat

spark git commit: [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python)

2016-06-28 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 af70ad028 -> b349237e4 [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python) ## What changes were proposed in this pull request? This PR implements python wrappers for #13888 to convert old/new

spark git commit: [SPARK-16241][ML] model loading backward compatibility for ml NaiveBayes

2016-06-30 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 c8a7c2305 -> 1d274455c [SPARK-16241][ML] model loading backward compatibility for ml NaiveBayes ## What changes were proposed in this pull request? model loading backward compatibility for ml NaiveBayes ## How was this patch tested?

spark git commit: [SPARK-16241][ML] model loading backward compatibility for ml NaiveBayes

2016-06-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2c3d96134 -> b30a2dc7c [SPARK-16241][ML] model loading backward compatibility for ml NaiveBayes ## What changes were proposed in this pull request? model loading backward compatibility for ml NaiveBayes ## How was this patch tested? exis

spark git commit: [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanup

2016-07-03 Thread yliang
Repository: spark Updated Branches: refs/heads/master 262833397 -> a539b724c [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanup ## What changes were proposed in this pull request? 1). Remove unused import in Scala example; 2). Move spark session import outside example off;

spark git commit: [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanup

2016-07-03 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 0c6fd03fa -> 3ecee573c [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanup ## What changes were proposed in this pull request? 1). Remove unused import in Scala example; 2). Move spark session import outside example

spark git commit: [SPARK-16249][ML] Change visibility of Object ml.clustering.LDA to public for loading

2016-07-06 Thread yliang
Repository: spark Updated Branches: refs/heads/master 5f342049c -> 5497242c7 [SPARK-16249][ML] Change visibility of Object ml.clustering.LDA to public for loading ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-16249 Change visibility of O

spark git commit: [SPARK-16249][ML] Change visibility of Object ml.clustering.LDA to public for loading

2016-07-06 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 521fc7186 -> 25006c8bc [SPARK-16249][ML] Change visibility of Object ml.clustering.LDA to public for loading ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-16249 Change visibility

spark git commit: [SPARK-16307][ML] Add test to verify the predicted variances of a DT on toy data

2016-07-06 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7e28fabdf -> 909c6d812 [SPARK-16307][ML] Add test to verify the predicted variances of a DT on toy data ## What changes were proposed in this pull request? The current tests assume that `impurity.calculate()` returns the variance correct

spark git commit: [SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports adding files recursively

2016-09-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master 61876a427 -> d3b886976 [SPARK-17585][PYSPARK][CORE] PySpark SparkContext.addFile supports adding files recursively ## What changes were proposed in this pull request? Users would like to add a directory as dependency in some cases, they ca

spark git commit: [SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and get by executors

2016-09-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7cbe21644 -> c133907c5 [SPARK-17577][SPARKR][CORE] SparkR support add files to Spark job and get by executors ## What changes were proposed in this pull request? Scala/Python users can add files to Spark job by submit options ```--files```

spark git commit: [SPARK-17315][FOLLOW-UP][SPARKR][ML] Fix print of Kolmogorov-Smirnov test summary

2016-09-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master c133907c5 -> 6902edab7 [SPARK-17315][FOLLOW-UP][SPARKR][ML] Fix print of Kolmogorov-Smirnov test summary ## What changes were proposed in this pull request? #14881 added Kolmogorov-Smirnov Test wrapper to SparkR. I found that ```print.sum

spark git commit: [SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for AFTSurvivalRegression

2016-09-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master 646f38346 -> 72d9fba26 [SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for AFTSurvivalRegression ## What changes were proposed in this pull request? Add treeAggregateDepth parameter for AFTSurvivalRegression to keep consistent

spark git commit: [MINOR][SPARKR] Add sparkr-vignettes.html to gitignore.

2016-09-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 248916f55 -> 7945daed1 [MINOR][SPARKR] Add sparkr-vignettes.html to gitignore. ## What changes were proposed in this pull request? Add ```sparkr-vignettes.html``` to ```.gitignore```. ## How was this patch tested? No test needed. Author: Ya

[2/2] spark git commit: [SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF()

2016-09-26 Thread yliang
[SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF() ## What changes were proposed in this pull request? This was suggested in https://github.com/apache/spark/commit/101663f1ae222a919fc40510aa4f2bad22d1be6f#commitcomment-17114968. This PR adds `testImplicits` to `MLlibTestSp

[1/2] spark git commit: [SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF()

2016-09-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master 50b89d05b -> f234b7cd7 http://git-wip-us.apache.org/repos/asf/spark/blob/f234b7cd/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala -- diff --git a/

spark git commit: [SPARK-17577][FOLLOW-UP][SPARKR] SparkR spark.addFile supports adding directory recursively

2016-09-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master 00be16df6 -> 93c743f1a [SPARK-17577][FOLLOW-UP][SPARKR] SparkR spark.addFile supports adding directory recursively ## What changes were proposed in this pull request? #15140 exposed ```JavaSparkContext.addFile(path: String, recursive: Bool

spark git commit: [SPARK-17138][ML][MLIB] Add Python API for multinomial logistic regression

2016-09-27 Thread yliang
Repository: spark Updated Branches: refs/heads/master 85b0a1575 -> 7f16affa2 [SPARK-17138][ML][MLIB] Add Python API for multinomial logistic regression ## What changes were proposed in this pull request? Add Python API for multinomial logistic regression. - add `family` param in python api.

spark git commit: [SPARK-16356][FOLLOW-UP][ML] Enforce ML test of exception for local/distributed Dataset.

2016-09-29 Thread yliang
Repository: spark Updated Branches: refs/heads/master 37eb9184f -> a19a1bb59 [SPARK-16356][FOLLOW-UP][ML] Enforce ML test of exception for local/distributed Dataset. ## What changes were proposed in this pull request? #14035 added ```testImplicits``` to ML unit tests and promoted ```toDF()```

spark git commit: [SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement.

2016-09-29 Thread yliang
Repository: spark Updated Branches: refs/heads/master a19a1bb59 -> f7082ac12 [SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement. ## What changes were proposed in this pull request? Several performance improvements for ```ChiSqSelector```: 1, Keep ```selectedFeatures``` ordered ascen

spark git commit: [SPARK-14077][ML] Refactor NaiveBayes to support weighted instances

2016-09-29 Thread yliang
Repository: spark Updated Branches: refs/heads/master 74ac1c438 -> 1fad55968 [SPARK-14077][ML] Refactor NaiveBayes to support weighted instances ## What changes were proposed in this pull request? 1,support weighted data 2,use dataset/dataframe instead of rdd 3,make mllib as a wrapper to call

spark git commit: [SPARK-14077][ML][FOLLOW-UP] Revert change for NB Model's Load to maintain compatibility with the model stored before 2.0

2016-09-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1fad55968 -> 8e491af52 [SPARK-14077][ML][FOLLOW-UP] Revert change for NB Model's Load to maintain compatibility with the model stored before 2.0 ## What changes were proposed in this pull request? Revert change for NB Model's Load to maint

spark git commit: [SPARK-17744][ML] Parity check between the ml and mllib test suites for NB

2016-10-04 Thread yliang
Repository: spark Updated Branches: refs/heads/master 7d5160883 -> c17f97183 [SPARK-17744][ML] Parity check between the ml and mllib test suites for NB ## What changes were proposed in this pull request? 1,parity check and add missing test suites for ml's NB 2,remove some unused imports ## Ho

spark git commit: [MINOR][ML] Avoid 2D array flatten in NB training.

2016-10-05 Thread yliang
Repository: spark Updated Branches: refs/heads/master b678e465a -> 7aeb20be7 [MINOR][ML] Avoid 2D array flatten in NB training. ## What changes were proposed in this pull request? Avoid 2D array flatten in ```NaiveBayes``` training, since flatten method might be expensive (It will create anot

spark git commit: [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types

2016-10-06 Thread yliang
Repository: spark Updated Branches: refs/heads/master 49d11d499 -> 3713bb199 [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types ## What changes were proposed in this pull request? Before, we computed `instances` in LinearRegression in two

spark git commit: [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types

2016-10-06 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 b1a9c41e8 -> 594a2cf6f [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types ## What changes were proposed in this pull request? Before, we computed `instances` in LinearRegression in

spark git commit: [SPARK-15957][ML] RFormula supports forcing to index label

2016-10-10 Thread yliang
Repository: spark Updated Branches: refs/heads/master b515768f2 -> 19401a203 [SPARK-15957][ML] RFormula supports forcing to index label ## What changes were proposed in this pull request? ```RFormula``` will index label only when it is string type currently. If the label is numeric type and w

spark git commit: [SPARK-17745][ML][PYSPARK] update NB python api - add weight col parameter

2016-10-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master 6f20a92ca -> 0d4a69527 [SPARK-17745][ML][PYSPARK] update NB python api - add weight col parameter ## What changes were proposed in this pull request? update python api for NaiveBayes: add weight col parameter. ## How was this patch tested

spark git commit: [SPARK-17835][ML][MLLIB] Optimize NaiveBayes mllib wrapper to eliminate extra pass on data

2016-10-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0d4a69527 -> 21cb59f1c [SPARK-17835][ML][MLLIB] Optimize NaiveBayes mllib wrapper to eliminate extra pass on data ## What changes were proposed in this pull request? [SPARK-14077](https://issues.apache.org/jira/browse/SPARK-14077) copied t

spark git commit: [SPARK-15957][FOLLOW-UP][ML][PYSPARK] Add Python API for RFormula forceIndexLabel.

2016-10-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master 9dc0ca060 -> 44cbb61b3 [SPARK-15957][FOLLOW-UP][ML][PYSPARK] Add Python API for RFormula forceIndexLabel. ## What changes were proposed in this pull request? Follow-up work of #13675, add Python API for ```RFormula forceIndexLabel```. ##

spark git commit: [SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/load

2016-10-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2fb12b0a3 -> 1db8feab8 [SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/load ## What changes were proposed in this pull request? Since ```ml.evaluation``` has supported save/load at Scala side, supporting it at Python s

spark git commit: [SPARK-14634][ML] Add BisectingKMeansSummary

2016-10-14 Thread yliang
Repository: spark Updated Branches: refs/heads/master 1db8feab8 -> a1b136d05 [SPARK-14634][ML] Add BisectingKMeansSummary ## What changes were proposed in this pull request? Add BisectingKMeansSummary ## How was this patch tested? unit test Author: Zheng RuiFeng Closes #12394 from zhengrui

spark git commit: [SPARK-17986][ML] SQLTransformer should remove temporary tables

2016-10-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master 01b26a064 -> ab3363e9f [SPARK-17986][ML] SQLTransformer should remove temporary tables ## What changes were proposed in this pull request? A call to the method `SQLTransformer.transform` previously would create a temporary table and never

spark git commit: [SPARK-17986][ML] SQLTransformer should remove temporary tables

2016-10-22 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 a0c03c925 -> b959dab32 [SPARK-17986][ML] SQLTransformer should remove temporary tables ## What changes were proposed in this pull request? A call to the method `SQLTransformer.transform` previously would create a temporary table and n

spark git commit: [SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet

2016-10-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 483c37c58 -> 78d740a08 [SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet ## What changes were proposed in this pull request? 1. Make a pluggable solver interface for `WeightedLeastSquares` 2. Add a `QuasiNewton`

spark git commit: [SPARK-14634][ML][FOLLOWUP] Delete superfluous line in BisectingKMeans

2016-10-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 6f31833db -> 38cdd6ccd [SPARK-14634][ML][FOLLOWUP] Delete superfluous line in BisectingKMeans ## What changes were proposed in this pull request? As commented by jkbradley in https://github.com/apache/spark/pull/12394, `model.setSummary(su

spark git commit: [SPARK-17748][FOLLOW-UP][ML] Fix build error for Scala 2.10.

2016-10-25 Thread yliang
Repository: spark Updated Branches: refs/heads/master 38cdd6ccd -> ac8ff920f [SPARK-17748][FOLLOW-UP][ML] Fix build error for Scala 2.10. ## What changes were proposed in this pull request? #15394 introduced a build error for Scala 2.10; this PR fixes it. ## How was this patch tested? Existing te

spark git commit: [SPARK-17748][FOLLOW-UP][ML] Reorg variables of WeightedLeastSquares.

2016-10-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master 4bee95407 -> 312ea3f7f [SPARK-17748][FOLLOW-UP][ML] Reorg variables of WeightedLeastSquares. ## What changes were proposed in this pull request? This is follow-up work of #15394. Reorg some variables of ```WeightedLeastSquares``` and fix on

spark git commit: [SPARK-18109][ML] Add instrumentation to GMM

2016-10-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master ab5f938bc -> 569788a55 [SPARK-18109][ML] Add instrumentation to GMM ## What changes were proposed in this pull request? Add instrumentation to GMM ## How was this patch tested? Test in spark-shell Author: Zheng RuiFeng Closes #15636 f

spark git commit: [SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax e…

2016-10-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master 569788a55 -> e9746f87d [SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax e… ## What changes were proposed in this pull request? In Python 3, there is only one integer type (i.e., int), which mostly behaves like the long

spark git commit: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate' of pyspark GBTClassifier

2016-11-03 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0ea5d5b24 -> 9dc9f9a5d [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate' of pyspark GBTClassifier ## What changes were proposed in this pull request? Add missing 'subsamplingRate' of pyspark GBTClassifier ## How was this patch test

spark git commit: [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate' of pyspark GBTClassifier

2016-11-03 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 71104c9c9 -> 99891e56e [SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate' of pyspark GBTClassifier ## What changes were proposed in this pull request? Add missing 'subsamplingRate' of pyspark GBTClassifier ## How was this patch

spark git commit: [SPARK-18276][ML] ML models should copy the training summary and set parent

2016-11-05 Thread yliang
Repository: spark Updated Branches: refs/heads/master 15d392688 -> 23ce0d1e9 [SPARK-18276][ML] ML models should copy the training summary and set parent ## What changes were proposed in this pull request? Only some of the models which contain a training summary currently set the summaries in

spark git commit: [SPARK-18276][ML] ML models should copy the training summary and set parent

2016-11-05 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 e9f1d4aaa -> c42301f1e [SPARK-18276][ML] ML models should copy the training summary and set parent ## What changes were proposed in this pull request? Only some of the models which contain a training summary currently set the summarie

spark git commit: [SPARK-18210][ML] Pipeline.copy does not create an instance with the same UID

2016-11-06 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 dcbf3fd4b -> d2f2cf68a [SPARK-18210][ML] Pipeline.copy does not create an instance with the same UID ## What changes were proposed in this pull request? Motivation: `org.apache.spark.ml.Pipeline.copy(extra: ParamMap)` does not create a

spark git commit: [SPARK-18210][ML] Pipeline.copy does not create an instance with the same UID

2016-11-06 Thread yliang
Repository: spark Updated Branches: refs/heads/master 340f09d10 -> b89d0556d [SPARK-18210][ML] Pipeline.copy does not create an instance with the same UID ## What changes were proposed in this pull request? Motivation: `org.apache.spark.ml.Pipeline.copy(extra: ParamMap)` does not create an i

spark git commit: [SPARK-18291][SPARKR][ML] SparkR glm predict should output original label when family = binomial.

2016-11-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master a814eeac6 -> daa975f4b [SPARK-18291][SPARKR][ML] SparkR glm predict should output original label when family = binomial. ## What changes were proposed in this pull request? SparkR ```spark.glm``` predict should output original label when f

spark git commit: [SPARK-18291][SPARKR][ML] SparkR glm predict should output original label when family = binomial.

2016-11-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 df40ee2b4 -> 6b332909f [SPARK-18291][SPARKR][ML] SparkR glm predict should output original label when family = binomial. ## What changes were proposed in this pull request? SparkR ```spark.glm``` predict should output original label wh

spark git commit: [SPARK-18401][SPARKR][ML] SparkR random forest should support output original label.

2016-11-10 Thread yliang
Repository: spark Updated Branches: refs/heads/master a3356343c -> 5ddf69470 [SPARK-18401][SPARKR][ML] SparkR random forest should support output original label. ## What changes were proposed in this pull request? SparkR ```spark.randomForest``` classification prediction should output origin

spark git commit: [SPARK-18401][SPARKR][ML] SparkR random forest should support output original label.

2016-11-10 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 064d4315f -> 51dca6143 [SPARK-18401][SPARKR][ML] SparkR random forest should support output original label. ## What changes were proposed in this pull request? SparkR ```spark.randomForest``` classification prediction should output or

spark git commit: [SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes

2016-11-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master bc41d997e -> 22cb3a060 [SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes ## What changes were proposed in this pull request? * Refactor out ```trainWithLabelCheck``` and make ```mllib.NaiveBayes``` call into it. * Avoi

spark git commit: [SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes

2016-11-12 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 893355143 -> b2ba83d10 [SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes ## What changes were proposed in this pull request? * Refactor out ```trainWithLabelCheck``` and make ```mllib.NaiveBayes``` call into it. *

spark git commit: [SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data

2016-11-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master b91a51bb2 -> 07be232ea [SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data ## What changes were proposed in this pull request? * Fix the following exceptions which are thrown when ```spark.randomFores

spark git commit: [SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data

2016-11-13 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 0c69224ed -> 8fc6455c0 [SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data ## What changes were proposed in this pull request? * Fix the following exceptions which are thrown when ```spark.randomF

spark git commit: [SPARK-18438][SPARKR][ML] spark.mlp should support RFormula.

2016-11-16 Thread yliang
Repository: spark Updated Branches: refs/heads/master 4ac9759f8 -> 95eb06bd7 [SPARK-18438][SPARKR][ML] spark.mlp should support RFormula. ## What changes were proposed in this pull request? ```spark.mlp``` should support ```RFormula``` like other ML algorithm wrappers. BTW, I did some cleanup

spark git commit: [SPARK-18438][SPARKR][ML] spark.mlp should support RFormula.

2016-11-16 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 436ae201f -> 7b57e480d [SPARK-18438][SPARKR][ML] spark.mlp should support RFormula. ## What changes were proposed in this pull request? ```spark.mlp``` should support ```RFormula``` like other ML algorithm wrappers. BTW, I did some clea

spark git commit: [SPARK-18434][ML] Add missing ParamValidations for ML algos

2016-11-16 Thread yliang
Repository: spark Updated Branches: refs/heads/master 241e04bc0 -> c68f1a38a [SPARK-18434][ML] Add missing ParamValidations for ML algos ## What changes were proposed in this pull request? Add missing ParamValidations for ML algos ## How was this patch tested? existing tests Author: Zheng Rui

spark git commit: [SPARK-18434][ML] Add missing ParamValidations for ML algos

2016-11-16 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 820847008 -> 6b6eb4e52 [SPARK-18434][ML] Add missing ParamValidations for ML algos ## What changes were proposed in this pull request? Add missing ParamValidations for ML algos ## How was this patch tested? existing tests Author: Zheng

spark git commit: [SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM and BKM

2016-11-21 Thread yliang
Repository: spark Updated Branches: refs/heads/master 658547974 -> e811fbf9e [SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM and BKM ## What changes were proposed in this pull request? Add model summary APIs for `GaussianMixtureModel` and `BisectingKMeansModel` in pyspark.

spark git commit: [SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM and BKM

2016-11-21 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 fb4e6359d -> 31002e4a7 [SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM and BKM ## What changes were proposed in this pull request? Add model summary APIs for `GaussianMixtureModel` and `BisectingKMeansModel` in pysp

spark git commit: [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.

2016-11-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master ebeb0830a -> acb971577 [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package. ## What changes were proposed in this pull request? When running SparkR job in yarn-cluster mode, it will download Spark pa

spark git commit: [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.

2016-11-22 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 aaa2a173a -> c70214075 [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package. ## What changes were proposed in this pull request? When running SparkR job in yarn-cluster mode, it will download Spar

spark git commit: [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.

2016-11-22 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.0 9dad3a7b0 -> a37238b06 [SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package. ## What changes were proposed in this pull request? When running SparkR job in yarn-cluster mode, it will download Spar

spark git commit: [SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collinear data

2016-11-22 Thread yliang
Repository: spark Updated Branches: refs/heads/master d0212eb0f -> 982b82e32 [SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collinear data ## What changes were proposed in this pull request? * Fix SparkR ```spark.glm``` errors when fitting on collinear data, since ```standard

spark git commit: [SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collinear data

2016-11-22 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 3be2d1e0b -> fc5fee83e [SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collinear data ## What changes were proposed in this pull request? * Fix SparkR ```spark.glm``` errors when fitting on collinear data, since ```stand

spark git commit: [SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansModel and GaussianMixtureModel

2016-11-24 Thread yliang
Repository: spark Updated Branches: refs/heads/master 223fa218e -> 2dfabec38 [SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansModel and GaussianMixtureModel ## What changes were proposed in this pull request? add `setFeaturesCol` and `setPredictionCol` for BiKModel and GMMod

spark git commit: [SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansModel and GaussianMixtureModel

2016-11-24 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 27d81d000 -> 04ec74f12 [SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansModel and GaussianMixtureModel ## What changes were proposed in this pull request? add `setFeaturesCol` and `setPredictionCol` for BiKModel and G

spark git commit: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML

2016-11-26 Thread yliang
Repository: spark Updated Branches: refs/heads/master a88329d45 -> c4a7eef0c [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML ## What changes were proposed in this pull request? Remove deprecated methods for ML. ## How was this patch tested? Existing tests. Author: Yanbo Liang

spark git commit: [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML

2016-11-26 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 da66b9742 -> 830ee1345 [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML ## What changes were proposed in this pull request? Remove deprecated methods for ML. ## How was this patch tested? Existing tests. Author: Yanbo Lia

spark git commit: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark

2016-11-29 Thread yliang
Repository: spark Updated Branches: refs/heads/master 489845f3a -> 4c82ca86d [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark ## What changes were proposed in this pull request? Add python api for KMeansSummary ## How was this patch tested? unit test added Author: Jeff Zhang

spark git commit: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark

2016-11-29 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 55b1142bd -> b95aad7ca [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of PySpark ## What changes were proposed in this pull request? Add python api for KMeansSummary ## How was this patch tested? unit test added Author: Jeff Z

spark git commit: [SPARK-18476][SPARKR][ML] SparkR Logistic Regression should support output original label.

2016-11-30 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0a811210f -> 2eb6764fb [SPARK-18476][SPARKR][ML] SparkR Logistic Regression should support output original label. ## What changes were proposed in this pull request? Similar to SPARK-18401, as a classification algorithm, logistic r

spark git commit: [SPARK-18476][SPARKR][ML] SparkR Logistic Regression should support output original label.

2016-11-30 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 7d4596734 -> e8d8e3509 [SPARK-18476][SPARKR][ML] SparkR Logistic Regression should support output original label. ## What changes were proposed in this pull request? Similar to SPARK-18401, as a classification algorithm, logist

spark git commit: [SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and setPredictionCol

2016-12-05 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 c13c2939f -> 88e07efe8 [SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and setPredictionCol ## What changes were proposed in this pull request? add `setFeaturesCol` and `setPredictionCol` for `OneVsRestModel` ## How was

spark git commit: [SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and setPredictionCol

2016-12-05 Thread yliang
Repository: spark Updated Branches: refs/heads/master e9730b707 -> bdfe7f674 [SPARK-18625][ML] OneVsRestModel should support setFeaturesCol and setPredictionCol ## What changes were proposed in this pull request? add `setFeaturesCol` and `setPredictionCol` for `OneVsRestModel` ## How was thi

spark git commit: [SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide.

2016-12-05 Thread yliang
Repository: spark Updated Branches: refs/heads/master bdfe7f674 -> eb8dd6813 [SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide. ## What changes were proposed in this pull request? Add R examples to ML programming guide for the following algorithms as POC: * spark.glm * spar

spark git commit: [SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide.

2016-12-05 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 88e07efe8 -> 1821cbead [SPARK-18279][DOC][ML][SPARKR] Add R examples to ML programming guide. ## What changes were proposed in this pull request? Add R examples to ML programming guide for the following algorithms as POC: * spark.glm *

spark git commit: [SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.logit.

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 5c6bcdbda -> 90b59d1bf [SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.logit. ## What changes were proposed in this pull request? Several cleanup and improvements for ```spark.logit```: * ```summary``` should return coe

spark git commit: [SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.logit.

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 3750c6e9b -> 340e9aea4 [SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.logit. ## What changes were proposed in this pull request? Several cleanup and improvements for ```spark.logit```: * ```summary``` should return

spark git commit: [SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 9ab725eab -> 82253617f [SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net ## What changes were proposed in this pull request? WeightedLeastSquares now supports L1 and elastic net penalties and has a

spark git commit: [SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 617ce3ba7 -> ab865cfd9 [SPARK-18705][ML][DOC] Update user guide to reflect one pass solver for L1 and elastic-net ## What changes were proposed in this pull request? WeightedLeastSquares now supports L1 and elastic net penalties and h

spark git commit: [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 ab865cfd9 -> 1c3f1da82 [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1 ## What changes were proposed in this pull request? Reviewing SparkR ML wrappers API for 2.1 release, mainly two issues: * Remove ```probabilityCol``

spark git commit: [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1

2016-12-07 Thread yliang
Repository: spark Updated Branches: refs/heads/master 82253617f -> 97255497d [SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1 ## What changes were proposed in this pull request? Reviewing SparkR ML wrappers API for 2.1 release, mainly two issues: * Remove ```probabilityCol``` fr

spark git commit: [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide

2016-12-08 Thread yliang
Repository: spark Updated Branches: refs/heads/master b47b892e4 -> 9bf8f3cd4 [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide ## What changes were proposed in this pull request? * Add all R examples for ML wrappers which were added during 2.1 release cycle. * Split the

spark git commit: [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide

2016-12-08 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 48aa6775d -> 9095c152e [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide ## What changes were proposed in this pull request? * Add all R examples for ML wrappers which were added during 2.1 release cycle. * Split

spark git commit: [MINOR][SPARKR] fix kstest example error and add unit test

2016-12-13 Thread yliang
Repository: spark Updated Branches: refs/heads/master e104e55c1 -> f2ddabfa0 [MINOR][SPARKR] fix kstest example error and add unit test ## What changes were proposed in this pull request? While adding vignettes for kstest, I found some errors in the example: 1. There is a typo of kstest; 2. p

spark git commit: [MINOR][SPARKR] fix kstest example error and add unit test

2016-12-13 Thread yliang
Repository: spark Updated Branches: refs/heads/branch-2.1 019d1fa3d -> 8ef005931 [MINOR][SPARKR] fix kstest example error and add unit test ## What changes were proposed in this pull request? While adding vignettes for kstest, I found some errors in the example: 1. There is a typo of kstest;

spark git commit: [SPARK-17645][MLLIB][ML] add feature selector method based on: False Discovery Rate (FDR) and Family wise error rate (FWE)

2016-12-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master 2af8b5cff -> 79ff85363 [SPARK-17645][MLLIB][ML] add feature selector method based on: False Discovery Rate (FDR) and Family wise error rate (FWE) ## What changes were proposed in this pull request? Univariate feature selection works by se

spark git commit: [MINOR][ML] Correct test cases of LoR raw2prediction & probability2prediction.

2016-12-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master 79ff85363 -> 9cff67f34 [MINOR][ML] Correct test cases of LoR raw2prediction & probability2prediction. ## What changes were proposed in this pull request? Correct test cases of ```LogisticRegression``` raw2prediction & probability2predictio

spark git commit: [SPARK-17772][ML][TEST] Add test functions for ML sample weights

2016-12-28 Thread yliang
Repository: spark Updated Branches: refs/heads/master d7bce3bd3 -> 6a475ae46 [SPARK-17772][ML][TEST] Add test functions for ML sample weights ## What changes were proposed in this pull request? More and more ML algos are accepting sample weights, and they have been tested rather heterogeneou
