[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67989793 --- Diff: R/pkg/R/mllib.R --- @@ -471,24 +469,13 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionMo

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67989738 --- Diff: R/pkg/R/mllib.R --- @@ -124,24 +137,20 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67989718 --- Diff: R/pkg/R/mllib.R --- @@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67989707 --- Diff: R/pkg/R/mllib.R --- @@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"

[GitHub] spark pull request #13828: [MINOR] [MLLIB] DefaultParamsReadable/Writable sh...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13828 [MINOR] [MLLIB] DefaultParamsReadable/Writable should be DeveloperApi ## What changes were proposed in this pull request? `DefaultParamsReadable/Writable` are not user-facing. Only

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67969489 --- Diff: R/pkg/R/mllib.R --- @@ -66,8 +67,9 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' \url{ht

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67969321 --- Diff: R/pkg/R/mllib.R --- @@ -99,10 +114,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"

[GitHub] spark issue #13821: [SPARK-16118] [MLLIB] add getDropLast to OneHotEncoder

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13821 Merged into master and branch-2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67961184 --- Diff: R/pkg/R/mllib.R --- @@ -76,7 +78,21 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' df <- createDataFr

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67961059 --- Diff: R/pkg/R/mllib.R --- @@ -112,36 +125,23 @@ setMethod("spark.glm", signature(data = "SparkDataFrame"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67960796 --- Diff: R/pkg/R/mllib.R --- @@ -112,36 +125,23 @@ setMethod("spark.glm", signature(data = "SparkDataFrame"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67960833 --- Diff: R/pkg/R/mllib.R --- @@ -112,36 +125,23 @@ setMethod("spark.glm", signature(data = "SparkDataFrame"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67960790 --- Diff: R/pkg/R/mllib.R --- @@ -112,36 +125,23 @@ setMethod("spark.glm", signature(data = "SparkDataFrame"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67960697 --- Diff: R/pkg/R/mllib.R --- @@ -99,10 +115,7 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"

[GitHub] spark pull request #13820: [SPARK-16107] [R] group glm methods in documentat...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13820#discussion_r67960537 --- Diff: R/pkg/R/mllib.R --- @@ -53,9 +53,10 @@ setClass("AFTSurvivalRegressionModel", representation(jobj = "jobj")) #' @note KMe

[GitHub] spark issue #13820: [SPARK-16107] [R] group glm methods in documentation

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13820 @shivaram @felixcheung I think this significantly improves the current doc. We can add more content to description later. Do you know how to remove the `##D` prefixes in the example code

[GitHub] spark issue #13820: [SPARK-16107] [R] group glm methods in documentation

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13820 ok to test

[GitHub] spark pull request #13823: [MINOR] [MLLIB] deprecate setLabelCol in ChiSqSel...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13823 [MINOR] [MLLIB] deprecate setLabelCol in ChiSqSelectorModel ## What changes were proposed in this pull request? Deprecate `labelCol`, which is not used by ChiSqSelectorModel

[GitHub] spark pull request #13821: [SPARK-16118] [MLLIB] add getDropLast to OneHotEn...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13821 [SPARK-16118] [MLLIB] add getDropLast to OneHotEncoder ## What changes were proposed in this pull request? We forgot the getter of `dropLast` in `OneHotEncoder` ## How

[GitHub] spark issue #13819: [SPARK-16117] [MLLIB] hide LibSVMFileFormat and move its...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13819 cc: @rxin

[GitHub] spark pull request #13819: [SPARK-16117] [MLLIB] hide LibSVMFileFormat and m...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13819 [SPARK-16117] [MLLIB] hide LibSVMFileFormat and move its doc to LibSVMDataSource ## What changes were proposed in this pull request? LibSVMFileFormat implements data source for LIBSVM

[GitHub] spark issue #13813: [MINOR] [MLLIB] move setCheckpointInterval to non-expert...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13813 This is a trivial change. Merged into master and branch-2.0.

[GitHub] spark issue #13812: [SPARK-16086] [SQL] [PYSPARK] create Row without any fie...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13812 Thanks for fixing master and branch-2.0!

[GitHub] spark pull request #13813: [MINOR] [MLLIB] move setCheckpointInterval to non...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13813 [MINOR] [MLLIB] move setCheckpointInterval to non-expert setters ## What changes were proposed in this pull request? The `checkpointInterval` is a non-expert param. This PR moves its setter

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r67928879 --- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala --- @@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r67928887 --- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala --- @@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite

[GitHub] spark issue #13672: [SPARK-15741][PYSPARK][ML] Pyspark cleanup of set defaul...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13672 Merged into master and branch-2.0. Thanks!

[GitHub] spark issue #12938: [SPARK-15954][SPARK-15162][SPARK-15164][PySpark][DOCS][M...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/12938 @holdenk @MLnick Could this PR be split into smaller ones? I don't see a good reason to put changes to `HiveTest` and pyspark.ml Experimental annotations under the same PR. Keeping PRs minimal

[GitHub] spark issue #13023: [SPARK-15177] [SparkR] [ML] SparkR 2.0 QA: New R APIs an...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13023 @yanboliang We are going to split the work into multiple PRs (SPARK-16090). Do you mind closing this PR for now? Thanks!

[GitHub] spark issue #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` descript...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13403 @dongjoon-hyun @srowen I made a comment just before @dongjoon-hyun updated the PR: ~~~ I think it should be approximate equality but with a very small tolerance, e.g. 1e-14. Both calls

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r67924857 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java --- @@ -733,8 +733,10 @@ public Boolean call(Double x) { assertEquals(20/6.0

[GitHub] spark issue #13801: [SPARK-15177.1] [R] make SparkR model params and default...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13801 Merged into master and branch-2.0.

[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13778 @viirya Do we need to fix this in Spark 2.0? UDTs are private APIs and the only intended use case is Vector/Matrix UDTs for MLlib, which doesn't put vectors or matrices inside an array inside

[GitHub] spark issue #13375: [SPARK-16045][ML][Doc] Spark 2.0 ML.feature: doc update ...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13375 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark issue #13762: [SPARK-14926] [ML] OneVsRest labelMetadata uses incorrec...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13762 @josh-howes Did you try compiling the code? `predictionCol` is not a `Metadata` instance.

[GitHub] spark issue #13801: [SPARK-15177.1] [R] make SparkR model params and default...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13801 test this please

[GitHub] spark issue #13800: [SPARK-13792][SQL] Addendum: Fix Python API

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13800 test this please

[GitHub] spark issue #13793: [SPARK-16086] [SQL] fix Python UDF without arguments (fo...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13793 I reverted the changes in branch-2.0 and master and updated the JIRA. Please re-submit PRs to branch-2.0 and master if they need fixes.

[GitHub] spark issue #13793: [SPARK-16086] [SQL] fix Python UDF without arguments (fo...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13793 @davies The PR was sent to branch-1.6 and Jenkins didn't run it for branch-2.0 and master. Does it apply to branch-2.0 and master?

[GitHub] spark issue #13800: [SPARK-13792][SQL] Addendum: Fix Python API

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13800 Maybe https://github.com/apache/spark/pull/13793 broke master. It was sent to branch-1.6 but merged into master and branch-2.0.

[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13796 @sethah Thanks for implementing `MultinomialLogisticRegression`! This would be a major feature for Spark 2.1. @dbtsai is probably the best person to review this PR. But he is taking a break now. Do

[GitHub] spark pull request #13801: [SPARK-15177.1] [R] make SparkR model params and ...

2016-06-21 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13801#discussion_r67818257 --- Diff: R/pkg/R/mllib.R --- @@ -298,17 +296,17 @@ setMethod("summary", signature(object = "NaiveBayesModel"), #' @expo

[GitHub] spark issue #13801: [SPARK-15177.1] [R] make SparkR model params and default...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13801 cc: @shivaram

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13109 Created https://issues.apache.org/jira/browse/SPARK-16090 to follow up.

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13109 Yes, we should merge this PR first and discuss the grouping later.

[GitHub] spark pull request #13801: [SPARK-15177.1] [R] make SparkR model params and ...

2016-06-21 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13801 [SPARK-15177.1] [R] make SparkR model params and default values consistent with MLlib ## What changes were proposed in this pull request? This PR is a subset of #13023 by @yanboliang

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13109 Methods documented in `colSums` share the same parameters and each was only documented once. Roxygen2 supports that if each param doc only appears once in the comment. That grouping looks okay to me

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13109 I just checked the generated R doc and I felt that we shouldn't group many methods together. For example, in this PR, the `DESCRIPTION` section looks okay because we used `crosstab

[GitHub] spark issue #13023: [SPARK-15177] [SparkR] [ML] SparkR 2.0 QA: New R APIs an...

2016-06-20 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13023 It would be nice to get this in. @yanboliang is traveling. I can help send a PR based on this one.

[GitHub] spark issue #13787: [SPARK-16079][PYSPARK][ML] Added missing import for Deci...

2016-06-20 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13787 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request #13789: [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT ...

2016-06-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13789#discussion_r67783265 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/dataTypes.scala --- @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #13789: [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT ...

2016-06-20 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13789 [SPARK-16074] [MLLIB] expose VectorUDT/MatrixUDT in a public API ## What changes were proposed in this pull request? Both VectorUDT and MatrixUDT are private APIs, because UserDefinedType

[GitHub] spark issue #13750: [SPARK-16035][PYSPARK] Fix SparseVector parser assertion...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13750 Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request #13641: [SPARK-10258][DOC][ML] Add @Since annotations to ...

2016-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13641#discussion_r67596152 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ElementwiseProduct.scala --- @@ -33,21 +33,26 @@ import org.apache.spark.sql.types.DataType

[GitHub] spark pull request #13641: [SPARK-10258][DOC][ML] Add @Since annotations to ...

2016-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13641#discussion_r67596150 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ElementwiseProduct.scala --- @@ -33,21 +33,26 @@ import org.apache.spark.sql.types.DataType

[GitHub] spark pull request #13745: [Spark 15997][DOC][ML] Update user guide for Hash...

2016-06-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13745#discussion_r67595234 --- Diff: docs/ml-features.md --- @@ -46,14 +46,16 @@ In MLlib, we separate TF and IDF to make them flexible. `HashingTF` is a `Transformer` which takes

[GitHub] spark issue #13745: [Spark 15997][DOC][ML] Update user guide for HashingTF, ...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13745 add to whitelist

[GitHub] spark issue #13750: [SPARK-16035][PYSPARK] Fix SparseVector parser assertion...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13750 LGTM. It is bad that we don't have full coverage here. Given that `parse` is more like an internal feature paired with `str`, it is probably okay to test this manually.

[GitHub] spark issue #13750: [SPARK-16035][PYSPARK] Fix SparseVector parser assertion...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13750 ok to test

[GitHub] spark issue #13285: [Spark-15129][R][DOC]R API changes in ML

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13285 Merged into master and branch-2.0. Saw some very minor issues. I will make another pass and fix them in a follow-up PR. Thanks!

[GitHub] spark issue #13725: [SPARK-15892][ML] Backport correctly merging AFTAggregat...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13725 LGTM. Merged into branch-1.6. Thanks for backporting the patch!

[GitHub] spark issue #13729: [SPARK-16008][ML] Remove unnecessary serialization in lo...

2016-06-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13729 Nice catch and LGTM! Merging into master and branch-2.0. Thanks!

[GitHub] spark pull request #13731: [SPARK-15946] [MLLIB] Conversion between old/new ...

2016-06-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13731 [SPARK-15946] [MLLIB] Conversion between old/new vector columns in a DataFrame (Python) ## What changes were proposed in this pull request? This PR implements python wrappers for #13662

[GitHub] spark pull request #13662: [SPARK-15945] [MLLIB] Conversion between old/new ...

2016-06-14 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13662 [SPARK-15945] [MLLIB] Conversion between old/new vector columns in a DataFrame (Scala/Java) ## What changes were proposed in this pull request? This PR provides conversion utils between

[GitHub] spark issue #13219: [SPARK-15364][ML][PySpark] Implement PySpark picklers fo...

2016-06-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13219 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark issue #12731: [SPARK-13590] [ML] [Doc] Document spark.ml LiR, LoR and ...

2016-06-07 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/12731 LGTM

[GitHub] spark issue #13480: [MINOR] clean up style for storage param setters in ALS

2016-06-02 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13480 LGTM

[GitHub] spark pull request #13478: [SPARK-15740] [MLLIB] ignore big model load / sav...

2016-06-02 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/13478 [SPARK-15740] [MLLIB] ignore big model load / save in Word2VecSuite ## What changes were proposed in this pull request? @andrewor14 noticed some OOM errors caused by "test big model

[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13419 I would prefer refreshing the dataset every time a dataset is reloaded but keeping existing ones unchanged. ~~~scala val df1 = sqlContext.read.parquet(dir).cache() df1.count

[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13311#discussion_r64675561 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala --- @@ -107,7 +107,7 @@ private[libsvm] class LibSVMOutputWriter

[GitHub] spark pull request: [SPARK-15413] [ML] [MLLIB] Change `toBreeze` t...

2016-05-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13198#issuecomment-221481150 It is more correct and it is a private API. So this LGTM pending Jenkins.

[GitHub] spark pull request: [SPARK-15413] [ML] [MLLIB] Change `toBreeze` t...

2016-05-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13198#issuecomment-221480985 test this please

[GitHub] spark pull request: [SPARK-15364][ML][PySpark] Implement PySpark p...

2016-05-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13219#issuecomment-221480232 @viirya I think we need a test for the picklers. See https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/api/python

[GitHub] spark pull request: [SPARK-15177] [SparkR] [ML] SparkR 2.0 QA: New...

2016-05-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13023#discussion_r64070351 --- Diff: R/pkg/R/mllib.R --- @@ -269,9 +349,29 @@ setMethod("summary", signature(object = "NaiveBayesModel"), return(li

[GitHub] spark pull request: [SPARK-13590] [ML] [Doc] Document spark.ml LiR...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12731#issuecomment-220654893 @yanboliang Please rename "Spark ML" to "MLlib". "Spark ML" is not an official name of the component. Thanks!

[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13000#issuecomment-220654174 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15444][PySpark][ML][HotFix] Default val...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13220#issuecomment-220653822 @viirya Thanks for fixing this quickly! @MLnick If master is broken, we should revert the commit first to unblock others (especially during QA period). We can re-submit

[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220652565 @davies @rxin It seems that this PR caused OOM in master builds. ~~~ *** RUN ABORTED *** java.lang.OutOfMemoryError: Java heap space

[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13000#issuecomment-220530103 Not part of this PR, we can use `@include_example` in the user guide, though it might require tags support. cc @yinxusen

[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...

2016-05-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63997601 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example")

[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...

2016-05-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63997572 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example")

[GitHub] spark pull request: [SPARK-13590] [ML] [Doc] Document spark.ml LiR...

2016-05-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/12731#discussion_r63997411 --- Diff: docs/ml-classification-regression.md --- @@ -62,6 +62,8 @@ For more background and more details about the implementation, refer to the docu

[GitHub] spark pull request: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.n...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12930#issuecomment-220529140 Is it okay to always cast the target label column to string?

[GitHub] spark pull request: [SPARK-15339] [ML] ML 2.0 QA: Scala APIs and c...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13129#issuecomment-220528642 Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15394][ML][DOCS] User guide typos and g...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13180#issuecomment-220527824 Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15398][ML] Update the warning message t...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13190#issuecomment-220527358 Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13213#issuecomment-220526674 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15172][ML] Explicitly tell user initial...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12948#issuecomment-220526389 Merged into branch-2.0 as well.

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13101#issuecomment-220509850 Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13101#issuecomment-220497757 LGTM pending Jenkins

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13101#discussion_r63978870 --- Diff: mllib/src/test/java/org/apache/spark/SharedSparkSession.java --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13101#discussion_r63978864 --- Diff: mllib/src/test/java/org/apache/spark/SharedSparkSession.java --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13101#discussion_r63978848 --- Diff: mllib/src/test/java/org/apache/spark/SharedSparkSession.java --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13101#discussion_r63978717 --- Diff: mllib/src/test/java/org/apache/spark/SharedSparkSession.java --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [Minor] [ML] [PySpark] ml.evaluation Scala and...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13195#issuecomment-220491522 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15341] [Doc] [ML] Add documentation for...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13131#issuecomment-220491339 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15341] [Doc] [ML] Add documentation for...

2016-05-19 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13131#discussion_r63977325 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -137,6 +137,13 @@ class GaussianMixtureModel private[ml

[GitHub] spark pull request: [SPARK-15341] [Doc] [ML] Add documentation for...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13131#issuecomment-220488159 I'm making a pass.

[GitHub] spark pull request: [SPARK-15414][MLlib] Make the mllib,ml linalg ...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13202#issuecomment-220487463 LGTM. Merged into master and branch-2.0. Thanks!

[GitHub] spark pull request: [SPARK-15296][MLlib] Refactor All Java Tests t...

2016-05-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13101#issuecomment-220461336 Great to see 900 lines were removed :)
