[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 @GayathriMurali If you are not able to proceed, I can take over. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 Closing to trigger the Windows test.

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 Open to trigger

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
GitHub user wangmiao1981 reopened a pull request: https://github.com/apache/spark/pull/16800 [SPARK-19456][SparkR]:Add LinearSVC R API ## What changes were proposed in this pull request? The Linear SVM classifier was recently added to ML, and a Python API has been added. This JIRA

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-09 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/16800

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 @felixcheung I have addressed the comments. cc @yanboliang @hhbyyh Thanks!

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710324 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710345 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710369 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710403 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710453 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-12 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100710481 --- Diff: R/pkg/R/mllib_utils.R --- @@ -35,7 +35,8 @@ #' @seealso \link{spark.als}, \link{spark.bisectingKmeans}, \link{spark.gaussianMi

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-13 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100929519 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100976153 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,116 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r100976207 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r101176118 --- Diff: R/pkg/R/generics.R --- @@ -1380,6 +1380,10 @@ setGeneric("spark.kstest", function(data, ...) { standardGeneric("

[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16800#discussion_r101176046 --- Diff: R/pkg/R/mllib_classification.R --- @@ -39,6 +46,131 @@ setClass("MultilayerPerceptronClassificationModel", representation(jo

[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 #15777 has resolved this issue. We should close this one.

[GitHub] spark issue #12675: [SPARK-14894][PySpark] Add result summary api to Gaussia...

2017-02-14 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/12675 @HyukjinKwon @srowen This should be closed. Thanks!

[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16800 @felixcheung I will do the example and vignettes today. For the document, I will wait for @hhbyyh to merge his main document first. Thanks!

[GitHub] spark pull request #16761: [BackPort-2.1][SPARK-19319][SparkR]:SparkR Kmeans...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/16761

[GitHub] spark pull request #16945: [SPARK-19616][SparkR]:weightCol and aggregationDe...

2017-02-15 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16945 [SPARK-19616][SparkR]:weightCol and aggregationDepth should be improved for some SparkR APIs ## What changes were proposed in this pull request? This is a follow-up PR of #16800

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 cc @felixcheung

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-15 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @erikerlandson Are you still working on this PR? Thanks! Miao

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @erikerlandson I am just helping clear stale PRs. :) I have no idea whether they intend to accept it.

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 I will add tests. Now I am looking for a dataset other than iris to use in the documentation.

[GitHub] spark pull request #16969: [SPARK-19639][SPARKR][Example]:Add spark.svmLinea...

2017-02-16 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16969 [SPARK-19639][SPARKR][Example]:Add spark.svmLinear example and update vignettes ## What changes were proposed in this pull request? We recently added the spark.svmLinear API for SparkR

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Thanks for your review! I will address the comments soon.

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 I added a test of weightCol for spark.logit.

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-17 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r101870770 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-17 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r101871069 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the

[GitHub] spark issue #16945: [SPARK-19616][SparkR]:weightCol and aggregationDepth sho...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16945 @felixcheung I have made the suggested changes. Thanks!

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102308292 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102330772 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r102337526 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Thanks for your response. In the original JIRA, we discussed why we want it to be a transformer. Let me find it and post it here.

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Joseph K. Bradley added a comment - 31/Oct/16 18:14 Miao Wang Sorry for the slow response here. I do want us to add PIC to spark.ml, but we should discuss the design before the PR

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 Yanbo Liang added a comment - 02/Nov/16 09:30 - edited I prefer #1 and #3, but it looks like we can achieve both goals. A graph can be represented by GraphX/GraphFra

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/13440 @thunterdb Can you take a look? Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-21 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 I am checking ALS out to understand your suggestions. Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-22 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb Per discussion with Yanbo, there is one concern about making it an Estimator: every `transform` call incurs an additional data shuffle. cc @yanboliang @jkbradley Thanks!

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-22 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/17032 [SPARK-19460][SparkR]:Update dataset used in R documentation, examples to reduce warning noise and confusions ## What changes were proposed in this pull request? Replace `iris

[GitHub] spark issue #17032: [SPARK-19460][SparkR]:Update dataset used in R documenta...

2017-02-22 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17032 cc @felixcheung

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801880 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -25,20 +25,21 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-bisectingK

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801930 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -565,11 +565,10 @@ We use a simple example to demonstrate `spark.logit` usage. In general, there

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102801964 --- Diff: R/pkg/R/mllib_tree.R --- @@ -143,14 +143,15 @@ print.summary.treeEnsemble <- function(x) { #' #' # fit a Gradient

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102802291 --- Diff: R/pkg/R/mllib_tree.R --- @@ -143,14 +143,15 @@ print.summary.treeEnsemble <- function(x) { #' #' # fit a Gradient

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102802725 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,11 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102823636 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,11 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102823739 --- Diff: examples/src/main/r/ml/bisectingKmeans.R --- @@ -25,20 +25,21 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-bisectingK

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102856602 --- Diff: examples/src/main/r/ml/kmeans.R --- @@ -26,10 +26,12 @@ sparkR.session(appName = "SparkR-ML-kmeans-example") # $

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @thunterdb @yanboliang Have we reached an agreement on whether to make it a transformer or an estimator?

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-23 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r102857809 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -565,11 +565,10 @@ We use a simple example to demonstrate `spark.logit` usage. In general, there

[GitHub] spark pull request #17032: [SPARK-19460][SparkR]:Update dataset used in R do...

2017-02-24 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17032#discussion_r103004945 --- Diff: examples/src/main/r/ml/glm.R --- @@ -25,12 +25,12 @@ library(SparkR) sparkR.session(appName = "SparkR-ML-glm-ex

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-26 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley Thanks for your reply! I quickly went through your suggestions. If I understand correctly, you prefer making it a `Transformer`, as we previously discussed, but changing the input

[GitHub] spark issue #17032: [SPARK-19460][SparkR]:Update dataset used in R documenta...

2017-02-26 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17032 @felixcheung I have made the changes per our review discussion. Thanks!

[GitHub] spark pull request #16393: [SPARK-18993] [Build] Revert Split test-tags into...

2016-12-26 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16393#discussion_r93900810 --- Diff: common/tags/pom.xml --- @@ -34,6 +34,14 @@ tags + + + org.scalatest + scalatest_

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-03 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16464 [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer correctly ## What changes were proposed in this pull request? spark.lda passes the optimizer "em" or "online

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-04 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 cc @felixcheung @yanboliang A bug fix. Thanks!

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r94721880 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-05 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r94866311 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r94983934 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -123,6 +126,10 @@ private[r] object LDAWrapper extends MLReadable

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r94984564 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -172,6 +187,8 @@ private[r] object LDAWrapper extends MLReadable

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 `graph.vertices.aggregate(0.0)(seqOp, _ + _)`, when calculating `logPrior`, returns different values for the two models, even though all parameters of the models are the same. I print out each

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 In `graph.vertices.aggregate(0.0)(seqOp, _ + _)`, the vertex sequence is different in the two models. VertexID sequence for original model: -8 -6 -3 10 -5 4 11 -1 0 1 -2 6 -7 7 8 -10 -9 9

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 I printed out the value at each step of `seqOp`: `graph.vertices.aggregate(0.0)(seqOp, _ + _)` just returns the `seqOp` result of the last vertex. VertexID sequence for original model: -8 -6
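The symptom described in this thread (the aggregate yields only the last vertex's contribution, so the result depends on vertex order) is what one would expect if `seqOp` discarded its accumulator argument. The sketch below is an illustration under stated assumptions, not the actual `LDAModel` code: plain Scala collections stand in for RDD partitions, the hypothetical `aggregate` helper mimics the shape of `RDD.aggregate` (fold each partition with `seqOp` from the zero value, then merge partition results with `combOp`), and the hypothetical `contribution` function stands in for the real per-vertex prior term.

```scala
object AggregateSketch {
  // Hypothetical stand-in for RDD.aggregate: fold each partition with seqOp
  // starting from zero, then merge per-partition results with combOp.
  def aggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U =
    partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)

  // Hypothetical per-vertex term (NOT the real LDA logPrior formula).
  def contribution(v: Int): Double = v.toDouble

  // Buggy: ignores the running sum, so each partition yields only
  // the term of its last vertex.
  val buggySeqOp: (Double, Int) => Double = (_, v) => contribution(v)

  // Fixed: adds the current vertex's term to the running sum.
  val fixedSeqOp: (Double, Int) => Double = (sum, v) => sum + contribution(v)

  def main(args: Array[String]): Unit = {
    val original = Seq(Seq(1, 2), Seq(3, 4))
    val reloaded = Seq(Seq(2, 1), Seq(4, 3)) // same vertices, different order
    println(aggregate(original, 0.0)(buggySeqOp, _ + _)) // prints 6.0
    println(aggregate(reloaded, 0.0)(buggySeqOp, _ + _)) // prints 4.0
    println(aggregate(original, 0.0)(fixedSeqOp, _ + _)) // prints 10.0
    println(aggregate(reloaded, 0.0)(fixedSeqOp, _ + _)) // prints 10.0
  }
}
```

In this sketch, only the accumulating `seqOp` makes the result independent of vertex order within partitions (up to floating-point rounding on real data), which is the property an original and a reloaded model need for their `logPrior` values to agree.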

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 I think it is a bug. I will file a PR to fix it. Thanks!

[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...

2017-01-06 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16491 [SPARK-19110][ML][MLLIB]:DistributedLDAModel returns different logPrior for original and loaded model ## What changes were proposed in this pull request? While adding

[GitHub] spark issue #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel returns dif...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16491 Jenkins, retest this please.

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 @felixcheung PR #16491 is filed.

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 Sure. I will revert it to the previous commit once #16491 is in.

[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95036829 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...

2017-01-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95045854 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 @felixcheung I made the modifications so that the two metrics of DistributedModels are no longer saved. Thanks!

[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...

2017-01-09 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16523 [SPARK-19142][SparkR]:spark.kmeans should take seed, initSteps, and tol as parameters ## What changes were proposed in this pull request? spark.kmeans doesn't have interface t

[GitHub] spark pull request #16524: [SPARK-19110][MLLIB][FollowUP]: Add a unit test

2017-01-09 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16524 [SPARK-19110][MLLIB][FollowUP]: Add a unit test ## What changes were proposed in this pull request? #16491 added the fix to mllib and a unit test to ml. This follow-up PR adds unit tests

[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...

2017-01-09 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16523 cc @yanboliang

[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...

2017-01-10 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16523#discussion_r95429832 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -204,11 +208,16 @@ setMethod("write.ml", signature(object = "GaussianMixtureModel"

[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...

2017-01-10 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16523#discussion_r95429974 --- Diff: R/pkg/inst/tests/testthat/test_mllib_clustering.R --- @@ -99,7 +99,8 @@ test_that("spark.kmeans", { take(t

[GitHub] spark issue #16524: [SPARK-19110][MLLIB][FollowUP]: Add a unit test for test...

2017-01-10 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16524 @srowen The title is updated. Unit tests for [ML] were added in the original fix, but the MLLIB case was not. So this follow-up adds back a simple unit test for the two parameters of

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-10 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r95436115 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala --- @@ -45,6 +45,11 @@ private[r] class LDAWrapper private ( import

[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...

2017-01-11 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16523#discussion_r95640984 --- Diff: R/pkg/inst/tests/testthat/test_mllib_clustering.R --- @@ -99,7 +99,8 @@ test_that("spark.kmeans", { take(t

[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...

2017-01-12 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16523 @felixcheung I will review these items after wrapping up my current work. I am currently working on two items: bug 18011 and bisecting kmeans. Bisecting kmeans should be ready soon. Bug

[GitHub] spark pull request #16566: [SparkR]: add bisecting kmeans R wrapper

2017-01-12 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16566 [SparkR]: add bisecting kmeans R wrapper ## What changes were proposed in this pull request? Add R wrapper for bisecting Kmeans. As JIRA is down, I will update title to link

[GitHub] spark issue #16566: [SparkR]: add bisecting kmeans R wrapper

2017-01-13 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16566 Jenkins, retest this please.

[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-13 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/16464 Jenkins, retest this please.

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-13 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r96047751 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-13 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r96049008 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-13 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r96049147 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),

[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r96125283 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),

[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r96125393 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @

[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-15 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r96165457 --- Diff: R/pkg/R/mllib_clustering.R --- @@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj")) #' @

[GitHub] spark pull request #15365: [SPARK-17157][SPARKR]: Add multiclass logistic re...

2016-10-05 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/15365 [SPARK-17157][SPARKR]: Add multiclass logistic regression SparkR Wrapper ## What changes were proposed in this pull request? As we discussed in #14818, I added a separate R wrapper

[GitHub] spark pull request #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logist...

2016-10-05 Thread wangmiao1981
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/14818

[GitHub] spark issue #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logistic regr...

2016-10-05 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14818 @felixcheung I opened #15365 based on our discussion in this PR. I am closing this PR now; please review #15365. Thanks!

[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15365 @felixcheung When running check-cran, there are errors: Error : requireNamespace("e1071", quietly = TRUE) is not TRUE Error : requireNamespace("e1071", quietly

[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15365 From Jenkins, I saw the error: WARNING: There was 1 warning. NOTE: There were 3 notes. See '/home/jenkins/workspace/SparkPullRequestBuilder/R/SparkR.Rcheck/00check.log

[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15365 I see the error message in a local test: LaTeX errors when creating PDF version. This typically indicates Rd problems. * checking PDF version of manual without hyperrefs or index

[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15365 @vectorijk Thanks for the information! I installed e1071 and the TeX package. I just want to find what causes the error.

[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15365 My local tests passed. Jenkins, retest this please.
