[GitHub] spark pull request #15140: [SPARK-17585][PySpark][Core] PySpark SparkContext...

2016-09-22 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15140#discussion_r80030192 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala --- @@ -670,6 +670,19 @@ class JavaSparkContext(val sc: SparkContext

[GitHub] spark issue #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param values No...

2016-09-22 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15113 @BryanCutler Thanks for working on this. I'm a bit worried that a user who sets ```weightCol = None``` in Python means to set ```weightCol = null``` in Scala. The c

[GitHub] spark issue #15113: [SPARK-17508][PYSPARK][ML] PySpark treat Param values No...

2016-09-22 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15113 Furthermore, ```weightCol=None``` in the Python API doc may be confusing for users; I think we can add some annotations to clarify the meaning. Thanks. --- If your project is set up for it

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15214 [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector and add ML Python API. ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15131 @HyukjinKwon Sounds good. Do you think backporting only the URI-related change is OK?

[GitHub] spark pull request #15215: [Minor][SparkR] Add sparkr-vignettes.html to giti...

2016-09-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15215 [Minor][SparkR] Add sparkr-vignettes.html to gitignore. ## What changes were proposed in this pull request? Add ```sparkr-vignettes.html``` to ```.gitignore```. ## How was this

[GitHub] spark issue #15215: [Minor][SparkR] Add sparkr-vignettes.html to gitignore.

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15215 cc @shivaram @felixcheung

[GitHub] spark pull request #15216: [SPARK-17577][Follow-up][SparkR] SparkR spark.add...

2016-09-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15216 [SPARK-17577][Follow-up][SparkR] SparkR spark.addFile supports adding directory recursively ## What changes were proposed in this pull request? #15140 exposed ```JavaSparkContext.addFile

[GitHub] spark pull request #15217: [SPARK-17577][Core] Update SparkContext.addFile t...

2016-09-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15217 [SPARK-17577][Core] Update SparkContext.addFile to make it work well on Windows [2.0 backport] ## What changes were proposed in this pull request? Update ```SparkContext.addFile``` to

[GitHub] spark issue #15217: [SPARK-17577][Core] Update SparkContext.addFile to make ...

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15217 cc @HyukjinKwon @sarutak @shivaram

[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15131 Opened backport PR at #15217. Thanks!

[GitHub] spark pull request #15217: [SPARK-17577][Core] Update SparkContext.addFile t...

2016-09-23 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/15217

[GitHub] spark issue #15217: [SPARK-17577][Core] Update SparkContext.addFile to make ...

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15217 Close this PR. Thanks!

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355100 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355103 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -160,6 +166,12 @@ final class ChiSqSelector @Since("

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-24 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80355108 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala --- @@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-24 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15214 @mpjlu The most important reason for this change is that the fitted/trained model should not depend on the order in which users set params. In other words, users should get the same model whether set

[GitHub] spark issue #15215: [Minor][SparkR] Add sparkr-vignettes.html to gitignore.

2016-09-24 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15215 Merged into master. Thanks for review.

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r80376680 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -116,7 +117,7 @@ class

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r80376849 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala --- @@ -282,9 +281,7 @@ class MLUtilsSuite extends SparkFunSuite with

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r80376739 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala --- @@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r80376695 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala --- @@ -42,9 +43,10 @@ class RegressionEvaluatorSuite

[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-09-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14035 @HyukjinKwon I have made a pass and this PR looks good overall. Could you double check whether all ML test cases are covered? Since I found we used implicit imports of different styles at

[GitHub] spark pull request #15216: [SPARK-17577][Follow-up][SparkR] SparkR spark.add...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15216#discussion_r80377484 --- Diff: R/pkg/R/context.R --- @@ -231,17 +231,21 @@ setCheckpointDir <- function(sc, dirName) { #' filesystems), or an HTTP, HTTPS or FTP

[GitHub] spark pull request #15216: [SPARK-17577][Follow-up][SparkR] SparkR spark.add...

2016-09-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15216#discussion_r80377496 --- Diff: R/pkg/R/context.R --- @@ -231,17 +231,21 @@ setCheckpointDir <- function(sc, dirName) { #' filesystems), or an HTTP, HTTPS or FTP

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15214 @srowen @mpjlu Another important reason for this change: the current behavior is error-prone for the Python ML API. ``` def __init__(self, numTopFeatures=50, featuresCol="features",

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15214 You can also refer to all the other Estimators in ML: even if you swap the order in which the arguments are set, you still get the same model. Thanks.

[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...

2016-09-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r80449593 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala --- @@ -29,8 +29,7 @@ class ChiSqSelectorSuite extends SparkFunSuite

[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...

2016-09-26 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14035 LGTM, merged into master. Thanks!

[GitHub] spark issue #15216: [SPARK-17577][Follow-up][SparkR] SparkR spark.addFile su...

2016-09-26 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15216 Merged into master, thanks for review.

[GitHub] spark issue #14852: [SPARK-17138][ML][MLib] Add Python API for multinomial l...

2016-09-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14852 LGTM2, merged into master. Thanks! @WeichenXu123 @sethah

[GitHub] spark pull request #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of e...

2016-09-27 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15261 [SPARK-16356][Follow-up][ML] Enforce ML test of exception for local/distributed Dataset. ## What changes were proposed in this pull request? #14035 added ```testImplicits``` to ML unit

[GitHub] spark issue #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of exceptio...

2016-09-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15261 cc @HyukjinKwon

[GitHub] spark issue #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of exceptio...

2016-09-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15261 @HyukjinKwon The root cause of this is that Spark supports creating a local Dataset, which may not trigger a Spark job. This is consistent with the design of Dataset, and in most cases they have the same behavior

[GitHub] spark issue #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of exceptio...

2016-09-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15261 @srowen Would you mind having a look when you are available? Thanks.

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15277 [SPARK-17704][ML][MLlib] ChiSqSelector performance improvement. ## What changes were proposed in this pull request? Several performance improvements for ```ChiSqSelector```: 1, Keep

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r80863514 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -220,18 +231,22 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r80863287 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector method...

2016-09-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15212 @mpjlu I made some changes to improve ```ChiSqSelector``` performance at #15277. Let's work together to get that in first, and then we can work on this. Thanks!

[GitHub] spark issue #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15277 cc @mpjlu @srowen @avulanov

[GitHub] spark issue #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of exceptio...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15261 Merged into master, thanks for review.

[GitHub] spark pull request #15261: [SPARK-16356][Follow-up][ML] Enforce ML test of e...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15261#discussion_r81082669 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala --- @@ -121,10 +119,17 @@ class VectorIndexerSuite extends

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r81085184 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r81101636 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81106187 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala --- @@ -150,6 +150,54 @@ class NaiveBayesSuite extends

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81105095 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -27,11 +27,14 @@ import org.json4s.jackson.JsonMethods

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81105041 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -27,11 +27,14 @@ import org.json4s.jackson.JsonMethods

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81105501 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -355,79 +356,33 @@ class NaiveBayes private

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81106353 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala --- @@ -150,6 +150,54 @@ class NaiveBayesSuite extends

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81107309 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -355,79 +356,33 @@ class NaiveBayes private

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12819 @zhengruifeng I left only some minor comments; otherwise, this looks good. I think we should also do a parity check between the ml and mllib test suites, and add the missing test cases for ml since

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r8257 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r81116099 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark issue #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15277 Merged into master. Thanks for all your review.

[GitHub] spark issue #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15277 @srowen I'm sorry for the misunderstanding. I'll revert it first and let's continue the discussion. Thanks.

[GitHub] spark pull request #15298: Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector pe...

2016-09-29 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15298 Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement." ## What changes were proposed in this pull request? Revert "[SPARK-17704][ML][MLLIB] ChiSqSelec

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r81119230 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark issue #15298: Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector performan...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15298 @srowen Sure. Thanks.

[GitHub] spark pull request #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear S...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r81136637 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/SVM.scala --- @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15211: [SPARK-14709][ML] [WIP] spark.ml API for linear S...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r81136865 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/SVM.scala --- @@ -0,0 +1,527 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15298: Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector pe...

2016-09-29 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/15298

[GitHub] spark issue #15298: Revert "[SPARK-17704][ML][MLLIB] ChiSqSelector performan...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15298 OK, I'll close this one and move to #15299. Thanks.

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @srowen I think this PR may fail MiMa tests, since it makes a binary-incompatible change. The major disagreement between this and #15277 is whether to keep ```selectedFeatures``` sorted. I think

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 We pay the ```sort``` cost in any case, and it makes no difference whether we put it in fit/training or in the model. So I think if we want to introduce this binary-incompatible change, there should be strong

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 Oh, ```isSorted``` is kept, so this does not introduce a binary incompatibility right now. Thanks for the reminder. I'm neutral on this change. Thanks!

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @srowen It looks strange to leave it protected, and deprecating it looks OK to me unless someone gives me a reason not to. BTW, please update the Python API docs to reflect that ```selectedFeatures``` is

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r81285828 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -355,79 +356,33 @@ class NaiveBayes private

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12819 Merged into master. Thanks!

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12819 @zhengruifeng Please create JIRAs for the follow-up work: * Parity check between the ml and mllib test suites, and add the missing test cases for ml. * Investigate how to handle ```-1

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @MLnick This change will not break binary compatibility currently. It marks ```isSorted``` as deprecated and will break binary compatibility when we delete that method. This should not be a big

[GitHub] spark issue #15313: [SPARK-14077][ML][FOLLOW-UP] Revert change for NB Model'...

2016-09-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15313 LGTM, merged into master. Thanks.

[GitHub] spark issue #14937: [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KMeans i...

2016-10-03 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14937 @srowen Please feel free to send that PR. This PR involves some significant changes and should be carefully discussed, so it may not be merged quickly. Thanks!

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r70820419 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala --- @@ -784,7 +784,13 @@ class DistributedLDAModel private[clustering

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r70821003 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -508,8 +508,9 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r70821947 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/LDASuite.scala --- @@ -118,8 +118,8 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r70822797 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -42,7 +43,9 @@ class PCASuite extends SparkFunSuite with

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r70823371 --- Diff: dev/deps/spark-deps-hadoop-2.7 --- @@ -163,6 +163,7 @@ scala-parser-combinators_2.11-1.0.4.jar scala-reflect-2.11.8.jar scala-xml_2.11

[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-14 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14150 cc @srowen

[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-17 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14150 @srowen I found no obvious compatibility issues after reading the release notes. If this looks good, please let it get in, since [SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181

[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-18 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14150 Jenkins, test this please.

[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-18 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14150#discussion_r71151532 --- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaPCASuite.java --- @@ -107,7 +107,11 @@ public VectorPair call(Tuple2 pair

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2016-07-23 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14326 [SPARK-3181] [ML] Implement RobustRegression with huber loss. ## What changes were proposed in this pull request? The current implementation is a straightforward port of Python scikit

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-24 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14265 LGTM, merged into master. Thanks.

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2016-07-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r72031054 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,473 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2016-07-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r72031141 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #14326: [SPARK-3181] [ML] Implement RobustRegression with huber ...

2016-07-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14326 cc @dbtsai @MechCoder

[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...

2016-07-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r72033498 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala --- @@ -0,0 +1,466 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14346: [SPARK-16710] [SparkR] [ML] spark.glm should supp...

2016-07-25 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14346 [SPARK-16710] [SparkR] [ML] spark.glm should support weightCol ## What changes were proposed in this pull request? Training GLMs on weighted datasets is a very important use case. Users can

[GitHub] spark pull request #14369: [Minor] [ML] Fix some mistake in LinearRegression...

2016-07-26 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14369 [Minor] [ML] Fix some mistake in LinearRegression formula. ## What changes were proposed in this pull request? Fix some mistakes in the ```LinearRegression``` formula. ## How was this

[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72376639 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/IsotonicRegressionWrapper.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72377343 --- Diff: R/pkg/NAMESPACE --- @@ -24,7 +24,8 @@ exportMethods("glm", "spark.kmeans",

[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-26 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72377491 --- Diff: R/pkg/R/mllib.R --- @@ -292,6 +299,43 @@ setMethod("summary", signature(object = "NaiveBayesModel"),

[GitHub] spark pull request #14378: [SPARK-16750] [ML] Fix GaussianMixture training f...

2016-07-27 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14378 [SPARK-16750] [ML] Fix GaussianMixture training failed due to feature column type mistake ## What changes were proposed in this pull request? ML ```GaussianMixture``` training failed due to

[GitHub] spark issue #14378: [SPARK-16750] [ML] Fix GaussianMixture training failed d...

2016-07-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14378 cc @srowen

[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72558746 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,9 @@ test_that("spark.survreg", { } })

[GitHub] spark pull request #14378: [SPARK-16750] [ML] Fix GaussianMixture training f...

2016-07-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14378#discussion_r72560646 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala --- @@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Si

[GitHub] spark pull request #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Mode...

2016-07-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/14392 [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapper in SparkR ## What changes were proposed in this pull request? Gaussian Mixture Model wrapper in SparkR, similarly to R&#

[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-07-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14392 cc @mengxr

[GitHub] spark issue #14346: [SPARK-16710] [SparkR] [ML] spark.glm should support wei...

2016-07-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14346 cc @mengxr

[GitHub] spark pull request #14378: [SPARK-16750] [ML] Fix GaussianMixture training f...

2016-07-28 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14378#discussion_r72623737 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala --- @@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Si
