[GitHub] spark issue #16727: [SPARK-19421][ML][PySpark] Remove numClasses and numFeat...

2017-02-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16727 ping @jkbradley --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #16776: [SPARK-19436][SQL] Add missing tests for approxQuantile

2017-02-02 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16776 cc @gatorsmile @HyukjinKwon @holdenk @MLnick

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-02-02 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @MLnick I created SPARK-19436 (https://issues.apache.org/jira/browse/SPARK-19436) for it.

[GitHub] spark pull request #16776: [SPARK-14352][FOLLOWUP][SQL] add tests for approx...

2017-02-02 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16776 [SPARK-14352][FOLLOWUP][SQL] add tests for approxQuantile & ## What changes were proposed in this pull request? 1, check the behavior with illegal `quantiles` and `relativeError`
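
The check in item 1 above can be illustrated with a minimal sketch. It assumes that each quantile probability must lie in [0, 1] and that `relativeError` must be non-negative. The helper name `validate_approx_quantile_args` is hypothetical and is not part of Spark or of this pull request.

```python
def validate_approx_quantile_args(probabilities, relative_error):
    """Hypothetical helper: raise ValueError on illegal arguments."""
    for p in probabilities:
        # NaN fails this comparison too, so it is rejected as well.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"quantile probability {p} is outside [0, 1]")
    if relative_error < 0.0:
        raise ValueError(f"relativeError {relative_error} must be non-negative")
    return True
```

A test for illegal input would then assert that a call such as `validate_approx_quantile_args([1.5], 0.1)` raises `ValueError`.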

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-02-01 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @HyukjinKwon @gatorsmile Thanks for pointing out those issues. I will create a followup PR to fix them ASAP.

[GitHub] spark issue #16763: [SPARK-19422][ML] Cache input data in algorithms

2017-02-01 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16763 @hhbyyh Thanks a lot for pointing this out!

[GitHub] spark issue #16763: [SPARK-19422][ML] Cache input data in algorithms

2017-02-01 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16763 Jenkins, retest it please

[GitHub] spark pull request #16763: [SPARK-19422][ML] Cache input data in algorithms

2017-02-01 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16763 [SPARK-19422][ML] Cache input data in algorithms ## What changes were proposed in this pull request? cache the input data in `DecisionTreeClassifier`, `GBTClassifier

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-31 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @holdenk Updated! Thanks for your careful checking.

[GitHub] spark issue #16727: [SPARK-19421][ML][PySpark] Remove numClasses and numFeat...

2017-01-31 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16727 @holdenk I created another JIRA to track this issue. Thanks all for reviewing!

[GitHub] spark pull request #16754: [SPARK-19410][DOC] Fix brokens links in ml-pipeli...

2017-01-31 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16754 [SPARK-19410][DOC] Fix brokens links in ml-pipeline and ml-tuning ## What changes were proposed in this pull request? Fix brokens links in ml-pipeline and ml-tuning

[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-29 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16028 ping @yanboliang ?

[GitHub] spark issue #16727: [SPARK-19336][FollowUp][ML][PySpark] Remove numClasses a...

2017-01-29 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16727 ping @jkbradley

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-29 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 ping @holdenk ?

[GitHub] spark pull request #16727: [SPARK-19336][FollowUp][ML][PySpark] Remove numCl...

2017-01-28 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16727 [SPARK-19336][FollowUp][ML][PySpark] Remove numClasses and numFeatures methods in LinearSVC ## What changes were proposed in this pull request? Methods `numClasses` and `numFeatures` in

[GitHub] spark issue #16718: [SPARK-19384][ML] forget unpersist input dataset in Isot...

2017-01-27 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16718 @srowen I have used this command to check the other algorithms: `find mllib/src -name '*.scala' | xargs -i bash -c 'egrep "handlePersistence" -n {} && echo {}'`

[GitHub] spark pull request #16718: [SPARK-19384][ML] forget unpersist input dataset ...

2017-01-27 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16718 [SPARK-19384][ML] forget unpersist input dataset in IsotonicRegression ## What changes were proposed in this pull request? unpersist the input dataset if `handlePersistence` = true
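
The persist/unpersist pairing described above can be sketched as follows. This is an illustration only: `FakeDataset` and `fit` are hypothetical stand-ins rather than Spark classes, and the `try`/`finally` is a design choice of this sketch, not something the pull request is known to use.

```python
class FakeDataset:
    """Stand-in for a cacheable dataset; not a Spark class."""

    def __init__(self, cached=False):
        self.cached = cached

    def persist(self):
        self.cached = True

    def unpersist(self):
        self.cached = False


def fit(dataset, train):
    # Only take responsibility for caching if the caller has not cached already.
    handle_persistence = not dataset.cached
    if handle_persistence:
        dataset.persist()
    try:
        return train(dataset)
    finally:
        # Release the cache only if this function created it.
        if handle_persistence:
            dataset.unpersist()
```

With this shape, a dataset that the caller cached beforehand stays cached after `fit` returns, while one cached inside `fit` is released even if training fails.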

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r98170182 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -336,14 +361,15 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r98169691 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -336,14 +361,15 @@ class

[GitHub] spark pull request #12135: [SPARK-14352][SQL] approxQuantile should support ...

2017-01-27 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12135#discussion_r98167651 --- Diff: python/pyspark/sql/tests.py --- @@ -835,11 +835,20 @@ def test_first_last_ignorenulls(self): self.assertEqual([Row(a=None, b=1, c

[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-25 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16699 You should not modify sharedParams directly: "// DO NOT MODIFY THIS FILE! It was generated by SharedParamsCodeGen." And if there are no other algorithms inheriting ha

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen I agree that a metric should be independent of the details of the algorithm. AUC is independent of the algorithm; it depends only on the dataset: In spark-ml, scikit-learn, or any other

[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16661 BTW, it may be nice to add a `SymmetricMatrix` class, since symmetric matrices are widely used in the computation of covariance/concurrence/etc.

[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97700702 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -272,6 +277,10 @@ class GaussianMixture private

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 Yes, SPARK-18285: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18285

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-23 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Existing metrics (WSSSE, Loglikelihood) depend on the details of the algorithm. Computation of WSSSE for KMeans/BisectKMeans uses the average vectors as the centers, but for KMedoids the medoids
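
As a minimal sketch of the mean-centered computation mentioned above, WSSSE can be taken as the sum, over all points, of the squared Euclidean distance to the average vector of the point's cluster. The function name `wssse` and its signature are assumptions made for illustration; this is not Spark's implementation.

```python
from collections import defaultdict


def wssse(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The center is the average vector of the cluster's members.
        center = [sum(p[i] for p in members) / len(members) for i in range(dim)]
        total += sum(
            sum((p[i] - center[i]) ** 2 for i in range(dim)) for p in members
        )
    return total
```

A medoid-based variant would replace the averaged `center` with an actual member of the cluster, which is why the same metric name can give different values across algorithms.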

[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...

2017-01-23 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16171 cc @yanboliang @sethah @jkbradley

[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-23 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97478414 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -486,6 +491,9 @@ class GaussianMixture @Since("

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-23 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @MLnick @jkbradley Would you mind making a final pass?

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen The concept of `center` doesn't exist in DBSCAN.

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen I think I did not make my thoughts clear. WSSSE and Loglikelihood are algorithm-specific metrics. For example: WSSSE doesn't make sense for clustering algorithms like DBSCAN

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Jenkins, retest this please

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-21 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 re-ping @jkbradley

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-21 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 jenkins, retest this please

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-21 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 I think clustering metrics are currently not that general compared with classification/regression metrics: WSSSE only applies to `KMeans` and `BiKMeans`; Loglikelihood only applies to `GMM

[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Jenkins, retest this please

[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-01-19 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15770#discussion_r97021451 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala --- @@ -0,0 +1,182 @@ +/* + * Licensed to the

[GitHub] spark pull request #16457: [SPARK-19057][ML] Instances' weight must be non-n...

2017-01-19 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16457

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-19 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 I think it is better to discuss this in the JIRA. Once we come to an agreement, I will reopen this PR.

[GitHub] spark pull request #16654: [SPARK-19303][ML][WIP] Add evaluate method in clu...

2017-01-19 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16654 [SPARK-19303][ML][WIP] Add evaluate method in clustering models ## What changes were proposed in this pull request? 1, add evaluation metric in summary 2, add an evaluate() method

[GitHub] spark issue #16571: [SPARK-19208][ML][WIP] MaxAbsScaler and MinMaxScaler are...

2017-01-19 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16571 In the JIRA, we decided to optimize MultivariateOnlineSummarizer first, so this PR will be closed.

[GitHub] spark pull request #16571: [SPARK-19208][ML][WIP] MaxAbsScaler and MinMaxSca...

2017-01-19 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16571

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-18 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 ping @jkbradley

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-18 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 jenkins, retest this please

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-18 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 jenkins, retest this please

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 jenkins, retest this please

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 ping @MLnick

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-16 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 ping @yanboliang

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-16 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 re-ping @jkbradley

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-16 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 re-ping @jkbradley

[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2017-01-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15324 @jkbradley What's your opinion on whether GNB should be a separate Classifier or a model type in the existing NB?

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 @jkbradley Updated. Thanks for reviewing!

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 @yanboliang Updated! Thanks for reviewing!

[GitHub] spark issue #16539: [SPARK-8855][MLlib][PySpark] Python API for Association ...

2017-01-11 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16539 I think mllib on the Python side is also in maintenance mode, and we should only fix bugs for it. @yanboliang am I right? For this PR, I think it's reasonable to wait for the porti

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 ping @yanboliang

[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15831 @techaddict @sethah I have some time to work on the porting, but I can't find the umbrella JIRA

[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12064 Jenkins, retest this please

[GitHub] spark issue #15831: [SPARK-18385][ML] Make the transformer's natively in ml ...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15831 The same TODO also appears in `HashingTF`; what about including it in this PR?

[GitHub] spark pull request #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,Sta...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16471

[GitHub] spark issue #16471: [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardSc...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16471 @sethah Yes. I will close the duplicated PR and JIRA, and help review that PR.

[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16471 @imatiach-msft Thanks for your suggestion and review. I started this PR because I found the same comment in the source: `// TODO: Make the transformer natively in ml framework to avoid extra conversion

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-09 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 ping @jkbradley

[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15211#discussion_r94723792 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -0,0 +1,554 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94718428 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94717710 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94717542 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94717473 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94717245 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94717055 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94716584 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94716372 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94716281 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94716047 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94715820 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r94715503 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 @jkbradley Updated. Thanks for reviewing.

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 @jkbradley Updated according to your comments, including adding `quantileProbabilities` and `docConcentration`.

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94712844 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -905,7 +911,10 @@ class LDA @Since("1.6.0") (

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94709674 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -225,7 +230,7 @@ class LinearRegression @Since("

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94709552 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala --- @@ -227,6 +227,11 @@ class AFTSurvivalRegression @Since

[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/15671#discussion_r94709216 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -888,6 +888,12 @@ class LDA @Since("1.6.0") ( @Si

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 Updated. Thanks for reviewing!

[GitHub] spark issue #16471: [SPARK-19078] PCAModel.transform avoid extra vector conv...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16471 cc @yanboliang

[GitHub] spark pull request #12578: [SPARK-10496][SQL] Add DataFrame cumulative sum

2017-01-04 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/12578

[GitHub] spark issue #12578: [SPARK-10496][SQL] Add DataFrame cumulative sum

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12578 This PR is out of date; I think it's time to close it.

[GitHub] spark pull request #11411: [SPARK-13385][MLlib] Enable AssociationRules to g...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/11411

[GitHub] spark pull request #16471: [SPARK-19078] PCAModel.transform avoid extra vect...

2017-01-04 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16471 [SPARK-19078] PCAModel.transform avoid extra vector conversion ## What changes were proposed in this pull request? As suggested in the source, avoid the vector conversion in

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 Agreed. Now five algs inherit `HasWeightCol`: GLR/LoR/LiR/NB/IsotonicReg I found that some algs use `RDD[Instance]` in `train` : GLR/LoR/LiR ``` val instances: RDD[Instance

[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16457 @srowen OK. This is the list of algs that deal with weights:

[GitHub] spark issue #16453: [SPARK-19054][ML] Eliminate extra pass in NB

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16453 Updated. Thanks for reviewing.

[GitHub] spark pull request #11520: [SPARK-13677][MLLIB] Support Tree-Based Feature T...

2017-01-03 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/11520

[GitHub] spark pull request #16457: [SPARK-19057][ML] Instances' weight must be non-n...

2017-01-03 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16457 [SPARK-19057][ML] Instances' weight must be non-negative ## What changes were proposed in this pull request? 1, add non-negative checking in `Instance` 2, fix doc ## Ho

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/12135 Jenkins, retest this please

[GitHub] spark pull request #11303: [SPARK-13435] [MLlib] Add Weighted Cohen's kappa ...

2017-01-03 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/11303

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r94382823 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -101,7 +100,7 @@ private[classification

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r94382491 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -421,6 +435,18 @@ object LinearRegression extends

[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15324 Jenkins, retest this please

[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16028 cc @yanboliang

[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15671 ping @sethah @jkbradley

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-02 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 ping @srowen @jkbradley
