[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110982433 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -287,6 +290,16 @@ class LinearSVCModel private[classification

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110981812 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110978675 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -287,6 +290,16 @@ class LinearSVCModel private[classification

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110980991 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LinearSVCExample.scala --- @@ -44,6 +44,12 @@ object LinearSVCExample

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r110981511 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable

[GitHub] spark pull request #17461: [SPARK-20082][ml][WIP] LDA incremental model lear...

2017-04-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r110827869 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -315,6 +315,27 @@ class LDA private

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-04-09 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 No unit test is added for now as I'm not sure if this is something that would interest the community. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-04-09 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17583 [SPARK-20271]Add FuncTransformer to simplify custom transformer creation ## What changes were proposed in this pull request? Just to share some code I implemented to help easily create

[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-04-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17336#discussion_r109745116 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -85,38 +85,58 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-04-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17336#discussion_r109741396 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -85,38 +85,58 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-31 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17336 The major thing I'm concerned about is that `transform` will have to recompute the association rules each time it's invoked. If that's not a problem, changing association rules to a method would be much
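The recomputation concern above can be illustrated with a minimal sketch in plain Python. It is not the code from the PR: `generate_rules` and `RuleCache` are hypothetical names, frequent itemsets are modelled as a dict from `frozenset` to support count, and only single-item consequents are generated. The idea is simply to keep the last rule set and rebuild it only when the confidence threshold changes.

```python
def generate_rules(freq_itemsets, min_confidence):
    """Build (antecedent, consequent, confidence) rules with single-item consequents.

    freq_itemsets maps frozenset(items) -> support count (hypothetical layout).
    """
    rules = []
    for itemset, support in freq_itemsets.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            antecedent_support = freq_itemsets.get(antecedent)
            if antecedent_support is None:
                continue  # antecedent is not a known frequent itemset
            confidence = support / antecedent_support
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


class RuleCache:
    """Hypothetical helper: recompute rules only when min_confidence changes."""

    def __init__(self, freq_itemsets):
        self._freq_itemsets = freq_itemsets
        self._cached_confidence = None
        self._cached_rules = None
        self.recomputations = 0  # exposed only to make the caching visible

    def rules(self, min_confidence):
        if self._cached_rules is None or self._cached_confidence != min_confidence:
            self._cached_rules = generate_rules(self._freq_itemsets, min_confidence)
            self._cached_confidence = min_confidence
            self.recomputations += 1
        return self._cached_rules


freq = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
cache = RuleCache(freq)
first = cache.rules(0.8)
second = cache.rules(0.8)  # served from the cache, no recomputation
```

Whether a cache like this is worth its complexity depends on how often `transform` is called with an unchanged threshold; the thread itself leaves that question open.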

[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17336 ping @jkbradley as this is something we should fix before release.

[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...

2017-03-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17478 Thanks to @wangmiao1981 for the PR and @sethah for the comments. Maybe I should have been clearer when I created the jira. I would prefer to remove the require here permanently

[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-29 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17324 The test was interrupted and needs a retest.

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-29 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r108600468 --- Diff: docs/ml-frequent-pattern-mining.md --- @@ -0,0 +1,75 @@ +--- +layout: global +title: Frequent Pattern Mining +displayTitle

[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-29 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17324#discussion_r108597395 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-27 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 Updated with Python example.

[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17324#discussion_r108235749 --- Diff: examples/src/main/python/ml/imputer.py --- @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17324#discussion_r108234190 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108094236 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,22 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108094126 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala --- @@ -385,6 +385,22 @@ class

[GitHub] spark pull request #17407: [SPARK-20043][ML] DecisionTreeModel: ImpurityCalc...

2017-03-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17407#discussion_r108094114 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -178,6 +178,22 @@ class DecisionTreeRegressorSuite

[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17324 Updated with python example.

[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

2017-03-23 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17218 @jkbradley Regarding the question, in most definitions Association Rules are defined between two ItemSets, and ArrayType seems to be a more intuitive choice to me. It just happens

[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...

2017-03-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17361 @tdas Just FYI, I'm getting lint-java error: yuhao@yuhao-devbox:~/workspace/github/hhbyyh/spark$ ./dev/lint-java ~Using `mvn` from path: /usr/bin/mvn Checkstyle checks failed

[GitHub] spark pull request #17014: [SPARK-18608][ML] Fix double-caching in ML algori...

2017-03-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17014#discussion_r107515202 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -110,12 +111,17 @@ class DecisionTreeClassifier

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-03-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17014 I'm trying to refresh my memory and clarify the targets on this topic. Basically, we want to achieve: 1. Avoid double caching. If the input Dataset is already cached, then we should not cache
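The first target in the comment above (do not cache an input that is already cached) can be sketched as follows. This is an illustration only, not the change made in the PR: `FakeDataset` is a hypothetical stand-in that tracks cache state so the example runs without Spark, and `fit_with_cache_guard` is an assumed helper name.

```python
class FakeDataset:
    """Hypothetical stand-in for a Spark Dataset; it only tracks cache state."""

    def __init__(self, cached=False):
        self.cached = cached
        self.persist_calls = 0
        self.unpersist_calls = 0

    def persist(self):
        self.cached = True
        self.persist_calls += 1
        return self

    def unpersist(self):
        self.cached = False
        self.unpersist_calls += 1
        return self


def fit_with_cache_guard(dataset, train):
    """Persist the input only if the caller has not cached it already.

    The guard also decides who unpersists: data cached by the caller is
    left cached, data cached here is released once training finishes.
    """
    handle_persistence = not dataset.cached
    if handle_persistence:
        dataset.persist()
    try:
        return train(dataset)
    finally:
        if handle_persistence:
            dataset.unpersist()


already_cached = FakeDataset(cached=True)
model_a = fit_with_cache_guard(already_cached, lambda d: "model")

uncached = FakeDataset(cached=False)
model_b = fit_with_cache_guard(uncached, lambda d: "model")
```

With a real Dataset the guard would have to read the Dataset's own storage level rather than a boolean flag; the flag here only keeps the sketch self-contained.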

[GitHub] spark pull request #17326: [SPARK-19985][ML] Fixed copy method for some ML M...

2017-03-17 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17326#discussion_r106765091 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -74,6 +74,7 @@ class

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-03-17 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17014 Hi @zhengruifeng , is there any update?

[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-03-17 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17336#discussion_r106721298 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -95,28 +125,17 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-03-17 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17336 ping @jkbradley and @srowen to be aware of the issue.

[GitHub] spark pull request #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence...

2017-03-17 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17336 [SPARK-20003] [ML] FPGrowthModel setMinConfidence should affect rules generation and transform ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 Please hold on merging this until https://github.com/apache/spark/pull/17321 is resolved.

[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

2017-03-16 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17324 [SPARK-19969] [ML] Imputer doc and example ## What changes were proposed in this pull request? Add docs and examples for spark.ml.feature.Imputer. Currently scala and Java examples

[GitHub] spark pull request #17316: [SPARK-15040][ML][PYSPARK] Add Imputer to PySpark

2017-03-16 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17316#discussion_r106488785 --- Diff: python/pyspark/ml/feature.py --- @@ -871,6 +872,164 @@ def idf(self): @inherit_doc +class Imputer(JavaEstimator, HasInputCols

[GitHub] spark pull request #17316: [SPARK-15040][ML][PYSPARK] Add Imputer to PySpark

2017-03-16 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17316#discussion_r106489373 --- Diff: python/pyspark/ml/feature.py --- @@ -871,6 +872,164 @@ def idf(self): @inherit_doc +class Imputer(JavaEstimator, HasInputCols

[GitHub] spark pull request #17316: [SPARK-15040][ML][PYSPARK] Add Imputer to PySpark

2017-03-16 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17316#discussion_r106490851 --- Diff: python/pyspark/ml/feature.py --- @@ -871,6 +872,164 @@ def idf(self): @inherit_doc +class Imputer(JavaEstimator, HasInputCols

[GitHub] spark pull request #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goo...

2017-03-15 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/11780

[GitHub] spark issue #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goodness-o...

2017-03-15 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11780 Close this.

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-15 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 Refined some comments and minor things. This should be ready for review. Thanks.

[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...

2017-03-15 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17280 I'll first focus on https://github.com/apache/spark/pull/17130 and resolve conflict here.

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 Thanks for the review. I'll wait for https://github.com/apache/spark/pull/17283 to be merged first.

[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/13656 Close this and add the support to ml.fpm. https://github.com/apache/spark/pull/17280

[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

2017-03-13 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/13656

[GitHub] spark pull request #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform ...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17283#discussion_r105813634 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -103,6 +103,22 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform ...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17283#discussion_r105813424 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -103,6 +103,22 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform ...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17283#discussion_r105813550 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -103,6 +103,22 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform ...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17283#discussion_r105798334 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -103,6 +103,22 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17283: [SPARK-19940][ML][MINOR] FPGrowthModel.transform ...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17283#discussion_r105798138 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -103,6 +103,22 @@ class FPGrowthSuite extends SparkFunSuite

[GitHub] spark pull request #17280: [SPARK-19939] [ML] Add support for association ru...

2017-03-13 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17280 [SPARK-19939] [ML] Add support for association rules in ML ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-19939 Adding another

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r105518166 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -56,8 +56,8 @@ private[fpm] trait FPGrowthParams extends Params

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Thanks @MLnick for being the Shepherd and providing consistent help on discussion and review. The performance test matches what I got from my local environment.

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104524697 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104516526 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104280679 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Hi @MLnick I changed the surrogateDF format for better extensibility in the last update and added unit tests for multi-column support. Let me know if I missed anything. inputCol1|inputCol2

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104258573 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104258382 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104257956 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104257857 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r104257741 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Thanks a lot for making a pass @MLnick. The last update mainly focuses on the interface and behavior change. I'll make a pass and also address your comments.

[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 ping @jkbradley since we're changing the FPGrowth `transform`. Sean made a great suggestion to simplify `transform` code.

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r104039923 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r104036137 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -56,8 +56,8 @@ private[fpm] trait FPGrowthParams extends Params

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r104010510 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -240,12 +240,13 @@ class FPGrowthModel private[ml] ( val predictUDF

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-01 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17130 [SPARK-19791] [ML] Add doc and example for fpgrowth ## What changes were proposed in this pull request? Add a new section for fpm Add Example for FPGrowth in scala and Java

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 Sorry for missing your comments. I can send a follow-up together with the documentation.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-27 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17090 the same as https://github.com/apache/spark/pull/12574 ?

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 > Btw, I could imagine us wanting to change this later. If we're recommending items a user could add to their basket, then we might want to suggest the most frequent item rather than noth

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 Thanks @jkbradley for contributing the code. That helps a lot.

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 Hi @jkbradley After further performance comparison, I found using broadcast would give much better performance for the transform. I tested with some public data from http://fimi.ua.ac.be

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102860331 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102856168 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102855117 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 I tried a few different ways to implement the transform. https://gist.github.com/hhbyyh/889b88ae2176d1263fdc9dd3e29d1c2d. The performance is actually similar, while the current one can

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102840184 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102840479 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102647844 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102646065 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Looks like CI was interrupted. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/console

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Sent an update to add multi-column support. Let me know if this is not what you have in mind.

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 Hi @jkbradley We can hold the transform code. > wrap the old AssociationRules code Do you mean to make transform return the Association Rules DataFrame, like the curr

[GitHub] spark issue #17014: [SPARK-18608][ML][WIP] Fix double-caching in ML algorith...

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17014 It's better if we can fix this without breaking API. Let's allow some time to see if there's a better solution. Meanwhile, if we have to add the new parameter, can we set some default value

[GitHub] spark pull request #17014: [SPARK-18608][ML][WIP] Fix double-caching in ML a...

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17014#discussion_r102262826 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -126,9 +129,10 @@ abstract class Predictor[ * and copying parameters

[GitHub] spark pull request #11601: [SPARK-13568] [ML] Create feature transformer to ...

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/11601#discussion_r102141627 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #16020: [SPARK-18596][ML] add checking and caching to bisecting ...

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16020 Close this as it's better resolved in https://issues.apache.org/jira/browse/SPARK-18608. Thanks for the comments and discussion.

[GitHub] spark pull request #16020: [SPARK-18596][ML] add checking and caching to bis...

2017-02-20 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/16020

[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17000 Hi @ZunwenYou Do you know why treeAggregate failed when the feature dimension reached 20 million? I think this potentially can help with the 2G disk shuffle spill limit

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-20 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 @jkbradley Sent an update to refine the transform code and address the comments. Regarding the behavior change concern, I think a different partition strategy will only affect

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-19 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r101939121 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #16968: [SPARK-19337] [ML] [Doc] Documentation and examples for ...

2017-02-19 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16968 Thanks for the review. Updated to binary. Also add the reference to R example.

[GitHub] spark pull request #16968: [SPARK-19337] [ML] [Doc] Documentation and exampl...

2017-02-17 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16968#discussion_r101872717 --- Diff: docs/ml-classification-regression.md --- @@ -363,6 +363,44 @@ Refer to the [R API docs](api/R/spark.mlp.html) for more details

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-17 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15415 Thanks @jkbradley. I'm also working on improving the `transform` performance and adding more unit tests. I'll address the comments in a combined update.

[GitHub] spark issue #16968: [SPARK-19337] [ML] [Doc] Documentation and examples for ...

2017-02-17 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16968 Thanks for the comment @felixcheung

[GitHub] spark pull request #16968: [SPARK-19337] [ML] [Doc] Documentation and exampl...

2017-02-17 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16968#discussion_r101840357 --- Diff: docs/ml-classification-regression.md --- @@ -363,6 +363,51 @@ Refer to the [R API docs](api/R/spark.mlp.html) for more details

[GitHub] spark pull request #16968: [SPARK-19337] [ML] [Doc] Documentation and exampl...

2017-02-17 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16968#discussion_r101840341 --- Diff: docs/ml-classification-regression.md --- @@ -363,6 +363,51 @@ Refer to the [R API docs](api/R/spark.mlp.html) for more details

[GitHub] spark issue #16968: [SPARK-19337] [ML] [Doc] Documentation and examples for...

2017-02-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16968 I see. I will drop the R example here; whichever PR goes in later can finish the document update.

[GitHub] spark pull request #16968: [SPARK-19337] [ML] [Doc] Documentation and examp...

2017-02-16 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/16968 [SPARK-19337] [ML] [Doc] Documentation and examples for LinearSVC ## What changes were proposed in this pull request? Documentation and examples (Java, Scala, Python, R) for LinearSVC

[GitHub] spark issue #16763: [SPARK-19422][ML] Cache input data in algorithms

2017-02-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16763 Hi @zhengruifeng https://issues.apache.org/jira/browse/SPARK-18608 There's some ongoing discussion about the issue.
