[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147327448 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -998,6 +1047,198 @@ class LinearRegressionSuite

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147323528 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -69,25 +70,103 @@ private[regression] trait

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147316970 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -998,6 +1047,198 @@ class LinearRegressionSuite

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147327208 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -480,10 +638,14 @@ object LinearRegression extends

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147322978 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147322642 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147319678 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r147321479 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r147226715 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19565 I'm curious about the performance comparison, if "filter before sample" triggers a filter over the whole dataset for each `submitMiniBatch`, then there'll be some performance imp

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-26 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r147207042 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r147020853 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

2017-10-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19565#discussion_r147021004 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19565 I wonder if we should add cache() for lda training data, even if not for this feature. @srowen Not sure where we are on caching the training data or not for different algorithms. Appre

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19439 @thunterdb @WeichenXu123 Let's keep only Array[Byte] for now. @WeichenXu123 for the origin column. Surely it may be handy in some scenarios, but I'm most concerned about the objec

[GitHub] spark issue #10466: [SPARK-12375] [ML] add handleinvalid for vectorindexer

2017-10-24 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/10466 Feel free to work on it. I can help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146169405 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146168734 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146168660 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19439 @thunterdb Thanks for the reply. > It does, indirectly: this is what the field types CV_32FXX do. You need to do some low-level casting to convert the byte array to array of numbers,
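The quoted reply says that float pixel types such as CV_32FXX are still stored as a byte array and need low-level casting to become numbers. A minimal JVM sketch of that cast follows; it assumes 32-bit floats in little-endian byte order, and `ImageBytes` / `bytesToFloats` are hypothetical names, not part of the proposed image reader API.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of Spark: reinterpret raw image bytes
// (e.g. a CV_32F-style buffer) as 32-bit floats.
object ImageBytes {
  // Assumption: little-endian byte order unless the caller says otherwise.
  def bytesToFloats(bytes: Array[Byte],
                    order: ByteOrder = ByteOrder.LITTLE_ENDIAN): Array[Float] = {
    require(bytes.length % 4 == 0,
      s"byte length ${bytes.length} is not a multiple of 4")
    // View the byte array as a FloatBuffer with the requested byte order.
    val fb = ByteBuffer.wrap(bytes).order(order).asFloatBuffer()
    val out = new Array[Float](fb.remaining())
    fb.get(out) // bulk copy of all floats into the output array
    out
  }
}
```

The byte order is a parameter because the raw buffer carries no endianness marker; a mismatch silently produces garbage values rather than an error.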

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Thanks @WeichenXu123 for the comments.

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146166006 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146165706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

2017-10-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r146165449 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-10-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r145331849 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-10-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r145333064 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -122,17 +124,33 @@ private[ml] object Param { /** Decodes a param

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-10-18 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r145330685 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -122,17 +124,33 @@ private[ml] object Param { /** Decodes a param

[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-10-18 Thread hhbyyh
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/19525 [SPARK-22289] [ML] Add JSON support for Matrix parameters (LR with coefficients bound) ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-10-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Please let me know if there's any unresolved comments. Thanks.

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143818776 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -224,6 +224,24 @@ private[clustering] trait LDAParams extends Params with

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143112965 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143081342 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143080051 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143080481 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143080675 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -503,21 +533,22 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143077626 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143068229 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143067455 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143056727 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143055573 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143058244 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-05 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r143057944 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,31 +463,60 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r142854372 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -322,6 +326,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r142853109 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -224,6 +224,20 @@ private[clustering] trait LDAParams extends Params with

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r142853643 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -224,6 +224,20 @@ private[clustering] trait LDAParams extends Params with

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142833499 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142833374 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18924 Yes, I think local test is enough for both correctness and performance. For consistency with old LDA, just some manual local test would be sufficient. You may well just use the LDA example

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142831316 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142571627 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142572013 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142574222 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142574453 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142571342 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142571728 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142571603 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should n...

2017-10-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18924#discussion_r142571685 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -462,36 +462,55 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/19208 It's OK to me to include the "dump model to disk" https://github.com/apache/spark/pull/18313 in this or other PR (or not). After reading the discussion, I feel it's an o

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18313 That's all right. Please just proceed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Tested with several larger data set with Hinge Loss function, to compare l-bfgs and owlqn solvers. Run until converged or exceed maxIter (2000). dataset | numRecords | numFeatures | l

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-09-11 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Update: To support pipeline estimator, change the tuning summary column name to include full param reference: ![image](https://user-images.githubusercontent.com/7981698/30287417

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-09-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r138133273 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -85,6 +86,32 @@ private[ml] trait ValidatorParams extends HasSeed

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-09-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r138133238 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -85,6 +86,32 @@ private[ml] trait ValidatorParams extends HasSeed

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-31 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135430463 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -32,10 +34,7 @@ import org.apache.spark.ml.param._ import

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-31 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135430545 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -180,6 +179,29 @@ private[clustering] trait LDAParams extends Params with

[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-08-31 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18610 Thanks for the reply. Since there's already an agreement, I will hold my suggestion on initialModel data type. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-30 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Sure, I can find some larger dataset to test with. But I guess, as shown in the PR description, LBFGS will generally outperform OWLQN, but not in all the cases. I assume single large scale

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-08-28 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17862 Given the discussion above, I plan to replace OWLQN with LBFGS. I will send update soon.

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r135418170 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -226,6 +246,12 @@ class LinearRegression @Since("

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r135418289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -72,6 +72,22 @@ private[regression] trait LinearRegressionParams

[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18610 Just to confirm, so we have agreed that the initialModel should be of type [T <: Model[T]] rather than a String type (path to the saved model)? Sorry I didn't find the related discussion.

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135382461 --- Diff: docs/mllib-clustering.md --- @@ -243,6 +243,9 @@ configuration), this parameter specifies the frequency with which checkpoints will be

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135382496 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LDAIncrementalExample.scala --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135382509 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LDAIncrementalExample.scala --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135382471 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LDAIncrementalExample.scala --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17461#discussion_r135382491 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/LDAIncrementalExample.scala --- @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-25 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17461 Got it. Will make a pass today.

[GitHub] spark issue #18315: [SPARK-21108] [ML] convert LinearSVC to aggregator frame...

2017-08-23 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18315 Thanks for the comment @sethah and @yanboliang.

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-23 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134858430 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HingeAggregatorSuite.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134104799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -219,8 +219,17 @@ class LinearSVC @Since("2.2.0") (

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134104829 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HingeAggregator.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134104881 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HingeAggregatorSuite.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18315: [SPARK-21108] [ML] convert LinearSVC to aggregato...

2017-08-20 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18315#discussion_r134104883 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/HingeAggregatorSuite.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-16 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18902 Thanks for the quick update. The implementation may be improved on some details. But first I'd want to confirm the "convert to null" method does not have any defect. @MLnick @sro

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-16 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133571846 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18902 Eh, I meant that it may be possible to get the mean values purely using the DataFrame API (converting missingValue/NaN to null) in one pass, so we may need to check the performance comparison. But I guess

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-10 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132605415 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18902 Hi @zhengruifeng Thanks for the idea and implementation. Definitely something worth exploring. As I understand it, the new implementation improves the locality yet it leverages the RDD API

[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-08-09 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 Moved the tuningSummary to Models and updated the name of the metrics column. ![image](https://user-images.githubusercontent.com/7981698/29146612-e3a7ac78-7d16-11e7-9a4d-9ece0935bd70.png

[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-08-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r132058489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2017-08-06 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17583 A gentle ping since I think this is quite helpful. @jkbradley @MLnick @yanboliang @srowen @holdenk

[GitHub] spark pull request #18733: [SPARK-21535][ML]Reduce memory requirement for Cr...

2017-08-06 Thread hhbyyh
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/18733

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-04 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r131454343 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16158#discussion_r131270741 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #18733: [SPARK-21535][ML]Reduce memory requirement for Cr...

2017-08-03 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18733#discussion_r131268294 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -112,16 +112,16 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-08-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16774 I'm confused by your suggestions here and in #18733. I don't think it's appropriate to just "include" similar work that originated in another PR, and sugg

[GitHub] spark issue #18733: [SPARK-21535][ML]Reduce memory requirement for CrossVali...

2017-08-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18733 Features should be merged when they are reasonable and ready, not held back waiting on uncertain changes, especially when there are no conflicts. Spark is already way too slow.

[GitHub] spark issue #18733: [SPARK-21535][ML]Reduce memory requirement for CrossVali...

2017-08-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18733 Nothing in this change depends on #16774. The basic idea is that we should release the driver memory as soon as a trained model is evaluated. I don't see any conflict.

[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-01 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18313 @jkbradley Thanks for the suggestion. After the discussion, I found that actually we can reduce the memory requirement for the tuning process. Check https://issues.apache.org/jira/browse/SPARK-21535

[GitHub] spark issue #18315: [SPARK-21108] [ML] [WIP] convert LinearSVC to aggregator...

2017-07-29 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18315 Thanks for the review. Updated to address the comments.
