[GitHub] spark pull request: [SPARK-1892][MLLIB] Adding OWL-QN optimizer fo...

2014-10-01 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/840#issuecomment-57439459 @debasish83 and @codedeft The weighted method for OWLQN in breeze is merged https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c I

[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58183559 We built against Spark master on Oct 2, and when we ran our application on around 600GB of data, we got the following exception. Does this PR fix this issue which

[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2693 [SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10 In Breeze 0.10, the L1regParam can be configured through an anonymous function in OWLQN, and each component can be penalized differently
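To illustrate what "each component can be penalized differently" could mean, here is a minimal sketch in plain Scala. It does not call Breeze; the `WeightedL1` object, the `penalty` helper, and the choice to leave the last component (treated as an intercept) unpenalized are assumptions made for illustration only.

```scala
object WeightedL1 {
  // Hypothetical helper: total L1 penalty where component i has its own
  // regularization strength l1reg(i), supplied as an anonymous function.
  def penalty(weights: Array[Double], l1reg: Int => Double): Double =
    weights.indices.map(i => l1reg(i) * math.abs(weights(i))).sum

  def main(args: Array[String]): Unit = {
    // Assumption: the last entry plays the role of an intercept.
    val weights = Array(0.5, -2.0, 3.0)
    val interceptIndex = weights.length - 1
    // Penalize every component with strength 0.1 except the intercept.
    val l1reg: Int => Double = i => if (i == interceptIndex) 0.0 else 0.1
    println(penalty(weights, l1reg))
  }
}
```

Passing the strength as a function of the index is what allows a per-component choice such as exempting the intercept.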

[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58214186 I thought it was a closed issue, so I moved my comment to JIRA. I ran into this issue in spark-shell, not in a standalone application; does SPARK-3762 apply

[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58276308 @dlwh David, do you know if there is any dependency change in breeze-0.10, and whether it is compatible with both Scala 2.10 and 2.11? Thanks. --- If your project is set up

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-21 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1518 [SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model (Note: This is not ready to be merged. It needs documentation, and we need to make sure it's backward compatible with the Spark 1.0 APIs

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49682150 I think it fails because the Apache license header is not in the test file. As you suggest, I'll move it to be generated at runtime. Would like to know the general feedback

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49682436 `!~==` will be used in the test since `!(a~==b)` will not work: `(a~==b)` does not return false but throws an exception carrying the message. I will replace

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49954543 @srowen @mengxr and @dorx Based on our discussion, I've implemented two different APIs for relative error, and absolute error. It makes sense that test writers

[GitHub] spark pull request: [SPARK-2479 (partial)][MLLIB] fix binary metri...

2014-07-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1576#issuecomment-50057950 @mengxr Feel free to merge this one first. After you merge, I'll rebase #1425 against current master, and address the conflicts.

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50064963 @mengxr `%+-` is used as an operator to indicate this is relative error. Users can write `assert(a ~== b %+- 1E-10)` for relative error, and `assert(a ~== b +- 1E-10
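The difference between the two kinds of tolerance can be sketched with two plain functions. This is not the operator syntax discussed in the comment, and the names `absErr` and `relErr`, along with the choice to scale by the larger magnitude, are assumptions for illustration.

```scala
object ApproxEquals {
  // Absolute error: |a - b| must be below eps, regardless of magnitude.
  def absErr(a: Double, b: Double, eps: Double): Boolean =
    math.abs(a - b) < eps

  // Relative error: |a - b| divided by the larger magnitude of a and b.
  // Assumption: two exact zeros are treated as equal.
  def relErr(a: Double, b: Double, eps: Double): Boolean = {
    val scale = math.max(math.abs(a), math.abs(b))
    if (scale == 0.0) true else math.abs(a - b) / scale < eps
  }

  def main(args: Array[String]): Unit = {
    val a = 1e10
    val b = 1e10 + 1.0
    // The values differ by 1.0, which is tiny relative to 1e10.
    println(relErr(a, b, 1e-8))
    println(absErr(a, b, 1e-8))
  }
}
```

For large values, a fixed absolute tolerance rejects numbers that agree to many significant digits, which is the case a relative check is meant to handle.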

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50081864 @mengxr I just rebased against master, and it passes the test. Depending on whether we want to use `absErr`/`relErr`, `+-`/`%+-` or both, I can do further modification

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15443103 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala --- @@ -40,27 +41,51 @@ class KMeansSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-50293096 @mengxr Resolved all the conflicts after rebasing, and all the unit tests pass. Thanks.

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-30 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50663418 I tried to make the bias really big to make the intercept smaller to avoid being regularized. The result is still quite different from R, and very sensitive

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50982699 @mengxr Is there any problem with asfgit? This is not finished yet; why did asfgit say it's merged into apache:master?

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733217 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733221 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733244 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15733248 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/VectorTransformer.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15738936 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15740021 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-08-03 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1207#discussion_r15740240 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/StandardScalerSuite.scala --- @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-08-04 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-51151346 It's too late to get into 1.1, but I'll try to make it happen in 1.2. We'll use this in our implementation at Alpine first.

[GitHub] spark pull request: [MLlib] Use this.type as return type in k-mean...

2014-08-05 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1796 [MLlib] Use this.type as return type in k-means' builder pattern to ensure that the return object is itself. You can merge this pull request into a Git repository by running: $ git pull https
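A minimal sketch of why `this.type` matters in a builder pattern follows. The `Base` and `Extended` classes and their setters are hypothetical and are not the k-means classes from the pull request.

```scala
object BuilderDemo {
  class Base {
    protected var k: Int = 2
    // Returning this.type keeps the static type of the receiver,
    // so a subclass instance is still typed as the subclass after the call.
    def setK(value: Int): this.type = { k = value; this }
    def getK: Int = k
  }

  class Extended extends Base {
    private var seed: Long = 0L
    def setSeed(value: Long): this.type = { seed = value; this }
    def getSeed: Long = seed
  }

  // This chain compiles only because setK returns this.type.
  // If setK returned Base, setSeed would not be visible after it.
  val model: Extended = new Extended().setK(5).setSeed(42L)
}
```

With a declared return type of `Base`, the same chain would fail to compile at `setSeed`, which is the problem the singleton return type avoids.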

[GitHub] spark pull request: [SPARK-2852][MLLIB] Separate model from IDF/St...

2014-08-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1814#discussion_r15908219 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -35,38 +35,47 @@ import org.apache.spark.rdd.RDD * @param

[GitHub] spark pull request: [SPARK-2852][MLLIB] Separate model from IDF/St...

2014-08-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1814#discussion_r15908318 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -35,38 +35,47 @@ import org.apache.spark.rdd.RDD * @param

[GitHub] spark pull request: [SPARK-2852][MLLIB] Separate model from IDF/St...

2014-08-06 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1814#discussion_r15908504 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala --- @@ -177,18 +115,72 @@ private object IDF { private def isEmpty: Boolean

[GitHub] spark pull request: [SPARK-2852][MLLIB] Separate model from IDF/St...

2014-08-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1814#issuecomment-51511617 LGTM. Merged into both master and branch-1.1. Thanks!

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-08 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1862 [SPARK-2934][MLlib] Adding LogisticRegressionWithLBFGS Interface for training with LBFGS Optimizer which will converge faster than SGD.

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1862#discussion_r16022431 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -188,3 +188,98 @@ object LogisticRegressionWithSGD

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1862#discussion_r16023077 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

2014-08-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1862#discussion_r16023299 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD

[GitHub] spark pull request: [SPARK-2979][MLlib] Improve the convergence ra...

2014-08-11 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1897 [SPARK-2979][MLlib] Improve the convergence rate by minimizing the condition number Scaling to minimize the condition number: During the optimization process, the convergence (rate) depends

[GitHub] spark pull request: [SPARK-2979][MLlib] Improve the convergence ra...

2014-08-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1897#discussion_r16153527 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala --- @@ -137,11 +154,45 @@ abstract class

[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-08 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2709 Minor change in the comment of spark-defaults.conf.template spark-defaults.conf is used in spark-shell as well, and this PR added this into the comment.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58361701 Jenkins, please start the test.

[GitHub] spark pull request: [SPARK-3856][MLLIB] use norm operator after br...

2014-10-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2718#issuecomment-58435304 LGTM Thanks.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58629065 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732030 It's failing at FlumeStreamSuite.scala:109, which seems to be unrelated to this patch.

[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2709#issuecomment-59667207 @andrewor14 Sorry for the late reply; I was on vacation in Europe last week. I can continue working on this after I finish my talk at the IOTA conf tomorrow.

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-20 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59871504 Jenkins, please start the test!

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-10-28 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-60813678 @BigCrunsh I'm working on this. Let's see if we can merge it in Spark 1.2.

[GitHub] spark pull request: [SPARK-4129][MLlib] Performance tuning in Mult...

2014-10-28 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2992 [SPARK-4129][MLlib] Performance tuning in MultivariateOnlineSummarizer In MultivariateOnlineSummarizer, breeze's activeIterator is used to loop through the nonZero elements in the vector. However

[GitHub] spark pull request: [SPARK-1870] Ported from 1.0 branch to 0.9 bra...

2014-06-09 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1013#issuecomment-45551414 Tested on a PivotalHD 1.1 YARN 4-node cluster. With --addjars file:///somePath/to/jar, launching a Spark application works.

[GitHub] spark pull request: [SPARK-1870] Made deployment with --jars work ...

2014-06-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1013#discussion_r13573544 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -507,12 +508,19 @@ object Client { Apps.addToEnvironment(env

[GitHub] spark pull request: Make sure that empty string is filtered out wh...

2014-06-09 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1027 Make sure that empty string is filtered out when we get the secondary jars from conf

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/490#discussion_r13624385 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -95,15 +96,18 @@ trait ClientBase extends Logging

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/490#discussion_r13624580 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -95,15 +96,18 @@ trait ClientBase extends Logging

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/490#discussion_r13624615 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -95,15 +96,18 @@ trait ClientBase extends Logging

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-12 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/490#issuecomment-45835283 @mengxr Do you think it's in good shape now? This is the only issue blocking us from using vanilla Spark. Thanks.

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13897737 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -38,10 +38,10 @@ import org.apache.spark.mllib.linalg.{Vectors, Vector

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13897825 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46393840 I think there are two different ways to access the API for legacy reasons. As far as I know, @mengxr is working on consolidating the interface. He probably can talk about

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1104#discussion_r13905548 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1104#issuecomment-46412293 I think changing the signature will be a problem for MiMa.

[GitHub] spark pull request: SPARK-2272 [MLlib] Feature scaling which stand...

2014-06-24 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1207 SPARK-2272 [MLlib] Feature scaling which standardizes the range of independent variables or features of data Feature scaling is a method used to standardize the range of independent variables
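One common form of feature scaling is standardization of each feature column to zero mean and unit standard deviation. The sketch below shows that form on a single column; it is not the `StandardScaler` from the pull request, and the use of the sample standard deviation (dividing by n - 1) is an assumption.

```scala
object Standardize {
  // Scale one feature column to zero mean and unit sample standard deviation.
  def standardize(column: Array[Double]): Array[Double] = {
    require(column.length > 1, "need at least two values to estimate the variance")
    val n = column.length
    val mean = column.sum / n
    // Assumption: unbiased sample variance, dividing by n - 1.
    val variance = column.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    val std = math.sqrt(variance)
    // A constant column has zero variance; map it to zeros instead of dividing by zero.
    if (std == 0.0) Array.fill(n)(0.0) else column.map(x => (x - mean) / std)
  }

  def main(args: Array[String]): Unit =
    println(standardize(Array(1.0, 2.0, 3.0)).mkString(", "))
}
```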

[GitHub] spark pull request: SPARK-2281 [MLlib] Simplify the duplicate code...

2014-06-25 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1215 SPARK-2281 [MLlib] Simplify the duplicate code in Gradient.scala The Gradient.compute that returns a new tuple of (gradient: Vector, loss: Double) can be constructed from the in-place version
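The idea of building the tuple-returning method on top of the in-place one can be sketched as follows. The `GradientSketch` object, the `Array[Double]` signatures, and the least-squares loss are assumptions for illustration and are not the code in Gradient.scala.

```scala
object GradientSketch {
  // In-place version: adds this example's gradient into cumGradient and returns the loss.
  // Least-squares loss 0.5 * (w.x - y)^2 is used purely for illustration.
  def computeInPlace(
      data: Array[Double],
      label: Double,
      weights: Array[Double],
      cumGradient: Array[Double]): Double = {
    val diff = data.zip(weights).map { case (x, w) => x * w }.sum - label
    var i = 0
    while (i < data.length) {
      cumGradient(i) += diff * data(i)
      i += 1
    }
    0.5 * diff * diff
  }

  // Tuple-returning version: start from a zero gradient and delegate to the in-place one,
  // so the loss and gradient formulas live in a single place.
  def compute(
      data: Array[Double],
      label: Double,
      weights: Array[Double]): (Array[Double], Double) = {
    val gradient = new Array[Double](weights.length)
    val loss = computeInPlace(data, label, weights, gradient)
    (gradient, loss)
  }
}
```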

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-06-26 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1099#issuecomment-47250277 It seems that Jenkins is missing the Python runtime.

[GitHub] spark pull request: [WIP][SPARK-2174][MLLIB] treeReduce and treeAg...

2014-07-01 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1110#issuecomment-47683286 We benchmarked treeReduce in our random forest implementation, and since the trees generated from each partition are fairly large (more than 100MB), we found

[GitHub] spark pull request: Upgrade junit_xml_listener to 0.5.1 which fixe...

2014-07-08 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1333 Upgrade junit_xml_listener to 0.5.1 which fixes the following issues 1) fix the class name to be fully qualified classpath 2) make sure the reporting time is in seconds, not in milliseconds, which

[GitHub] spark pull request: Upgrade junit_xml_listener to 0.5.1 which fixe...

2014-07-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1333#issuecomment-48417558 Done.

[GitHub] spark pull request: SPARK-2281 [MLlib] Simplify the duplicate code...

2014-07-09 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/1215

[GitHub] spark pull request: [SPARK-1969][MLlib] Public available online su...

2014-07-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/955#discussion_r14796461 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/OnlineSummarizer.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1177] Allow SPARK_JAR to be set program...

2014-07-11 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/987#issuecomment-48762832 #560 is merged. Close this PR.

[GitHub] spark pull request: [SPARK-1177] Allow SPARK_JAR to be set program...

2014-07-11 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/987

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-11 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1379 [SPARK-2309][MLlib] Generalize the binary logistic regression into multinomial logistic regression Currently, there is no multi-class classifier in mllib. Logistic regression can be extended

[GitHub] spark pull request: [SPARK-2477][MLlib] Using appendBias for addin...

2014-07-14 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1410 [SPARK-2477][MLlib] Using appendBias for adding intercept in GeneralizedLinearAlgorithm Instead of the prependOne currently used in GeneralizedLinearAlgorithm, we would like to use appendBias for 1
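The difference between the two conventions is only where the constant feature, and therefore the intercept weight, sits in the vector. A minimal sketch on plain arrays follows; the `Bias` object and its two helpers are hypothetical and do not reproduce the MLlib utilities of the same names.

```scala
object Bias {
  // Append a constant 1.0 as the last feature, so the intercept is the last weight.
  def appendBias(features: Array[Double]): Array[Double] = features :+ 1.0

  // For contrast: prepend the constant, so the intercept is the first weight.
  def prependOne(features: Array[Double]): Array[Double] = 1.0 +: features
}
```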

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-15 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1425 [SPARK-2479][MLlib] Comparing floating-point numbers using relative error in UnitTests Floating point math is not exact, and most floating-point numbers end up being slightly imprecise due

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013544 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala --- @@ -81,9 +82,8 @@ class LogisticRegressionSuite

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1425#discussion_r15013786 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala --- @@ -20,8 +20,20 @@ package

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49221370 @mengxr Scalatest 2.x has the tolerance feature, but it uses absolute error, not relative error. For large numbers, the absolute error may not be meaningful

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49222983 I learned `almostEquals` from the Boost library. Anyway, in this case, how do we distinguish the one that throws with a message from the one that just returns true/false

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49253108 @mengxr and @srowen What do you think of `assert((0.0001 !~== 0.0) +- 1E-5)`? We have `~==` and `~==` which will have the error message in the latest commit from my co

[GitHub] spark pull request: SPARK-1157 L-BFGS Optimizer based on Breeze L-...

2014-04-07 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/53

[GitHub] spark pull request: SPARK-1157: L-BFGS Optimizer based on Breeze's...

2014-04-07 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/353 SPARK-1157: L-BFGS Optimizer based on Breeze's implementation. This PR uses Breeze's L-BFGS implementation, and the Breeze dependency has already been introduced by Xiangrui's sparse input format work

[GitHub] spark pull request: SPARK-1157: L-BFGS Optimizer based on Breeze's...

2014-04-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11404094 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1157: L-BFGS Optimizer based on Breeze's...

2014-04-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-39895140 @mengxr As you suggested, I moved the costFun to a private CostFun class.

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11460767 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11461398 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11463764 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11464280 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11464736 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/353#discussion_r11605070 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/353 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-40434555 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
GitHub user dbtsai reopened a pull request: https://github.com/apache/spark/pull/353 [SPARK-1157][MLlib] L-BFGS Optimizer based on Breeze's implementation. This PR uses Breeze's L-BFGS implementation, and Breeze dependency has already been introduced by Xiangrui's sparse input format

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-40434626 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/353#issuecomment-40434691 The latest Jenkins run timed out. It seems that CI is not stable now.

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-15 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/353

[GitHub] spark pull request: MLlib doc update for breeze dependency

2014-04-22 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/481 MLlib doc update for breeze dependency MLlib now uses the Breeze linear algebra library instead of jblas; this PR updates the doc to help users install the BLAS native libraries to have

[GitHub] spark pull request: MLlib doc update for breeze dependency

2014-04-22 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/481

[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

2014-04-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/422#discussion_r11841916 --- Diff: docs/mllib-guide.md --- @@ -3,63 +3,120 @@ layout: global title: Machine Learning Library (MLlib) --- +MLlib is a Spark

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-04-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/490#discussion_r11883381 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -77,7 +78,8 @@ trait ClientBase extends Logging { ).foreach

[GitHub] spark pull request: [SPARK-1516]Throw exception in yarn client ins...

2014-04-22 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/490#issuecomment-41114289 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2979][MLlib] Improve the convergence ra...

2014-08-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1897#issuecomment-52149162 It seems that Jenkins is not stable; it is failing on issues related to Akka.

[GitHub] spark pull request: [SPARK-3078][MLLIB] Make LRWithLBFGS API consi...

2014-08-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1973#discussion_r16319946 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -69,8 +69,17 @@ class LBFGS(private var gradient: Gradient, private var

[GitHub] spark pull request: [SPARK-3078][MLLIB] Make LRWithLBFGS API consi...

2014-08-15 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1973#issuecomment-52381503 LGTM. Merged into both master and branch-1.1. Thanks!!

[GitHub] spark pull request: [SPARK-2841][MLlib] Documentation for feature ...

2014-08-20 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2068 [SPARK-2841][MLlib] Documentation for feature transformations Documentation for newly added feature transformations: 1. TF-IDF 2. StandardScaler 3. Normalizer You can merge this pull

[GitHub] spark pull request: [SPARK-2841][MLlib] Documentation for feature ...

2014-08-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/2068#discussion_r16561045 --- Diff: docs/mllib-feature-extraction.md --- @@ -70,4 +70,110 @@ for((synonym, cosineSimilarity) <- synonyms) { </div> </div> -## TFIDF
