[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/16037 Sorry for the late review. Just came back to the US. LGTM too! Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Thanks for keeping up with this merge request. I've learned a lot about the contribution process and good practice, and my next contribution will hopefully be much more straightforward. Thanks to the Spark committer team for this great piece of software :-)
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 Merged to master
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16037 LGTM
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70081/ Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #70081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70081/consoleFull)** for PR 16037 at commit [`18fcbba`](https://github.com/apache/spark/commit/18fcbba81168c741497c57ede2554ed0c8e48d2c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #70081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70081/consoleFull)** for PR 16037 at commit [`18fcbba`](https://github.com/apache/spark/commit/18fcbba81168c741497c57ede2554ed0c8e48d2c).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69722/ Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69722/consoleFull)** for PR 16037 at commit [`3b59ab2`](https://github.com/apache/spark/commit/3b59ab283797bb8448b2b4384caa0fb12ae9fece).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69722/consoleFull)** for PR 16037 at commit [`3b59ab2`](https://github.com/apache/spark/commit/3b59ab283797bb8448b2b4384caa0fb12ae9fece).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @sethah Agreed, that's why I suggested adding a dedicated treeAggregate wrapper to MLlib Utils, which would take care of that without fiddling with sparsity in each seqOp and combOp. See closed #16078 for the idea...
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16037 @MLnick Yeah, this is likely a problem with all the ML aggregators as well. We can probably take care of it using lazy evaluation.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 Yes I'm pretty OK with merging this. If you can dig up any results, that's all the better. Will check in with you next week.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 L-BFGS is the only optimizer I've used so far. I'm not sure how much time I can free up to take care of the other ones, but I'll try :-) Regarding the benchmark, I'll check whether we have archived the results; otherwise I'll relaunch it next week.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037 I'm sure this will be net positive, and _shouldn't_ cause any regression. Still, we must be certain. @AnthonyTruchet can you provide for posterity the detailed test results for the vector sizes you mentioned? And perhaps also some results for some smaller sizes (since I imagine the benefit of this change for that scenario is quite small and we should just check there is no unexpected overhead or regression we've somehow missed from the `toDense` calls though I can't see that happening).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037 By the way this same issue may also impact the `ml` optimizers that use L-BFGS. We should check the various gradient aggregators for `LogisticRegression`, `LinearRegression`, `MLP` etc. cc @sethah
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @MLnick there it is, all tests passing :-)
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69470/ Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test PASSed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69470/consoleFull)** for PR 16037 at commit [`9b30c5c`](https://github.com/apache/spark/commit/9b30c5c8a01f7d3408eae9c27e2e3b52b216a0b2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69469/ Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69469/consoleFull)** for PR 16037 at commit [`ec183a2`](https://github.com/apache/spark/commit/ec183a20166ec12c11b3c637b8722ef9cdb8bcc4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69470/consoleFull)** for PR 16037 at commit [`9b30c5c`](https://github.com/apache/spark/commit/9b30c5c8a01f7d3408eae9c27e2e3b52b216a0b2).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69469/consoleFull)** for PR 16037 at commit [`ec183a2`](https://github.com/apache/spark/commit/ec183a20166ec12c11b3c637b8722ef9cdb8bcc4).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 This seems very related to the change. I know nothing about the way PySpark interfaces Python and Scala, and I'm surprised that changing an internal of the Scala lib causes this. Ready to learn and help though :-)
```
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/classification.py", line 155, in __main__.LogisticRegressionModel
Failed example:
    mcm = LogisticRegressionWithLBFGS.train(data, iterations=10, numClasses=3)
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
        compileflags, 1) in test.globs
      File "", line 1, in
        mcm = LogisticRegressionWithLBFGS.train(data, iterations=10, numClasses=3)
    [...]
    Py4JJavaError: An error occurred while calling o173.trainLogisticRegressionModelWithLBFGS.
    [...]
    Caused by: java.lang.IllegalArgumentException: axpy only supports adding to a dense vector but got type class org.apache.spark.mllib.linalg.SparseVector.
        at org.apache.spark.mllib.linalg.BLAS$.axpy(BLAS.scala:58)
        at org.apache.spark.mllib.optimization.LBFGS$CostFun$$anonfun$2.apply(LBFGS.scala:257)
        at org.apache.spark.mllib.optimization.LBFGS$CostFun$$anonfun$2.apply(LBFGS.scala:255)
```
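One plausible way to hit the `axpy` error above, assuming some partition contributes no rows: the sparse zero value then reaches `combOp` without ever passing through `seqOp`, so the accumulator being added to is still sparse. The sketch below reproduces that situation and a possible remedy (densifying inside `combOp`) using plain-Scala stand-ins. `Vec`, `DenseVec`, `SparseVec`, the toy `axpy`, and the `aggregate` helper are all hypothetical and are not MLlib's classes or the code that was merged.

```scala
object SparseZeroAggregation {
  // Hypothetical stand-ins for a vector hierarchy with a toDense conversion.
  sealed trait Vec { def size: Int; def toDense: DenseVec }

  final case class DenseVec(values: Array[Double]) extends Vec {
    def size: Int = values.length
    def toDense: DenseVec = this // already dense: returns itself, no copy
  }

  final case class SparseVec(size: Int, entries: Map[Int, Double]) extends Vec {
    def toDense: DenseVec = {
      val out = new Array[Double](size)
      entries.foreach { case (i, v) => out(i) = v }
      DenseVec(out)
    }
  }

  // Toy y += a * x that, like the check in the stack trace, rejects a sparse target.
  def axpy(a: Double, x: Vec, y: Vec): Unit = y match {
    case DenseVec(ys) =>
      val xs = x.toDense.values
      var i = 0
      while (i < ys.length) { ys(i) += a * xs(i); i += 1 }
    case other =>
      throw new IllegalArgumentException(
        s"axpy only supports adding to a dense vector but got ${other.getClass}")
  }

  type Acc = (Vec, Double)
  type Point = (Double, Array[Double]) // (label, features)

  // Densify lazily: only the first call per partition allocates the dense array.
  def seqOp(acc: Acc, point: Point): Acc = {
    val dense = acc._1.toDense
    val (label, features) = point
    axpy(label, DenseVec(features), dense) // toy "gradient": label * features
    (dense, acc._2 + label)                // toy "loss": sum of labels
  }

  // Assumes the left accumulator is already dense; breaks on an untouched zero.
  def naiveCombOp(a: Acc, b: Acc): Acc = {
    axpy(1.0, b._1, a._1)
    (a._1, a._2 + b._2)
  }

  // Densifies the target first, so an untouched sparse zero is handled too.
  def safeCombOp(a: Acc, b: Acc): Acc = {
    val dense = a._1.toDense
    axpy(1.0, b._1, dense)
    (dense, a._2 + b._2)
  }

  // Sequential imitation of a partitioned aggregate with a sparse zero value.
  def aggregate(partitions: Seq[Seq[Point]], n: Int, combOp: (Acc, Acc) => Acc): Acc = {
    val zero: Acc = (SparseVec(n, Map.empty), 0.0)
    partitions.map(_.foldLeft(zero)(seqOp)).reduce(combOp)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq.empty[Point], Seq((2.0, Array(1.0, 0.0, 3.0))))
    val (grad, loss) = aggregate(partitions, 3, safeCombOp)
    println(s"gradient = ${grad.toDense.values.mkString(", ")}, loss = $loss")
  }
}
```

With an empty first partition, `naiveCombOp` throws the same kind of `IllegalArgumentException`, while `safeCombOp` completes. After the first merge the accumulator is dense, so the extra `toDense` call costs nothing further.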
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69466/ Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69466/consoleFull)** for PR 16037 at commit [`0ce8c64`](https://github.com/apache/spark/commit/0ce8c644e3cbed3cf600fcb30526e46a8054e498).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69466/consoleFull)** for PR 16037 at commit [`0ce8c64`](https://github.com/apache/spark/commit/0ce8c644e3cbed3cf600fcb30526e46a8054e498).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Scala style checks fixed, and spurious cast removed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69463/ Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69463/consoleFull)** for PR 16037 at commit [`d7ebc7d`](https://github.com/apache/spark/commit/d7ebc7df63b89d7ba7cd7e3a688089749c393082).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Merged build finished. Test FAILed.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16037 **[Test build #69463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69463/consoleFull)** for PR 16037 at commit [`d7ebc7d`](https://github.com/apache/spark/commit/d7ebc7df63b89d7ba7cd7e3a688089749c393082).
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037 ok to test
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @MLnick I have to convert denseGrad from DenseVector to Vector in the expression returned by seqOp. The only ways I could find to do it with the API are `Vectors.fromBreeze(denseGrad.asBreeze)` or `Vectors.dense(denseGrad.values)`. I'm a bit concerned about the potential recopy of a 128MB piece of memory...
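On the copy concern above: when the dense type is a subtype of the general vector type (as `DenseVector` is of `Vector` in MLlib), a plain type ascription widens the static type without touching the underlying array. A minimal sketch with hypothetical stand-in types `Vec` and `DenseVec` (not the MLlib classes), and a hypothetical helper `widen`:

```scala
object UpcastSketch {
  // Hypothetical stand-ins: a general vector trait and its dense subtype.
  trait Vec { def values: Array[Double] }
  final class DenseVec(val values: Array[Double]) extends Vec

  // The ascription `denseGrad: Vec` only changes the static type;
  // the returned tuple holds the very same object and backing array.
  def widen(denseGrad: DenseVec, loss: Double): (Vec, Double) =
    (denseGrad: Vec, loss)

  def main(args: Array[String]): Unit = {
    val dense = new DenseVec(new Array[Double](4))
    val (v, _) = widen(dense, 0.0)
    println(s"same object: ${v eq dense}, same array: ${v.values eq dense.values}")
  }
}
```

Whether the conversions named in the comment copy the array depends on their implementation, which this sketch does not model.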
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Hello @MLnick, Sorry that my push of a new version just crossed with your suggestion. I'll test your suggestion. Thanks too for having written it out; I hadn't understood that this was what Sean's hint meant.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037 What worries me more, actually, is that the initial vector should be compressed when it is sent in the closure. So why is this issue occurring? Is it a problem with serialization / compression? Or is it still too large even after compression? It would be good to understand that.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037

Right ok. So I think the approach of making the zero vector sparse and then calling `toDense` in `seqOp`, as @srowen suggested, makes the most sense. Currently the gradient vector *must* be dense in MLlib, since both `axpy` and the logic for multinomial logreg require it. So the thing that is initially serialized with the task should be tiny, and the call to `toDense` for the first instance in each partition will essentially generate the dense zero vector. Thereafter it should be a no-op, as the vector will already be dense and `toDense` will just be a ref to the values array.

Can we see if this works:

```scala
val zeroVector = Vectors.sparse(n, Seq())
val (gradientSum, lossSum) = data.treeAggregate((zeroVector, 0.0))(
  seqOp = (c, v) => (c, v) match {
    case ((grad, loss), (label, features)) =>
      val denseGrad = grad.toDense
      val l = localGradient.compute(
        features, label, bcW.value, denseGrad)
      (denseGrad, loss + l)
  },
  combOp = (c1, c2) => (c1, c2) match {
    case ((grad1, loss1), (grad2, loss2)) =>
      axpy(1.0, grad2, grad1)
      (grad1, loss1 + loss2)
  })
```
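The "sparse zero, densify on first use" idea described above can be modelled without Spark. The following is only a sketch: `Vec`, `Sparse`, `Dense` and `SparseZeroSketch` are hypothetical stand-ins, not MLlib classes, and `foldLeft` over a local `Seq` stands in for what `seqOp` does within one partition. Unlike MLlib, the stand-in `Dense.toDense` returns the same object, which is a simplification.

```scala
// Hypothetical stand-ins for MLlib's vector types, for illustration only.
sealed trait Vec { def size: Int; def toDense: Dense }

// An all-zero sparse vector: it carries only its size, so it is cheap to ship.
final case class Sparse(size: Int) extends Vec {
  def toDense: Dense = new Dense(new Array[Double](size))
}

final class Dense(val values: Array[Double]) extends Vec {
  def size: Int = values.length
  def toDense: Dense = this // already dense: no new allocation
}

object SparseZeroSketch {
  // Folds one "partition": the zero value is sparse, and the accumulator
  // becomes dense the first time an element is processed.
  def aggregate(partition: Seq[Array[Double]], n: Int): Vec =
    partition.foldLeft(Sparse(n): Vec) { (acc, features) =>
      val dense = acc.toDense // allocates once, then returns the same object
      var i = 0
      while (i < n) { dense.values(i) += features(i); i += 1 }
      dense
    }

  def main(args: Array[String]): Unit = {
    val result = aggregate(Seq(Array(1.0, 2.0), Array(3.0, 4.0)), 2)
    println(result.toDense.values.mkString(", ")) // prints "4.0, 6.0"
  }
}
```

Note that an empty partition leaves the accumulator sparse, so any combine step that needs a dense left-hand side would have to densify it as well.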
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 This should be the main PR @MLnick
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16037 This is all a bit confusing - can we highlight which PR is actually to be reviewed?
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 Following https://github.com/apache/spark/pull/16038 I suggest this proceed by making the zero value a sparse vector, and then making it dense in the seqOp immediately.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 OK, this is the fourth pull request though (not counting a not-quite-related 5th). You don't need to open a new PR to push more changes and it adds to the difficulty in reviewing. This still doesn't incorporate suggestions from the last review.
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16037 Can one of the admins verify this patch?