[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-03-06 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 Ok, thanks. Do you sen any missing task for it to be merged then ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-03-06 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 For the "normal failure case", the one we encounter is that when the vocabulary is too small the embedding learning raises. And it is pretty legitimate in our use cases to have

[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-03-03 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r104115651 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -440,19 +440,9 @@ private class LinearSVCAggregator

[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-03-02 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 The blocking flag is not the issue. The issue is that there was not try / finally and the destruction was only done if the fitting succeded and skipped otherwise. In this PR I

[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-03-02 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 Actually we *did* observe memory leak (severe enough to lead to application failure) due to this when doing thousands of Word2Vec learning in one Spark application, a few of them failing

[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-03-02 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 I'm fixing this (issue compiling Spark locally delay me). The whole point is that they are *not* destroyed within the method in case of exception. --- If your project is set up for it, you

[GitHub] spark pull request #14299: [SPARK-16440][MLlib] Ensure broadcasted variables...

2017-03-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/14299#discussion_r103657660 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -314,6 +315,21 @@ class Word2Vec extends Serializable

[GitHub] spark pull request #14299: [SPARK-16440][MLlib] Ensure broadcasted variables...

2017-03-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/14299#discussion_r103656554 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -314,6 +315,21 @@ class Word2Vec extends Serializable

[GitHub] spark issue #16805: [SPARK-19353][CORE] Generalize PipedRDD to use I/O forma...

2017-02-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16805 @thunterdb @srowen could you please point us to the right reviewers / commiters regarding this proposal of improvement for PipedRDD, supporting binary streams ? --- If your project is set

[GitHub] spark pull request #14299: [SPARK-16440][MLlib] Ensure broadcasted variables...

2017-02-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/14299#discussion_r103421239 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -314,6 +315,20 @@ class Word2Vec extends Serializable

[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...

2017-02-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 @vanzin The ticket was filled a long time ago, I updated the PR to make it clearer. Is any manual linking in some other forge needed ? @thunterdb I have updated the review following

[GitHub] spark issue #14299: Ensure broadcasted variables are destroyed even in case ...

2017-02-27 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 @thunterdb Copy that, working on it and sorry for the acknowledge delay. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-02-27 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r103254026 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -440,19 +440,9 @@ private class LinearSVCAggregator

[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-02-27 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r103249915 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -440,19 +440,9 @@ private class LinearSVCAggregator

[GitHub] spark issue #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sendi...

2017-01-10 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16279 We have backported it to our internal version of Spark anyhow. As told above feel free to close it if you consider this is not worth officially backporting :-) --- If your project is set

[GitHub] spark issue #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sendi...

2016-12-15 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16279 Ok, thanks for the pointer. I do agree this is a non critical judgement call . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sendi...

2016-12-14 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16279 May I ask why ? There was no conflicts so no additional qualification work is required and this looks like a performance bug fix to me, not a new feature. In order to adjust our contribution

[GitHub] spark issue #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sendi...

2016-12-14 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16279 @srowen sorry to bother you but this is just the backport (without any conflict) of #16037. Could you please validate it ? --- If your project is set up for it, you can reply

[GitHub] spark pull request #16279: [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoi...

2016-12-14 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16279 [SPARK-18471][MLLIB][BACKPORT-2.0] In LBFGS, avoid sending huge vectors of 0 # What changes were proposed in this pull request? # CostFun used to send a dense vector of zeroes

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-14 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Thanks for keeping up with this merge request, I've learned a lot wrt the contribution process and good practice, and next contrib will hopefully be much more straightforward. Thanks

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-12-13 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r92150485 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,24 @@ object LBFGS extends Logging

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-12-12 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r91949738 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,24 @@ object LBFGS extends Logging

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-12-08 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r91493798 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,24 @@ object LBFGS extends Logging

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-06 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @sethah Agreed, that's why I uggested to add a dedicated treeAggregate wrapper to MLlin Utils which would take care of that without fiddling with sparsity for each seqOP and comboOp. See

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-12-06 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r91036283 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,27 @@ object LBFGS extends Logging

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-02 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 L-BFGS is the only optimizer I've used so far. I'm not sure how much time I can free to take care of the other ones, but I'll try :-) Regarding the bench, I'll check if we have

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @MLnick there it is all tests passing :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-12-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/16037#discussion_r90422375 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,27 @@ object LBFGS extends Logging

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 This seems very related to the change. I know nothing wrt the way PySpark interfaces Python and Scala and I'm surprised that changing an internal f the Scala lib causes this ? Ready to learn

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-01 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Scala style checks fixed, and spurious cast removed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 @MLnick I have to add a transtyping denseGrad in the expression returned by seqOp from DenseVector to Vector. The only way I could find to do it with the API are: `Vectors.fromBreeze

[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16037 Hello @MLnick, Sorry that my push of a new version just crossed with your suggestion. I'll test you suggestion. Thanks to for having written it, I didn't get that Sean's

[GitHub] spark pull request #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero sen...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/16078 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16078 In order to reduce the mess around multiple PRs I'll close this one and rebase the change in #16037 as you requested. What is the right way,convenient for you, to share a proposal

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Agreed this is too ML specific to be deserve a fix in core. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request #16038: [SPARK-18471][CORE] New treeAggregate overload fo...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/16038 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16078 It is necessary to address it in L-BFGS at least. We propose a solution in core which can be legitimately rejected as not relevant for core. And two solutions in MLlib, one provide

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 @mridulm we don't know how to monitor the size of the serialize task. Sure it would not 10MB due to all those zeros. But we nonetheless measure a significant increase in performance

[GitHub] spark pull request #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero sen...

2016-11-30 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16078 [SPARK-18471][MLLIB] Fix huge vectors of zero send in closure in L-BFGS ## What changes were proposed in this pull request? Introduced util TreeAggregatorWithZeroGenerator and used

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Actually aggregating (and thus sending on the network) on quite dense SparseVectors with 10s of million elements is not to taken lightly. This would required serious benchmarking. What I

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 THe big doubt I have is on *Of course, the first call to axpy should produce a dense vector, but, that's already on the executor*: the other operand is Sparse and has to be sparse

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Yes sure the zero Vector is very sparse. Bu ti do not get your suggestion ? I see no way to pass a Sparse Vector as zero and get the type of vector to change underway to Dense Vector

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 No this does not do the trick as the result of the aggregation IS dense. And the zero in (tree)aggregate has the same type as the result. Said otherwise, in L-BFGS, we do aggregate vectors

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Maybe we could have a live talk/chat this evening Paris time / Morning West Coast time? If I'm not mistaken both #16037 and this change fix the issue at two different levels

[GitHub] spark pull request #16042: Fix of dev scripts and new, Criteo specific, ones...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/16042 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #16042: Fix of dev scripts and new, Criteo specific, ones WIP

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16042 Sorry created by mistake --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #16042: Fix of dev scripts and new, Criteo specific, ones...

2016-11-28 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16042 Fix of dev scripts and new, Criteo specific, ones WIP You can merge this pull request into a Git repository by running: $ git pull https://github.com/AnthonyTruchet/spark dev-tools

[GitHub] spark issue #16040: [SPARK-18612][MLLIB] Delete broadcasted variable in LBFG...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16040 Agreed, and we might also consider making the broadcasted variable compatible with some Automatic Resource Management to avoid such leak (like https://github.com/jsuereth/scala-arm

[GitHub] spark pull request #16040: [SPARK-18612][MLLIB] Delete broadcasted variable ...

2016-11-28 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16040 [SPARK-18612][MLLIB] Delete broadcasted variable in LBFGS CostFun ## What changes were proposed in this pull request? Fix a broadcasted variable leak occurring at each invocation

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 It is related as if this PR get accepted, then the fix for PR to #16037 becomes trivial: replace treeAggregate with a call to `treeAggregateWithZeroGenerator( () => (Vectors.zeros(n),

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 @srowen here is the companion PR to #16037. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #16038: [SPARK-18471][CORE] New treeAggregate overload fo...

2016-11-28 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16038 [SPARK-18471][CORE] New treeAggregate overload for big large aggregators ## What changes were proposed in this pull request? The zero for the aggregation used to be shipped

[GitHub] spark issue #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15963 Once more (last time hopefully) I mistakenly fiddled with PR. Closing this one and replace it with #16037. Code style review above taken into account in new PR. --- If your project

[GitHub] spark pull request #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-28 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/15963 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-28 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/16037 [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 ## What changes were proposed in this pull request? CostFun used to send a dense vector of zeroes as a closure

[GitHub] spark issue #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-27 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15963 Hello @srowen , sorry not to have updated this lately, I've been taken by other emergencies at work. I'll update this on Monday. Actually I'll submit a variant of treeAggregate in core

[GitHub] spark issue #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-21 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15963 @srowen Here is at last the real PR for SPARK-18471. Sorry for the noise due to GitHub fiddling... --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request #15963: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-21 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/15963 [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 ## What changes were proposed in this pull request? CostFun used to send a dense vector of zeroes as a closure

[GitHub] spark issue #15931: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-18 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15931 Sure, sorry I mistakenly pushed a badly rebased version --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #15931: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-18 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/15931 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #15931: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-18 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/15931 [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 ## What changes were proposed in this pull request? The zero for the aggregation used to be shipped into a closure

[GitHub] spark pull request #15905: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-17 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at: https://github.com/apache/spark/pull/15905 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #15905: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-17 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15905 I missed part of my company guidelines. Closing this PR and creating a new one shortly from my company account. Sorry for the noise. --- If your project is set up for it, you can reply

[GitHub] spark pull request #15905: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-17 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/15905#discussion_r88409682 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala --- @@ -241,16 +241,28 @@ object LBFGS extends Logging

[GitHub] spark issue #15905: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-17 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15905 By he way do you think that this should be addressed in core or just in each ML specific use ? --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #15905: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge...

2016-11-16 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/15905 [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vectors of 0 ## What changes were proposed in this pull request? CostFun used to send a dense vector of zeroes as a closure

[GitHub] spark issue #15023: Backport [SPARK-5847] Allow for configuring MetricsSyste...

2016-09-13 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15023 This is perfectly understandable, Just in the same way that we err on the side of caution by not switching to 2.0 branch for prod just now :-) --- If your project is set up for it, you can

[GitHub] spark issue #15023: Backport [SPARK-5847] Allow for configuring MetricsSyste...

2016-09-12 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15023 I'm aware that features are not generally back-ported. The point is, for us this is a bug, preventing a deployment in production. We thus back-ported the fix internally and now propose

[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-09-09 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14270 Hello @markgrover @vanzin ! As this is just a backport of your work, would you please consider reviewing it ? --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request #15023: Backport [SPARK-5847] Allow for configuring Metri...

2016-09-09 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/15023 Backport [SPARK-5847] Allow for configuring MetricsSystem's prefix This is a backport of #14270. Because the spark.internal.config system does not exists in branch 1.6, a simpler

[GitHub] spark issue #14299: Ensure broadcasted variables are destroyed even in case ...

2016-07-31 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 @pzz2011 Nothing _that_ bad if your application is just learning once and saving / using the embeddings. But if you run hundreds or thousands of learning, you'll just crash because

[GitHub] spark pull request #14299: Ensure broadcasted variables are destroyed even i...

2016-07-21 Thread AnthonyTruchet
Github user AnthonyTruchet commented on a diff in the pull request: https://github.com/apache/spark/pull/14299#discussion_r71723751 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -313,133 +313,139 @@ class Word2Vec extends Serializable

[GitHub] spark pull request #14299: Ensure broadcasted variables are destroyed even i...

2016-07-21 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/14299 Ensure broadcasted variables are destroyed even in case of exception ## What changes were proposed in this pull request? Ensure broadcasted variable are destroyed even in case

[GitHub] spark pull request #14268: [SPARK-16440][MLlib] Destroy broadcasted variable...

2016-07-19 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request: https://github.com/apache/spark/pull/14268 [SPARK-16440][MLlib] Destroy broadcasted variables even on driver ## What changes were proposed in this pull request? Forgotten broadcasted variables were persisted into a previous #PR