[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450836#comment-15450836 ]

Vladimir Feinberg commented on SPARK-15575:
-------------------------------------------

Some of the biggest issues I've experienced with Breeze performance are that many operations you'd expect to be fast are not, and that its pretty syntax and heavy use of implicits make it easy to hit the slow paths by accident. For instance:

1. Mixed dense/sparse operations frequently fall back to a generic implementation in Breeze that uses Scala iterators.
2. Creating vectors, under certain operations, results in unnecessary boxing of doubles (and of integers, for sparse vectors).
3. Slice vectors have no support for efficient operations. They are implemented in Breeze in a way that makes them no better than Array[Double], which again forces us onto Scala iterators whenever we want a fast, vectorized dot product, for instance.

Usability is tough sometimes. Even though a Vector[Double] interface seems flexible, many implementations require explicit knowledge of the vector type (sparse/dense); otherwise Breeze silently uses the slow Scala interface. Heavy use of implicits is also a problem here, since they're not implemented for all permutations of vector types.

It's also easy to write, e.g., `vec1 += vec2 * a * b`. This creates two intermediate vectors.

I think the biggest issue is that `ml.linalg.Vector` is Breeze-backed. We should use our own linear algebra (we do have `BLAS`, though this interface would have to be expanded to support slicing) and move around `ArrayView[Double]` inside the vector instead.

Breeze as a dependency, as mentioned below, is still very useful for optimization. I think we can keep it around for that, as long as it's only for that.

> Remove breeze from dependencies?
>
>                 Key: SPARK-15575
>                 URL: https://issues.apache.org/jira/browse/SPARK-15575
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for discussing whether we should remove Breeze from the dependencies of MLlib. The main issues with Breeze are Scala 2.12 support and performance issues.
> There are a few paths:
> # Keep dependency. This could be OK, especially if the Scala version issues are fixed within Breeze.
> # Remove dependency
> ## Implement our own linear algebra operators as needed
> ## Design a way to build Spark using custom linalg libraries of the user's choice. E.g., you could build MLlib using Breeze, or any other library supporting the required operations. This might require significant work.
> See [SPARK-6442] for related discussion.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
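Two of the points in Vladimir Feinberg's comment above (the generic fallback for mixed dense/sparse operations, and the two intermediate vectors created by `vec1 += vec2 * a * b`) can be illustrated with a minimal sketch. It uses only plain Scala arrays, not Breeze or Spark's `BLAS`; the names `LinalgSketch`, `Sparse`, `dot`, `scaledAddNaive` and `scaledAddFused` are hypothetical and chosen for illustration only.

```scala
object LinalgSketch {
  /** Hypothetical sparse vector: parallel arrays of indices and their non-zero values. */
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])

  /** Sparse-dense dot product specialised on the vector types:
    * it visits only the stored non-zeros with a primitive while loop,
    * so no iterators are created and no doubles are boxed. */
  def dot(x: Sparse, y: Array[Double]): Double = {
    require(x.size == y.length, "vectors must have the same length")
    var sum = 0.0
    var k = 0
    while (k < x.indices.length) {
      sum += x.values(k) * y(x.indices(k))
      k += 1
    }
    sum
  }

  /** Mimics an eager evaluation of `y += x * a * b`:
    * each multiplication allocates a full intermediate array. */
  def scaledAddNaive(a: Double, b: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "vectors must have the same length")
    val t1 = x.map(_ * a)  // first intermediate vector
    val t2 = t1.map(_ * b) // second intermediate vector
    var i = 0
    while (i < y.length) { y(i) += t2(i); i += 1 }
  }

  /** Fused version: folds the two scalars together and updates y in one pass,
    * allocating nothing. Rounding may differ slightly from the naive version
    * because the scalars are multiplied first. */
  def scaledAddFused(a: Double, b: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "vectors must have the same length")
    val s = a * b
    var i = 0
    while (i < y.length) { y(i) += s * x(i); i += 1 }
  }
}
```

The fused form has the same shape as a BLAS-style `axpy` call (`y += s * x`), which is why expanding a `BLAS`-like interface, as the comment suggests, would avoid the intermediates without depending on operator overloading.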
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410923#comment-15410923 ]

xubo245 commented on SPARK-15575:
---------------------------------

If we remove the Breeze dependency, do we need to rewrite a similar project? I think we can build an MLlib linalg project that depends on Breeze or another library. We can also update that project if Breeze cannot support Scala 2.12.
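The idea in the comment above, a separate linalg layer that can sit on Breeze or on another library, might look roughly like the following sketch. This is not an existing Spark or Breeze API: the trait `LinalgBackend`, the object `PlainScalaBackend` and the function `squaredNorm` are hypothetical names, and the operation set is deliberately tiny.

```scala
object PluggableLinalg {
  /** Hypothetical minimal set of operations that algorithm code would rely on. */
  trait LinalgBackend {
    def dot(x: Array[Double], y: Array[Double]): Double
    /** In-place update: y += a * x. */
    def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit
  }

  /** Dependency-free backend; a Breeze-backed or native-BLAS-backed
    * implementation could be swapped in behind the same trait. */
  object PlainScalaBackend extends LinalgBackend {
    def dot(x: Array[Double], y: Array[Double]): Double = {
      require(x.length == y.length, "vectors must have the same length")
      var sum = 0.0
      var i = 0
      while (i < x.length) { sum += x(i) * y(i); i += 1 }
      sum
    }

    def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit = {
      require(x.length == y.length, "vectors must have the same length")
      var i = 0
      while (i < x.length) { y(i) += a * x(i); i += 1 }
    }
  }

  /** Algorithm code is written against the trait only, so the library
    * behind it can change without touching this function. */
  def squaredNorm(v: Array[Double], backend: LinalgBackend): Double =
    backend.dot(v, v)
}
```

The backend is passed as an explicit parameter rather than an implicit, so it is always visible at the call site which implementation is in use.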
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360099#comment-15360099 ]

Yanbo Liang commented on SPARK-15575:
-------------------------------------

Yes, {{LinearRegression}}, {{LogisticRegression}}, {{AFTSurvivalRegression}} and {{MultilayerPerceptronClassifier}} currently depend on L-BFGS/OWL-QN. We have implemented Spark-owned L-BFGS/SGD optimizers in the mllib package, and the four estimators mentioned above use different implementations:

|| Estimators || Optimizer implementation ||
| LinearRegression | breeze L-BFGS/OWL-QN |
| LogisticRegression | breeze L-BFGS/OWL-QN |
| AFTSurvivalRegression | breeze L-BFGS/OWL-QN |
| MultilayerPerceptronClassifier | mllib L-BFGS/SGD |

The L-BFGS implementation in mllib also calls breeze L-BFGS underneath. We should figure out a way to make the transition smooth. Since I've also been investigating a scalable version of L-BFGS (SPARK-10078) recently, I can start writing a draft to track the features and requirements of the optimizers that Spark needs. Then we can discuss and design how to move to the new implementation.

Thanks!
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358831#comment-15358831 ]

Nick Pentreath commented on SPARK-15575:
----------------------------------------

Also, related to the discussion on SPARK-15581 and here: replacing Breeze is not just about replacing the core linalg operators. It also impacts the L-BFGS and OWL-QN optimizers, which in turn impact linear/logistic regression and MLP. So we would need a plan to replace those too.

cc [~avulanov] [~rxin] [~Xiangrui] [~josephkb] [~srowen] [~freiss] [~mboehm7]
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358820#comment-15358820 ]

Nick Pentreath commented on SPARK-15575:
----------------------------------------

From the various discussions on SPARK-15581 and SPARK-13944, and given the issue in SPARK-14438, it seems there is enough interest and urgency around this that it may make sense to review linear algebra within Spark, and on the JVM in general, to refresh our picture of the current state with respect to features, performance, API / Java compatibility, and license.

I've created a (very initial and WIP) [Google doc|https://docs.google.com/document/d/1UWhuP-YQXW6D-IZs2Dy9CrdxraUDMFG5DdX3eo6cuEs/edit?usp=sharing] to begin the discussion and collate the details. Please feel free to discuss further here or on that doc.

I'm happy to move ahead with the evaluation and performance testing (and also happy to split up the work if desired).
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304546#comment-15304546 ]

Nick Pentreath commented on SPARK-15575:
----------------------------------------

What specifically are the "performance issues" with Breeze as it stands currently?
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304135#comment-15304135 ]

koert kuipers commented on SPARK-15575:
---------------------------------------

We can help out with porting Breeze to Scala 2.12?
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304017#comment-15304017 ]

Sean Owen commented on SPARK-15575:
-----------------------------------

Hm, is Breeze really not supporting 2.12? It seems like a step backwards to write yet another linear algebra library, and go through a year of debugging it. I don't think supporting arbitrary libs helps anything.
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303465#comment-15303465 ]

Yanbo Liang commented on SPARK-15575:
-------------------------------------

I'm interested in this and would like to take a look.