[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-08-30 Thread Vladimir Feinberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450836#comment-15450836
 ] 

Vladimir Feinberg commented on SPARK-15575:
---

One of the biggest issues with Breeze perf I've experienced is that a lot of 
operations you'd expect to be fast are not, and its pretty syntax and heavy use 
of implicits make it easy to hit those slow paths by accident.

For instance:
1. Mixed dense/sparse operations frequently fall back to a generic 
implementation in Breeze that uses Scala iterators.
2. Creating vectors will, under certain operations, result in unnecessary 
boxing of doubles (and of integers, for sparse vectors).
3. Slice vectors have no support for efficient operations. They are implemented 
in Breeze in a way that makes them no better than Array[Double], which again 
forces us onto Scala iterators whenever we want, for instance, a fast, 
vectorized dot product.
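
To illustrate point 1, here is a minimal, language-neutral sketch in Python of 
the difference between a generic iterator-based dot product and one specialized 
for a sparse/dense pair. It is not Breeze code: `SparseVec`, `generic_dot` and 
`sparse_dense_dot` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SparseVec:
    """Hypothetical sparse vector: sorted indices plus matching values."""
    size: int
    indices: List[int]
    values: List[float]

    def __iter__(self) -> Iterator[float]:
        # Generic element-by-element view: yields every logical entry,
        # including the implicit zeros.
        stored = dict(zip(self.indices, self.values))
        for i in range(self.size):
            yield stored.get(i, 0.0)


def generic_dot(x: Iterable[float], y: Iterable[float]) -> float:
    """Fallback path: walks both operands through their iterators, O(size)."""
    return sum(a * b for a, b in zip(x, y))


def sparse_dense_dot(x: SparseVec, y: List[float]) -> float:
    """Specialized path: touches only the stored non-zeros, O(nnz)."""
    return sum(v * y[i] for i, v in zip(x.indices, x.values))


sparse = SparseVec(size=6, indices=[1, 4], values=[2.0, 3.0])
dense = [1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0]

# Both paths agree on the result; only the amount of work differs.
print(generic_dot(sparse, dense), sparse_dense_dot(sparse, dense))
```

Both functions return the same value, but the generic path visits all six 
entries while the specialized path visits only the two stored ones. The concern 
raised above is that the generic path can be selected without the caller 
noticing.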

Usability is sometimes tough as well. Even though a Vector[Double] interface 
seems flexible, a lot of operations require explicit knowledge of the vector 
type (sparse/dense); otherwise Breeze silently uses the slow generic Scala 
implementation. Heavy use of implicits is also a problem here, since they're 
not implemented for all combinations of vector types.

It's also easy to write, e.g., `vec1 += vec2 * a * b`. This will create two 
intermediate vectors.
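
The cost of those intermediates can be sketched as follows. This is plain 
Python over lists rather than Breeze, and both function names are hypothetical. 
The first function mirrors how the expression is evaluated operator by 
operator, and the second performs the same update in place in the style of a 
BLAS `axpy` call.

```python
from typing import List


def scaled_add_naive(vec1: List[float], vec2: List[float], a: float, b: float) -> None:
    """Evaluates vec1 += vec2 * a * b one operator at a time."""
    tmp1 = [x * a for x in vec2]   # first intermediate vector
    tmp2 = [x * b for x in tmp1]   # second intermediate vector
    for i, x in enumerate(tmp2):
        vec1[i] += x


def scaled_add_fused(vec1: List[float], vec2: List[float], a: float, b: float) -> None:
    """Same update with no temporaries: fold the scalars, then update in place."""
    scale = a * b
    for i in range(len(vec1)):
        vec1[i] += scale * vec2[i]


# Values chosen so that every product is exact in floating point; in general
# the two evaluation orders may differ in the last bits.
y1 = [1.0, 1.0, 1.0]
y2 = [1.0, 1.0, 1.0]
x = [1.0, 2.0, 3.0]
scaled_add_naive(y1, x, 2.0, 0.5)
scaled_add_fused(y2, x, 2.0, 0.5)
print(y1, y2)
```

The fused version allocates nothing beyond the scalar `scale`, whereas the 
naive version allocates two full-length temporaries per call.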

I think the biggest issue is that `ml.linalg.Vector` is Breeze-backed. We 
should use our own linear algebra (we do have `BLAS`, though to support slicing 
this interface would have to be expanded) and move around `ArrayView[Double]` 
inside the vector instead.
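
As a rough sketch of what such a view could look like (in Python, with an 
assumed layout: the comment above names `ArrayView[Double]` but does not 
specify its fields), a view can be an offset, a length and a stride over a 
shared backing array, so that slicing copies nothing and a dot product runs as 
a tight indexed loop. All names and fields below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArrayView:
    """Hypothetical view: a window onto a shared backing array, no copying."""
    data: List[float]
    offset: int
    length: int
    stride: int = 1

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.data[self.offset + i * self.stride]

    def slice(self, start: int, stop: int) -> "ArrayView":
        """Returns a sub-view over [start, stop) that shares the same data."""
        if not 0 <= start <= stop <= self.length:
            raise IndexError((start, stop))
        return ArrayView(self.data, self.offset + start * self.stride,
                         stop - start, self.stride)


def dot(x: ArrayView, y: ArrayView) -> float:
    """Indexed loop over the backing arrays; no iterators, no temporaries."""
    if x.length != y.length:
        raise ValueError("length mismatch")
    total = 0.0
    xi, yi = x.offset, y.offset
    for _ in range(x.length):
        total += x.data[xi] * y.data[yi]
        xi += x.stride
        yi += y.stride
    return total


backing = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
whole = ArrayView(backing, offset=0, length=6)
head = whole.slice(0, 3)   # views 1.0, 2.0, 3.0
tail = whole.slice(3, 6)   # views 4.0, 5.0, 6.0
print(dot(head, tail))
```

Because `head` and `tail` share `backing`, slicing is O(1), and the same `dot` 
routine serves both full vectors and slices.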

Breeze as a dependency, as mentioned below, is still very useful for 
optimization. I think we can keep it around for that, as long as it's only for 
that.

> Remove breeze from dependencies?
> 
>
> Key: SPARK-15575
> URL: https://issues.apache.org/jira/browse/SPARK-15575
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> This JIRA is for discussing whether we should remove Breeze from the 
> dependencies of MLlib.  The main issues with Breeze are Scala 2.12 support 
> and performance issues.
> There are a few paths:
> # Keep dependency.  This could be OK, especially if the Scala version issues 
> are fixed within Breeze.
> # Remove dependency
> ## Implement our own linear algebra operators as needed
> ## Design a way to build Spark using custom linalg libraries of the user's 
> choice.  E.g., you could build MLlib using Breeze, or any other library 
> supporting the required operations.  This might require significant work.  
> See [SPARK-6442] for related discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-08-07 Thread xubo245 (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410923#comment-15410923
 ] 

xubo245 commented on SPARK-15575:
-

If we remove the Breeze dependency, do we need to rewrite a similar project 
ourselves? I think we could build a separate MLlib linalg project that depends 
on Breeze or another library. We could then update that project if Breeze 
cannot support Scala 2.12.





[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-07-02 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360099#comment-15360099
 ] 

Yanbo Liang commented on SPARK-15575:
-

Yes, {{LinearRegression}}, {{LogisticRegression}}, {{AFTSurvivalRegression}} 
and {{MultilayerPerceptronClassifier}} currently depend on L-BFGS/OWL-QN. We 
have also implemented Spark-owned L-BFGS/SGD optimizers in the mllib package, 
and the four estimators mentioned above use different implementations:
|| Estimators || Optimizer implementation ||
| LinearRegression | breeze L-BFGS/OWL-QN |
| LogisticRegression | breeze L-BFGS/OWL-QN |
| AFTSurvivalRegression | breeze L-BFGS/OWL-QN |
| MultilayerPerceptronClassifier | mllib L-BFGS/SGD |

The L-BFGS implementation in mllib also calls Breeze L-BFGS underneath. We 
should figure out a way to make the transition smooth. Since I've also been 
investigating a scalable version of L-BFGS (SPARK-10078) recently, I can 
start writing a draft to track the features and requirements of the optimizers 
that Spark needs. Then we can discuss and design how to move to the new 
implementation. Thanks!





[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-07-01 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358831#comment-15358831
 ] 

Nick Pentreath commented on SPARK-15575:


Also related to the discussion on SPARK-15581 and here: replacing Breeze is not 
just about replacing the core linalg operators. It also impacts the L-BFGS and 
OWL-QN optimizers, which in turn impact linear/logistic regression and MLP. So 
we would need to have a plan to replace those too.

cc [~avulanov] [~rxin] [~Xiangrui] [~josephkb] [~srowen] [~freiss] [~mboehm7] 




[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-07-01 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358820#comment-15358820
 ] 

Nick Pentreath commented on SPARK-15575:


It seems from the various discussions on SPARK-15581 and SPARK-13944, and from 
the issue in SPARK-14438, that there is enough interest and urgency around 
this that it may make sense to review linear algebra within Spark and the JVM 
in general, to refresh the current state with respect to features, performance, 
API / Java compat, and license.

I've created a (very initial and WIP) [Google 
doc|https://docs.google.com/document/d/1UWhuP-YQXW6D-IZs2Dy9CrdxraUDMFG5DdX3eo6cuEs/edit?usp=sharing]
 to begin discussion and collating the details. Please feel free to discuss 
further here or on that doc.

I'm happy to move ahead with the evaluation and performance testing (also happy 
to split up the work if desired). 




[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-05-27 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304546#comment-15304546
 ] 

Nick Pentreath commented on SPARK-15575:


What specifically are the "performance issues" with Breeze as it stands 
currently?




[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-05-27 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304135#comment-15304135
 ] 

koert kuipers commented on SPARK-15575:
---

Could we help out with porting Breeze to Scala 2.12?




[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-05-27 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304017#comment-15304017
 ] 

Sean Owen commented on SPARK-15575:
---

Hm, is Breeze really not supporting 2.12? 
It seems like a step backwards to write yet another linear algebra library, and 
go through a year of debugging it.
I don't think supporting arbitrary libs helps anything.




[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-05-26 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303465#comment-15303465
 ] 

Yanbo Liang commented on SPARK-15575:
-

I'm interested in this and would like to take a look.
