[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2017-04-15 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/15398
  
Ping @jodersky, @rxin, @mengxr, @yhuai, @hvanhovell. We also ran into this 
bug; it would be great to have the fix in the 2.2 release. Thanks.





[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...

2017-04-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17606#discussion_r111216578
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -571,6 +571,7 @@ object TypeCoercion {
 NaNvl(l, Cast(r, DoubleType))
   case NaNvl(l, r) if l.dataType == FloatType && r.dataType == 
DoubleType =>
 NaNvl(Cast(l, DoubleType), r)
+  case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, 
l.dataType))
--- End diff --

Since NaNvl evaluates to `right` when `left` is NaN, I think `right` should 
always be cast to `left`'s type. I wonder what the behavior of other engines is?
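
For reference, a minimal sketch of the eval semantics being described here 
(plain Scala rather than Catalyst code; the function name is hypothetical):

```
// NaNvl returns `right` only when `left` is NaN, so the result type should
// follow `left`; casting `right` to `left`'s type preserves that.
def naNvl(left: Float, right: Float): Float =
  if (left.isNaN) right else left
```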





[GitHub] spark pull request #17600: [MINOR][SQL] Fix the @since tag when backporting ...

2017-04-12 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/17600





[GitHub] spark pull request #17618: [SPARK-20291][SQL][BACKPORT] NaNvl(FloatType, Nul...

2017-04-12 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/17618





[GitHub] spark issue #17618: [SPARK-20291][SQL][BACKPORT] NaNvl(FloatType, NullType) ...

2017-04-12 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17618
  
Thanks.





[GitHub] spark issue #17618: [SPARK-20291][SQL][BACKPORT] NaNvl(FloatType, NullType) ...

2017-04-12 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17618
  
@viirya Added. Thanks.





[GitHub] spark pull request #17618: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...

2017-04-12 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/17618

[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

`NaNvl(float value, null)` will be converted into `NaNvl(float value, 
Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), 
Cast(null, DoubleType))`.

This causes a mismatch in the output type when the input type is float.

Adding an extra rule in TypeCoercion resolves this issue.
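
A minimal repro sketch (assuming a `SparkSession` named `spark` with its 
implicits imported; the column name is illustrative):

```
import org.apache.spark.sql.functions.{lit, nanvl}
import spark.implicits._

val df = Seq(1.0f, Float.NaN).toDF("f")
// Without the fix, lit(null) has NullType, TypeCoercion widens both sides to
// DoubleType, and the result column becomes DoubleType instead of FloatType.
df.select(nanvl($"f", lit(null))).printSchema()
```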

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17618.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17618


commit d3aa8bddac7ac738cb3de2cee0eadf1a9740eaa6
Author: DB Tsai <dbt...@dbtsai.com>
Date:   2017-04-11T00:29:33Z

fix since tag version

commit f060172004a9336d33f091b0d4e8fe795c3cafad
Author: DB Tsai <d...@netflix.com>
Date:   2017-04-12T06:36:24Z

backport







[GitHub] spark pull request #17601: [MINOR][SQL] Fix the @since tag when backporting ...

2017-04-12 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/17601





[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...

2017-04-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17606#discussion_r110991443
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -656,14 +656,20 @@ class TypeCoercionSuite extends PlanTest {
 
   test("nanvl casts") {
 ruleTest(TypeCoercion.FunctionArgumentConversion,
-  NaNvl(Literal.create(1.0, FloatType), Literal.create(1.0, 
DoubleType)),
-  NaNvl(Cast(Literal.create(1.0, FloatType), DoubleType), 
Literal.create(1.0, DoubleType)))
+  NaNvl(Literal.create(1.0f, FloatType), Literal.create(1.0, 
DoubleType)),
+  NaNvl(Cast(Literal.create(1.0f, FloatType), DoubleType), 
Literal.create(1.0, DoubleType)))
 ruleTest(TypeCoercion.FunctionArgumentConversion,
-  NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, 
FloatType)),
-  NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0, 
FloatType), DoubleType)))
+  NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0f, 
FloatType)),
+  NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0f, 
FloatType), DoubleType)))
 ruleTest(TypeCoercion.FunctionArgumentConversion,
   NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, 
DoubleType)),
   NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, 
DoubleType)))
+ruleTest(TypeCoercion.FunctionArgumentConversion,
+  NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, 
NullType)),
+  NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, 
FloatType)))
--- End diff --

Thanks. The test is fixed. :)





[GitHub] spark issue #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not...

2017-04-11 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17606
  
+cc @cloud-fan @gatorsmile @rxin 





[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...

2017-04-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17606#discussion_r110979361
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -571,6 +571,7 @@ object TypeCoercion {
 NaNvl(l, Cast(r, DoubleType))
   case NaNvl(l, r) if l.dataType == FloatType && r.dataType == 
DoubleType =>
 NaNvl(Cast(l, DoubleType), r)
+  case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, 
l.dataType))
--- End diff --

Yeah, this PR prevents casting from `NaNvl(FloatType, NullType)` to 
`NaNvl(DoubleType, DoubleType)`, since we want to minimize casting as much 
as possible. Also, if we want to replace `NaN` by `null`, we want to keep the 
output type the same as the input type.

Whether `NaNvl(FloatType, DoubleType)` should be cast into 
`NaNvl(DoubleType, DoubleType)` is another story. I agree with you that we 
should downcast the replacement `DoubleType` into `FloatType`. In my opinion, 
this implicit casting is error-prone, and we should require users to cast 
explicitly instead.

@gatorsmile, maybe you can chime in and give feedback on whether we should 
cast `NaNvl(FloatType, DoubleType)` to `NaNvl(DoubleType, DoubleType)`.
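
To make the two options concrete, a runnable sketch of the result-type 
difference, modeled on plain Scala values (illustrative only, not the actual 
Catalyst rules):

```
val f: Float = Float.NaN
val d: Double = 1.0

// current behavior: widen the float side, so the result type is Double
val widened: Double = if (f.toDouble.isNaN) d else f.toDouble
// alternative under discussion: downcast the replacement, result stays Float
val downcast: Float = if (f.isNaN) d.toFloat else f
```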





[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...

2017-04-11 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/17606

[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to 
NaNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

`NaNvl(float value, null)` will be converted into `NaNvl(float value, 
Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), 
Cast(null, DoubleType))`.

This causes a mismatch in the output type when the input type is float.

Adding an extra rule in TypeCoercion resolves this issue.

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark fixNaNvl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17606.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17606


commit fa5e1aff1319a75e89da8baf48f06b223b17eb8c
Author: DB Tsai <d...@netflix.com>
Date:   2017-04-11T08:50:24Z

Added new NaNvl type coercion rule







[GitHub] spark issue #17600: [MINOR][SQL] Fix the @since tag when backporting SPARK-1...

2017-04-10 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17600
  
Merged into master.





[GitHub] spark issue #17601: [MINOR][SQL] Fix the @since tag when backporting SPARK-1...

2017-04-10 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17601
  
Merged into master.





[GitHub] spark pull request #17601: [MINOR][SQL] Fix the @since tag when backporting ...

2017-04-10 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/17601

[MINOR][SQL] Fix the @since tag when backporting critical bugs from 2.2 
branch into 2.0 branch

## What changes were proposed in this pull request?

Fix the @since tag when backporting critical bug fixes from the 2.2 branch into 
the 2.0 branch.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17601


commit d3aa8bddac7ac738cb3de2cee0eadf1a9740eaa6
Author: DB Tsai <dbt...@dbtsai.com>
Date:   2017-04-11T00:29:33Z

fix since tag version







[GitHub] spark pull request #17600: [MINOR][SQL] Fix the @since tag when backporting ...

2017-04-10 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/17600

[MINOR][SQL] Fix the @since tag when backporting critical bugs from 2.2 
branch into 2.1 branch

## What changes were proposed in this pull request?

Fix the @since tag when backporting critical bug fixes from the 2.2 branch into 
the 2.1 branch.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17600


commit 61e2497f36a9e38b06b512ae9552a24f26448a9e
Author: DB Tsai <dbt...@dbtsai.com>
Date:   2017-04-11T00:21:23Z

update spark version







[GitHub] spark issue #17577: [SPARK-20270][SQL] na.fill should not change the values ...

2017-04-09 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17577
  
Merged into master.

@cloud-fan #15994 is still needed when a user wants to fill NaN with an 
extremely large default long value. Thanks.





[GitHub] spark issue #17577: [SPARK-20270][SQL] na.fill should not change the values ...

2017-04-09 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17577
  
Jenkins, retest this please.





[GitHub] spark pull request #17577: [SPARK-20270][SQL] na.fill should not change the ...

2017-04-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17577#discussion_r110549401
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
@@ -407,10 +407,11 @@ final class DataFrameNaFunctions private[sql](df: 
DataFrame) {
 val quotedColName = "`" + col.name + "`"
 val colValue = col.dataType match {
   case DoubleType | FloatType =>
-nanvl(df.col(quotedColName), lit(null)) // nanvl only supports 
these types
+// nanvl only supports these types
+nanvl(df.col(quotedColName), lit(null).cast(col.dataType))
--- End diff --

The float case is covered by another test in `fill with map`, but it is still 
nice to have these tests explicitly to avoid unexpected changes.
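
A small usage sketch of what the explicit cast preserves (assuming a 
`SparkSession` named `spark`; the column name is illustrative):

```
import spark.implicits._

// With lit(null).cast(col.dataType), the nanvl branch keeps the column's own
// type, so filling a float column no longer widens it to double.
val filled = Seq(Float.NaN, 2.5f).toDF("f").na.fill(0.0)
filled.printSchema()  // column "f" should remain FloatType
```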





[GitHub] spark pull request #17577: [SPARK-20270][SQL] na.fill should not change the ...

2017-04-09 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/17577

[SPARK-20270][SQL] na.fill should not change the values in long or integer 
when the default value is in double

## What changes were proposed in this pull request?

This bug was partially addressed in SPARK-18555 
(https://github.com/apache/spark/pull/15994), but the root cause wasn't 
completely solved. This bug is pretty critical since, in our application, it 
changes a Long member id whenever the id is too big to be represented 
losslessly by a Double.

Here is an example of how this happens, with
```
  Seq[(java.lang.Long, java.lang.Double)]((null, 3.14), 
(9123146099426677101L, null),
(9123146560113991650L, 1.6), (null, null)).toDF("a", 
"b").na.fill(0.2),
```
the logical plan will be
```
== Analyzed Logical Plan ==
a: bigint, b: double
Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as 
bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as 
double) AS b#241]
+- Project [_1#229L AS a#232L, _2#230 AS b#233]
   +- LocalRelation [_1#229L, _2#230]
```

Note that even when the value is not null, Spark casts the Long into Double 
first; then, if it's not null, Spark casts it back to Long, which loses 
precision.

The correct behavior is to leave the original value unchanged when it's not 
null, but Spark changes the value, which is wrong.

With the PR, the logical plan will be
```
== Analyzed Logical Plan ==
a: bigint, b: double
Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L, 
coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
+- Project [_1#229L AS a#232L, _2#230 AS b#233]
   +- LocalRelation [_1#229L, _2#230]
```
which behaves correctly, leaving the original Long values unchanged, and also 
avoids the extra cost of unnecessary casting.
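
The precision loss itself is easy to demonstrate in plain Scala (a Double has 
a 53-bit mantissa, so Longs this large cannot all be represented exactly):

```
val id = 9123146099426677101L
val roundTripped = id.toDouble.toLong
assert(roundTripped != id)  // the Double -> Long round trip loses the low bits
```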

## How was this patch tested?

Unit test added.

+cc @srowen @rxin @cloud-fan @gatorsmile 

Thanks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark fixnafill

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17577


commit cb5440787d676d2c74983ddcd1df31b38d009d71
Author: DB Tsai <d...@netflix.com>
Date:   2017-04-09T07:57:45Z

na.fill will change the values in long or integer when the default value is 
in double







[GitHub] spark issue #17419: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-03-27 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17419
  
I will be really interested to see the performance benchmarks during the QA 
period, so users can know when to use the DataFrame APIs versus the existing 
RDD APIs. Thanks.





[GitHub] spark pull request #17426: [SPARK-17137][ML][WIP] Compress logistic regressi...

2017-03-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17426#discussion_r108040945
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -617,26 +612,13 @@ class LogisticRegression @Since("1.2.0") (
   denseCoefficientMatrix.update(_ - coefficientMean)
 }
 
--- End diff --

BTW, we could compress the `coefficients` sent from the driver to the 
executors to save communication cost. We can do this in a separate PR.
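
A sketch of the idea, using the `compressed` method added in #15628 (the 
matrix here is illustrative):

```
import org.apache.spark.ml.linalg.{DenseMatrix, Matrix}

// A mostly-zero coefficient matrix: sparse storage is much smaller than
// dense here, so compressing before shipping to executors saves traffic.
val values = Array.fill(100)(0.0)
values(0) = 3.0
val coefficients: Matrix = new DenseMatrix(10, 10, values)
val compact = coefficients.compressed  // picks the cheaper of dense vs. sparse
```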





[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-24 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/15628
  
Thanks @sethah and Jenkins! Merged into master.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107972466
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -1079,4 +1267,15 @@ object Matrices {
   SparseMatrix.fromCOO(numRows, numCols, entries)
 }
   }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 
+ 12 + 12 + 1
+12L * numActives + 4L * numPtrs + 37L
--- End diff --

It would be nice if we could get `37L` using Java APIs to ensure portability.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107963752
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -1079,4 +1267,15 @@ object Matrices {
   SparseMatrix.fromCOO(numRows, numCols, entries)
 }
   }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 
+ 12 + 12 + 1
+12L * numActives + 4L * numPtrs + 37L
+  }
+
+  private[ml] def getDenseSize(numCols: Long, numRows: Long): Long = {
+// 8 * values.length + 12 + 1
--- End diff --

Can you document what the magic number `12 + 1` is? Also, can we make it

`java.lang.Double.BYTES * numCols * numRows + 13L`

since the size of a primitive type can depend on the JVM implementation?
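
A sketch of what the portable variants could look like (the `+ 13L` and 
`+ 37L` constants remain JVM-layout estimates from the existing comments; 
they are not obtainable from a Java API):

```
private[ml] def getDenseSize(numCols: Long, numRows: Long): Long =
  // 8 bytes per double value, plus estimated array header and flag overhead
  java.lang.Double.BYTES * numCols * numRows + 13L

private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long =
  // 8 bytes per value plus 4 per row index, 4 per column pointer, plus overhead
  (java.lang.Double.BYTES + java.lang.Integer.BYTES) * numActives +
    java.lang.Integer.BYTES * numPtrs + 37L
```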





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107961128
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix while maintaining the layout 
of the current matrix.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
--- End diff --

`def toSparse: SparseMatrix = toSparseMatrix(colMajor = isColMajor)`





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107966467
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -1079,4 +1267,15 @@ object Matrices {
   SparseMatrix.fromCOO(numRows, numCols, entries)
 }
   }
+
+  private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
+// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 
+ 12 + 12 + 1
+12L * numActives + 4L * numPtrs + 37L
--- End diff --

`(java.lang.Double.BYTES + java.lang.Integer.BYTES) * numActives + 
java.lang.Integer.BYTES * numPtrs + 37L`





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107961223
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +168,116 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix while maintaining the layout 
of the current matrix.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix while maintaining the layout 
of the current matrix.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed)
--- End diff --

`def toDense: DenseMatrix = toDenseMatrix(colMajor = isColMajor)`






[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107844935
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, 
-8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
+
+val dm4 = dm1.toDenseRowMajor
+assert(dm4 === dm1)
+assert(dm4.isTransposed)
+assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm5 = dm2.toDenseColMajor
+assert(dm5 === dm2)
+assert(!dm5.isTransposed)
+assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+val dm6 = dm2.toDenseRowMajor
+assert(dm6 === dm2)
+assert(dm6.isTransposed)
+assert(dm6.values.equals(dm2.values))
+
+val dm7 = dm1.toDenseRowMajor
+assert(dm7 === dm1)
+assert(dm7.isTransposed)
+assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm8 = dm1.toDenseColMajor
+assert(dm8 === dm1)
+assert(!dm8.isTransposed)
+assert(dm8.values.equals(dm1.values))
+
+val dm9 = dm2.toDense
+assert(dm9 === dm2)
+assert(dm9.isTransposed)
+assert(dm9.values.equals(dm2.values))
+  }
 
-val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+/*
+  dm1 = 0.0 4.0 5.0
+0.0 2.0 0.0
+
+  dm2 = 0.0 4.0 5.0
+0.0 2.0 0.0
 
-val spMat2 = deMat1.toSparse
-val deMat2 = spMat1.toDense
+  dm3 = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), 
isTransposed = true)
+val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+val sm1 = dm1.toSparseColMajor
+assert(sm1 === dm1)
+assert(!sm1.isTransposed)
+assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+val sm2 = dm1.toSparseRowMajor
+assert(sm2 === dm1)
+assert(sm2.isTransposed)
+assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+val sm3 = dm2.toSparseColMajor
+assert(sm3 === dm2)
+assert(!sm3.isTransposed)
+assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+val sm4 = dm2.toSparseRowMajor
+assert(sm4 === dm2)
+assert(sm4.isTransposed)
+assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+val sm5 = dm3.toSparseColMajor
+assert(sm5 === dm3)
+assert(sm5.values === Array.empty[Double])
+assert(!sm5.isTransposed)
+
+val sm6 = dm3.toSparseRowMajor
+assert(sm6 === dm3)
+assert(sm6.values === Array.empty[Double])
+assert(sm6.isTransposed)
+
+val sm7 = dm1.toSparse
+assert(sm7 === dm1)
+assert(sm7.values === Array(4.0, 2.0, 5.0))
+assert(!sm7.isTransposed)
+
+val sm8 = dm1.toSparseColMajor
+assert(sm8 === dm1)
+assert(sm8.values === Array(4.0, 2.0, 5.0))
+assert(!sm8.isTransposed)
+
+val sm9 = dm2.toSparseRowMajor
+assert(sm9 === dm2)
+assert(sm9.values === Array(4.0, 5.0, 2.0))
+assert(sm9.isTransposed)
+
+val sm10 = dm2.toSparse
+assert(sm10 === dm2)
+assert(sm10.values === Array(4.0, 5.0, 2.0))
+assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+/*
+  sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+  0.0 2.0 0.0
+  smZeros = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), 
Array(4.0, 2.0, 5.0))
+val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), 
Array(4.0, 5.0, 2.0),
+  isTransposed = true)
+val sm3 = new Spars

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107831775
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new 
matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be 
in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix 
= {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), 
spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in 
column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = 
{
+if (isTransposed && colMajor) {
+  new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
+} else if (!isTransposed && !colMajor) {
+  new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = 
true)
+} else {
+  this
 }
-new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), 
spVals.result())
   }
 
--- End diff --

Sounds good. Let's do it in another PR. Thanks.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832496
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever 
uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses 
less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row 
major, or sparse column
+   * major format, whichever uses less storage. When dense representation 
is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so 
maintain current layout
+  toDenseMatrix(!isTransposed)
+} else {
+  if (cscSize <= csrSize) {
+toSparseMatrix(colMajor = true)
+  } else {
+toSparseMatrix(colMajor = false)
--- End diff --

ditto





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836197
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same `MatricesSuite.scala` test diff as quoted above)

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836216
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, 
-8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
+
+val dm4 = dm1.toDenseRowMajor
+assert(dm4 === dm1)
+assert(dm4.isTransposed)
+assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm5 = dm2.toDenseColMajor
+assert(dm5 === dm2)
+assert(!dm5.isTransposed)
+assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+val dm6 = dm2.toDenseRowMajor
+assert(dm6 === dm2)
+assert(dm6.isTransposed)
+assert(dm6.values.equals(dm2.values))
+
+val dm7 = dm1.toDenseRowMajor
+assert(dm7 === dm1)
+assert(dm7.isTransposed)
+assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm8 = dm1.toDenseColMajor
+assert(dm8 === dm1)
+assert(!dm8.isTransposed)
+assert(dm8.values.equals(dm1.values))
+
+val dm9 = dm2.toDense
+assert(dm9 === dm2)
+assert(dm9.isTransposed)
+assert(dm9.values.equals(dm2.values))
+  }
 
-val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+/*
+  dm1 = 0.0 4.0 5.0
+0.0 2.0 0.0
+
+  dm2 = 0.0 4.0 5.0
+0.0 2.0 0.0
 
-val spMat2 = deMat1.toSparse
-val deMat2 = spMat1.toDense
+  dm3 = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), 
isTransposed = true)
+val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+val sm1 = dm1.toSparseColMajor
+assert(sm1 === dm1)
+assert(!sm1.isTransposed)
+assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+val sm2 = dm1.toSparseRowMajor
+assert(sm2 === dm1)
+assert(sm2.isTransposed)
+assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+val sm3 = dm2.toSparseColMajor
+assert(sm3 === dm2)
+assert(!sm3.isTransposed)
+assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+val sm4 = dm2.toSparseRowMajor
+assert(sm4 === dm2)
+assert(sm4.isTransposed)
+assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+val sm5 = dm3.toSparseColMajor
+assert(sm5 === dm3)
+assert(sm5.values === Array.empty[Double])
+assert(!sm5.isTransposed)
+
+val sm6 = dm3.toSparseRowMajor
+assert(sm6 === dm3)
+assert(sm6.values === Array.empty[Double])
+assert(sm6.isTransposed)
+
+val sm7 = dm1.toSparse
+assert(sm7 === dm1)
+assert(sm7.values === Array(4.0, 2.0, 5.0))
+assert(!sm7.isTransposed)
+
+val sm8 = dm1.toSparseColMajor
+assert(sm8 === dm1)
+assert(sm8.values === Array(4.0, 2.0, 5.0))
+assert(!sm8.isTransposed)
+
+val sm9 = dm2.toSparseRowMajor
+assert(sm9 === dm2)
+assert(sm9.values === Array(4.0, 5.0, 2.0))
+assert(sm9.isTransposed)
+
+val sm10 = dm2.toSparse
+assert(sm10 === dm2)
+assert(sm10.values === Array(4.0, 5.0, 2.0))
+assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+/*
+  sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+  0.0 2.0 0.0
+  smZeros = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), 
Array(4.0, 2.0, 5.0))
+val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), 
Array(4.0, 5.0, 2.0),
+  isTransposed = true)
+val sm3 = new Spars

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107837259
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
(same `MatricesSuite.scala` test diff as quoted above)

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107837491
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107834704
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
--- End diff --

Can you group the tests either by `dm1` and `dm2` or by the same methods?
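For illustration, grouping by fixture could look like the sketch below (a hypothetical rearrangement, assuming the 2.2.0 conversion methods added in this PR, where `toDense` preserves the current layout):

```scala
import org.apache.spark.ml.linalg.DenseMatrix

val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)

// Group 1: every conversion of dm1 (stored column major).
assert(!dm1.toDense.isTransposed)         // toDense keeps the current layout
assert(!dm1.toDenseColMajor.isTransposed)
assert(dm1.toDenseRowMajor.isTransposed)

// Group 2: every conversion of dm2 (stored row major).
assert(dm2.toDense.isTransposed)          // layout preserved here too
assert(!dm2.toDenseColMajor.isTransposed)
assert(dm2.toDenseRowMajor.isTransposed)
```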





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832751
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +720,67 @@ class SparseMatrix @Since("2.0.0") (
 }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
*/
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!isTransposed && !colMajor) {
+  // it is col major and we want row major, use breeze to remove explicit zeros
+  val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t
+  Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix]
+} else if (isTransposed && colMajor) {
+  // it is row major and we want col major, use breeze to remove explicit zeros
+  val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]]
+  Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix]
+} else {
+  val nnz = numNonzeros
+  if (nnz != numActives) {
+// remove explicit zeros
+val rr = new Array[Int](nnz)
+val vv = new Array[Double](nnz)
+val numPtrs = if (isTransposed) numRows else numCols
+val cc = new Array[Int](numPtrs + 1)
+var nzIdx = 0
+var j = 0
+while (j < numPtrs) {
+  var idx = colPtrs(j)
+  val idxEnd = colPtrs(j + 1)
+  cc(j) = nzIdx
+  while (idx < idxEnd) {
+if (values(idx) != 0.0) {
+  vv(nzIdx) = values(idx)
+  rr(nzIdx) = rowIndices(idx)
+  nzIdx += 1
+}
+idx += 1
+  }
+  j += 1
+}
+cc(j) = nnz
+new SparseMatrix(numRows, numCols, cc, rr, vv, isTransposed = isTransposed)
+  } else {
+this
+  }
+}
   }
 
-  override def numNonzeros: Int = values.count(_ != 0)
-
-  override def numActives: Int = values.length
+  /**
+   * Generate a `DenseMatrix` from the given `SparseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values are in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (colMajor) new DenseMatrix(numRows, numCols, toArray)
--- End diff --

`new DenseMatrix(numRows, numCols, this.toArray, isTransposed = false)` to make the style consistent.
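For context, a minimal sketch of what the compaction path above observably does (assuming the 2.2.0 `toSparseColMajor` API from this PR; the matrix below is a hypothetical fixture):

```scala
import org.apache.spark.ml.linalg.SparseMatrix

// A 2x2 CSC matrix storing an explicit zero at (1, 0):
// three active entries, but only two true nonzeros.
val m = new SparseMatrix(2, 2, Array(0, 2, 3), Array(0, 1, 1), Array(1.0, 0.0, 3.0))
assert(m.numActives == 3 && m.numNonzeros == 2)

// Since nnz != numActives, the compaction loop rebuilds colPtrs,
// rowIndices, and values without the explicit zero.
val c = m.toSparseColMajor
assert(c.numActives == 2)
assert(c.values.sameElements(Array(1.0, 3.0)))
```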





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836205
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835539
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835746
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107837113
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836138
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832900
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
--- End diff --

nit, with `this`
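A usage sketch of the size-based choice above (the exact byte formulas are internal to the PR, so only the unambiguous extremes are asserted here):

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, SparseMatrix}

// Fully populated 2x2: the dense layout uses less storage, so
// compressedColMajor should come back dense.
val full = new DenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))
assert(full.compressedColMajor.isInstanceOf[DenseMatrix])

// 100x100 with a single nonzero: CSC is far smaller.
val nearlyEmpty = SparseMatrix.fromCOO(100, 100, Seq((0, 0, 1.0)))
assert(nearlyEmpty.compressedColMajor.isInstanceOf[SparseMatrix])
```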





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107837448
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107836155
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835989
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835326
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
--- End diff --

`val dm4 = dm1.toDenseRowMajor` and `val dm7 = dm1.toDenseRowMajor` are the same.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832481
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
+} else {
+  if (cscSize <= csrSize) {
+toSparseMatrix(colMajor = true)
--- End diff --

ditto.
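For intuition about the comparison driving `compressed`, a rough storage model (a sketch assuming 8-byte values and 4-byte indices, ignoring object headers; not the exact internal formula):

```scala
// Dense stores 8 bytes per entry. CSC stores 8 (value) + 4 (row index)
// bytes per active entry, plus 4 bytes per column pointer. So the sparse
// layout wins roughly when fewer than ~2/3 of the entries are nonzero.
def denseBytes(numRows: Int, numCols: Int): Double = 8.0 * numRows * numCols
def cscBytes(numCols: Int, nnz: Int): Double = 12.0 * nnz + 4.0 * (numCols + 1)

// e.g. a 1000x1000 matrix with 500k nonzeros: ~6.0e6 bytes as CSC
// versus 8.0e6 bytes dense, so `compressed` would pick sparse.
assert(cscBytes(1000, 500000) < denseBytes(1000, 1000))
```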





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107837410
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835213
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
+
+val dm4 = dm1.toDenseRowMajor
+assert(dm4 === dm1)
+assert(dm4.isTransposed)
+assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm5 = dm2.toDenseColMajor
+assert(dm5 === dm2)
+assert(!dm5.isTransposed)
+assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+val dm6 = dm2.toDenseRowMajor
+assert(dm6 === dm2)
+assert(dm6.isTransposed)
+assert(dm6.values.equals(dm2.values))
+
+val dm7 = dm1.toDenseRowMajor
+assert(dm7 === dm1)
+assert(dm7.isTransposed)
+assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm8 = dm1.toDenseColMajor
+assert(dm8 === dm1)
+assert(!dm8.isTransposed)
+assert(dm8.values.equals(dm1.values))
+
+val dm9 = dm2.toDense
+assert(dm9 === dm2)
+assert(dm9.isTransposed)
+assert(dm9.values.equals(dm2.values))
+  }
 
-val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+/*
+  dm1 = 0.0 4.0 5.0
+0.0 2.0 0.0
+
+  dm2 = 0.0 4.0 5.0
+0.0 2.0 0.0
 
-val spMat2 = deMat1.toSparse
-val deMat2 = spMat1.toDense
+  dm3 = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+val sm1 = dm1.toSparseColMajor
+assert(sm1 === dm1)
+assert(!sm1.isTransposed)
+assert(sm1.values === Array(4.0, 2.0, 5.0))
+
--- End diff --

You tested `dm1.toSparseColMajor` twice.

It would be nice to group them like
```scala
val sm1 = dm1.toSparseColMajor

val sm2 = dm2.toSparseColMajor

val sm3 = dm3.toSparseColMajor

val sm4 = dm1.toSparseRowMajor

val sm5 = dm2.toSparseRowMajor

val sm6 = dm3.toSparseRowMajor

val sm7 = dm1.toSparse

val sm8 = dm2.toSparse

val sm9 = dm3.toSparse
```





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832469
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
--- End diff --

I don't see the change here.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107835655
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,395 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
+
+val dm4 = dm1.toDenseRowMajor
+assert(dm4 === dm1)
+assert(dm4.isTransposed)
+assert(dm4.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm5 = dm2.toDenseColMajor
+assert(dm5 === dm2)
+assert(!dm5.isTransposed)
+assert(dm5.values === Array(5.0, 1.0, -9.0, -3.0, 4.0, -8.0))
+
+val dm6 = dm2.toDenseRowMajor
+assert(dm6 === dm2)
+assert(dm6.isTransposed)
+assert(dm6.values.equals(dm2.values))
+
+val dm7 = dm1.toDenseRowMajor
+assert(dm7 === dm1)
+assert(dm7.isTransposed)
+assert(dm7.values === Array(4.0, 2.0, -8.0, -1.0, 7.0, 4.0))
+
+val dm8 = dm1.toDenseColMajor
+assert(dm8 === dm1)
+assert(!dm8.isTransposed)
+assert(dm8.values.equals(dm1.values))
+
+val dm9 = dm2.toDense
+assert(dm9 === dm2)
+assert(dm9.isTransposed)
+assert(dm9.values.equals(dm2.values))
+  }
 
-val spMat1 = new SparseMatrix(m, n, colPtrs, rowIndices, values)
-val deMat1 = new DenseMatrix(m, n, allValues)
+  test("dense to sparse") {
+/*
+  dm1 = 0.0 4.0 5.0
+0.0 2.0 0.0
+
+  dm2 = 0.0 4.0 5.0
+0.0 2.0 0.0
 
-val spMat2 = deMat1.toSparse
-val deMat2 = spMat1.toDense
+  dm3 = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(0.0, 0.0, 4.0, 2.0, 5.0, 0.0))
+val dm2 = new DenseMatrix(2, 3, Array(0.0, 4.0, 5.0, 0.0, 2.0, 0.0), isTransposed = true)
+val dm3 = new DenseMatrix(2, 3, Array(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
+
+val sm1 = dm1.toSparseColMajor
+assert(sm1 === dm1)
+assert(!sm1.isTransposed)
+assert(sm1.values === Array(4.0, 2.0, 5.0))
+
+val sm2 = dm1.toSparseRowMajor
+assert(sm2 === dm1)
+assert(sm2.isTransposed)
+assert(sm2.values === Array(4.0, 5.0, 2.0))
+
+val sm3 = dm2.toSparseColMajor
+assert(sm3 === dm2)
+assert(!sm3.isTransposed)
+assert(sm3.values === Array(4.0, 2.0, 5.0))
+
+val sm4 = dm2.toSparseRowMajor
+assert(sm4 === dm2)
+assert(sm4.isTransposed)
+assert(sm4.values === Array(4.0, 5.0, 2.0))
+
+val sm5 = dm3.toSparseColMajor
+assert(sm5 === dm3)
+assert(sm5.values === Array.empty[Double])
+assert(!sm5.isTransposed)
+
+val sm6 = dm3.toSparseRowMajor
+assert(sm6 === dm3)
+assert(sm6.values === Array.empty[Double])
+assert(sm6.isTransposed)
+
+val sm7 = dm1.toSparse
+assert(sm7 === dm1)
+assert(sm7.values === Array(4.0, 2.0, 5.0))
+assert(!sm7.isTransposed)
+
+val sm8 = dm1.toSparseColMajor
+assert(sm8 === dm1)
+assert(sm8.values === Array(4.0, 2.0, 5.0))
+assert(!sm8.isTransposed)
+
+val sm9 = dm2.toSparseRowMajor
+assert(sm9 === dm2)
+assert(sm9.values === Array(4.0, 5.0, 2.0))
+assert(sm9.isTransposed)
+
+val sm10 = dm2.toSparse
+assert(sm10 === dm2)
+assert(sm10.values === Array(4.0, 5.0, 2.0))
+assert(sm10.isTransposed)
+  }
+
+  test("sparse to sparse") {
+/*
+  sm1 = sm2 = sm3 = sm4 = 0.0 4.0 5.0
+  0.0 2.0 0.0
+  smZeros = 0.0 0.0 0.0
+0.0 0.0 0.0
+ */
+val sm1 = new SparseMatrix(2, 3, Array(0, 0, 2, 3), Array(0, 1, 0), Array(4.0, 2.0, 5.0))
+val sm2 = new SparseMatrix(2, 3, Array(0, 2, 3), Array(1, 2, 1), Array(4.0, 5.0, 2.0),
+  isTransposed = true)
+val sm3 = new Spars

[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832867
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
--- End diff --

`this.toDense`





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107832826
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (isTransposed && colMajor) {
+  new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
+} else if (!isTransposed && !colMajor) {
+  new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
--- End diff --

I'd call `this.toArray` and `this.transpose.toArray`, as you did in other places, to make the receiver explicit.
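
Roughly, as a sketch of the branch above with the receiver spelled out (same logic as the diff, just more explicit):

```scala
private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
  if (isTransposed && colMajor) {
    // row major storage, column major requested: copy into a fresh array
    new DenseMatrix(numRows, numCols, this.toArray, isTransposed = false)
  } else if (!isTransposed && !colMajor) {
    // column major storage, row major requested
    new DenseMatrix(numRows, numCols, this.transpose.toArray, isTransposed = true)
  } else {
    // already in the requested layout
    this
  }
}
```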





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107784952
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
+} else {
+  if (cscSize <= csrSize) {
+toSparseMatrix(colMajor = true)
+  } else {
+toSparseMatrix(colMajor = false)
--- End diff --

`toSparseRowMajor`





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107785054
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
--- End diff --

Call `toDense` if we decide to make `toDense` and `toSparse` output the same layout.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107784897
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,118 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparseColMajor: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toSparseRowMajor: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in dense or sparse column major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedColMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = true)) {
+  toDenseColMajor
+} else {
+  toSparseColMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense or sparse row major format, whichever uses less storage.
+   */
+  @Since("2.2.0")
+  def compressedRowMajor: Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor = false)) {
+  toDenseRowMajor
+} else {
+  toSparseRowMajor
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row major, or sparse column
+   * major format, whichever uses less storage. When dense representation is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(colMajor = true)
+val csrSize = getSparseSizeInBytes(colMajor = false)
+if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
+  // dense matrix size is the same for column major and row major, so maintain current layout
+  toDenseMatrix(!isTransposed)
+} else {
+  if (cscSize <= csrSize) {
+toSparseMatrix(colMajor = true)
--- End diff --

`toSparseColMajor`





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107794035
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (isTransposed && colMajor) {
+  new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
+} else if (!isTransposed && !colMajor) {
+  new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
+} else {
+  this
 }
-new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
   }
 
--- End diff --

Could we override `toArray` in `DenseMatrix` so that when `this` is column major, we just return `this.values`? Otherwise, it's very expensive to create a new array.
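
A minimal sketch of what I mean, assuming `values` is the column major backing array and `foreachActive` is available:

```scala
override def toArray: Array[Double] = {
  if (!isTransposed) {
    // already column major: the backing array is exactly the flattened matrix
    values
  } else {
    // row major storage: copy into a fresh column major array
    val arr = new Array[Double](numRows * numCols)
    foreachActive { (i, j, v) => arr(j * numRows + i) = v }
    arr
  }
}
```

The trade-off is that callers would share the backing array in the column major case, so they must not mutate it.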





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107791878
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +404,49 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
--- End diff --

Minor, can we have 
```scala
protected def isColMajor = !isTransposed

protected def isRowMajor = isTransposed
```

So the code can be understood more easily?
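
With the helpers, the layout checks in `toDenseMatrix` from the diff above would read as (sketch):

```scala
if (isRowMajor && colMajor) {
  new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
} else if (isColMajor && !colMajor) {
  new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
} else {
  this  // already in the requested layout
}
```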





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107781333
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
--- End diff --

Ditto. Should we consider maintaining the same layout?





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107779234
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

But I thought this is a new API being added, so we can make it maintain the same layout.
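
If we do, it could be a one-liner on top of the private `toSparseMatrix` (sketch):

```scala
// keep the current layout: column major iff this matrix is not transposed
def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
```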





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107557490
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (!(isTransposed ^ colMajor)) {
+  val newValues = new Array[Double](numCols * numRows)
--- End diff --

This looks great to me!





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107556503
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
 }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
*/
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!(colMajor ^ isTransposed)) {
+  // breeze transpose rearranges values in column major and removes explicit zeros
--- End diff --

This is not a blocker.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107519619
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

After thinking about it again, let's have it as `toSparseColumnMajor` to make the APIs consistent with the dense ones, if you don't mind.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107518109
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
--- End diff --

BTW, it's nice to have a return type on public methods. Can you add `Unit` as the return type?





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107519221
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-var nnz = 0
-var j = 0
-while (j < numCols) {
-  var i = 0
-  while (i < numRows) {
-val v = values(index(i, j))
-if (v != 0.0) {
-  rowIndices += i
-  spVals += v
-  nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+else {
+  val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+  val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+  val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+  var nnz = 0
+  var j = 0
+  while (j < numCols) {
+var i = 0
+while (i < numRows) {
+  val v = values(index(i, j))
+  if (v != 0.0) {
+rowIndices += i
+spVals += v
+nnz += 1
+  }
+  i += 1
 }
-i += 1
+j += 1
+colPtrs(j) = nnz
   }
-  j += 1
-  colPtrs(j) = nnz
+  new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+}
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+if (!(isTransposed ^ colMajor)) {
+  val newValues = new Array[Double](numCols * numRows)
--- End diff --

The following could work, and we only need one implementation in the trait. Thanks.

```scala
trait Matrix {

  var isTransposed: Boolean = false
  var numRows: Int = 0
  var numCols: Int = 0

  def foreachActive(f: (Int, Int, Double) => Unit): Unit

  // one shared implementation in the trait covers both dense and sparse inputs
  def toDenseMatrix(colMajor: Boolean): Matrix = {
    this match {
      case dm: DenseMatrix if dm.isTransposed != colMajor =>
        // already stored in the requested layout, so no copy is needed
        dm
      case _ =>
        val newValues = new Array[Double](numRows * numCols)
        this.foreachActive { case (row, col, value) =>
          // place each active value at its position in the requested layout
          val idx = if (colMajor) col * numRows + row else row * numCols + col
          newValues(idx) = value
        }
        new DenseMatrix(numRows, numCols, newValues, transposed = !colMajor)
    }
  }
}

class DenseMatrix(rows: Int, cols: Int, val values: Array[Double],
    transposed: Boolean = false) extends Matrix {
  numRows = rows; numCols = cols; isTransposed = transposed

  def foreachActive(f: (Int, Int, Double) => Unit): Unit = {
    // iterate every (row, col, value) in storage order; elided in this sketch
  }
}

class SparseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = {
    // iterate only the stored (row, col, value) triples; elided in this sketch
  }
}
```





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107313343
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,385 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
--- End diff --

Why not just build `dm2` via `transpose` instead of explicitly assigning the values? That way, you don't need to type out the values in the array for the comparison.
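
For example (hypothetical, equivalent to the `dm2` above):

```scala
// build a 3 x 2 column major matrix and transpose it: the result is the same
// 2 x 3 row major matrix as passing isTransposed = true, sharing the values array
val dm2 = new DenseMatrix(3, 2, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0)).transpose
```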





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107312629
  
--- Diff: 
mllib-local/src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala ---
@@ -160,22 +160,385 @@ class MatricesSuite extends SparkMLFunSuite {
 assert(sparseMat.values(2) === 10.0)
   }
 
-  test("toSparse, toDense") {
-val m = 3
-val n = 2
-val values = Array(1.0, 2.0, 4.0, 5.0)
-val allValues = Array(1.0, 2.0, 0.0, 0.0, 4.0, 5.0)
-val colPtrs = Array(0, 2, 4)
-val rowIndices = Array(0, 1, 1, 2)
+  test("dense to dense") {
+/*
+  dm1 =  4.0 2.0 -8.0
+-1.0 7.0  4.0
+
+  dm2 = 5.0 -9.0  4.0
+1.0 -3.0 -8.0
+ */
+val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
+val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
+
+val dm3 = dm1.toDense
+assert(dm3 === dm1)
+assert(!dm3.isTransposed)
+assert(dm3.values.equals(dm1.values))
+
+val dm4 = dm1.toDenseMatrix(false)
--- End diff --

I would like to make `toDenseMatrix` private, and test against `toDenseRowMajor`, which is more explicit.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107312905
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

I'm debating whether we should keep the same layout ordering when we call `toSparse` or `toDense`.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107306663
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

I'm not good at naming, but since we use `toDenseRowMajor` on the dense side, should we use `toSparseColumnMajor`? Many packages use `toCSC`, but I think we can make them consistent. Just my 2 cents.





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107312194
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
 }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values are in column major
+   *                 order.
*/
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+if (!(colMajor ^ isTransposed)) {
+  // breeze transpose rearranges values in column major and removes explicit zeros
+  if (!isTransposed) {
+// it is row major and we want col major
+val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]].t
+Matrices.fromBreeze(breezeTransposed).transpose.asInstanceOf[SparseMatrix]
+  } else {
+// it is col major and we want row major
+val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]]
+Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix]
+  }
+} else {
--- End diff --

Can we document here that this is the case where the layout of `this` differs from `colMajor`? That reads more easily than the `(colMajor ^ isTransposed)` condition. It would be even more readable to use pattern matching with exact booleans on both variables.
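
For instance, a sketch of the pattern matching variant (placeholders for the two bodies):

```scala
(colMajor, isTransposed) match {
  case (true, true) | (false, false) =>
    // requested layout differs from the stored one: reorder via breeze transpose
    ???
  case _ =>
    // already stored in the requested layout: only remove explicit zeros
    ???
}
```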





[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107307120
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toDenseRowMajor: DenseMatrix = toDenseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDenseColMajor: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses less storage.
+   *
+   * @param colMajor Whether the values of the resulting matrix should be in column major
+   *                 or row major order. If `false`, resulting matrix will be row major.
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
--- End diff --

Let's make it private and follow the previous style. We should add `compressedRowMajor` and `compressedColumnMajor`, since the result can be a dense matrix in certain situations.
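
For reference, a sketch of what those public variants could look like on top of the private `compressed(colMajor: Boolean)` from the diff above (names taken from this comment, not from merged code):

```scala
/** Returns this matrix compressed into column major dense or sparse form, whichever is smaller. */
@Since("2.2.0")
def compressedColumnMajor: Matrix = compressed(colMajor = true)

/** Returns this matrix compressed into row major dense or sparse form, whichever is smaller. */
@Since("2.2.0")
def compressedRowMajor: Matrix = compressed(colMajor = false)
```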


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107309786
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -291,31 +395,60 @@ class DenseMatrix @Since("2.0.0") (
   override def numActives: Int = values.length
 
   /**
-   * Generate a `SparseMatrix` from the given `DenseMatrix`. The new 
matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from the given `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `SparseMatrix` values will be 
in column major order.
*/
-  @Since("2.0.0")
-  def toSparse: SparseMatrix = {
-    val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
-    val colPtrs: Array[Int] = new Array[Int](numCols + 1)
-    val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
-    var nnz = 0
-    var j = 0
-    while (j < numCols) {
-      var i = 0
-      while (i < numRows) {
-        val v = values(index(i, j))
-        if (v != 0.0) {
-          rowIndices += i
-          spVals += v
-          nnz += 1
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!colMajor) this.transpose.toSparseMatrix(colMajor = true).transpose
+    else {
+      val spVals: MArrayBuilder[Double] = new MArrayBuilder.ofDouble
+      val colPtrs: Array[Int] = new Array[Int](numCols + 1)
+      val rowIndices: MArrayBuilder[Int] = new MArrayBuilder.ofInt
+      var nnz = 0
+      var j = 0
+      while (j < numCols) {
+        var i = 0
+        while (i < numRows) {
+          val v = values(index(i, j))
+          if (v != 0.0) {
+            rowIndices += i
+            spVals += v
+            nnz += 1
+          }
+          i += 1
         }
-        i += 1
+        j += 1
+        colPtrs(j) = nnz
       }
-      j += 1
-      colPtrs(j) = nnz
+      new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
+    }
+  }
+
+  /**
+   * Generate a `DenseMatrix` from this `DenseMatrix`.
+   *
+   * @param colMajor Whether the resulting `DenseMatrix` values will be in 
column major order.
+   */
+  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
+    if (!(isTransposed ^ colMajor)) {
+      val newValues = new Array[Double](numCols * numRows)
--- End diff --

Would it be simpler to use `foreachActive`? With it, `toDenseMatrix` could have a single implementation in the trait, shared by both the sparse and dense subclasses.
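
As a rough illustration of that idea, a standalone sketch (assumed helper names, not the PR's code) that builds a column major `DenseMatrix` from any enumerator of active entries:

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices}

object DenseConversionSketch {
  // Sketch only: one shared conversion for any matrix that can enumerate its
  // active (i, j, value) entries, regardless of how those entries are stored.
  def toDenseColMajor(numRows: Int, numCols: Int)(
      foreachActive: ((Int, Int, Double) => Unit) => Unit): DenseMatrix = {
    val values = new Array[Double](numRows * numCols)
    foreachActive { (i, j, v) => values(j * numRows + i) = v } // column major slot
    new DenseMatrix(numRows, numCols, values)
  }

  def main(args: Array[String]): Unit = {
    val m = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
    // Matrix.foreachActive is private[ml] today, so simulate it via apply(i, j)
    val iterate = (f: (Int, Int, Double) => Unit) =>
      for (j <- 0 until m.numCols; i <- 0 until m.numRows) f(i, j, m(i, j))
    println(toDenseColMajor(m.numRows, m.numCols)(iterate))
  }
}
```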


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107311742
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -587,18 +722,69 @@ class SparseMatrix @Since("2.0.0") (
 }
   }
 
+  override def numNonzeros: Int = values.count(_ != 0)
+
+  override def numActives: Int = values.length
+
   /**
-   * Generate a `DenseMatrix` from the given `SparseMatrix`. The new 
matrix will have isTransposed
-   * set to false.
+   * Generate a `SparseMatrix` from this `SparseMatrix`, removing explicit 
zero values if they
+   * exist.
+   *
+   * @param colMajor Whether or not the resulting `SparseMatrix` values 
are in column major
+   *order.
*/
-  @Since("2.0.0")
-  def toDense: DenseMatrix = {
-    new DenseMatrix(numRows, numCols, toArray)
+  private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
+    if (!(colMajor ^ isTransposed)) {
+      // breeze transpose rearranges values in column major and removes explicit zeros
--- End diff --

I think it's hacky to rely on Breeze's transpose behavior to remove explicit zeros from sparse matrices. Can we have our own implementation, given that we may eventually remove the Breeze dependency?
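
For reference, a minimal Breeze-free sketch of the zero-removal step, operating directly on CSC arrays (a standalone helper under assumed names, not proposed code):

```scala
import scala.collection.mutable.ArrayBuffer

object DropExplicitZeros {
  // Sketch only: rebuild the CSC arrays keeping just the entries that are non-zero.
  def apply(numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int],
      values: Array[Double]): (Array[Int], Array[Int], Array[Double]) = {
    val newColPtrs = new Array[Int](numCols + 1)
    val newRows = ArrayBuffer.empty[Int]
    val newVals = ArrayBuffer.empty[Double]
    var j = 0
    while (j < numCols) {
      var k = colPtrs(j)
      while (k < colPtrs(j + 1)) {
        if (values(k) != 0.0) { newRows += rowIndices(k); newVals += values(k) }
        k += 1
      }
      newColPtrs(j + 1) = newRows.length // column pointer after dropping zeros
      j += 1
    }
    (newColPtrs, newRows.toArray, newVals.toArray)
  }
}
```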


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107306774
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
--- End diff --

Same question.


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107303875
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses 
less storage.
+   *
+   * @param colMajor Whether the values of the resulting matrix should be 
in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+  toDenseMatrix(colMajor)
+} else {
+  toSparseMatrix(colMajor)
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row 
major, or sparse column
+   * major format, whichever uses less storage. When dense representation 
is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --

+1 on the latter one.


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r107303720
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --

Fair enough.


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105272926
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
--- End diff --

Should be fine. Small enough change :)


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105274636
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
--- End diff --

we may `inline` this as well.


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105272811
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses 
less storage.
+   *
+   * @param colMajor Whether the values of the resulting matrix should be 
in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  @Since("2.2.0")
+  def compressed(colMajor: Boolean): Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(colMajor)) {
+  toDenseMatrix(colMajor)
+} else {
+  toSparseMatrix(colMajor)
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row 
major, or sparse column
+   * major format, whichever uses less storage. When dense representation 
is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.2.0")
+  def compressed: Matrix = {
--- End diff --

Won't `compressed(colMajor: Boolean)` and `compressed` cause an overloading ambiguity issue?


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105273751
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
--- End diff --

`toCSR`?


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105273579
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
--- End diff --

How about we follow [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) and call it `toCSC`?


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105273397
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
--- End diff --

Maybe we can `inline` this?


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105274573
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +154,97 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param colMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a sparse matrix in row major order.
+   */
+  @Since("2.2.0")
+  def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param colMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toDenseMatrix(colMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.2.0")
+  def toDense: DenseMatrix = toDenseMatrix(colMajor = true)
+
--- End diff --

Could we add `toColumnMajorDense` and `toRowMajorDense`?


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105032243
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
--- End diff --

Can you make `foreachActive(f: (Int, Int, Double) => Unit)` public? It is already public for vectors. I believe it would be very useful, and I think it's stable enough to make public.
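
A hypothetical usage sketch of why this would be useful, assuming `foreachActive` were public as requested (today it is `private[ml]`, so this compiles only from inside the `ml` package):

```scala
import org.apache.spark.ml.linalg.Matrices

object ForeachActiveDemo {
  def main(args: Array[String]): Unit = {
    val m = Matrices.dense(2, 2, Array(1.0, 0.0, 3.0, 4.0))
    // count non-zero entries per column in one pass over the active entries
    val colNnz = new Array[Int](m.numCols)
    m.foreachActive { (_, j, v) => if (v != 0.0) colNnz(j) += 1 }
    println(colNnz.mkString(",")) // 1,2
  }
}
```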


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r105068149
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
--- End diff --

Yeah, this is very hacky in my opinion too! 

The problem is that when one overloads a parameterless (no-parentheses) method, ambiguity arises: invoking the method without parentheses can mean either calling the parameterless version or referring to the parameterized overload as a function value. The following example demonstrates the issue.

In my opinion, I would like to call it `toSparse(columnMajor: Boolean)` with `toSparse() = toSparse(true)`, but the vector API already uses the version without parentheses, so that would result in an inconsistency in the API design.

I think exposing the ability to convert to `columnMajor` or `rowMajor` order is very useful; as a result, we can expose it as `toCSRMatrix`, `toCSCMatrix`, and `toSparse`, where the last converts the matrix to whichever form has the smallest storage.

```scala
scala> trait A {
 | def foo(b: Boolean): String
 | def foo: String = foo(true)
 | }
defined trait A

scala> class B extends A {
 | def foo(b: Boolean): String = b.toString
 | }
defined class B

scala> val b = new B
b: B = B@67b6d4ae

scala> b.foo
<console>:18: error: ambiguous reference to overloaded definition,
both method foo in class B of type (b: Boolean)String
and  method foo in trait A of type => String
match expected type ?
   b.foo
 ^

scala> val x: String = b.foo
x: String = true

scala> val y: Boolean=> String = b.foo
y: Boolean => String = <function1>
```


[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r87668326
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)
--- End diff --

Nit: since we're already using `numCols`, should we call it `colMajor`? I've seen a couple of packages using `colMajor` as the variable name.



[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15628#discussion_r87669463
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
   @Since("2.0.0")
   def numActives: Int
+
+  /**
+   * Converts this matrix to a sparse matrix.
+   *
+   * @param columnMajor Whether the values of the resulting sparse matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
+
+  /**
+   * Converts this matrix to a sparse matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toSparse: SparseMatrix = toSparseMatrix(columnMajor = true)
+
+  /**
+   * Converts this matrix to a dense matrix.
+   *
+   * @param columnMajor Whether the values of the resulting dense matrix 
should be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
+
+  /**
+   * Converts this matrix to a dense matrix in column major order.
+   */
+  @Since("2.1.0")
+  def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)
+
+  /**
+   * Returns a matrix in either dense or sparse format, whichever uses 
less storage.
+   *
+   * @param columnMajor Whether the values of the resulting matrix should 
be in column major
+   *or row major order. If `false`, resulting matrix 
will be row major.
+   */
+  @Since("2.1.0")
+  def compressed(columnMajor: Boolean): Matrix = {
+if (getDenseSizeInBytes < getSparseSizeInBytes(columnMajor)) {
+  toDenseMatrix(columnMajor)
+} else {
+  toSparseMatrix(columnMajor)
+}
+  }
+
+  /**
+   * Returns a matrix in dense column major, dense row major, sparse row 
major, or sparse column
+   * major format, whichever uses less storage. When dense representation 
is optimal, it maintains
+   * the current layout order.
+   */
+  @Since("2.1.0")
+  def compressed: Matrix = {
+val cscSize = getSparseSizeInBytes(columnMajor = true)
+val csrSize = getSparseSizeInBytes(columnMajor = false)
+val minSparseSize = cscSize.min(csrSize)
+if (getDenseSizeInBytes < minSparseSize) {
+  // size is the same either way, so maintain current layout
--- End diff --

```scala
if (getDenseSizeInBytes < math.min(cscSize, csrSize)) 
...
...
if (cscSize < csrSize)
```

could be easier to read.

Also, can you elaborate the comment? Something like:

```
// sizes for dense matrix in row major or column major are the same, so maintain current layout
```
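
Putting both suggestions together, a sketch of how the method might read inside the `Matrix` trait (member names assumed from the diff above; not the merged implementation):

```scala
@Since("2.1.0")
def compressed: Matrix = {
  val cscSize = getSparseSizeInBytes(columnMajor = true)
  val csrSize = getSparseSizeInBytes(columnMajor = false)
  if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
    // sizes for dense matrix in row major or column major are the same, so maintain current layout
    toDenseMatrix(columnMajor = !isTransposed)
  } else if (cscSize < csrSize) {
    toSparseMatrix(columnMajor = true)
  } else {
    toSparseMatrix(columnMajor = false)
  }
}
```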



[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r105025830
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -300,6 +318,10 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
+  def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
--- End diff --

Can you elaborate on this? I don't fully understand why we cannot overwrite the setting in the set method. Thanks.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r105025280
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -335,17 +369,70 @@ class KMeans @Since("1.5.0") (
 model
   }
 
+  /**
+   * Check validity for interactions between parameters.
+   */
+  private def assertInitialModelValid(): Unit = {
+    if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
+      if (isSet(initialModel)) {
+        val initialModelK = $(initialModel).parentModel.k
+        if (initialModelK != $(k)) {
+          throw new IllegalArgumentException("The initial model's cluster count = " +
+            s"$initialModelK, mismatched with k = $k.")
+        }
+      } else {
+        throw new IllegalArgumentException("Users must set param initialModel if you choose " +
+          "'initialModel' as the initialization algorithm.")
+      }
+    } else {
+      if (isSet(initialModel)) {
+        logWarning(s"Param initialModel will take no effect when initMode is $initMode.")
+      }
+    }
+  }
+
   @Since("1.5.0")
   override def transformSchema(schema: StructType): StructType = {
+    assertInitialModelValid()
--- End diff --

`transformSchema` is called in the `transform` method, and `model.transform` is called when computing the summary. I think we should fail earlier instead of checking at the end. Also, it's only implicit that the check happens while computing the summary; we should check it explicitly.

If the checking logic is small, I'd put that code in the `fit` method.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r105002771
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -37,9 +37,9 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.util.VersionUtils.majorVersion
 
 /**
- * Common params for KMeans and KMeansModel
+ * Common params for KMeans and KMeansModel.
  */
-private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFeaturesCol
+private[clustering] trait KMeansModelParams extends Params with HasMaxIter with HasFeaturesCol
   with HasSeed with HasPredictionCol with HasTol {
--- End diff --

Fair enough.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r105002617
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -152,6 +158,35 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
 val kmeans = new KMeans()
 testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
   }
+
+  test("training with initial model") {
+val kmeans = new KMeans().setK(2).setSeed(1)
+val model1 = kmeans.fit(rData)
+val model2 = 
kmeans.setInitMode("initialModel").setInitialModel(model1).fit(rData)
+model2.clusterCenters.zip(model1.clusterCenters)
+  .foreach { case (center2, center1) => assert(center2 ~== center1 
absTol 1E-8) }
+  }
+
+  test("training with initial model, error cases") {
+val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+// Sets initMode with 'initialModel', but does not specify initial 
model.
+intercept[IllegalArgumentException] {
--- End diff --

@sethah +1 on the behavior you propose. The only thing I would like to add is that `setK` should throw an `IllegalArgumentException`.

```scala
// initialModel sets k and init mode
assert(km.getInitMode === MLlibKMeans.K_MEANS_INITIAL_MODEL)
assert(km.getK === initialK)
assert(km.getInitialModel.getK === initialK)

// setting k will throw exception.
intercept[IllegalArgumentException] {
  km.setK(initialK + 1)
}
```


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104820054
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -335,17 +369,70 @@ class KMeans @Since("1.5.0") (
 model
   }
 
+  /**
+   * Check validity for interactions between parameters.
+   */
+  private def assertInitialModelValid(): Unit = {
+    if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
+      if (isSet(initialModel)) {
+        val initialModelK = $(initialModel).parentModel.k
+        if (initialModelK != $(k)) {
+          throw new IllegalArgumentException("The initial model's cluster count = " +
+            s"$initialModelK, mismatched with k = $k.")
+        }
+      } else {
+        throw new IllegalArgumentException("Users must set param initialModel if you choose " +
+          "'initialModel' as the initialization algorithm.")
+      }
+    } else {
+      if (isSet(initialModel)) {
--- End diff --

Also, this is not needed if we do the overwriting work in `setInitialModel`.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104819439
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -335,17 +369,70 @@ class KMeans @Since("1.5.0") (
 model
   }
 
+  /**
+   * Check validity for interactions between parameters.
+   */
+  private def assertInitialModelValid(): Unit = {
+    if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
+      if (isSet(initialModel)) {
+        val initialModelK = $(initialModel).parentModel.k
+        if (initialModelK != $(k)) {
+          throw new IllegalArgumentException("The initial model's cluster count = " +
+            s"$initialModelK, mismatched with k = $k.")
+        }
+      } else {
+        throw new IllegalArgumentException("Users must set param initialModel if you choose " +
+          "'initialModel' as the initialization algorithm.")
+      }
+    } else {
+      if (isSet(initialModel)) {
+        logWarning(s"Param initialModel will take no effect when initMode is $initMode.")
+      }
+    }
+  }
+
   @Since("1.5.0")
   override def transformSchema(schema: StructType): StructType = {
+    assertInitialModelValid()
--- End diff --

Why is this not checked in `fit`?


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104821332
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -335,17 +369,70 @@ class KMeans @Since("1.5.0") (
 model
   }
 
+  /**
+   * Check validity for interactions between parameters.
+   */
+  private def assertInitialModelValid(): Unit = {
--- End diff --

I think with the overwriting above, the only thing we need to check is:

```scala
if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL && !isSet(initialModel)) {
  throw new IllegalArgumentException("Users must set param initialModel if you choose " +
    "'initialModel' as the initialization.")
}
```

We can just have that check in the body of the `fit` method.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104803180
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -37,9 +37,9 @@ import org.apache.spark.storage.StorageLevel
 import org.apache.spark.util.VersionUtils.majorVersion
 
 /**
- * Common params for KMeans and KMeansModel
+ * Common params for KMeans and KMeansModel.
  */
-private[clustering] trait KMeansParams extends Params with HasMaxIter with HasFeaturesCol
+private[clustering] trait KMeansModelParams extends Params with HasMaxIter with HasFeaturesCol
   with HasSeed with HasPredictionCol with HasTol {
--- End diff --

Now that `KMeansModel` mixes in `KMeansModelParams`, does that mean we cannot get the information about the `initialModel` at the model level? Also, why do we need to mix the seed into the model?


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104819810
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -335,17 +369,70 @@ class KMeans @Since("1.5.0") (
 model
   }
 
+  /**
+   * Check validity for interactions between parameters.
+   */
+  private def assertInitialModelValid(): Unit = {
+if ($(initMode) == MLlibKMeans.K_MEANS_INITIAL_MODEL) {
+  if (isSet(initialModel)) {
+val initialModelK = $(initialModel).parentModel.k
+if (initialModelK != $(k)) {
--- End diff --

I don't think this check is needed if we overwrite `k` when `initialModel` 
is set.


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104815453
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -300,6 +318,10 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
+  def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
--- End diff --

How about

```scala
def setInitialModel(value: KMeansModel): this.type = {
  if (getK != value.getK) {
    // log the warning
    set(k, value.getK)
  }
  set(initMode, MLlibKMeans.K_MEANS_INITIAL_MODEL) // We may log, but I don't really care for this one.
  set(initialModel, value)
}
```


[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-03-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17117#discussion_r104800325
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -152,6 +158,35 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
 val kmeans = new KMeans()
 testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
   }
+
+  test("training with initial model") {
+val kmeans = new KMeans().setK(2).setSeed(1)
+val model1 = kmeans.fit(rData)
+val model2 = 
kmeans.setInitMode("initialModel").setInitialModel(model1).fit(rData)
+model2.clusterCenters.zip(model1.clusterCenters)
+  .foreach { case (center2, center1) => assert(center2 ~== center1 
absTol 1E-8) }
+  }
+
+  test("training with initial model, error cases") {
+val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+// Sets initMode with 'initialModel', but does not specify initial 
model.
+intercept[IllegalArgumentException] {
--- End diff --

My 2 cents: a later configuration should be able to overwrite earlier and related settings, with warning messages.

In your example, when `kmeans.setInitMode("k-means||")` is performed, the earlier `setInitialModel` should be ignored with a warning message.

Even if we do `setK(k = 3)` and later call `.setInitialModel(initialModel)`, we should ignore the earlier `setK(k = 3)` with a warning.


[GitHub] spark issue #17078: [SPARK-19746][ML] Faster indexing for logistic aggregato...

2017-02-27 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17078
  
Thanks. Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   7   8   9   10   >