[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64526546
  
  [Test build #23884 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23884/consoleFull)
 for   PR 3468 at commit 
[`3b0e4e2`](https://github.com/apache/spark/commit/3b0e4e213d407b5fb42a4fd06bb157e3d2d10941).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64536426
  
  [Test build #23884 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23884/consoleFull)
 for   PR 3468 at commit 
[`3b0e4e2`](https://github.com/apache/spark/commit/3b0e4e213d407b5fb42a4fd06bb157e3d2d10941).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3468


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-26 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64671123
  
@brkyvz Thanks! I've merged this into master. Will submit another PR for 
branch-1.2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/3468

[SPARK-4614][MLLIB] Slight API changes in Matrix and Matrices

Before we have a full picture of the operators we want to add, it might be 
safer to hide `Matrix.transposeMultiply` in 1.2.0. Another update we want to 
change is `Matrix.randn` and `Matrix.rand`, both of which should take a 
`Random` implementation. Otherwise, it is very likely to produce inconsistent 
RDDs. I also added some unit tests for matrix factory methods. All APIs are new 
in 1.2, so there is no incompatible changes.

@brkyvz

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-4614

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3468


commit 6bfd8a4c5a7d7f00b7c71ccf6ad8aab6c6caab92
Author: Xiangrui Meng m...@databricks.com
Date:   2014-11-26T03:32:13Z

hide transposeMultiply; add rng to rand and randn; add unit tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64510830
  
  [Test build #23870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23870/consoleFull)
 for   PR 3468 at commit 
[`6bfd8a4`](https://github.com/apache/spark/commit/6bfd8a4c5a7d7f00b7c71ccf6ad8aab6c6caab92).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/3468#discussion_r20916331
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -112,4 +116,50 @@ class MatricesSuite extends FunSuite {
 assert(sparseMat(0, 1) === 10.0)
 assert(sparseMat.values(2) === 10.0)
   }
+
+  test(zeros) {
+val mat = Matrices.zeros(2, 3).asInstanceOf[DenseMatrix]
+assert(mat.numRows === 2)
+assert(mat.numCols === 3)
+assert(mat.values.forall(_ == 0.0))
+  }
+
+  test(ones) {
+val mat = Matrices.ones(2, 3).asInstanceOf[DenseMatrix]
+assert(mat.numRows === 2)
+assert(mat.numCols === 3)
+assert(mat.values.forall(_ == 1.0))
+  }
+
+  test(eye) {
+val mat = Matrices.eye(2).asInstanceOf[DenseMatrix]
+assert(mat.numCols === 2)
+assert(mat.numCols === 2)
+assert(mat.values.toSeq === Seq(1.0, 0.0, 0.0, 1.0))
--- End diff --

Small, but:
`mat.values.toSeq === Seq(1.0, 0.0, 0.0, 1.0)` - `mat.values === 
Array(1.0, 0.0, 0.0, 1.0)`
is there any reason the other wouldn't work?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread brkyvz
Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64515054
  
Looks good to me! Just made one comment, no biggie though, it's fine as is 
(but if you decide to change it, there are 4 exact copies of it).
One comment/question about using `Random`: What would be the problem if we 
just pass a seed?
Since users wouldn't be able to use `XORShiftRandom`, which has better 
performance than 
`java.util.Random`, wouldn't passing the seed be enough to solve the 
inconsistency issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64515619
  
  [Test build #23870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23870/consoleFull)
 for   PR 3468 at commit 
[`6bfd8a4`](https://github.com/apache/spark/commit/6bfd8a4c5a7d7f00b7c71ccf6ad8aab6c6caab92).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64515623
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23870/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3468#discussion_r20920325
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -112,4 +116,50 @@ class MatricesSuite extends FunSuite {
 assert(sparseMat(0, 1) === 10.0)
 assert(sparseMat.values(2) === 10.0)
   }
+
+  test(zeros) {
+val mat = Matrices.zeros(2, 3).asInstanceOf[DenseMatrix]
+assert(mat.numRows === 2)
+assert(mat.numCols === 3)
+assert(mat.values.forall(_ == 0.0))
+  }
+
+  test(ones) {
+val mat = Matrices.ones(2, 3).asInstanceOf[DenseMatrix]
+assert(mat.numRows === 2)
+assert(mat.numCols === 3)
+assert(mat.values.forall(_ == 1.0))
+  }
+
+  test(eye) {
+val mat = Matrices.eye(2).asInstanceOf[DenseMatrix]
+assert(mat.numCols === 2)
+assert(mat.numCols === 2)
+assert(mat.values.toSeq === Seq(1.0, 0.0, 0.0, 1.0))
--- End diff --

`Array.equals` only check the reference.

~~~
scala Array(1.0, 2.0) == Array(1.0, 2.0)
res0: Boolean = false
~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4614][MLLIB] Slight API changes in Matr...

2014-11-25 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3468#issuecomment-64526447
  
@brkyvz `XORShiftRandom` implements `Random`. So users can use it directly. 
Using seed will create problems when we want to generate a sequence of 
matrices, e.g., generate a random block matrices, because `randn` and `rand` do 
not return the next seed. If we using a predefined sequence of seeds, the 
generated sequence won't have good quality.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org