[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62164793
  
Those changes are made, and I removed the extra static methods that I 
added.  I agree--it's much cleaner now.  Not sure if any cleanup can be done on 
the existing static methods--looks like they're only used in the test suites, 
but I'm going to leave them alone for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62225034
  
@davies What do you think?  Merge-able now? :)





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62225692
  
LGTM, but I have no permission to merge :(





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62226308
  
Ah, okay.  I'll wait for an admin then.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62241558
  
ok to test





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62241564
  
test this please





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62241770
  
[Test build #23088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23088/consoleFull) for PR 3095 at commit [`a6743ad`](https://github.com/apache/spark/commit/a6743ad3a3d5254a2438bbf4d253bf2be9ff1822).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62244388
  
[Test build #23088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23088/consoleFull) for PR 3095 at commit [`a6743ad`](https://github.com/apache/spark/commit/a6743ad3a3d5254a2438bbf4d253bf2be9ff1822).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62244390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23088/





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62248364
  
LGTM. Merged into master and branch-1.2. Thanks!





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3095





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62010165
  
Okay @davies.  I changed `seed` to a `java.lang.Long` and did a null check, 
which is a much nicer way of doing things.  Thanks.  So the `if` branch is gone 
from the python, and the extra stub in `PythonMLLibAPI` is gone.  

In terms of cleaning up the APIs in `ALS.scala`, I don't know what to do 
about that since I don't know what they're all for, other than the ones that I 
added to deal with `nonnegative`.  @mengxr?





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mdagost
Github user mdagost commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19956855
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  seed: Long,
--- End diff --

Oh, also, I left `seed` where it is in the python param list since both 
`seed` and `nonnegative` are optional.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62063174
  
@mdagost Looks better now. How about putting `seed` at the end? It is the 
parameter least often needed in common cases.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62065401
  
@davies Swapped.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19987389
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,18 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  seed: java.lang.Long,
+  nonnegative: Boolean): MatrixFactorizationModel = {
--- End diff --

swap seed and nonnegative here?





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62077209
  
Looks good to me now, just one minor comment. Waiting for @mengxr.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-62078789
  
@davies Swapped there too.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19988621
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,18 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  nonnegative: Boolean,
+  seed: java.lang.Long): MatrixFactorizationModel = {
+if (seed == null) {
+  new MatrixFactorizationModelWrapper(
+// if the seed coming from python is None/null, let ALS use the
+// default, which is to use System.nanoTime
+ALS.train(ratings.rdd, rank, iterations, lambda, blocks, nonnegative))
--- End diff --

Let's use setters so we don't need to add many static methods.

~~~
val als = new ALS()
  .setRank(rank)
  .setIterations(iterations)
  .setLambda(lambda)
  .setBlocks(blocks)
  .setNonnegative(nonnegative)
if (seed != null) als.setSeed(seed)
val model = als.run(ratings.rdd)
new MatrixFactorizationModelWrapper(model)
~~~
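
The setter-based approach suggested above can be sketched in plain Python as well. This is only an illustration of the pattern, not Spark's `ALS` class: `ALSConfig`, `build_config`, the method names, and the default values are all hypothetical placeholders.

```python
class ALSConfig:
    """Hypothetical fluent configuration object (not Spark's ALS class)."""

    def __init__(self):
        # Placeholder defaults chosen for this sketch only.
        self.rank = 10
        self.nonnegative = False
        self.seed = None  # None means "let the trainer choose a seed"

    def set_rank(self, rank):
        self.rank = rank
        return self  # returning self allows calls to be chained

    def set_nonnegative(self, nonnegative):
        self.nonnegative = nonnegative
        return self

    def set_seed(self, seed):
        self.seed = seed
        return self


def build_config(rank, nonnegative, seed=None):
    """One entry point replaces a family of overloaded static methods."""
    config = ALSConfig().set_rank(rank).set_nonnegative(nonnegative)
    if seed is not None:  # only override the default when a seed was given
        config.set_seed(seed)
    return config
```

With this shape, adding a new option means adding one setter rather than another overload for every combination of arguments.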





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19988624
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -685,6 +685,34 @@ object ALS {
* @param iterations number of iterations of ALS (recommended: 10-20)
* @param lambda regularization factor (recommended: 0.01)
* @param blocks level of parallelism to split computation into
+   * @param seed   random seed
+   * @param nonnegative whether to enforce nonnegativity
+   */
+  def train(
+  ratings: RDD[Rating],
+  rank: Int,
+  iterations: Int,
+  lambda: Double,
+  blocks: Int,
+  seed: Long,
+  nonnegative: Boolean
+): MatrixFactorizationModel = {
--- End diff --

Let's not add more static methods. See my comment above.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-06 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19988630
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -45,30 +45,46 @@ class MatrixFactorizationModel(JavaModelWrapper):
  >>> r3 = (2, 1, 2.0)
  >>> ratings = sc.parallelize([r1, r2, r3])
  >>> model = ALS.trainImplicit(ratings, 1)
- >>> model.predict(2,2) is not None
-True
+ >>> model.predict(2,2)
+0.4473...
--- End diff --

Is there any guarantee that it will always be `0.4473...`? Did we fix the 
seed? Same question applies to all tests below.
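
To make the concern concrete, here is a small standard-library sketch of the two ingredients such a doctest relies on: a fixed seed for repeatable output, and doctest's `ELLIPSIS` option for matching a truncated float such as `0.4473...`. The function `train_stub` is a hypothetical stand-in for a randomly initialised trainer, and the sketch assumes the doctest runner enables `ELLIPSIS`.

```python
import doctest
import random


def train_stub(seed=None):
    """Hypothetical stand-in for a trainer with random initialisation."""
    rng = random.Random(seed)  # seed=None lets Python pick its own seed
    return rng.random()


_checker = doctest.OutputChecker()


def matches(want, got):
    """Compare expected and actual doctest output with ELLIPSIS enabled."""
    return _checker.check_output(want + "\n", got + "\n", doctest.ELLIPSIS)
```

Calling `train_stub(seed=10)` twice returns the same number, so an expected value written with a trailing `...` keeps matching. Without a fixed seed, the expected output would only match by chance.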





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread mdagost
Github user mdagost commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19889623
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
  >>> latents = first_product[1]
  >>> len(latents) == 4
 True
+
+ >>> model = ALS.train(ratings, 1, nonnegative=True)
+ >>> model.predict(2,2) is not None
+True
--- End diff --

Cool.  Making those changes now.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread mdagost
Github user mdagost commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19889647
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -700,6 +700,32 @@ object ALS {
* Train a matrix factorization model given an RDD of ratings given by users to some products,
* in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
* product of two lower-rank matrices of a given rank (number of features). To solve for these
+   * features, we run a given number of iterations of ALS. This is done using a level of
+   * parallelism given by `blocks`.
+   *
+   * @param ratings RDD of (userID, productID, rating) pairs
+   * @param rank number of features to use
+   * @param iterations  number of iterations of ALS (recommended: 10-20)
+   * @param lambda  regularization factor (recommended: 0.01)
+   * @param blocks  level of parallelism to split computation into
+   * @param nonnegative whether to enforce nonnegativity
+   */
+  def train(
+  ratings: RDD[Rating],
+  rank: Int,
+  iterations: Int,
+  lambda: Double,
+  blocks: Int,
+  nonnegative: Boolean
+): MatrixFactorizationModel = {
+(new ALS(blocks, blocks, rank, iterations, lambda, false, 1.0)
+  .setNonnegative(nonnegative).run(ratings))
--- End diff --

I'm changing that now, as well as exposing the seed from python.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-61868206
  
Okay.  This should be ready for another look now.  For the seed stuff, I 
decided the best way was to let the Scala code call System.nanoTime itself, 
the way it does when no seed is passed in on the Scala side.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19917238
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  seed: Long,
--- End diff --

We could use `java.lang.Long` for `seed`, so that `seed` can be null (meaning 
not specified); then we do not need another stub.

Also, putting `seed` at the end may be better (`seed` is optional). 





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19917423
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -685,6 +685,34 @@ object ALS {
* @param iterations number of iterations of ALS (recommended: 10-20)
* @param lambda regularization factor (recommended: 0.01)
* @param blocks level of parallelism to split computation into
+   * @param seed   random seed
+   * @param nonnegative whether to enforce nonnegativity
+   */
+  def train(
+  ratings: RDD[Rating],
+  rank: Int,
+  iterations: Int,
+  lambda: Double,
+  blocks: Int,
+  seed: Long,
+  nonnegative: Boolean
+): MatrixFactorizationModel = {
--- End diff --

@mengxr, there are too many APIs here. Can we simplify them?





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread mdagost
Github user mdagost commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19917467
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  seed: Long,
--- End diff --

I'll move it to the end. 

The issue is that it would be nice to preserve the nanosecond level time 
seed that scala uses, but I couldn't find a way to do that directly in python 
and just pass it through. So I created the extra stub to let scala do it 
itself. 






[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19919042
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
   rank: Int,
   iterations: Int,
   lambda: Double,
-  blocks: Int): MatrixFactorizationModel = {
-new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
+  blocks: Int,
+  seed: Long,
--- End diff --

If `seed` can be null, then we can fall back to the nanosecond-based seed in Scala:
```
if (seed == null) {
  ...
} else {
  ...
}
```
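
The same null check has a direct Python analogue, sketched here with only the standard library. `resolve_seed` is a hypothetical helper, and `time.time_ns()` is used as an assumed stand-in for a nanosecond clock; it is not the call Spark makes.

```python
import time


def resolve_seed(seed=None):
    """Return the caller's seed, or a clock-based one when none was given."""
    if seed is None:  # test for None explicitly: a seed of 0 is still valid
        return time.time_ns()
    return int(seed)
```

Testing `seed is None` rather than `not seed` matters here, because a truthiness check would silently replace a user-supplied seed of `0`.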





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread mdagost
GitHub user mdagost opened a pull request:

https://github.com/apache/spark/pull/3095

[MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

SPARK-1553 added alternating nonnegative least squares to MLLib, however 
it's not possible to access it via the python API.  This pull request resolves 
that.
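
For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of alternating least squares restricted to rank 1, with an optional nonnegativity constraint. It is not MLlib's implementation (which handles arbitrary rank and runs distributed); `als_rank1`, `predict`, and their parameters are hypothetical names for this illustration. In the rank-1 case each update is a one-dimensional problem, so clipping at zero gives the exact constrained solution.

```python
def als_rank1(ratings, iterations=10, lam=0.01, nonnegative=False):
    """Fit rating(u, p) ~ x[u] * y[p] from (user, product, rating) triples."""
    users = sorted({u for u, _, _ in ratings})
    products = sorted({p for _, p, _ in ratings})
    x = {u: 1.0 for u in users}  # fixed start keeps the sketch deterministic
    y = {p: 1.0 for p in products}

    def solve(fixed, pairs):
        # Regularised least squares in one unknown, with the other side fixed.
        numerator = sum(r * fixed[k] for k, r in pairs)
        denominator = sum(fixed[k] ** 2 for k, _ in pairs) + lam
        value = numerator / denominator
        # In one dimension, clipping at zero is the constrained minimiser.
        return max(0.0, value) if nonnegative else value

    for _ in range(iterations):
        for u in users:  # update user factors with product factors fixed
            x[u] = solve(y, [(p, r) for uu, p, r in ratings if uu == u])
        for p in products:  # update product factors with user factors fixed
            y[p] = solve(x, [(u, r) for u, pp, r in ratings if pp == p])
    return x, y


def predict(model, user, product):
    """Predicted rating is the product of the two learned factors."""
    x, y = model
    return x[user] * y[product]
```

With `nonnegative=True`, every learned factor is at least zero, so every prediction is nonnegative as well, even if some input ratings are negative.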


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mdagost/spark python_nmf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3095


commit a72fdc9ce4bfc5f075739140237f10c7150311f8
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Date:   2014-11-04T17:41:57Z

Expose nonnegative ALS in the python API.







[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3095#issuecomment-61705249
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19844392
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
  >>> latents = first_product[1]
  >>> len(latents) == 4
 True
+
+ >>> model = ALS.train(ratings, 1, nonnegative=True)
+ >>> model.predict(2,2) is not None
+True
--- End diff --

It's better to show the result of predict() here.





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread mdagost
Github user mdagost commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19844686
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
     >>> latents = first_product[1]
     >>> len(latents) == 4
     True
+
+    >>> model = ALS.train(ratings, 1, nonnegative=True)
+    >>> model.predict(2,2) is not None
+    True
--- End diff --

I assumed it would be non-deterministic.
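
To illustrate where the non-determinism comes from, here is a minimal sketch of a rank-1 ALS in plain Python. It is not MLlib's implementation: `als_rank1`, `predict`, and the `seed` argument are hypothetical names used only for this example. The factors start from random values, so the prediction is only guaranteed to repeat when the random generator is seeded.

```python
import random


def als_rank1(ratings, iterations=10, lam=0.01, seed=None):
    """Toy rank-1 ALS over (user, product, rating) triples (illustration only)."""
    rng = random.Random(seed)  # seed=None lets Python pick a fresh seed each run
    users = sorted({u for u, _, _ in ratings})
    products = sorted({p for _, p, _ in ratings})
    # Random initialisation is the source of run-to-run variation.
    x = {u: rng.random() for u in users}
    y = {p: rng.random() for p in products}
    for _ in range(iterations):
        # Hold product factors fixed and solve a 1-D ridge problem per user.
        for u in users:
            num = sum(r * y[p] for uu, p, r in ratings if uu == u)
            den = lam + sum(y[p] ** 2 for uu, p, _ in ratings if uu == u)
            x[u] = num / den
        # Hold user factors fixed and solve a 1-D ridge problem per product.
        for p in products:
            num = sum(r * x[u] for u, pp, r in ratings if pp == p)
            den = lam + sum(x[u] ** 2 for u, pp, _ in ratings if pp == p)
            y[p] = num / den
    return x, y


def predict(model, user, product):
    """Predicted rating is the product of the two scalar factors."""
    x, y = model
    return x[user] * y[product]


ratings = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]
a = predict(als_rank1(ratings, seed=42), 2, 2)
b = predict(als_rank1(ratings, seed=42), 2, 2)
print(a == b)  # True: same seed, same initial factors, same arithmetic
```

Without a seed, two runs start from different factors, so an exact expected value cannot be written into a doctest.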





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19846425
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
     >>> latents = first_product[1]
     >>> len(latents) == 4
     True
+
+    >>> model = ALS.train(ratings, 1, nonnegative=True)
+    >>> model.predict(2,2) is not None
+    True
--- End diff --

The non-determinism could be removed by using a fixed `seed`, but right 
now we cannot set the seed in Python (it would be great if you could also fix that).

How about these:
```
>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1)
>>> model.predict(2,2)
0.4473...

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 1)
>>> model.predictAll(testset).collect()
[Rating(1, 1, 1), Rating(1, 2, 1)]

>>> model = ALS.train(ratings, 4)
>>> model.userFeatures().collect()
[(2, array('d', [...])), (1, array('d', [...]))]
>>> model.productFeatures().collect()
[(2, array('d', [...])), (1, array('d', [...]))]
```
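
Expected outputs such as `0.4473...` only match when the doctest `ELLIPSIS` option is active. The sketch below shows this with the standard-library `doctest` module alone; `fake_predict` and `count_failures` are hypothetical stand-ins, and the returned number is an arbitrary value chosen to begin with `0.4473`, not a real ALS result.

```python
import doctest


def fake_predict():
    """Stand-in for a prediction whose trailing digits are not stable.

    >>> fake_predict()
    0.4473...
    """
    return 0.4473912


def count_failures(optionflags):
    """Run the docstring example above and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        fake_predict.__doc__, {"fake_predict": fake_predict},
        "fake_predict", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    result = runner.run(test, out=lambda text: None)
    return result.failed


print(count_failures(doctest.ELLIPSIS))  # 0: "..." matches the remaining digits
print(count_failures(0))                 # 1: "..." is compared literally
```

With the option enabled, a doctest can show the leading digits of a prediction without pinning every digit.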





[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...

2014-11-04 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3095#discussion_r19846471
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -700,6 +700,32 @@ object ALS {
    * Train a matrix factorization model given an RDD of ratings given by users to some products,
    * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
    * product of two lower-rank matrices of a given rank (number of features). To solve for these
+   * features, we run a given number of iterations of ALS. This is done using a level of
+   * parallelism given by `blocks`.
+   *
+   * @param ratings     RDD of (userID, productID, rating) pairs
+   * @param rank        number of features to use
+   * @param iterations  number of iterations of ALS (recommended: 10-20)
+   * @param lambda      regularization factor (recommended: 0.01)
+   * @param blocks      level of parallelism to split computation into
+   * @param nonnegative whether to enforce nonnegativity
+   */
+  def train(
+  ratings: RDD[Rating],
+  rank: Int,
+  iterations: Int,
+  lambda: Double,
+  blocks: Int,
+  nonnegative: Boolean
+): MatrixFactorizationModel = {
+(new ALS(blocks, blocks, rank, iterations, lambda, false, 1.0)
+  .setNonnegative(nonnegative).run(ratings))
--- End diff --

Nonnegativity and the seed cannot be set at the same time. Do we need to fix this?
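
One way to let callers combine the two settings without adding an overload per combination is a single options object with defaults. This is only a sketch of that idea in Python, not the MLlib API: `ALSOptions`, its field names, and its default values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ALSOptions:
    """Hypothetical bundle of training options; the defaults are placeholders."""
    rank: int
    iterations: int = 5
    lambda_: float = 0.01
    blocks: int = -1
    nonnegative: bool = False
    seed: Optional[int] = None  # None means "no fixed seed"


# Any subset of options can be set together, including nonnegative and seed.
opts = ALSOptions(rank=1, nonnegative=True, seed=42)
print(opts.nonnegative, opts.seed)  # True 42
```

Every option is a named field with a default, so a new setting does not multiply the number of entry points.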

