[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62164793 Those changes are made, and I removed the extra static methods that I added. I agree--it's much cleaner now. Not sure if any cleanup can be done on the existing static methods--looks like they're only used in the test suites, but I'm going to leave them alone for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62225034 @davies What do you think? Merge-able now? :)
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62225692 LGTM, but I have no permission to merge :(
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62226308 Ah, okay. I'll wait for an admin then.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62241558 ok to test
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62241564 test this please
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62241770

[Test build #23088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23088/consoleFull) for PR 3095 at commit [`a6743ad`](https://github.com/apache/spark/commit/a6743ad3a3d5254a2438bbf4d253bf2be9ff1822).

* This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62244388

[Test build #23088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23088/consoleFull) for PR 3095 at commit [`a6743ad`](https://github.com/apache/spark/commit/a6743ad3a3d5254a2438bbf4d253bf2be9ff1822).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62244390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23088/ Test PASSed.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62248364 LGTM. Merged into master and branch-1.2. Thanks!
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3095
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62010165 Okay @davies. I changed `seed` to a `java.lang.Long` and did a null check, which is a much nicer way of doing things. Thanks. So the `if` branch is gone from the python, and the extra stub in `PythonMLLibAPI` is gone. In terms of cleaning up the APIs in `ALS.scala`, I don't know what to do about that since I don't know what they're all for, other than the ones that I added to deal with `nonnegative`. @mengxr?
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19956855

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      seed: Long,
    --- End diff --

Oh, also, I left `seed` where it is in the python param list since both `seed` and `nonnegative` are optional.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62063174 @mdagost Looks better now. How about putting `seed` at the end? It's the least-needed one in common cases.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62065401 @davies Swapped.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19987389

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,18 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      seed: java.lang.Long,
    +      nonnegative: Boolean): MatrixFactorizationModel = {
    --- End diff --

swap seed and nonnegative here?
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62077209 Looks good to me now, just one minor comment. Waiting for @mengxr.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-62078789 @davies Swapped there too.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19988621

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,18 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      nonnegative: Boolean,
    +      seed: java.lang.Long): MatrixFactorizationModel = {
    +    if (seed == null) {
    +      new MatrixFactorizationModelWrapper(
    +        // if the seed coming from python is None/null, let ALS use the
    +        // default, which is to use System.nanoTime
    +        ALS.train(ratings.rdd, rank, iterations, lambda, blocks, nonnegative))
    --- End diff --

Let's use setters so we don't need to add many static methods.

~~~
val als = new ALS()
  .setRank(rank)
  .setIterations(iterations)
  .setLambda(lambda)
  .setBlocks(blocks)
  .setNonnegative(nonnegative)
if (seed != null) als.setSeed(seed)
val model = als.run(ratings.rdd)
new MatrixFactorizationModelWrapper(model)
~~~
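The trade-off behind the setter suggestion can be illustrated outside Spark. The following is a minimal Python sketch, not Spark code: `AlsConfig`, its setters, `build_config`, and the default values are all hypothetical names chosen for illustration. It shows how chained setters let one entry point cover every combination of optional parameters, where static overloads would need one method per combination.

```python
class AlsConfig:
    """Hypothetical stand-in for a trainer configured through chained setters."""

    def __init__(self):
        # Illustrative defaults only; these are not Spark's defaults.
        self.rank = 10
        self.iterations = 10
        self.nonnegative = False
        self.seed = None  # None means "let the trainer choose a seed"

    def set_rank(self, rank):
        self.rank = rank
        return self  # returning self is what makes chaining possible

    def set_iterations(self, iterations):
        self.iterations = iterations
        return self

    def set_nonnegative(self, nonnegative):
        self.nonnegative = nonnegative
        return self

    def set_seed(self, seed):
        self.seed = seed
        return self


def build_config(rank, iterations, nonnegative, seed=None):
    """Single entry point: optional settings are applied only when provided."""
    config = (
        AlsConfig()
        .set_rank(rank)
        .set_iterations(iterations)
        .set_nonnegative(nonnegative)
    )
    if seed is not None:
        config.set_seed(seed)
    return config
```

Adding another optional parameter to such a design means adding one setter, rather than doubling the number of overloads.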
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19988624

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -685,6 +685,34 @@ object ALS {
        * @param iterations number of iterations of ALS (recommended: 10-20)
        * @param lambda     regularization factor (recommended: 0.01)
        * @param blocks     level of parallelism to split computation into
    +   * @param seed       random seed
    +   * @param nonnegative whether to enforce nonnegativity
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      seed: Long,
    +      nonnegative: Boolean
    +    ): MatrixFactorizationModel = {
    --- End diff --

Let's not add more static methods. See my comment above.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19988630

    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -45,30 +45,46 @@ class MatrixFactorizationModel(JavaModelWrapper):
         >>> r3 = (2, 1, 2.0)
         >>> ratings = sc.parallelize([r1, r2, r3])
         >>> model = ALS.trainImplicit(ratings, 1)
    -    >>> model.predict(2,2) is not None
    -    True
    +    >>> model.predict(2,2)
    +    0.4473...
    --- End diff --

Is there any guarantee that it will always be `0.4473...`? Did we fix the seed? Same question applies to all tests below.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19889623

    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
         >>> latents = first_product[1]
         >>> len(latents) == 4
         True
    +
    +    >>> model = ALS.train(ratings, 1, nonnegative=True)
    +    >>> model.predict(2,2) is not None
    +    True
    --- End diff --

Cool. Making those changes now.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19889647

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -700,6 +700,32 @@ object ALS {
        * Train a matrix factorization model given an RDD of ratings given by users to some products,
        * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
        * product of two lower-rank matrices of a given rank (number of features). To solve for these
    +   * features, we run a given number of iterations of ALS. This is done using a level of
    +   * parallelism given by `blocks`.
    +   *
    +   * @param ratings    RDD of (userID, productID, rating) pairs
    +   * @param rank       number of features to use
    +   * @param iterations number of iterations of ALS (recommended: 10-20)
    +   * @param lambda     regularization factor (recommended: 0.01)
    +   * @param blocks     level of parallelism to split computation into
    +   * @param nonnegative whether to enforce nonnegativity
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      nonnegative: Boolean
    +    ): MatrixFactorizationModel = {
    +    (new ALS(blocks, blocks, rank, iterations, lambda, false, 1.0)
    +      .setNonnegative(nonnegative).run(ratings))
    --- End diff --

I'm changing that now, as well as exposing the seed from python.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-61868206 Okay. This should be ready for another look now. For the seed stuff, I decided the best way was to allow the scala code to use System.nanoTime itself the way that it does when no seed is passed into the scala.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19917238

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      seed: Long,
    --- End diff --

We could use `java.lang.Long` for `seed`, so that `seed` can be null (meaning not specified) and we do not need another stub. Also, putting `seed` at the end may be better (`seed` is optional).
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19917423

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -685,6 +685,34 @@ object ALS {
        * @param iterations number of iterations of ALS (recommended: 10-20)
        * @param lambda     regularization factor (recommended: 0.01)
        * @param blocks     level of parallelism to split computation into
    +   * @param seed       random seed
    +   * @param nonnegative whether to enforce nonnegativity
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      seed: Long,
    +      nonnegative: Boolean
    +    ): MatrixFactorizationModel = {
    --- End diff --

@mengxr, there are too many APIs here. Can we simplify these?
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19917467

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      seed: Long,
    --- End diff --

I'll move it to the end. The issue is that it would be nice to preserve the nanosecond level time seed that scala uses, but I couldn't find a way to do that directly in python and just pass it through. So I created the extra stub to let scala do it itself.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19919042

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
    @@ -278,8 +278,28 @@ class PythonMLLibAPI extends Serializable {
           rank: Int,
           iterations: Int,
           lambda: Double,
    -      blocks: Int): MatrixFactorizationModel = {
    -    new MatrixFactorizationModelWrapper(ALS.train(ratings.rdd, rank, iterations, lambda, blocks))
    +      blocks: Int,
    +      seed: Long,
    --- End diff --

If `seed` can be null, then we can fall back to the nanosecond time for it in Scala.

```
if (seed == null) {
  ...
} else {
  ...
}
```
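The "null means not specified" convention discussed here has a direct analogue on the Python side, where `None` plays the role of null. The sketch below is a standalone illustration and does not call Spark: `resolve_seed` and `initial_factors` are hypothetical helpers, and `time.time_ns()` is an assumed stand-in for the role `System.nanoTime` plays in the Scala code.

```python
import random
import time


def resolve_seed(seed=None):
    """Return the caller's seed, or a clock-derived one when none was given."""
    if seed is None:
        # Assumption: a nanosecond clock reading stands in for the
        # System.nanoTime default mentioned in the thread.
        return time.time_ns()
    # Compare against None rather than truthiness so that 0 stays a valid seed.
    return seed


def initial_factors(n, seed=None):
    """Draw n pseudo-random starting values from a generator seeded once."""
    rng = random.Random(resolve_seed(seed))
    return [rng.random() for _ in range(n)]
```

Calling `initial_factors(3, seed=1)` twice yields identical lists, while omitting `seed` gives a clock-dependent result, which mirrors the two branches of the null check.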
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
GitHub user mdagost opened a pull request: https://github.com/apache/spark/pull/3095

[MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API

SPARK-1553 added alternating nonnegative least squares to MLLib, however it's not possible to access it via the python API. This pull request resolves that.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mdagost/spark python_nmf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3095.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3095

commit a72fdc9ce4bfc5f075739140237f10c7150311f8
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Date: 2014-11-04T17:41:57Z

    Expose nonnegative ALS in the python API.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3095#issuecomment-61705249 Can one of the admins verify this patch?
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19844392

    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
         >>> latents = first_product[1]
         >>> len(latents) == 4
         True
    +
    +    >>> model = ALS.train(ratings, 1, nonnegative=True)
    +    >>> model.predict(2,2) is not None
    +    True
    --- End diff --

It's better to show the result of predict() here.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user mdagost commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19844686

    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
         >>> latents = first_product[1]
         >>> len(latents) == 4
         True
    +
    +    >>> model = ALS.train(ratings, 1, nonnegative=True)
    +    >>> model.predict(2,2) is not None
    +    True
    --- End diff --

I assumed it would be non-deterministic.
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19846425

    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -69,6 +69,14 @@ class MatrixFactorizationModel(JavaModelWrapper):
         >>> latents = first_product[1]
         >>> len(latents) == 4
         True
    +
    +    >>> model = ALS.train(ratings, 1, nonnegative=True)
    +    >>> model.predict(2,2) is not None
    +    True
    --- End diff --

The non-determinism could be removed by having a fixed `seed`, but right now we cannot set the seed in Python (it would be great if you could also fix that). How about these:

```
>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1)
>>> model.predict(2,2)
0.4473...

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 1)
>>> model.predictAll(testset).collect()
[Rating(1, 1, 1), Rating(1, 2, 1)]

>>> model = ALS.train(ratings, 4)
>>> model.userFeatures().collect()
[(2, array('d', [...])), (1, array('d', [...]))]
>>> model.productFeatures().collect()
[(2, array('d', [...])), (1, array('d', [...]))]
```
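The two ingredients in this suggestion, a fixed seed for reproducibility and `...` in the expected output, can be tried without Spark. The sketch below is a standalone illustration using only the standard library: `mean_of_draws` and `run_examples` are hypothetical names, and the example relies on doctest's `ELLIPSIS` option, which lets `...` in expected output match any trailing text.

```python
import doctest
import random


def mean_of_draws(n, seed=None):
    """Average n pseudo-random draws from [0, 1).

    With a fixed seed the result is reproducible, so two runs can be
    compared directly:

    >>> mean_of_draws(100, seed=10) == mean_of_draws(100, seed=10)
    True

    ELLIPSIS lets the expected output keep only the leading digits:

    >>> 1.0 / 3  # doctest: +ELLIPSIS
    0.3333...
    """
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def run_examples():
    """Run the doctests embedded in mean_of_draws; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        mean_of_draws.__doc__,
        {"mean_of_draws": mean_of_draws},  # names visible to the examples
        "mean_of_draws",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted
```

An ellipsis only hides trailing digits. If the seed is not fixed, the leading digits themselves can change between runs, which is the concern raised elsewhere in this review about an expected value such as `0.4473...`.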
[GitHub] spark pull request: [MLLIB] [PYTHON] SPARK-4221: Expose nonnegativ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3095#discussion_r19846471

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -700,6 +700,32 @@ object ALS {
        * Train a matrix factorization model given an RDD of ratings given by users to some products,
        * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the
        * product of two lower-rank matrices of a given rank (number of features). To solve for these
    +   * features, we run a given number of iterations of ALS. This is done using a level of
    +   * parallelism given by `blocks`.
    +   *
    +   * @param ratings    RDD of (userID, productID, rating) pairs
    +   * @param rank       number of features to use
    +   * @param iterations number of iterations of ALS (recommended: 10-20)
    +   * @param lambda     regularization factor (recommended: 0.01)
    +   * @param blocks     level of parallelism to split computation into
    +   * @param nonnegative whether to enforce nonnegativity
    +   */
    +  def train(
    +      ratings: RDD[Rating],
    +      rank: Int,
    +      iterations: Int,
    +      lambda: Double,
    +      blocks: Int,
    +      nonnegative: Boolean
    +    ): MatrixFactorizationModel = {
    +    (new ALS(blocks, blocks, rank, iterations, lambda, false, 1.0)
    +      .setNonnegative(nonnegative).run(ratings))
    --- End diff --

`nonnegative` and `seed` cannot be set at the same time. Do we need to fix this?