[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49131064 Besides breaking the API, I'm also worried about two things: 1. The increase in storage. We had some discussion before v1.0 about whether we should switch to long

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49149396 Let me close this PR for now. I will fork or wrap as necessary. Keep it in mind, and maybe in a 2.x release this can be revisited. (Matei I ran into more problems with

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49201924 Sean, I'd still be okay with adding a LongALS class if you see benefit for it in some use cases. Let's just see how it works in comparison. --- If your project is set up

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49203419 @mateiz I think it would mean mostly cloning `ALS.scala`, as the `Rating` object is woven throughout. Probably some large chunks could be refactored and shared. Is that

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49210917 BTW this could also be a place to use the dreaded Scala @specialized annotation to template the code for Ints vs Longs, though as far as I know that's being deprecated by

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49210831 Yeah, that's what I meant, we can clone it at first but we might be able to share code later (at least the math code we run on each block, or stuff like that). But let's

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/1393 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999523 I am trying to figure out why it happend, this might not be my conclusion but at the moment I feel that since this class has a private [mllib] constructor, there is

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48999631 And to my surprise also has `org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict` not sure why it has that. --- If your project is set up for it,

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49000211 Ahh understood (please ignore my previous theory.), so it happened because we have a function which is `@developerApi` in the same class with same name. So this was

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14926032 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -53,7 +53,7 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49013972 Yes you could also tell callers to track their own user-ID mapping and maintain it consistently everywhere. Callers have to share that state then somehow. Hashing is

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49072664 Yeah API stability is very important. I keep banging on about the flip-side -- freezing an API that may still need to change. You get a different important problem. I'm

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49099667 QA tests have started for PR 1393. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16694/consoleFull ---

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49104938 Wise words Matei! anyway here's another cut that preserves the original API. Tests are still running. Up to you guys' judgment on whether it's worthwhile. --- If your

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14972737 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -53,7 +53,7 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14973020 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -53,15 +53,15 @@ private[recommendation] case class

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14973237 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala --- @@ -53,15 +53,15 @@ private[recommendation] case class

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49106922 @mateiz yeah it was `@Experimental` but if you want to preserve the API of this class I can only imagine it would have a `Long` field internally but still also expose

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49107533 QA results for PR 1393:br- This patch PASSES unit tests.br- This patch merges cleanlybr- This patch adds the following public classes (experimental):brcase class

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49110420 Yeah, this is tricky. Maybe we need to create a LongRating class, even though it's ugly. @mengxr what do you think? I do think we'd like to have support for Longs here.

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14916107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -53,7 +53,7 @@ class MatrixFactorizationModel

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1393#discussion_r14916135 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -36,10 +36,10 @@ import

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48987125 I'm somewhat worried about this because it seems difficult to do without breaking API changes. Do users really want to rely on luck to avoid collisions? It might be

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1393 SPARK-2465. Use long as user / item ID for ALS I'd like to float this for consideration: use longs instead of ints for user and product IDs in the ALS implementation. The main reason for is

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48842883 QA tests have started for PR 1393. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16603/consoleFull ---

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread witgo
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48843123 The overall increase how much memory? Have a detailed contrast? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48843270 I think the most significant change is the Rating object. It goes from 8 + (ref) + 8 (object) + 4 (int) + 4 (int) + 8 (double) = 32 bytes to 8 (ref) + 8 (object) + 4

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48845420 QA results for PR 1393:br- This patch FAILED unit tests.br- This patch merges cleanlybr- This patch adds the following public classes (experimental):brcase class

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48846073 QA tests have started for PR 1393. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16604/consoleFull ---

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48848503 QA results for PR 1393:br- This patch FAILED unit tests.br- This patch merges cleanlybr- This patch adds the following public classes (experimental):brcase class

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48850704 QA tests have started for PR 1393. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16607/consoleFull ---

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48852700 QA results for PR 1393:br- This patch PASSES unit tests.br- This patch merges cleanlybr- This patch adds the following public classes (experimental):brcase class