Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49131064
Besides breaking the API, I'm also worried about two things:
1. The increase in storage. We had some discussion before v1.0 about
whether we should switch to long
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49149396
Let me close this PR for now. I will fork or wrap as necessary. Keep it in
mind, and maybe in a 2.x release this can be revisited. (Matei I ran into more
problems with
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49201924
Sean, I'd still be okay with adding a LongALS class if you see benefit for
it in some use cases. Let's just see how it works in comparison.
---
If your project is set up
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49203419
@mateiz I think it would mean mostly cloning `ALS.scala`, as the `Rating`
object is woven throughout. Probably some large chunks could be refactored and
shared. Is that
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49210917
BTW this could also be a place to use the dreaded Scala @specialized
annotation to template the code for Ints vs Longs, though as far as I know
that's being deprecated by
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49210831
Yeah, that's what I meant, we can clone it at first but we might be able to
share code later (at least the math code we run on each block, or stuff like
that). But let's
Github user srowen closed the pull request at:
https://github.com/apache/spark/pull/1393
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48999523
I am trying to figure out why it happend, this might not be my conclusion
but at the moment I feel that since this class has a private [mllib]
constructor, there is
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48999631
And to my surprise also has
`org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict` not
sure why it has that.
---
If your project is set up for it,
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49000211
Ahh understood (please ignore my previous theory.), so it happened because
we have a function which is `@developerApi` in the same class with same name.
So this was
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14926032
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49013972
Yes you could also tell callers to track their own user-ID mapping and
maintain it consistently everywhere. Callers have to share that state then
somehow. Hashing is
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49072664
Yeah API stability is very important. I keep banging on about the flip-side
-- freezing an API that may still need to change. You get a different important
problem. I'm
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49099667
QA tests have started for PR 1393. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16694/consoleFull
---
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49104938
Wise words Matei! anyway here's another cut that preserves the original
API. Tests are still running. Up to you guys' judgment on whether it's
worthwhile.
---
If your
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14972737
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14973020
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -53,15 +53,15 @@ private[recommendation] case class
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14973237
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -53,15 +53,15 @@ private[recommendation] case class
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49106922
@mateiz yeah it was `@Experimental` but if you want to preserve the API of
this class I can only imagine it would have a `Long` field internally but still
also expose
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49107533
QA results for PR 1393:br- This patch PASSES unit tests.br- This patch
merges cleanlybr- This patch adds the following public classes
(experimental):brcase class
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-49110420
Yeah, this is tricky. Maybe we need to create a LongRating class, even
though it's ugly. @mengxr what do you think? I do think we'd like to have
support for Longs here.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14916107
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/1393#discussion_r14916135
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -36,10 +36,10 @@ import
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48987125
I'm somewhat worried about this because it seems difficult to do without
breaking API changes. Do users really want to rely on luck to avoid collisions?
It might be
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1393
SPARK-2465. Use long as user / item ID for ALS
I'd like to float this for consideration: use longs instead of ints for
user and product IDs in the ALS implementation.
The main reason for is
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48842883
QA tests have started for PR 1393. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16603/consoleFull
---
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48843123
The overall increase how much memory? Have a detailed contrast?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48843270
I think the most significant change is the Rating object. It goes from 8 +
(ref) + 8 (object) + 4 (int) + 4 (int) + 8 (double) = 32 bytes to 8 (ref) + 8
(object) + 4
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48845420
QA results for PR 1393:br- This patch FAILED unit tests.br- This patch
merges cleanlybr- This patch adds the following public classes
(experimental):brcase class
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48846073
QA tests have started for PR 1393. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16604/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48848503
QA results for PR 1393:br- This patch FAILED unit tests.br- This patch
merges cleanlybr- This patch adds the following public classes
(experimental):brcase class
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48850704
QA tests have started for PR 1393. This patch merges cleanly. brView
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16607/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48852700
QA results for PR 1393:br- This patch PASSES unit tests.br- This patch
merges cleanlybr- This patch adds the following public classes
(experimental):brcase class
33 matches
Mail list logo