Github user jtengyp commented on the issue:
https://github.com/apache/spark/pull/17936
I think @ConeyLiu should directly test the Cartesian phase with the
following patch:
val user = model.userFeatures
val item = model.productFeatures
val start = System.nanoTime
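Filled out into a runnable timing harness, the patch might look like the sketch below. Assumptions not in the truncated comment above: `model` is an already-trained MLlib `MatrixFactorizationModel`, and a `count()` action is used to force the Cartesian phase.

```scala
// Sketch only: assumes a live SparkContext and a trained ALS model in `model`.
val user = model.userFeatures     // RDD[(Int, Array[Double])]
val item = model.productFeatures  // RDD[(Int, Array[Double])]
val start = System.nanoTime
user.cartesian(item).count()      // action that materializes the Cartesian phase
val seconds = (System.nanoTime - start) / 1e9
println(s"Cartesian phase: $seconds s")
```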
Github user jtengyp closed the pull request at:
https://github.com/apache/spark/pull/17898
Github user jtengyp commented on the issue:
https://github.com/apache/spark/pull/17898
Here is my test:
Environment: 3 workers, each with 10 cores, 30 GB of memory, and 1 executor.
Test data: 480,189 users and 17,770 items, each a 10-dimensional vector.
With
Github user jtengyp commented on a diff in the pull request:
https://github.com/apache/spark/pull/17898#discussion_r115199537
--- Diff: core/src/main/scala/org/apache/spark/rdd/CartesianRDD.scala ---
@@ -72,8 +72,10 @@ class CartesianRDD[T: ClassTag, U: ClassTag
GitHub user jtengyp opened a pull request:
https://github.com/apache/spark/pull/17898
Update CartesianRDD.scala
In compute, group each iterator into multiple groups, reducing repeated
data fetching.
## What changes were proposed in this pull request?
In compute
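The grouping idea described in this PR can be illustrated outside Spark with a small sketch. `groupedCartesian` and `blockSize` are names invented here for illustration, not the PR's actual code; the point is that buffering the left iterator into blocks lets one traversal of the right-hand side serve many left elements, instead of one traversal per element.

```scala
// Naive cartesian re-traverses the right side once per left element.
// Grouping the left iterator into blocks of `blockSize` means a single
// pass over `right` serves the whole block at once.
def groupedCartesian[T, U](left: Iterator[T], right: Seq[U],
                           blockSize: Int): Iterator[(T, U)] =
  left.grouped(blockSize).flatMap { block =>
    // one pass over `right` for the entire block
    for (t <- block; u <- right) yield (t, u)
  }

val pairs = groupedCartesian(Iterator(1, 2, 3), Seq("a", "b"), 2).toList
println(pairs)  // all 6 (Int, String) pairs, produced block by block
```

With a block size of b, the right-hand partition is fetched ceil(n / b) times instead of n times for n left elements, which is the fetch reduction the PR description refers to.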
Github user jtengyp commented on the issue:
https://github.com/apache/spark/pull/17742
I did some tests with the PR.
Here is the cluster configuration:
3 workers, each with 10 cores and 30 GB of memory.
With the Netflix dataset (480,189 users and 17,770 movies), the