[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...

2017-05-14 Thread jtengyp
Github user jtengyp commented on the issue: https://github.com/apache/spark/pull/17936 I think @ConeyLiu should directly test the Cartesian phase with the following patch:

    val user = model.userFeatures
    val item = model.productFeatures
    val start = System.nanoTime
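The quoted patch is truncated here; a minimal sketch of how such a cartesian-phase timing test might continue (the cartesian-plus-count body and the printing are assumptions, not the author's actual code):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // Hypothetical continuation of the truncated snippet: time only the
    // cartesian phase of user x item feature pairing, forcing evaluation
    // with count() so the work is not deferred.
    def timeCartesianPhase(model: MatrixFactorizationModel): Unit = {
      val user = model.userFeatures      // RDD[(Int, Array[Double])]
      val item = model.productFeatures
      val start = System.nanoTime
      val pairs = user.cartesian(item).count()
      println(s"cartesian: $pairs pairs in ${(System.nanoTime - start) / 1e9} s")
    }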

[GitHub] spark pull request #17898: [SPARK-20638][Core]Optimize the CartesianRDD to r...

2017-05-14 Thread jtengyp
Github user jtengyp closed the pull request at: https://github.com/apache/spark/pull/17898

[GitHub] spark issue #17898: Optimize the CartesianRDD to reduce repeatedly data fetc...

2017-05-08 Thread jtengyp
Github user jtengyp commented on the issue: https://github.com/apache/spark/pull/17898 Here is my test. Environment: 3 workers, each with 10 cores, 30G memory, and 1 executor. Test data: 480,189 users, each a 10-dim vector, and 17,770 items, each a 10-dim vector. With
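A hedged sketch of how feature RDDs matching those shapes could be generated for such a test (the counts and the 10-dim vectors come from the description above; the helper itself is an assumption, not the author's script):

    import org.apache.spark.SparkContext
    import scala.util.Random

    // Sketch: build feature RDDs of the described shapes,
    // 480,189 users and 17,770 items, each a 10-dim vector.
    def makeFeatures(sc: SparkContext) = {
      def vec(id: Int) = (id, Array.fill(10)(Random.nextDouble()))
      val users = sc.parallelize(0 until 480189).map(vec)
      val items = sc.parallelize(0 until 17770).map(vec)
      (users, items)
    }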

[GitHub] spark pull request #17898: Optimize the CartesianRDD

2017-05-08 Thread jtengyp
Github user jtengyp commented on a diff in the pull request: https://github.com/apache/spark/pull/17898#discussion_r115199537 --- Diff: core/src/main/scala/org/apache/spark/rdd/CartesianRDD.scala --- @@ -72,8 +72,10 @@ class CartesianRDD[T: ClassTag, U: ClassTag

[GitHub] spark pull request #17898: Update CartesianRDD.scala

2017-05-08 Thread jtengyp
GitHub user jtengyp opened a pull request: https://github.com/apache/spark/pull/17898 Update CartesianRDD.scala In compute, group each iterator into multiple groups, reducing repeated data fetching. ## What changes were proposed in this pull request? In compute
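A minimal sketch of the grouped-iteration idea the description refers to (assumed from the summary above, not the exact patch): buffer the left iterator in blocks, so the right partition's iterator is re-created once per block instead of once per left element.

    // Sketch of grouped cartesian iteration. `makeRight` re-creates the
    // right-side iterator (in CartesianRDD.compute this would correspond
    // to () => rdd2.iterator(currSplit.s2, context), the side that gets
    // repeatedly fetched). groupSize trades memory for fewer re-fetches.
    def groupedCartesian[T, U](
        left: Iterator[T],
        makeRight: () => Iterator[U],
        groupSize: Int = 1000): Iterator[(T, U)] = {
      // Buffer groupSize left elements, then stream the right side once
      // per block, pairing each right element with every buffered one.
      left.grouped(groupSize).flatMap { block =>
        makeRight().flatMap(y => block.iterator.map(x => (x, y)))
      }
    }

Note the pair ordering differs from a naive nested loop, but the full cartesian product is still produced, and the right side is materialized only ceil(leftSize / groupSize) times instead of leftSize times.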

[GitHub] spark issue #17742: [Spark-11968][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-27 Thread jtengyp
Github user jtengyp commented on the issue: https://github.com/apache/spark/pull/17742 I did some tests with the PR. Here is the cluster configuration: 3 workers, each with 10 cores and 30G memory. With the Netflix dataset (480,189 users and 17,770 movies), the
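A sketch of the kind of end-to-end measurement being described for the recommendForAll path (assuming a trained MatrixFactorizationModel and top-10 recommendations; the helper is illustrative, not the author's benchmark):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // Time recommendProductsForUsers, the path SPARK-11968 optimizes;
    // count() forces evaluation of the full recommendation job.
    def benchmarkRecommendForAll(model: MatrixFactorizationModel): Double = {
      val start = System.nanoTime
      model.recommendProductsForUsers(10).count()  // top-10 per user
      (System.nanoTime - start) / 1e9              // elapsed seconds
    }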