GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/1731
[SPARK-1580][MLLIB] Estimate ALS communication and computation costs.
Continue the work from #493.
Closes #493 and Closes #593
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark tmyklebu-alscost
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1731.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1731
----
commit c774d7d4bff91c9387d059d1189799fa0ff1f4b0
Author: Tor Myklebust <[email protected]>
Date: 2014-04-14T22:01:18Z
Make the partitioner a member variable and use it instead of modding
directly.
commit c90b6d8e91f86cf89adf28de6f9185647c87e5c8
Author: Tor Myklebust <[email protected]>
Date: 2014-04-14T22:10:30Z
Scramble user and product ids before bucketing.
commit df27697649de50d364c42c76aebaebb34cbe87e2
Author: Tor Myklebust <[email protected]>
Date: 2014-04-15T19:47:17Z
Support custom partitioners. Currently we use the same partitioner for
users and products.
commit d872b098d41c4fc088e579e8fe199aca149bca64
Author: Tor Myklebust <[email protected]>
Date: 2014-04-16T12:19:48Z
Add negative id ALS test.
commit 36a0f43519a1e8ea800b960157f8c8b050139105
Author: Tor Myklebust <[email protected]>
Date: 2014-04-16T16:42:31Z
Make the partitioner private.
commit 5ec9e6cd237c4ac7c1b597614c880ae75bacceee
Author: Tor Myklebust <[email protected]>
Date: 2014-04-16T17:00:39Z
Clean a couple of things up using 'map'.
commit f8413451c807282100a9be506ca2c992abb81918
Author: Tor Myklebust <[email protected]>
Date: 2014-04-16T17:12:47Z
Fix daft bug creating 'pairs', also for -> foreach.
commit 40edc235e59aab56d6f65c73ffe98859c78a889b
Author: Tor Myklebust <[email protected]>
Date: 2014-04-16T18:14:38Z
Fix missing space.
commit 674933abb7a373dc1c913467d668bad9045e560f
Author: Tor Myklebust <[email protected]>
Date: 2014-04-19T23:36:52Z
Fix style.
commit 495784f2a172957ab490e0f77ea504c0179ab798
Author: Tor Myklebust <[email protected]>
Date: 2014-04-19T23:41:23Z
Merge branch 'master' of https://github.com/apache/spark
commit 23d6f91b52c88b7006ec78496f777b72b1881bb4
Author: Tor Myklebust <[email protected]>
Date: 2014-04-21T00:06:19Z
Stop making the partitioner configurable.
commit dcf583ac4001c6da8d6b85e45e88043861a351d8
Author: Tor Myklebust <[email protected]>
Date: 2014-04-21T19:56:48Z
Remove the partitioner member variable; instead, thread that needle
everywhere it needs to go.
commit 657a71b143103f9b37ed31976a2f4346bdbe4e7c
Author: Tor Myklebust <[email protected]>
Date: 2014-04-22T22:45:25Z
Simple-minded estimates of computation and communication costs in ALS.
commit a1184d123516fb7165f87e373ee4f70b65a1481a
Author: Tor Myklebust <[email protected]>
Date: 2014-04-23T01:37:15Z
Mark ALS.evaluatePartitioner DeveloperApi.
commit 6c31324a96d716949ab57fe2e9773476f6caa07a
Author: Tor Myklebust <[email protected]>
Date: 2014-04-23T01:38:25Z
Make it actually build...
commit 5530678330bfb3a790c6b54ffcc26fcfa936a8ff
Author: Tor Myklebust <[email protected]>
Date: 2014-04-23T01:41:46Z
Merge branch 'master' of https://github.com/apache/spark into alscost
commit 6615ed56f3c6109c22f151220a52080182285039
Author: Tor Myklebust <[email protected]>
Date: 2014-04-23T19:00:46Z
It's more useful to give per-partition estimates. Do that.
commit 8cbebf1037f5e0d0035b26c07db3a5e6d77ee08c
Author: Tor Myklebust <[email protected]>
Date: 2014-04-25T23:17:30Z
Rename and clean up the return format of cost estimator.
commit 8b21e6d98bc4d7b32d8a32cd191d8846b1268106
Author: Tor Myklebust <[email protected]>
Date: 2014-04-26T15:15:08Z
Fix overlong lines.
commit 2ab7a5d0d5c46def977ad2163a630f4e107659d5
Author: Tor Myklebust <[email protected]>
Date: 2014-04-29T06:12:23Z
Reindent estimateCost's declaration and make it return Seqs.
commit 2b2febe93a250e780235ccfd396001c3273fb4d0
Author: Tor Myklebust <[email protected]>
Date: 2014-05-01T03:51:14Z
Use `makeLinkRDDs` when estimating costs.
commit 0455cd455b2ad31f67cd82827df3b2204fb71414
Author: Tor Myklebust <[email protected]>
Date: 2014-05-01T23:19:37Z
Parens for collectAsMap.
commit 8cbb7185e4440459a725d04e4292ec1f015bfff8
Author: Tor Myklebust <[email protected]>
Date: 2014-05-01T23:27:24Z
Braces get spaces.
commit 217bd1d70a92fa8680e17b3ea4255d4cacad33ae
Author: Tor Myklebust <[email protected]>
Date: 2014-05-01T23:34:34Z
Documentation and choleskies -> subproblems.
commit 68a3229a33a5324a8cc57e08f536b4d0ee25cbc1
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-02T01:27:40Z
merge master
commit 9b56a8bdee87698393a16ca4e4bfdebfd315c9fd
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-02T02:45:13Z
updated API and added a simple test
----