GitHub user Ishiihara opened a pull request:
https://github.com/apache/spark/pull/1871
[SPARK-2907] [MLlib] Use mutable.HashMap to represent model in Word2Vec
Change list:
1. Used mutable.HashMap to represent syn0Global and syn1Global to reduce
shuffle size.
2. Introduced a local vocabulary to make the learning rate update more
precise.
3. Replaced layer1Size with vectorSize so that the vector size is set
correctly. Previously, layer1Size always kept the default value of
vectorSize.
For 100 partitions, using mutable.HashMap reduces shuffle size from 8.1G
to 4G.
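To illustrate why a map-based model can shrink the shuffle, here is a minimal Scala sketch, not the code in this pull request. It assumes each partition only updates the rows for words it actually saw, so a partition can ship a map from word index to row instead of a dense array covering the whole vocabulary. The names SparseModel, addInPlace and merge, and the sizes in main, are hypothetical. The merge uses a plain element-wise sum, which is not necessarily the combining rule used in the PR.

```scala
import scala.collection.mutable

object SparseModelMergeSketch {
  // Hypothetical type: word index -> that word's vector (one row of the model).
  type SparseModel = mutable.HashMap[Int, Array[Float]]

  // Element-wise sum of b into a; both arrays are assumed to have equal length.
  def addInPlace(a: Array[Float], b: Array[Float]): Unit = {
    var i = 0
    while (i < a.length) {
      a(i) += b(i)
      i += 1
    }
  }

  // Merge two partition-local models into `left`.
  // Only rows present in at least one of the maps are visited or stored.
  def merge(left: SparseModel, right: SparseModel): SparseModel = {
    right.foreach { case (wordIndex, row) =>
      left.get(wordIndex) match {
        case Some(existing) => addInPlace(existing, row)
        case None           => left.update(wordIndex, row.clone())
      }
    }
    left
  }

  def main(args: Array[String]): Unit = {
    val vocabSize = 1000 // assumed, for illustration only
    val vectorSize = 4   // assumed, for illustration only

    // Partition 1 touched words 3 and 7; partition 2 touched words 7 and 42.
    val p1: SparseModel = mutable.HashMap(
      3 -> Array.fill(vectorSize)(1.0f),
      7 -> Array.fill(vectorSize)(2.0f))
    val p2: SparseModel = mutable.HashMap(
      7 -> Array.fill(vectorSize)(0.5f),
      42 -> Array.fill(vectorSize)(3.0f))

    val merged = merge(p1, p2)

    // Compare how many floats each representation would have to carry.
    val sparseFloats = merged.size * vectorSize
    val denseFloats = vocabSize * vectorSize
    println(s"rows kept: ${merged.keys.toSeq.sorted.mkString(", ")}")
    println(s"floats in sparse form: $sparseFloats, in dense form: $denseFloats")
  }
}
```

The saving depends on how many distinct words each partition touches; if every partition sees most of the vocabulary, the map overhead can outweigh the benefit.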
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Ishiihara/spark Word2Vec-improve
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1871.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1871
----
commit 8d6befe21e26cc843fc96e4c2934a15c0797ce51
Author: Liquan Pei <[email protected]>
Date: 2014-08-01T07:45:22Z
initial commit
commit 0aafb1b02a19fe4f1689543baf1882a49a7ff11a
Author: Liquan Pei <[email protected]>
Date: 2014-08-01T15:34:11Z
Add comments, minor fixes
commit e4a04d32be284f9a7ab2d3f57d745342912930a7
Author: Liquan Pei <[email protected]>
Date: 2014-08-01T15:46:38Z
minor fix
commit 57dc50d3f24beda8eb0348c0baf8dc343065fd2d
Author: Liquan Pei <[email protected]>
Date: 2014-08-01T16:20:10Z
code formatting
commit 2e92b5991ad8f3f73bbeab9a056f452c4b532b3c
Author: Liquan Pei <[email protected]>
Date: 2014-08-02T01:17:38Z
modify according to feedback
commit 720b5a3ea697a881fc7d7c286b65ef110421f89e
Author: Liquan Pei <[email protected]>
Date: 2014-08-02T05:53:03Z
Add test for Word2Vec algorithm, minor fixes
commit 6bcc8be34f6253bc7d4f9d4dcb478bf91f108c86
Author: Liquan Pei <[email protected]>
Date: 2014-08-03T18:15:09Z
add multiple iteration support
commit 7efbb6f91ca94f9243dbb7a16ea3fc9b6f548b99
Author: Liquan Pei <[email protected]>
Date: 2014-08-03T19:16:19Z
use broadcast version of vocab in aggregate
commit 1a8fb4127b9433945e75beea16fc2d485a249219
Author: Liquan Pei <[email protected]>
Date: 2014-08-03T23:24:35Z
use weighted sum in combOp
commit e93e7263d74879379257e6fff40d5efc8417f2ce
Author: Liquan Pei <[email protected]>
Date: 2014-08-04T03:53:21Z
use treeAggregate instead of aggregate
commit 384c77185544d6f80de96bd366e19760eacbd936
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-04T04:33:05Z
remove minCount and window from constructor
change model to use float instead of double
commit c14da411d4da1b6553759afff7952ac746c9fa15
Author: Xiangrui Meng <[email protected]>
Date: 2014-08-04T05:09:58Z
fix styles
commit 26a948d7e4b8f8cbc91cc7db5cf0acc7d6f08131
Author: Liquan Pei <[email protected]>
Date: 2014-08-04T05:15:27Z
Merge pull request #1 from mengxr/Ishiihara-master
some updates
commit e2484414d65c3b8aebffa79c3cac34452cf53d38
Author: Liquan Pei <[email protected]>
Date: 2014-08-04T05:47:53Z
minor style change
commit 2ba948384e96e79e95a529f032d4768f24236547
Author: Liquan Pei <[email protected]>
Date: 2014-08-04T05:59:40Z
minor fix for Word2Vec test
commit 74b647b3edb87212c57cf6c5e77d627b0aebb67f
Author: Liquan Pei <[email protected]>
Date: 2014-08-07T00:28:53Z
confict resolution
commit e73fd4c8688cc7bbbf49fa68456fb1c83a29d0e6
Author: Liquan Pei <[email protected]>
Date: 2014-08-10T03:44:15Z
Merge remote-tracking branch 'upstream/master'
commit a8ccea59e65708d1be708a602369084b90c6fc49
Author: Liquan Pei <[email protected]>
Date: 2014-08-10T04:44:17Z
use mutable.HashMap to represent model
----