GitHub user travishegner opened a pull request:
https://github.com/apache/spark/pull/19191
[SPARK-21958][ML] Word2VecModel save: transform data in the cluster
## What changes were proposed in this pull request?
When saving a Word2VecModel, perform the data transformation on distributed
data in the cluster instead of on local data in the driver.
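
The difference between the two approaches can be illustrated with a minimal sketch. This is not the code from the patch: the `WordVector` case class, the `Word2VecSaveSketch` object, the sample vectors, the partition count, and the local master are all assumptions made for illustration. It only contrasts building row objects on the driver with building them after the data has been distributed.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical row type used only for this illustration.
case class WordVector(word: String, vector: Array[Float])

object Word2VecSaveSketch {
  // Driver-side variant: every row object is built in driver memory
  // before Spark sees the data.
  def driverSide(spark: SparkSession,
                 vectors: Map[String, Array[Float]]): DataFrame = {
    val rows = vectors.toSeq.map { case (w, v) => WordVector(w, v) }
    spark.createDataFrame(rows)
  }

  // Cluster-side variant: distribute the raw (word, vector) tuples first,
  // then build the row objects inside a distributed map.
  def clusterSide(spark: SparkSession,
                  vectors: Map[String, Array[Float]],
                  numPartitions: Int): DataFrame = {
    import spark.implicits._
    spark.createDataset(vectors.toSeq)
      .repartition(numPartitions)
      .map { case (w, v) => WordVector(w, v) }
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("word2vec-save-sketch")
      .master("local[2]")
      .getOrCreate()

    // Stand-in for a trained model's word-to-vector map.
    val vectors = Map(
      "spark" -> Array(0.1f, 0.2f),
      "scala" -> Array(0.3f, 0.4f)
    )

    // Write to a fresh temporary location chosen for this sketch.
    val out = Files.createTempDirectory("w2v-sketch").resolve("data").toString
    clusterSide(spark, vectors, numPartitions = 2).write.parquet(out)

    spark.stop()
  }
}
```

In both variants the raw word-to-vector map still starts on the driver; only the conversion into row objects moves into the distributed job.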
## How was this patch tested?
Unit tests for the ML sub-component still pass.
Running this patch against v2.2.0 in a fully distributed production cluster
allows a 4.0G model to save and load correctly, where it would not do so
without the patch.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/travishegner/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19191.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19191
----
commit 5f4ce997f6f30cd0d59bc2e2f4396f495c3c0fd8
Author: Travis Hegner <[email protected]>
Date: 2017-09-08T14:51:53Z
[SPARK-21958][ML] Word2VecModel save: transform data in the cluster
----