GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/13662
[SPARK-15945] [MLLIB] Conversion between old/new vector columns in a
DataFrame (Scala/Java)
## What changes were proposed in this pull request?
This PR provides conversion utilities between old (`spark.mllib`) and new
(`spark.ml`) vector columns in a DataFrame, so that users can migrate their
datasets and pipelines manually. The methods are implemented under `MLUtils`
and are called `convertVectorColumnsToML` and `convertVectorColumnsFromML`.
Both take a DataFrame and a list of vector columns to convert. They are no-ops
on vector columns that are already in the target format. A warning message is
logged when an actual conversion happens.
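A minimal usage sketch of the two converters follows. It assumes that each method takes the dataset followed by the column names as varargs, and that a local `SparkSession` is available. The object name, the `buildOldDF` helper, and the `id`/`features` columns are hypothetical and exist only for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector => NewVector}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

object VectorColumnMigration {

  // Hypothetical input: a DataFrame with one old-style (spark.mllib) vector column.
  def buildOldDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0, OldVectors.dense(1.0, 2.0)),
      (1, OldVectors.sparse(2, Array(1), Array(3.0)))
    ).toDF("id", "features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("vector-column-migration-sketch")
      .getOrCreate()

    val oldDF = buildOldDF(spark)

    // Old -> new: "features" now holds spark.ml vectors.
    val newDF = MLUtils.convertVectorColumnsToML(oldDF, "features")
    assert(newDF.head().get(1).isInstanceOf[NewVector])

    // The column is already converted, so a second call should leave it unchanged.
    val sameDF = MLUtils.convertVectorColumnsToML(newDF, "features")
    assert(sameDF.schema == newDF.schema)

    // New -> old, e.g. to feed an RDD-based spark.mllib API.
    val backDF = MLUtils.convertVectorColumnsFromML(newDF, "features")
    assert(backDF.head().get(1).isInstanceOf[OldVector])

    spark.stop()
  }
}
```

Columns that are not listed are left untouched, so a pipeline can be migrated one vector column at a time.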
## How was this patch tested?
Unit tests in Scala and Java.
cc: @yanboliang
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-15945
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13662.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13662
----
commit e45fa3f270fafda41ad4d55f7fb63edda75197fc
Author: Xiangrui Meng <[email protected]>
Date: 2016-06-14T05:12:48Z
add convertOldVectorColumnToNew
commit 74f95336f71c96220f5c9c7d40985098f0f646e6
Author: Xiangrui Meng <[email protected]>
Date: 2016-06-14T06:04:18Z
add converters
commit faf45e01cc89e2e0f5e00cab7381473934fc311c
Author: Xiangrui Meng <[email protected]>
Date: 2016-06-14T06:19:18Z
add warning messages
----