GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/12368
[SPARK-14605][ML][PYTHON] Changed Python to use unicode UIDs for spark.ml
Identifiable
## What changes were proposed in this pull request?
Python spark.ml Identifiable classes use UIDs of type str, but they should
use unicode (in Python 2.x) to match Java. This could be a problem if someone
created a class in Java with odd unicode characters, saved it, and loaded it in
Python.
This PR: Use unicode for UIDs everywhere in Python.
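
As a rough illustration of the str versus unicode issue described above, here is a minimal sketch. The helpers `random_uid` and `ensure_text_uid` are hypothetical names invented for this example and are not the actual pyspark code; the sketch only assumes that a UID is a prefix plus a random suffix.

```python
import uuid

# Pick the text type: `unicode` on Python 2, `str` on Python 3.
try:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
except NameError:
    text_type = str


def random_uid(prefix):
    """Hypothetical helper: build a UID that is always a text (unicode) string."""
    suffix = uuid.uuid4().hex[:12]
    return text_type(prefix) + u"_" + text_type(suffix)


def ensure_text_uid(uid):
    """Hypothetical helper: decode a byte-string UID to text, assuming UTF-8."""
    if isinstance(uid, bytes):  # on Python 2, `bytes` is an alias for `str`
        return uid.decode("utf-8")
    return uid


uid = random_uid("LogisticRegression")
assert isinstance(uid, text_type)

# A UID containing a non-ASCII character survives once it is decoded to text.
raw = u"caf\u00e9_0123456789ab".encode("utf-8")
assert ensure_text_uid(raw) == u"caf\u00e9_0123456789ab"
```

The point of the sketch is only that comparing or concatenating a byte-string UID with a unicode one is fragile on Python 2, so keeping a single text type throughout avoids the mismatch.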
## How was this patch tested?
Updated the persistence unit test to check the uid type.
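
A check of this kind might look roughly like the sketch below. It is not the test from this PR: `SavedStage` is a hypothetical stand-in for an Identifiable class, and the JSON round trip is an assumed substitute for the real save/load path.

```python
import json


class SavedStage(object):
    """Hypothetical stand-in for an Identifiable class with a uid."""

    def __init__(self, uid):
        self.uid = uid

    def to_json(self):
        # Assumption: the uid is persisted as part of JSON metadata.
        return json.dumps({"uid": self.uid})

    @classmethod
    def from_json(cls, payload):
        return cls(json.loads(payload)["uid"])


def check_uid_round_trip(stage):
    """Save and reload a stage, then check both the uid value and its type."""
    loaded = SavedStage.from_json(stage.to_json())
    assert loaded.uid == stage.uid
    # type(u"") is `unicode` on Python 2 and `str` on Python 3.
    assert isinstance(loaded.uid, type(u""))
    return loaded


check_uid_round_trip(SavedStage(u"Binarizer_caf\u00e9"))
```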
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark python-uid-unicode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12368.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12368
----
commit 369d47698fbffdcd2acfe86e49bc51e4e5dd0e26
Author: Joseph K. Bradley <[email protected]>
Date: 2016-04-13T19:14:00Z
Changed Python to use unicode UIDs for spark.ml Identifiable classes
commit b4ac68eb65965e9444a94878eebc5ac7ce8995d7
Author: Joseph K. Bradley <[email protected]>
Date: 2016-04-13T20:06:58Z
added unit test which failed before fix
----