GitHub user holdenk opened a pull request:
https://github.com/apache/spark/pull/9581
[SPARK-7675][ML][PYSpark] sparkml params type conversion
From JIRA:
Currently, PySpark wrappers for spark.ml Scala classes are brittle when
accepting Param types. E.g., Normalizer's "p" param cannot be set to "2" (an
integer); it must be set to "2.0" (a float). Fixing this is not trivial since
there does not appear to be a natural place to insert the conversion before
Python wrappers call Java's Params setter method.
A possible fix is to add a method "_checkType" to PySpark's Param class
that checks the type, prints an error if needed, and converts types when
relevant (e.g., int to float, or scipy matrix to array). The Java wrapper
method which copies params to Scala can call this method when available.
This fix instead checks the types at set time, since I think failing sooner
is better, but I can switch it to check at copy time if that would be
preferable. So far it only converts int to float; other conversions (like
scipy matrix to array) are left for the future.
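To illustrate the set-time approach described above, here is a minimal sketch in plain Python. The Param and Params classes, the expected_type attribute, and the convert_at_set_time helper are hypothetical stand-ins chosen for illustration; they are not the actual pyspark.ml.param code in this pull request.

```python
# Hypothetical stand-ins for illustration; not the real pyspark.ml.param classes.
class Param:
    def __init__(self, name, doc, expected_type=None):
        self.name = name
        self.doc = doc
        # Assumption: each param optionally records the Python type it expects.
        self.expected_type = expected_type


def convert_at_set_time(param, value):
    """Return value coerced to param.expected_type, or raise TypeError."""
    expected = param.expected_type
    if expected is None or isinstance(value, expected):
        return value
    # The only conversion mentioned in the description: int -> float.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if expected is float and isinstance(value, int) and not isinstance(value, bool):
        return float(value)
    raise TypeError(
        "Param %r expected %s but got %s"
        % (param.name, expected.__name__, type(value).__name__)
    )


class Params:
    def __init__(self):
        self._param_map = {}

    def _set(self, param, value):
        # Checking here means a bad value fails when it is set,
        # rather than later when params are copied to the JVM side.
        self._param_map[param.name] = convert_at_set_time(param, value)
        return self


# Usage: an int passed for a float param is stored as a float.
p = Param("p", "the p norm value", expected_type=float)
normalizer = Params()
normalizer._set(p, 2)
```

With this shape, a value such as the string "2" raises a TypeError as soon as it is set, which is the fail-sooner behavior argued for above.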
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/holdenk/spark SPARK-7675-PySpark-sparkml-Params-type-conversion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9581.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9581
----
commit ff688c06fbcde6fa5f050e0c1d4a857b2323e0da
Author: Holden Karau
Date: 2015-11-06T19:48:37Z
Start work on adding some basic type information so we can handle ints
showing up and convert them to floats for ml's params
commit 2c4aea51c2f2ab04b8b5288c722a81e826dadc82
Author: Holden Karau
Date: 2015-11-06T19:54:31Z
Explicitly specify no default params and take types off of decision tree
params for now
commit c6a819adbb884fcb3d258e905ff927b6b0d51fa9
Author: Holden Karau
Date: 2015-11-06T21:29:53Z
re-generate and fix how we were formatting the type names
commit b22b11233277e89c51692958c58a9735e043b483
Author: Holden Karau
Date: 2015-11-08T06:11:42Z
Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion
commit 26cda87d64ad0c92772f41938200691c28f7e1b2
Author: Holden Karau
Date: 2015-11-09T23:04:41Z
Update a bit
commit d35d6dbf2225e346f2191cfe92552c4d77fb7d95
Author: Holden Karau
Date: 2015-11-09T23:06:20Z
Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion
commit fd876c2c91d398ff4afcb4fffe7a1120e9999582
Author: Holden Karau
Date: 2015-11-10T01:19:47Z
Some quick progress
commit cc1ad2dec5e21a079b004b9b683ee5dc850b4c11
Author: Holden Karau
Date: 2015-11-10T01:33:06Z
Switch to strings
commit 9138fba068f2eac34419f4e9d95fdf47fc6d72ab
Author: Holden Karau
Date: 2015-11-10T01:38:48Z
pep8 fixes
----