GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/9581

    [SPARK-7675][ML][PYSpark] sparkml params type conversion

    From JIRA:
    Currently, PySpark wrappers for spark.ml Scala classes are brittle when 
accepting Param types. E.g., Normalizer's "p" param cannot be set to "2" (an 
integer); it must be set to "2.0" (a float). Fixing this is not trivial since 
there does not appear to be a natural place to insert the conversion before 
Python wrappers call Java's Params setter method.
    
    
    A possible fix would be to add a method "_checkType" to PySpark's Param 
class that checks the type, prints an error if needed, and converts types when 
relevant (e.g., int to float, or scipy matrix to array). The Java wrapper 
method which copies params to Scala can call this method when available.
    
    This fix instead checks the types at set time, since I think failing sooner 
is better, but I can switch it around to check at copy time if that would be 
better. So far this only converts int to float; other conversions (like 
scipy matrix to array) are left for the future.
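
A minimal sketch of what converting at set time could look like. This is
not the code in this pull request: the `Param` and `Params` classes below are
simplified stand-ins, and `expected_type` and `convert_type` are hypothetical
names chosen for illustration. It only handles the int-to-float case
mentioned above.

```python
class Param(object):
    """Simplified stand-in for an ML param that records an expected type.

    `expected_type` is a hypothetical attribute used for illustration.
    """

    def __init__(self, name, doc, expected_type=None):
        self.name = name
        self.doc = doc
        self.expected_type = expected_type


def convert_type(param, value):
    """Return `value` coerced to the param's expected type, or raise TypeError."""
    expected = param.expected_type
    # No type recorded, or the value already matches: pass it through.
    if expected is None or isinstance(value, expected):
        return value
    # Only int -> float widening is handled here. bool is excluded because
    # it is a subclass of int in Python.
    if (expected is float and isinstance(value, int)
            and not isinstance(value, bool)):
        return float(value)
    raise TypeError(
        "Param %s expected %s but got %r of type %s"
        % (param.name, expected.__name__, value, type(value).__name__))


class Params(object):
    """Simplified holder that checks and converts values at set time."""

    def __init__(self):
        self._paramMap = {}

    def _set(self, param, value):
        # Fail (or convert) here, before anything is copied to the JVM side.
        self._paramMap[param] = convert_type(param, value)
        return self


# Usage: an int passed for a float param is stored as a float.
p = Param("p", "the p norm value", float)
stage = Params()._set(p, 2)
print(stage._paramMap[p])
```

Checking in the setter means a bad value raises at the call site that
supplied it; checking at copy time would instead surface the error later,
when the params are transferred to the Scala object.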

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-7675-PySpark-sparkml-Params-type-conversion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9581
    
----
commit ff688c06fbcde6fa5f050e0c1d4a857b2323e0da
Author: Holden Karau <[email protected]>
Date:   2015-11-06T19:48:37Z

    Start work on adding some basic type information so we can handle ints 
showing up and convert them to floats for ml's params

commit 2c4aea51c2f2ab04b8b5288c722a81e826dadc82
Author: Holden Karau <[email protected]>
Date:   2015-11-06T19:54:31Z

    Explicitly specify no default params and take types off of decision tree 
params for now

commit c6a819adbb884fcb3d258e905ff927b6b0d51fa9
Author: Holden Karau <[email protected]>
Date:   2015-11-06T21:29:53Z

    re-generate and fix how we were formatting the type names

commit b22b11233277e89c51692958c58a9735e043b483
Author: Holden Karau <[email protected]>
Date:   2015-11-08T06:11:42Z

    Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion

commit 26cda87d64ad0c92772f41938200691c28f7e1b2
Author: Holden Karau <[email protected]>
Date:   2015-11-09T23:04:41Z

    Update a bit

commit d35d6dbf2225e346f2191cfe92552c4d77fb7d95
Author: Holden Karau <[email protected]>
Date:   2015-11-09T23:06:20Z

    Merge branch 'master' into SPARK-7675-PySpark-sparkml-Params-type-conversion

commit fd876c2c91d398ff4afcb4fffe7a1120e9999582
Author: Holden Karau <[email protected]>
Date:   2015-11-10T01:19:47Z

    Some quick progress

commit cc1ad2dec5e21a079b004b9b683ee5dc850b4c11
Author: Holden Karau <[email protected]>
Date:   2015-11-10T01:33:06Z

    Switch to strings

commit 9138fba068f2eac34419f4e9d95fdf47fc6d72ab
Author: Holden Karau <[email protected]>
Date:   2015-11-10T01:38:48Z

    pep8 fixes

----

