GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/14725
[SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper convenience function to create py4j JavaArrays
## What changes were proposed in this pull request?
This adds convenience functions to the Python `JavaWrapper` that make it easy to
create a py4j JavaArray compatible with existing class constructors that take a
Scala `Array` as input.
Two functions are added here: one for primitive data types, which checks the
element type of the Python list and automatically creates a JavaArray of the
matching type, and one that takes a Java class as input so that arrays of
custom classes can also be created.
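The type check described above can be sketched in plain Python on top of py4j's `JavaGateway.new_array(java_class, *dimensions)`. This is a minimal sketch under stated assumptions, not the code in this PR: the helper names `infer_java_class_name` and `new_java_array`, the Python-to-Java type mapping, and the choice of Java primitive element types (`int`, `double`, `boolean`) are hypothetical. It targets Python 3 (`str` elements) and assumes `gateway` is a py4j gateway, such as the `_gateway` attribute of an active SparkContext.

```python
from functools import reduce

# Hypothetical element-type rules, checked in order. bool is tested first and
# excluded from the int rule, because bool is a subclass of int in Python.
# Mapping to Java primitives is an assumption about what Scala Array[Int],
# Array[Double] and Array[Boolean] constructor parameters expect.
_TYPE_RULES = [
    (lambda x: isinstance(x, bool), "boolean"),
    (lambda x: isinstance(x, int) and not isinstance(x, bool), "int"),
    (lambda x: isinstance(x, float), "double"),
    (lambda x: isinstance(x, str), "java.lang.String"),
]


def infer_java_class_name(pylist):
    """Return the Java element-class name for a homogeneous, non-empty list."""
    if not pylist:
        raise ValueError("cannot infer a Java element type from an empty list")
    for matches, java_name in _TYPE_RULES:
        # Every element must match, so mixed lists are rejected up front.
        if all(matches(x) for x in pylist):
            return java_name
    raise TypeError("unsupported or mixed element types in %r" % (pylist,))


def new_java_array(gateway, pylist, java_class_name=None):
    """Copy a Python list into a py4j JavaArray (hypothetical helper).

    With java_class_name=None the element type is inferred (primitive path);
    otherwise the given fully qualified class name is used (custom-class path).
    """
    name = java_class_name or infer_java_class_name(pylist)
    # Resolve a dotted name, e.g. "java.lang.String" -> gateway.jvm.java.lang.String
    java_class = reduce(getattr, name.split("."), gateway.jvm)
    java_array = gateway.new_array(java_class, len(pylist))
    for i, value in enumerate(pylist):
        java_array[i] = value
    return java_array
```

For example, `new_java_array(sc._gateway, ["a", "b", "c"])` would be expected to yield a `String[]`, while passing an explicit class name covers the custom-class case. Checking every element, rather than only the first, surfaces mixed-type lists as a Python `TypeError` before any call reaches the JVM.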
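Usage in actual ML classes would be similar to the following:
```python
from pyspark.ml.wrapper import JavaModel

class CountVectorizerModel(JavaModel):
    def __init__(self, vocab):
        # Convert the Python list of vocab strings into a py4j JavaArray
        jvocab = CountVectorizerModel._new_java_primitive_array(vocab)
        # Instantiate the Scala model, passing the array to its constructor
        java_model = CountVectorizerModel._create_from_java_class(
            "org.apache.spark.ml.feature.CountVectorizerModel", jvocab)
        # __init__ cannot return a value, so wrap the Java object instead
        super(CountVectorizerModel, self).__init__(java_model)
    ...

cvm = CountVectorizerModel(["a", "b", "c"])
```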
## How was this patch tested?
Added unit tests for the new functionality and tested constructing a
CountVectorizerModel from a list of vocabulary strings.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark pyspark-new_java_array-CountVectorizer-SPARK-17161
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14725.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14725
----
commit 2a8de605f0dfe1a1baf3748602ad10c06476198d
Author: Bryan Cutler
Date: 2016-07-14T17:16:03Z
testing out _new_java_array
commit 97bff0753f8b94ead97d68206268b5ba58abab6c
Author: Bryan Cutler
Date: 2016-08-19T17:21:39Z
Merge remote-tracking branch 'upstream/master' into wip-pyspark-new_java_array-CountVectorizer
commit 4766cdcdd6bd10e9e48212c1513dceb6684663c2
Author: Bryan Cutler
Date: 2016-08-19T23:14:48Z
undo changes to CountVectorizerModel used for testing
commit 1c0ddb92e32470e77fe2b7cfa675eb1c908bc713
Author: Bryan Cutler
Date: 2016-08-19T23:15:56Z
added convienience functions to JavaWrapper to create py4j JavaArray
commit f9672bfe34b1b5f5ea14700d2aaaee055f5323f8
Author: Bryan Cutler
Date: 2016-08-19T23:20:16Z
fixed style checks and tests
----