srowen commented on a change in pull request #23741:
[SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer
URL: https://github.com/apache/spark/pull/23741#discussion_r257847058
##########
File path: python/pyspark/ml/wrapper.py
##########
@@ -87,9 +88,22 @@ def _new_java_array(pylist, java_class):
           - bool -> sc._gateway.jvm.java.lang.Boolean
         """
         sc = SparkContext._active_spark_context
-        java_array = sc._gateway.new_array(java_class, len(pylist))
-        for i in xrange(len(pylist)):
-            java_array[i] = pylist[i]
+        java_array = None
+        if len(pylist) > 0 and isinstance(pylist[0], list):
+            # If pylist is a 2D array, then a 2D java array will be created.
+            # Currently, this is only used by StringIndexerModel.from_arrays_of_labels
+            inner_array_length = 0
+            for i in xrange(len(pylist)):
+                inner_array_length = max(inner_array_length, len(pylist[i]))
+            java_array = sc._gateway.new_array(java_class, len(pylist), inner_array_length)
+            for i in xrange(len(pylist)):
+                java_array[i] = sc._gateway.new_array(java_class, len(pylist[i]))
Review comment:
The call to `new_array` above is already allocating a 2D array, right?
Then you don't need to (and don't want to) replace each row with a new
array, unless I'm misunderstanding the semantics.
If the intent is to _not_ create a jagged array but a 2D array big enough
for all elements, then I think this line isn't needed. If you do want to create
a jagged array, I think you want to allocate a 1D array of 1D arrays above and
then fill them in here.
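A minimal sketch of the jagged option, assuming `new_array` behaves like Py4J's `JavaGateway.new_array(java_class, *dimensions)` (for example `sc._gateway.new_array` with `java_class = sc._gateway.jvm.java.lang.String`). The names `new_jagged_array` and `fake_new_array` are hypothetical; `fake_new_array` is a pure-Python stand-in so the sketch runs without a JVM. Allocating the outer dimension with an inner length of 0 approximates "a 1D array of 1D arrays"; that Py4J accepts assigning one Java array into a slot of another is assumed here, not verified.

```python
# Sketch only: new_array is assumed to behave like Py4J's
# JavaGateway.new_array(java_class, *dimensions).


def new_jagged_array(new_array, java_class, pylist):
    """Build a 2D array whose row i has exactly len(pylist[i]) slots."""
    # Allocate only the outer dimension (inner length 0), so no space is
    # wasted on rows that are about to be replaced.
    java_array = new_array(java_class, len(pylist), 0)
    for i, row in enumerate(pylist):
        inner = new_array(java_class, len(row))  # right-sized row
        for j, value in enumerate(row):
            inner[j] = value
        java_array[i] = inner  # swap the empty row for the filled one
    return java_array


def fake_new_array(java_class, *dims):
    """Hypothetical pure-Python stand-in for new_array (no JVM needed).

    Nested lists of None mimic a freshly allocated Java object array.
    """
    if len(dims) == 1:
        return [None] * dims[0]
    return [fake_new_array(java_class, *dims[1:]) for _ in range(dims[0])]


print(new_jagged_array(fake_new_array, "java.lang.String", [["a", "b"], ["c"]]))
# prints [['a', 'b'], ['c']]
```

Each row is allocated once at its final size, which avoids first allocating `inner_array_length` slots per row and then discarding them.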
I'd personally create a rectangular (non-jagged) 2D array and document that
(or even assert it in the function here too). But I could see it either
way.
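A minimal sketch of the non-jagged option, under the same assumption that `new_array` behaves like Py4J's `JavaGateway.new_array(java_class, *dimensions)`. The names `new_rectangular_array`, `fake_new_array`, and the `strict` flag are hypothetical. The array is sized `len(pylist)` by the longest row, rows are filled in place rather than replaced, and `strict` marks where an assertion that every inner list has the same length could go.

```python
# Sketch only: new_array is assumed to behave like Py4J's
# JavaGateway.new_array(java_class, *dimensions).


def new_rectangular_array(new_array, java_class, pylist, strict=False):
    """Build a non-jagged len(pylist) x max-row-length 2D array and fill it."""
    if strict and len({len(row) for row in pylist}) > 1:
        raise ValueError("expected every inner list to have the same length")
    n_cols = max((len(row) for row in pylist), default=0)
    # With two dimensions the rows are already allocated, so fill them
    # element by element instead of replacing them.
    java_array = new_array(java_class, len(pylist), n_cols)
    for i, row in enumerate(pylist):
        for j, value in enumerate(row):
            java_array[i][j] = value
    # Slots past len(row) keep their default value (null in the JVM, None here).
    return java_array


def fake_new_array(java_class, *dims):
    """Hypothetical pure-Python stand-in for new_array (no JVM needed)."""
    if len(dims) == 1:
        return [None] * dims[0]
    return [fake_new_array(java_class, *dims[1:]) for _ in range(dims[0])]


print(new_rectangular_array(fake_new_array, "java.lang.String", [["a", "b"], ["c"]]))
# prints [['a', 'b'], ['c', None]]
```

Padding short rows with null keeps the allocation to a single call, at the cost of callers on the JVM side needing to skip trailing nulls; `strict=True` trades that for a hard error on ragged input.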