[GitHub] holdenk commented on a change in pull request #23741: [SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer

GitBox Fri, 08 Feb 2019 10:02:24 -0800

holdenk commented on a change in pull request #23741: 
[SPARK-22798][PYTHON][ML]Add multiple column support to PySpark StringIndexer
URL: https://github.com/apache/spark/pull/23741#discussion_r255176175


 ##########
 File path: python/pyspark/ml/wrapper.py
 ##########
 @@ -87,9 +87,19 @@ def _new_java_array(pylist, java_class):
           - bool -> sc._gateway.jvm.java.lang.Boolean
         """
         sc = SparkContext._active_spark_context
-        java_array = sc._gateway.new_array(java_class, len(pylist))
-        for i in xrange(len(pylist)):
-            java_array[i] = pylist[i]
+        java_array = None
+        if len(pylist) > 0 and isinstance(pylist[0], list):
+            inner_array_length = 0
+            for i in xrange(len(pylist)):
+                inner_array_length = max(inner_array_length, len(pylist[i]))
 
 Review comment:
   Would it make sense to just call `_new_java_array` recursively to build 
the array of arrays? I'm not sure; I haven't done much with 2D arrays and Py4J, 
so I'm just thinking out loud here.
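
   For illustration only, the recursive idea could look roughly like the sketch 
below. It is not the PR's code: `new_java_array_recursive` and `fake_new_array` 
are hypothetical names, and `fake_new_array` is a pure-Python stand-in for 
`sc._gateway.new_array(java_class, *dimensions)` so the shape of the recursion 
can be run without a JVM. Whether Py4J accepts an inner Java array assigned 
into a slot of an outer one is an assumption that would need checking against 
a real gateway.

```python
def fake_new_array(java_class, *dimensions):
    """Hypothetical stand-in for sc._gateway.new_array.

    Returns nested Python lists of None shaped like the requested dimensions.
    """
    if len(dimensions) == 1:
        return [None] * dimensions[0]
    return [fake_new_array(java_class, *dimensions[1:])
            for _ in range(dimensions[0])]


def new_java_array_recursive(new_array, java_class, pylist):
    """Hypothetical recursive variant of _new_java_array.

    `new_array` is passed in so that a real gateway's new_array or the
    stand-in above can be used interchangeably.
    """
    if len(pylist) > 0 and isinstance(pylist[0], list):
        # Build each row through the same function, then place the rows
        # into an outer array. The zero-length inner dimension is only a
        # placeholder, because every slot is overwritten below.
        rows = [new_java_array_recursive(new_array, java_class, row)
                for row in pylist]
        outer = new_array(java_class, len(pylist), 0)
        for i, row in enumerate(rows):
            outer[i] = row  # ASSUMPTION: Py4J allows assigning an array here
        return outer

    # Base case: a flat list, filled the same way as the original loop.
    java_array = new_array(java_class, len(pylist))
    for i, value in enumerate(pylist):
        java_array[i] = value
    return java_array


nested = new_java_array_recursive(fake_new_array, str, [["a", "b"], ["c"]])
flat = new_java_array_recursive(fake_new_array, int, [1, 2, 3])
```

   One difference from the diff above: it tracks the longest inner list, which 
suggests a rectangular array sized to that length, whereas this sketch keeps 
each row at its own length.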

