HyukjinKwon opened a new pull request #25950: [SPARK-29240][PYTHON] Pass Py4J 
column instance to support PySpark column in element_at function
URL: https://github.com/apache/spark/pull/25950
 
 
   ### What changes were proposed in this pull request?
   
   This PR makes `element_at` in PySpark accept a PySpark `Column` instance as
its second (`extraction`) argument.
   
   ### Why are the changes needed?
   
   To match the Scala side. `Column` support appears to have been intended, but
it did not work correctly because of a bug.
   
   ### Does this PR introduce any user-facing change?
   
   Yes. See below:
   
   ```python
   from pyspark.sql import functions as F
   x = spark.createDataFrame([([1,2,3],1),([4,5,6],2),([7,8,9],3)],['list','num'])
   x.withColumn('aa',F.element_at('list',x.num.cast('int'))).show()
   ```
   
   Before:
   
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/functions.py", line 2059, in element_at
       return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/column.py", line 344, in __iter__
       raise TypeError("Column is not iterable")
   TypeError: Column is not iterable
   ```
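
The traceback suggests that the Python `Column` wrapper reached Py4J's collection converter, which tried to iterate it. The sketch below is a pure-Python model of that failure and of passing the underlying Py4J object instead. `FakeColumn`, `convert_arg`, `element_at_before`, and `element_at_after` are hypothetical stand-ins for illustration, not PySpark or Py4J code, and the sketch does not claim to reproduce the exact patch.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (assumption, not real PySpark)."""

    def __init__(self, jc):
        # In PySpark this attribute would hold the Py4J (JVM-side) column object;
        # here a plain string plays that role.
        self._jc = jc

    def __iter__(self):
        # Mirrors the error message seen in the traceback.
        raise TypeError("Column is not iterable")


def convert_arg(arg):
    """Hypothetical model of an argument converter that copies iterables."""
    if isinstance(arg, (int, str)):
        return arg  # primitives and (modelled) JVM handles pass through
    if hasattr(arg, "__iter__"):
        return [item for item in arg]  # iterating a FakeColumn raises here
    return arg


def element_at_before(col, extraction):
    # The Python wrapper is handed to the converter unchanged.
    return ("element_at", convert_arg(col._jc), convert_arg(extraction))


def element_at_after(col, extraction):
    # Unwrap to the underlying (modelled) Py4J object before the call.
    if isinstance(extraction, FakeColumn):
        extraction = extraction._jc
    return ("element_at", convert_arg(col._jc), convert_arg(extraction))


try:
    element_at_before(FakeColumn("list"), FakeColumn("num"))
except TypeError as exc:
    print("before:", exc)

print("after:", element_at_after(FakeColumn("list"), FakeColumn("num")))
```

In this model, native Python values such as `1` still pass through unchanged, so only the `Column` case behaves differently.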
   
   After:
   
   ```
   +---------+---+---+
   |     list|num| aa|
   +---------+---+---+
   |[1, 2, 3]|  1|  1|
   |[4, 5, 6]|  2|  5|
   |[7, 8, 9]|  3|  9|
   +---------+---+---+
   ```
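
The `aa` values follow from `element_at` using 1-based indices on arrays. As a cross-check, here is a plain-Python model of the same lookup over the example rows. `element_at_py` is a hypothetical helper that only handles in-range positive indices; it is not Spark's implementation.

```python
# Same rows as the example DataFrame: (list, num)
rows = [([1, 2, 3], 1), ([4, 5, 6], 2), ([7, 8, 9], 3)]


def element_at_py(values, index):
    """Plain-Python model of a 1-based array lookup (in-range positive indices only)."""
    if not 1 <= index <= len(values):
        raise IndexError("this sketch only models in-range positive indices")
    return values[index - 1]


aa = [element_at_py(lst, num) for lst, num in rows]
print(aa)  # [1, 5, 9]
```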
   
   ### How was this patch tested?
   
   Manually tested with literals, Python native types, and PySpark columns.
   

