HyukjinKwon opened a new pull request #26288: [SPARK-29627][PYTHON][SQL] Allow array_contains to take column instances
URL: https://github.com/apache/spark/pull/26288
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to allow `array_contains` to take column instances.
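
One way to picture the proposed behavior is the sketch below. It is not the actual patch: `Column` here is a hypothetical stand-in for `pyspark.sql.Column`, `unwrap_value` is an illustrative helper name, and the `_jc` attribute is assumed to mirror the JVM column handle that PySpark columns wrap. The idea is that a column instance is unwrapped to its JVM counterpart before being passed on, while a plain literal is passed through unchanged.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column (assumption, not the real class)."""

    def __init__(self, jc):
        # `_jc` is assumed to hold the underlying JVM column handle.
        self._jc = jc

    def __iter__(self):
        # Mirrors the error shown in the "Before" traceback of this PR description.
        raise TypeError("Column is not iterable")


def unwrap_value(value):
    """Return the JVM handle for a Column, or the value itself for a plain literal."""
    return value._jc if isinstance(value, Column) else value


# A plain literal is passed through as-is.
print(unwrap_value("a"))
# A column instance is unwrapped instead of being handed over as a Python object.
print(unwrap_value(Column("jvm-column-handle")))
```

With a check of this kind in place, the value argument never reaches the Py4J argument conversion as a Python `Column`, which is where the traceback in this description originates.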
   
   
   ### Why are the changes needed?
   
   To keep the Scala and Python APIs consistent. Scala already accepts a column instance as the value argument of `array_contains`.
   
   Scala:
   
   ```scala
   import org.apache.spark.sql.functions._
   val df = Seq(Array("a", "b", "c"), Array.empty[String]).toDF("data")
   df.select(array_contains($"data", lit("a"))).collect()
   ```
   
   Python:
   
   ```python
   from pyspark.sql.functions import array_contains, lit
   df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
   df.select(array_contains(df.data, lit("a"))).show()
   ```
   
   However, the PySpark side does not allow this: the Python call above fails with `TypeError: Column is not iterable`.
   
   ### Does this PR introduce any user-facing change?
   
   Yes. In PySpark, `array_contains` now accepts a column instance as its value argument:
   
   
   ```python
   from pyspark.sql.functions import array_contains, lit
   df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
   df.select(array_contains(df.data, lit("a"))).show()
   ```
   
   **Before:**
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/functions.py", line 1950, in array_contains
       return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
     File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
     File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
     File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
     File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
     File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
       raise TypeError("Column is not iterable")
   TypeError: Column is not iterable
   ```
   
   **After:**
   
   
   ```
   +-----------------------+
   |array_contains(data, a)|
   +-----------------------+
   |                   true|
   |                  false|
   +-----------------------+
   ```
   
   ### How was this patch tested?
   
   Manually tested and added a doctest.
   
