zhengruifeng opened a new pull request, #48591:
URL: https://github.com/apache/spark/pull/48591

   ### What changes were proposed in this pull request?
   Make `lit` in Spark Connect accept NumPy ndarrays of `str` and `bool` type.
   
   
   ### Why are the changes needed?
   To be consistent with PySpark Classic, which already accepts such arrays:
   ```
   In [4]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
   +---------------+
   |ARRAY('a', 'b')|
   +---------------+
   |         [a, b]|
   +---------------+
   ```
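
As a rough illustration of the kind of dtype check involved, here is a minimal sketch. It is not the actual `_from_numpy_type` helper; the names `element_type_for_kind` and `describe_array_literal` are hypothetical, and only the two element types this PR adds are covered. It relies on NumPy's dtype kind codes, where `"U"` denotes a Unicode string array and `"b"` a boolean array.

```python
# Hypothetical sketch, NOT the pyspark implementation: map a NumPy dtype
# "kind" code (the value of ndarray.dtype.kind) to a Spark SQL type name.
from typing import Optional

# Only the two kinds this PR is about; numeric kinds are left out on purpose.
_KIND_TO_SPARK_TYPE = {
    "b": "boolean",  # e.g. np.array([True, False]).dtype.kind
    "U": "string",   # e.g. np.array(["a", "b"], np.str_).dtype.kind
}


def element_type_for_kind(kind: str) -> Optional[str]:
    """Return an illustrative Spark element type name, or None if unknown."""
    return _KIND_TO_SPARK_TYPE.get(kind)


def describe_array_literal(kind: str) -> str:
    """Return the array type a literal of this kind would map to in this sketch."""
    element_type = element_type_for_kind(kind)
    if element_type is None:
        # Mirrors the idea of rejecting unsupported dtypes; the real error
        # raised by pyspark is PySparkTypeError, as in the traceback below.
        raise TypeError(f"unsupported NumPy dtype kind: {kind!r}")
    return f"array<{element_type}>"
```

With NumPy installed, `describe_array_literal(np.array(["a", "b"], np.str_).dtype.kind)` would look up the `"U"` kind and yield `array<string>`.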
   
   ### Does this PR introduce _any_ user-facing change?
   yes
   
   before:
   ```
   In [3]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
   ---------------------------------------------------------------------------
   PySparkTypeError                          Traceback (most recent call last)
   Cell In[3], line 1
   ----> 1 spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).schema
   
   File ~/Dev/spark/python/pyspark/sql/utils.py:272, in try_remote_functions.<locals>.wrapped(*args, **kwargs)
       269 if is_remote() and "PYSPARK_NO_NAMESPACE_SHARE" not in os.environ:
       270     from pyspark.sql.connect import functions
   --> 272     return getattr(functions, f.__name__)(*args, **kwargs)
       273 else:
       274     return f(*args, **kwargs)
   
   File ~/Dev/spark/python/pyspark/sql/connect/functions/builtin.py:274, in lit(col)
       272 dt = _from_numpy_type(col.dtype)
       273 if dt is None:
   --> 274     raise PySparkTypeError(
       275         errorClass="UNSUPPORTED_NUMPY_ARRAY_SCALAR",
       276         messageParameters={"dtype": col.dtype.name},
       277     )
       279 # NumpyArrayConverter for Py4J can not support ndarray with int8 values.
       280 # Actually this is not a problem for Connect, but here still convert it
       281 # to int16 for compatibility.
       282 if dt == ByteType():
   
   PySparkTypeError: [UNSUPPORTED_NUMPY_ARRAY_SCALAR] The type of array scalar 'str32' is not supported.
   ```
   
   after:
   ```
   In [4]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
   +-----------+
   |array(a, b)|
   +-----------+
   |     [a, b]|
   +-----------+
   ```
   
   ### How was this patch tested?
   CI
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

