zhengruifeng opened a new pull request, #48591:
URL: https://github.com/apache/spark/pull/48591
### What changes were proposed in this pull request?
Make `lit` in Spark Connect accept NumPy ndarrays with `str` and `bool` dtypes.
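A minimal sketch of the kind of dtype check this implies. It assumes the new support keys on NumPy's dtype `kind` codes (`"U"` for fixed-width unicode strings, `"b"` for booleans). The helper `added_element_type` and the returned Spark SQL type names are illustrative only; this is not the code from the PR.

```python
from typing import Optional

import numpy as np


def added_element_type(dt: np.dtype) -> Optional[str]:
    """Hypothetical helper: Spark SQL element type for the dtypes this PR adds.

    Returns None for anything else, leaving those dtypes to the existing
    numeric mapping (not reproduced here).
    """
    if dt.kind == "U":  # fixed-width unicode, e.g. '<U1', whose name is 'str32'
        return "string"
    if dt.kind == "b":  # boolean arrays (np.bool_ elements)
        return "boolean"
    return None


print(added_element_type(np.array(["a", "b"], np.str_).dtype))
print(added_element_type(np.array([True, False]).dtype))
print(added_element_type(np.array([1, 2], np.int64).dtype))
```

Keying on `kind` rather than an exact dtype means every string width (`<U1`, `<U10`, ...) is covered by one branch.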
### Why are the changes needed?
To be consistent with PySpark Classic, which already accepts such arrays:
```
In [4]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
+---------------+
|ARRAY('a', 'b')|
+---------------+
|         [a, b]|
+---------------+
```
### Does this PR introduce _any_ user-facing change?
Yes.

Before:
```
In [3]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
---------------------------------------------------------------------------
PySparkTypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).schema

File ~/Dev/spark/python/pyspark/sql/utils.py:272, in try_remote_functions.<locals>.wrapped(*args, **kwargs)
    269 if is_remote() and "PYSPARK_NO_NAMESPACE_SHARE" not in os.environ:
    270     from pyspark.sql.connect import functions
--> 272     return getattr(functions, f.__name__)(*args, **kwargs)
    273 else:
    274     return f(*args, **kwargs)

File ~/Dev/spark/python/pyspark/sql/connect/functions/builtin.py:274, in lit(col)
    272 dt = _from_numpy_type(col.dtype)
    273 if dt is None:
--> 274     raise PySparkTypeError(
    275         errorClass="UNSUPPORTED_NUMPY_ARRAY_SCALAR",
    276         messageParameters={"dtype": col.dtype.name},
    277     )
    279 # NumpyArrayConverter for Py4J can not support ndarray with int8 values.
    280 # Actually this is not a problem for Connect, but here still convert it
    281 # to int16 for compatibility.
    282 if dt == ByteType():

PySparkTypeError: [UNSUPPORTED_NUMPY_ARRAY_SCALAR] The type of array scalar 'str32' is not supported.
```
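The traceback shows where this fails: `_from_numpy_type(col.dtype)` returns `None` for a `str` dtype, so `lit` raises `UNSUPPORTED_NUMPY_ARRAY_SCALAR` with `col.dtype.name` (`'str32'` here) in the message. The sketch below reproduces that pre-change path in plain NumPy, assuming the old lookup covered only integer and floating-point dtypes. `_SUPPORTED_BEFORE`, `element_type_before`, and the use of a plain `TypeError` are hypothetical stand-ins, not PySpark code.

```python
import numpy as np

# Illustrative only: a numeric-only dtype table, mapped to Spark SQL
# simple type names, standing in for the pre-change lookup.
_SUPPORTED_BEFORE = {
    np.dtype("int8"): "tinyint",
    np.dtype("int16"): "smallint",
    np.dtype("int32"): "int",
    np.dtype("int64"): "bigint",
    np.dtype("float32"): "float",
    np.dtype("float64"): "double",
}


def element_type_before(col: np.ndarray) -> str:
    """Hypothetical stand-in for the dtype check in the old ndarray branch of `lit`."""
    dt = _SUPPORTED_BEFORE.get(col.dtype)
    if dt is None:
        # Same message shape as the UNSUPPORTED_NUMPY_ARRAY_SCALAR error.
        raise TypeError(
            "[UNSUPPORTED_NUMPY_ARRAY_SCALAR] The type of array scalar "
            f"'{col.dtype.name}' is not supported."
        )
    # int8 is widened to int16, as the comment in the traceback explains.
    if dt == "tinyint":
        dt = "smallint"
    return dt


try:
    element_type_before(np.array(["a", "b"], np.str_))
except TypeError as e:
    print(e)
```

A `bool` array fails the same way, with `'bool'` as the reported dtype name.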
After:
```
In [4]: spark.range(1).select(sf.lit(np.array(["a", "b"], np.str_))).show()
+-----------+
|array(a, b)|
+-----------+
|     [a, b]|
+-----------+
```
### How was this patch tested?
CI.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]