allisonwang-db commented on code in PR #47253:
URL: https://github.com/apache/spark/pull/47253#discussion_r1676188548
##########
python/pyspark/sql/types.py:
##########
@@ -194,16 +194,7 @@ def fromDDL(cls, ddl: str) -> "DataType":
>>> DataType.fromDDL("b: string, a: int")
StructType([StructField('b', StringType(), True), StructField('a', IntegerType(), True)])
"""
- from pyspark.sql import SparkSession
- from pyspark.sql.functions import udf
-
- # Intentionally uses SparkSession so one implementation can be shared with/without
- # Spark Connect.
- schema = (
- SparkSession.active().range(0).select(udf(lambda x: x, returnType=ddl)("id")).schema
- )
- assert len(schema) == 1
- return schema[0].dataType
+ return _parse_datatype_string(ddl)
Review Comment:
Can we make sure the behavior of `_parse_datatype_string` is the same as the
original `fromDDL`? My concern is that this might introduce an unintentional
behavior change for a public API.
What's the error message if we do `fromDDL(a variant)` without this change?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.