[GitHub] [spark] HyukjinKwon opened a new pull request, #36375: [SPARK-39015][SQL] Remove the usage of toSQLValue(v) without an explicit type

GitBox Wed, 27 Apr 2022 03:17:21 -0700


HyukjinKwon opened a new pull request, #36375:
URL: https://github.com/apache/spark/pull/36375


   ### What changes were proposed in this pull request?
   
   This PR is a backport of https://github.com/apache/spark/pull/36351
   
   This PR proposes to remove the usage of `toSQLValue(v)` without an explicit type.
   
   `Literal(v)` is intended to be used by end users, so it cannot handle Spark's internal types such as `UTF8String` and `ArrayBasedMapData`. Because `toSQLValue(v)` delegates to `Literal(v)`, calling it with an internal value can lead to unexpected error messages such as:
   
   ```
   Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
     at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
     at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
     at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
     ...
   ```
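   The sketch below illustrates this failure mode with a self-contained model of an untyped factory that dispatches on the runtime class of its argument. `InternalString` is a hypothetical stand-in for `UTF8String`, and `untypedLiteralSql` is a hypothetical helper. Neither is Spark's actual code. The sketch only shows why a factory built for external types rejects internal representations.

   ```scala
   // Hypothetical stand-in for Spark's internal UTF8String; not a Spark class.
   final case class InternalString(s: String)

   object UntypedLiteralSketch {
     // Hypothetical model of an untyped factory in the spirit of Literal(v):
     // it dispatches on the runtime class and only knows external JVM types.
     def untypedLiteralSql(v: Any): String = v match {
       case null       => "NULL"
       case s: String  => "'" + s + "'"
       case i: Int     => i.toString
       case l: Long    => s"${l}L"
       case b: Boolean => b.toString.toUpperCase
       case other =>
         // Internal representations fall through to this case, analogous to
         // the UNSUPPORTED_FEATURE error in the stack trace above.
         throw new UnsupportedOperationException(
           s"literal for '$other' of class ${other.getClass.getName}")
     }
   }
   ```

   With this model, `untypedLiteralSql("hair")` returns `'hair'`. The call `untypedLiteralSql(InternalString("hair"))` throws, even though both arguments carry the same logical string.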
   
   The data type cannot be inferred from an internal value alone, because one internal representation can back several SQL data types. For example, a `Long` is used for `TimestampType`, `TimestampNTZType`, and `LongType`. For this reason, the typeless usage was removed rather than fixed.
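   The following sketch shows the ambiguity. It assumes, as in Spark, that timestamps are stored internally as a `Long` of microseconds since the epoch. The types and helpers (`SqlType`, `LongT`, `TimestampT`, `TimestampNtzT`, `toSqlValue`) are hypothetical and simplified, and timestamps are rendered in UTC for simplicity. This is not Spark's `QueryErrorsBase` implementation.

   ```scala
   import java.time.{LocalDateTime, ZoneOffset}
   import java.time.format.DateTimeFormatter

   // Hypothetical, simplified SQL types that share a Long internal representation.
   sealed trait SqlType
   case object LongT extends SqlType
   case object TimestampT extends SqlType
   case object TimestampNtzT extends SqlType

   object TypedSqlValueSketch {
     private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

     // Interpret a Long as microseconds since the epoch (UTC assumed here).
     private def microsToString(micros: Long): String = {
       val seconds = Math.floorDiv(micros, 1000000L)
       val nanos = (Math.floorMod(micros, 1000000L) * 1000L).toInt
       LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC).format(fmt)
     }

     // With an explicit type, the same internal value renders unambiguously.
     def toSqlValue(v: Long, t: SqlType): String = t match {
       case LongT         => s"${v}L"
       case TimestampT    => s"TIMESTAMP '${microsToString(v)}'"
       case TimestampNtzT => s"TIMESTAMP_NTZ '${microsToString(v)}'"
     }
   }
   ```

   A helper that receives only `0L` cannot choose among `0L`, `TIMESTAMP '1970-01-01 00:00:00'`, and `TIMESTAMP_NTZ '1970-01-01 00:00:00'`. That is why the caller must supply the type.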
   
   ### Why are the changes needed?
   
   To show the intended error messages instead of an unrelated `UNSUPPORTED_FEATURE` error.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   
   ```scala
   // Run in spark-shell, where `spark` is predefined.
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.types.StructType
   import org.apache.spark.sql.types.StringType
   import org.apache.spark.sql.types.DataTypes

   // Rows with a single map<string,string> column; the last map is empty.
   val arrayStructureData = Seq(
     Row(Map("hair" -> "black", "eye" -> "brown")),
     Row(Map("hair" -> "blond", "eye" -> "blue")),
     Row(Map()))

   val mapType = DataTypes.createMapType(StringType, StringType)

   val arrayStructureSchema = new StructType().add("properties", mapType)

   val mapTypeDF = spark.createDataFrame(
     spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)

   // Under ANSI mode, element_at fails on a missing map key instead of returning NULL.
   spark.conf.set("spark.sql.ansi.enabled", true)
   // The empty map in the last row has no 'hair' key, which triggers the error.
   mapTypeDF.selectExpr("element_at(properties, 'hair')").show()
   ```
   
   Before:
   
   ```
   Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
     at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
     at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
     at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
     ...
   ```
   
   After:
   
   ```
   Caused by: org.apache.spark.SparkNoSuchElementException: [MAP_KEY_DOES_NOT_EXIST] Key 'hair' does not exist. To return NULL instead, use 'try_element_at'. If necessary set spark.sql.ansi.enabled to false to bypass this error.
   == SQL(line 1, position 0) ==
   element_at(properties, 'hair')
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ```
   
   ### How was this patch tested?
   
   A unit test was added. Existing test cases should cover the rest.

