JoshRosen commented on code in PR #37460:
URL: https://github.com/apache/spark/pull/37460#discussion_r942750140
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala:
##########
@@ -91,7 +91,17 @@ case class TryCast(child: Expression, toType: DataType,
timeZoneId: Option[Strin
(!Cast.needsTimeZone(child.dataType, toType) || timeZoneId.isDefined)
override lazy val replacement = {
- TryEval(Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = true))
+ if (equivalentToNonAnsiCast) {
+ Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = false)
+ } else {
+ TryEval(Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = true))
+ }
+ }
+
+ private def equivalentToNonAnsiCast: Boolean = {
Review Comment:
In order to preserve existing behaviors, I think that we must consider more
than just the types of the Cast because `TryEval` will catch and suppress
exceptions that occur during the evaluation of the Cast's child.
For example, let's say that I'm casting the result of extracting a field
from a map, something like
```
cast(element_at(myMap, 'key') as integer)
```
This could fail due because either (a) the map doesn't have a value for that
key, or (b) the cast fails because the value isn't an integer. `TryEval`
catches both types of exceptions, so removing it can change behavior:
```python
>>> from pyspark.sql.functions import *
>>> df = spark.range(10).withColumn("mapCol",
create_map(col('id').cast('string'), 'id'))
>>> df.selectExpr("try_cast(element_at(mapCol, 'missingKey') as
string)").first()
Row(TRY_CAST(element_at(mapCol, missingKey) AS STRING)=None)
>>> df.selectExpr("cast(element_at(mapCol, 'missingKey') as string)").first()
22/08/10 10:56:12 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 8)
org.apache.spark.SparkNoSuchElementException: [MAP_KEY_DOES_NOT_EXIST] Key
'missingKey' does not exist. Use `try_element_at` to tolerate non-existent key
and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false"
to bypass this error.
```
It looks like the other `try_` expressions have similar behavior. For
example, `try_add` will also suppress exceptions from its grandchildren, etc.
---
As a result, removing the `TryEval` might only be safe in cases where we
know that the Cast's child cannot possibly throw.
Maybe we could keep the `TryEval` in place but still set `ansiEnabled =
false` when we know that the Cast itself is equivalent? This would still give
us performance benefits since we still avoid the cost of throwing and catching
Cast exceptions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]