JoshRosen commented on code in PR #37460:
URL: https://github.com/apache/spark/pull/37460#discussion_r942750140


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala:
##########
@@ -91,7 +91,17 @@ case class TryCast(child: Expression, toType: DataType, timeZoneId: Option[Strin
     (!Cast.needsTimeZone(child.dataType, toType) || timeZoneId.isDefined)
 
   override lazy val replacement = {
-    TryEval(Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = true))
+    if (equivalentToNonAnsiCast) {
+      Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = false)
+    } else {
+      TryEval(Cast(child, toType, timeZoneId = timeZoneId, ansiEnabled = true))
+    }
+  }
+
+  private def equivalentToNonAnsiCast: Boolean = {

Review Comment:
   In order to preserve existing behaviors, I think that we must consider more 
than just the types of the Cast because `TryEval` will catch and suppress 
exceptions that occur during the evaluation of the Cast's child.
   
   For example, let's say that I'm casting the result of extracting a field 
from a map, something like
   
   ```
   cast(element_at(myMap, 'key') as integer)
   ```
   
   This could fail because either (a) the map doesn't have a value for that key, or (b) the value can't be cast to an integer. `TryEval` catches both kinds of exceptions, so removing it can change behavior:
   
   ```python
   >>> from pyspark.sql.functions import *
   
   >>> df = spark.range(10).withColumn("mapCol", create_map(col('id').cast('string'), 'id'))
   
   >>> df.selectExpr("try_cast(element_at(mapCol, 'missingKey') as string)").first()
   Row(TRY_CAST(element_at(mapCol, missingKey) AS STRING)=None)
   
   >>> df.selectExpr("cast(element_at(mapCol, 'missingKey') as string)").first()
   22/08/10 10:56:12 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 8)
   org.apache.spark.SparkNoSuchElementException: [MAP_KEY_DOES_NOT_EXIST] Key 'missingKey' does not exist. Use `try_element_at` to tolerate non-existent key and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   ```
   
   It looks like the other `try_` expressions have similar behavior. For 
example, `try_add` will also suppress exceptions from its grandchildren, etc.
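   To make the mechanism concrete, here is a minimal sketch (using simplified stand-ins, not Spark's actual implementation): a `TryEval`-style wrapper evaluates its *entire* child expression inside the try, so an exception thrown while computing the Cast's input (e.g. a missing map key in `element_at`) is suppressed along with cast failures.
   
   ```scala
   // Sketch only: tryEval models TryEval's "evaluate child, return null on
   // failure" semantics; elementAt models ANSI element_at, which throws on a
   // missing key.
   def tryEval[A](thunk: => A): Option[A] =
     try Some(thunk)
     catch { case _: RuntimeException => None }
   
   def elementAt(m: Map[String, Long], key: String): Long =
     m.getOrElse(key, throw new NoSuchElementException(s"Key '$key' does not exist"))
   
   // try_cast(element_at(...) as string): the missing-key exception from the
   // Cast's child is caught by the wrapper too, so we get None, not a failure.
   val result: Option[String] =
     tryEval(elementAt(Map("1" -> 1L), "missingKey").toString)
   ```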
   
   ---
   
   As a result, removing the `TryEval` might only be safe in cases where we 
know that the Cast's child cannot possibly throw.
   
   Maybe we could keep the `TryEval` in place but still set `ansiEnabled = false` when we know that the Cast itself is equivalent? This would still give us a performance benefit, since we avoid the cost of throwing and catching Cast exceptions.
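   
   A rough sketch of that idea, using hypothetical simplified types rather than the real Catalyst classes (the actual `replacement` lives in `TryCast`):
   
   ```scala
   // Hedged sketch with stand-in types, not Spark's real expression tree.
   sealed trait Expr
   case class Leaf(name: String) extends Expr
   case class Cast(child: Expr, ansiEnabled: Boolean) extends Expr
   case class TryEval(child: Expr) extends Expr
   
   // Placeholder for the type-based check discussed in this thread; assume it
   // returns true only when ANSI and non-ANSI casts agree for these types.
   def equivalentToNonAnsiCast: Boolean = true
   
   // Always keep the TryEval wrapper (so exceptions from the Cast's child are
   // still suppressed), but disable ANSI mode when the cast itself can't
   // differ, avoiding throw/catch overhead inside the Cast.
   def replacement(child: Expr): Expr =
     TryEval(Cast(child, ansiEnabled = !equivalentToNonAnsiCast))
   ```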



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

