[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion

2021-11-30 Thread GitBox


cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r759061265



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -424,27 +422,21 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
                 |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
               """.stripMargin.replace("\n", " ").trim())
           }
-          val isUnion = operator.isInstanceOf[Union]
-          val dataTypesAreCompatibleFn = if (isUnion) {
-            (dt1: DataType, dt2: DataType) =>
-              !DataType.equalsStructurally(dt1, dt2, true)
-          } else {
-            // SPARK-18058: we shall not care about the nullability of columns
-            (dt1: DataType, dt2: DataType) =>
-              TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty
-          }
 
+          val dataTypesAreCompatibleFn = getDataTypesAreCompatibleFn(operator)
           // Check if the data types match.
           dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
             // SPARK-18058: we shall not care about the nullability of columns
             if (dataTypesAreCompatibleFn(dt1, dt2)) {
+              operator.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)

Review comment:
   Do we need to set the tag here? It's always the root node, and it's very easy to find.
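
   For context, here is a self-contained sketch of the two compatibility checks that the refactored `getDataTypesAreCompatibleFn` consolidates. The `DataType` model and widening rules below are illustrative stand-ins, not Spark's actual classes: `Union` requires structurally equal types, while other set operations only need a common wider type.

```scala
// Minimal stand-ins for Spark's DataType hierarchy; names are illustrative only.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object StringType extends DataType

object TypeCompat {
  // Widening rules, loosely modeled on TypeCoercion.findWiderTypeForTwo.
  def findWiderTypeForTwo(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)
    case _ => None
  }

  // Returns a predicate that is `true` when the pair of types is INCOMPATIBLE,
  // mirroring how CheckAnalysis uses dataTypesAreCompatibleFn.
  def dataTypesAreCompatibleFn(isUnion: Boolean): (DataType, DataType) => Boolean =
    if (isUnion) {
      // Union: types must match exactly (structural equality in Spark).
      (dt1, dt2) => dt1 != dt2
    } else {
      // Other operators: a common wider type is enough.
      (dt1, dt2) => findWiderTypeForTwo(dt1, dt2).isEmpty
    }
}
```

   For example, `Int` vs `Long` is flagged as a mismatch under the `Union` check but passes the wider-type check.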




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion

2021-11-29 Thread GitBox


cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758932749



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
Review comment:
   IIUC the analysis is bottom-up, so `CheckAnalysis` should find the bottom-most expression whose children are all resolved but whose input types mismatch?







[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion

2021-11-29 Thread GitBox


cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758924914



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +

Review comment:
   Another point: the analyzer has side effects, as it may send RPC requests to the remote catalog. I think it's better not to run the entire analyzer again, even if the query fails.







[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion

2021-11-29 Thread GitBox


cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758924491



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
Review comment:
   I'm a bit worried about the accuracy here. Re-running the entire analyzer includes more than just type coercion. Can we be more surgical and only rerun the type coercion rules in `CheckAnalysis` when we hit an input type mismatch error?
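
   The control flow under discussion can be sketched with a toy model (the types and coercion rules below are illustrative assumptions, not Spark's API): the query is re-checked under default type coercion, and the cast hint is appended only when that re-check succeeds while ANSI checking failed.

```scala
// Toy model: a "plan" is reduced to a pair of column types to reconcile.
sealed trait ColType
case object IntCol extends ColType
case object StrCol extends ColType

object AnsiHint {
  // Default coercion: assumed to reconcile any pair implicitly (illustrative).
  private def defaultCoercionResolves(a: ColType, b: ColType): Boolean = true

  // ANSI coercion: stricter -- no implicit cross-type reconciliation.
  private def ansiCoercionResolves(a: ColType, b: ColType): Boolean = a == b

  // Mirrors the shape of extraHintForAnsiTypeCoercion: only suggest the
  // workaround when the same pair would pass under default coercion.
  def extraHint(a: ColType, b: ColType, ansiEnabled: Boolean): String =
    if (!ansiEnabled || ansiCoercionResolves(a, b)) ""
    else if (defaultCoercionResolves(a, b))
      "\nTo fix the error, you might need to add explicit type casts.\n"
    else ""
}
```

   Here `IntCol` vs `StrCol` yields the hint only when ANSI mode is on; with it off, or with matching types, the empty string is returned, matching the early-return on `!conf.ansiEnabled` in the diff.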







[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion

2021-11-29 Thread GitBox


cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758923158



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
Review comment:
   In some cases, people are not able to edit the query, so I think turning off ANSI mode is still a necessary workaround.



