[GitHub] [spark] cloud-fan commented on a change in pull request #34747: [SPARK-37490][SQL] Show extra hint if analyzer fails due to ANSI type coercion
cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r759061265

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

```diff
@@ -424,27 +422,21 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
             |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
            """.stripMargin.replace("\n", " ").trim())
       }
 
-      val isUnion = operator.isInstanceOf[Union]
-      val dataTypesAreCompatibleFn = if (isUnion) {
-        (dt1: DataType, dt2: DataType) =>
-          !DataType.equalsStructurally(dt1, dt2, true)
-      } else {
-        // SPARK-18058: we shall not care about the nullability of columns
-        (dt1: DataType, dt2: DataType) =>
-          TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty
-      }
+      val dataTypesAreCompatibleFn = getDataTypesAreCompatibleFn(operator)
 
       // Check if the data types match.
       dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
         // SPARK-18058: we shall not care about the nullability of columns
         if (dataTypesAreCompatibleFn(dt1, dt2)) {
+          operator.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)
```

Review comment:
   Do we need to set the tag here? It's always the root node, and it's very easy to find.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
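The tag in question is set through Catalyst's TreeNode tag API (`setTagValue`/`getTagValue`/`TreeNodeTag`). A rough, self-contained sketch of that mechanism — the `Node` class here is a toy stand-in for Spark's `TreeNode`, not the real thing:

```scala
import scala.collection.mutable

// Minimal model of the TreeNode tag mechanism. TreeNodeTag, setTagValue and
// getTagValue mirror the real Catalyst names; Node and the wiring are toys.
case class TreeNodeTag[T](name: String)

class Node(val name: String, val children: Seq[Node] = Nil) {
  private val tags = mutable.Map.empty[TreeNodeTag[_], Any]
  def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit = tags(tag) = value
  def getTagValue[T](tag: TreeNodeTag[T]): Option[T] =
    tags.get(tag).map(_.asInstanceOf[T])
}

val DATA_TYPE_MISMATCH_ERROR = TreeNodeTag[Boolean]("dataTypeMismatchError")

// The check tags the operator it is inspecting...
val root = new Node("Union", Seq(new Node("Project")))
root.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)

// ...so a later pass can read it back. The review point: if the tagged node
// is always the root of the failing plan, the caller could simply inspect
// the root directly instead of threading a tag through the tree.
assert(root.getTagValue(DATA_TYPE_MISMATCH_ERROR).contains(true))
```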
cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758932749

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
```

Review comment:
   IIUC the analysis is bottom-up, and `CheckAnalysis` should find the bottom-most expression whose children are all resolved but whose input types mismatch?
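The bottom-up property described here can be sketched with a toy post-order traversal: checking children before the node itself means the first failure reported is the lowest expression whose children all check out. `Expr`, `typeOk`, and the node names below are hypothetical, not Catalyst classes:

```scala
// Toy expression tree: typeOk = false marks an input type mismatch.
sealed trait Expr { def children: Seq[Expr]; def typeOk: Boolean; def name: String }
case class Leaf(name: String) extends Expr { val children = Nil; val typeOk = true }
case class Op(name: String, typeOk: Boolean, children: Expr*) extends Expr

// Post-order search: recurse into children first (bottom-up), then check
// the node itself, so the bottom-most mismatch wins.
def firstFailure(e: Expr): Option[Expr] =
  e.children.view.flatMap(firstFailure).headOption
    .orElse(if (!e.typeOk) Some(e) else None)

val plan = Op("Add", typeOk = false, Op("Cast", typeOk = false, Leaf("a")), Leaf("1"))
firstFailure(plan).map(_.name) // Some("Cast"), not the enclosing "Add"
```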
cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758924914

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
```

Review comment:
   Another point: the analyzer has "side effects", as it may send RPC requests to the remote catalog. I think it's better not to run the entire analyzer again, even if the query fails.
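The side-effect concern can be made concrete with a toy illustration (not Spark code): if an analysis pass performs external calls, re-running the whole pass just to build an error hint repeats every one of them. A counter stands in for catalog RPCs:

```scala
// Counter standing in for RPC calls to a remote catalog.
var rpcCalls = 0
def lookupTable(name: String): String = { rpcCalls += 1; s"resolved:$name" }

// A toy "analysis pass" that resolves each referenced table via the catalog.
def analyze(tables: Seq[String]): Seq[String] = tables.map(lookupTable)

val query = Seq("t1", "t2")
analyze(query) // normal analysis: 2 catalog lookups
analyze(query) // re-analyzing only to produce a hint doubles the external calls
assert(rpcCalls == 4)
```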
cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758924491

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
+      }
+      try {
+        checkAnalysis(nonAnsiPlan)
+        "\nTo fix the error, you might need to add explicit type casts.\n" +
```

Review comment:
   I'm a bit worried about the accuracy here. Re-running the entire analyzer includes more stuff, not just type coercion. Can we be more surgical and only rerun the type coercion rules in `CheckAnalysis` when we hit an input type mismatch error?
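The "surgical" alternative can be sketched by modeling the analyzer as a rule pipeline over a toy string "plan": on a type-mismatch error, only the coercion rule is replayed rather than the full pipeline. The rules and rewrites below are purely illustrative, not Spark's actual rules:

```scala
// A rule rewrites a toy plan representation.
type Rule = String => String

val resolveRelations: Rule = _.replace("unresolved(t)", "t") // would hit the catalog in reality
val coerceTypes: Rule      = _.replace("int + string", "int + cast(string as int)")
val fullPipeline: Seq[Rule] = Seq(resolveRelations, coerceTypes)

// Apply a rule set in order.
def run(rules: Seq[Rule], plan: String): String =
  rules.foldLeft(plan)((p, rule) => rule(p))

// On an input type mismatch, replay only the coercion rule — no catalog
// round-trips — to see whether default coercion would have resolved it.
val fixed = run(Seq(coerceTypes), "int + string")
assert(fixed == "int + cast(string as int)")
```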
cloud-fan commented on a change in pull request #34747:
URL: https://github.com/apache/spark/pull/34747#discussion_r758923158

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```diff
@@ -198,21 +205,39 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   def executeAndCheck(plan: LogicalPlan, tracker: QueryPlanningTracker): LogicalPlan = {
-    if (plan.analyzed) return plan
     AnalysisHelper.markInAnalyzer {
       val analyzed = executeAndTrack(plan, tracker)
       try {
         checkAnalysis(analyzed)
         analyzed
       } catch {
         case e: AnalysisException =>
-          val ae = e.copy(plan = Option(analyzed))
+          val ae = e.copy(plan = Option(analyzed),
+            message = e.message + extraHintForAnsiTypeCoercion(plan))
           ae.setStackTrace(e.getStackTrace)
           throw ae
       }
     }
   }
 
+  private def extraHintForAnsiTypeCoercion(plan: LogicalPlan): String = {
+    if (!conf.ansiEnabled) {
+      ""
+    } else {
+      val nonAnsiPlan = AnalysisContext.withDefaultTypeCoercionAnalysisContext {
+        executeSameContext(plan)
```

Review comment:
   In some cases, people are not able to edit the query. I think turning off ANSI mode is still a necessary workaround.
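For reference, the workaround mentioned here is a one-line config change: `spark.sql.ansi.enabled` is the actual Spark SQL flag controlling ANSI mode, while the `spark` session in this snippet is assumed to be in scope:

```scala
// Disable ANSI mode for the current session, falling back to Spark's
// default (more permissive) type coercion. Assumes an active SparkSession
// bound to `spark`, as in spark-shell.
spark.conf.set("spark.sql.ansi.enabled", "false")

// Equivalent in SQL:
//   SET spark.sql.ansi.enabled=false;
```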