Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan closed pull request #45383: [SPARK-47210][SQL] Addition of implicit casting without indeterminate support URL: https://github.com/apache/spark/pull/45383 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on PR #45383: URL: https://github.com/apache/spark/pull/45383#issuecomment-2035818786 the yarn and docker test failures are unrelated, thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549347237 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case ifExpr: If => + ifExpr.withNewChildren( +ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue))) +case caseWhenExpr: CaseWhen => + val newValues = collateToSingleType( +caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue) + caseWhenExpr.withNewChildren( +interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues)) Review Comment: We actually do not blindly Cast in CollationTypeCasts. We only cast if we get different collations, as all other cases go to null branch in castStringType. Maybe it is better to keep this check internal to this rule, as we will add the priority flag and we will need to handle casting of priority in this rule as well later. Am adding the haveSameType for now to make it check the input types, but will change this code in the following PR to include priorities in the internal implementation of haveSameType check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549240911 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -112,12 +112,14 @@ case class Collate(child: Expression, collationName: String) since = "4.0.0", group = "string_funcs") // scalastyle:on line.contains.tab -case class Collation(child: Expression) extends UnaryExpression with RuntimeReplaceable { - override def dataType: DataType = StringType +case class Collation(child: Expression) + extends UnaryExpression with RuntimeReplaceable with ExpectsInputTypes { + override def dataType: DataType = SQLConf.get.defaultStringType Review Comment: We shouldn't override it. The default implementation is better `replacement.dataType` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549240904 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case ifExpr: If => + ifExpr.withNewChildren( +ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue))) +case caseWhenExpr: CaseWhen => + val newValues = collateToSingleType( +caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue) + caseWhenExpr.withNewChildren( +interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues)) +case substrExpr: Substring => + // This case is necessary for changing Substring input to implicit collation + substrExpr.withNewChildren( +collateToSingleType(Seq(substrExpr.str)) :+ substrExpr.pos :+ substrExpr.len) Review Comment: For now we do not need it, I can revert this change and leave it for default collation handling. Because if we add a flag for collation priority, we will need to change the priority of the input to implicit, if it was default before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549239292 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case ifExpr: If => + ifExpr.withNewChildren( +ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue))) +case caseWhenExpr: CaseWhen => + val newValues = collateToSingleType( +caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue) + caseWhenExpr.withNewChildren( +interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues)) +case substrExpr: Substring => + // This case is necessary for changing Substring input to implicit collation + substrExpr.withNewChildren( +collateToSingleType(Seq(substrExpr.str)) :+ substrExpr.pos :+ substrExpr.len) Review Comment: I don't get it. Why do we find the common collation for a single type? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549237247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case ifExpr: If => + ifExpr.withNewChildren( +ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue))) +case caseWhenExpr: CaseWhen => + val newValues = collateToSingleType( +caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue) + caseWhenExpr.withNewChildren( +interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues)) Review Comment: Actually this exposes some gaps in this new rule 1. We should add a trigger condition and only enter the branch if `!haveSameType(...)` 2. We should not blindly add cast but use `castIfNotSameType` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1549233784 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case ifExpr: If => + ifExpr.withNewChildren( +ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue))) +case caseWhenExpr: CaseWhen => + val newValues = collateToSingleType( +caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue) + caseWhenExpr.withNewChildren( +interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues)) Review Comment: The code looks a bit complicated now. Can we follow the existing rule? ``` object CaseWhenCoercion extends TypeCoercionRule { override val transform: PartialFunction[Expression, Expression] = { case c: CaseWhen if c.childrenResolved && !haveSameType(c.inputTypesForMerging) => val maybeCommonType = findWiderCommonType(c.inputTypesForMerging) maybeCommonType.map { commonType => val newBranches = c.branches.map { case (condition, value) => (condition, castIfNotSameType(value, commonType)) } val newElseValue = c.elseValue.map(castIfNotSameType(_, commonType)) CaseWhen(newBranches, newElseValue) }.getOrElse(c) } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1548791139 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc@(_: In + | _: InSubquery + | _: CreateArray + | _: If + | _: ArrayJoin + | _: CaseWhen + | _: Concat + | _: Greatest + | _: Least + | _: Coalesce + | _: BinaryExpression + | _: ConcatWs + | _: Substring) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + .distinctBy(_.collationId) + +if (hasMultipleImplicits(implicitTypes)) { Review Comment: this can be `implicitTypes.length > 1` now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1548790383 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType} +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc@(_: In + | _: InSubquery + | _: CreateArray + | _: If Review Comment: We should add more case matches as some expressions do not require all its inputs to be the same type ``` case if: If => if.withNewChildren(if.predicate +: collateToSingleType(if.trueValue, if.falseValue)) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
srielau commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547984669 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: Interesting. Yes array() sports an implicit collation. 'c' COLLATE UNICODE_CI sports an explicit collation. The rules are clear that the explicit collation wins. So the result is `array` with an IMPLICIT collation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
srielau commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547984669 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: Interesting. Yes array() sports an implicit collation. 'c' COLLATE UNICODE_CI sports an explicit collation. The rules are clear that the explicit collation wins. So the result is array with an IMPLICIT collation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547880103 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) Review Comment: scala has `distinctBy`, we can distinct by collation id only. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547845857 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) Review Comment: I could change the else branch only here as well, to make it look nicer as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547843264 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) Review Comment: There is one problem with this. We said collations are only different if their collationId's are different. If we do this, we would need to change the line in else clause to something that does not look so nice. I have changed this structure in https://github.com/apache/spark/pull/45819, which is a followup for this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547841866 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + +if (hasMultipleImplicits(implicitTypes)) { + throw QueryCompilationErrors.implicitCollationMismatchError() +} +else { + implicitTypes.find(dt => !(dt == SQLConf.get.defaultStringType)) +.getOrElse(SQLConf.get.defaultStringType) +} +} + } + + /** + * This check is always preformed when we have no explicit collation. It returns true + * if there are more than one implicit collations. Collations are distinguished by their + * collationId. + * @param dataTypes + * @return + */ + private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean = +dataTypes.map(_.collationId) + .filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1 + +} + +/** + * This rule is used to collate all existing expressions related to StringType into a single + *
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547826569 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + +if (hasMultipleImplicits(implicitTypes)) { + throw QueryCompilationErrors.implicitCollationMismatchError() +} +else { + implicitTypes.find(dt => !(dt == SQLConf.get.defaultStringType)) +.getOrElse(SQLConf.get.defaultStringType) +} +} + } + + /** + * This check is always preformed when we have no explicit collation. It returns true + * if there are more than one implicit collations. Collations are distinguished by their + * collationId. + * @param dataTypes + * @return + */ + private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean = +dataTypes.map(_.collationId) + .filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1 + +} + +/** + * This rule is used to collate all existing expressions related to StringType into a single + *
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547818939 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,16 +705,17 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val children: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) Review Comment: please revert these cosmetic changes. ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,16 +705,17 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val children: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) } e.withNewChildren(children) case e: ExpectsInputTypes if e.inputTypes.nonEmpty => // Convert NullType into some specific target type for ExpectsInputTypes that don't do // general implicit casting. -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val children: Seq[Expression] = + e.children.zip(e.inputTypes).map { case (in, expected) => Review Comment: ditto -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547818283 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) Review Comment: nit: ``` val implicitCollations = ... .distinct if (implicitCollations > 1) { throw ... } else { implicitCollations.headOption.getOrElse(SQLConf.get.defaultStringType) } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547814185 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + +if (hasMultipleImplicits(implicitTypes)) { + throw QueryCompilationErrors.implicitCollationMismatchError() +} +else { + implicitTypes.find(dt => !(dt == SQLConf.get.defaultStringType)) +.getOrElse(SQLConf.get.defaultStringType) +} +} + } + + /** + * This check is always preformed when we have no explicit collation. It returns true + * if there are more than one implicit collations. Collations are distinguished by their + * collationId. + * @param dataTypes + * @return + */ + private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean = +dataTypes.map(_.collationId) + .filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1 + +} + +/** + * This rule is used to collate all existing expressions related to StringType into a single + *
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547811709 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + +if (hasMultipleImplicits(implicitTypes)) { + throw QueryCompilationErrors.implicitCollationMismatchError() +} +else { + implicitTypes.find(dt => !(dt == SQLConf.get.defaultStringType)) +.getOrElse(SQLConf.get.defaultStringType) +} +} + } + + /** + * This check is always preformed when we have no explicit collation. It returns true + * if there are more than one implicit collations. Collations are distinguished by their + * collationId. + * @param dataTypes + * @return + */ + private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean = +dataTypes.map(_.collationId) + .filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1 + +} + +/** + * This rule is used to collate all existing expressions related to StringType into a single + *
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547812585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, ConcatWs, CreateArray, Expression, In, InSubquery, Substring} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +abstract class CollationTypeCasts extends TypeCoercionRule { + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) + .filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId) + +if (hasMultipleImplicits(implicitTypes)) { + throw QueryCompilationErrors.implicitCollationMismatchError() +} +else { + implicitTypes.find(dt => !(dt == SQLConf.get.defaultStringType)) +.getOrElse(SQLConf.get.defaultStringType) +} +} + } + + /** + * This check is always preformed when we have no explicit collation. It returns true + * if there are more than one implicit collations. Collations are distinguished by their + * collationId. + * @param dataTypes + * @return + */ + private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean = +dataTypes.map(_.collationId) + .filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1 + +} + +/** + * This rule is used to collate all existing expressions related to StringType into a single + *
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547390616 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) Review Comment: > If I understood correctly, we do not want to replicate the code from ImplicitTypeCasts. I don't think this is possible without significant refactoring. Correctness is more important than duplicated code at this stage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547309134 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) Review Comment: This is the problem. If I understood correctly, we do not want to replicate the code from ImplicitTypeCasts. This rule is concerned with transforming datatypes into their expected DataTypes, but CollationTypeCasts is concerned with transforming StringTypes into their expected collated StringType. There is a difference, as collated StringType is calculated based on expression parameters that have StringTypes not on what someone expects the input types to be. As you mentioned for `If`, this new collation type rule does not want to fail instead of the IfCoercion which fails if it cannot find wider for left and right, but this new rule wants to fail and mitigate errors of a customer providing differently collated strings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547288333 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { Review Comment: Note: if we want to have correlation between function inputs, please match the expression explicitly to do so. `ImplicitCastInputTypes` does not indicate input correlation and please DO NOT make such assumptions here. Looking at `ConcatWs`, it does need the inputs to use the same string collation. Let's match it in the new rule and deal with it correctly. In principle, let's not generalize things without seeing the full picture. What we should do to `ConcatWs` does not necessarily apply to all `ImplicitCastInputTypes` implementations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547273755 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { Review Comment: Once again, this is the problem, we need to call getOutputCollation before we call implicitCast, as implicitCast only concerns one child, and not all children as a group. For collations we need to be sure we decided on the right collation first and then enter a casting rule. But if we do this, we would need to pass additional stuff to implicitCast, which changes the logic of the core function. Another idea was to implement implicitCasting for collations solely in a new rule, but we would end up copying a lot of already existent code in the implicitCast for TypeCollections and StringTypes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547259400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => Review Comment: Then please match the ConcatWs expression explicitly to handle this case. What I disagree with is to do this blindly for all expressions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547242430 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => Review Comment: A simple example is ConcatWs. It can have ArrayType(StringType, _) for input strings and StringType for separator as parameters. What collations do we want for this then? We need to cast the ArrayType into a proper collation if separator is explicit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547239943 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { Review Comment: It seems we are fixing the problem at the wrong place. If `TypeCollections` has issues, let's update the code in `def implicitCast` to fix it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547226617 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { Review Comment: Well the problem is that we have TypeCollections as expected types. If someone defines StringType as non-first DataType (and consequently non-defaultConcreteType), we need to go through previous elements first to see if it will come to StringType cast. If it does, we need to cast it to the same collation. For example, look at `ConcatWs` if it receives a BinaryType, it should try first to cast to ArrayType, get None and then try to cast it to StringType. But on the other hand, we need for consistency with findWiderCommonType to cast collations at the beginning of rules, as we would get a breakage in castings otherwise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547226867 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { + case in if hasStringType(in.dataType) => +castStringType(in, st).getOrElse(in) + case in => in } e.withNewChildren(children) case e: ExpectsInputTypes if e.inputTypes.nonEmpty => // Convert NullType into some specific target type for ExpectsInputTypes that don't do // general implicit casting. -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = Review Comment: This is not necessary, agreed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547210423 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) Review Comment: Let's not assume thar it's safe to only cast string-type inputs for any expression. Please follow the existing implicit cast rules and explicitly match expressions that need some of their inputs to be the same type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547201438 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) Review Comment: This rule is only touching children that have StringType's as their DataType. I can reorder execution to first filter out StringType arguments and then work on them, Agreed for `ExpectInputTypes`, I have checked now and it should not have any function with 2 StringType inputs, so we are safe not to cast it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547197835 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { Review Comment: what are we doing here? `ImplicitCastInputTypes` defines the expected data type of each input, but does not require all inputs to be the same data type. ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -702,26 +707,39 @@ abstract class TypeCoercionBase { }.getOrElse(b) // If there is no applicable conversion, leave expression unchanged. case e: ImplicitCastInputTypes if e.inputTypes.nonEmpty => -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = e.children.zip(e.inputTypes).map { // If we cannot do the implicit cast, just use the original input. - implicitCast(in, expected).getOrElse(in) + case (in, expected) => implicitCast(in, expected).getOrElse(in) +} +val st = getOutputCollation(e.children) +val children: Seq[Expression] = childrenBeforeCollations.map { + case in if hasStringType(in.dataType) => +castStringType(in, st).getOrElse(in) + case in => in } e.withNewChildren(children) case e: ExpectsInputTypes if e.inputTypes.nonEmpty => // Convert NullType into some specific target type for ExpectsInputTypes that don't do // general implicit casting. -val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) => +val childrenBeforeCollations: Seq[Expression] = Review Comment: ditto -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547196013 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => Review Comment: I disagree with special-case array type. The code looks broken. It assumes the children of the given expression can have both string type and array of string type, then tries to find a common collation between the string type child and the array element. This makes no sense without knowing the semantic of the given expression. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547193589 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) Review Comment: How do we define implicit collation? The collation can come from multiple places: 1. default collation 2. table column collation 3. was explicit collation before but does not get propagate Why do we special case default collation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547191461 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId != castType.collationId => castType + case ArrayType(arrType, nullable) => +castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val st = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, st).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(expr: Seq[Expression]): StringType = { +val explicitTypes = expr.filter(_.isInstanceOf[Collate]) + .map(_.dataType.asInstanceOf[StringType].collationId) + .distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => StringType(explicitTypes.head) + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +explicitTypes.map(t => StringType(t).typeName) + ) + // Only implicit or default collations present + case 0 => +val implicitTypes = expr.map(_.dataType) + .filter(hasStringType) + .map(extractStringType) Review Comment: the code looks confusing. If the default collation is not treated as implicit collation, we should filter it out here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1547189187 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, Elt, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: ComplexTypeMergingExpression + | _: CreateArray + | _: Elt + | _: ExpectsInputTypes + | _: Predicate + | _: SortOrder) => + val newChildren = collateToSingleType(sc.children) Review Comment: This is the tricky part of adding a new rule. We must make sure we follow the behavior of existing implicit cast rules and only add implicit cast for certain children of an expression. For example, the true and false branches of `If` expression need implicit cast, but not the if condition. `ExpectsInputTypes` does not indicate that the expression requires all its children to be the same type, and should not be handled here. Please carefully check all implicit cast rules and revisit this new rule. @mihailom-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546551263 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: The example query is `SELECT array('a', 'b' collate UNICODE) || ('c' collate UNICODE_CI)`. This is kind of artificial as users can write one `array(...)` function, but think about a rewrite like `SELECT arr || ('c' collate UNICODE_CI) FROM (SELECT array('a', 'b' collate UNICODE) as arr)`, I think returning an array with string of collation UNICODE_CI is fine? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
srielau commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546517397 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: We should play out how to the behavior. If I want combine arrays or maps, how do I do that. Is this what drives us towards CAST support. And does that imply that teh result of a CAST with a COLLATE clause (on an element/key/value) is a explicit collation)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546445756 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: This is a good news. Let's not special-case CreateArray, otherwise people may question about CreateMap and CreateStruct. They are just SQL functions, not a special building block. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
srielau commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546420774 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: Explicit collation NEVER propagates. It only applies to the immediate consumer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546413940 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: How I understood it, arrays are bulding blocks for types as well. The difference is that arrays have to be built from some different types which makes them complex. But `substring`, `concat`, etc. are functions that return something based on a transformation behaviour. In other words, CreateArray is not even considering the the values, it only builds a new 'basic' block. That is why I am also concerned with what we want to do with Elt, do we want to do casting or just propagation of inputs, as it does not change any string, but only takes a specific value from the given index. But again, this differs from array access, as arrays need to have a single type for all input expressions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546398212 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: I'm not convinced. The propagation of "explicit collation" should be well-defined. Why don't funcitons like `substring`, `concat` propagate "explicit collation"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546397910 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -27,7 +27,9 @@ import org.apache.spark.sql.catalyst.util.CollationFactory * @param collationId The id of collation for this StringType. */ @Stable -class StringType private(val collationId: Int) extends AtomicType with Serializable { +class StringType private(val collationId: Int, var isExplicit: Boolean = false) Review Comment: Yeah, good point, needed it to be var when it was not in the constructor. Changing it to val. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546396569 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -27,7 +27,9 @@ import org.apache.spark.sql.catalyst.util.CollationFactory * @param collationId The id of collation for this StringType. */ @Stable -class StringType private(val collationId: Int) extends AtomicType with Serializable { +class StringType private(val collationId: Int, var isExplicit: Boolean = false) Review Comment: why it's a `var`? does it need to be mutable? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546396114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => Review Comment: Added Elt to the check, but will probably revisit that function when we enter a PR for it's support. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546388826 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => Review Comment: There are only 2 expressions that are potentially missing. One is Elt, but I figured this one can be done when we actually do the support for this function. And the other one is Stack, but for this one I would say we do not want to do casting, as we actually pass columns there. Additionally, as a followup, I have a ticket to check whether we want to do Map casting of stringTypes and in which way, but this feels like a less of a priority to indeterminate collation. I could add support for Elt, it is a single line of code, but I thought it would be better to work on it in a single PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546370140 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: This might be a good question for @srielau. How do we want this query to behave: `SELECT array('a', 'b' collate UNICODE) || ('c' collate UNICODE_CI)` (Concat_Ws in our implementation)? This query in Postgre fails with explicit type mismatch. Do we want to propagate explicit collations from elements to arrays. I would say yes, as ArrayType is constructed with a parameter called `elementType`. This suggests, if elements are explicit, then ArrayType should be as well. Also, on other operations with arrays, as we defined that function outputs are implicit, I would say array creation is then the only place where we should propagate this explicit meaning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546285554 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => Review Comment: why only `CreateArray` can preserve explicit collation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546285091 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => + val newChildren = collateToSingleType(sc.children) + sc.withNewChildren(newChildren) +case pesc @ (_: CreateArray) => + val newChildren = collateToSingleType(pesc.children, true) + pesc.withNewChildren(newChildren) + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, st: StringType): Option[Expression] = +castStringType(expr.dataType, st).map { dt => Cast(expr, dt)} + + private def castStringType(inType: AbstractDataType, st: StringType): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case ost: StringType if ost.collationId == st.collationId +&& ost.isExplicit == st.isExplicit => null + case _: StringType => st + case ArrayType(arrType, nullable) => +castStringType(arrType, st).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression], + preserveExplicit: Boolean = false): Seq[Expression] = { Review Comment: nit 4 spaces indentation ``` def func para1: xxx, para2: xxx, ...) ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546284265 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.util.CollationFactory */ @Stable class StringType private(val collationId: Int) extends AtomicType with Serializable { + var isExplicit: Boolean = false Review Comment: why isn't it part of construct parameters? ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.util.CollationFactory */ @Stable class StringType private(val collationId: Int) extends AtomicType with Serializable { + var isExplicit: Boolean = false Review Comment: why isn't it part of constructor parameters? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546283459 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.analysis.TypeCoercion.hasStringType +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression) => Review Comment: Can you check all the implicit cast rules and see if we miss to handle any expression? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546114682 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: I went through all the core functions and it seems like StringType implicit castings are already blocked, or go to default branch which is None. Is there something else we want to check? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546065154 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: This is a good point that we can't reuse the existing `fold` style to do implicit cast for string collations. If we must rewrite the code, a new rule sounds better. To avoid the notorious analyzer rule execution order issues, let's make sure this new rule is orthogonal to existing implicit cast rules: the existing "fold" style implicit cast should not apply to string type with different collations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546046606 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: Also, I do not believe incorporating string collations into core functions is the best way to go. If you take a look at findWiderCommonType, it is implemented differently for AnsiTypeCoercion and TypeCoercion. Firstly both of those take leftFold and go through expressions in order, but in our case that is not what we want. We want to first know what collations are explicit and then look for anything else. This is important because we want to have clear design on which error has to come first, and explicit mismatch should always come before implicit one. Another thing is that AnsiTypeCoercion seems to behave differently for StringTypes, we do not reorder any type in the sequence, which would result in implementing the same thing we did here with addition of another rule, but just in a core, already well defined function. One more thing I would add is that not all rules actually use core functions, e.g. ConcatCoercion. Also doing collations casting in one rule is way more efficient, as otherwise we would have to reorder StringTypes in every other rule, to make sure explicit collation mismatches are thrown first, which would result in multiple reorderings as opposed to constant of 2, one at the beginning for all StringTypes and the other at the end for all expressions that were cast to StringType from other types in some other implicit casting rules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1546046606 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: Also, I do not believe incorporating string collations into core functions is the best way to go. If you take a look at findWiderCommonForTwo, it is implemented differently for AnsiTypeCoercion and TypeCoercion. Firstly both of those take leftFold and go through expressions in order, but in our case that is not what we want. We want to first know what collations are explicit and then look for anything else. This is important because we want to have clear design on which error has to come first, and explicit mismatch should always come before implicit one. Another thing is that AnsiTypeCoercion seems to behave differently for StringTypes, we do not reorder any type in the sequence, which would result in implementing the same thing we did here with addition of another rule, but just in a core, already well defined function. One more thing I would add is that not all rules actually use core functions, e.g. ConcatCoercion. Also doing collations casting in one rule is way more efficient, as otherwise we would have to reorder StringTypes in every other rule, to make sure explicit collation mismatches are thrown first, which would result in multiple reorderings as opposed to constant of 2, one at the beginning for all StringTypes and the other at the end for all expressions that were cast to StringType from other types in some other implicit casting rules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544741676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: @cloud-fan Is this what you meant? Now this type of query will be broken `SELECT array('b' collate unicode_ci) || 'a' collate unicode`, but I can add support for array explicit as a followup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544588809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +case sc @ (_: BinaryExpression Review Comment: This looks like a clean solution for string collation implicit cast, but actually it is not. The current design of the Spark implicit cast system is: we define some core functions like `findTightestCommonType`, `findWiderTypeForTwo`, etc., and there are a bunch of rules to find the target expressions to apply implicit cast using these core functions. The string collation implicit cast should be part of the core functions, instead of a new rule. Otherwise this new rule is a bit inconsistent with the current system and we need to replicate the matching of expressions that need implicit cast. Since the core functions only take `DataType`, not `Expression`, I suggest we add a new boolean flag to `StringType` to indicate it's explicit or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544100458 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -112,12 +112,14 @@ case class Collate(child: Expression, collationName: String) since = "4.0.0", group = "string_funcs") // scalastyle:on line.contains.tab -case class Collation(child: Expression) extends UnaryExpression with RuntimeReplaceable { - override def dataType: DataType = StringType +case class Collation(child: Expression) + extends UnaryExpression with RuntimeReplaceable with ExpectsInputTypes { + override def dataType: DataType = child.dataType Review Comment: I was thinking of some cases when indeterminate comes. What if our costumer wants to concat the collation used to their strings and filter on something like that. If we return a StringType, we would face a problem of getting indeterminate collation failing to keep the pass through for our functions. This is more of a problem if users set default collation to be something else, so we might say that changing return type to default collation would be the better choice? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544095362 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +// Case when we do fail if resulting collation is indeterminate +case checkCastWithoutIndeterminate @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression + | _: CreateArray) + if shouldCast(checkCastWithoutIndeterminate.children) => + val newChildren = collateToSingleType(checkCastWithoutIndeterminate.children) + checkCastWithoutIndeterminate.withNewChildren(newChildren) + } + + def shouldCast(types: Seq[Expression]): Boolean = { +types.filter(e => hasStringType(e.dataType)) + .map(e => extractStringType(e.dataType).collationId).distinct.size > 1 + } + + /** + * Checks whether given data type contains StringType. + */ + @tailrec + def hasStringType(dt: DataType): Boolean = dt match { +case _: StringType => true +case ArrayType(et, _) => hasStringType(et) +case _ => false + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, collationId: Int): Option[Expression] = +castStringType(expr.dataType, collationId).map { dt => + if (dt == expr.dataType) expr else Cast(expr, dt) +} + + private def castStringType(inType: AbstractDataType, collationId: Int): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId == collationId => st + case _: StringType => StringType(collationId) + case ArrayType(arrType, nullable) => +castStringType(arrType, collationId).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val collationId = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, collationId).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(exprs: Seq[Expression]): Int = { +val explicitTypes = exprs.filter(hasExplicitCollation) + .map(e => extractStringType(e.dataType).collationId).distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => explicitTypes.head + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544061247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +// Case when we do fail if resulting collation is indeterminate +case checkCastWithoutIndeterminate @ (_: BinaryExpression + | _: Predicate + | _: SortOrder + | _: ExpectsInputTypes + | _: ComplexTypeMergingExpression + | _: CreateArray) + if shouldCast(checkCastWithoutIndeterminate.children) => + val newChildren = collateToSingleType(checkCastWithoutIndeterminate.children) + checkCastWithoutIndeterminate.withNewChildren(newChildren) + } + + def shouldCast(types: Seq[Expression]): Boolean = { +types.filter(e => hasStringType(e.dataType)) + .map(e => extractStringType(e.dataType).collationId).distinct.size > 1 + } + + /** + * Checks whether given data type contains StringType. + */ + @tailrec + def hasStringType(dt: DataType): Boolean = dt match { +case _: StringType => true +case ArrayType(et, _) => hasStringType(et) +case _ => false + } + + /** + * Extracts StringTypes from filtered hasStringType + */ + @tailrec + private def extractStringType(dt: DataType): StringType = dt match { +case st: StringType => st +case ArrayType(et, _) => extractStringType(et) + } + + /** + * Casts given expression to collated StringType with id equal to collationId only + * if expression has StringType in the first place. + * @param expr + * @param collationId + * @return + */ + def castStringType(expr: Expression, collationId: Int): Option[Expression] = +castStringType(expr.dataType, collationId).map { dt => + if (dt == expr.dataType) expr else Cast(expr, dt) +} + + private def castStringType(inType: AbstractDataType, collationId: Int): Option[DataType] = { +@Nullable val ret: DataType = inType match { + case st: StringType if st.collationId == collationId => st + case _: StringType => StringType(collationId) + case ArrayType(arrType, nullable) => +castStringType(arrType, collationId).map(ArrayType(_, nullable)).orNull + case _ => null +} +Option(ret) + } + + /** + * Collates input expressions to a single collation. + */ + def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = { +val collationId = getOutputCollation(exprs) + +exprs.map(e => castStringType(e, collationId).getOrElse(e)) + } + + /** + * Based on the data types of the input expressions this method determines + * a collation type which the output will have. This function accepts Seq of + * any expressions, but will only be affected by collated StringTypes or + * complex DataTypes with collated StringTypes (e.g. ArrayType) + */ + def getOutputCollation(exprs: Seq[Expression]): Int = { +val explicitTypes = exprs.filter(hasExplicitCollation) + .map(e => extractStringType(e.dataType).collationId).distinct + +explicitTypes.size match { + // We have 1 explicit collation + case 1 => explicitTypes.head + // Multiple explicit collations occurred + case size if size > 1 => +throw QueryCompilationErrors + .explicitCollationMismatchError( +
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544060704 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -112,12 +112,14 @@ case class Collate(child: Expression, collationName: String) since = "4.0.0", group = "string_funcs") // scalastyle:on line.contains.tab -case class Collation(child: Expression) extends UnaryExpression with RuntimeReplaceable { - override def dataType: DataType = StringType +case class Collation(child: Expression) + extends UnaryExpression with RuntimeReplaceable with ExpectsInputTypes { + override def dataType: DataType = child.dataType Review Comment: shouldn't the collation name be a normal string type? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544055842 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import javax.annotation.Nullable + +import scala.annotation.tailrec + +import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Cast, Collate, ComplexTypeMergingExpression, CreateArray, ExpectsInputTypes, Expression, Predicate, SortOrder} +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType} + +object CollationTypeCasts extends TypeCoercionRule { + override val transform: PartialFunction[Expression, Expression] = { +case e if !e.childrenResolved => e +// Case when we do fail if resulting collation is indeterminate Review Comment: I thought we excluded indeterminate collation from this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47210][SQL] Addition of implicit casting without indeterminate support [spark]
cloud-fan commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1544004871 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -467,6 +467,24 @@ ], "sqlState" : "42704" }, + "COLLATION_MISMATCH" : { +"message" : [ + "Could not determine which collation to use for string comparison." Review Comment: shall we say "string functions and operators"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org