Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan closed pull request #46083: [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type URL: https://github.com/apache/spark/pull/46083 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on PR #46083: URL: https://github.com/apache/spark/pull/46083#issuecomment-2072351055 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on PR #46083: URL: https://github.com/apache/spark/pull/46083#issuecomment-2071987485 @cloud-fan all tests passing after scalastyle fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1575780017 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -268,23 +296,6 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val provider = table.provider.get val options = new SourceOptions(table.storage.properties) -// To work around some hive metastore issues, e.g. not case-preserving, bad decimal type Review Comment: https://github.com/apache/spark/assets/154237371/d761434c-e1bd-418c-bf3c-3b36b368101a;> checked, it is indeed called only once -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1575698057 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -268,23 +296,6 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat val provider = table.provider.get val options = new SourceOptions(table.storage.properties) -// To work around some hive metastore issues, e.g. not case-preserving, bad decimal type Review Comment: can you double check if `createDataSourceTable` is only called in one place so that it's safe to move the code? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1575038248 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -320,6 +322,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } table.copy( +schema = getHiveCompatibleSchema(table.schema), Review Comment: actually i really liked you previous comment about converting the schema at the similar place so I refactored table property creation to be done before the call to `createDataSourceTable` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1575038248 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -320,6 +322,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } table.copy( +schema = getHiveCompatibleSchema(table.schema), Review Comment: actually i liked you previous comment about converting the schema at the similar place so I refactored table property creation to be done before the call to `createDataSourceTable` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1574582893 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -320,6 +322,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } table.copy( +schema = getHiveCompatibleSchema(table.schema), Review Comment: Can we add some comments saying that we convert the table schema at the last step here, so that the table schema JSON string in the table properties still uses the original schema. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1574579235 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -239,6 +240,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat ignoreIfExists) Review Comment: ah you are right -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1574440520 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -239,6 +240,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat ignoreIfExists) Review Comment: But isn't schema json being saved to table properties inside `createDataSourceTable`? As far as I see it if we convert the schema before we will lose the ability to read back the collations -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1574440520 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -239,6 +240,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat ignoreIfExists) Review Comment: But isn't schema json being saved to table properties inside `createDataSourceTable`? As far as I see it if we convert the schema before we will lose the ability to read it back -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1574411630 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -239,6 +240,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat ignoreIfExists) Review Comment: shall we convert the table schema here, other than inside `createDataSourceTable`? The code looks clearer if we convert the table schema at the similar place for both Hive serde and data source tables. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1570322161 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: I fixed this in a separate PR so please check it out #46116 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: when I wrote this method a while back it was called `hasNonDefaultCollatedString` and the logic was `collationId != 0`. However the name and logic were changed a while back by @mihailom-db when he introduced a concept of default collation (session collation). All in all, I think the method should be used for exactly this use-case but we should revisit naming/logic of it separately -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: when I wrote this method a while back it was called `hasNonDefaultCollatedString` and the logic was `collationId != 0`. However the name and logic were changed a while back by @mihailom-db when he introduced a concept of default collation (session collation). All in all, method here should be used for exactly this use-case but we should revisit naming/logic of it separately -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: when I wrote this method a while back it was called `hasNonDefaultCollatedString` and the logic was `collationId != 0`. However the name and logic were changed a while back by @mihailom-db when he introduced a concept of default collation (session collation). All in all, method here should be used for exactly this use-case but we revisit naming/logic of it separately -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: when a wrote this method a while back it was called `hasNonDefaultCollatedString` and the logic was `collationId != 0`. However the name and logic were changed a while back by @mihailom-db when he introduced a concept of default collation (session collation). All in all, method here should be used for exactly this use-case but we revisit naming/logic of it separately -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568644785 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType == toDataType } } + + /** + * Compares two types, ignoring nullability of ArrayType, MapType, StructType, ignoring case + * sensitivity of field names in StructType as well as differences in collation for String types. + */ + def equalsIgnoreCaseNullabilityAndCollation(from: DataType, to: DataType): Boolean = { Review Comment: I noticed that this is in `sql/api`. If we are not going to use it in the Spark Connect client, we should just add this method to `DataTypeUtils` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568622375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: i'm not a fan of the name, but that is what the function will actually do. It will check if the collation `supportsBinaryOrdering` which only UTF8_BINARY collation does -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568655624 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: when a wrote this method a while back it was called `hasNonDefaultCollatedString` and the logic was `collationId != 0`. However the name was changed a while back by @mihailom-db when he introduced a concept of default collation (session collation). All in all, method here should be used for exactly this use-case but we revisit naming/logic of it separately -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607199 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: I think `binarySortable` is not the property we are looking for. We need to fine collation id != 0, according to how `StringType.typeName` is implemented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568652156 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType == toDataType } } + + /** + * Compares two types, ignoring nullability of ArrayType, MapType, StructType, ignoring case + * sensitivity of field names in StructType as well as differences in collation for String types. + */ + def equalsIgnoreCaseNullabilityAndCollation(from: DataType, to: DataType): Boolean = { Review Comment: sure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568645699 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: I don't think this is future-proof. I think it's better to write new code to explicitly check `collationId != 0` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: i'm not a fan of the name `hasNonBinarySortableCollatedString`, but that method does what you described. It will check if a collation `supportsBinaryEquality` => if id == 0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: i'm not a fan of the name `hasNonBinarySortableCollatedString`, but that method does what you described. It will check if a collation `supportsBinaryEquality` => if id == 0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568617954 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -303,4 +303,35 @@ private[spark] object SchemaUtils { case _ => false } } + + /** + * Replaces any collated string type with non collated StringType + * recursively in the given schema. + */ + def replaceCollatedStringWithStringInSchema(schema: StructType): StructType = { +StructType(schema.map { field => + if (hasNonBinarySortableCollatedString(field.dataType)) { Review Comment: i'm not a fan of the name `hasNonBinarySortableCollatedString`, but that method does what you described it will check if a collation `supportsBinaryEquality` => if id == 0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568616448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils { DataType.equalsIgnoreCaseAndNullability(from, to) } + /** + * Compares two types, ignoring nullability of ArrayType, MapType, StructType, ignoring case + * sensitivity of field names in StructType as well as differences in collation for String types. + */ + def equalsIgnoreCaseNullabilityAndStringTypes(from: DataType, to: DataType): Boolean = { Review Comment: both `equalsIgnoreNullability` and `equalsIgnoreCaseAndNullability` do the same thing so I just wanted to be consistent -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568607999 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -47,6 +47,14 @@ object DataTypeUtils { DataType.equalsIgnoreCaseAndNullability(from, to) } + /** + * Compares two types, ignoring nullability of ArrayType, MapType, StructType, ignoring case + * sensitivity of field names in StructType as well as differences in collation for String types. + */ + def equalsIgnoreCaseNullabilityAndStringTypes(from: DataType, to: DataType): Boolean = { Review Comment: why do we need this extra level of indirection? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]
cloud-fan commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1568600521 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -454,4 +454,29 @@ object DataType { case (fromDataType, toDataType) => fromDataType == toDataType } } + + /** + * Compares two types, ignoring nullability of ArrayType, MapType, StructType, ignoring case + * sensitivity of field names in StructType as well as differences in collation for String types. + */ + def equalsIgnoreCaseNullabilityAndStringTypes(from: DataType, to: DataType): Boolean = { Review Comment: ```suggestion def equalsIgnoreCaseNullabilityAndCollation(from: DataType, to: DataType): Boolean = { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org