[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248505014

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala ##
@@ -283,6 +284,7 @@ class WriteJobDescription(
     val allColumns: Seq[Attribute],
     val dataColumns: Seq[Attribute],
     val partitionColumns: Seq[Attribute],
+    val normalizedPartitionExpression: Seq[Expression],

Review comment:
Modified, thanks.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248152216

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ##
@@ -44,4 +44,13 @@ class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
       checkAnswer(spark.table("t4"), Row(0, 0))
     }
   }
+
+  test("Null and '' values should not cause dynamic partition failure of string types") {
+    withTable("t1", "t2") {
+      spark.range(3).write.saveAsTable("t1")
+      spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" +
+        " from t1").write.partitionBy("p").saveAsTable("t2")
+      checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null)))
+    }
+  }

Review comment:
Ok, modified, thanks.
[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248142075

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ##
@@ -44,4 +44,13 @@ class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
       checkAnswer(spark.table("t4"), Row(0, 0))
     }
   }
+
+  test("Null and '' values should not cause dynamic partition failure of string types") {
+    withTable("t1", "t2") {
+      spark.range(3).write.saveAsTable("t1")
+      spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" +
+        " from t1").write.partitionBy("p").saveAsTable("t2")
+      checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null)))
+    }
+  }

Review comment:
Sorry, I don't quite understand what _'test of w/o CodeGen'_ means. Could you give an example? Thanks.
[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248142075

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ##
@@ -44,4 +44,13 @@ class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
       checkAnswer(spark.table("t4"), Row(0, 0))
     }
   }
+
+  test("Null and '' values should not cause dynamic partition failure of string types") {
+    withTable("t1", "t2") {
+      spark.range(3).write.saveAsTable("t1")
+      spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" +
+        " from t1").write.partitionBy("p").saveAsTable("t2")
+      checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null)))
+    }
+  }

Review comment:
Sorry, I don't quite understand what w/o CodeGen means. Could you give an example? Thanks.
[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248142099

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ##
@@ -49,6 +52,18 @@ object FileFormatWriter extends Logging {
       customPartitionLocations: Map[TablePartitionSpec, String],
       outputColumns: Seq[Attribute])
+  /** A function that converts the empty string to null for partition values. */
+  case class Empty2Null(child: Expression) extends UnaryExpression with String2StringExpression {
+    override def convert(v: UTF8String): UTF8String = if (v.numBytes() == 0) null else v
+    override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+      nullSafeCodeGen(ctx, ev, c =>
+        s"""if ($c.numBytes() == 0) {
+           |${ev.isNull} = true;
+           |${ev.value} = null; }
+           |else ${ev.value} = $c;""".stripMargin)
+    }

Review comment:
Modified, thanks for the review.
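For readers following the thread, here is a minimal plain-Scala sketch of what `Empty2Null` is meant to achieve, with no Spark dependency. It assumes, as I understand the issue, that null and '' both map to the default partition directory while a sort on the raw values still treats them as different keys, so a writer that reopens on every key change would open the same directory twice. `Empty2NullSketch`, `partitionDir`, and `writerOpens` are hypothetical names used only for illustration, and path escaping is left out.

```scala
object Empty2NullSketch {
  // Assumption: mirrors the default partition directory name used by Spark and Hive.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  /** Plain-String analogue of Empty2Null.convert: '' becomes null, other values pass through. */
  def empty2Null(v: String): String = if (v != null && v.isEmpty) null else v

  /** Hypothetical stand-in for the partition path: null and '' share one directory. */
  def partitionDir(column: String, value: String): String = {
    val shown = if (value == null || value.isEmpty) DefaultPartitionName else value
    s"$column=$shown"
  }

  /** Number of files a writer opens if it reopens whenever the sorted key changes. */
  def writerOpens(sortedValues: Seq[String]): Int =
    if (sortedValues.isEmpty) 0
    else 1 + sortedValues.zip(sortedValues.tail).count { case (a, b) => a != b }

  def main(args: Array[String]): Unit = {
    val raw = Seq[String](null, "", null)

    // Sort through Option so that null values do not throw; None orders before Some("").
    val sortedRaw = raw.sortBy(v => Option(v))
    val sortedNormalized = raw.map(empty2Null).sortBy(v => Option(v))

    println(s"distinct directories: ${raw.map(partitionDir("p", _)).distinct.size}")
    println(s"writer opens without normalization: ${writerOpens(sortedRaw)}")
    println(s"writer opens with normalization: ${writerOpens(sortedNormalized)}")
  }
}
```

With normalization applied before the sort, the number of writer opens matches the number of distinct directories, which is the property the `t2` test in this PR relies on.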
[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r247748156

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala ##
@@ -169,7 +182,13 @@ class DynamicPartitionDataWriter(
   /** Extracts the partition values out of an input row. */
   private lazy val getPartitionValues: InternalRow => UnsafeRow = {
-    val proj = UnsafeProjection.create(description.partitionColumns, description.allColumns)
+    val partitionExpression =
+      toBoundExprs(description.partitionColumns, description.allColumns).map {
+        case e: Expression if e.dataType == StringType =>
+          Empty2Null(e)

Review comment:
@cloud-fan Thanks for the review. I have moved it before the sort; `PartitionColumns` is retained because it is used to calculate `getPartitionPath`.