[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types

2019-01-16 Thread GitBox
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248505014

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
 ##
 @@ -283,6 +284,7 @@ class WriteJobDescription(
     val allColumns: Seq[Attribute],
     val dataColumns: Seq[Attribute],
     val partitionColumns: Seq[Attribute],
+    val normalizedPartitionExpression: Seq[Expression],
 
 Review comment:
   Modified, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types

2019-01-15 Thread GitBox
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248152216

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ##
 @@ -44,4 +44,13 @@ class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
       checkAnswer(spark.table("t4"), Row(0, 0))
     }
   }
+
+  test("Null and '' values should not cause dynamic partition failure of string types") {
+    withTable("t1", "t2") {
+      spark.range(3).write.saveAsTable("t1")
+      spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" +
+        " from t1").write.partitionBy("p").saveAsTable("t2")
+      checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null)))
+    }
+  }
 
 Review comment:
   Ok, modified, thanks.
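The test above exercises the partition-value normalization this PR introduces. As a rough illustration of why it matters, here is a hypothetical standalone sketch (not Spark's actual implementation, names are illustrative): Hive-style writers render a null partition value as the `__HIVE_DEFAULT_PARTITION__` directory, so normalizing the empty string to null makes `''` and `null` land in the same directory instead of producing an invalid path.

```scala
// Hypothetical sketch of partition-directory naming with the
// empty-string-to-null normalization applied first.
object PartitionPathSketch {
  // Hive's conventional directory name for null partition values.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Normalize '' to null, then render the "col=value" directory name.
  def partitionDir(col: String, value: String): String = {
    val normalized = if (value == null || value.isEmpty) null else value
    if (normalized == null) s"$col=$DefaultPartitionName"
    else s"$col=$normalized"
  }
}
```

With this normalization, both rows with `p = ''` and `p = null` in the test read back as `null`, which is what `checkAnswer` asserts.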





[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types

2019-01-15 Thread GitBox
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248142075

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ##
 @@ -44,4 +44,13 @@ class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
       checkAnswer(spark.table("t4"), Row(0, 0))
     }
   }
+
+  test("Null and '' values should not cause dynamic partition failure of string types") {
+    withTable("t1", "t2") {
+      spark.range(3).write.saveAsTable("t1")
+      spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" +
+        " from t1").write.partitionBy("p").saveAsTable("t2")
+      checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null)))
+    }
+  }
 
 Review comment:
  Sorry, I don't quite understand what _'test of w/o CodeGen'_ means. Could you give an example? Thanks.





[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types

2019-01-15 Thread GitBox
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r248142099

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ##
 @@ -49,6 +52,18 @@ object FileFormatWriter extends Logging {
       customPartitionLocations: Map[TablePartitionSpec, String],
       outputColumns: Seq[Attribute])
 
+  /** A function that converts the empty string to null for partition values. */
+  case class Empty2Null(child: Expression) extends UnaryExpression with String2StringExpression {
+    override def convert(v: UTF8String): UTF8String = if (v.numBytes() == 0) null else v
+    override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+      nullSafeCodeGen(ctx, ev, c =>
+        s"""if ($c.numBytes() == 0) {
+           |${ev.isNull} = true;
+           |${ev.value} = null; }
+           |else ${ev.value} = $c;""".stripMargin)
+    }
 
 Review comment:
   Modified, thanks for review.
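For readers following along, here is a minimal standalone sketch of the `Empty2Null` conversion semantics from the diff above, with Spark's `UTF8String` replaced by plain `String` and without the `Expression`/codegen machinery (the object name here is illustrative, not Spark's API): nulls short-circuit (the expression is null-safe), the empty string becomes null, and everything else passes through unchanged.

```scala
// Standalone model of Empty2Null's value-level behavior.
object Empty2NullSketch {
  def convert(v: String): String =
    if (v == null) null       // null-safe: null inputs stay null
    else if (v.isEmpty) null  // the empty string is converted to null
    else v                    // non-empty strings pass through unchanged
}
```

The generated Java in `doGenCode` implements the same branch: when the input has zero bytes, it sets `isNull` and clears the value; otherwise it forwards the input.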





[GitHub] eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types

2019-01-14 Thread GitBox
eatoncys commented on a change in pull request #23010: [SPARK-26012][SQL]Null and '' values should not cause dynamic partition failure of string types
URL: https://github.com/apache/spark/pull/23010#discussion_r247748156

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
 ##
 @@ -169,7 +182,13 @@ class DynamicPartitionDataWriter(
 
   /** Extracts the partition values out of an input row. */
   private lazy val getPartitionValues: InternalRow => UnsafeRow = {
-    val proj = UnsafeProjection.create(description.partitionColumns, description.allColumns)
+    val partitionExpression =
+      toBoundExprs(description.partitionColumns, description.allColumns).map {
+        case e: Expression if e.dataType == StringType =>
+          Empty2Null(e)
 
 Review comment:
  @cloud-fan Thanks for the review. I have moved it before the sort; `PartitionColumns` is retained because it is used to calculate `getPartitionPath`.
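The pattern-match in the diff wraps only string-typed bound partition expressions in `Empty2Null` and leaves other types untouched. A simplified, hypothetical model of that shape (the types here are stand-ins, not Spark's `Expression` hierarchy):

```scala
// Simplified stand-ins for bound expressions; dataType is a plain String
// here rather than Spark's DataType.
sealed trait Expr { def dataType: String }
case class BoundRef(ordinal: Int, dataType: String) extends Expr
case class Empty2Null(child: Expr) extends Expr { val dataType = "string" }

object PartitionExprSketch {
  // Wrap only string-typed expressions; other types pass through unchanged.
  def wrapStringPartitionExprs(exprs: Seq[Expr]): Seq[Expr] = exprs.map {
    case e if e.dataType == "string" => Empty2Null(e)
    case e => e
  }
}
```

Restricting the wrapping to `StringType` matters because only string partition values can be `''`; numeric and other types cannot hold an empty string, so wrapping them would be pointless work per row.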




