cloud-fan commented on a change in pull request #31281:
URL: https://github.com/apache/spark/pull/31281#discussion_r562099377



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/CharVarcharCodegenUtils.java
##########
@@ -22,28 +22,25 @@
 public class CharVarcharCodegenUtils {
   private static final UTF8String SPACE = UTF8String.fromString(" ");
 
-  /**
-   *  Trailing spaces do not count in the length check. We don't need to retain the trailing
-   *  spaces, as we will pad char type columns/fields at read time.
-   */
   public static UTF8String charTypeWriteSideCheck(UTF8String inputStr, int limit) {
     if (inputStr == null) {
       return null;
     } else {
-      UTF8String trimmed = inputStr.trimRight();
-      if (trimmed.numChars() > limit) {
-        throw new RuntimeException("Exceeds char type length limitation: " + limit);
+      int numChars = inputStr.numChars();
+      if (numChars == limit) {
+        return inputStr;
+      } else if (numChars <= limit) {
+        return inputStr.rpad(limit, SPACE);
+      } else {
+        int maxAllowedNumTailSpaces = numChars - limit;

Review comment:
       nit: `numTailSpacesToTrim` seems like a better name than `maxAllowedNumTailSpaces`.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/CharVarcharCodegenUtils.java
##########
@@ -54,8 +51,6 @@ public static UTF8String varcharTypeWriteSideCheck(UTF8String inputStr, int limi
       if (numChars <= limit) {
         return inputStr;
       } else {
-        // Trailing spaces do not count in the length check. We need to retain the trailing spaces
-        // (truncate to length N), as there is no read-time padding for varchar type.
         int maxAllowedNumTailSpaces = numChars - limit;

Review comment:
       ditto: `numTailSpacesToTrim` seems like a better name here too.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/CharVarcharCodegenUtils.java
##########
@@ -54,8 +51,6 @@ public static UTF8String varcharTypeWriteSideCheck(UTF8String inputStr, int limi
       if (numChars <= limit) {
         return inputStr;
       } else {
-        // Trailing spaces do not count in the length check. We need to retain the trailing spaces
-        // (truncate to length N), as there is no read-time padding for varchar type.
         int maxAllowedNumTailSpaces = numChars - limit;

Review comment:
       We can probably create a private method for the logic below, so that
the char and varchar checks do not duplicate it.
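
A minimal sketch of what such a shared helper might look like. The hunk above is cut off right after `maxAllowedNumTailSpaces`, so the trimming logic here is an assumption based on the removed comments (trailing spaces do not count in the length check; truncate to length N). It uses plain `java.lang.String` as a stand-in for Spark's `UTF8String` so that it compiles on its own. The class name `CharVarcharSketch`, the helper name `trimTrailingSpacesAndCheckLength`, and the varchar error message are hypothetical, not taken from the PR.

```java
// Stand-alone sketch: String replaces Spark's UTF8String (an assumption for illustration).
final class CharVarcharSketch {
  private CharVarcharSketch() {}

  // Hypothetical shared helper. Called only when numChars > limit: removes up to
  // (numChars - limit) trailing spaces and fails if the value is still too long.
  private static String trimTrailingSpacesAndCheckLength(
      String inputStr, int numChars, int limit, String typeName) {
    int numTailSpacesToTrim = numChars - limit;
    int end = inputStr.length();
    int trimmed = 0;
    while (trimmed < numTailSpacesToTrim && end > 0 && inputStr.charAt(end - 1) == ' ') {
      end--;
      trimmed++;
    }
    if (trimmed < numTailSpacesToTrim) {
      // Not enough trailing spaces to get down to the limit.
      throw new RuntimeException("Exceeds " + typeName + " type length limitation: " + limit);
    }
    return inputStr.substring(0, end);
  }

  // Char type: pad short values to the limit, trim surplus trailing spaces from long ones.
  static String charTypeWriteSideCheck(String inputStr, int limit) {
    if (inputStr == null) {
      return null;
    }
    int numChars = inputStr.codePointCount(0, inputStr.length());
    if (numChars == limit) {
      return inputStr;
    } else if (numChars < limit) {
      StringBuilder padded = new StringBuilder(inputStr);
      for (int i = numChars; i < limit; i++) {
        padded.append(' ');
      }
      return padded.toString();
    } else {
      return trimTrailingSpacesAndCheckLength(inputStr, numChars, limit, "char");
    }
  }

  // Varchar type: no padding, only trim surplus trailing spaces from long values.
  static String varcharTypeWriteSideCheck(String inputStr, int limit) {
    if (inputStr == null) {
      return null;
    }
    int numChars = inputStr.codePointCount(0, inputStr.length());
    if (numChars <= limit) {
      return inputStr;
    } else {
      return trimTrailingSpacesAndCheckLength(inputStr, numChars, limit, "varchar");
    }
  }
}
```

With this shape, both public methods keep only their type-specific branch (padding for char, pass-through for varchar), and the over-length path lives in one place.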

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
##########
@@ -56,7 +56,7 @@ trait CharVarcharTestSuite extends QueryTest with SQLTestUtils {
       checkAnswer(spark.table("t"), Row("1", "a" + " " * 4))
       checkColType(spark.table("t").schema(1), CharType(5))
 
-      sql("ALTER TABLE t DROP PARTITION(c='a')")
+      sql("ALTER TABLE t DROP PARTITION(c='a    ')")

Review comment:
       Is this the same as Hive's behavior? Or should we also add padding for
char type columns in the partition spec?
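
To make the second option concrete, here is a rough sketch of what padding a partition spec could mean, so that `PARTITION(c='a')` would match the stored `'a    '` for a `CharType(5)` column. This is only an illustration in plain Java, not how Spark resolves partition specs: the class `PartitionSpecPaddingSketch`, the method `padCharPartitionValues`, and the map from column name to char length are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: right-pad partition values of char-typed columns.
final class PartitionSpecPaddingSketch {
  private PartitionSpecPaddingSketch() {}

  // spec: partition column name -> user-supplied value, e.g. {c -> "a"}
  // charColumnLengths: char-typed column name -> declared length, e.g. {c -> 5}
  static Map<String, String> padCharPartitionValues(
      Map<String, String> spec, Map<String, Integer> charColumnLengths) {
    Map<String, String> padded = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : spec.entrySet()) {
      String value = entry.getValue();
      Integer length = charColumnLengths.get(entry.getKey());
      if (length != null && value != null && value.length() < length) {
        // Pad with spaces up to the declared char length.
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) {
          sb.append(' ');
        }
        value = sb.toString();
      }
      padded.put(entry.getKey(), value);
    }
    return padded;
  }
}
```

Values for columns that are not in the length map, or that are already at least as long as the declared length, pass through unchanged.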



