[GitHub] [spark] fhygh opened a new pull request #32501: [SPARK-35359][SQL]Insert data with char/varchar datatype will fail when data length exceed length limitation

GitBox Tue, 11 May 2021 02:17:19 -0700


fhygh opened a new pull request #32501:
URL: https://github.com/apache/spark/pull/32501



   
   ### What changes were proposed in this pull request?
   This PR fixes the following bug:
   set spark.sql.legacy.charVarcharAsString=true;
   create table chartb01(a char(3));
   insert into chartb01 select 'aaaaa';
   Here we expect table chartb01 to contain 'aaa', but the insert fails.
   
   
   ### Why are the changes needed?
   To improve backward compatibility. Currently, an insert whose value exceeds the declared length fails:
   
   ```spark-sql
   spark-sql>
            > create table tchar01(col char(2)) using parquet;
   Time taken: 0.767 seconds
   spark-sql>
            > insert into tchar01 select 'aaa';
   ERROR | Executor task launch worker for task 0.0 in stage 0.0 (TID 0) | Aborting task | org.apache.spark.util.Utils.logError(Logging.scala:94)
   java.lang.RuntimeException: Exceeds char/varchar type length limitation: 2
           at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.trimTrailingSpaces(CharVarcharCodegenUtils.java:31)
           at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.charTypeWriteSideCheck(CharVarcharCodegenUtils.java:44)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
           at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:279)
           at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1500)
           at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
           at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:212)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:131)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1466)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
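
The trace shows the error raised from `trimTrailingSpaces`, called by `charTypeWriteSideCheck`. As a rough illustration of the difference between a strict write-side check and the truncating behavior this PR expects, here is a minimal plain-Python sketch. It is not Spark's code: the function names, the space-padding, and the trailing-space trimming are assumptions made for illustration only.

```python
def char_write_side_check(value: str, limit: int) -> str:
    """Strict check (assumed behavior): pad short values, drop only
    trailing spaces from long ones, and fail if the value still does not fit."""
    if len(value) <= limit:
        return value.ljust(limit)  # assumption: CHAR(n) is space-padded to n
    excess = len(value) - limit
    trimmed = value
    # Remove at most `excess` trailing spaces; other characters are never dropped.
    while excess > 0 and trimmed.endswith(" "):
        trimmed = trimmed[:-1]
        excess -= 1
    if len(trimmed) > limit:
        raise RuntimeError(
            f"Exceeds char/varchar type length limitation: {limit}"
        )
    return trimmed


def char_truncating_write(value: str, limit: int) -> str:
    """Hypothetical lenient variant: cut the value to the limit, then pad."""
    return value[:limit].ljust(limit)
```

Under these assumptions, `char_write_side_check("aaa", 2)` raises the error seen in the trace, while `char_truncating_write("aaaaa", 3)` yields the `'aaa'` that the PR description expects.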
   
   
   ### Does this PR introduce _any_ user-facing change?
   No (the legacy config is false by default).
   
   
   ### How was this patch tested?
   Added unit tests.
   


