AngersZhuuuu opened a new pull request #34218:
URL: https://github.com/apache/spark/pull/34218
### What changes were proposed in this pull request?
When converting a catalog table to a Hive `Table`, respect the case of the table schema so that the bucket columns match the stored column names.
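The gist of the fix, as a minimal standalone sketch (the helper below is illustrative only; the actual change lives in `HiveClientImpl.toHiveTable`): re-case the bucket column names to the casing Hive has stored for the schema before handing them to Hive.
```scala
import java.util.Locale

// Hypothetical sketch: given the column names Hive has stored for the table
// (already lower-cased) and the bucket columns as the user typed them,
// return the bucket columns re-cased to match the stored schema.
def normalizeBucketCols(
    tableCols: Seq[String],
    bucketCols: Seq[String]): Seq[String] = {
  val byLower = tableCols.map(c => c.toLowerCase(Locale.ROOT) -> c).toMap
  bucketCols.map { col =>
    // Leave unknown columns untouched; Hive will raise its own error.
    byLower.getOrElse(col.toLowerCase(Locale.ROOT), col)
  }
}

// With the repro below: the schema is stored as Seq("v1", "s1") while the
// bucket spec keeps Seq("V1"); normalizing yields Seq("v1"), which Hive's
// Table.setBucketCols accepts.
assert(normalizeBucketCols(Seq("v1", "s1"), Seq("V1")) == Seq("v1"))
```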
### Why are the changes needed?
When a user creates a Hive bucketed table with an upper-case schema, the table
schema is stored in lower case, while the bucket column info keeps the user's
original casing.
If we then insert into this table, a HiveException reports that the bucket
column is not part of the table schema.
Here is a simple repro:
```
spark.sql("""
CREATE TABLE TEST1(
V1 BIGINT,
S1 INT)
PARTITIONED BY (PK BIGINT)
CLUSTERED BY (V1)
SORTED BY (S1)
INTO 200 BUCKETS
STORED AS PARQUET """).show
spark.sql("INSERT INTO TEST1 SELECT * FROM VALUES(1,1,1)").show
```
Error message:
```
scala> spark.sql("INSERT INTO TEST1 SELECT * FROM VALUES(1,1,1)").show
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), FieldSchema(name:s1, type:int, comment:null)]
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
  at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1242)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitions(ExternalCatalogWithListener.scala:254)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitions(SessionCatalog.scala:1166)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:103)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
  ... 47 elided
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), FieldSchema(name:s1, type:int, comment:null)]
  at org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
  at org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1082)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitions$1(HiveClientImpl.scala:732)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitions(HiveClientImpl.scala:731)
  at org.apache.spark.sql.hive.client.HiveClient.getPartitions(HiveClient.scala:222)
  at org.apache.spark.sql.hive.client.HiveClient.getPartitions$(HiveClient.scala:218)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitions(HiveClientImpl.scala:91)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitions$1(HiveExternalCatalog.scala:1245)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
  ... 69 more
```
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
UT
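The UT would presumably exercise the repro end to end; a sketch of what it could look like (the test name is made up, and `withTable`, `sql`, and `checkAnswer` are the usual Spark SQL test helpers; the test would live in a Hive-enabled suite):
```scala
// Hypothetical test sketch based on the repro above (suite wiring omitted).
test("insert into a Hive bucketed table with upper-case bucket columns") {
  withTable("TEST1") {
    sql(
      """CREATE TABLE TEST1(V1 BIGINT, S1 INT)
        |PARTITIONED BY (PK BIGINT)
        |CLUSTERED BY (V1) SORTED BY (S1) INTO 200 BUCKETS
        |STORED AS PARQUET""".stripMargin)
    // Before the fix this failed with
    // HiveException: Bucket columns V1 is not part of the table columns.
    sql("INSERT INTO TEST1 SELECT * FROM VALUES(1, 1, 1)")
    checkAnswer(sql("SELECT V1, S1, PK FROM TEST1"), Row(1L, 1, 1L))
  }
}
```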