Attila Magyar created HIVE-23253:
------------------------------------

             Summary: Synchronization between external SerDe schemas and Metastore
                 Key: HIVE-23253
                 URL: https://issues.apache.org/jira/browse/HIVE-23253
             Project: Hive
          Issue Type: Bug
          Components: Hive, Metastore
    Affects Versions: 3.1.2
            Reporter: Attila Magyar
             Fix For: 3.0.0


In HIVE-15995 an ALTER TABLE <table> UPDATE COLUMNS statement was introduced to 
sync external SerDe schema changes with the metastore. This command can only be 
invoked manually.

See it in the documentation.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns

 

It might make sense to run UPDATE COLUMNS automatically in certain cases, to 
prevent the problems that arise when the user forgets to run it manually.

 

One way to reproduce the issue is to change the schema URL via an ALTER TABLE 
statement:
{code:java}
[root@c7401 vagrant]# cat test_schema1.avsc
{
"type":"record",
"name":"test_schema",
"namespace":"gdc_datascience_qa",
"fields":[
{
"name":"name",
"type":[
"null",
"string"
],
"default":null
}
]
}
[root@c7401 vagrant]# cat test_schema2.avsc
{
"type":"record",
"name":"test_schema",
"namespace":"gdc_datascience_qa",
"fields":[
{
"name":"name",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"last_name",
"type":[
"null",
"string"
],
"default":null
}
]
}
 {code}
{code:java}
 $ hadoop fs -copyFromLocal *.avsc /tmp/
  [beeline] create external table t1 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema1.avsc');
  [beeline] alter table t1 set tblproperties('avro.schema.url'='/tmp/test_schema2.avsc');
  [beeline] insert into t1 values ('n1', 'l1');
  [beeline] create external table t2 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema2.avsc');
  [beeline] insert into t2 values ('n2', 'l2');
  [beeline] insert overwrite table t1 select * from t2;
 {code}
Error:
{code:java}
 MetaException(message:Column last_name doesn't exist in table t1 in database default)
        at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
        at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
        at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446
 {code}
Running ALTER TABLE t1 UPDATE COLUMNS fixes the problem.
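
To make the mismatch concrete, here is a minimal Python sketch of the kind of check an automatic sync could perform. It is not Hive code: the helper names `avro_field_names` and `columns_missing_in_metastore` are hypothetical, and the sketch assumes the metastore still holds the column list derived from test_schema1.avsc while the SerDe already reads test_schema2.avsc.

```python
import json

# The two schemas from the reproduction steps, inlined so the sketch is self-contained.
SCHEMA_1 = """
{"type": "record", "name": "test_schema", "namespace": "gdc_datascience_qa",
 "fields": [
   {"name": "name", "type": ["null", "string"], "default": null}
 ]}
"""

SCHEMA_2 = """
{"type": "record", "name": "test_schema", "namespace": "gdc_datascience_qa",
 "fields": [
   {"name": "name", "type": ["null", "string"], "default": null},
   {"name": "last_name", "type": ["null", "string"], "default": null}
 ]}
"""


def avro_field_names(schema_json):
    """Return the top-level field names of an Avro record schema, in order."""
    schema = json.loads(schema_json)
    return [field["name"] for field in schema["fields"]]


def columns_missing_in_metastore(metastore_columns, serde_schema_json):
    """Hypothetical helper: fields the SerDe schema declares but the metastore lacks.

    Column names are compared case-insensitively, as Hive does.
    """
    known = {column.lower() for column in metastore_columns}
    return [name for name in avro_field_names(serde_schema_json)
            if name.lower() not in known]


# Assumption: t1's metastore columns were recorded from test_schema1.avsc
# and were not refreshed when avro.schema.url was switched to test_schema2.avsc.
metastore_columns = avro_field_names(SCHEMA_1)
missing = columns_missing_in_metastore(metastore_columns, SCHEMA_2)
print(missing)  # ['last_name']
```

A non-empty result corresponds to the state in which the INSERT OVERWRITE above fails; it is the condition under which UPDATE COLUMNS would need to run, whether manually or automatically.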

 

cc: [~szita]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
