Hi, Experts,

We have an application that creates an ORC table in Hive to store events.
The table schema needs to change dynamically, since new data adds new
columns (append only). We noticed that the application has to run ALTER
TABLE to change the table schema before loading the new data. Otherwise,
the previous records (written before the schema change) are not read
correctly.


For example:

create table t1 (c1 struct<f1:string, f2:int>) partitioned by (ds timestamp)
stored as orc;

load data inpath '/old/data/file/path' into table t1 partition
(ds='2015-06-02 00:00:00');

alter table t1 change c1 c1 struct<f1:string, f2:int, f3:string>;

load data inpath '/new/data/file/path' into table t1 partition
(ds='2015-06-02 00:00:00');
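
For comparison, here is a rough sketch of the variant without ALTER, as I understand the failing setup (not a verified repro). A second table, named t2 here only for illustration, is declared up front with the final schema, and both the old and the new files are loaded into it:

```sql
-- Hypothetical table t2: declared directly with the final (three-field) schema.
create table t2 (c1 struct<f1:string, f2:int, f3:string>)
partitioned by (ds timestamp)
stored as orc;

-- Assumes fresh copies of the files, because LOAD DATA INPATH moves
-- (rather than copies) the source files into the table directory.
load data inpath '/old/data/file/path' into table t2 partition
(ds='2015-06-02 00:00:00');

load data inpath '/new/data/file/path' into table t2 partition
(ds='2015-06-02 00:00:00');

-- Reading the partition back: the rows from the old file are the ones that fail.
select c1 from t2 where ds='2015-06-02 00:00:00';
```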

If ALTER is not executed (for example, if both the old and the new ORC files
are simply loaded into another table created with the final schema), the
previous records cannot be read at all, and the query fails with a
ClassCastException. What does the ALTER statement do that lets the ORC
reader read files written before the schema change?
Thanks very much for your help!


Jessica
