Hi,

Thanks for your response. I can't do another insert, as the data is
already in the table. Also, since there is a lot of data in the table already,
I am trying to find a way to avoid reprocessing/reloading it.

Thanks.

On Wednesday, January 14, 2015 2:47 PM, Daniel Haviv
<daniel.ha...@veracity-group.com> wrote:
Hi Kumar,

Altering the table just updates Hive's metadata without updating
Parquet's schema. I believe that if you insert into your table (after adding
the column), you'll later be able to select all 3 columns.

Daniel
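[Daniel's suggestion could be sketched in HiveQL like this, using the table and
column names from the steps below; `staging_t` is a hypothetical source for the
new rows, not something from this thread:]

```sql
-- Hive's metadata now knows about f3; the old Parquet files still
-- contain only f1 and f2.
ALTER TABLE t ADD COLUMNS (f3 STRING);

-- Rows written after the ALTER land in new Parquet files whose schema
-- includes f3, so those files read back with all three columns.
INSERT INTO TABLE t
SELECT f1, f2, f3 FROM staging_t;
```

[The catch, as Kumar notes above, is that this only fixes newly written files;
the existing Parquet files keep their two-column schema.]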
On 14 Jan 2015, at 21:34, Kumar V <kumarbuyonl...@yahoo.com> wrote:


Hi,

Any ideas on how to go about this? Any insights you have would be helpful.
I am kinda stuck here.

Here are the steps I followed on Hive 0.13:
1) create table t (f1 string, f2 string) stored as parquet;
2) upload Parquet files with 2 fields
3) select * from t;   <---- Works fine.
4) alter table t add columns (f3 string);
5) select * from t;   <----- ERROR:

Caused by: java.lang.IllegalStateException: Column f3 at index 2 does not exist
  at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:116)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
  at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)


 

On Wednesday, January 7, 2015 2:55 PM, Kumar V <kumarbuyonl...@yahoo.com> wrote:
Hi,

I have a Parquet-format Hive table with a few columns. I have already loaded a
lot of data into this table, and it seems to work. I now have to add a few new
columns to it. If I add the new columns, queries no longer work, since
I have not reloaded the old data. Is there a way to add new fields to the table,
keep the old Parquet files as they are, and still have queries work?

I tried this on Hive 0.10 and also on Hive 0.13. I get an error in both
versions.

Please let me know how to handle this.

Regards,
Kumar

    


   
