Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-09-02 Thread Thai Bui
Here's all I could find related to this idea. ParquetHiveSerDe is where the raw Parquet data is unpacked into a readable POJO. Everything starts with this root array ObjectInspector

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-30 Thread Anup Tiwari
Hi Thai, Are there any links or examples for achieving this? I do not know much about this approach. On Thu, 30 Aug 2018 20:08 Thai Bui wrote: > Another option is to implement a custom ParquetInputFormat extending the > current Hive MR Parquet format and handle schema coercion at the input

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-30 Thread Thai Bui
Another option is to implement a custom ParquetInputFormat extending the current Hive MR Parquet format and handle schema coercion at the input split/record reader level. This would be more involved but is guaranteed to work if you can add auxiliary jars to your Hive cluster. On Wed, Aug 29, 2018

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Gopal Vijayaraghavan
> Because I believe string should be able to handle integer as well.  No, because it is not a lossless conversion. Comparison semantics are lost: "9" > "11", but 9 < 11. Even float -> double is lossy (because of epsilon). You can always apply the Hive workaround suggested, otherwise you might find
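The ordering point above can be checked with a minimal Python sketch. The helper names `string_greater` and `widen_float32` are hypothetical and only illustrate the argument; they are not part of Hive or Parquet. The float part is one possible reading of the "epsilon" remark: a value stored as a 32-bit float and then widened no longer equals the same literal written as a double.

```python
import struct


def string_greater(a: int, b: int) -> bool:
    """Compare two integers after converting them to strings (lexicographic order)."""
    return str(a) > str(b)


def widen_float32(x: float) -> float:
    """Round x to a 32-bit float, then widen it back to a 64-bit double."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Numerically 9 < 11, but as strings "9" sorts after "11".
numeric_order = 9 < 11
string_order = string_greater(9, 11)

# 0.1 stored as float32 and widened is not the double 0.1.
widened = widen_float32(0.1)
same_as_double = widened == 0.1
```

Here `numeric_order` and `string_order` are both true, which is exactly the contradiction: a filter or sort on the column gives different results depending on which type the reader picks.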

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Anup Tiwari
Hi, > optional int32 action_date (DATE); > optional binary action_date (UTF8); > Those two column types aren't convertible implicitly between each other, which is probably the problem. In the above statement, are you referring to date/UTF-8 or int32/binary? Because I believe string should be able to
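To make the difference between the two physical layouts concrete, here is a minimal Python sketch of coercing either representation to one string form. In Parquet, an `int32` annotated as `DATE` holds the number of days since 1970-01-01, while `binary (UTF8)` holds the text itself. The function names are hypothetical, and the assumption that the string files use ISO `yyyy-MM-dd` text is not stated in the thread.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_to_iso(days: int) -> str:
    """Convert a Parquet DATE value (days since the Unix epoch) to ISO text."""
    return (EPOCH + timedelta(days=days)).isoformat()


def iso_to_days(text: str) -> int:
    """Convert ISO yyyy-MM-dd text to days since the Unix epoch."""
    return (date.fromisoformat(text) - EPOCH).days


def coerce_action_date(value) -> str:
    """Normalize either representation of action_date to an ISO string.

    int   -> treated as int32 (DATE)
    bytes -> treated as binary (UTF8), assumed to already be ISO text
    """
    if isinstance(value, int):
        return days_to_iso(value)
    if isinstance(value, bytes):
        return value.decode("utf-8")
    raise TypeError(f"unsupported action_date type: {type(value).__name__}")
```

A reader that sees both kinds of file would call something like `coerce_action_date` on every record, which is the per-record cost of doing the coercion at read time instead of rewriting the files with one schema.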

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Gopal Vijayaraghavan
Hi, > on some days parquet was created by hive 2.1.1 and on some days it was > created by using glue … > After some drill down i saw schema of columns inside both type of parquet > file using parquet tool and found different data types for some column ... > optional int32 action_date (DATE); >

Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Anup Tiwari
Hi All, We have a use case where we have created a partitioned external table in Hive 2.3.3 which points to a Parquet location with date-level folders. On some days the Parquet files were created by Hive 2.1.1 and on other days they were created using Glue. Now when we try to read this data,