Re: Inconsistent handling of schema in Avro tables

2018-07-16 Thread Todd Lipcon
On Thu, Jul 12, 2018 at 5:07 PM, Bharath Vissapragada <bhara...@cloudera.com.invalid> wrote:
> On Thu, Jul 12, 2018 at 12:03 PM Todd Lipcon wrote:
> >
> > So, I think my proposal here is:
> >
> > 1. Query behavior on existing tables
> > - If the table-level format is non-Avro,
> > - AND the

Re: Inconsistent handling of schema in Avro tables

2018-07-12 Thread Bharath Vissapragada
On Thu, Jul 12, 2018 at 12:03 PM Todd Lipcon wrote:
> Again there's inconsistency with Hive: the presence of a single Avro
> partition doesn't change the table-level schema.
>
> The interesting thing is that, when I modified Impala to have a similar
> behavior, I got the following error from the

Re: Inconsistent handling of schema in Avro tables

2018-07-12 Thread Todd Lipcon
Again there's inconsistency with Hive: the presence of a single Avro partition doesn't change the table-level schema. The interesting thing is that, when I modified Impala to have a similar behavior, I got the following error from the backend when trying to query the data: WARNINGS: Unresolvable

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Todd Lipcon
Turns out it's even a bit messier. The presence of one or more Avro partitions can change the types of existing columns, even if there is no explicit Avro schema specified for the table: https://gist.github.com/5018d6ff50f846c72762319eb7cf5ca8 Not quite sure how to handle this one in a world

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Bharath Vissapragada
Agreed.
On Wed, Jul 11, 2018 at 8:55 PM Todd Lipcon wrote:
> Your commit message there makes sense, Bharath -- we should set
> 'avroSchema' in the descriptor in case any referenced partition is Avro,
> because the scanner needs that info. However, we don't need to also
> override the

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Todd Lipcon
Your commit message there makes sense, Bharath -- we should set 'avroSchema' in the descriptor in case any referenced partition is Avro, because the scanner needs that info. However, we don't need to also override the table-level schema. So, I think we can preserve the fix that you made while also
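To illustrate the split described in this message, here is a minimal Python sketch. It is not Impala's code: `Partition`, `Table`, `TableDescriptor`, and `build_descriptor` are hypothetical names invented for the example, and `avro_schema` is only a stand-in for the 'avroSchema' field mentioned above. The sketch assumes the descriptor should carry the Avro schema whenever any referenced partition is Avro, while the table-level columns are copied through unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Partition:
    # Hypothetical stand-in for a partition's metadata.
    name: str
    file_format: str  # e.g. "AVRO", "PARQUET", "TEXT"


@dataclass
class Table:
    # Table-level schema as (column name, column type) pairs.
    columns: List[Tuple[str, str]]
    partitions: List[Partition]
    # Avro schema as a JSON string, if one is known for the table.
    avro_schema: Optional[str] = None


@dataclass
class TableDescriptor:
    # What the scanner would receive in this simplified model.
    columns: List[Tuple[str, str]]
    avro_schema: Optional[str] = None


def build_descriptor(table: Table, referenced: List[Partition]) -> TableDescriptor:
    """Build a descriptor for the partitions a query actually references."""
    # The scanner needs the Avro schema if any referenced partition is Avro.
    needs_avro = any(p.file_format.upper() == "AVRO" for p in referenced)
    return TableDescriptor(
        # The table-level schema is passed through as-is, never overridden.
        columns=list(table.columns),
        avro_schema=table.avro_schema if needs_avro else None,
    )
```

In this model, a query that touches only non-Avro partitions gets a descriptor with no Avro schema, and a query that touches at least one Avro partition gets the schema attached; in both cases `columns` matches the table-level definition.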

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Bharath Vissapragada
I added this functionality, where adding an Avro partition to a mixed-partition table resets the table-level schema. While I don't exactly remember why we chose this path, I do recall that we debated quite a bit

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Edward Capriolo
I know that Hive can deal with the schema being different per partition, but I really struggle to understand why someone would want to do this. If someone asked me to support a mixed Avro/Parquet table, I would suggest they create a view. If they kept insisting, I would reply, "Well, it is your funeral."

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Tim Armstrong
The behaviour of Avro schemas in all these cases has always been rather mysterious to me. Before you wrote this email I would have assumed that Impala's behaviour would be like Hive's behaviour. I agree with the principle that the creation of a partition without changes to table metadata