> Noted. Filing this issue because we have legacy data which was generated
> in incompatible ways, and it works fine with MR. We'll try to change the
>data ourselves.
Sure, the easy workaround for this is to do a self-insert, "insert overwrite
table foo select * from foo" (or a partition self-insert), with

  set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
That will not be fast to query/process, since for each row the schema check
kicks in and has to ask the Hive IOContext "What's the schema for this row?".
The Hive+Tez fast path, by contrast, does the schema differentiation much
earlier, when the splits are grouped, with a check like

  // this is the bit where we make sure we don't group across partition
  // schema boundaries
  if (schemaEvolved(s, prevSplit, groupAcrossFiles, work)) {
    ...
  }
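To make the contrast concrete, here is a minimal Java sketch of the idea behind that check. It is not Hive's actual split-grouping code: the `Split` record, its `schema` string and the `maxGroupBytes` limit are hypothetical simplifications. Consecutive splits are combined into groups, but a new group is started whenever the schema changes, so each group can be read with one schema resolved up front instead of a per-row lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class SchemaAwareGrouping {

    // Hypothetical stand-in for an input split: only the fields this sketch needs.
    // Real Hive splits carry much more (paths, offsets, hosts, ...).
    record Split(String path, String schema, long length) {}

    // Combine consecutive splits into groups of at most maxGroupBytes, but never
    // let a group span a schema boundary: a schema change always closes the group.
    static List<List<Split>> group(List<Split> splits, long maxGroupBytes) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentBytes = 0;
        Split prev = null;
        for (Split s : splits) {
            boolean schemaChanged = prev != null && !Objects.equals(prev.schema(), s.schema());
            boolean full = !current.isEmpty() && currentBytes + s.length() > maxGroupBytes;
            if (schemaChanged || full) {
                // Close the current group; the next one starts with this split.
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(s);
            currentBytes += s.length();
            prev = s;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("part=1/a", "id:int,name:string", 100),
            new Split("part=1/b", "id:int,name:string", 100),
            new Split("part=2/a", "id:bigint,name:string", 100),
            new Split("part=2/b", "id:bigint,name:string", 100));
        System.out.println(group(splits, 1000).size()); // prints 2
    }
}
```

Running `main` prints 2: the two `part=1` splits share one group and the `part=2` splits (with the widened `id` column) get their own, even though all four would fit under the size limit. Records need Java 16 or later.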
HTH.
Cheers,
Gopal