Re: Hive on Tez fails with Sequence files having different key classes

Gopal Vijayaraghavan Tue, 28 Jul 2015 22:07:10 -0700


> Noted. Filing this issue because we have legacy data which was generated
> in incompatible ways, and it works fine with MR. We'll try to change the
> data ourselves.


Sure, the easy workaround for this is to do "insert overwrite table foo
select * from foo" (or a partition self-insert) with

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

That will not be fast to query/process, since the schema check kicks in
for each row and has to ask the Hive IOContext "What's the schema for this
row?".

The Hive+Tez fast path, by contrast, makes the schema differentiation way
before that, with a check like this:
 
 // this is the bit where we make sure we don't group across partition
 // schema boundaries

 if (schemaEvolved(s, prevSplit, groupAcrossFiles, work)) {
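
To make the idea concrete, here is a rough standalone sketch of that kind
of check. It is not the Hive source: the Split class, the two-argument
schemaEvolved() and group() are hypothetical stand-ins, and I'm assuming
the "schema" can be reduced to a string such as the sequence file key
class.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitGrouping {

    /** Hypothetical stand-in for an input split: a file path plus a schema signature. */
    static final class Split {
        final String path;
        final String schema; // assumption: e.g. the key class name of a sequence file

        Split(String path, String schema) {
            this.path = path;
            this.schema = schema;
        }
    }

    /** Simplified two-argument check; not the signature used in Hive. */
    static boolean schemaEvolved(Split s, Split prevSplit) {
        return prevSplit != null && !prevSplit.schema.equals(s.schema);
    }

    /** Groups consecutive splits, starting a new group at every schema boundary. */
    static List<List<Split>> group(List<Split> splits) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        Split prevSplit = null;
        for (Split s : splits) {
            // Never group across a schema boundary: close the current group first.
            if (schemaEvolved(s, prevSplit)) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(s);
            prevSplit = s;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

In this sketch every group ends up with a single schema, so the decision
is made once per boundary at grouping time rather than once per row.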

HTH.

Cheers,
Gopal
