This may seem out of the blue, but my initial benchmarks show no
performance gain when a Hive index is used with the Tez engine. I'm
not sure why, but several posts online suggest that the Tez engine does
not support Hive indexes (bitmap, compact). Is that true? If so, that is sad.

I understand that the ORC format is a much better alternative if you manage
your own tables. However, at my company, each team picks its own
technology, and most teams use Parquet because it integrates easily
with various external systems.

Nonetheless, we still want fast ad-hoc queries via Hive LLAP / Tez. I
think that an index is a perfect solution for non-ORC file formats, since you
can selectively build an index table and leverage Tez to look only at the
blocks and/or files that need to be scanned.

Thanks for any input,
Thai
