We've been running Hive 2.0.1 on Tez 0.8.4 for a few weeks now. Most of
our queries work. However, some queries that scan millions to billions
of rows don't finish when Tez is the execution engine.

Here's an example of a simple query that does not finish:

select count(distinct external_id) from t1;

Table t1 has over 300 million rows.

The mappers finish pretty quickly, and the query is supposed to run only
1 reducer. That reducer does not finish.
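
For comparison, a commonly suggested rewrite of this kind of query moves
the deduplication into a GROUP BY subquery, so that it can be spread over
several reducers before the final count. This is only a sketch and has not
been run against this table; the alias d is made up for illustration.

```sql
-- Deduplicate first (can use many reducers), then count the result.
-- count(external_id) skips NULLs, matching count(distinct external_id).
select count(external_id)
from (
  select external_id
  from t1
  group by external_id
) d;
```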

Here's a screenshot of another query that ran for over 8 hours, where the
map output is about a billion records:

[image: Inline image 1]

When I switched the execution engine to mr, the query finished in 30 mins.
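
For reference, a switch like this is normally made per session through
the hive.execution.engine property. The lines below are a sketch of that
setting, assuming it was changed in the session rather than in
hive-site.xml:

```sql
-- Run the following queries on MapReduce instead of Tez.
set hive.execution.engine=mr;

-- Switch back to Tez afterwards.
set hive.execution.engine=tez;
```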

Are there any knobs we have to tweak?

Premal Shah.
