Hi,

When I set Tez as the execution engine for Hive, a query takes about half
as long to finish the second time I run it, going from, say, 24 seconds to
12 seconds. If I keep rerunning that same query, it gets down to about 2
seconds. The run time goes back up to about 12 seconds if I wait too long
before the next rerun or if I make large enough changes to the query.


I'm working on a blog post about Tez and need to find out why this is
happening. The first speedup seems to come mainly from hot containers that
store the information about where to find the data. The further drop to
about 2 seconds looks like some kind of in-memory storage of the data. Does
it store the results in memory and keep them ready for the next run, or is
something else going on?



-- 

Lars Selsaas

Data Engineer

Think Big Analytics <http://thinkbiganalytics.com>

[email protected]

650-537-5321
