Hi,
When Tez is set as the execution engine for Hive, a query takes about half the time to finish the second time I run it, going from, say, 24 seconds to 12 seconds. If I keep rerunning that same query, it gets down to about 2 seconds. The time goes back up to about 12 seconds if I wait too long before the next rerun or if I make large enough changes to the query.
I'm working on a blog post about Tez and need to find out why this is happening. The first speedup seems to come mainly from hot containers that hold the information about where to find the data, while the further drop to about 2 seconds seems to come from some kind of in-memory storage of the data. Does it store the results in memory and keep them ready for the next run, or is something else going on?

--
Lars Selsaas
Data Engineer
Think Big Analytics <http://thinkbiganalytics.com>
[email protected]
650-537-5321
