> > @Sungwoo, you've mentioned that "in Hive-LLAP 4.1 Query 72 fails due to
> MapJoinMemoryExhaustionError."
> Could https://issues.apache.org/jira/browse/HIVE-15221 be related?
>
I suspect it is due to some change in the query plan, because 1) increasing hive.auto.convert.join.noconditionaltask.size does not help, 2) the LLAP daemon is given enough memory (216GB), and 3) we used the same hive-site.xml across all the experiments. We will start stabilizing the performance of Hive 4.1 soon, and we will try to create a ticket for query 72.

As for the effect of LLAP IO on performance, I think we are assuming different scenarios. Here, I assume sequential execution of TPC-DS queries, where the cache hit ratio is quite low. For example, a query caches tens of GBs of data, which is then flushed without being used by the next query. Occasionally we even see intra-query cache hits, where the same input data is read several times during the execution of a single query (like query 14-1), but this is not common. However, executing the same (small-to-medium) query repeatedly decreases the running time quite a bit, as expected.

Sungwoo