>
> @Sungwoo, you've mentioned that "in Hive-LLAP 4.1 Query 72 fails due to
> MapJoinMemoryExhaustionError."
> Could https://issues.apache.org/jira/browse/HIVE-15221 be related?
>

I suspect the failure is due to a change in the query plan, because 1)
increasing hive.auto.convert.join.noconditionaltask.size does not help, 2)
the LLAP daemon is given enough memory (216GB), and 3) we used the same
hive-site.xml across all the experiments. We plan to start stabilizing the
performance of Hive 4.1 soon, and we will try to create a ticket for
query 72.

As for the effect of LLAP IO on performance, I think we are assuming
different scenarios. Here, I assume sequential execution of the TPC-DS
queries, where the cache hit ratio is quite low. For example, a query
caches tens of GBs of data, which are then flushed without being used by
the next query. Occasionally we even see intra-query cache hits, where the
same input data is read several times during the execution of a single
query (like query 14-1), but this is not common. However, executing the
same (small-to-medium) query repeatedly decreases the running time quite a
bit, as expected.
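
To make the three scenarios concrete, here is a toy simulation (a sketch
only, not a model of LLAP itself). It assumes a plain LRU cache counted in
abstract "blocks"; the capacity, block counts, and query names are
hypothetical, and LRU is an assumption rather than LLAP's actual eviction
policy.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache keyed by block id; capacity is counted in blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block)
            self.hits += 1
        else:
            # Cache miss: load the block, evicting the oldest one if full.
            self.misses += 1
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def make_query(name, n_blocks):
    """A query is modeled as the list of input blocks it reads."""
    return [(name, i) for i in range(n_blocks)]


def run(workload, capacity):
    """Run the queries in order against one shared cache."""
    cache = LruCache(capacity)
    for query in workload:
        for block in query:
            cache.read(block)
    return cache.hit_ratio()


CAPACITY = 100  # hypothetical cache size, in blocks

# Sequential run: ten different queries, each with its own input data.
sequential = [make_query(f"q{k}", 40) for k in range(10)]
# The same query executed ten times in a row.
repeated = [make_query("q1", 40)] * 10
# One query that scans the same input twice (intra-query reuse).
intra = [make_query("q14", 40) * 2]

sequential_ratio = run(sequential, CAPACITY)
repeated_ratio = run(repeated, CAPACITY)
intra_ratio = run(intra, CAPACITY)

print(f"sequential:  {sequential_ratio:.2f}")
print(f"repeated:    {repeated_ratio:.2f}")
print(f"intra-query: {intra_ratio:.2f}")
```

In this toy setup the sequential run never hits the cache, since no query
reads another query's input, while the repeated run and the query that
scans its input twice both benefit from cached data.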

Sungwoo
