The best approach would be to use daemonized containers, such as Hive LLAP + Tez session pool, or Spark on Hive.
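For reference, here is a minimal sketch of the kind of hive-site.xml settings an LLAP + Tez session pool setup involves, rendered from Python so it can be checked. The property names are standard Hive configuration keys. The queue name `llap`, the pool size of 4 and the 4Gb cache size are assumptions chosen for illustration, not values from this thread; size them for your own cluster.

```python
import xml.etree.ElementTree as ET

# Settings for running queries in LLAP mode through a pre-initialized Tez
# session pool in HiveServer2. Keys are standard Hive properties; the queue
# name, pool size and cache size below are illustrative assumptions.
LLAP_POOL_SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.execution.mode": "llap",
    "hive.llap.execution.mode": "all",
    "hive.llap.io.enabled": "true",           # LLAP IO cache
    "hive.llap.io.memory.size": "4Gb",        # assumed cache size per daemon
    # Start the Tez AMs when HiveServer2 starts and reuse them across queries.
    "hive.server2.tez.initialize.default.sessions": "true",
    "hive.server2.tez.default.queues": "llap",           # assumed YARN queue
    "hive.server2.tez.sessions.per.default.queue": "4",  # assumed AMs per queue
}


def to_hive_site_xml(settings):
    """Render a dict of Hive properties as a <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hive_site_xml(LLAP_POOL_SETTINGS))
```

Because the session pool lives inside HiveServer2, queries presumably need to be submitted through it (for example via Beeline or JDBC) to reuse the pooled AMs; the standalone Hive CLI starts its own Tez session instead.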
I’m not that familiar with Spark on Hive, so I can’t comment on it, but Hive LLAP has worked really well for me when coupled with a Tez session pool. You specify how many Tez AMs are initialized per pool when HiveServer2 starts, and those AMs are then used for all the queries in that pool. The actual Tez containers are “replaced” by LLAP daemons that are always running, so there’s no container start-up cost either. The underlying execution engine is still Tez, but it runs in a special LLAP mode, which can potentially give you sub-second response times. In my experience, with Hive LLAP, the IO cache enabled and ORC as the file format, I can get under 1s for small queries when the cache is hit (equivalent to an in-memory database at that point). Parquet is slower, since LLAP mode doesn’t support efficient IO caching and vectorized execution for it.

On Mon, Apr 16, 2018 at 9:33 AM Anup Tiwari <anupsdtiw...@gmail.com> wrote:
> Hi All,
>
> We have a use case where we need to return output in < 10 sec. We have
> evaluated a different set of tools for execution and they work fine, but they
> do not cover all cases and they are not reliable (since they are in an
> evolving phase). But Hive works well in this context.
>
> Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching
> takes ~12-15 sec, due to which response time becomes 18-21 sec.
>
> Is there any way we can reduce this launching time?
>
> Please note that we have tried prewarm containers, but when we are
> launching a query from the hive client, it is not picking containers from
> the already initialized containers; rather, it launches its own.
>
> Please let me know how we can overcome this issue, since this is the only
> problem stopping us from using Hive. Any links/description are
> really appreciated.
>
> Regards,
> Anup Tiwari

--
Thai