Re: Ways to reduce launching time of query in Hive 2.2.1
Do you use the Tez session pool along with LLAP (as Thai suggests in the previous reply)? If a new query finds an idle AM in the Tez session pool, there is no launch cost for the AM. If no idle AM is found, or if you specify a queue name, a new AM has to start in order to serve the query. This is explained in detail in the following article (see 'Understanding #4'):

https://community.hortonworks.com/articles/56636/hive-understanding-concurrent-sessions-queue-alloc.html

Hence, if not enough AMs are available in the Tez session pool, new queries have to wait until old queries finish. If there are not many concurrent queries, I guess using the Tez session pool will solve your issue.

In a highly concurrent setting, Hive-MR3 practically eliminates this limitation. In Hive-MR3, HiveServer2 in shared session mode launches a single AppMaster to be shared by all incoming queries, so there is no launch cost. Containers are also shared by all queries and thus run like daemons.

https://mr3.postech.ac.kr/hivemr3/features/hiveserver2/

Hive-MR3 0.1 does not support LLAP IO yet, but Hive-MR3 0.2, which will be released by the end of this month, will support it.

--- Sungwoo Park

On Mon, Apr 16, 2018 at 11:33 PM, Anup Tiwari wrote:

> Hi All,
>
> We have a use case where we need to return output in < 10 sec. We have
> evaluated different sets of tools for execution and they work fine, but they
> do not cover all cases and they are not reliable (since they are in an
> evolving phase). But Hive works well in this context.
>
> Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching
> takes ~12-15 sec, due to which response time becomes 18-21 sec.
>
> Is there any way we can reduce this launching time?
>
> Please note that we have tried prewarming containers, but when we launch a
> query from the Hive client it does not pick containers from the already
> initialized ones; rather, it launches its own.
>
> Please let me know how we can overcome this issue, since this is the only
> problem stopping us from using Hive. Any links/descriptions are really
> appreciated.
>
> Regards,
> Anup Tiwari
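The pool behaviour described in the reply above can be sketched as a toy model. This is only an illustration, not Hive's implementation: the `SessionPool` class, its method names, and the 12-second launch cost (the lower end of the 12-15 sec reported in the thread) are all assumptions. It follows the reply's statement that a query finding the pool exhausted has to wait, and that a query naming its own queue pays for a new AM.

```python
# Assumed value: lower end of the 12-15 s launch time reported in the thread.
AM_LAUNCH_SECONDS = 12.0


class SessionPool:
    """Toy model of pre-launched Tez AMs (hypothetical, not Hive code)."""

    def __init__(self, size):
        # Number of AMs that are started up front and currently idle.
        self.idle = size

    def submit(self, explicit_queue=False):
        """Return (outcome, launch_cost_seconds) for one incoming query."""
        if explicit_queue:
            # A query that names its own queue bypasses the pool
            # and pays the full AM launch cost.
            return "new-am", AM_LAUNCH_SECONDS
        if self.idle > 0:
            # An idle pooled AM is reused, so there is no launch cost.
            self.idle -= 1
            return "pooled", 0.0
        # Pool exhausted: the query waits until a running query finishes.
        return "wait", None

    def finish(self):
        # A running query completed and its AM returns to the pool.
        self.idle += 1


pool = SessionPool(size=2)
results = [pool.submit() for _ in range(3)]
print(results)

pool.finish()
print(pool.submit())
print(pool.submit(explicit_queue=True))
```

With a pool of two AMs, the third concurrent query is the one that waits, which is why the pool size needs to match the expected concurrency.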
Re: Ways to reduce launching time of query in Hive 2.2.1
The best approach would be to use daemonized containers, such as Hive LLAP + Tez session pool or Hive on Spark. I’m not that familiar with Hive on Spark so I can’t comment on it, but Hive on LLAP has worked really well for me when coupled with the Tez session pool.

You’ll have to specify how many Tez AMs are initialized per LLAP pool when HiveServer2 starts, and those AMs will be used for all the queries in that pool. The actual Tez containers are “replaced” by LLAP daemons that are always running, so there’s no startup cost there either. The underlying execution engine is still Tez, but it is executed in a special LLAP mode, and this could potentially give you sub-second response time.

In my experience, when Hive LLAP is used, the IO cache is enabled and the file format is ORC, I can get under 1s for small queries when the cache is hit (equivalent to an in-memory database at that point). Parquet is slower since LLAP mode doesn’t support efficient IO caching and vectorized execution for it.

On Mon, Apr 16, 2018 at 9:33 AM Anup Tiwari wrote:

> Hi All,
>
> We have a use case where we need to return output in < 10 sec. We have
> evaluated different sets of tools for execution and they work fine, but they
> do not cover all cases and they are not reliable (since they are in an
> evolving phase). But Hive works well in this context.
>
> Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching
> takes ~12-15 sec, due to which response time becomes 18-21 sec.
>
> Is there any way we can reduce this launching time?
>
> Please note that we have tried prewarming containers, but when we launch a
> query from the Hive client it does not pick containers from the already
> initialized ones; rather, it launches its own.
>
> Please let me know how we can overcome this issue, since this is the only
> problem stopping us from using Hive. Any links/descriptions are really
> appreciated.
>
> Regards,
> Anup Tiwari

-- Thai
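As a rough illustration of the setup described in the reply above, the following sketch renders a hive-site.xml fragment for a Tez session pool with LLAP. The property names are the ones I understand HiveServer2 to use for this, so please check them against the documentation for your Hive version. The values (a queue named `default`, 4 AMs per queue) and the helper `render_hive_site` are assumptions made up for the example, not recommendations from the thread.

```python
import xml.etree.ElementTree as ET

# Assumed example values; tune the queue name and AM count for your cluster.
settings = {
    "hive.execution.mode": "llap",
    "hive.llap.io.enabled": "true",
    "hive.server2.tez.default.queues": "default",
    "hive.server2.tez.sessions.per.default.queue": "4",
    "hive.server2.tez.initialize.default.sessions": "true",
}


def render_hive_site(props):
    """Build a <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_hive_site(settings)
print(xml_text)
```

Queries submitted through HiveServer2 without an explicit queue name would then be candidates for the pooled AMs, which matches the behaviour both replies describe.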
Ways to reduce launching time of query in Hive 2.2.1
Hi All,

We have a use case where we need to return output in < 10 sec. We have evaluated different sets of tools for execution and they work fine, but they do not cover all cases and they are not reliable (since they are in an evolving phase). But Hive works well in this context.

Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching takes ~12-15 sec, due to which response time becomes 18-21 sec.

Is there any way we can reduce this launching time?

Please note that we have tried prewarming containers, but when we launch a query from the Hive client it does not pick containers from the already initialized ones; rather, it launches its own.

Please let me know how we can overcome this issue, since this is the only problem stopping us from using Hive. Any links/descriptions are really appreciated.

Regards,
Anup Tiwari