Hi Tim,

My goal is: queries whose actual memory per node exceeds the default max memory per node that I set up should fail, even though I have a mix of queries in the pool; in the same pool some business queries can be as simple as select count(*) and others can have a few joins.
I think that is the right decision, and such a query should be optimized. Also, if I look at my historical queries, I can tell from the max memory used per node which queries will fail, and that helps me a lot. But I need every other query to be queued if it asks for actual memory lower than the default max memory per node that I set up.

Based on the above, I am looking for the parameters I need to configure. I don't mind how long or how many queries are queued; in my case no Impala query runs beyond 4-5 minutes, and 80% of queries finish in under 1 minute. So I don't mind setting the queue timeout to 20 minutes and max queued to 20-30 queries per pool.

I want to make sure no query will fail if it does not exceed the default memory per node that I set up. Should I use only the default max memory per node alone? Should I combine it with max running queries, or with the memory limit of the whole pool?

On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> I think the previous answers have been good. I wanted to add a couple of side notes for context, since I've been doing a lot of work in this area of Impala. I could talk about this stuff for hours.
>
> We do have mechanisms, like spilling data to disk or reducing the number of threads, that kick in to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but Impala 2.10 included some architectural changes to make this more robust, and we have further improvements in the pipeline. The end goal, which we're getting much closer to, is that queries should reliably run to completion instead of getting killed after they are admitted.
>
> That support is going to enable future enhancements to memory-based admission control to make it easier for cluster admins like yourself to configure admission control.
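[Editor's note: the pool parameters being asked about above (default per-query memory limit, queue timeout, max queued, max running) are typically set per pool in the admission control configuration. A minimal sketch follows, assuming a CDH-style deployment configured through llama-site.xml; the pool name "root.default" and all values are illustrative, and the property names should be verified against the admission control docs for your Impala version.]

```xml
<!-- llama-site.xml (illustrative sketch; pool name "root.default" is an assumption) -->
<configuration>
  <!-- Default per-node mem_limit applied to every query in this pool;
       queries that actually exceed this on any node will be killed. -->
  <property>
    <name>impala.admission-control.pool-default-query-options.root.default</name>
    <value>mem_limit=2g</value>
  </property>
  <!-- How long a query may wait in the queue before failing: 20 minutes. -->
  <property>
    <name>impala.admission-control.pool-queue-timeout-ms.root.default</name>
    <value>1200000</value>
  </property>
  <!-- Cap on concurrently running queries in the pool. -->
  <property>
    <name>llama.am.throttling.maximum.placed.reservations.root.default</name>
    <value>10</value>
  </property>
  <!-- Cap on queued queries; arrivals beyond this are rejected outright. -->
  <property>
    <name>llama.am.throttling.maximum.queued.reservations.root.default</name>
    <value>30</value>
  </property>
</configuration>
```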
> It is definitely tricky to pick a good value for mem_limit when pools can contain a mix of queries, and I think Impala can do better at making these decisions automatically.
>
> - Tim
>
> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <alex.b...@cloudera.com> wrote:
>> For a given query, the logic for determining the memory that will be required for admission is:
>> - if the query has a mem_limit, use that
>> - otherwise, use the memory estimates from the planner
>>
>> A query may be assigned a mem_limit by:
>> - taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
>> - manually setting one for the query (in case you want to override the pool default for a single query)
>>
>> In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes.
>> Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
>>
>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <jes...@gmail.com> wrote:
>>> Again, the 8TB estimate would not be relevant if the query had a mem_limit set.
>>> I think all that we discussed is covered in the docs, but if you feel like specific parts need clarification, please file a jira.
>>>
>>> On 23 February 2018 at 11:51, Fawze Abujaber <fawz...@gmail.com> wrote:
>>> > Sorry for asking so many questions, but I see your answers are closing gaps that I cannot fill from the documentation.
>>> >
>>> > So how can we explain that there was an estimate of 8TB per node and Impala still decided to submit this query?
>>> > My goal is that each query running beyond the actual limit per node should fail (and this is what I set up in the default memory per node per pool), and I want all other queries to be queued and not killed. So what I understand is that I need to set the max queued queries to unlimited and the queue timeout to hours.
>>> >
>>> > And in order to reach that, I need to set up the default memory per node for each pool, and set either max concurrency or the max memory per pool, which will help determine the max concurrent queries that can run in a specific pool.
>>> >
>>> > I think reaching this goal will close all my gaps.
>>> >
>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <jes...@gmail.com> wrote:
>>> >> > Whether a query is queued or not is based on the prediction, which is based on the estimate and, of course, on the concurrency that can run in a pool.
>>> >>
>>> >> Yes, it is.
>>> >>
>>> >> > If I have a memory limit per pool and a memory limit per node for a pool, it can be used to estimate the number of queries that can run concurrently. Is this also based on the prediction and not the actual use?
>>> >>
>>> >> Also on prediction.
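[Editor's note: the admission logic described in this thread (use the query's mem_limit if set, otherwise the planner estimate; admit if the pool has headroom, otherwise queue, otherwise reject) can be sketched roughly as follows. This is a simplified model for illustration only, not Impala's actual implementation, which also tracks per-host memory, statestore updates, and more.]

```python
# Simplified model of the admission decision discussed in the thread.

def needed_mem_per_node(query_mem_limit, planner_estimate):
    """If the query has a mem_limit, admission uses it; otherwise it
    falls back to the planner's (possibly very wrong) estimate."""
    return query_mem_limit if query_mem_limit is not None else planner_estimate

def admit(pool, query_mem_limit, planner_estimate, num_nodes):
    """Return 'ADMIT', 'QUEUE', or 'REJECT' for one incoming query."""
    need = needed_mem_per_node(query_mem_limit, planner_estimate) * num_nodes
    if (pool["running"] < pool["max_running"]
            and pool["mem_in_use"] + need <= pool["max_mem"]):
        pool["running"] += 1
        pool["mem_in_use"] += need
        return "ADMIT"
    if pool["queued"] < pool["max_queued"]:
        pool["queued"] += 1
        return "QUEUE"
    return "REJECT"

pool = {"running": 0, "queued": 0, "mem_in_use": 0,
        "max_running": 2, "max_queued": 1, "max_mem": 100}

# Note the huge planner estimate (like the 8TB example) is ignored
# whenever a mem_limit is set on the query.
print(admit(pool, 40, 8_000, 1))  # ADMIT
print(admit(pool, 40, 8_000, 1))  # ADMIT
print(admit(pool, None, 30, 1))   # QUEUE (running slots exhausted)
print(admit(pool, 10, 10, 1))     # REJECT (queue is full)
```

Under this model, queuing decisions are made entirely from predictions (mem_limit or estimate), never from a query's actual memory use, which matches Jeszy's answers above.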