Can you please share the query profiles for the failures you saw, along with your admission control settings?
Thanks,
Mostafa

> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
>
> Thank you all for your help and advice.
>
> Unfortunately I rolled back the upgrade until I understand how to control
> Impala resources and tackle all the failures that I started to see after
> the upgrade.
>
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>> Hi Tim,
>>
>> My goal is: queries whose actual memory per node exceeds the default max
>> memory per node that I set should fail, even though I have a mix of
>> queries in the pool; in the same pool some business queries can be as
>> simple as a select count(*) while others have a few joins.
>>
>> I think this is the right decision, and such a query should be optimized.
>>
>> Also, looking at my historical queries, I can tell from the max memory
>> used per node which queries will fail, which helps me a lot; but I need
>> every other query to be queued if its actual memory demand is lower than
>> the default max memory per node that I set for a query.
>>
>> Based on the above, I'm looking for the parameters I need to configure.
>>
>> I don't mind how long or how many queries are queued; in my case no
>> Impala query runs beyond 4-5 minutes, and 80% of queries finish in under
>> 1 minute.
>>
>> So I don't mind setting the queue timeout to 20 minutes and the max
>> queued queries to 20-30 per pool.
>>
>> I want to make sure no query fails if it does not exceed the default
>> memory per node that I set.
>>
>> Should I use only the default max memory per node alone? Should I
>> combine it with max running queries, or with the memory limit of the
>> whole pool?
>>
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]>
>>> wrote:
>>> I think the previous answers have been good. I wanted to add a couple
>>> of side notes for context, since I've been doing a lot of work in this
>>> area of Impala.
>>> I could talk about this stuff for hours.
>>>
>>> We do have mechanisms, like spilling data to disk or reducing the
>>> number of threads, that kick in to keep queries under the mem_limit.
>>> This has existed in some form since Impala 2.0, but Impala 2.10
>>> included some architectural changes to make it more robust, and we have
>>> further improvements in the pipeline. The end goal, which we're getting
>>> much closer to, is that queries should reliably run to completion
>>> instead of getting killed after they are admitted.
>>>
>>> That support is going to enable future enhancements to memory-based
>>> admission control that make it easier for cluster admins like yourself
>>> to configure. It is definitely tricky to pick a good value for
>>> mem_limit when pools can contain a mix of queries, and I think Impala
>>> can do better at making these decisions automatically.
>>>
>>> - Tim
>>>
>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]>
>>>> wrote:
>>>> For a given query, the logic for determining the memory that will be
>>>> required for admission is:
>>>> - if the query has a mem_limit, use that
>>>> - otherwise, use the memory estimates from the planner
>>>>
>>>> A query may be assigned a mem_limit by:
>>>> - taking the default mem_limit from the pool it was submitted to
>>>>   (this is the recommended practice)
>>>> - manually setting one for the query (in case you want to override
>>>>   the pool default for a single query)
>>>>
>>>> In that setup, the memory estimates from the planner are irrelevant
>>>> for admission decisions and serve only informational purposes.
>>>> Please do not read too much into the memory estimates from the
>>>> planner. They can be totally wrong (like your 8TB example).
>>>>
>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>> Again, the 8TB estimate would not be relevant if the query had a
>>>>> mem_limit set.
>>>>> I think everything we discussed is covered in the docs, but if you
>>>>> feel that specific parts need clarification, please file a jira.
>>>>>
>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>> > Sorry for asking so many questions, but your answers are closing
>>>>> > the gaps that I cannot fill from the documentation.
>>>>> >
>>>>> > So how can we explain that there was an estimate of 8TB per node
>>>>> > and Impala still decided to admit this query?
>>>>> >
>>>>> > My goal is that each query running beyond the actual limit per node
>>>>> > should fail (and this is what I set as the default memory per node
>>>>> > per pool), and I want all other queries to be queued, not killed;
>>>>> > so I understand that I need to set the max queued queries to
>>>>> > unlimited and the queue timeout to hours.
>>>>> >
>>>>> > And in order to reach that, I need to set the default memory per
>>>>> > node for each pool, and also set either max concurrency or the max
>>>>> > memory per pool, which helps determine the max number of queries
>>>>> > that can run concurrently in a specific pool.
>>>>> >
>>>>> > I think reaching this goal will close all my gaps.
>>>>> >
>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>> >>
>>>>> >> > Whether a query is queued or not is based on the prediction,
>>>>> >> > which is based on the estimate and, of course, the concurrency
>>>>> >> > that can run in a pool.
>>>>> >>
>>>>> >> Yes, it is.
>>>>> >>
>>>>> >> > If I have a memory limit per pool and a memory limit per node
>>>>> >> > for a pool, it can be used to estimate the number of queries
>>>>> >> > that can run concurrently; is this also based on the prediction
>>>>> >> > and not the actual use?
>>>>> >>
>>>>> >> Also on prediction.
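[Editor's illustration] The memory rule Alexander describes in the thread
(an explicit query mem_limit wins, then the pool's default mem_limit, and
the planner estimate is consulted only when neither is set) can be sketched
as a small model. This is not Impala's actual code; all names here are
illustrative:

```python
GB = 1024 ** 3  # bytes in a gigabyte

def admitted_mem_per_node(query_mem_limit, pool_default_mem_limit, planner_estimate):
    """Return the per-node memory figure admission control charges for a query.

    Mirrors the rule from the thread: an explicit per-query mem_limit wins,
    then the pool's default mem_limit; only if neither is set does the
    planner's (possibly wildly wrong) estimate matter.
    """
    if query_mem_limit is not None:
        return query_mem_limit
    if pool_default_mem_limit is not None:
        return pool_default_mem_limit
    return planner_estimate

# With a pool default of 2GB, an absurd 8TB planner estimate is ignored.
print(admitted_mem_per_node(None, 2 * GB, 8192 * GB) // GB)   # 2
# With no mem_limit anywhere, the bogus estimate is what admission sees,
# which is how an 8TB-per-node estimate can affect (or block) admission.
print(admitted_mem_per_node(None, None, 8192 * GB) // GB)     # 8192
```

This is why the thread recommends setting a default mem_limit on each pool:
it makes admission decisions independent of planner estimates.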
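[Editor's illustration] Fawze's desired behavior (queries that cannot fit
under the per-node limit fail fast, everything else runs or waits in the
queue up to a timeout) corresponds roughly to the admit/queue/reject flow
discussed above. The sketch below is a simplified single-pool model, not
Impala's real admission controller; it omits aggregate pool memory checks,
and the parameter names are invented for illustration:

```python
def admission_decision(mem_per_node, running, queued,
                       pool_mem_per_node_limit, max_running, max_queued):
    """Decide what happens to a newly submitted query in one pool.

    As Jeszy notes in the thread, the decision uses the *predicted* memory
    (mem_limit or planner estimate), never the actual runtime usage.
    """
    if mem_per_node > pool_mem_per_node_limit:
        return "REJECT"   # can never fit under the per-node limit: fail fast
    if running < max_running:
        return "ADMIT"    # room to run immediately
    if queued < max_queued:
        return "QUEUE"    # wait, up to the pool's queue timeout
    return "REJECT"       # queue is already full

GB = 1024 ** 3
# 3GB request, 4 of 5 slots busy, 4GB per-node cap: admitted right away.
print(admission_decision(3 * GB, 4, 0, 4 * GB, 5, 10))
# Same request with all 5 slots busy and room in the queue: queued.
print(admission_decision(3 * GB, 5, 2, 4 * GB, 5, 10))
```

Under this model, a generous max_queued and a long queue timeout (as Fawze
proposes, e.g. 20-30 queued queries and a 20-minute timeout) convert most
would-be failures into waits, while oversized queries still fail fast.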
