Thank you all for your help and advice. Unfortunately, I rolled back the upgrade until I understand how to control Impala resources and can tackle all the failures I started to see after the upgrade.
On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:

Hi Tim,

My goal is: queries whose actual memory per node exceeds what I set as the default max memory per node should fail, even though I have a mix of queries in the pool; in the same pool some business queries can be as simple as a select count(*) while others have a few joins.

I think this is the right decision, and such a query should be optimized.

Also, looking at my historical queries, I can tell from the max memory used per node which queries will fail, and that helps me a lot; but I need every other query to be queued if the actual memory it asks for is lower than the default max memory per node I set for a query.

Based on the above, I am looking for the parameters I need to configure.

I don't mind how long or how many queries are queued; in my case no Impala query runs beyond 4-5 minutes, and 80% of queries finish in under 1 minute.

So I don't mind setting the queue timeout to 20 minutes and the max queued to 20-30 queries per pool.

I want to make sure no query fails if it does not exceed the default memory per node that I set.

Should I use only the default max memory per node alone? Should I combine it with the max running queries, or with the memory limit of the whole pool?

On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> wrote:

I think the previous answers have been good. I wanted to add a couple of side notes for context, since I've been doing a lot of work in this area of Impala. I could talk about this stuff for hours.

We do have mechanisms, like spilling data to disk or reducing the number of threads, that kick in to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but Impala 2.10 included some architectural changes to make this more robust, and we have further improvements in the pipeline. The end goal, which we're getting much closer to, is that queries should reliably run to completion instead of getting killed after they are admitted.

That support is going to enable future enhancements to memory-based admission control that will make it easier for cluster admins like yourself to configure. It is definitely tricky to pick a good value for mem_limit when pools can contain a mix of queries, and I think Impala can do better at making these decisions automatically.

- Tim

On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> wrote:

For a given query, the logic for determining the memory required for admission is:
- if the query has a mem_limit, use that
- otherwise, use the memory estimates from the planner

A query may be assigned a mem_limit by:
- taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
- manually setting one for the query (in case you want to override the pool default for a single query)

In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes. Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
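To make Alexander's two rules concrete, here is a minimal impala-shell sketch; the orders/customers tables and the 2g value are made-up examples, and the exact wording of the estimate line in EXPLAIN output varies by Impala version:

    -- With no mem_limit set, admission falls back on the planner estimate
    -- (which can be wildly wrong, as in the 8TB example). EXPLAIN shows it;
    -- look for the per-host memory estimate line in the output.
    EXPLAIN
    SELECT o.customer_id, COUNT(*)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY o.customer_id;

    -- Override the pool's default mem_limit for this session only; admission
    -- now charges 2 GB per node for the query instead of using the estimate.
    SET MEM_LIMIT=2g;
    SELECT o.customer_id, COUNT(*)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY o.customer_id;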
On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:

Again, the 8TB estimate would not be relevant if the query had a mem_limit set. I think all that we discussed is covered in the docs, but if you feel specific parts need clarification, please file a jira.

On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:

Sorry for asking so many questions, but your answers are closing gaps that I cannot find in the documentation.

So how can we explain that there was an estimate of 8TB per node and Impala still decided to admit this query?

My goal is for each query running beyond the actual limit per node (which is what I set as the default memory per node for the pool) to fail, and for all other queries to be queued and not killed; so as I understand it, I need to set the max queued queries to unlimited and the queue timeout to hours.

And to reach that, I need to set the default memory per node for each pool, plus either the max concurrency or the max memory per pool, which determines how many queries can run concurrently in a specific pool.

I think reaching this goal will close all my gaps.

On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:

> Whether a query is queued or not is based on the prediction, which is based on the estimate and of course the concurrency that can run in a pool.

Yes, it is.

> If I have a memory limit per pool and a memory limit per node for a pool, it can be used to estimate the number of queries that can run concurrently; is this also based on the prediction and not the actual use?

Also on prediction.
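Tying the thread together: the setup Fawze describes (fail a query only when it exceeds the per-query limit, queue everything else) maps to four per-pool admission control settings. A minimal sketch follows, using the classic fair-scheduler.xml / llama-site.xml configuration; the pool name root.analytics and every value are made-up examples, and the property names should be checked against the admission control docs for your Impala version before copying:

    <!-- fair-scheduler.xml: the pool's aggregate Max Memory across the cluster. -->
    <allocations>
      <queue name="root">
        <queue name="analytics">
          <maxResources>128000 mb, 0 vcores</maxResources>
        </queue>
      </queue>
    </allocations>

    <!-- llama-site.xml: per-pool admission control knobs. -->
    <property>
      <!-- Default per-query mem_limit; a query is killed only if it exceeds this at runtime. -->
      <name>impala.admission-control.pool-default-query-options.root.analytics</name>
      <value>mem_limit=2g</value>
    </property>
    <property>
      <!-- Max queries running concurrently in the pool. -->
      <name>llama.am.throttling.maximum.placed.reservations.root.analytics</name>
      <value>10</value>
    </property>
    <property>
      <!-- Max queries waiting in the queue; beyond this, new queries are rejected. -->
      <name>llama.am.throttling.maximum.queued.reservations.root.analytics</name>
      <value>30</value>
    </property>
    <property>
      <!-- Queue timeout of 20 minutes, per the plan above. -->
      <name>impala.admission-control.pool-queue-timeout-ms.root.analytics</name>
      <value>1200000</value>
    </property>

With a pool default mem_limit in place, admission charges each query mem_limit times the number of hosts against the pool's Max Memory, so concurrency is bounded by memory rather than by the planner's estimates, which is exactly the setup Alexander recommends.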
