Can you please share the query profiles for the failures you got, along with 
the admission control settings? 

Thanks 
Mostafa

> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
> 
> Thank you all for your help and advice.
> 
> Unfortunately, I rolled back the upgrade until I understand how to control 
> Impala resources and can tackle all the failures I started to see after the 
> upgrade.
> 
> 
> 
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>> Hi Tim,
>> 
>> My goal is: queries whose actual memory per node exceeds what I set as the 
>> default max memory per node should fail, even though the pool contains a mix 
>> of queries; in the same pool some business queries can be as simple as 
>> select count(*) while others have a few joins.
>> 
>> And I think this is the right decision; such a query should be optimized.
>> 
>> Also, looking at my historical queries, I can tell from the max memory used 
>> per node which queries will fail, and that helps me a lot. But I need every 
>> other query to be queued if its actual memory request is lower than the 
>> default max memory per node I set for a query.
>> 
>> Based on the above, I'm looking for the parameters I need to configure.
>> 
>> I don't mind how long or how many queries get queued; in my case no Impala 
>> query runs beyond 4-5 minutes, and 80% of queries finish in under 1 minute.
>> 
>> So I don't mind setting the queue timeout to 20 minutes and the max queued 
>> to 20-30 queries per pool.
>> 
>> I want to make sure no query will fail if it does not exceed the default 
>> memory per node that I set.
>> 
>> Should I use the default max memory per node alone, or should I combine it 
>> with the max running queries or with the memory limit of the whole pool?
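
One way to reason about how these settings combine: with a per-pool memory limit and a uniform default per-node mem_limit, the maximum concurrency is bounded implicitly. A minimal sketch with made-up numbers (the pool size, node count, and limits below are illustrative, not taken from this thread):

```python
# Illustrative numbers only: how a pool memory limit and a per-query
# mem_limit together bound the number of concurrently admitted queries.
pool_max_mem_gb = 100      # aggregate memory limit for the pool (all nodes)
default_mem_limit_gb = 2   # default per-node mem_limit applied to each query
num_nodes = 10             # impalads that each query runs on

# Admission accounts for mem_limit on every node a query runs on.
per_query_cluster_mem_gb = default_mem_limit_gb * num_nodes   # 20 GB

# At most this many queries fit before additional ones are queued.
max_concurrent = pool_max_mem_gb // per_query_cluster_mem_gb  # 5
print(max_concurrent)
```

Under these assumptions, setting a max-running-queries cap on top of the pool memory limit is redundant unless you want a tighter bound than the memory math gives you.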
>> 
>> 
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> 
>>> wrote:
>>> I think the previous answers have been good. I wanted to add a couple of 
>>> side notes for context since I've been doing a lot of work in this area of 
>>> Impala. I could talk about this stuff for hours.
>>> 
>>> We do have mechanisms, like spilling data to disk or reducing # of threads, 
>>> that kick in to keep queries under the mem_limit. This has existed in some 
>>> form since Impala 2.0, but Impala 2.10 included some architectural changes 
>>> to make this more robust, and we have further improvements in the pipeline. 
>>> The end goal, which we're getting much closer to, is that queries should 
>>> reliably run to completion instead of getting killed after they are 
>>> admitted.
>>> 
>>> That support is going to enable future enhancements to memory-based 
>>> admission control to make it easier for cluster admins like yourself to 
>>> configure admission control. It is definitely tricky to pick a good value 
>>> for mem_limit when pools can contain a mix of queries and I think Impala 
>>> can do better at making these decisions automatically.
>>> 
>>> - Tim
>>> 
>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> 
>>>> wrote:
>>>> For a given query the logic for determining the memory that will be 
>>>> required from admission is:
>>>> - if the query has mem_limit use that
>>>> - otherwise, use memory estimates from the planner
>>>> 
>>>> A query may be assigned a mem_limit by:
>>>> - taking the default mem_limit from the pool it was submitted to (this is 
>>>> the recommended practice)
>>>> - manually setting one for the query (in case you want to override the 
>>>> pool default for a single query)
>>>> 
>>>> In that setup, the memory estimates from the planner are irrelevant for 
>>>> admission decisions and only serve for informational purposes.
>>>> Please do not read too much into the memory estimates from the planner. 
>>>> They can be totally wrong (like your 8TB example).
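
The decision order described above can be sketched as follows (the function and parameter names are illustrative, not Impala's internal API):

```python
def memory_for_admission(query_mem_limit, pool_default_mem_limit,
                         planner_estimate):
    """Sketch of the order in which admission control picks the memory
    figure to use for a query. Names are illustrative, not Impala's."""
    # An explicit per-query mem_limit always wins.
    if query_mem_limit is not None:
        return query_mem_limit
    # Otherwise the pool's default mem_limit, if configured, is assigned.
    if pool_default_mem_limit is not None:
        return pool_default_mem_limit
    # Only when no mem_limit applies at all do the planner's estimates matter.
    return planner_estimate
```

With a pool default in place, even a wildly wrong planner estimate (like the 8TB example) plays no role in the admission decision.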
>>>> 
>>>> 
>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>> Again, the 8TB estimate would not be relevant if the query had a 
>>>>> mem_limit set.
>>>>> I think all that we discussed is covered in the docs, but if you feel
>>>>> like specific parts need clarification, please file a jira.
>>>>> 
>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>> > Sorry for asking so many questions, but your answers are closing gaps
>>>>> > that I cannot fill from the documentation.
>>>>> >
>>>>> > So how can we explain that there was an estimate of 8TB per node and
>>>>> > Impala still decided to admit this query?
>>>>> >
>>>>> > My goal is for each query running beyond the actual limit per node to
>>>>> > fail (this is what I set as the default memory per node per pool), and
>>>>> > I want all other queries to be queued, not killed. So, as I understand
>>>>> > it, I need to set the max queued queries to unlimited and the queue
>>>>> > timeout to hours.
>>>>> >
>>>>> > And to reach that, I need to set the default memory per node for each
>>>>> > pool, plus either the max concurrency or the max memory per pool, which
>>>>> > determines how many queries can run concurrently in a specific pool.
>>>>> >
>>>>> > I think reaching this goal will close all my gaps.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>> >>
>>>>> >> > Whether a query is queued or not is based on the prediction, which
>>>>> >> > is based on the estimate and, of course, the concurrency that can
>>>>> >> > run in a pool.
>>>>> >>
>>>>> >> Yes, it is.
>>>>> >>
>>>>> >> > If I have a memory limit per pool and a memory limit per node for a
>>>>> >> > pool, it can be used to estimate the number of queries that can run
>>>>> >> > concurrently. Is this also based on the prediction and not the
>>>>> >> > actual use?
>>>>> >>
>>>>> >> Also on prediction.
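
Putting the thread together, the admit/queue/reject choice driven by predicted (not actual) memory can be sketched as follows; every name and threshold here is illustrative, not Impala's internal code:

```python
def admission_decision(predicted_mem_per_node_gb, num_nodes,
                       pool_mem_in_use_gb, pool_mem_limit_gb,
                       running, max_running, queued, max_queued):
    """Sketch: admission uses the *predicted* per-node memory (mem_limit
    or planner estimate), never the actual usage. Illustrative only."""
    needed_gb = predicted_mem_per_node_gb * num_nodes
    fits = pool_mem_in_use_gb + needed_gb <= pool_mem_limit_gb
    if running < max_running and fits:
        return "admit"
    if queued < max_queued:
        return "queue"    # waits in line, up to the queue timeout
    return "reject"
```

This is why a generous max-queued setting plus a long queue timeout converts would-be rejections into waits, which is the behavior asked for earlier in the thread.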
>>>>> >
>>>>> >
>>>> 
>>> 
>> 
> 
