Hi Tim,

My goal is: queries whose actual memory per node exceeds the default max memory per node that I set up should fail, even though I have a mix of queries in the pool; in the same pool some business queries can be as simple as select count(*) and others can have a few joins.
I think that is the right decision, and such a query should be optimized. Also, if I look at my historical queries, I can tell from the max memory used per node which queries will fail, and that helps me a lot. But I need every other query to be queued if it asks for actual memory lower than the default max memory per node that I set up.

Based on the above, I am looking for the parameters I need to configure. I don't mind how long or how many queries are queued; in my case no Impala query runs beyond 4-5 minutes, and 80% of queries finish in under 1 minute. So I don't mind setting the queue timeout to 20 minutes and max queued to 20-30 queries per pool.

I want to make sure no query will fail if it does not exceed the default memory per node that I set up. Should I use only the default max memory per node alone? Should I combine it with max running queries, or with the memory limit of the whole pool?

On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> I think the previous answers have been good. I wanted to add a couple of side notes for context, since I've been doing a lot of work in this area of Impala. I could talk about this stuff for hours.
>
> We do have mechanisms, like spilling data to disk or reducing the number of threads, that kick in to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but Impala 2.10 included some architectural changes to make this more robust, and we have further improvements in the pipeline. The end goal, which we're getting much closer to, is that queries should reliably run to completion instead of getting killed after they are admitted.
>
> That support is going to enable future enhancements to memory-based admission control to make it easier for cluster admins like yourself to configure admission control.
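[Editor's note: the pool parameters being asked about above (default per-query memory limit, queue timeout, max queued, max running) are typically set per pool in the admission control configuration. A minimal sketch follows, assuming a CDH-style deployment configured through llama-site.xml; the pool name "root.default" and all values are illustrative, and the property names should be verified against the admission control docs for your Impala version.]

```xml
<!-- llama-site.xml (illustrative sketch; pool name "root.default" is an assumption) -->
<configuration>
  <!-- Default per-node mem_limit applied to every query in this pool;
       queries that actually exceed this on any node will be killed. -->
  <property>
    <name>impala.admission-control.pool-default-query-options.root.default</name>
    <value>mem_limit=2g</value>
  </property>
  <!-- How long a query may wait in the queue before failing: 20 minutes. -->
  <property>
    <name>impala.admission-control.pool-queue-timeout-ms.root.default</name>
    <value>1200000</value>
  </property>
  <!-- Cap on concurrently running queries in the pool. -->
  <property>
    <name>llama.am.throttling.maximum.placed.reservations.root.default</name>
    <value>10</value>
  </property>
  <!-- Cap on queued queries; arrivals beyond this are rejected outright. -->
  <property>
    <name>llama.am.throttling.maximum.queued.reservations.root.default</name>
    <value>30</value>
  </property>
</configuration>
```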
> It is definitely tricky to pick a good value for mem_limit when pools can contain a mix of queries, and I think Impala can do better at making these decisions automatically.
>
> - Tim
>
> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <alex.b...@cloudera.com> wrote:
>> For a given query, the logic for determining the memory that will be required for admission is:
>> - if the query has a mem_limit, use that
>> - otherwise, use the memory estimates from the planner
>>
>> A query may be assigned a mem_limit by:
>> - taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
>> - manually setting one for the query (in case you want to override the pool default for a single query)
>>
>> In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes.
>> Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
>>
>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <jes...@gmail.com> wrote:
>>> Again, the 8TB estimate would not be relevant if the query had a mem_limit set.
>>> I think all that we discussed is covered in the docs, but if you feel like specific parts need clarification, please file a jira.
>>>
>>> On 23 February 2018 at 11:51, Fawze Abujaber <fawz...@gmail.com> wrote:
>>> > Sorry for asking so many questions, but I see your answers are closing gaps that I cannot fill from the documentation.
>>> >
>>> > So how can we explain that there was an estimate of 8TB per node and Impala still decided to submit this query?
>>> > My goal is that each query running beyond the actual limit per node should fail (and this is what I set up in the default memory per node per pool), and I want all other queries to be queued and not killed. So what I understand is that I need to set the max queued queries to unlimited and the queue timeout to hours.
>>> >
>>> > And in order to reach that, I need to set up the default memory per node for each pool, and set either max concurrency or the max memory per pool, which will help determine the max concurrent queries that can run in a specific pool.
>>> >
>>> > I think reaching this goal will close all my gaps.
>>> >
>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <jes...@gmail.com> wrote:
>>> >> > Whether a query is queued or not is based on the prediction, which is based on the estimate and, of course, on the concurrency that can run in a pool.
>>> >>
>>> >> Yes, it is.
>>> >>
>>> >> > If I have a memory limit per pool and a memory limit per node for a pool, it can be used to estimate the number of queries that can run concurrently. Is this also based on the prediction and not the actual use?
>>> >>
>>> >> Also on prediction.
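[Editor's note: the admission logic described in this thread (use the query's mem_limit if set, otherwise the planner estimate; admit if the pool has headroom, otherwise queue, otherwise reject) can be sketched roughly as follows. This is a simplified model for illustration only, not Impala's actual implementation, which also tracks per-host memory, statestore updates, and more.]

```python
# Simplified model of the admission decision discussed in the thread.

def needed_mem_per_node(query_mem_limit, planner_estimate):
    """If the query has a mem_limit, admission uses it; otherwise it
    falls back to the planner's (possibly very wrong) estimate."""
    return query_mem_limit if query_mem_limit is not None else planner_estimate

def admit(pool, query_mem_limit, planner_estimate, num_nodes):
    """Return 'ADMIT', 'QUEUE', or 'REJECT' for one incoming query."""
    need = needed_mem_per_node(query_mem_limit, planner_estimate) * num_nodes
    if (pool["running"] < pool["max_running"]
            and pool["mem_in_use"] + need <= pool["max_mem"]):
        pool["running"] += 1
        pool["mem_in_use"] += need
        return "ADMIT"
    if pool["queued"] < pool["max_queued"]:
        pool["queued"] += 1
        return "QUEUE"
    return "REJECT"

pool = {"running": 0, "queued": 0, "mem_in_use": 0,
        "max_running": 2, "max_queued": 1, "max_mem": 100}

# Note the huge planner estimate (like the 8TB example) is ignored
# whenever a mem_limit is set on the query.
print(admit(pool, 40, 8_000, 1))  # ADMIT
print(admit(pool, 40, 8_000, 1))  # ADMIT
print(admit(pool, None, 30, 1))   # QUEUE (running slots exhausted)
print(admit(pool, 10, 10, 1))     # REJECT (queue is full)
```

Under this model, queuing decisions are made entirely from predictions (mem_limit or estimate), never from a query's actual memory use, which matches Jeszy's answers above.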