Hi Mostafa, Is this expected behavior or a BUG?
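For reference, below is a sketch of the kind of query that hits this error and one possible rewrite. This is only an illustration: Impala does not allow `IS NOT NULL` directly on complex (ARRAY/MAP/STRUCT) columns, but joining the table against its own nested collection keeps only rows whose collection is non-empty. The column name `participants` comes from the error message in the thread; the table name `my_table` and column `id` are made up for the example.

```sql
-- Fails with:
--   AnalysisException: IS NOT NULL predicate does not support complex types:
--   participants IS NOT NULL
-- SELECT id FROM my_table WHERE participants IS NOT NULL;

-- Possible rewrite: join the table with its own nested collection, which
-- drops rows whose participants array is NULL or empty. DISTINCT collapses
-- the one-row-per-array-element duplication the join introduces.
SELECT DISTINCT t.id
FROM my_table t, t.participants;
```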
On Wed, 28 Feb 2018 at 20:29 Fawze Abujaber <[email protected]> wrote:

> Hi Mostafa,
>
> I already rolled back the version, so I don't know how to get the settings,
> or whether I can get the query profile for finished queries in the
> rolled-back version.
>
> But, for example, after the upgrade we started to see the following error,
> which stopped appearing after the rollback: "IS NOT NULL predicate does not
> support complex types"
>
> - IllegalStateException: org.apache.impala.common.AnalysisException:
>   IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>   CAUSED BY: AnalysisException: IS NOT NULL predicate does not support
>   complex types: participants IS NOT NULL
>
> On Wed, Feb 28, 2018 at 7:56 PM, Mostafa Mokhtar <[email protected]> wrote:
>
>> Can you please share the query profiles for the failures you got, along
>> with the admission control settings?
>>
>> Thanks
>> Mostafa
>>
>> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
>>
>> Thank you all for your help and advice.
>>
>> Unfortunately I rolled back the upgrade until I understand how to control
>> Impala resources and tackle all the failures that I started to see after
>> the upgrade.
>>
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>>
>>> Hi Tim,
>>>
>>> My goal is: queries whose actual memory per node exceeds what I set as
>>> the default max memory per node should fail, even though I have a mix of
>>> queries in the pool; in the same pool, some business queries can be as
>>> simple as SELECT COUNT(*) and others can have a few joins.
>>>
>>> I think this is the right decision, and such a query should be optimized.
>>>
>>> Also, looking at my historical queries, I can tell from the max memory
>>> used per node which queries will fail, and that helps me a lot. But I
>>> need any other query to be queued if its actual memory is lower than the
>>> default max memory per node I set for a query.
>>>
>>> Based on the above, I'm looking for the parameters I need to configure.
>>>
>>> I don't mind how long or how many queries are queued; in my case I don't
>>> have any Impala query running beyond 4-5 minutes, and 80% of queries
>>> finish in under 1 minute.
>>>
>>> So I don't mind setting the queue timeout to 20 minutes and the max
>>> queued to 20-30 queries per pool.
>>>
>>> I want to make sure no query fails if it does not exceed the default
>>> memory per node that I set.
>>>
>>> Should I use the default max memory per node alone? Or should I combine
>>> it with max running queries, or with the memory limit of the whole pool?
>>>
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> wrote:
>>>
>>>> I think the previous answers have been good. I wanted to add a couple
>>>> of side notes for context, since I've been doing a lot of work in this
>>>> area of Impala. I could talk about this stuff for hours.
>>>>
>>>> We do have mechanisms, like spilling data to disk or reducing the
>>>> number of threads, that kick in to keep queries under the mem_limit.
>>>> This has existed in some form since Impala 2.0, but Impala 2.10
>>>> included some architectural changes to make this more robust, and we
>>>> have further improvements in the pipeline. The end goal, which we're
>>>> getting much closer to, is that queries should reliably run to
>>>> completion instead of getting killed after they are admitted.
>>>>
>>>> That support is going to enable future enhancements to memory-based
>>>> admission control that make it easier for cluster admins like yourself
>>>> to configure admission control.
>>>> It is definitely tricky to pick a good value for mem_limit when pools
>>>> can contain a mix of queries, and I think Impala can do better at
>>>> making these decisions automatically.
>>>>
>>>> - Tim
>>>>
>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> wrote:
>>>>
>>>>> For a given query, the logic for determining the memory that will be
>>>>> required for admission is:
>>>>> - if the query has a mem_limit, use that
>>>>> - otherwise, use the memory estimates from the planner
>>>>>
>>>>> A query may be assigned a mem_limit by:
>>>>> - taking the default mem_limit from the pool it was submitted to
>>>>> (this is the recommended practice)
>>>>> - manually setting one for the query (in case you want to override
>>>>> the pool default for a single query)
>>>>>
>>>>> In that setup, the memory estimates from the planner are irrelevant
>>>>> for admission decisions and serve only informational purposes.
>>>>> Please do not read too much into the memory estimates from the
>>>>> planner. They can be totally wrong (like your 8TB example).
>>>>>
>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>>
>>>>>> Again, the 8TB estimate would not be relevant if the query had a
>>>>>> mem_limit set.
>>>>>> I think everything we discussed is covered in the docs, but if you
>>>>>> feel specific parts need clarification, please file a JIRA.
>>>>>>
>>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>>> > Sorry for asking so many questions, but your answers are closing
>>>>>> > gaps that I cannot find covered in the documentation.
>>>>>> >
>>>>>> > So how can we explain that there was an estimate of 8TB per node
>>>>>> > and Impala still decided to admit this query?
>>>>>> >
>>>>>> > My goal is that any query running beyond the actual limit per node
>>>>>> > should fail (and this is what I set as the default memory per node
>>>>>> > per pool), and I want all other queries to be queued, not killed.
>>>>>> > So, as I understand it, I need to set the max queued queries to
>>>>>> > unlimited and the queue timeout to hours.
>>>>>> >
>>>>>> > And to reach that, I need to set the default memory per node for
>>>>>> > each pool, and set either the max concurrency or the max memory
>>>>>> > per pool, which will determine the max number of concurrent
>>>>>> > queries that can run in a specific pool.
>>>>>> >
>>>>>> > I think reaching this goal will close all my gaps.
>>>>>> >
>>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>>> >>
>>>>>> >> > Whether a query is queued or not is based on the prediction,
>>>>>> >> > which is based on the estimate, and of course the concurrency
>>>>>> >> > that can run in a pool.
>>>>>> >>
>>>>>> >> Yes, it is.
>>>>>> >>
>>>>>> >> > If I have a memory limit per pool and a memory limit per node
>>>>>> >> > for a pool, that can be used to estimate the number of queries
>>>>>> >> > that can run concurrently. Is this also based on the prediction
>>>>>> >> > and not the actual use?
>>>>>> >>
>>>>>> >> Also on prediction.
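The pool-sizing arithmetic implied by the thread (the pool's aggregate memory limit divided by what each query is charged cluster-wide caps concurrency) can be sketched as below. The numbers are illustrative assumptions, not recommendations, and this ignores details like single-node queries or partial host placement.

```python
GB = 1024 ** 3

def max_concurrent_queries(pool_mem_limit, per_node_mem_limit, num_hosts):
    """How many queries admission control can run at once when each query
    is charged its full per-node mem_limit on every host it runs on (the
    pool's memory limit is an aggregate across the cluster)."""
    per_query_cluster_mem = per_node_mem_limit * num_hosts
    return pool_mem_limit // per_query_cluster_mem

# e.g. a 100 GB pool, a 2 GB per-node default mem_limit, 10 hosts:
# each admitted query reserves 20 GB cluster-wide, so 5 run concurrently;
# further queries queue (up to max queued) until the queue timeout expires.
assert max_concurrent_queries(100 * GB, 2 * GB, 10) == 5
```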
