This looks like a very different question from the original one on this thread; it would be better to start a new thread for a new question. Keep in mind that you are likely to get quicker answers (from yourself) by checking the behaviour against the documentation. If there is a bug (which sounds possible), it may already have been reported; searching issues.apache.org will tell you, along with the fix version, if any.
HTH

On 4 March 2018 at 21:35, Fawze Abujaber <[email protected]> wrote:
> Hi Mostafa,
>
> Is this expected behavior or a bug?
>
> On Wed, 28 Feb 2018 at 20:29 Fawze Abujaber <[email protected]> wrote:
>>
>> Hi Mostafa,
>>
>> I already rolled back the version, so I don't know how to get the settings, and I'm not sure I can get the query profiles for finished queries in the rolled-back version.
>>
>> But as an example, after the upgrade we started to see the following error, which disappeared after the rollback: IS NOT NULL predicate does not support complex types
>>
>> IllegalStateException: org.apache.impala.common.AnalysisException: IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>> CAUSED BY: AnalysisException: IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>>
>> On Wed, Feb 28, 2018 at 7:56 PM, Mostafa Mokhtar <[email protected]> wrote:
>>>
>>> Can you please share the query profiles for the failures you got, along with the admission control settings?
>>>
>>> Thanks
>>> Mostafa
>>>
>>> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
>>>
>>> Thank you all for your help and advice.
>>>
>>> Unfortunately I rolled back the upgrade until I understand how to control Impala resources and can tackle all the failures that I started to see after the upgrade.
>>>
>>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>>>>
>>>> Hi Tim,
>>>>
>>>> My goal is: queries whose actual memory per node exceeds what I set as the default max memory per node should fail. The pools hold a mix of queries; in the same pool some business queries can be as simple as SELECT COUNT(*) while others have a few joins.
>>>>
>>>> I think this is the right behaviour, and such a query should be optimized.
>>>>
>>>> Also, looking at my historical queries, I can tell from the max memory used per node which queries will fail, and that helps me a lot; but I need every other query to be queued if its actual memory use stays below the default max memory per node that I set for a query.
>>>>
>>>> Based on the above, I'm looking for the parameters I need to configure.
>>>>
>>>> I don't mind how long or how many queries get queued; in my case no Impala query runs beyond 4-5 minutes, and 80% of queries finish in under a minute.
>>>>
>>>> So I don't mind setting the queue timeout to 20 minutes and the max queued to 20-30 queries per pool.
>>>>
>>>> I want to make sure no query will fail as long as it does not exceed the default memory per node that I set.
>>>>
>>>> Should I use only the default max memory per node on its own? Or should I combine it with max running queries, or with the memory limit of the whole pool?
>>>>
>>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> wrote:
>>>>>
>>>>> I think the previous answers have been good. I wanted to add a couple of side notes for context, since I've been doing a lot of work in this area of Impala. I could talk about this stuff for hours.
>>>>>
>>>>> We do have mechanisms, like spilling data to disk or reducing the number of threads, that kick in to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but Impala 2.10 included some architectural changes to make this more robust, and we have further improvements in the pipeline.
>>>>>
>>>>> The end goal, which we're getting much closer to, is that queries should reliably run to completion instead of getting killed after they are admitted.
>>>>>
>>>>> That support is going to enable future enhancements to memory-based admission control to make it easier for cluster admins like yourself to configure admission control. It is definitely tricky to pick a good value for mem_limit when pools can contain a mix of queries, and I think Impala can do better at making these decisions automatically.
>>>>>
>>>>> - Tim
>>>>>
>>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> wrote:
>>>>>>
>>>>>> For a given query, the logic for determining the memory that will be required for admission is:
>>>>>> - if the query has a mem_limit, use that
>>>>>> - otherwise, use the memory estimates from the planner
>>>>>>
>>>>>> A query may be assigned a mem_limit by:
>>>>>> - taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
>>>>>> - manually setting one for the query (in case you want to override the pool default for a single query)
>>>>>>
>>>>>> In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes. Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
>>>>>>
>>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>>>>
>>>>>>> Again, the 8TB estimate would not be relevant if the query had a mem_limit set.
>>>>>>> I think everything we discussed is covered in the docs, but if you feel that specific parts need clarification, please file a jira.
>>>>>>>
>>>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>>>> > Sorry for asking so many questions, but your answers are closing gaps that I cannot find addressed in the documentation.
>>>>>>> >
>>>>>>> > So how can we explain that there was an estimate of 8TB per node and Impala still decided to admit this query?
>>>>>>> >
>>>>>>> > My goal is for every query that runs beyond the actual limit per node to fail (and this is what I set as the default memory per node per pool), and I want all other queries to be queued and not killed. As I understand it, that means I need to set the max queued queries to unlimited and the queue timeout to hours.
>>>>>>> >
>>>>>>> > And to reach that, I need to set the default memory per node for each pool, plus either the max concurrency or the max memory per pool, which determines how many queries can run concurrently in a specific pool.
>>>>>>> >
>>>>>>> > I think reaching this goal will close all my gaps.
>>>>>>> >
>>>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >> > Whether a query is queued or not is based on the prediction, which is based on the estimate, and of course on the concurrency allowed in the pool.
>>>>>>> >>
>>>>>>> >> Yes, it is.
>>>>>>> >>
>>>>>>> >> > If I have a memory limit per pool and a memory limit per node for a pool, so it can be used to estimate the number of queries that can run concurrently, is this also based on the prediction and not on actual use?
>>>>>>> >>
>>>>>>> >> Also on the prediction.
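
[To illustrate the per-query override discussed above, a minimal impala-shell sketch; the table name "sales" and the 2 GB value are made up for illustration, and the exact wording of the estimate line in the EXPLAIN header varies between Impala versions:]

    -- Override the pool's default mem_limit for this session
    -- (illustrative value; applies per node, per query).
    SET MEM_LIMIT=2gb;

    -- With mem_limit set, admission control uses it instead of the planner
    -- estimate; the per-host estimate still appears near the top of the
    -- EXPLAIN output, but it is informational only.
    EXPLAIN SELECT COUNT(*) FROM sales;

    -- Reset the session option (0 = no explicit per-query limit).
    SET MEM_LIMIT=0;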

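[For the pool-level knobs discussed in the thread (max running queries, max queued queries, a default per-query mem_limit, and the queue timeout), a rough sketch of what the configuration could look like on a non-CM-managed cluster follows, using a hypothetical pool root.reports and illustrative values. The property names follow the Impala admission control documentation, but verify them against the docs for your exact version; with Cloudera Manager the same settings are exposed through the Dynamic Resource Pools pages. The pool's aggregate memory goes in fair-scheduler.xml, the rest in llama-site.xml:]

    <!-- fair-scheduler.xml: aggregate memory across the cluster for the pool -->
    <allocations>
      <queue name="root.reports">
        <maxResources>80000 mb, 0 vcores</maxResources>
      </queue>
    </allocations>

    <!-- llama-site.xml: per-pool admission control settings (values illustrative) -->
    <configuration>
      <!-- maximum number of queries running concurrently in the pool -->
      <property>
        <name>llama.am.throttling.maximum.placed.reservations.root.reports</name>
        <value>10</value>
      </property>
      <!-- maximum number of queries waiting in the queue -->
      <property>
        <name>llama.am.throttling.maximum.queued.reservations.root.reports</name>
        <value>30</value>
      </property>
      <!-- default per-query, per-node mem_limit for queries in this pool -->
      <property>
        <name>impala.admission-control.pool-default-query-options.root.reports</name>
        <value>mem_limit=4g</value>
      </property>
      <!-- queue timeout: 20 minutes, expressed in milliseconds -->
      <property>
        <name>impala.admission-control.pool-queue-timeout-ms.root.reports</name>
        <value>1200000</value>
      </property>
    </configuration>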