Re: Estimate peak memory VS used peak memory

2018-03-04 Thread Jeszy
This looks like a very different question from the original one on this
thread; it would be better to start a new thread for it. Keep in mind
that you are likely to get answers more quickly (from yourself) by
checking the behaviour against the documentation. If there is a bug
(which sounds possible), it may already have been found; a search on
issues.apache.org will tell you, along with the fix version (if any).

HTH


Re: Estimate peak memory VS used peak memory

2018-03-04 Thread Fawze Abujaber
Hi Mostafa,

Is this expected behavior or a BUG?

On Wed, 28 Feb 2018 at 20:29 Fawze Abujaber  wrote:

> Hi Mostafa,
>
> I already rolled back the version, so I don't know how to get the
> settings, or whether I can get query profiles for finished queries on
> the rolled-back version.
>
> But, for example, after the upgrade we started to see the following
> error, which disappeared after the rollback: IS NOT NULL predicate does
> not support complex types
>
>
> - IllegalStateException: org.apache.impala.common.AnalysisException:
>   IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>   CAUSED BY: AnalysisException: IS NOT NULL predicate does not support
>   complex types: participants IS NOT NULL
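
For reference, a minimal sketch of the query shape that hits this error,
plus a common workaround that joins the nested collection instead. The
table name events and column id are hypothetical; participants is assumed
to be an ARRAY column in a Parquet table:

    -- Fails: IS NOT NULL is not supported on complex-typed columns
    -- (ARRAY/MAP/STRUCT).
    SELECT id
    FROM events
    WHERE participants IS NOT NULL;

    -- Workaround sketch: an inner join against the collection keeps only
    -- rows whose participants array is non-NULL and non-empty.
    SELECT DISTINCT t.id
    FROM events t, t.participants p;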
>
>
>
> On Wed, Feb 28, 2018 at 7:56 PM, Mostafa Mokhtar 
> wrote:
>
>> Can you please share the query profiles for the failures you got along
>> with the admission control setting?
>>
>> Thanks
>> Mostafa
>>
>> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber  wrote:
>>
>> Thank you all for your help and advice.
>>
>> Unfortunately I rolled back the upgrade until I understand how to
>> control Impala resources and can tackle all the failures that I started
>> to see after the upgrade.
>>
>>
>>
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber 
>> wrote:
>>
>>> Hi Tim,
>>>
>>> My goal is for queries whose actual memory per node exceeds what I set
>>> as the default max memory per node to fail, even though the pool holds
>>> a mix of queries: in the same pool, some business queries can be as
>>> simple as select count(*) while others have a few joins.
>>>
>>> I think this is the right decision, and such queries should be
>>> optimized.
>>>
>>> Also, looking at my historical queries, I can tell from the max used
>>> memory per node which queries will fail, which helps me a lot; but I
>>> need any other query to be queued if the actual memory it asks for is
>>> lower than what I set as the default max memory per node for a query.
>>>
>>> Based on the above, I'm looking for the parameters I need to
>>> configure.
>>>
>>> I don't mind how long or how many queries get queued; in my case no
>>> Impala query runs beyond 4-5 minutes, and 80% of queries finish in
>>> under 1 minute.
>>>
>>> So I don't mind setting the queue timeout to 20 minutes and max queued
>>> to 20-30 queries per pool.
>>>
>>> I want to make sure no query will fail if it does not exceed the
>>> default memory per node that I set.
>>>
>>> Should I use only the default max memory per node? Or should I combine
>>> it with max running queries, or with the memory limit of the whole
>>> pool?
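
For context, a sketch of the admission-control knobs in question, written
as impalad startup flags for the default pool (per-pool equivalents are
set in the pool configuration, e.g. through Cloudera Manager). The flag
names are from the Impala admission-control docs; the values are
illustrative, taken from the numbers in this thread:

    # Aggregate memory cap for queries admitted to the pool.
    -default_pool_mem_limit=100g
    # Max queries running concurrently in the pool (-1 = unlimited).
    -default_pool_max_requests=10
    # Max queries allowed to wait in the queue; beyond this, reject.
    -default_pool_max_queued=30
    # How long a query may wait in the queue before failing: 20 minutes.
    -queue_wait_timeout_ms=1200000

The "default max memory per node for a query" discussed here is the
pool's default MEM_LIMIT query option, which is set per pool rather than
by a startup flag.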
>>>
>>>
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong 
>>> wrote:
>>>
 I think the previous answers have been good. I wanted to add a couple
 of side notes for context since I've been doing a lot of work in this area
 of Impala. I could talk about this stuff for hours.

 We do have mechanisms, like spilling data to disk or reducing # of
 threads, that kick in to keep queries under the mem_limit. This has existed
 in some form since Impala 2.0, but Impala 2.10 included some architectural
 changes to make this more robust, and we have further improvements in the
 pipeline. The end goal, which we're getting much closer to, is that queries
 should reliably run to completion instead of getting killed after they are
 admitted.

 That support is going to enable future enhancements to memory-based
 admission control to make it easier for cluster admins like yourself to
 configure admission control. It is definitely tricky to pick a good value
 for mem_limit when pools can contain a mix of queries and I think Impala
 can do better at making these decisions automatically.

 - Tim

 On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm wrote:

> For a given query the logic for determining the memory that will be
> required from admission is:
> - if the query has mem_limit use that
> - otherwise, use memory estimates from the planner
>
> A query may be assigned a mem_limit by:
> - taking the default mem_limit from the pool it was submitted to (this
> is the recommended practice)
> - manually setting one for the query (in case you want to override the
> pool default for a single query)
>
> In that setup, the memory estimates from the planner are irrelevant
> for admission decisions and only serve for informational purposes.
> Please do not read too much into the memory estimates from the
> planner. They can be totally wrong (like your 8TB example).
>
>
> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy  wrote:
>
>> Again, the 8TB estimate would not be relevant if the query had a
>> mem_limit set.
>> I think all that we discussed is covered in the docs, but if you feel
>> like specific parts need clarification, please file a jira.
>>
>>