Hi Mostafa, Is this expected behavior or a BUG?
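For reference, below is a sketch of the kind of query that hits this error and one possible rewrite. This is only an illustration: Impala does not allow `IS NOT NULL` directly on complex (ARRAY/MAP/STRUCT) columns, but joining the table against its own nested collection keeps only rows whose collection is non-empty. The column name `participants` comes from the error message in the thread; the table name `my_table` and column `id` are made up for the example.

```sql
-- Fails with:
--   AnalysisException: IS NOT NULL predicate does not support complex types:
--   participants IS NOT NULL
-- SELECT id FROM my_table WHERE participants IS NOT NULL;

-- Possible rewrite: join the table with its own nested collection, which
-- drops rows whose participants array is NULL or empty. DISTINCT collapses
-- the one-row-per-array-element duplication the join introduces.
SELECT DISTINCT t.id
FROM my_table t, t.participants;
```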
On Wed, 28 Feb 2018 at 20:29 Fawze Abujaber <[email protected]> wrote:

> Hi Mostafa,
>
> I already rolled back the version, so I don't know how to get the settings,
> or whether I can get the query profile for finished queries in the
> rolled-back version.
>
> But, for example, after the upgrade we started to see the following error,
> which stopped appearing after the rollback: "IS NOT NULL predicate does not
> support complex types"
>
> - IllegalStateException: org.apache.impala.common.AnalysisException:
>   IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>   CAUSED BY: AnalysisException: IS NOT NULL predicate does not support
>   complex types: participants IS NOT NULL
>
> On Wed, Feb 28, 2018 at 7:56 PM, Mostafa Mokhtar <[email protected]> wrote:
>
>> Can you please share the query profiles for the failures you got, along
>> with the admission control settings?
>>
>> Thanks
>> Mostafa
>>
>> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
>>
>> Thank you all for your help and advice.
>>
>> Unfortunately I rolled back the upgrade until I understand how to control
>> Impala resources and tackle all the failures that I started to see after
>> the upgrade.
>>
>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>>
>>> Hi Tim,
>>>
>>> My goal is: queries whose actual memory per node exceeds what I set as
>>> the default max memory per node should fail, even though I have a mix of
>>> queries in the pool; in the same pool, some business queries can be as
>>> simple as SELECT COUNT(*) and others can have a few joins.
>>>
>>> I think this is the right decision, and such a query should be optimized.
>>>
>>> Also, looking at my historical queries, I can tell from the max memory
>>> used per node which queries will fail, and that helps me a lot. But I
>>> need any other query to be queued if its actual memory is lower than the
>>> default max memory per node I set for a query.
>>>
>>> Based on the above, I'm looking for the parameters I need to configure.
>>>
>>> I don't mind how long or how many queries are queued; in my case I don't
>>> have any Impala query running beyond 4-5 minutes, and 80% of queries
>>> finish in under 1 minute.
>>>
>>> So I don't mind setting the queue timeout to 20 minutes and the max
>>> queued to 20-30 queries per pool.
>>>
>>> I want to make sure no query fails if it does not exceed the default
>>> memory per node that I set.
>>>
>>> Should I use the default max memory per node alone? Or should I combine
>>> it with max running queries, or with the memory limit of the whole pool?
>>>
>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> wrote:
>>>
>>>> I think the previous answers have been good. I wanted to add a couple
>>>> of side notes for context, since I've been doing a lot of work in this
>>>> area of Impala. I could talk about this stuff for hours.
>>>>
>>>> We do have mechanisms, like spilling data to disk or reducing the
>>>> number of threads, that kick in to keep queries under the mem_limit.
>>>> This has existed in some form since Impala 2.0, but Impala 2.10
>>>> included some architectural changes to make this more robust, and we
>>>> have further improvements in the pipeline. The end goal, which we're
>>>> getting much closer to, is that queries should reliably run to
>>>> completion instead of getting killed after they are admitted.
>>>>
>>>> That support is going to enable future enhancements to memory-based
>>>> admission control that make it easier for cluster admins like yourself
>>>> to configure admission control.
>>>> It is definitely tricky to pick a good value for mem_limit when pools
>>>> can contain a mix of queries, and I think Impala can do better at
>>>> making these decisions automatically.
>>>>
>>>> - Tim
>>>>
>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> wrote:
>>>>
>>>>> For a given query, the logic for determining the memory that will be
>>>>> required for admission is:
>>>>> - if the query has a mem_limit, use that
>>>>> - otherwise, use the memory estimates from the planner
>>>>>
>>>>> A query may be assigned a mem_limit by:
>>>>> - taking the default mem_limit from the pool it was submitted to
>>>>> (this is the recommended practice)
>>>>> - manually setting one for the query (in case you want to override
>>>>> the pool default for a single query)
>>>>>
>>>>> In that setup, the memory estimates from the planner are irrelevant
>>>>> for admission decisions and serve only informational purposes.
>>>>> Please do not read too much into the memory estimates from the
>>>>> planner. They can be totally wrong (like your 8TB example).
>>>>>
>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>>
>>>>>> Again, the 8TB estimate would not be relevant if the query had a
>>>>>> mem_limit set.
>>>>>> I think everything we discussed is covered in the docs, but if you
>>>>>> feel specific parts need clarification, please file a JIRA.
>>>>>>
>>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>>> > Sorry for asking so many questions, but your answers are closing
>>>>>> > gaps that I cannot find covered in the documentation.
>>>>>> >
>>>>>> > So how can we explain that there was an estimate of 8TB per node
>>>>>> > and Impala still decided to admit this query?
>>>>>> >
>>>>>> > My goal is that any query running beyond the actual limit per node
>>>>>> > should fail (and this is what I set as the default memory per node
>>>>>> > per pool), and I want all other queries to be queued, not killed.
>>>>>> > So, as I understand it, I need to set the max queued queries to
>>>>>> > unlimited and the queue timeout to hours.
>>>>>> >
>>>>>> > And to reach that, I need to set the default memory per node for
>>>>>> > each pool, and set either the max concurrency or the max memory
>>>>>> > per pool, which will determine the max number of concurrent
>>>>>> > queries that can run in a specific pool.
>>>>>> >
>>>>>> > I think reaching this goal will close all my gaps.
>>>>>> >
>>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>>> >>
>>>>>> >> > Whether a query is queued or not is based on the prediction,
>>>>>> >> > which is based on the estimate, and of course the concurrency
>>>>>> >> > that can run in a pool.
>>>>>> >>
>>>>>> >> Yes, it is.
>>>>>> >>
>>>>>> >> > If I have a memory limit per pool and a memory limit per node
>>>>>> >> > for a pool, that can be used to estimate the number of queries
>>>>>> >> > that can run concurrently. Is this also based on the prediction
>>>>>> >> > and not the actual use?
>>>>>> >>
>>>>>> >> Also on prediction.
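The pool-sizing arithmetic implied by the thread (the pool's aggregate memory limit divided by what each query is charged cluster-wide caps concurrency) can be sketched as below. The numbers are illustrative assumptions, not recommendations, and this ignores details like single-node queries or partial host placement.

```python
GB = 1024 ** 3

def max_concurrent_queries(pool_mem_limit, per_node_mem_limit, num_hosts):
    """How many queries admission control can run at once when each query
    is charged its full per-node mem_limit on every host it runs on (the
    pool's memory limit is an aggregate across the cluster)."""
    per_query_cluster_mem = per_node_mem_limit * num_hosts
    return pool_mem_limit // per_query_cluster_mem

# e.g. a 100 GB pool, a 2 GB per-node default mem_limit, 10 hosts:
# each admitted query reserves 20 GB cluster-wide, so 5 run concurrently;
# further queries queue (up to max queued) until the queue timeout expires.
assert max_concurrent_queries(100 * GB, 2 * GB, 10) == 5
```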
