This looks like a very different question from the original one on this thread; it would be better to start a new thread for a new question. Keep in mind that you are likely to get quicker answers (from yourself) by checking the behaviour against the documentation. If there is a bug (which sounds possible), it may already have been reported; searching issues.apache.org will tell you, along with the fix version, if any.
HTH

On 4 March 2018 at 21:35, Fawze Abujaber <[email protected]> wrote:
> Hi Mostafa,
>
> Is this expected behavior or a bug?
>
> On Wed, 28 Feb 2018 at 20:29 Fawze Abujaber <[email protected]> wrote:
>>
>> Hi Mostafa,
>>
>> I already rolled back the version, so I don't know how to get the settings, and I'm not sure I can get the query profiles for finished queries in the rolled-back version.
>>
>> But as an example, after the upgrade we started to see the following error, which disappeared after the rollback: IS NOT NULL predicate does not support complex types
>>
>> IllegalStateException: org.apache.impala.common.AnalysisException: IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>> CAUSED BY: AnalysisException: IS NOT NULL predicate does not support complex types: participants IS NOT NULL
>>
>> On Wed, Feb 28, 2018 at 7:56 PM, Mostafa Mokhtar <[email protected]> wrote:
>>>
>>> Can you please share the query profiles for the failures you got, along with the admission control settings?
>>>
>>> Thanks
>>> Mostafa
>>>
>>> On Feb 28, 2018, at 9:28 AM, Fawze Abujaber <[email protected]> wrote:
>>>
>>> Thank you all for your help and advice.
>>>
>>> Unfortunately I rolled back the upgrade until I understand how to control Impala resources and can tackle all the failures that I started to see after the upgrade.
>>>
>>> On Fri, Feb 23, 2018 at 8:22 PM, Fawze Abujaber <[email protected]> wrote:
>>>>
>>>> Hi Tim,
>>>>
>>>> My goal is: queries whose actual memory per node exceeds what I set as the default max memory per node should fail. The pools hold a mix of queries; in the same pool some business queries can be as simple as SELECT COUNT(*) while others have a few joins.
>>>>
>>>> I think this is the right behaviour, and such a query should be optimized.
>>>>
>>>> Also, looking at my historical queries, I can tell from the max memory used per node which queries will fail, and that helps me a lot; but I need every other query to be queued if its actual memory use stays below the default max memory per node that I set for a query.
>>>>
>>>> Based on the above, I'm looking for the parameters I need to configure.
>>>>
>>>> I don't mind how long or how many queries get queued; in my case no Impala query runs beyond 4-5 minutes, and 80% of queries finish in under a minute.
>>>>
>>>> So I don't mind setting the queue timeout to 20 minutes and the max queued to 20-30 queries per pool.
>>>>
>>>> I want to make sure no query will fail as long as it does not exceed the default memory per node that I set.
>>>>
>>>> Should I use only the default max memory per node on its own? Or should I combine it with max running queries, or with the memory limit of the whole pool?
>>>>
>>>> On Fri, Feb 23, 2018 at 8:08 PM, Tim Armstrong <[email protected]> wrote:
>>>>>
>>>>> I think the previous answers have been good. I wanted to add a couple of side notes for context, since I've been doing a lot of work in this area of Impala. I could talk about this stuff for hours.
>>>>>
>>>>> We do have mechanisms, like spilling data to disk or reducing the number of threads, that kick in to keep queries under the mem_limit. This has existed in some form since Impala 2.0, but Impala 2.10 included some architectural changes to make this more robust, and we have further improvements in the pipeline.
>>>>>
>>>>> The end goal, which we're getting much closer to, is that queries should reliably run to completion instead of getting killed after they are admitted.
>>>>>
>>>>> That support is going to enable future enhancements to memory-based admission control to make it easier for cluster admins like yourself to configure admission control. It is definitely tricky to pick a good value for mem_limit when pools can contain a mix of queries, and I think Impala can do better at making these decisions automatically.
>>>>>
>>>>> - Tim
>>>>>
>>>>> On Fri, Feb 23, 2018 at 9:05 AM, Alexander Behm <[email protected]> wrote:
>>>>>>
>>>>>> For a given query, the logic for determining the memory that will be required for admission is:
>>>>>> - if the query has a mem_limit, use that
>>>>>> - otherwise, use the memory estimates from the planner
>>>>>>
>>>>>> A query may be assigned a mem_limit by:
>>>>>> - taking the default mem_limit from the pool it was submitted to (this is the recommended practice)
>>>>>> - manually setting one for the query (in case you want to override the pool default for a single query)
>>>>>>
>>>>>> In that setup, the memory estimates from the planner are irrelevant for admission decisions and serve only informational purposes. Please do not read too much into the memory estimates from the planner. They can be totally wrong (like your 8TB example).
>>>>>>
>>>>>> On Fri, Feb 23, 2018 at 3:47 AM, Jeszy <[email protected]> wrote:
>>>>>>>
>>>>>>> Again, the 8TB estimate would not be relevant if the query had a mem_limit set.
>>>>>>> I think everything we discussed is covered in the docs, but if you feel that specific parts need clarification, please file a jira.
>>>>>>>
>>>>>>> On 23 February 2018 at 11:51, Fawze Abujaber <[email protected]> wrote:
>>>>>>> > Sorry for asking so many questions, but your answers are closing gaps that I cannot find addressed in the documentation.
>>>>>>> >
>>>>>>> > So how can we explain that there was an estimate of 8TB per node and Impala still decided to admit this query?
>>>>>>> >
>>>>>>> > My goal is for every query that runs beyond the actual limit per node to fail (and this is what I set as the default memory per node per pool), and I want all other queries to be queued and not killed. As I understand it, that means I need to set the max queued queries to unlimited and the queue timeout to hours.
>>>>>>> >
>>>>>>> > And to reach that, I need to set the default memory per node for each pool, plus either the max concurrency or the max memory per pool, which determines how many queries can run concurrently in a specific pool.
>>>>>>> >
>>>>>>> > I think reaching this goal will close all my gaps.
>>>>>>> >
>>>>>>> > On Fri, Feb 23, 2018 at 11:49 AM, Jeszy <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >> > Whether a query is queued or not is based on the prediction, which is based on the estimate, and of course on the concurrency allowed in the pool.
>>>>>>> >>
>>>>>>> >> Yes, it is.
>>>>>>> >>
>>>>>>> >> > If I have a memory limit per pool and a memory limit per node for a pool, so it can be used to estimate the number of queries that can run concurrently, is this also based on the prediction and not on actual use?
>>>>>>> >>
>>>>>>> >> Also on the prediction.
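
[To illustrate the per-query override discussed above, a minimal impala-shell sketch; the table name "sales" and the 2 GB value are made up for illustration, and the exact wording of the estimate line in the EXPLAIN header varies between Impala versions:]

    -- Override the pool's default mem_limit for this session
    -- (illustrative value; applies per node, per query).
    SET MEM_LIMIT=2gb;

    -- With mem_limit set, admission control uses it instead of the planner
    -- estimate; the per-host estimate still appears near the top of the
    -- EXPLAIN output, but it is informational only.
    EXPLAIN SELECT COUNT(*) FROM sales;

    -- Reset the session option (0 = no explicit per-query limit).
    SET MEM_LIMIT=0;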

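[For the pool-level knobs discussed in the thread (max running queries, max queued queries, a default per-query mem_limit, and the queue timeout), a rough sketch of what the configuration could look like on a non-CM-managed cluster follows, using a hypothetical pool root.reports and illustrative values. The property names follow the Impala admission control documentation, but verify them against the docs for your exact version; with Cloudera Manager the same settings are exposed through the Dynamic Resource Pools pages. The pool's aggregate memory goes in fair-scheduler.xml, the rest in llama-site.xml:]

    <!-- fair-scheduler.xml: aggregate memory across the cluster for the pool -->
    <allocations>
      <queue name="root.reports">
        <maxResources>80000 mb, 0 vcores</maxResources>
      </queue>
    </allocations>

    <!-- llama-site.xml: per-pool admission control settings (values illustrative) -->
    <configuration>
      <!-- maximum number of queries running concurrently in the pool -->
      <property>
        <name>llama.am.throttling.maximum.placed.reservations.root.reports</name>
        <value>10</value>
      </property>
      <!-- maximum number of queries waiting in the queue -->
      <property>
        <name>llama.am.throttling.maximum.queued.reservations.root.reports</name>
        <value>30</value>
      </property>
      <!-- default per-query, per-node mem_limit for queries in this pool -->
      <property>
        <name>impala.admission-control.pool-default-query-options.root.reports</name>
        <value>mem_limit=4g</value>
      </property>
      <!-- queue timeout: 20 minutes, expressed in milliseconds -->
      <property>
        <name>impala.admission-control.pool-queue-timeout-ms.root.reports</name>
        <value>1200000</value>
      </property>
    </configuration>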