Hi Matthew,
        Thanks for your reply. My concern is that, per the admission control 
documentation, most Impala limits are "soft limits". Are the 2 settings you 
mentioned also soft limits? As I understand it, a soft limit means a pool can 
exceed its memory/concurrency limit at moments when Impala is not aware of 
it, and during that time it affects the other pools. 
        Thanks.

Songbo

-----Original Message-----
From: Matthew Jacobs [mailto:[email protected]] 
Sent: July 19, 2016 0:44
To: [email protected]
Subject: Re: Impala user resource isolation best practice

By the way, some of the controls I mentioned were added in Impala 2.5, so you 
should consider upgrading if you're not already using a newer version of Impala.

Thanks,
Matt

On Mon, Jul 18, 2016 at 9:20 AM, Matthew Jacobs <[email protected]> wrote:
> Hi Songbo,
>
> Right now the best you can do is to use admission control with:
> (a) a single coordinator to avoid the possibility of over-admitting by 
> different coordinators
> (b) setting default query mem limits so that individual queries are 
> limited
>
> For your scenario, I'd recommend setting up 2 pools, one for user A 
> and a second for user B. Set the max number of running queries for 
> user A to something reasonable for the concurrency for that workload.
> Set the max memory for the user B pool to the portion of cluster 
> memory you're willing to give to those queries. (Notice the pool with 
> the small queries has the max number of running queries set and the 
> pool with the fewer but larger queries has the max memory set -- 
> that is intentional: the former is faster for admission but doesn't 
> limit based on memory.) How well this will work depends on how well 
> you can pick good numbers for these settings, which can be difficult 
> and requires studying your workload.
>
> This isn't perfect resource isolation because rogue queries can still 
> consume too much CPU or other resources, but it's the best you'll be 
> able to do right now. In the future we will have better tools to make 
> this easier.
>
> Best,
> Matt
>
> On Mon, Jul 18, 2016 at 2:59 AM, 廖松博 <[email protected]> wrote:
>> Hello guys,
>>
>>
>>
>>        My company uses Cloudera Impala as our basic infrastructure 
>> for online data analysis. The most difficult problems we have met 
>> are resource isolation and instability.
>>
>> In our experience with Impala, a big query that consumes a vast 
>> amount of memory can crash the impalad process (acting as a worker 
>> rather than the coordinator, right?).
>>
>> In our simplest scenario, user A is a very important customer whose 
>> queries are relatively small, and user B is an unimportant user who 
>> may issue very large SQL queries to Impala. It is unacceptable for a 
>> big query from user B to crash the impalad process and affect the 
>> user experience of user A. So resource isolation is the key point.
>>
>> But per the Impala documentation:
>> http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_admission.html
>> Impala resource isolation is a soft limit and cannot strictly 
>> prevent queries from user B from affecting user A.
>>
>> As far as I know, Llama (running Impala with YARN) is not 
>> recommended; we actually tried it but were disappointed by its 
>> performance and accuracy.
>>
>>        Is there any best practice for user resource isolation, so 
>> that different users will not affect each other?
>>
>>        Thanks.
>>
>>
>>
>> Best Regards,
>>
>> Songbo
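
The two-pool setup and the "soft limit" concern discussed in this thread can be illustrated with a toy model. This is only a minimal sketch in Python, not Impala's actual admission controller: the Pool class, both helper functions, and every number below are hypothetical and chosen purely for illustration. It models one pool capped by running-query count, one pool capped by memory, and what can happen when several coordinators decide against the same stale view of a pool.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Pool:
    """Toy admission-control pool (hypothetical, not Impala's implementation)."""
    name: str
    max_running: Optional[int] = None  # cap on concurrent queries, None = no cap
    max_mem_mb: Optional[int] = None   # cap on admitted memory, None = no cap
    running: int = 0
    mem_mb: int = 0

    def can_admit(self, query_mem_mb: int) -> bool:
        # Reject if the concurrency cap is already reached.
        if self.max_running is not None and self.running >= self.max_running:
            return False
        # Reject if admitting this query would exceed the memory cap.
        if self.max_mem_mb is not None and self.mem_mb + query_mem_mb > self.max_mem_mb:
            return False
        return True

    def admit(self, query_mem_mb: int) -> None:
        self.running += 1
        self.mem_mb += query_mem_mb


def admit_with_stale_views(pool: Pool, coordinators: int, query_mem_mb: int) -> int:
    """Each coordinator checks the same snapshot taken before anyone admits."""
    snapshot = replace(pool)  # copy of the pool state; never updated below
    admitted = 0
    for _ in range(coordinators):
        if snapshot.can_admit(query_mem_mb):
            pool.admit(query_mem_mb)  # real usage grows, the snapshot does not
            admitted += 1
    return admitted


def admit_with_single_coordinator(pool: Pool, queries: int, query_mem_mb: int) -> int:
    """One coordinator sees its own earlier admissions before each decision."""
    admitted = 0
    for _ in range(queries):
        if pool.can_admit(query_mem_mb):
            pool.admit(query_mem_mb)
            admitted += 1
    return admitted


# Hypothetical pools: user A limited by query count, user B limited by memory.
pool_a = Pool("user_a", max_running=20, running=20)
pool_b = Pool("user_b", max_mem_mb=10_000, running=4, mem_mb=8_000)

print("user A can admit:", pool_a.can_admit(100))
print("stale views admitted:", admit_with_stale_views(pool_b, 3, 2_000))
print("user B memory now:", pool_b.mem_mb, "of", pool_b.max_mem_mb)
```

In this model a single coordinator never works from a stale view, which mirrors the reasoning behind point (a) in Matt's reply. The model says nothing about memory a query actually consumes after admission.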
