Hi Matthew,
Thanks for your reply. My point is that, per the admission control documentation,
most Impala limits are "soft limits". Are the two settings you mentioned also
soft limits? A soft limit means a pool can exceed its memory or concurrency
limit at moments when Impala is not aware of it, and that affects the other
pools at that moment.
Thanks.
Songbo
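
To make the "soft limit" concern concrete, here is a minimal toy model of over-admission. It is not Impala code: the Coordinator class, the simulate function, and all numbers are hypothetical. It only assumes that each coordinator decides admission from its own, possibly stale, view of pool usage between state updates.

```python
from dataclasses import dataclass


@dataclass
class Coordinator:
    """Toy coordinator that admits queries using only its local view (hypothetical)."""
    name: str
    known_running: int = 0  # locally known count of running queries in the pool

    def try_admit(self, max_running: int) -> bool:
        # The decision uses the local view; other coordinators' admissions
        # are not visible until the next state update (not modelled here).
        if self.known_running < max_running:
            self.known_running += 1
            return True
        return False


def simulate(num_coordinators: int, max_running: int, requests_each: int) -> int:
    """Return how many queries are admitted pool-wide between two state updates."""
    coordinators = [Coordinator(f"c{i}") for i in range(num_coordinators)]
    admitted = 0
    for coordinator in coordinators:
        for _ in range(requests_each):
            if coordinator.try_admit(max_running):
                admitted += 1
    return admitted


if __name__ == "__main__":
    print(simulate(1, 4, 10))  # one coordinator: the limit of 4 holds
    print(simulate(3, 4, 10))  # three coordinators: each admits up to 4
```

In this model a single coordinator never exceeds the limit, while several coordinators can each admit up to the limit before they learn about one another's decisions.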
-----Original Message-----
From: Matthew Jacobs [mailto:[email protected]]
Sent: July 19, 2016 0:44
To: [email protected]
Subject: Re: Impala user resource isolation best practice
By the way, some of the controls I mentioned were added in Impala 2.5, so you
should consider upgrading if you're not already using a newer version of Impala.
Thanks,
Matt
On Mon, Jul 18, 2016 at 9:20 AM, Matthew Jacobs <[email protected]> wrote:
> Hi Songbo,
>
> Right now the best you can do is to use admission control with:
> (a) a single coordinator to avoid the possibility of over-admitting by
> different coordinators
> (b) setting default query mem limits so that individual queries are
> limited
>
> For your scenario, I'd recommend setting up 2 pools, one for user A
> and a second for user B. Set the max number of running queries for
> user A to something reasonable for the concurrency for that workload.
> Set the max memory for the user B pool to the portion of cluster
> memory you're willing to give to those queries. (Notice that the pool with
> the small queries has the max number of running queries set and the
> pool with the fewer but larger queries has the max memory set. That
> is intentional: the former is faster for admission but doesn't
> limit based on memory.) How well this will work depends on how well
> you can pick good numbers for these settings, which can be difficult
> and requires studying your workload.
>
> This isn't perfect resource isolation because rogue queries can still
> consume too much CPU or other resources, but it's the best you'll be
> able to do right now. In the future we will have better tools to make
> this easier.
>
> Best,
> Matt
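
The two-pool setup described above can be sketched as a toy admission check. This is only an illustration under stated assumptions, not Impala's implementation: the Pool class, its field names, and every number below are hypothetical. One pool is capped by the number of running queries, the other by total memory, and each query is charged a default per-query memory limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Toy resource pool (hypothetical names, not Impala's API)."""
    name: str
    max_running: Optional[int] = None   # cap on concurrent queries, None = no cap
    max_mem_gb: Optional[int] = None    # cap on total reserved memory, None = no cap
    default_query_mem_gb: int = 0       # memory charged to every admitted query
    running: int = 0
    reserved_gb: int = 0

    def admit(self) -> bool:
        # Concurrency check: cheap, but says nothing about memory use.
        if self.max_running is not None and self.running >= self.max_running:
            return False
        # Memory check: relies on the per-query limit being a sensible estimate.
        if (self.max_mem_gb is not None
                and self.reserved_gb + self.default_query_mem_gb > self.max_mem_gb):
            return False
        self.running += 1
        self.reserved_gb += self.default_query_mem_gb
        return True

    def release(self) -> None:
        # Called when a query finishes; frees its slot and its reservation.
        self.running -= 1
        self.reserved_gb -= self.default_query_mem_gb


# Illustrative numbers only; real values require studying the workload.
pool_a = Pool("user_a", max_running=20, default_query_mem_gb=1)
pool_b = Pool("user_b", max_mem_gb=40, default_query_mem_gb=10)

admitted_a = sum(pool_a.admit() for _ in range(25))
admitted_b = sum(pool_b.admit() for _ in range(5))
```

With these made-up numbers, pool A stops at its concurrency cap and pool B stops once the next query's default memory limit would no longer fit under the pool's memory cap.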
>
> On Mon, Jul 18, 2016 at 2:59 AM, 廖松博 <[email protected]> wrote:
>> Hello guys,
>>
>>
>>
>> My company is using Cloudera Impala as our basic
>> infrastructure for online data analysis. The most difficult problems we
>> have met are resource isolation and instability.
>>
>> In our experience with Impala, a big query that consumes
>> a vast amount of memory can crash the impalad process (acting as a worker,
>> not as the coordinator, right?).
>>
>> In our simplest scenario, user A is a very important customer whose
>> queries are relatively small, and user B is an unimportant user who may
>> issue very large SQL queries to Impala. It is unacceptable for a big query
>> from user B to crash the impalad process and affect the experience
>> of user A. So resource isolation is the point.
>>
>> But per the Impala documentation at
>> http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_admission.html ,
>> Impala resource isolation relies on soft limits, which cannot
>> strictly prevent queries from user B from affecting user A.
>>
>> As far as I know, Llama (running Impala with YARN) is not recommended. We
>> actually tried it but were disappointed with its performance and accuracy.
>>
>> Is there any best practice for user resource isolation, so that
>> different users will not affect each other?
>>
>> Thanks.
>>
>>
>>
>> Best Regards,
>>
>> Songbo