Hello guys,
My company uses Cloudera Impala as the core infrastructure for online data
analysis. The hardest problems we have run into are resource isolation and
instability.
In our experience with Impala, a big query that consumes a vast amount of
memory can crash the impalad process (the one acting as a worker rather than
the coordinator, right?).
In our simplest scenario, user A is a very important customer whose queries
are relatively small, and user B is an unimportant user who may issue very
large SQL queries to Impala. It is unacceptable for a big query from user B to
crash the impalad process and degrade the experience of user A. So resource
isolation is the key point.
But according to the Impala documentation:
http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_admission.html
Impala resource isolation is only a soft limit and cannot strictly prevent a
query from user B from affecting user A.
As far as I know, Llama (running Impala with YARN) is not recommended. We
actually tried it, but were disappointed by its performance and accuracy.
Is there any best practice for resource isolation between users, so that
different users do not affect each other?
Thanks.
Best Regards,
Songbo