Clarification on query execution

Vibhath Ileperuma Tue, 04 Jan 2022 05:24:56 -0800

Hi All,

We have a distributed Impala cluster with 6 Impala executors and a Kudu
table with 3x replication. The table is hash partitioned on 3 columns
(say HC1, HC2, HC3), with 2 hash partitions for each column. It is also
range partitioned on another column (RC1).
Ex:
[image: image.png]


We are executing the following SQL query on this table.
[image: image.png]

When checking the profile of this query in the Impala web UI, we noticed
that only two executors are used to execute it.

   1. Why doesn't Impala use all 6 executors for execution? Since we have
   given conditions for HC1 and HC3 in the WHERE clause, does Impala use only
   two executors because it only needs to scan the two partitions of HC2? Is
   there a way to distribute the query across all 6 executors?
   2. When executing multiple queries of this type in parallel, we noticed
   that all of them use the same two executors, causing high memory usage. To
   avoid this, I set the 'replica_preference' query option to 'remote', and
   the queries were then dispatched to different pairs of executors. However,
   this introduced a small delay in execution time. Is there a way to avoid
   the high memory usage without increasing execution time?
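
To make the reasoning behind question 1 concrete, here is a rough sketch of
the partition arithmetic. It assumes that an equality condition on a hash
column pins exactly one hash bucket for that column, and it ignores the range
partitions on RC1 and the 3x replication. The function name and the bucket id
0 are placeholders for illustration only; this is not how Impala or Kudu
compute it internally.

```python
from itertools import product


def surviving_hash_buckets(buckets_per_column, pinned_columns):
    """Enumerate the hash-bucket combinations left after pruning.

    buckets_per_column: dict of hash column -> number of hash partitions
    pinned_columns: columns with an equality condition (assumed to pin
                    exactly one bucket each)
    """
    choices = []
    for column, count in buckets_per_column.items():
        if column in pinned_columns:
            # Placeholder bucket id; the real id depends on the hash value.
            choices.append([0])
        else:
            choices.append(list(range(count)))
    return list(product(*choices))


layout = {"HC1": 2, "HC2": 2, "HC3": 2}

total = len(surviving_hash_buckets(layout, set()))
remaining = surviving_hash_buckets(layout, {"HC1", "HC3"})

print(total, len(remaining))  # prints "8 2"
```

Under these assumptions, 8 hash-bucket combinations exist per range partition
and only 2 remain once HC1 and HC3 are fixed, which would match the two
executors seen in the profile if the query also touches a single range
partition.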

Thank You.

Best Regards,

Vibhath.
