Hello,

I've run some tests with concurrent queries in the same queue, and the runtime
always seems to incur a penalty compared to a single query running alone, even
when plenty of resources are available to run multiple queries concurrently in
that queue. I also tried running the same query concurrently in different
queues, and the runtime incurred a penalty there as well. For instance, if a
query takes, let's say, 10 seconds when running alone in the cluster, it seems
I can only achieve that performance when no other query is running in the
cluster; otherwise there is a penalty of around 50% in runtime.


Another behavior I noticed is that query results are released in "batches"
according to the concurrency I have set (this might even be the cause of the
issue described above). For example, if I run a single query, I get the answer
in 1 second, but if I run 2 copies of that same query concurrently, the results
of both come back after 1.5 seconds. If I run 5 copies concurrently, the
results of all 5 queries come back at the same time, after 3 seconds, and so
on. The runtime penalty always increases with the number of queries I run
concurrently, because the first query issued apparently has to wait for the
whole "batch" to execute.
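
A minimal sketch of how this kind of test could be timed, in case it helps to
reproduce the behavior. It assumes a hypothetical run_query function that
submits one query and blocks until the result is returned; here it is only a
placeholder that sleeps, not a real Impala client call. The helper name
time_concurrent is also just illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Hypothetical stand-in: submit one query and block until results arrive."""
    time.sleep(0.05)  # placeholder for a real client call


def time_concurrent(query_fn, n):
    """Submit n copies of query_fn together; return each one's latency in seconds.

    Latency is measured from the common start time, so a "batched" release
    would show up as all n latencies being nearly identical.
    """
    start = time.perf_counter()

    def timed():
        query_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(timed) for _ in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for n in (1, 2, 5):
        latencies = time_concurrent(run_query, n)
        print(n, [round(t, 3) for t in latencies])
```

Comparing the per-query latencies for n = 1, 2, and 5 shows whether each query
finishes independently or whether all of them complete together.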


Could you please explain what is happening under the hood that might be driving
this behavior? Is this the expected behavior in the latest version of Impala,
or is there a configuration setting I can use to prevent it? Is a change to
this planned on Impala's roadmap?


Thanks,

Paulo.


