A couple of quick things to check.

Look at the query plans of the queries you are interested in. In general, expect 
query times to improve (though not quite linearly) for larger data sets and for 
workloads with a lot of concurrency, since planning, execution, etc. will be 
spread across the cluster.

Expect query planning time to increase on a cluster.
- You can offset that by using metadata caching.
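
As a rough illustration of the two checks above, here is a minimal Python sketch that builds an EXPLAIN PLAN FOR statement and a REFRESH TABLE METADATA statement (the metadata cache applies to Parquet tables) and wraps them for Drill's REST endpoint. The host, port, and table path are assumptions for illustration only; adjust them to your setup.

```python
import json
import urllib.request

# Assumption: a Drillbit is reachable on localhost with the default web port 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_payload(sql):
    """Wrap a SQL statement in the JSON body expected by Drill's REST query endpoint."""
    return {"queryType": "SQL", "query": sql}


def explain_sql(query):
    """Statement that returns the query plan instead of running the query."""
    return f"EXPLAIN PLAN FOR {query}"


def refresh_metadata_sql(table_path):
    """Statement that builds or refreshes the Parquet metadata cache for a table."""
    return f"REFRESH TABLE METADATA {table_path}"


def submit(sql, url=DRILL_URL):
    """POST a statement to Drill and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(sql)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, submit(refresh_metadata_sql("dfs.`/data/events`")) would refresh the cache for a hypothetical Parquet directory, and submit(explain_sql("SELECT ...")) would return the plan for a query of yours.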

Major fragments that are broken up into many minor fragments should improve 
close to linearly, because the bigger cluster (5 nodes vs. 1) can run more minor 
fragments in parallel. Make sure that queries and data sets are optimized to 
parallelize in a cluster; see Drill Best Practices.

For exchange operators, expect an increase in time, because data now moves over 
the network between nodes rather than staying on a single node.
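
If you want a back-of-the-envelope extrapolation, the points above can be sketched as a simple model: planning and exchange time grow, while the parallel part shrinks with the node count. This is only a sketch, not a Drill formula. The growth factors below are made-up placeholders, and the single-node timings are hypothetical numbers you would replace with values read from your own query profiles.

```python
def estimate_cluster_time(planning_s, parallel_s, exchange_s, nodes,
                          planning_growth=1.5, exchange_growth=2.0):
    """Rough cluster query time extrapolated from single-node timings.

    planning_s, parallel_s, exchange_s: seconds measured on one node.
    planning_growth, exchange_growth: ASSUMED multipliers for the extra
    planning and network cost on a cluster; placeholders, not measured.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if nodes == 1:
        # Single node: no cluster overhead, timings are used as measured.
        return planning_s + parallel_s + exchange_s
    return (planning_s * planning_growth      # planning gets slower
            + parallel_s / nodes              # parallel work scales ~linearly
            + exchange_s * exchange_growth)   # exchanges now cross the network


# Hypothetical single-node profile: 0.5 s planning, 20 s scans/joins, 1 s exchanges.
single = estimate_cluster_time(0.5, 20.0, 1.0, nodes=1)
cluster = estimate_cluster_time(0.5, 20.0, 1.0, nodes=5)
speedup = single / cluster
```

With these placeholder numbers the estimate comes out at roughly a 3x speedup on 5 nodes rather than 5x, which is the "not quite linear" effect. The larger the parallel share of your query, the closer you get to linear.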


These links cover some best practices for optimizing your cluster:
https://community.mapr.com/docs/DOC-1497
https://community.mapr.com/thread/18549

--Andries



> On Jul 30, 2016, at 6:38 AM, Nicolas Paris <[email protected]> wrote:
> 
> Hey,
> 
> I have run tests on drill on a standalone installation (1 computer 8
> core/32GO ram).
> I will get soon a 5 computer cluster (8 core/96GO ram each).
> Is it possible to get an estimation of the performance gain ?
> Is it linear ? Will the performance get better ? Worst ?
> 
> I just want an estimation/extrapolation.
> 
> Thanks !
