I’ll second Andries’ comment about measurable performance in AWS: you should 
not expect consistency there (especially with instance types that are smaller 
than a physical server, such as the c3.xlarge instances you’re using).

How does the memory utilization look during your queries? Memory pressure 
often manifests as CPU loading, especially in the pathological case of 
excessive Java garbage collection. Drill does an excellent job of separating 
the data being queried from the traditional Java heap … but there can still be 
some pressure there. Check the drillbit logs and see if GCs are occurring more 
frequently as your query count goes up.
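
If GC logging isn’t already on, one way to get that visibility (just a sketch; 
I’m assuming the stock drill-env.sh layout here, and variable names can differ 
between releases) is to append the usual JVM GC-logging flags to the drillbit’s 
Java options and restart the drillbits:

    # added to $DRILL_HOME/conf/drill-env.sh, after the existing export of
    # DRILL_JAVA_OPTS (the log path is only an example; put it anywhere handy)
    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -Xloggc:/var/log/drill/drillbit-gc.log"

If the full-GC entries in that log get noticeably more frequent as the 
concurrent query count climbs, heap pressure (rather than the scans themselves) 
is a likely culprit.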

— David


On Mar 25, 2015, at 8:09 AM, Andries Engelbrecht <[email protected]> 
wrote:

> What version of Drill are you running?
> 
> It sounds like you are CPU-bound, and the query time increases 10x with a 30x 
> increase in concurrency (which looks pretty good at first glance).
> At a high level this seems to be pretty reasonable, but it is hard to give more 
> specifics without seeing the query profiles. What is consuming the most time 
> (and resources) in the query profiles? Perhaps there are some gains to be had 
> in optimizing the queries.
> 
> If the cluster is primarily used for Drill you may want to adjust the 
> planner.width.max_per_node system parameter to consume more of the cores on 
> the nodes.
> See what the current setting is in sys.options, and adjust it to no more than 
> the number of cores on the node. Experimenting with this may help a bit.
> You also may want to experiment with planner.width.max_per_query.
> I have not looked into the queue mechanisms in detail yet, but it doesn’t 
> seem that the cluster is having issues with how it is managing concurrency.
> 
> Keep in mind AWS can be inconsistent in terms of performance, so it is hard to 
> measure exact numbers on a cloud platform.
> 
> —Andries
> 
> On Mar 25, 2015, at 5:44 AM, Adam Gilmore <[email protected]> wrote:
> 
>> Hi all,
>> 
>> I'm doing some testing on query performance, especially in a clustered
>> environment.
>> 
>> The test data is 5 Parquet files with 2.2 million records in each file
>> (total of ~11m).
>> 
>> The cluster is an Amazon EMR cluster with a total of 10 drillbits
>> (c3.xlarge instances).
>> 
>> A single SUM() with a GROUP BY results in a ~700ms query.
>> 
>> We set up about 30 agents, each running a query every second (about 30 queries
>> per second in total), and performance drops to roughly 6-7 seconds per query.
>> 
>> The bottleneck seems to be entirely CPU based - all drillbits' CPUs are
>> fairly swamped.
>> 
>> Looking at the plans, the Parquet scan still performs fairly well, but the
>> hash aggregate gets gradually slower and slower (obviously competing for
>> CPU time).
>> 
>> Are these the expected query times for such a setup?  Is there anything
>> further I can investigate to gain more performance?
> 
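
One more note on Andries’ suggestion about the planner.width options: checking 
and adjusting them is just a couple of statements from sqlline. A rough sketch 
(the value of 4 is only an example because c3.xlarge has 4 vCPUs; pick whatever 
matches your nodes):

    -- see the current values
    SELECT * FROM sys.options WHERE name LIKE 'planner.width%';

    -- raise the per-node parallelism, but no higher than the core count
    ALTER SYSTEM SET `planner.width.max_per_node` = 4;

ALTER SESSION works the same way if you want to experiment on a single 
connection before changing the cluster-wide default.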
