The most CPU-intensive of the operators is the hash aggregate, followed by the Parquet group scan, which makes a fair bit of sense.
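For reference, the query under test is shaped roughly like the sketch below (table and column names are placeholders, not the actual schema):

    -- Hypothetical shape of the test query: a single SUM() with a GROUP BY
    -- over the Parquet files. Table and column names are placeholders.
    SELECT region, SUM(sale_amount) AS total
    FROM dfs.`/data/sales_parquet`
    GROUP BY region;

On its own that query runs in ~700ms; the profile numbers above are from the concurrent runs.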
The memory utilization on the boxes is about 6-7GB out of the possible 8GB (so there is usually some free memory). I checked the logs but didn't see any alerts re GC. Do I need to enable debug logging to see this?

On Thu, Mar 26, 2015 at 1:38 AM, David Tucker <[email protected]> wrote:

> I’ll second Andries’ comment about measuring performance in AWS: you
> should not expect consistency there (especially with instance types that
> are smaller than a physical server, such as the c3.xlarge instances you’re
> using).
>
> How does the memory utilization look during your queries? Memory
> pressure often manifests as CPU loading, especially in the pathological
> case of excessive Java garbage collection. Drill does an excellent job of
> separating the data being queried from the traditional Java heap, but
> there can still be some pressure there. Check the drillbit logs and see
> if GCs are occurring more frequently as your query count goes up.
>
> — David
>
>
> On Mar 25, 2015, at 8:09 AM, Andries Engelbrecht <[email protected]> wrote:
>
> > What version of Drill are you running?
> >
> > It sounds like you are CPU bound, and the query time increases 10x with
> > a 30x increase in concurrency (which looks pretty good at first glance).
> > At a high level this seems pretty reasonable; it's hard to give more
> > specifics without seeing the query profiles. What is consuming the most
> > time (and resources) in the query profiles? Perhaps there are some gains
> > to be had in optimizing the queries.
> >
> > If the cluster is primarily used for Drill, you may want to adjust the
> > planner.width.max_per_node system parameter to consume more of the cores
> > on the nodes. See what the current setting is in sys.options, and adjust
> > to no more than the number of cores on the node. Experimenting with this
> > may help a bit. You also may want to experiment with
> > planner.width.max_per_query. I have not looked into the queue mechanisms
> > in detail yet, but it doesn't seem that the cluster is having issues
> > with how it is managing concurrency.
> >
> > Keep in mind AWS can be inconsistent in terms of performance, so it is
> > hard to measure exact numbers on a cloud platform.
> >
> > —Andries
> >
> > On Mar 25, 2015, at 5:44 AM, Adam Gilmore <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I'm doing some testing on query performance, especially in a clustered
> >> environment.
> >>
> >> The test data is 5 Parquet files with 2.2 million records in each file
> >> (a total of ~11M rows).
> >>
> >> The cluster is an Amazon EMR cluster with a total of 10 drillbits
> >> (c3.xlarge instances).
> >>
> >> A single SUM() with a GROUP BY results in a ~700ms query.
> >>
> >> We set up about 30 agents each running a query every second (a total
> >> of 30 queries per second), and performance drops to about 6-7 seconds
> >> per query.
> >>
> >> The bottleneck seems to be entirely CPU based - all drillbits' CPUs
> >> are fairly swamped.
> >>
> >> Looking at the plans, the Parquet scan still performs fairly well, but
> >> the hash aggregate gets gradually slower and slower (obviously
> >> competing for CPU time).
> >>
> >> Are these the expected query times for such a setup? Is there anything
> >> further I can investigate to gain more performance?
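P.S. For anyone following along, here is a rough sketch of checking and adjusting the parallelism options Andries mentioned. The option names come from his mail; the value of 4 is just an example matching the 4 vCPUs on a c3.xlarge, so verify it against your own nodes and Drill version:

    -- Inspect the current planner width settings.
    SELECT * FROM sys.options WHERE name LIKE 'planner.width%';

    -- Example only: set per-node parallelism to the node's core count
    -- (4 vCPUs on a c3.xlarge). Adjust to your hardware.
    ALTER SYSTEM SET `planner.width.max_per_node` = 4;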
