Alexander,

We received a response regarding the issue you're facing from another Impala contributor; it fell off-list. It is quoted inline below:
"Based on what he is describing, it seems like IMPALA-5302 and IMPALA-4923 are in play. To verify, we will need a couple of query profiles, plus a screenshot of "sudo perf top" from one of the machines, taken after letting it run for a few tens of seconds while the queries are running."

On Wed, Sep 6, 2017 at 3:12 AM, Alexander Shoshin <[email protected]> wrote:

Hi,

I guess my previous message might not have been delivered.

Could you suggest whether it is possible to verify if the issue https://issues.apache.org/jira/browse/IMPALA-4923 affects my queries or not?

Thanks,
Alexander

From: Alexander Shoshin
Sent: Monday, September 04, 2017 1:21 PM
To: [email protected]
Cc: Special SBER-BPOC Team <[email protected]>
Subject: RE: Bottleneck

Hi guys,

Thanks for the advice!

Tim,

Here are some more details about my cluster: I use CDH 5.11.1 with Impala 2.8.0, and each machine in the cluster has 80 logical CPU cores and 700 GB of RAM. It's hard to provide all the query profiles here, as there are 28 of them. Different queries use from 10 MB to 7000 MB of RAM on each cluster node. Moreover, if I run only "heavy" queries that consume several GB of RAM, Impala starts using all available memory and some queries fail with "out of memory". But I can't force Impala to use all the memory with "light" queries.

It looks like https://issues.apache.org/jira/browse/IMPALA-5302, as a part of https://issues.apache.org/jira/browse/IMPALA-4923, might be the reason. I know that the most reliable way to check whether this issue affects me is to upgrade Impala to 2.9.0, but that is not easy for me because I don't have all the necessary administrative privileges. Is there a way to verify whether this issue affects me or not? Or could a patch for this issue be applied so that I don't need to upgrade Impala?

Alexander,

I am using all Impala daemons as coordinators.
Each new query goes to the next coordinator in the list. I have tried increasing fe_service_threads from 64 up to 120, but the behavior stays the same. I have also tried changing be_service_threads, num_threads_per_core, and num_hdfs_worker_threads, with no result.

Silvius,

I will try that command, thanks.

Regards,
Alexander

From: Silvius Rus [mailto:[email protected]]
Sent: Friday, September 01, 2017 11:11 PM
To: [email protected]
Cc: Special SBER-BPOC Team <[email protected]>
Subject: Re: Bottleneck

One piece of information that might help is to run "perf top" on the machine with the highest CPU usage.

On Fri, Sep 1, 2017 at 9:57 AM, Alexander Behm <[email protected]> wrote:

Are you submitting all queries to the same coordinator? If so, you might have to increase --fe_service_threads to allow more concurrent connections.

That said, a single coordinator will eventually become a bottleneck, so we recommend submitting queries to different impalads.

On Fri, Sep 1, 2017 at 9:41 AM, Tim Armstrong <[email protected]> wrote:

Hi Alexander,

It's hard to know based on the information available. Query profiles often provide some clues here. I agree that Impala should be able to max out one of the resources in most circumstances.

On Impala 2.8 and earlier we saw behaviour similar to what you describe when running queries with selective scans on machines with many cores: https://issues.apache.org/jira/browse/IMPALA-4923. The bottleneck there was lock contention during memory allocation: the threads spent a lot of time asleep waiting to acquire a shared lock.

On Fri, Sep 1, 2017 at 8:36 AM, Alexander Shoshin <[email protected]> wrote:

Hi,

I am working with Impala, trying to find its maximum throughput on my hardware.
I have a cluster under Cloudera Manager which consists of 7 machines (1 master node + 6 worker nodes).

I am running queries against Impala using JDBC. I have reached a maximum throughput of 80 finished queries per minute, and it does not grow no matter how many hundreds of concurrent queries I send. The strange thing is that none of the resources (memory, CPU, disk read/write, network send/receive) reaches its maximum; they are all used at less than half capacity.

Could you suggest what the bottleneck might be? Could it be some Impala setting that limits performance or the maximum number of concurrent threads? The mem_limit option for my Impala daemons is about 70% of the available machine memory.

Thanks,
Alexander
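[Editor's note] Alexander describes sending each new query via JDBC to the next coordinator in a list, as Alexander Behm recommends. A minimal round-robin sketch of that pattern is below; the host names, port, and URL options are illustrative assumptions, not details from the thread (21050 is impalad's default HiveServer2 port, and the actual connection settings depend on the cluster's security configuration).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CoordinatorRoundRobin {
    private final List<String> coordinators;
    private final AtomicInteger next = new AtomicInteger(0);

    public CoordinatorRoundRobin(List<String> coordinators) {
        this.coordinators = coordinators;
    }

    // Returns a JDBC URL pointing at the next coordinator in the list.
    // floorMod keeps the index valid even after the counter wraps around.
    public String nextJdbcUrl() {
        int i = Math.floorMod(next.getAndIncrement(), coordinators.size());
        return "jdbc:hive2://" + coordinators.get(i) + ":21050/;auth=noSasl";
    }

    public static void main(String[] args) {
        // Placeholder worker host names; submit each query through the URL
        // returned here so no single impalad handles every connection.
        CoordinatorRoundRobin rr = new CoordinatorRoundRobin(
                List.of("worker1", "worker2", "worker3"));
        for (int k = 0; k < 4; k++) {
            System.out.println(rr.nextJdbcUrl());
        }
    }
}
```

Spreading connections this way avoids the single-coordinator bottleneck, though as the thread notes, per-daemon limits such as fe_service_threads still cap concurrency on each coordinator.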
