Hi Andrey,

Could you please share some of the query profiles for us to analyze? When running queries with a large number of scanner threads, you are likely to hit IMPALA-5302 and IMPALA-4923, both of which have been fixed in Impala 2.9.
Also, can you run "sudo perf top" for 30 seconds or so and then share a screenshot?

Thanks,
Mostafa

---------- Forwarded message ----------
> From: Andrey Kuznetsov <[email protected]>
> Date: Tue, Sep 5, 2017 at 8:12 AM
> Subject: [Impala] Performance strange behavior
> To: "[email protected]" <[email protected]>
> Cc: Special SBER-BPOC Team <[email protected]>
>
> Hi folks,
>
> I need your experience.
>
> I am conducting performance testing of Impala+Parquet on 3, 6, and 8 data nodes.
> Throughput is presented below for each configuration:
>
> [image: throughput plot]
>
> 1. I wonder why throughput for 1+8 at threads > 100 is lower than
> throughput for 1+6. Do you know why this happens?
>
> 2. Do you know how we can explain the throughput degradation after
> threads > 80? Thread concurrency?
>
> Settings (755 GB RAM per host, 70 cores per host, 10 Gbit/s network) are
> the same for all configurations. Impala daemons run on each data node with
> a 500 GB memory limit, there are no queues, and there is no bottleneck in
> resources (CPU/disk/net/RAM; plots are attached).
>
> Best regards,
>
> *ANDREY KUZNETSOV*
>
> *Software Engineering Team Leader*
