Thanks for getting back, Mostafa. IO rates while Presto was running seemed to be
higher (60-125 MB/s). If it were a slowdown due to GC / HDFS, I'd imagine some
of the Presto runs would be affected too, but that doesn't seem to be the case.
We've run the query a few times on the same day (fairly close to each other)
and also on different days to reduce the likelihood of the filesystem cache
skewing the comparison. While we do see subsequent runs being a little quicker
on both Presto & Impala, the Impala run always seems to be 2x-3x slower. I'll
check the GC metrics on our NameNode to double-check and get back with that.
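For reference, this is roughly what I'm planning to run on the NameNode host to
sample GC (just a sketch, assuming the JDK tools are installed there and the
process command line matches "NameNode"):

# sample JVM GC stats once a second for 60 samples;
# a steadily climbing GCT (total GC time) column would point at GC pressure
jstat -gcutil $(pgrep -f NameNode) 1000 60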
The number of partitions & files is an interesting dimension. The data is
partitioned in this fashion (we currently have only one day's worth of data):
partitionkey1={0|1}/day=2017-10-04/hour={00-23}/platform={US|EU|AS}/partitionKey2={0|1}/partitionKey3={0|1}
which works out to 2 * 1 * 24 * 3 * 2 * 2 = 576 partitions.
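To sanity-check that count (and the per-partition file counts), I'll probably
run something like the following (a sketch; the warehouse path is a guess at
where the table actually lives):

# prints DIR_COUNT / FILE_COUNT / CONTENT_SIZE per matched partition tree
hdfs dfs -count /warehouse/my_large_parquet_table/partitionkey1=*/day=2017-10-04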
Looking at the number of files touched by the day=2017-10-04 query, it's ~15K.
Do you think this is on the high end given we only have 4 machines (with 48
cores each)? I also noticed that there's an Impala start-up option,
"-num_hdfs_worker_threads". I didn't find much documentation on it, but I'm
wondering whether bumping it up might help. I could try that out tomorrow; a
sketch of what I have in mind is below.
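I'm assuming it's a regular impalad startup flag, i.e. something along these
lines (sketch only; 64 is an arbitrary value to experiment with, not a
recommendation):

# restart each impalad with more HDFS worker threads
impalad -mem_limit=225G -num_hdfs_worker_threads=64 ...  # plus our existing flags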
Thanks,
-- Piyush
From: Mostafa Mokhtar <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, December 6, 2017 at 5:51 PM
To: "[email protected]" <[email protected]>
Subject: Re: Debugging slow Impala hdfs-scans
The query profile shows that a lot of time is spent in "- TotalStorageWaitTime:
20h4m"; this is usually an indicator that Impala is waiting on IO from HDFS.
The number of files can also be an issue, so I recommend checking GC time for
the HDFS NameNode.
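A quick way to check, assuming the NameNode web UI is on the default port for
your version, is the JMX servlet; GcCount and GcTimeMillis live under
JvmMetrics:

curl 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=JvmMetrics'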
The filter is not selective, so it is interesting that Presto runs the same
query faster. Did you observe IO rates while Presto was running, or was the
data read from the file system cache?
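One way to tell the two apart, if you can get on the datanodes during a Presto
run, is to watch actual device reads (a sketch, assuming sysstat is installed):

# per-device read throughput; near-zero disk reads while the query
# streams data would suggest it's being served from the page cache
iostat -xm 1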
On Wed, Dec 6, 2017 at 2:01 PM, Piyush Narang
<[email protected]> wrote:
Hi folks,
I'm trying to debug why our new Impala setup performs a bit worse than Presto
on the same queries. We're running Impala 2.10.0, started with 225G of memory
(-mem_limit). It runs on the same set of nodes as Presto (4 nodes, 48 cores
each, 10G Ethernet). We noticed that a couple of queries in Impala were fairly
slow in the hdfs-scan stage, so we tried to isolate the behavior with a
slightly simpler query:
select max(hour + nb_display) from my_large_parquet_table where day =
'2017-10-04';
This query takes around 28-30 mins on Impala and around 8.5 mins on Presto.
The input table is 11.64TB and is stored as Snappy-compressed Parquet.
Looking at the performance profile (attached in case anyone's interested), I
noticed that during the query Impala seems to average around 1.1-1.2 MB/s per
scanner thread, and the total read throughput is around 2.13 MB/s. I tried
bumping up the number of scanner threads (from 48 to 96; see the sketch
below), but that only helped marginally (improved runtimes by ~10s). Running
"sar -n DEV 1" on some of our hosts while the query is running shows Impala
reading at 30-50 MB/s (whereas we see this go up to 60-125 MB/s on our Presto
runs).
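For the record, here's how we've been bumping the scanner threads, assuming
the NUM_SCANNER_THREADS query option is the relevant knob:

-- inside impala-shell, before re-running the query
set num_scanner_threads=96;
select max(hour + nb_display) from my_large_parquet_table where day = '2017-10-04';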
We haven't tweaked our Impala setup much beyond the out-of-the-box defaults.
I'm wondering if I'm missing some tuning settings that would improve read
rates from HDFS when Impala runs outside of the datanodes. Any ideas /
suggestions would be welcome, and I'm happy to provide more details if needed.
Thanks,
-- Piyush