Philip Zeyliger created IMPALA-7980:
---------------------------------------

             Summary: High system CPU time usage (and waste) when runtime 
filters filter out files
                 Key: IMPALA-7980
                 URL: https://issues.apache.org/jira/browse/IMPALA-7980
             Project: IMPALA
          Issue Type: Task
            Reporter: Philip Zeyliger


When running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster 
with {{replica_preference=remote}}, we observed really high system CPU usage 
for some of the scan nodes:
{code}
HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-
child: 100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s
{code}
Using {perf}, we discovered a lot of usage of {futex_wait} and 
{pthread_cond_wait} and so on. (We also used perf to record context switches 
and cycles.) Interestingly, observing in top saw the really high system CPU 
usage spike some time into the query.

We believe what's going on is that we start many ScannerThread instances, which 
wait first until initial ranges have been issued and then grab data using 
{impala::io::ScanRange::GetNext()}. They do this in a loop, and it uses two 
locks, until the query is done or there are no {{num_unqueued_files_}} left. If 
num_unqueued_files_ is left above zero, then these threads just loop through 
two lock acquisitions and nothing else. We believe that this hot loop is eating 
system CPU aggressively.

It's a bit interesting that this is exacerbated in the case with more remote 
reads. Our best guess is that some of the reads take significantly longer in 
this case, and a single outlier can extend this period of waste.







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to