Joe McDonnell created IMPALA-6567:
-------------------------------------

             Summary: Functional dataload is intermittently super-slow
                 Key: IMPALA-6567
                 URL: https://issues.apache.org/jira/browse/IMPALA-6567
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.12.0
            Reporter: Joe McDonnell

Recent GVO builds intermittently take almost 2 hours for the functional dataload, which used to take ~30-35 minutes:

    *02:12:15* Loading TPC-DS data (logging to /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
    *02:34:27* Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
    *02:34:35* Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
    *04:11:40* Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)

This has happened on multiple runs (including some in progress):

[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1370/]
[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1382/]
[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1383/] (missing some logs due to abort)
[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1384/] (in progress)
[https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1385/] (in progress)

Dataload creates a SQL script that invalidates each table it created, using an "invalidate metadata ${tablename}" command per table. There are 830 "invalidate metadata ${tablename}" calls in each invocation of this script (see IMPALA-6386 for why we invalidate at the table level). Even so, this script should execute very quickly.

The impalad.INFO from the 1370 run shows that this script is taking a long time. The first invalidate metadata for functional tables is at 2:41, and the last invalidate metadata for this run of the invalidate script is at 3:17. The invalidate script runs twice. The second run begins at 3:19 and finishes at 4:11.
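
To put the impalad.INFO timestamps in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes each of the two runs issues all 830 statements, that the minute-granularity timestamps quoted above are accurate, and that both runs fall on the same day. The names `INVALIDATE_CALLS`, `minutes_between`, and `seconds_per_call` are made up for illustration and are not part of the dataload scripts.

```python
from datetime import datetime

# 830 "invalidate metadata" statements per invocation, per the report.
# Assumption: both runs of the script issue the full set.
INVALIDATE_CALLS = 830


def minutes_between(start, end):
    """Elapsed minutes between two same-day H:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def seconds_per_call(start, end, calls=INVALIDATE_CALLS):
    """Average wall-clock seconds spent per invalidate statement."""
    return minutes_between(start, end) * 60 / calls


# Timestamps taken from the impalad.INFO observations for run 1370.
runs = {"first": ("2:41", "3:17"), "second": ("3:19", "4:11")}

for name, (start, end) in runs.items():
    print(
        f"{name} run: {minutes_between(start, end):.0f} min total, "
        f"{seconds_per_call(start, end):.2f} s per invalidate"
    )
```

Under those assumptions the two runs take 36 and 52 minutes, or roughly 2.6 and 3.8 seconds per statement on average. Together that is 88 minutes, which accounts for most of the 119 minute functional-query load.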