[ https://issues.apache.org/jira/browse/IMPALA-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm resolved IMPALA-6567.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit ad91e0b04cedb84b5b08c810de4ab1a5555ef036
Author: Alex Behm <alex.b...@cloudera.com>
Date:   Thu Feb 22 21:07:27 2018 -0800

    IMPALA-6567: ResetMetadataStmt analysis should not load tables.

    This fixes a regression introduced by IMPALA-5152 where
    invalidate metadata <tbl> and refresh <tbl> accidentally
    required the target table to be loaded during analysis,
    ultimately leading to a double load in some situations
    (load during analysis, then another load during execution).

    Since the purpose of these statements is to reload metadata
    it does not make sense to require a table load during
    analysis - that load happens during execution.

    Note that REFRESH <tbl> PARTITION (<partition>) still
    requires the containing table to be loaded. This was the
    behavior before IMPALA-5152 and this patch does not attempt
    to improve that.

    Testing:
    - added new unit test
    - ran FE tests locally
    - validated the desired behavior by inspecting logs and
      the timeline from invalidate/refresh statements

    Change-Id: I7033781ebf27ea53cfd26ff0e4f74d4f242bd1dc
    Reviewed-on: http://gerrit.cloudera.org:8080/9418
    Tested-by: Impala Public Jenkins
    Reviewed-by: Alex Behm <alex.b...@cloudera.com>

> Functional dataload is intermittently super-slow
> ------------------------------------------------
>
>                 Key: IMPALA-6567
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6567
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>   Affects Versions: Impala 2.12.0
>            Reporter: Joe McDonnell
>            Assignee: Alexander Behm
>            Priority: Blocker
>             Fix For: Impala 2.12.0
>
>
> Recent GVO builds intermittently have a functional dataload of almost 2 hours
> when it used to be ~30-35 minutes:
> {noformat}
> 02:12:15 Loading TPC-DS data (logging to /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
> 02:34:27 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
> 02:34:35 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
> 04:11:40 Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)
> {noformat}
>
> This has happened on multiple runs (including some in progress):
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1370/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1382/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1383/] (missing some
> logs due to abort)
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1384/] (in progress)
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1385/] (in progress)
>
> Dataload creates a SQL script that invalidates each table created using an
> "invalidate metadata ${tablename}" command. There are 830 "invalidate
> metadata ${tablename}" calls in the invocation of this script (see
> IMPALA-6386 for why we do invalidate at the table level). Even so, this
> script should execute very quickly.
> The impalad.INFO from the 1370 run shows that this script is taking a long
> time. The first invalidate metadata for functional tables is at 2:41 and the
> last invalidate metadata for this run of the invalidate script is at 3:17.
> The invalidate script runs twice. The second run begins at 3:19 and finishes
> at 4:11.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)