[jira] [Resolved] (IMPALA-5152) Frontend requests metadata for one table at a time in the query
[ https://issues.apache.org/jira/browse/IMPALA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm resolved IMPALA-5152.
------------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit 8ea1ce87e2150c843b4da15f9d42b87006e6ffca
Author: Alex Behm
Date: Fri Apr 7 09:58:40 2017 -0700

IMPALA-5152: Introduce metadata loading phase

Reworks the collection and loading of missing metadata when compiling a statement. Introduces a new metadata-loading phase between parsing and analysis.

Summary of the new compilation flow:
1. Parse statement.
2. Collect all table references from the parsed statement and generate a list of tables that need to be loaded for analysis to succeed.
3. Request missing metadata and wait for it to arrive. As views become loaded, we expand the set of required tables based on the view definitions. This step populates a statement-local table cache that contains all loaded tables relevant to the statement.
4. Create a new Analyzer with the table cache and analyze the statement. During analysis only the table cache is consulted for table metadata; the ImpaladCatalog is no longer used for that purpose.
5. Authorize the statement.
6. Plan generation as usual.

The intent of the existing code was to collect all tables missing metadata during analysis, load the metadata, and then re-analyze the statement (repeating those steps until all metadata was loaded). Unfortunately, the relevant code was hard to follow, subtle, and not well tested, and was therefore broken in several ways over time. For example, the introduction of path analysis for nested types subtly broke the intended behavior, and there are other similar examples.

The serial table loading observed in the JIRA was caused by the following code in the resolution of table references:

{code}
for (all path interpretations) {
  try {
    // Try to resolve the path; might call getTable() which
    // throws for nonexistent tables.
  } catch (AnalysisException e) {
    if (analyzer.hasMissingTbls()) throw e;
  }
}
{code}

The following example illustrates the problem:

{code}
SELECT * FROM a.b, x.y
{code}

When resolving the path "a.b" we consider that "a" could be a database or a table. Similarly, "b" could be a table or a nested collection. If the path resolution for "a.b" adds a missing table entry, then the path resolution for "x.y" could exit prematurely, without trying the other path interpretations that would lead to adding the expected missing table. So effectively, the tables end up being loaded one by one.

Testing:
- A core/hdfs run succeeded.
- No new tests were added because the existing functional tests provide good coverage of various metadata loading scenarios.
- The issue reported in IMPALA-5152 is basically impossible now. Adding FE unit tests for that bug specifically would require ugly changes to the new code to enable such testing.

Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
Reviewed-on: http://gerrit.cloudera.org:8080/8958
Reviewed-by: Alex Behm
Tested-by: Impala Public Jenkins

> Frontend requests metadata for one table at a time in the query
> ----------------------------------------------------------------
>
> Key: IMPALA-5152
> URL: https://issues.apache.org/jira/browse/IMPALA-5152
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog, Frontend
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
> Reporter: Mostafa Mokhtar
> Assignee: Alexander Behm
> Priority: Critical
> Labels: Performance, frontend
> Fix For: Impala 2.12.0
>
> It appears that the Frontend serializes loading metadata for missing tables
> in a query; the Catalog log shows that the queue size is always 0.
> The query below references 9 tables and metadata is loaded for one table at a
> time.
> {code}
> explain select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as
> store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave
> ,stddev_samp(ss_quantity) as store_sales_quantitystdev
> ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
> ,count(sr_return_quantity) as store_returns_quantitycount
> ,avg(sr_return_quantity) as store_returns_quantityave
> ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
> ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as
> store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount
> ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity) as
>
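The premature-exit behavior described in the commit message can be simulated with a small sketch. This is hypothetical Python standing in for the real Java frontend code (names like `one_analysis_round` are invented): once one table reference registers a missing table, every later reference aborts on its first failed path interpretation, so only one missing table is discovered (and loaded) per analysis round.

```python
# Hypothetical simulation of the pre-fix table-reference resolution.
# Not Impala's actual code; names and shapes are illustrative only.

class AnalysisException(Exception):
    pass

LOADED = set()              # tables with metadata already loaded
UNLOADED = {"a.b", "x.y"}   # tables that exist but are not loaded yet

def try_resolve(path, missing):
    """Resolve one path interpretation, registering missing tables."""
    if path in LOADED:
        return path
    if path in UNLOADED:
        missing.add(path)   # getTable() registers a missing table...
    raise AnalysisException(path)  # ...and throws (or path doesn't exist)

def resolve_ref(candidates, missing):
    for path in candidates:
        try:
            return try_resolve(path, missing)
        except AnalysisException:
            # The buggy check: any missing table registered so far aborts
            # this ref, skipping its remaining path interpretations.
            if missing:
                return None
    return None

def one_analysis_round(refs):
    missing = set()
    for candidates in refs:
        resolve_ref(candidates, missing)
    return missing

# SELECT * FROM a.b, x.y -- for "a.b" the first interpretation is the real
# table; for "x.y" the first interpretation ("x" as a table) doesn't exist.
refs = [["a.b", "a"], ["x", "x.y"]]
print(one_analysis_round(refs))  # only {'a.b'}: x.y waits for the next round
```

Because each round discovers just one missing table, a 9-table query needs 9 load-and-reanalyze rounds, which is exactly the serial loading reported in the JIRA.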
[jira] [Created] (IMPALA-6563) test_compact_catalog_updates failing to connect client
Bikramjeet Vig created IMPALA-6563:
------------------------------------
Summary: test_compact_catalog_updates failing to connect client
Key: IMPALA-6563
URL: https://issues.apache.org/jira/browse/IMPALA-6563
Project: IMPALA
Issue Type: Bug
Reporter: Bikramjeet Vig
Fix For: Impala 2.12.0

test_compact_catalog_updates fails with
{noformat}
custom_cluster/test_compact_catalog_updates.py:52: in test_compact_catalog_topic_updates
    client1.close()
E   UnboundLocalError: local variable 'client1' referenced before assignment
{noformat}
The test first starts up a cluster and tries to create a client. The logs indicate that the impalads started without error, so I believe it's the client that fails to connect. Tail of the INFO logs from one of the impalads:
{noformat}
I0220 11:08:09.632342 4268 impala-server.cc:2041] Impala has started.
W0220 11:08:09.956959 4748 HiveConf.java:2886] HiveConf of name hive.access.conf.url does not exist
I0220 11:08:10.057112 4774 impala-server.cc:1754] Connection from client 127.0.0.1:40735 closed, closing 1 associated session(s)
I0220 11:08:10.295311 4748 impala-server.cc:1363] Catalog topic update applied with version: 1131 new min catalog object version: 2
I0220 11:08:10.994792 4747 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994799 4746 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994946 4748 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994967 4747 thrift-util.cc:123] TAcceptQueueServer client died: ECONNRESET
I0220 11:08:10.995034 4746 thrift-util.cc:123] TAcceptQueueServer client died: ECONNRESET
I0220 11:08:10.995067 4748 thrift-util.cc:123] TAcceptQueueServer client died: ECONNRESET
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-6561) metadata ops counter should not increase for Src table in CreateTableLike
Juan Yu created IMPALA-6561:
------------------------------
Summary: metadata ops counter should not increase for src table in CreateTableLike
Key: IMPALA-6561
URL: https://issues.apache.org/jira/browse/IMPALA-6561
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Juan Yu

The metadata ops counter is incremented inside getExistingTable(), so the catalog incidentally increments the counter for the source table of a CREATE TABLE LIKE statement.
http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1775
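One way to avoid the incidental increment is to decouple the counter from the lookup, e.g. an opt-out parameter on the lookup helper. A hypothetical sketch of that idea (invented names in Python; the real code is Java in CatalogOpExecutor):

```python
# Hypothetical sketch of decoupling the metadata-ops counter from table
# lookup; names are invented, this is not Impala's CatalogOpExecutor.

class Catalog:
    def __init__(self, tables):
        self.tables = tables
        self.metadata_ops = 0

    def get_existing_table(self, name, count_as_metadata_op=True):
        # Only user-visible metadata operations should bump the counter;
        # internal lookups (like the src table of CREATE TABLE LIKE) opt out.
        if count_as_metadata_op:
            self.metadata_ops += 1
        return self.tables[name]

    def create_table_like(self, src, dst):
        src_tbl = self.get_existing_table(src, count_as_metadata_op=False)
        self.tables[dst] = dict(src_tbl)  # copy the schema

catalog = Catalog({"t_src": {"cols": ["a", "b"]}})
catalog.create_table_like("t_src", "t_new")
print(catalog.metadata_ops)  # 0 -- the src-table lookup no longer counts
```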
[jira] [Created] (IMPALA-6558) Show summary of catalog cache
Juan Yu created IMPALA-6558:
------------------------------
Summary: Show summary of catalog cache
Key: IMPALA-6558
URL: https://issues.apache.org/jira/browse/IMPALA-6558
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Juan Yu

Show a summary of the catalog cache, including:
- which tables are completely cached
- whether these objects are tables or views
- when COMPUTE STATS was last run against each table
[jira] [Created] (IMPALA-6557) Show details of recent topic delta update
Juan Yu created IMPALA-6557:
------------------------------
Summary: Show details of recent topic delta update
Key: IMPALA-6557
URL: https://issues.apache.org/jira/browse/IMPALA-6557
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Juan Yu

Details of metadata topic delta updates are very useful for troubleshooting. E.g. the number and list of tables in recent topic updates help us know whether many tables are being updated concurrently. Are several large tables often updated together? Is the catalog cache version much higher than the coordinator cache version?
[jira] [Created] (IMPALA-6556) Show in-flight DDLs and what tables have been loading on Catalog WebUI
Juan Yu created IMPALA-6556:
------------------------------
Summary: Show in-flight DDLs and what tables have been loading on Catalog WebUI
Key: IMPALA-6556
URL: https://issues.apache.org/jira/browse/IMPALA-6556
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Juan Yu

This helps users know how many DDLs are running and how many tables are being loaded, so they can tell whether a query is hung or just waiting for metadata.
[jira] [Created] (IMPALA-6555) Clean up relationship between DiskIoMgr::min_buffer_size_ and BufferPool::min_buffer_len_
Tim Armstrong created IMPALA-6555:
------------------------------------
Summary: Clean up relationship between DiskIoMgr::min_buffer_size_ and BufferPool::min_buffer_len_
Key: IMPALA-6555
URL: https://issues.apache.org/jira/browse/IMPALA-6555
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong

They are always the same value in practice, obtained from --min_buffer_size. We should probably get rid of DiskIoMgr::min_buffer_size_ and fix up all references to it.
[jira] [Resolved] (IMPALA-6424) REFRESH right after invalidate metadata loads file metadata twice
[ https://issues.apache.org/jira/browse/IMPALA-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimitris Tsirogiannis resolved IMPALA-6424.
--------------------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 2.12.0

Change-Id: Ie41a734493dcea0e36d6b051966f1d0302907dee
Reviewed-on: http://gerrit.cloudera.org:8080/9224
Reviewed-by: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 23 insertions(+), 5 deletions(-)

> REFRESH right after invalidate metadata loads file metadata twice
> ------------------------------------------------------------------
>
> Key: IMPALA-6424
> URL: https://issues.apache.org/jira/browse/IMPALA-6424
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Juan Yu
> Assignee: Dimitris Tsirogiannis
> Priority: Critical
> Fix For: Impala 2.12.0
>
> Compared with a normal REFRESH, a REFRESH issued right after INVALIDATE
> METADATA loads file metadata twice and takes 2x the time. The second load
> seems redundant.
> I0119 07:46:41.107390 26758 CatalogServiceCatalog.java:1518] Invalidating table metadata: s3.catalog_sales
> I0119 07:46:43.002053 26309 catalog-server.cc:331] Publishing update : TABLE:s3.catalog_sales@1166
> I0119 07:46:43.002068 26309 catalog-server.cc:331] Publishing update : CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1166
> I0119 07:46:46.696725 26758 TableLoadingMgr.java:70] Loading metadata for table: s3.catalog_sales
> I0119 07:46:46.696781 26758 TableLoadingMgr.java:72] Remaining items in queue: 0. Loads in progress: 1
> I0119 07:46:46.696857 27023 TableLoader.java:58] Loading metadata for: s3.catalog_sales
> I0119 07:46:46.713222 27023 HdfsTable.java:1206] Fetching partition metadata from the Metastore: s3.catalog_sales
> I0119 07:46:46.905102 27023 HdfsTable.java:1210] Fetched partition metadata from the Metastore: s3.catalog_sales
> *I0119 07:46:46.939254 27023 HdfsTable.java:834] Loading file and block metadata for 1837 paths for table s3.catalog_sales using a thread pool of size 20*
> I0119 07:47:00.426975 27023 HdfsTable.java:874] Loaded file and block metadata for s3.catalog_sales
> I0119 07:47:00.427062 27023 TableLoader.java:97] Loaded metadata for: s3.catalog_sales
> I0119 07:47:00.427243 26758 CatalogServiceCatalog.java:1433] Refreshing table metadata: s3.catalog_sales
> I0119 07:47:00.441572 26758 HdfsTable.java:1193] Incrementally loading table metadata for: s3.catalog_sales
> *I0119 07:47:00.456437 26758 HdfsTable.java:834] Loading file and block metadata for 1837 paths for table s3.catalog_sales using a thread pool of size 20*
> I0119 07:47:14.038097 26758 HdfsTable.java:874] Loaded file and block metadata for s3.catalog_sales
> I0119 07:47:14.038132 26758 HdfsTable.java:1203] Incrementally loaded table metadata for: s3.catalog_sales
> I0119 07:47:14.038179 26758 CatalogServiceCatalog.java:1456] Refreshed table metadata: s3.catalog_sales
> I0119 07:47:14.062625 26309 catalog-server.cc:331] Publishing update : TABLE:s3.catalog_sales@1168
> I0119 07:47:14.062645 26309 catalog-server.cc:331] Publishing update : CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1168
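The log shows the pattern: INVALIDATE METADATA leaves the table unloaded, the REFRESH triggers a full load of file/block metadata (the first starred line), and then the incremental refresh reloads the same 1837 paths again (the second starred line). The general shape of a fix is for REFRESH to recognize that the full load already produced fresh file metadata and skip the incremental pass. A hypothetical Python sketch of that idea (the actual fix is Java in CatalogOpExecutor.java and may differ):

```python
# Hypothetical sketch of skipping the redundant reload after INVALIDATE
# METADATA; not the actual Java fix in CatalogOpExecutor.java.

class Table:
    def __init__(self, name):
        self.name = name
        self.loaded = False
        self.file_metadata_loads = 0

    def full_load(self):
        # Full load: fetches partition and file/block metadata from scratch.
        self.file_metadata_loads += 1
        self.loaded = True

def refresh(table):
    if not table.loaded:
        # Table was just invalidated: the full load already yields fresh
        # file metadata, so do NOT follow it with an incremental reload.
        table.full_load()
        return
    # Normal case: incremental reload of file metadata only.
    table.file_metadata_loads += 1

t = Table("s3.catalog_sales")  # just invalidated, metadata not loaded
refresh(t)
print(t.file_metadata_loads)  # 1 -- before the fix this path loaded twice
```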
[jira] [Created] (IMPALA-6551) Update TPCDS columns from DOUBLE to DECIMAL for Kudu
Grant Henke created IMPALA-6551:
----------------------------------
Summary: Update TPCDS columns from DOUBLE to DECIMAL for Kudu
Key: IMPALA-6551
URL: https://issues.apache.org/jira/browse/IMPALA-6551
Project: IMPALA
Issue Type: Improvement
Affects Versions: Impala 2.12.0
Reporter: Grant Henke

Once the Kudu DECIMAL support patch (IMPALA-5752) is in, we need to change some of the columns from DOUBLE to DECIMAL for Kudu for TPCDS and possibly TPCH. The expected results need to be updated as well, and should match those of the other storage types.
[jira] [Created] (IMPALA-6552) Add tests for Parquet stats filtering with +0/-0 edge cases
Tim Armstrong created IMPALA-6552:
------------------------------------
Summary: Add tests for Parquet stats filtering with +0/-0 edge cases
Key: IMPALA-6552
URL: https://issues.apache.org/jira/browse/IMPALA-6552
Project: IMPALA
Issue Type: Test
Components: Backend
Reporter: Tim Armstrong

Related to IMPALA-6527, we should add test coverage for floating-point Parquet stats to ensure that +0 and -0 in stats fields are handled correctly. We're in the clear right now since we just use regular comparison operators, which don't distinguish between the two zeros.
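The reason regular comparison operators keep us in the clear: IEEE 754 comparisons treat +0.0 and -0.0 as equal, even though their bit patterns differ. A quick illustration (Python floats are IEEE 754 doubles):

```python
import math
import struct

pos_zero = 0.0
neg_zero = -0.0

# Comparison operators do not distinguish the two zeros...
print(neg_zero == pos_zero)   # True
print(neg_zero < pos_zero)    # False

# ...even though their bit patterns differ, which would matter if stats
# values were ever compared by raw bytes instead of by value.
print(struct.pack("<d", pos_zero) == struct.pack("<d", neg_zero))  # False
print(math.copysign(1.0, neg_zero))  # -1.0: the sign bit is still observable
```

So a min/max stat of -0.0 behaves like 0.0 under `<`, `<=`, `==`, etc., but any byte-level or sign-sensitive handling of the stats fields is where the proposed tests would catch regressions.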