[jira] [Resolved] (IMPALA-5152) Frontend requests metadata for one table at a time in the query

2018-02-21 Thread Alexander Behm (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm resolved IMPALA-5152.

   Resolution: Fixed
Fix Version/s: Impala 2.12.0

commit 8ea1ce87e2150c843b4da15f9d42b87006e6ffca
Author: Alex Behm 
Date:   Fri Apr 7 09:58:40 2017 -0700

IMPALA-5152: Introduce metadata loading phase

Reworks the collection and loading of missing metadata
when compiling a statement. Introduces a new
metadata-loading phase between parsing and analysis.
Summary of the new compilation flow:
1. Parse statement.
2. Collect all table references from the parsed
   statement and generate a list of tables that need
   to be loaded for analysis to succeed.
3. Request missing metadata and wait for it to arrive.
   As views become loaded we expand the set of required
   tables based on the view definitions.
   This step populates a statement-local table cache
   that contains all loaded tables relevant to the
   statement.
4. Create a new Analyzer with the table cache and
   analyze the statement. During analysis only the
   table cache is consulted for table metadata; the
   ImpaladCatalog is no longer used for that purpose.
5. Authorize the statement.
6. Plan generation as usual.
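
Steps 2 to 4 of the flow above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not Impala's Java frontend code: the `CATALOG` dict, `load_tables`, and `build_table_cache` are hypothetical stand-ins, and a view is modeled simply as an entry that lists the tables its definition references.

```python
# Hypothetical in-memory stand-in for the catalog: each entry lists the
# tables that a view's definition references (empty for base tables).
CATALOG = {
    "db.v": ["db.t1", "db.t2"],  # a view over two tables
    "db.t1": [],
    "db.t2": [],
    "db.t3": [],
}


def load_tables(names):
    """Pretend to request a batch of tables and wait for all of them."""
    return {name: CATALOG[name] for name in names if name in CATALOG}


def build_table_cache(table_refs):
    """Load every table needed for analysis, expanding views as they arrive.

    Returns the statement-local cache and the number of batched requests.
    """
    cache = {}
    pending = set(table_refs)
    requests = 0
    while pending:
        loaded = load_tables(pending)  # one request for the whole batch
        requests += 1
        cache.update(loaded)
        # Views that just loaded may reference tables not yet in the cache.
        pending = {
            ref
            for view_refs in loaded.values()
            for ref in view_refs
            if ref not in cache
        }
    return cache, requests


cache, requests = build_table_cache(["db.v", "db.t3"])
print(sorted(cache), requests)
```

In this sketch the statement needs two batched requests: one for the tables it names directly, and one for the tables that the view expands to. The number of requests grows with view nesting depth, not with the number of tables.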

The intent of the existing code was to collect all tables
missing metadata during analysis, load the metadata, and then
re-analyze the statement (and repeat those steps until all
metadata is loaded).
Unfortunately, the relevant code was hard-to-follow, subtle
and not well tested, and therefore it was broken in several
ways over the course of time. For example, the introduction
of path analysis for nested types subtly broke the intended
behavior, and there are other similar examples.

The serial table loading observed in the JIRA was caused by the
following code in the resolution of table references:
for (all path interpretations) {
  try {
    // Try to resolve the path; might call getTable() which
    // throws for nonexistent tables.
  } catch (AnalysisException e) {
    if (analyzer.hasMissingTbls()) throw e;
  }
}

The following example illustrates the problem:
SELECT * FROM a.b, x.y
When resolving the path "a.b" we consider that "a" could be a
database or a table. Similarly, "b" could be a table or a
nested collection.
If the path resolution for "a.b" adds a missing table entry,
then the path resolution for "x.y" could exit prematurely,
without trying the other path interpretations that would
lead to adding the expected missing table. So effectively,
the tables end up being loaded one-by-one.
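
The effect described above can be reproduced with a small simulation. This is a Python sketch, not the original Java code: `interpretations`, `get_table`, `resolve`, and `load_rounds` are hypothetical names, and it assumes that analysis continues with the next table reference after a failure so that missing tables can be collected. The `early_exit=False` variant exists only for contrast; the actual fix was the new metadata-loading phase.

```python
class AnalysisError(Exception):
    pass


EXISTING = {("a", "b"), ("x", "y")}  # tables that exist in the catalog


def interpretations(path):
    """Candidate (db, table) readings of a two-part path "p.q"."""
    first, second = path.split(".")
    # "p" may be a table in the default database (with "q" nested inside it),
    # or "p" may be a database that contains table "q".
    return [("default", first), (first, second)]


def get_table(name, loaded, missing):
    if name not in EXISTING:
        raise AnalysisError(f"no such table: {name}")
    if name not in loaded:
        missing.add(name)
        raise AnalysisError(f"table not loaded: {name}")
    return name


def resolve(path, loaded, missing, early_exit):
    for name in interpretations(path):
        try:
            return get_table(name, loaded, missing)
        except AnalysisError:
            # Old behavior: give up as soon as any table is known to be
            # missing, even one recorded while resolving a different path.
            if early_exit and missing:
                raise
    raise AnalysisError(f"could not resolve {path}")


def load_rounds(paths, early_exit):
    """Count how many analyze-then-load rounds are needed."""
    loaded, rounds = set(), 0
    while True:
        missing = set()
        for path in paths:
            try:
                resolve(path, loaded, missing, early_exit)
            except AnalysisError:
                pass  # assumed: keep going to collect more missing tables
        if not missing:
            return rounds
        loaded |= missing
        rounds += 1


print(load_rounds(["a.b", "x.y"], early_exit=True))   # 2 rounds
print(load_rounds(["a.b", "x.y"], early_exit=False))  # 1 round
```

With the early exit, resolving "x.y" stops at its first (wrong) interpretation because "a.b" is already recorded as missing, so each round discovers only one table.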

Testing:
- A core/hdfs run succeeded
- No new tests were added because the existing functional tests
  provide good coverage of various metadata loading scenarios.
- The issue reported in IMPALA-5152 is basically impossible now.
  Adding FE unit tests for that bug specifically would require
  ugly changes to the new code to enable such testing.

Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
Reviewed-on: http://gerrit.cloudera.org:8080/8958
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins


> Frontend requests metadata for one table at a time in the query 
> 
>
> Key: IMPALA-5152
> URL: https://issues.apache.org/jira/browse/IMPALA-5152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Alexander Behm
>Priority: Critical
>  Labels: Performance, frontend
> Fix For: Impala 2.12.0
>
>
> It appears that the Frontend serializes the loading of metadata for missing
> tables in a query; the Catalog log shows that the queue size is always 0.
> The query below references 9 tables, and metadata is loaded for one table
> at a time.
> {code}
> explain select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as 
> store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave 
> ,stddev_samp(ss_quantity) as store_sales_quantitystdev 
> ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov 
> ,count(sr_return_quantity) as store_returns_quantitycount 
> ,avg(sr_return_quantity) as store_returns_quantityave 
> ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev 
> ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount 
> ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity) as 
> 

[jira] [Created] (IMPALA-6563) test_compact_catalog_updates failing to connect client

2018-02-21 Thread Bikramjeet Vig (JIRA)
Bikramjeet Vig created IMPALA-6563:
--

 Summary: test_compact_catalog_updates failing to connect client
 Key: IMPALA-6563
 URL: https://issues.apache.org/jira/browse/IMPALA-6563
 Project: IMPALA
  Issue Type: Bug
Reporter: Bikramjeet Vig
 Fix For: Impala 2.12.0


test_compact_catalog_updates fails with 

{noformat}
custom_cluster/test_compact_catalog_updates.py:52: in 
test_compact_catalog_topic_updates
client1.close()
E   UnboundLocalError: local variable 'client1' referenced before assignment
{noformat}

The test first starts up a cluster and then tries to create a client. The logs
indicate that the impalads started without error, so I believe it's the client
that fails to connect.
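
One pattern that produces this kind of traceback is a client created inside a `try` block and closed unconditionally in `finally`. The Python sketch below is an assumption about the shape of the failure, not the actual test code: `FakeCluster`, `fragile_test`, and `robust_test` are hypothetical names.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster whose clients cannot connect."""

    def create_client(self):
        raise ConnectionError("could not connect")


def fragile_test(cluster):
    try:
        client1 = cluster.create_client()  # raises before client1 is bound
    finally:
        client1.close()  # UnboundLocalError hides the connection failure


def robust_test(cluster):
    client1 = None
    try:
        client1 = cluster.create_client()
    finally:
        if client1 is not None:  # only close what was actually created
            client1.close()


def outcome(test):
    """Return the name of the exception type that the test raises."""
    try:
        test(FakeCluster())
    except Exception as e:
        return type(e).__name__
    return "passed"


print(outcome(fragile_test), outcome(robust_test))
```

In the fragile variant the reported error is the `UnboundLocalError`, while the robust variant surfaces the underlying connection error.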

Tail of the INFO log from one of the impalads:
{noformat}
I0220 11:08:09.632342  4268 impala-server.cc:2041] Impala has started.
W0220 11:08:09.956959  4748 HiveConf.java:2886] HiveConf of name 
hive.access.conf.url does not exist
I0220 11:08:10.057112  4774 impala-server.cc:1754] Connection from client 
127.0.0.1:40735 closed, closing 1 associated session(s)
I0220 11:08:10.295311  4748 impala-server.cc:1363] Catalog topic update applied 
with version: 1131 new min catalog object version: 2
I0220 11:08:10.994792  4747 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994799  4746 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994946  4748 thrift-util.cc:123] TSocket::read() recv() Connection reset by peer
I0220 11:08:10.994967  4747 thrift-util.cc:123] TAcceptQueueServer client died: 
ECONNRESET
I0220 11:08:10.995034  4746 thrift-util.cc:123] TAcceptQueueServer client died: 
ECONNRESET
I0220 11:08:10.995067  4748 thrift-util.cc:123] TAcceptQueueServer client died: 
ECONNRESET
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-6561) metadata ops counter should not increase for Src table in CreateTableLike

2018-02-21 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6561:
---

 Summary: metadata ops counter should not increase for Src table in 
CreateTableLike
 Key: IMPALA-6561
 URL: https://issues.apache.org/jira/browse/IMPALA-6561
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Juan Yu


The metadata ops counter is incremented in getExistingTable(), so as a side
effect the catalog also increments the counter for the source table of
CreateTableLike.

http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1775





[jira] [Created] (IMPALA-6558) Show summary of catalog cache

2018-02-21 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6558:
---

 Summary: Show summary of catalog cache
 Key: IMPALA-6558
 URL: https://issues.apache.org/jira/browse/IMPALA-6558
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Juan Yu


Show a summary of the catalog cache, including:

- Which tables are completely cached
- Whether each entry is a table or a view
- When COMPUTE STATS was last run against each table





[jira] [Created] (IMPALA-6557) Show details of recent topic delta update

2018-02-21 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6557:
---

 Summary: Show details of recent topic delta update
 Key: IMPALA-6557
 URL: https://issues.apache.org/jira/browse/IMPALA-6557
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Juan Yu


Details of metadata topic delta updates are very useful for troubleshooting. 

E.g. the number and list of tables in recent topic updates help us know whether
many tables are being updated concurrently.

Are several large tables often updated together?

Is the catalog cache version much higher than the coordinator cache version?





[jira] [Created] (IMPALA-6556) Show in-flight DDLs and what tables have been loading on Catalog WebUI

2018-02-21 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6556:
---

 Summary: Show in-flight DDLs and what tables have been loading on 
Catalog WebUI
 Key: IMPALA-6556
 URL: https://issues.apache.org/jira/browse/IMPALA-6556
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Juan Yu


This helps users know how many DDLs are running and how many tables are being
loaded, so they can tell whether a query is hung or just waiting for metadata.





[jira] [Created] (IMPALA-6555) Clean up relationship between DiskIoMgr::min_buffer_size_ and BufferPool::min_buffer_len_

2018-02-21 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-6555:
-

 Summary: Clean up relationship between DiskIoMgr::min_buffer_size_ 
and BufferPool::min_buffer_len_
 Key: IMPALA-6555
 URL: https://issues.apache.org/jira/browse/IMPALA-6555
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong


They are always the same value in practice, obtained from --min_buffer_size. We 
should probably get rid of DiskIoMgr::min_buffer_size_ and fix up all 
references to it.





[jira] [Resolved] (IMPALA-6424) REFRESH right after invalidate metadata loads file metadata twice

2018-02-21 Thread Dimitris Tsirogiannis (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimitris Tsirogiannis resolved IMPALA-6424.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

Change-Id: Ie41a734493dcea0e36d6b051966f1d0302907dee
Reviewed-on: http://gerrit.cloudera.org:8080/9224
Reviewed-by: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 23 insertions(+), 5 deletions(-)

> REFRESH right after invalidate metadata  loads file metadata twice
> -
>
> Key: IMPALA-6424
> URL: https://issues.apache.org/jira/browse/IMPALA-6424
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Juan Yu
>Assignee: Dimitris Tsirogiannis
>Priority: Critical
> Fix For: Impala 2.12.0
>
>
> Compared with a normal REFRESH, a REFRESH right after INVALIDATE METADATA
> loads file metadata twice and takes 2x the time. The second refresh seems
> redundant.
> I0119 07:46:41.107390 26758 CatalogServiceCatalog.java:1518] Invalidating 
> table metadata: s3.catalog_sales
> I0119 07:46:43.002053 26309 catalog-server.cc:331] Publishing update : 
> TABLE:s3.catalog_sales@1166
> I0119 07:46:43.002068 26309 catalog-server.cc:331] Publishing update : 
> CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1166
> I0119 07:46:46.696725 26758 TableLoadingMgr.java:70] Loading metadata for 
> table: s3.catalog_sales
> I0119 07:46:46.696781 26758 TableLoadingMgr.java:72] Remaining items in 
> queue: 0. Loads in progress: 1
> I0119 07:46:46.696857 27023 TableLoader.java:58] Loading metadata for: 
> s3.catalog_sales
> I0119 07:46:46.713222 27023 HdfsTable.java:1206] Fetching partition metadata 
> from the Metastore: s3.catalog_sales
> I0119 07:46:46.905102 27023 HdfsTable.java:1210] Fetched partition metadata 
> from the Metastore: s3.catalog_sales
>  *I0119 07:46:46.939254 27023 HdfsTable.java:834] Loading file and block 
> metadata for 1837 paths for table s3.catalog_sales using a thread pool of 
> size 20*
> I0119 07:47:00.426975 27023 HdfsTable.java:874] Loaded file and block 
> metadata for s3.catalog_sales
> I0119 07:47:00.427062 27023 TableLoader.java:97] Loaded metadata for: 
> s3.catalog_sales
> I0119 07:47:00.427243 26758 CatalogServiceCatalog.java:1433] Refreshing table 
> metadata: s3.catalog_sales
> I0119 07:47:00.441572 26758 HdfsTable.java:1193] Incrementally loading table 
> metadata for: s3.catalog_sales
>  *I0119 07:47:00.456437 26758 HdfsTable.java:834] Loading file and block 
> metadata for 1837 paths for table s3.catalog_sales using a thread pool of 
> size 20*
> I0119 07:47:14.038097 26758 HdfsTable.java:874] Loaded file and block 
> metadata for s3.catalog_sales
> I0119 07:47:14.038132 26758 HdfsTable.java:1203] Incrementally loaded table 
> metadata for: s3.catalog_sales
> I0119 07:47:14.038179 26758 CatalogServiceCatalog.java:1456] Refreshed table 
> metadata: s3.catalog_sales
> I0119 07:47:14.062625 26309 catalog-server.cc:331] Publishing update : 
> TABLE:s3.catalog_sales@1168
> I0119 07:47:14.062645 26309 catalog-server.cc:331] Publishing update : 
> CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1168
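
The cost of the two file-metadata loads can be read off the timestamps of the two highlighted log lines and their matching "Loaded file and block metadata" lines. The Python sketch below does that arithmetic; the helper names `glog_time` and `elapsed` are hypothetical, and the log lines are shortened copies of the ones quoted above.

```python
from datetime import datetime


def glog_time(line):
    """Extract the time of day from a glog line such as 'I0119 07:46:46.939254 ...'."""
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


def elapsed(start_line, end_line):
    """Seconds between two glog lines logged on the same day."""
    return (glog_time(end_line) - glog_time(start_line)).total_seconds()


# Initial load triggered after the invalidate (thread 27023).
first = elapsed(
    "I0119 07:46:46.939254 27023 HdfsTable.java:834] Loading file and block",
    "I0119 07:47:00.426975 27023 HdfsTable.java:874] Loaded file and block",
)
# Incremental load performed by the REFRESH itself (thread 26758).
second = elapsed(
    "I0119 07:47:00.456437 26758 HdfsTable.java:834] Loading file and block",
    "I0119 07:47:14.038097 26758 HdfsTable.java:874] Loaded file and block",
)
print(round(first, 3), round(second, 3))
```

Both passes over the 1837 paths take roughly 13.5 seconds, which matches the reported 2x slowdown.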





[jira] [Created] (IMPALA-6551) Update TPCDS columns from DOUBLE to DECIMAL for Kudu

2018-02-21 Thread Grant Henke (JIRA)
Grant Henke created IMPALA-6551:
---

 Summary: Update TPCDS columns from DOUBLE to DECIMAL for Kudu
 Key: IMPALA-6551
 URL: https://issues.apache.org/jira/browse/IMPALA-6551
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Impala 2.12.0
Reporter: Grant Henke


Once the Kudu Decimal support patch is in (IMPALA-5752), we need to change some 
of the columns from DOUBLE to DECIMAL for Kudu for TPCDS and possibly TPCH. The 
expected results need to be updated as well. The expected results should be the 
same as for other storage types.





[jira] [Created] (IMPALA-6552) Add tests for Parquet stats filtering with +0/-0 edge cases

2018-02-21 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-6552:
-

 Summary: Add tests for Parquet stats filtering with +0/-0 edge 
cases
 Key: IMPALA-6552
 URL: https://issues.apache.org/jira/browse/IMPALA-6552
 Project: IMPALA
  Issue Type: Test
  Components: Backend
Reporter: Tim Armstrong


Related to IMPALA-6527, we should add test coverage for floating-point Parquet
stats that ensures +0 and -0 in stats fields are handled correctly. We're in the
clear right now since we just use regular comparison operators, which don't
distinguish between the two zeros.
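
The distinction can be illustrated in a few lines. This is a Python sketch of IEEE 754 behavior only, not Impala's backend code; `may_contain` is a hypothetical min/max filter used for illustration.

```python
import math
import struct

pos_zero, neg_zero = 0.0, -0.0

# Regular comparison treats the two zeros as equal.
equal = pos_zero == neg_zero

# The sign bit still differs...
signs = (math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))

# ...and so do the raw IEEE 754 encodings, so any logic that compares
# bytes instead of values would see two different numbers.
encodings_differ = struct.pack("<d", pos_zero) != struct.pack("<d", neg_zero)


def may_contain(value, stats_min, stats_max):
    """Hypothetical filter: keep the data if value lies within [min, max]."""
    return stats_min <= value <= stats_max


print(equal, signs, encodings_differ)
print(may_contain(-0.0, 0.0, 0.0), may_contain(0.0, -0.0, -0.0))
```

With value comparison, a lookup for -0 is not filtered out by stats of +0, and vice versa; a test would pin down that behavior in case the comparison is ever changed.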


