[jira] [Created] (IMPALA-8158) Use HS2 service to retrieve thrift profiles

2019-02-01 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-8158:
---

 Summary: Use HS2 service to retrieve thrift profiles
 Key: IMPALA-8158
 URL: https://issues.apache.org/jira/browse/IMPALA-8158
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Lars Volker
Assignee: Lars Volker


Once Impyla has been updated, we should retrieve Thrift profiles through HS2 
synchronously instead of scraping the debug web pages.

https://github.com/cloudera/impyla/issues/332
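
For illustration only (a sketch of the intended direction, assuming the updated Impyla exposes a profile accessor such as {{get_profile()}} on its HS2 cursor; the exact API is not confirmed here):

{code:python}
from impala.dbapi import connect

# Hypothetical usage: host/port and the get_profile() accessor are assumptions
# about the updated Impyla client, not a confirmed API.
conn = connect(host='localhost', port=21050)  # Impala's HS2 port
cursor = conn.cursor()
cursor.execute('SELECT count(*) FROM functional.alltypes')
cursor.fetchall()

# Retrieve the runtime profile for the query over HS2 instead of scraping the
# debug web pages.
profile = cursor.get_profile()
print(profile)
{code}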




[jira] [Resolved] (IMPALA-8142) ASAN build failure in query_test/test_nested_types.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8142.
-
   Resolution: Duplicate
 Assignee: Lars Volker  (was: Lenisha Gandhi)
Fix Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> ASAN build failure in query_test/test_nested_types.py
> -
>
> Key: IMPALA-8142
> URL: https://issues.apache.org/jira/browse/IMPALA-8142
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: asan, build-failure
> Fix For: Impala 3.2.0
>
>
> From the build log:
> {noformat}
> 05:23:33 === FAILURES 
> ===
> 05:23:33  TestNestedTypes.test_subplan[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 05:23:33 [gw7] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:23:33 query_test/test_nested_types.py:77: in test_subplan
> 05:23:33 self.run_test_case('QueryTest/nested-types-subplan', vector)
> 05:23:33 common/impala_test_suite.py:472: in run_test_case
> 05:23:33 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 05:23:33 common/impala_test_suite.py:699: in __execute_query
> 05:23:33 return impalad_client.execute(query, user=user)
> 05:23:33 common/impala_connection.py:174: in execute
> 05:23:33 return self.__beeswax_client.execute(sql_stmt, user=user)
> 05:23:33 beeswax/impala_beeswax.py:200: in execute
> 05:23:33 result = self.fetch_results(query_string, handle)
> 05:23:33 beeswax/impala_beeswax.py:445: in fetch_results
> 05:23:33 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:23:33 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:23:33 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:23:33 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:23:33 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:23:33 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:23:33 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:23:33 E   MESSAGE: TSocket read 0 bytes
> {noformat}
> From {{impalad.ERROR}}:
> {noformat}
> SUMMARY: AddressSanitizer: use-after-poison 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/tuple.h:241:13
>  in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) const
> ...
> ==119152==ABORTING
> {noformat}




[jira] [Updated] (IMPALA-8142) ASAN build failure in query_test/test_nested_types.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8142:

Affects Version/s: Impala 3.2.0

> ASAN build failure in query_test/test_nested_types.py
> -
>
> Key: IMPALA-8142
> URL: https://issues.apache.org/jira/browse/IMPALA-8142
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lenisha Gandhi
>Priority: Blocker
>  Labels: asan, build-failure
> Fix For: Impala 3.1.0
>
>
> From the build log:
> {noformat}
> 05:23:33 === FAILURES 
> ===
> 05:23:33  TestNestedTypes.test_subplan[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 05:23:33 [gw7] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:23:33 query_test/test_nested_types.py:77: in test_subplan
> 05:23:33 self.run_test_case('QueryTest/nested-types-subplan', vector)
> 05:23:33 common/impala_test_suite.py:472: in run_test_case
> 05:23:33 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 05:23:33 common/impala_test_suite.py:699: in __execute_query
> 05:23:33 return impalad_client.execute(query, user=user)
> 05:23:33 common/impala_connection.py:174: in execute
> 05:23:33 return self.__beeswax_client.execute(sql_stmt, user=user)
> 05:23:33 beeswax/impala_beeswax.py:200: in execute
> 05:23:33 result = self.fetch_results(query_string, handle)
> 05:23:33 beeswax/impala_beeswax.py:445: in fetch_results
> 05:23:33 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:23:33 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:23:33 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:23:33 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:23:33 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:23:33 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:23:33 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:23:33 E   MESSAGE: TSocket read 0 bytes
> {noformat}
> From {{impalad.ERROR}}:
> {noformat}
> SUMMARY: AddressSanitizer: use-after-poison 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/tuple.h:241:13
>  in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) const
> ...
> ==119152==ABORTING
> {noformat}






[jira] [Updated] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7961:
--
Description: 
When catalog server is under heavy load with concurrent updates to objects, 
queries with SYNC_DDL can fail with the following message.

*User facing error message:*
{noformat}
ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
SYNC_DDL operation after 3 attempts.The operation has been successfully 
executed but its effects may have not been broadcast to all the coordinators.
{noformat}
*Exception from the catalog server log:*
{noformat}
I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 1088
I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 12625
I1031 00:00:49.168851 1131986 jni-util.cc:230] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 3 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)

{noformat}
*What this means*

The Catalog operation is actually successful (the change has been committed to 
HMS and Catalog server cache) but the Catalog server noticed that it is taking 
longer than expected time for it to broadcast the changes (for whatever reason) 
and instead of hanging in there, it fails fast. The coordinators are expected 
to eventually sync up in the background.

*Problem*
 - This violates the contract of the SYNC_DDL query option since the query 
returns early.
 - This is a behavioral regression from pre IMPALA-5058 state where the queries 
would wait forever for SYNC_DDL based changes to propagate.

*Notes*
 - Introduced by IMPALA-5058
 - Based on the occurrences of this issue, we narrowed it down to a specific 
kind of DDLs (see Jira comments).
 - My understanding is that this also applies to the Catalog V2 (or 
LocalCatalog mode) since we still rely on the CatalogServer for DDL 
orchestration and hence it takes this codepath.

  was:
When catalog server is under heavy load with concurrent updates to objects, 
queries with SYNC_DDL can fail with the following message.

*User facing error message:*
{noformat}
ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
SYNC_DDL operation after 3 attempts.The operation has been successfully 
executed but its effects may have not been broadcast to all the coordinators.
{noformat}
*Exception from the catalog server log:*
{noformat}
I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 1088
I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 12625
I1031 00:00:49.168851 1131986 jni-util.cc:230] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 3 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)

{noformat}
*What this means*

The Catalog operation is actually successful (the change has been committed to 
HMS and Catalog server cache) but the Catalog server noticed that it is taking 
longer than expected time for it to broadcast the changes (for whatever reason) 
and instead of hanging in there, it fails fast. The coordinators are expected 
to eventually sync up in the background.

*Problem*
 - This violates the contract of the SYNC_DDL query option since the query 
returns early.
 - This is a behavioral regression from pre IMPALA-5058 state where the queries 
would wait forever for SYNC_DDL based changes to propagate.

*Notes*
 - Usual suspect here is heavily concurrent catalog operations with long 
running DDLs.
 - Introduced by IMPALA-5058
 - My understanding is that this also applies to the Catalog V2 (or 
LocalCatalog mode) since we still rely on the CatalogServer for DDL 
orchestration and hence it takes this codepath.

Please refer to the jira comment for technical explanation as to why this is 
happening (to be updated).

[jira] [Commented] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758860#comment-16758860
 ] 

bharath v commented on IMPALA-7961:
---

After digging into this a bit more, the issue here seems to be a race in the "IF 
NOT EXISTS" + sync_ddl=true combination. For example: "create table if not exists 
foo(..)" where the *table foo already exists* in the catalog server.

In such a case we do the following:
{noformat}
Table existingTbl = catalog_.getTableNoThrow(tableName.getDb(), tableName.getTbl());
if (params.if_not_exists && existingTbl != null) {
  addSummary(response, "Table already exists.");
  LOG.trace(String.format("Skipping table creation because %s already exists and " +
      "IF NOT EXISTS was specified.", tableName));
  existingTbl.getLock().lock();
  try {
    addTableToCatalogUpdate(existingTbl, response.getResult());
    return false;
  } finally {
    existingTbl.getLock().unlock();
  }
}
{noformat}
We add the *{{existingTbl}}* to the DDL response and wait on a topic update 
covering its version.
{noformat}
if (req.isSync_ddl()) {
  resp.getResult().setVersion(catalog_.waitForSyncDdlVersion(resp.getResult()));
}

public long waitForSyncDdlVersion(TCatalogUpdateResult result) throws
    CatalogException {
  ...
  long topicVersionForUpdates =
      getCoveringTopicUpdateVersion(result.getUpdated_catalog_objects());
  ...
{noformat}
The catch here is that since this "existingTbl" could have been unchanged for a 
while (no version bumps), its topic entry could be older than 
TOPIC_UPDATE_LOG_GC_FREQUENCY and potentially GC'ed, in which case, unless there 
is a version bump on this table, it wouldn't be added again to the 
{{topicUpdateLog_}}. This means that {{waitForSyncDdlVersion()}} would loop until 
it exhausts its retries, as nothing would add the table to the log unless it is 
modified. I could reproduce this with aggressive topicUpdateLog_ GCs using the 
attached patch.

"create table if not exists" is a specific example, and it is quite possible 
that there are other manifestations of this issue with the same theme: adding 
objects to DDL responses without adding them to the topicUpdateLog_.
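
For illustration only (a minimal Python sketch of the race described above, with made-up names and constants standing in for TOPIC_UPDATE_LOG_GC_FREQUENCY, the topicUpdateLog_, and the retry loop in waitForSyncDdlVersion(); this is not the actual Impala code):

{code:python}
# Stand-ins for the catalog topic update log and the SYNC_DDL wait loop.
TOPIC_UPDATE_LOG_GC_FREQUENCY = 5   # keep entries for the last 5 topic updates
MAX_SYNC_DDL_RETRIES = 3

topic_update_log = {}       # table name -> topic version of its last update
current_topic_version = 0

def process_topic_update(modified_tables):
    """One topic update: bump the version, record modified tables, GC old entries."""
    global current_topic_version
    current_topic_version += 1
    for tbl in modified_tables:
        topic_update_log[tbl] = current_topic_version
    for tbl, version in list(topic_update_log.items()):
        if version <= current_topic_version - TOPIC_UPDATE_LOG_GC_FREQUENCY:
            # Entry GC'ed; nothing re-adds it unless the table is modified again.
            del topic_update_log[tbl]

def wait_for_sync_ddl_version(table):
    """SYNC_DDL wait: look for a topic update log entry covering the table."""
    for attempt in range(MAX_SYNC_DDL_RETRIES):
        if table in topic_update_log:
            return topic_update_log[table]
        # The real server sleeps and waits for more topic updates; here other
        # (unrelated) updates keep arriving but never touch 'foo'.
        process_topic_update(["some_other_table_%d" % attempt])
    raise RuntimeError("Couldn't retrieve the catalog topic version for the "
                       "SYNC_DDL operation after %d attempts" % MAX_SYNC_DDL_RETRIES)

# 'foo' was created long ago and then left unmodified while other DDLs churn.
process_topic_update(["foo"])
for i in range(10):
    process_topic_update(["busy_table_%d" % i])   # heavy concurrent catalog load

# "CREATE TABLE IF NOT EXISTS foo" finds the existing table, adds it to the DDL
# response, and waits -- but foo's log entry was already GC'ed, so this raises.
wait_for_sync_ddl_version("foo")
{code}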

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail 
> fast
> ---
>
> Key: IMPALA-7961
> URL: https://issues.apache.org/jira/browse/IMPALA-7961
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
>
> When catalog server is under heavy load with concurrent updates to objects, 
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
> SYNC_DDL operation after 3 attempts.The operation has been successfully 
> executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 3 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> 
> {noformat}
> *What this means*
> The Catalog operation is actually successful (the change has been committed 
> to HMS and Catalog server cache) but the Catalog server noticed that it is 
> taking longer than expected time for it to broadcast the changes (for 
> whatever reason) and instead of hanging in there, it fails fast. The 
> coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query 
> returns early.
>  - This is a behavioral regression from pre IMPALA-5058 state where the 
> queries would wait forever for SYNC_DDL based changes to propagate.
> 

[jira] [Updated] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7961:
--
Attachment: 0001-Repro-of-IMPALA-7961.patch

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail 
> fast
> ---
>
> Key: IMPALA-7961
> URL: https://issues.apache.org/jira/browse/IMPALA-7961
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
> Attachments: 0001-Repro-of-IMPALA-7961.patch
>
>
> When catalog server is under heavy load with concurrent updates to objects, 
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
> SYNC_DDL operation after 3 attempts.The operation has been successfully 
> executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 3 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> 
> {noformat}
> *What this means*
> The Catalog operation is actually successful (the change has been committed 
> to HMS and Catalog server cache) but the Catalog server noticed that it is 
> taking longer than expected time for it to broadcast the changes (for 
> whatever reason) and instead of hanging in there, it fails fast. The 
> coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query 
> returns early.
>  - This is a behavioral regression from pre IMPALA-5058 state where the 
> queries would wait forever for SYNC_DDL based changes to propagate.
> *Notes*
>  - Usual suspect here is heavily concurrent catalog operations with long 
> running DDLs.
>  - Introduced by IMPALA-5058
>  - My understanding is that this also applies to the Catalog V2 (or 
> LocalCatalog mode) since we still rely on the CatalogServer for DDL 
> orchestration and hence it takes this codepath.
> Please refer to the jira comment for technical explanation as to why this is 
> happening (to be updated).






[jira] [Updated] (IMPALA-8157) Log exceptions from the front end

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8157:

Summary: Log exceptions from the front end  (was: Log exceptions from the 
FrontEnd)

> Log exceptions from the front end
> -
>
> Key: IMPALA-8157
> URL: https://issues.apache.org/jira/browse/IMPALA-8157
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Fang-Yu Rao
>Priority: Minor
>
> The BE calls into the FE for a variety of operations. Each of these may fail 
> in expected ways (invalid query, say) or unexpected ways (a code change 
> introduces a null pointer exception.)
> At present, the BE logs only the exception, and only at the INFO level. This 
> ticket asks to log all unexpected exceptions at the ERROR level. The basic 
> idea is to extend all FE entry points to do:
> {code:java}
> try {
>   // Do the operation
> } catch (ExpectedException e) {
>   // Don't log expected exceptions
>   throw e;
> } catch (Throwable e) {
>   LOG.error("Something went wrong", e);
>   throw e;
> }
> {code}
> The above code logs all exceptions except for those that are considered 
> expected. The job of this ticket is to:
> * Find all the entry points
> * Identify which, if any, exceptions are expected
> * Add logging code with an error message that identifies the operation
> This pattern was tested ad-hoc to find a bug during development and seems to 
> work fine. As a result, the change is mostly a matter of the above three 
> steps.






[jira] [Work started] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8156 started by Paul Rogers.
---
> Add format options to the EXPLAIN statement
> ---
>
> Key: IMPALA-8156
> URL: https://issues.apache.org/jira/browse/IMPALA-8156
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The EXPLAIN statement is very basic:
> {code:sql}
> EXPLAIN ;
> {code}
> Example:
> {code:sql}
> EXPLAIN SELECT * FROM alltypes;
> {code}
> Explain does provide some options set as session options:
> {code:sql}
> SET explain_level=extended;
> EXPLAIN ;
> {code}
> We have often found the need for additional information. For example, it 
> would be very useful to obtain the SELECT statement after view substitution.
> We wish to extend EXPLAIN to allow additional options, while retaining full 
> backward compatibility. The extended syntax is:
> {code:sql}
> EXPLAIN [FORMAT([opt(, opt)*])] ;
> {code}
> This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
> options to be added in the future without the need to define new keywords.
> Options are in the {{name=value}} form with {{name}} as an identifier and 
> {{value}} as a string literal. Both are case-insensitive. Example to set the 
> explain level:
> {code:sql}
> EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
> {code}
> The two options supported at present are:
> * {{level}} - Sets the explain level.
> * {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.
> The {{level}} option overrides the existing session options. If {{level}} is 
> not present, then the session option is used instead. Values are identical to 
> those accepted by {{SET explain_level}}.
> The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
> {{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
> Example:
> {noformat}
> functional> explain format(rewritten) SELECT * FROM view_view;
> ++
> | Explain String |
> ++
> | SELECT * FROM /* functional.view_view */ ( |
> | SELECT * FROM /* functional.alltypes_view */ ( |
> | SELECT * FROM functional.alltypes) |
> | )  |
> ++
> {noformat}
> Here, the names in comments are the view names. Views are then expanded 
> inline to show the full extent of the statement. This is very helpful for 
> resolving user issues.
> h4. Comparison with Other SQL Dialects
> The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a 
> vendor extension. MySQL defines {{EXPLAIN}} as:
> {noformat}
> {EXPLAIN | DESCRIBE | DESC}
> [explain_type]
> {explainable_stmt | FOR CONNECTION connection_id}
> explain_type: {
> FORMAT = format_name
> }
> format_name: {
> TRADITIONAL
>   | JSON
> }
> {noformat}
> That is, MySQL also uses the {{FORMAT}} keyword with only two choices.
> SqlServer uses a form much like Impala's present form with no options.
> Postgres uses options and keywords:
> {noformat}
> EXPLAIN [ ( option [, ...] ) ] statement
> EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
> where option can be one of:
> ANALYZE [ boolean ]
> VERBOSE [ boolean ]
> COSTS [ boolean ]
> BUFFERS [ boolean ]
> FORMAT { TEXT | XML | JSON | YAML }
> {noformat}
> Apache Drill uses a series of keywords to express options:
> {noformat}
> explain plan [ including all attributes ]
>  [ with implementation | without implementation ]
>  for  ;
> {noformat}
> We claim that, given the wide variety of vendor implementations, the proposed 
> Impala syntax is reasonable.
> h4. Futures
> IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose 
> to select JSON output using the "format" option:
> {code:sql}
> EXPLAIN FORMAT(format='json') 
> {code}
> The format can be combined with other options, such as level:
> {code:sql}
> EXPLAIN FORMAT(format='json', level='extended') 
> {code}
> h4. Details
> The key/value syntax is very general, but cumbersome for simple tasks. The 
> {{FORMAT}} option allows a number of simplifications.
> First, for the explain level, each level can be used as a Boolean option:
> {code:sql}
> EXPLAIN FORMAT(extended='true') 
> {code}
> Second, for Boolean options, the value is optional and "true" is assumed:
> {code:sql}
> EXPLAIN FORMAT(EXTENDED) 
> {code}
> Third, if only a value is given, the value is assumed to be for the "format" 
> key (which is not yet supported):
> {code:sql}
> EXPLAIN FORMAT('json') 
> {code}
> 

[jira] [Created] (IMPALA-8157) Log exceptions from the FrontEnd

2019-02-01 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8157:
---

 Summary: Log exceptions from the FrontEnd
 Key: IMPALA-8157
 URL: https://issues.apache.org/jira/browse/IMPALA-8157
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Fang-Yu Rao


The BE calls into the FE for a variety of operations. Each of these may fail in 
expected ways (invalid query, say) or unexpected ways (a code change introduces 
a null pointer exception.)

At present, the BE logs only the exception, and only at the INFO level. This 
ticket asks to log all unexpected exceptions at the ERROR level. The basic idea 
is to extend all FE entry points to do:

{code:java}
try {
  // Do the operation
} catch (ExpectedException e) {
  // Don't log expected exceptions
  throw e;
} catch (Throwable e) {
  LOG.error("Something went wrong", e);
  throw e;
}
{code}

The above code logs all exceptions except for those that are considered 
expected. The job of this ticket is to:

* Find all the entry points
* Identify which, if any, exceptions are expected
* Add logging code with an error message that identifies the operation

This pattern was tested ad-hoc to find a bug during development and seems to 
work fine. As a result, the change is mostly a matter of the above three steps.





[jira] [Created] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8156:
---

 Summary: Add format options to the EXPLAIN statement
 Key: IMPALA-8156
 URL: https://issues.apache.org/jira/browse/IMPALA-8156
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Paul Rogers


The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

We wish to extend EXPLAIN to allow additional options, while retaining full 
backward compatibility. The extended syntax is:

{code:sql}
EXPLAIN [FORMAT([opt(, opt)*])] ;
{code}

This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
options to be added in the future without the need to define new keywords.

Options are in the {{name=value}} form with {{name}} as an identifier and 
{{value}} as a string literal. Both are case-insensitive. Example to set the 
explain level:

{code:sql}
EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
{code}

The two options supported at present are:

* {{level}} - Sets the explain level.
* {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.

The {{level}} option overrides the existing session options. If {{level}} is 
not present, then the session option is used instead.

The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
{{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
Example:

{noformat}
functional> explain format(rewritten) SELECT * FROM view_view;

++
| Explain String |
++
| SELECT * FROM /* functional.view_view */ ( |
| SELECT * FROM /* functional.alltypes_view */ ( |
| SELECT * FROM functional.alltypes) |
| )  |
++
{noformat}

Here, the names in comments are the view names. Views are then expanded inline 
to show the full extent of the statement. This is very helpful for resolving user 
issues.

h4. Comparison with Other SQL Dialects

The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a vendor 
extension. MySQL defines {{EXPLAIN}} as:

{noformat}
{EXPLAIN | DESCRIBE | DESC}
[explain_type]
{explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
FORMAT = format_name
}

format_name: {
TRADITIONAL
  | JSON
}
{noformat}

That is, MySQL also uses the {{FORMAT}} keyword with only two choices.

SqlServer uses a form much like Impala's present form with no options.

Postgres uses options and keywords:

{noformat}
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
{noformat}

Apache Drill uses a series of keywords to express options:

{noformat}
explain plan [ including all attributes ]
 [ with implementation | without implementation ]
 for  ;
{noformat}

We claim that, given the wide variety of vendor implementations, the proposed 
Impala syntax is reasonable.

h4. Futures

IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose to 
select JSON output using the "format" option:

{code:sql}
EXPLAIN FORMAT(format='json') 
{code}

The format can be combined with other options, such as level:

{code:sql}
EXPLAIN FORMAT(format='json', level='extended') 
{code}

h4. Details

The key/value syntax is very general, but cumbersome for simple tasks. The 
{{FORMAT}} option allows a number of simplifications.

First, for the explain level, each level can be used as a Boolean option:

{code:sql}
EXPLAIN FORMAT(extended='true') 
{code}

Second, for Boolean options, the value is optional and "true" is assumed:

{code:sql}
EXPLAIN FORMAT(EXTENDED) 
{code}

Third, if only a value is given, the value is assumed to be for the "format" 
key (which is not yet supported):

{code:sql}
EXPLAIN FORMAT('json') 
{code}

This would, when JSON format is available, emit the plan as JSON.

The astute reader will see opportunities for odd combinations of options. 
Rather than enforcing a strict set of rules, when given an odd combination of 
options, the {{FORMAT}} option simply does something reasonable. Example:

{code:sql}
EXPLAIN FORMAT(level='standard', extended, verbose='false') 
{code}

The short answer here is that when options are ambiguous, the behavior is 
undefined but reasonable.
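
For illustration only (a hypothetical Python sketch of the normalization rules described above, not the actual Impala frontend code; option and level names are assumptions):

{code:python}
# Normalize FORMAT(...) options: name=value pairs, bare level names, bare
# Boolean names (true assumed), and a bare string literal for the "format" key.
EXPLAIN_LEVELS = {"minimal", "standard", "extended", "verbose"}

def normalize_format_options(tokens):
    """tokens are the raw comma-separated items inside FORMAT(...), as strings."""
    options = {}
    for tok in tokens:
        tok = tok.strip()
        if "=" in tok:                     # name=value form
            name, value = tok.split("=", 1)
            options[name.strip().lower()] = value.strip().strip("'").lower()
        elif tok.startswith("'"):          # bare string literal -> "format" key
            options["format"] = tok.strip("'").lower()
        else:                              # bare identifier -> Boolean, true assumed
            options[tok.lower()] = "true"
    # A level given as a Boolean option (e.g. EXTENDED) becomes the level value.
    for level in EXPLAIN_LEVELS:
        if options.pop(level, "false") == "true":
            options.setdefault("level", level)
    return options

print(normalize_format_options(["level=extended"]))   # {'level': 'extended'}
print(normalize_format_options(["EXTENDED"]))         # {'level': 'extended'}
print(normalize_format_options(["'json'"]))           # {'format': 'json'}
print(normalize_format_options(["level='standard'", "extended", "verbose='false'"]))
# {'level': 'standard'} -- an ambiguous combination resolves to something reasonable
{code}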




[jira] [Updated] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8156:

Description: 
The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

We wish to extend EXPLAIN to allow additional options, while retaining full 
backward compatibility. The extended syntax is:

{code:sql}
EXPLAIN [FORMAT([opt(, opt)*])] ;
{code}

This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
options to be added in the future without the need to define new keywords.

Options are in the {{name=value}} form with {{name}} as an identifier and 
{{value}} as a string literal. Both are case-insensitive. Example to set the 
explain level:

{code:sql}
EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
{code}

The two options supported at present are:

* {{level}} - Sets the explain level.
* {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.

The {{level}} option overrides the existing session options. If {{level}} is 
not present, then the session option is used instead. Values are identical to 
those accepted by {{SET explain_level}}.

The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
{{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
Example:

{noformat}
functional> explain format(rewritten) SELECT * FROM view_view;

++
| Explain String |
++
| SELECT * FROM /* functional.view_view */ ( |
| SELECT * FROM /* functional.alltypes_view */ ( |
| SELECT * FROM functional.alltypes) |
| )  |
++
{noformat}

Here, the names in comments are the view names. Views are then expanded inline 
to show the full extent of the statement. This is very helpful for resolving user 
issues.

h4. Comparison with Other SQL Dialects

The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a vendor 
extension. MySQL defines {{EXPLAIN}} as:

{noformat}
{EXPLAIN | DESCRIBE | DESC}
[explain_type]
{explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
FORMAT = format_name
}

format_name: {
TRADITIONAL
  | JSON
}
{noformat}

That is, MySQL also uses the {{FORMAT}} keyword with only two choices.

SqlServer uses a form much like Impala's present form with no options.

Postgres uses options and keywords:

{noformat}
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
{noformat}

Apache Drill uses a series of keywords to express options:

{noformat}
explain plan [ including all attributes ]
 [ with implementation | without implementation ]
 for  ;
{noformat}

We claim that, given the wide variety of vendor implementations, the proposed 
Impala syntax is reasonable.

h4. Futures

IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose to 
select JSON output using the "format" option:

{code:sql}
EXPLAIN FORMAT(format='json') 
{code}

The format can be combined with other options, such as level:

{code:sql}
EXPLAIN FORMAT(format='json', level='extended') 
{code}

h4. Details

The key/value syntax is very general, but cumbersome for simple tasks. The 
{{FORMAT}} option allows a number of simplifications.

First, for the explain level, each level can be used as a Boolean option:

{code:sql}
EXPLAIN FORMAT(extended='true') 
{code}

Second, for Boolean options, the value is optional and "true" is assumed:

{code:sql}
EXPLAIN FORMAT(EXTENDED) 
{code}

Third, if only a value is given, the value is assumed to be for the "format" 
key (which is not yet supported):

{code:sql}
EXPLAIN FORMAT('json') 
{code}

This would, when JSON format is available, emit the plan as JSON.

The astute reader will see opportunities for odd combinations of options. 
Rather than enforcing a strict set of rules, when given an odd combination of 
options, the {{FORMAT}} option simply does something reasonable. Example:

{code:sql}
EXPLAIN FORMAT(level='standard', extended, verbose='false') 
{code}

The short answer here is that when options are ambiguous, the behavior is 
undefined but reasonable.

  was:
The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

[jira] [Created] (IMPALA-8155) Switch to Impala-lzo/2.x for Impala-2.x

2019-02-01 Thread Quanlong Huang (JIRA)
Quanlong Huang created IMPALA-8155:
--

 Summary: Switch to Impala-lzo/2.x for Impala-2.x
 Key: IMPALA-8155
 URL: https://issues.apache.org/jira/browse/IMPALA-8155
 Project: IMPALA
  Issue Type: Task
Reporter: Quanlong Huang


Impala-2.x is currently built against the Cloudera/Impala-lzo master branch. As 
that branch is updated, builds of Impala-2.x will fail. We need to switch to a 
branch that points to the original commit.






[jira] [Updated] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6479:
---
Fix Version/s: Impala 2.1

> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 2.1, Impala 3.0
>
>
> Currently if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and the DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has SELECT on a 
> subset of columns, it will only show the data for the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only the 
> columns they have access to.  There is also a security aspect to this fix in 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  






[jira] [Updated] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6479:
---
Fix Version/s: (was: Impala 2.1)

> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 3.0
>
>
> Currently if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and the DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has SELECT on a 
> subset of columns, it will only show the data for the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only the 
> columns they have access to.  There is also a security aspect to this fix in 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  






[jira] [Created] (IMPALA-8154) Disable auth_to_local by default

2019-02-01 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8154:
--

 Summary: Disable auth_to_local by default
 Key: IMPALA-8154
 URL: https://issues.apache.org/jira/browse/IMPALA-8154
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.1.0, Impala 2.12.0
Reporter: Michael Ho
Assignee: Michael Ho


Before KRPC, the local name mapping was derived entirely from the principal 
name. When KRPC is enabled, Impala starts to use the system auth_to_local rules, 
because "use_system_auth_to_local" is enabled by default. This can cause a 
regression in cases where localauth is configured in krb5.conf, and may break 
connections between impalad daemons after [this 
commit|https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d].

The fix is to disable use_system_auth_to_local by default.





[jira] [Commented] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-01 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758629#comment-16758629
 ] 

Pooja Nilangekar commented on IMPALA-8151:
--

I agree. I believe it would make sense to use sizeof() for all other datatypes 
as well, since datatypes like TIMESTAMP may be modified in the future. Or would 
that be too much overhead?

> HiveUdfCall assumes StringValue is 16 bytes
> ---
>
> Key: IMPALA-8151
> URL: https://issues.apache.org/jira/browse/IMPALA-8151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: crash
>
> HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
> {code}
>   switch (GetChild(i)->type().type) {
> case TYPE_BOOLEAN:
> case TYPE_TINYINT:
>   // Using explicit sizes helps the compiler unroll memcpy
>   memcpy(input_ptr, v, 1);
>   break;
> case TYPE_SMALLINT:
>   memcpy(input_ptr, v, 2);
>   break;
> case TYPE_INT:
> case TYPE_FLOAT:
>   memcpy(input_ptr, v, 4);
>   break;
> case TYPE_BIGINT:
> case TYPE_DOUBLE:
>   memcpy(input_ptr, v, 8);
>   break;
> case TYPE_TIMESTAMP:
> case TYPE_STRING:
> case TYPE_VARCHAR:
>   memcpy(input_ptr, v, 16);
>   break;
> default:
>   DCHECK(false) << "NYI";
>   }
> {code}
> STRING and VARCHAR were only 16 bytes because of padding. This padding is 
> removed by IMPALA-7367, so this will read past the end of the actual value. 
> This could in theory lead to a crash.
> We need to change the value, but we should probably also switch to 
> sizeof(StringValue) so that it doesn't get broken by similar changes in 
> future.
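
For illustration only (a Python/ctypes sketch of the padding point above, not the Impala backend code): with default alignment a pointer-plus-length struct like StringValue rounds up to 16 bytes, but with the padding removed its real size is 12, so a hardcoded 16-byte memcpy reads past the end of the value.

{code:python}
import ctypes

class PaddedStringValue(ctypes.Structure):
    # 8-byte pointer + 4-byte length + 4 bytes of tail padding = 16 bytes
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int)]

class PackedStringValue(ctypes.Structure):
    # Same fields with padding removed (roughly what IMPALA-7367 does)
    _pack_ = 1
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int)]

print(ctypes.sizeof(PaddedStringValue))  # 16 on a 64-bit platform
print(ctypes.sizeof(PackedStringValue))  # 12 -> a hardcoded memcpy of 16 overreads
{code}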






[jira] [Commented] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Andrew Sherman (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758628#comment-16758628
 ] 

Andrew Sherman commented on IMPALA-6263:


I will take a look, assigning to me

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR:
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> The minidump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-6263:
--

Assignee: Andrew Sherman

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR: 
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> Mini dump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-02-01 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8137.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, the data is first sorted locally by each Impala 
> daemon and then streamed to the coordinator, which merges the sorted result sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758613#comment-16758613
 ] 

Michael Ho commented on IMPALA-6263:


Yet another instance of this was seen recently. Again, this is a run with 
thrift-0.11. [~asherman], are you interested in looking into it further?

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR: 
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> Mini dump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8153) Impala Doc: Add a section on Admission Debug page to Web UI doc

2019-02-01 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8153:
---

 Summary: Impala Doc: Add a section on Admission Debug page to Web 
UI doc
 Key: IMPALA-8153
 URL: https://issues.apache.org/jira/browse/IMPALA-8153
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Affects Versions: Impala 3.2.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758592#comment-16758592
 ] 

ASF subversion and git services commented on IMPALA-8137:
-

Commit 6291d6063fe4ff9c483b60b8d9fc254298a51473 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6291d60 ]

IMPALA-8137: [DOCS] Order By does not happens on one node

Change-Id: If8d7bf26fffaf93982e67f8bc8f37742c81fda39
Reviewed-on: http://gerrit.cloudera.org:8080/12330
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 


> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, the data is first sorted locally by each Impala 
> daemon and then streamed to the coordinator, which merges the sorted result sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7934) Switch to using Java 8's Base64 impl for incremental stats encoding

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758590#comment-16758590
 ] 

ASF subversion and git services commented on IMPALA-7934:
-

Commit b0942296ab5f24660473abc218d45978fc402d81 in impala's branch 
refs/heads/master from Fredy Wijaya
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b094229 ]

IMPALA-7934: Switch to java.util.Base64 implementation

It is shown that java.util.Base64 implementation seems to have better
performance compared to Apache Commons Codec's Base64 implementation,
which can benefit operations, such as incremental stats. This patch
switches the implementation of Base64 from Apache Commons
Codec to java.util.Base64 implementation.

This is the JMH benchmark result comparing java.util.Base64 vs Commons
Codec's Base64:

Result "base64.Base64Benchmark.javaBase64":
  31.149 ±(99.9%) 1.567 ms/op [Average]
  (min, avg, max) = (27.564, 31.149, 34.675), stdev = 2.091
  CI (99.9%): [29.583, 32.716] (assumes normal distribution)

Result "base64.Base64Benchmark.codecBase64":
  65.921 ±(99.9%) 4.762 ms/op [Average]
  (min, avg, max) = (58.072, 65.921, 80.470), stdev = 6.357
  CI (99.9%): [61.159, 70.683] (assumes normal distribution)

Benchmark                    Mode  Cnt   Score   Error Units
Base64Benchmark.javaBase64   avgt   25  31.149 ± 1.567 ms/op
Base64Benchmark.codecBase64  avgt   25  65.921 ± 4.762 ms/op

Testing:
- Ran all FE tests
- Created a table with incremental stats without a patch and read it
  with the patch

Change-Id: I2d43d4a4f073a800d963ce4c77f21c9efa8471ac
Reviewed-on: http://gerrit.cloudera.org:8080/12250
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Switch to using Java 8's Base64 impl for incremental stats encoding
> ---
>
> Key: IMPALA-7934
> URL: https://issues.apache.org/jira/browse/IMPALA-7934
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: Fredy Wijaya
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 3.2.0
>
> Attachments: base64.png
>
>
> Incremental stats are compressed and Base64 encoded before they are chunked 
> and written to the HMS' partition parameters map. When they are read back, we 
> need to Base64 decode and decompress. 
> For certain incremental-stats-heavy tables, we noticed that a significant 
> amount of time is spent in these Base64 classes (see the attached image for 
> the stack; unfortunately, I don't have a text version of it).
> Java 8 comes with its own Base64 implementation that has shown much better 
> performance [1] compared to Apache Commons Codec's implementation, so consider 
> switching to Java 8's Base64 implementation.
>  [1] http://java-performance.info/base64-encoding-and-decoding-performance/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7867) Expose collection interfaces, not implementations

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758591#comment-16758591
 ] 

ASF subversion and git services commented on IMPALA-7867:
-

Commit 396f542eda32dd92e80edbeb216a4cdeb7fe0ace in impala's branch 
refs/heads/master from paul-rogers
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=396f542 ]

IMPALA-7867 (Part 5): Collection cleanup in analyzer

Continues the work to clean up the code to:

* Use collection interfaces for variable and function declarations,
* Replace Guava newArrayList(), etc. calls with the direct
  use of Java collection classes.
* Clean up unused imports and add override annotations.

This commit cleans up remaining issues in the analyzer now that the
other modules use collection interfaces.

Tests: this is purely a refactoring with no functional change. Reran
existing tests.

Change-Id: I1d1c37beb926896f5e00faab0b06034aebb835c5
Reviewed-on: http://gerrit.cloudera.org:8080/12266
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Expose collection interfaces, not implementations
> -
>
> Key: IMPALA-7867
> URL: https://issues.apache.org/jira/browse/IMPALA-7867
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> When using Java collections, a common Java best practice is to expose the 
> collection interface, but hide the implementation choice. This pattern allows 
> us to start with a generic implementation (an {{ArrayList}}, say), but evolve 
> to a more specific implementation to achieve certain goals (a {{LinkedList}} 
> or {{ImmutableList}}, say.)
> For whatever reason, the Impala FE code exposes {{ArrayList}}, {{HashMap}} 
> and other implementation choices as variable types and in method signatures.
> This ticket tracks a gradual process of revising the declarations and 
> signatures to use the interfaces {{List}} instead of the implementation 
> {{ArrayList}}.
> Also, the FE code appears to predate Java 7, so that declarations of lists 
> tend to be in one of two forms (with or without Guava):
> {code:java}
> foo1 = new ArrayList();
> foo2 = Lists.newArrayList();
> {code}
> Since Java 7, the preferred form is:
> {code:java}
> foo = new ArrayList<>();
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-1730) Reduce the window of spinning for Parquet and base-sequence scanners

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758595#comment-16758595
 ] 

ASF subversion and git services commented on IMPALA-1730:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-7980) High system CPU time usage (and waste) when runtime filters filter out files

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758593#comment-16758593
 ] 

ASF subversion and git services commented on IMPALA-7980:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-8102) Impala/HBase recommendations need update

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758589#comment-16758589
 ] 

ASF subversion and git services commented on IMPALA-8102:
-

Commit 79e735a46df258395ea518a5cf6e22e851a91119 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=79e735a ]

IMPALA-8102: update Impala/HBase docs

Provide pointers to Kudu, which is generally better for analytics

Remove or reword advice that encourages people to use HBase for
analytics.

Remove incorrect information about joins resulting in single-row HBase
lookups - this simply doesn't happen.

Change-Id: If1d5f014722d35eab9b60f7a4e8479738f1bed5b
Reviewed-on: http://gerrit.cloudera.org:8080/12315
Tested-by: Impala Public Jenkins 
Reviewed-by: Alex Rodoni 


> Impala/HBase recommendations need update
> 
>
> Key: IMPALA-8102
> URL: https://issues.apache.org/jira/browse/IMPALA-8102
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_hbase.html hasn't 
> been updated for a while. The recommendations are a bit out of date - 
> generally HBase is not the best format for analytic workloads, yet that page 
> seems to encourage using it.
> E.g.
> {quote}If you have join queries that do aggregation operations on large fact 
> tables and join the results against small dimension tables, consider using 
> Impala for the fact tables and HBase for the dimension tables.{quote}
> Assigning to myself to figure out what the best practice is, but I think we 
> need to include:
> * A statement that Kudu offers significantly better performance for 
> analytical workloads with mutable data
> * A statement that HDFS tables are also preferable unless data is frequently 
> mutated
> * A pointer to the Kudu docs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3798) Race condition may cause scanners to spin with runtime filters on Avro or Sequence files

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758594#comment-16758594
 ] 

ASF subversion and git services commented on IMPALA-3798:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758588#comment-16758588
 ] 

ASF subversion and git services commented on IMPALA-6479:
-

Commit b795a2c71cec33363fcce116fcb7e00364903c3a in impala's branch 
refs/heads/2.x from Adam Holley
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b795a2c ]

IMPALA-6479: Update DESCRIBE to respect column privileges

Modified the Frontend to filter columns from the DESCRIBE
statement.  Additionally, if a user has select on at least
one column, they can run DESCRIBE and see most metadata.
If they do not have full table access, they will not see
location or view query metadata.

Testing:
Added tests to validate users that have one or more column
access can run describe and that the output is filtered
accordingly.

Change-Id: Ic96ae184fccdc88ba970b5adcd501da1966accb9
Reviewed-on: http://gerrit.cloudera.org:8080/9276
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/12292
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 


> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 3.0
>
>
> Currently, if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has select on 
> a subset of columns, it will only show the data from the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all of them, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only 
> the columns they have access to.  There is also a security aspect to this fix: 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8092) Add a debug page to provide better observability for admission control

2019-02-01 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758581#comment-16758581
 ] 

Alex Rodoni commented on IMPALA-8092:
-

[~bikramjeet.vig] User facing?

> Add a debug page to provide better observability for admission control
> --
>
> Key: IMPALA-8092
> URL: https://issues.apache.org/jira/browse/IMPALA-8092
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: admission-control, observability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-02-01 Thread Greg Rahn (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758554#comment-16758554
 ] 

Greg Rahn commented on IMPALA-4018:
---

Your understanding is correct. 
 I see the plan as such:
 - Since CAST(...FORMAT...) is net-new syntax, it should support the ISO SQL 
masks only. Other engines that implement the ISO SQL standard will then have 
compatible SQL.  There will be no confusion, as "it follows the ISO standard".
 - Once the new ISO SQL masks are supported, we can explore a way to allow 
users to use the ISO SQL masks with the legacy conversion functions. This would 
allow legacy user SQL code to keep working while also enabling migrations from 
typical database systems. Running in mixed mode (some legacy Java masks, some 
new ISO SQL masks) should be possible, probably best controlled via a 
session-level query option. Details are to be worked out; also noting the 
impact on test combinations here.
 - Another challenge here is that once DATE is supported in Impala, to_date() 
needs to return a DATE, not STRING. The new CAST(...FORMAT...) syntax will 
allow CAST( ... AS DATE ...) to work as expected from the beginning, so we 
should only introduce support for that in the new syntax once DATE is supported.

On the point of different masks for to_timestamp() and friends vs CAST(... AS 
TIMESTAMP FORMAT ...): I see limiting the new SQL:2016 syntax to only the ISO 
date masks as less confusing, especially for folks less familiar with 
Hive/Impala SQL specifics. It will be new syntax, but it will have the expected 
behaviors for those familiar with ISO SQL. It also means that things that used 
to work still do, thus minimizing breaking or behavior changes.

The point on Hive/Impala compatible views and syntax is a fair one, and I'm 
aware of it. Several differences exist today and likely will in the future, but 
for things like this I see the right solution as all engines converging on 
ISO/ANSI SQL compatibility. We can certainly let the Hive community (and 
others) know of Impala's adoption of this new ISO SQL syntax.

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd, yyyy hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd, yyyy HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','yyyy-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','yyyy-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to chose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
> 

[jira] [Updated] (IMPALA-8143) Add features to DoRpcWithRetry()

2019-02-01 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated IMPALA-8143:
---
Description: 
DoRpcWithRetry() is a templated utility function, currently in 
control-service.h, that is used to retry synchronous Krpc calls. It calls a 
Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
controlled by the ‘times_to_try’ parameter.

Possible improvements:
 * Move code to rpc-mgr.inline.h
 * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
server’s queue is full.
 * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
 * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
can also use DoRpcWithRetry()
 * Replace FAULT_INJECTION_RPC_DELAY with DebugAction 

  was:
DoRpcWithRetry() is a templated utility function, currently in 
control-service.h, that is used to retry synchronous Krpc calls. It calls a 
Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
controlled by the ‘times_to_try’ parameter.

Possible improvements:
 * Move code to rpc-mgr.inline.h
 * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
server’s queue is full.
 * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
 * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
can also use DoRpcWithRetry()


> Add features to DoRpcWithRetry()
> 
>
> Key: IMPALA-8143
> URL: https://issues.apache.org/jira/browse/IMPALA-8143
> Project: IMPALA
>  Issue Type: Task
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
>
> DoRpcWithRetry() is a templated utility function, currently in 
> control-service.h, that is used to retry synchronous Krpc calls. It calls a 
> Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
> ‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
> controlled by the ‘times_to_try’ parameter.
> Possible improvements:
>  * Move code to rpc-mgr.inline.h
>  * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
> server’s queue is full.
>  * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
>  * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
> can also use DoRpcWithRetry()
>  * Replace FAULT_INJECTION_RPC_DELAY with DebugAction 
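
The general shape of such a retry helper, sketched with standard C++ only (the 
real DoRpcWithRetry() uses Impala's Status and KRPC proxy/controller types; 
'rpc' and 'is_retryable' here are hypothetical stand-ins, with the latter 
playing the role of a check like RpcMgr::IsServerTooBusy()):
{code}
#include <chrono>
#include <thread>

// Hypothetical sketch, not Impala's implementation: call 'rpc' up to
// 'times_to_try' times, sleeping between attempts only when the failure is
// retryable (e.g. the remote server's queue was full).
template <typename RpcFn, typename RetryableFn>
bool RetryRpc(RpcFn rpc, RetryableFn is_retryable, int times_to_try,
              std::chrono::milliseconds sleep_between_tries) {
  for (int attempt = 1; attempt <= times_to_try; ++attempt) {
    if (rpc()) return true;             // RPC succeeded.
    if (!is_retryable()) return false;  // Permanent failure: do not retry.
    if (attempt < times_to_try) std::this_thread::sleep_for(sleep_between_tries);
  }
  return false;                         // All attempts exhausted.
}

// Example use (hypothetical names):
//   bool ok = RetryRpc([&] { return TrySendReport(); },
//                      [&] { return ServerQueueWasFull(); },
//                      /*times_to_try=*/3, std::chrono::milliseconds(100));
{code}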



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5861) HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5861:
-

Assignee: Tim Armstrong

> HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts
> --
>
> Key: IMPALA-5861
> URL: https://issues.apache.org/jira/browse/IMPALA-5861
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Dan Hecht
>Assignee: Tim Armstrong
>Priority: Major
>
> It appears that this code is double counting into {{rows_read_counter()}}, 
> since {{row_group_rows_read_}} is already accumulating:
> {code:title=HdfsParquetScanner::GetNextInternal()}
>   } else if (scan_node_->IsZeroSlotTableScan()) {
> // There are no materialized slots and we are not optimizing count(*), 
> e.g.
> // "select 1 from alltypes". We can serve this query from just the file 
> metadata.
> // We don't need to read the column data.
> if (row_group_rows_read_ == file_metadata_.num_rows) {
>   eos_ = true;
>   return Status::OK();
> }
> assemble_rows_timer_.Start();
> DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
> int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_;
> int max_tuples = min(row_batch->capacity(), rows_remaining);
> TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
> int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
> Status status = CommitRows(row_batch, num_to_commit);
> assemble_rows_timer_.Stop();
> RETURN_IF_ERROR(status);
> row_group_rows_read_ += num_to_commit;
> COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);  
> <==
> return Status::OK();
>   }
> {code}
> Repro in impala-shell:
> {noformat}
> set batch_size=16; set num_nodes=1; select count(*) from 
> functional.alltypesmixedformat; profile
> 
>- RowsRead: 3.94K (3936)
>- RowsReturned: 1.20K (1200)
> {noformat}
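
A hedged sketch of the kind of fix the description points at (not necessarily 
the actual patch): because {{row_group_rows_read_}} accumulates across 
GetNext() calls, adding it to the counter on every batch re-counts rows from 
earlier batches, so the counter should only be bumped by the per-batch delta.
{code}
row_group_rows_read_ += num_to_commit;
// Add only the rows committed in this batch; adding the cumulative
// row_group_rows_read_ here would re-count rows already reported on
// earlier GetNext() calls.
COUNTER_ADD(scan_node_->rows_read_counter(), num_to_commit);
return Status::OK();
{code}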



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8150) AuditingTest.TestAccessEventsOnAuthFailure

2019-02-01 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8150 started by Fredy Wijaya.

> AuditingTest.TestAccessEventsOnAuthFailure
> --
>
> Key: IMPALA-8150
> URL: https://issues.apache.org/jira/browse/IMPALA-8150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Fredy Wijaya
>Priority: Blocker
>  Labels: broken-build
>
> {{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
> started to fail recently with the following backtrace.
> [~fredyw], would you mind taking a look, as you seem to have touched this 
> test recently?
> {noformat}
> java.lang.IllegalStateException: Error refreshing authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
> authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
>  
> Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
> authorization policy, current policy state may be inconsistent. Running 
> 'invalidate metadata' may resolve this problem: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.impala.common.SentryPolicyReaderException: 
> org.apache.impala.common.InternalException: Error creating Sentry Service 
> client: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: org.apache.impala.common.SentryPolicyReaderException: 
> org.apache.impala.common.InternalException: Error creating Sentry Service 
> client: 
> Caused by: org.apache.impala.common.InternalException: Error creating Sentry 
> Service client:
> Caused by: 
> org.apache.sentry.core.common.exception.MissingConfigurationException: 
> Property 'sentry.service.server.principal' is missing in configuration
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6734) Consider making the output type of all mathematical functions the same as the input type

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6734:
--
Issue Type: Improvement  (was: Bug)

> Consider making the output type of all mathematical functions the same as the 
> input type
> 
>
> Key: IMPALA-6734
> URL: https://issues.apache.org/jira/browse/IMPALA-6734
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Taras Bobrovytsky
>Priority: Major
>
> In IMPALA-6230 we made round() and several other related functions follow the 
> rule that the output type of a function should match the input type. We 
> should consider doing this for all other mathematical functions (such as 
> sign()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3831) Broken links in Impala debug webpage when query is starting up

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3831.
---
Resolution: Cannot Reproduce

> Broken links in Impala debug webpage when query is starting up
> --
>
> Key: IMPALA-3831
> URL: https://issues.apache.org/jira/browse/IMPALA-3831
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: debugging, ramp-up, supportability
> Attachments: impala-debug-webpage-invalid-query-id.png
>
>
> When you click on a query in the Impala debug webpage, I think before it 
> finishes planning, you get a blank plan webpage and an "Error: Invalid query 
> id: ae41282f79f47d9b:c117ce848f4ae9bd" if you click on the link from the 
> "queries" page. 
> This is ok in itself, but all of the links to other query pages are broken 
> and missing a query_id. E.g. 
> "http://tarmstrong-box.ca.cloudera.com:25000/query_summary?query_id=; This is 
> annoying because you have to refresh until the query finishes planning, then 
> click through to the other query pages. We already know the query id, so we 
> should be able to generate the correct links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6211) Query state shows FINISHED in webUI/25000/queries page while it shows CREATED in profile

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6211.
---
Resolution: Duplicate

> Query state shows FINISHED in webUI/25000/queries page while it shows CREATED 
> in profile
> 
>
> Key: IMPALA-6211
> URL: https://issues.apache.org/jira/browse/IMPALA-6211
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Mala Chikka Kempanna
>Priority: Major
> Attachments: Profile-query-state.png, webUI-query-state.png
>
>
> A query run from HUE shows inconsistent state in the Impala web UI and in the 
> profile.
> On the Impala debug web UI 25000/queries page, the query state is shown as 
> FINISHED, but in the profile the query state shows CREATED.
> These two states need to be in sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3922) Support yarn.scheduler.fair.allow-undeclared-pools option in fair-scheduler.xml

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3922:
--
Summary: Support yarn.scheduler.fair.allow-undeclared-pools option in 
fair-scheduler.xml  (was: Admission control pools are dynamically created even 
when dynamic queues are disabled in YARN)

> Support yarn.scheduler.fair.allow-undeclared-pools option in 
> fair-scheduler.xml
> ---
>
> Key: IMPALA-3922
> URL: https://issues.apache.org/jira/browse/IMPALA-3922
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Jim Halfpenny
>Priority: Minor
>  Labels: admission-control, resource-management
>
> When the YARN parameter yarn.scheduler.fair.allow-undeclared-pools is set to 
> false, Impala still creates resource pools dynamically. This leads to 
> unexpected behaviour when configuring admission control.
> Impala uses the resource pool root. if no pool is specified. 
> Queries that do not
> specify a pool will trigger creation of a new one, which is likely not what 
> the user intended. This behaviour occurs even when 
> yarn.scheduler.fair.user-as-default-queue is set to false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3573) Planner doesn't take into account runtime filter selectivity

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3573:
--
Issue Type: Improvement  (was: Bug)

> Planner doesn't take into account runtime filter selectivity
> 
>
> Key: IMPALA-3573
> URL: https://issues.apache.org/jira/browse/IMPALA-3573
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: bushy, planner
>
> Applying selective runtime filters can drastically change the cardinality of 
> scan nodes. The planner doesn't cost the runtime filters as filters; as a 
> result, it misses out on a more selective plan. 
> In this particular query there are three fact-to-dimension joins: 
> * (ss x d1) -> 389.28M rows
> * (sr x d2) -> 234.43M rows
> * (cs x d3) -> 12.85B rows
> The planner doesn't re-evaluate the cardinality estimates of ss, sr and cs 
> after the runtime filters are applied and puts ss as the leftmost node in the 
> plan where it should have been cs. 
> Ideally this should be a bushy plan. 
> Query 
> {code}
> select i_item_id
> ,i_item_desc
> ,s_store_id
> ,s_store_name
> ,sum(ss_quantity)as store_sales_quantity
> ,sum(sr_return_quantity) as store_returns_quantity
> ,sum(cs_quantity)as catalog_sales_quantity
>  from
> store_sales
>,store_returns
>,catalog_sales
>,date_dim d1
>,date_dim d2
>,date_dim d3
>,store
>,item
>  where
>  d1.d_moy   = 4 
>  and d1.d_year  = 1999
>  and d1.d_date_sk   = ss_sold_date_sk
>  and i_item_sk  = ss_item_sk
>  and s_store_sk = ss_store_sk
>  and ss_customer_sk = sr_customer_sk
>  and ss_item_sk = sr_item_sk
>  and ss_ticket_number   = sr_ticket_number
>  and sr_returned_date_sk= d2.d_date_sk
>  and d2.d_moy   between 4 and  4 + 3 
>  and d2.d_year  = 1999
>  and sr_customer_sk = cs_bill_customer_sk
>  and sr_item_sk = cs_item_sk
>  and cs_sold_date_sk= d3.d_date_sk 
>  and d3.d_year  in (1999,1999+1,1999+2)
>  group by
> i_item_id
>,i_item_desc
>,s_store_id
>,s_store_name
>  order by
> i_item_id 
>,i_item_desc
>,s_store_id
>,s_store_name
> limit 100
> {code}
> Plan 
> {code}
> 28:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  limit: 100
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 16:TOP-N [LIMIT=100]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  hosts=20 per-host-mem=21.89KB
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 27:AGGREGATE [FINALIZE]
> |  output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), 
> sum:merge(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=212.22MB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 15:AGGREGATE [STREAMING]
> |  output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=77.13GB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 14:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_item_sk = i_item_sk
> |  runtime filters: RF000 <- i_item_sk
> |  hosts=20 per-host-mem=5.24MB
> |  tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270
> |
> |--25:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=7 row-size=156B cardinality=32000
> |  |
> |  07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM]
> | partitions=1/1 files=1 size=3.14MB
> | table stats: 32000 rows total
> | column stats: all
> | hosts=1 per-host-mem=48.00MB
> | tuple-ids=7 row-size=156B cardinality=32000
> |
> 13:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_store_sk = s_store_sk
> |  runtime filters: RF001 <- s_store_sk
> |  hosts=20 per-host-mem=4.00KB
> |  tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270
> |
> |--24:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=6 row-size=60B cardinality=62
> |  |
> |  06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM]
> | partitions=1/1 files=1 size=11.92KB
> | table stats: 62 rows total
> | column stats: all
> | hosts=1 per-host-mem=48.00MB
> | tuple-ids=6 row-size=60B cardinality=62
> |
> 12:HASH JOIN [INNER JOIN, 

[jira] [Updated] (IMPALA-5035) Impala should not load INDEX_TABLEs from HMS

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5035:
--
Issue Type: Improvement  (was: Bug)

> Impala should not load INDEX_TABLEs from HMS
> 
>
> Key: IMPALA-5035
> URL: https://issues.apache.org/jira/browse/IMPALA-5035
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.8.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Peikai Zheng
>Priority: Major
>  Labels: catalog-server, ramp-up
>
> The catalog will retrieve and store entries for INDEX_TABLES that are created 
> in Hive using the CREATE INDEX statement. However, INDEX_TABLES cannot be 
> accessed/used in Impala, and accessing an INDEX_TABLE will always throw a 
> TableLoading exception. The catalog should not be loading INDEX_TABLEs from 
> HMS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6875) Reservations for ORC scanner

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6875:
--
Issue Type: Improvement  (was: Bug)

> Reservations for ORC scanner
> 
>
> Key: IMPALA-6875
> URL: https://issues.apache.org/jira/browse/IMPALA-6875
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> This tracks the work needed to get ORC to be a first-class citizen when it 
> comes to scanner reservations, i.e. reserving the memory required for holding 
> columns in memory instead of just mallocing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3631) Investigate why Decimal to Timestamp casting became slower

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3631.
---
Resolution: Later

I don't think this is really a priority now - we should probably just treat 
this the same as other performance improvements and wait until we have some 
evidence that it's important to improve.

> Investigate why Decimal to Timestamp casting became slower
> --
>
> Key: IMPALA-3631
> URL: https://issues.apache.org/jira/browse/IMPALA-3631
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Taras Bobrovytsky
>Priority: Minor
>  Labels: performance
>
> https://issues.cloudera.org/browse/IMPALA-3163 fixes the correctness issue 
> with Decimal to Timestamp casting, but worsens the performance by about 30%. 
> We want to understand why this happens and possibly fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-7771) Download page should not link to unreleased code

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758411#comment-16758411
 ] 

Tim Armstrong commented on IMPALA-7771:
---

"Only release artifacts that have been approved by the relevant PMC may be 
linked from the download page." does seem to imply that if you interpret the 
source repo as an "artifact". Anyway we might as well just remove it, the 
source code is linked from the navbar.


> Download page should not link to unreleased code
> 
>
> Key: IMPALA-7771
> URL: https://issues.apache.org/jira/browse/IMPALA-7771
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The download page must not link to unreleased code such as repos:
> http://www.apache.org/dev/release-download-pages.html#links
> Such links are only to be published on pages for developers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6777) Whitespace inconsistencies in pretty printer across units

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6777:
--
Target Version: Product Backlog
  Priority: Trivial  (was: Minor)

> Whitespace inconsistencies in pretty printer across units
> -
>
> Key: IMPALA-6777
> URL: https://issues.apache.org/jira/browse/IMPALA-6777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Lars Volker
>Priority: Trivial
>  Labels: newbie
>
> Depending on the unit, we sometimes print a space between a value and its 
> unit and sometimes we don't:
>  
> {noformat}
> "human_readable": "Count: 9, min / max: 13.000us / 22.000us, 25th %-ile: 
> 13.000us, 50th %-ile: 16.000us, 75th %-ile: 17.000us, 90th %-ile: 18.000us, 
> 95th %-ile: 22.000us, 99.9th %-ile: 22.000us",
> "human_readable": "Count: 9, min / max: 80.00 B / 80.00 B, 25th %-ile: 80.00 
> B, 50th %-ile: 80.00 B, 75th %-ile: 80.00 B, 90th %-ile: 80.00 B, 95th %-ile: 
> 80.00 B, 99.9th %-ile: 80.00 B",
> {noformat}
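
One way to make the two outputs above consistent would be a single helper that always emits exactly one space between the value and its unit. A minimal sketch, assuming a hypothetical helper (FormatWithUnit is not Impala's actual PrettyPrinter API):
{code}
#include <cstdio>
#include <string>

// Hypothetical helper: always print exactly one space between the value and
// its unit, so time units ("us") and size units ("B") get the same spacing.
std::string FormatWithUnit(double value, const char* unit) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.3f %s", value, unit);
  return std::string(buf);
}
{code}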



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7771) Download page should not link to unreleased code

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7771:
-

Assignee: Tim Armstrong

> Download page should not link to unreleased code
> 
>
> Key: IMPALA-7771
> URL: https://issues.apache.org/jira/browse/IMPALA-7771
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The download page must not link to unreleased code such as repos:
> http://www.apache.org/dev/release-download-pages.html#links
> Such links are only to be published on pages for developers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5861) HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758405#comment-16758405
 ] 

Tim Armstrong commented on IMPALA-5861:
---

This one looks so trivial we should just fix it.
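
For illustration, here is the shape of the bug in isolation (a minimal standalone sketch with made-up batch sizes, not Impala code): adding the running total to the counter on every batch overcounts, whereas adding only the per-batch delta (num_to_commit in the snippet quoted below) would not.
{code}
#include <cassert>

int main() {
  int rows_read_counter = 0;
  int row_group_rows_read = 0;
  const int batches[3] = {16, 16, 16};          // three batches of 16 rows each
  for (int n : batches) {
    row_group_rows_read += n;                   // running total: 16, 32, 48
    rows_read_counter += row_group_rows_read;   // bug: adds the total every time
  }
  assert(row_group_rows_read == 48);            // rows actually read
  assert(rows_read_counter == 96);              // counter overshoots (16+32+48)
  // The fix would presumably add only the per-batch delta to the counter.
  return 0;
}
{code}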

> HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts
> --
>
> Key: IMPALA-5861
> URL: https://issues.apache.org/jira/browse/IMPALA-5861
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Dan Hecht
>Assignee: Tim Armstrong
>Priority: Major
>
> It appears that this code is double counting into {{rows_read_counter()}}, 
> since {{row_group_rows_read_}} is already accumulating:
> {code:title=HdfsParquetScanner::GetNextInternal()}
>   } else if (scan_node_->IsZeroSlotTableScan()) {
> // There are no materialized slots and we are not optimizing count(*), 
> e.g.
> // "select 1 from alltypes". We can serve this query from just the file 
> metadata.
> // We don't need to read the column data.
> if (row_group_rows_read_ == file_metadata_.num_rows) {
>   eos_ = true;
>   return Status::OK();
> }
> assemble_rows_timer_.Start();
> DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
> int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_;
> int max_tuples = min(row_batch->capacity(), rows_remaining);
> TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
> int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
> Status status = CommitRows(row_batch, num_to_commit);
> assemble_rows_timer_.Stop();
> RETURN_IF_ERROR(status);
> row_group_rows_read_ += num_to_commit;
> COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);  
> <==
> return Status::OK();
>   }
> {code}
> Repro in impala-shell:
> {noformat}
> set batch_size=16; set num_nodes=1; select count(*) from 
> functional.alltypesmixedformat; profile
> 
>- RowsRead: 3.94K (3936)
>- RowsReturned: 1.20K (1200)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4057:
--
Priority: Trivial  (was: Minor)

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Trivial
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758391#comment-16758391
 ] 

Tim Armstrong commented on IMPALA-4057:
---

I think this is a minor bug in impala_cluster.py - it assumes that the 
webserver will be listening on whatever socket.gethostname() maps to. For this 
to work generally, since you can get the webserver to listen on any interface, 
we'd need to plumb through the webserver port to the code that polls the 
webserver, or try all of the interfaces on the localhost.

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Minor
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6098) core on parquet select

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6098.
---
Resolution: Cannot Reproduce

> core on parquet select
> --
>
> Key: IMPALA-6098
> URL: https://issues.apache.org/jira/browse/IMPALA-6098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
> Environment: Version
> catalogd version 2.9.0-cdh5.12.0 RELEASE (build 
> 03c6ddbdcec39238be4f5b14a300d5c4f576097e)
> Built on Thu Jun 29 04:17:31 PDT 2017
> Hardware Info
> Cpu Info:
>   Model: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>   Cores: 24
>   Max Possible Cores: 24
>   L1 Cache: 32.00 KB (Line: 64.00 B)
>   L2 Cache: 256.00 KB (Line: 64.00 B)
>   L3 Cache: 15.00 MB (Line: 64.00 B)
>   Hardware Supports:
> ssse3
> sse4_1
> sse4_2
> popcnt
> avx
> avx2
>   Numa Nodes: 2
>   Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->1 | 7->1 
> | 8->1 | 9->1 | 10->1 | 11->1 | 12->0 | 13->0 | 14->0 | 15->0 | 16->0 | 17->0 
> | 18->1 | 19->1 | 20->1 | 21->1 | 22->1 | 23->1 |
>  Physical Memory: 62.28 GB
>  Disk Info: 
>   Num disks 13: 
> sda (rotational=false)
> sdb (rotational=true)
> sdc (rotational=true)
> sdd (rotational=true)
> sde (rotational=true)
> sdk (rotational=true)
> sdf (rotational=true)
> sdl (rotational=true)
> sdm (rotational=true)
> sdg (rotational=true)
> sdi (rotational=true)
> sdj (rotational=true)
> sdh (rotational=true)
> OS Info
> OS version: Linux version 3.10.104-1-tlinux2-0041.tl1 (r...@te64.site) (gcc 
> version 4.4.6 20110731 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Oct 28 20:36:06 
> CST 2016
> Clock: clocksource: 'tsc', clockid_t: CLOCK_MONOTONIC
>Reporter: sw
>Priority: Major
>
> I create a table like this:
> {code:java}
> "CREATE EXTERNAL TABLE fact_vm_widetable  LIKE PARQUET  
> '/user/spark/parquet-vm/part-0-69d62acd-92a4-4774-ae6c-71be5c2dfcd0-c000.snappy.parquet'
> STORED AS PARQUET
> LOCATION '/user/spark/parquet-vm';"
> {code}
> Then running select count(1) from fact_host_widetable makes all Impala Daemons core.
> Info from the core dump:
> (gdb) bt
> #0  0x7f64bf6f3625 in raise () from /lib64/libc.so.6
> #1  0x7f64bf6f4e05 in abort () from /lib64/libc.so.6
> #2  0x7f64c016c07d in __gnu_cxx::__verbose_terminate_handler() () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #3  0x7f64c016a0e6 in ?? () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #4  0x7f64c016a131 in std::terminate() () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #5  0x7f64c016a348 in __cxa_throw () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #6  0x7f64c01c5976 in std::__throw_runtime_error(char const*) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #7  0x7f64c018cac4 in 
> std::locale::facet::_S_create_c_locale(__locale_struct*&, char const*, 
> __locale_struct*) ()
>from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #8  0x7f64c0181f69 in std::locale::_Impl::_Impl(char const*, unsigned 
> long) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #9  0x7f64c0183192 in std::locale::locale(char const*) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #10 0x00e81de3 in boost::filesystem::path::codecvt() ()
> #11 0x00c6f8f2 in 
> impala::HdfsScanNodeBase::Prepare(impala::RuntimeState*) ()
> #12 0x00c67ce9 in 
> impala::HdfsScanNode::Prepare(impala::RuntimeState*) ()
> #13 0x00c50bf4 in impala::ExecNode::Prepare(impala::RuntimeState*) ()
> #14 0x00cf0037 in 
> impala::PartitionedAggregationNode::Prepare(impala::RuntimeState*) ()
> #15 0x00a7efcd in impala::FragmentInstanceState::Prepare() ()
> #16 0x00a7fb71 in impala::FragmentInstanceState::Exec() ()
> #17 0x00a6bab6 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #18 0x00bf0ac9 in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::Promise*) ()
> #19 0x00bf1484 in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::Promise*), 
> boost::_bi::list4 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value*> > > >::run() ()
> #20 0x00e592ea in 

[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-02-01 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758389#comment-16758389
 ] 

Gabor Kaszab commented on IMPALA-4018:
--

Understood. Thanks for the explanation, Greg!

So as far as I understand, the plan is to introduce CAST(..FORMAT..) with the 
SQL format from the beginning and leave e.g. to_timestamp() and from_timestamp() 
using the Java format, with an additional flag to switch them to the SQL pattern.

If this is the case then I'm a bit worried about introducing inconsistency within 
Impala. I can imagine that using one pattern here but another there by default 
would cause some confusion for users.
[~grahn] Do you think we could avoid this somehow? Can we eventually deprecate the 
functions using the Java syntax and indicate to users at that point that they 
should migrate off?
(On a side note, I found no documentation about the current date-time formats in 
our docs, so this is a great moment to add them alongside the new format. 
[~arodoni_cloudera] This is just an FYI.)

Another concern was raised by [~Paul.Rogers] on the code review: as view 
definitions can be written to HMS, should we be worried that Hive won't be able 
to read them if they are written with the SQL pattern? In addition, Hive doesn't 
have a FORMAT clause for CAST, so it brings in another inconsistency between the 
systems. Should we initiate a conversation with the Hive community to handle 
the same on their side?

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd,  hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd,  HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to chose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
>  ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
>  ::=
>   A.M. | P.M.
>  ::=
>   TZH
>  ::=
>   TZM
> {noformat}
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
>  ::=
>   CAST 
>AS 
>   [ FORMAT  ]
>   
>  ::=
> 
>   | 
>  ::=
> 
> | 
>  ::=
>   
> {noformat}
> For example:
> {noformat}
> CAST( AS  [FORMAT ])
> CAST( AS  [FORMAT ])
> cast(dt as string format 'DD-MM-')
> cast('01-05-2017' as date format 'DD-MM-')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4057:
--
Priority: Minor  (was: Major)

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Minor
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-691) Process mem limit should account for the JVM's memory usage

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-691:
-
Summary: Process mem limit should account for the JVM's memory usage  (was: 
Process mem limit does not account for the JVM's memory usage)

> Process mem limit should account for the JVM's memory usage
> ---
>
> Key: IMPALA-691
> URL: https://issues.apache.org/jira/browse/IMPALA-691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.1, Impala 2.0, Impala 2.1, Impala 2.2, Impala 
> 2.3.0
>Reporter: Skye Wanderman-Milne
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: incompatibility, resource-management
>
> The JVM doesn't appear to use malloc, so its memory usage is not reported by 
> tcmalloc and we do not count it in the process mem limit. I verified this by 
> adding a large allocation in the FE, and noting that the total memory usage 
> (virtual or resident) reported in /memz is not affected, but the virtual and 
> resident memory usage reported by top is.
> This is especially problematic because Impala caches table metadata in the FE 
> (JVM), which can become quite big (a few GBs) in extreme cases.
> *Workaround*
> As a workaround, we recommend reducing the process memory limit by 1-2GB to 
> "reserve" memory for the JVM. How much memory you should reserve typically 
> depends on the size of your catalog ( number of 
> tables/partitions/columns/blocks etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7895) Incorrect expected results for spillable-buffer-sizing.test

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7895.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Incorrect expected results for spillable-buffer-sizing.test
> ---
>
> Key: IMPALA-7895
> URL: https://issues.apache.org/jira/browse/IMPALA-7895
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> A recent change appears to have caused a test to expect the wrong rewritten 
> SQL in {{spillable-buffer-sizing.test}}.
> {noformat}
> # Mid NDV aggregation - should scale down buffers to intermediate size.
> select straight_join l_orderkey, o_orderstatus, count(*)
> from tpch_parquet.lineitem
> join tpch_parquet.orders on o_orderkey = l_orderkey
> group by 1, 2
> having count(*) = 1
>  DISTRIBUTEDPLAN
> Max Per-Host Resource Reservation: Memory=82.00MB Threads=7
> Per-Host Resource Estimates: Memory=244MB
> Analyzed query: SELECT 
> -- +straight_join
> l_orderkey, o_orderstatus, count(*) FROM tpch_parquet.lineitem INNER JOIN
> tpch_parquet.orders ON o_orderkey = l_orderkey GROUP BY CAST(1 AS 
> INVALID_TYPE),
> CAST(2 AS INVALID_TYPE) HAVING count(*) = CAST(1 AS BIGINT)
> {noformat}
> Correct rewritten SQL:
> {noformat}
> Analyzed query: SELECT 
> -- +straight_join
> l_orderkey, o_orderstatus, count(*) FROM tpch_parquet.lineitem INNER JOIN
> tpch_parquet.orders ON o_orderkey = l_orderkey GROUP BY l_orderkey,
> o_orderstatus HAVING count(*) = CAST(1 AS BIGINT)
> {noformat}
> The same problem occurs in {{max-rows-test.test}}.
> The problem is due to the existence of two copies of the grouping 
> expressions. The {{toSql()}} function used the original, unanalyzed copy, not 
> the rewritten copy with ordinal replacements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-7923) DecimalValue should be marked as packed

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7923:
--
Issue Type: Improvement  (was: Bug)

> DecimalValue should be marked as packed
> ---
>
> Key: IMPALA-7923
> URL: https://issues.apache.org/jira/browse/IMPALA-7923
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Major
>
> IMPALA-7473 was a symptom of a more general problem that DecimalValue is not 
> guaranteed to be aligned by the Impala runtime, but the class is not marked 
> as packed and, under some circumstances, GCC will emit code for aligned loads 
> to value_ when value_ is an int128. 
> Testing helps confirm that the compiler does not emit the problematic loads 
> in practice, but it would be better to mark the struct as packed.
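
A minimal sketch of what "marking the struct as packed" means, using a simplified stand-in type (PackedDecimal16 and ReadDecimal are hypothetical; the real DecimalValue is a template, so this only illustrates the attribute, not the actual class):
{code}
#include <cstring>

// Packed 16-byte decimal holder: the packed attribute tells the compiler the
// object may sit at any byte offset, so it must not emit 16-byte aligned
// loads/stores for value_ (the miscompile risk described above).
struct __attribute__((packed)) PackedDecimal16 {
  __int128 value_;
};

static_assert(sizeof(PackedDecimal16) == 16, "no padding expected");

// Reading such a value from an arbitrary (possibly unaligned) buffer:
__int128 ReadDecimal(const void* buf) {
  PackedDecimal16 d;
  std::memcpy(&d, buf, sizeof(d));
  return d.value_;
}
{code}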



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7665) Bringing up stopped statestore causes queries to fail

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7665:
--
Priority: Critical  (was: Major)

> Bringing up stopped statestore causes queries to fail
> -
>
> Key: IMPALA-7665
> URL: https://issues.apache.org/jira/browse/IMPALA-7665
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: query-lifecycle, statestore
>
> I can reproduce this by running a long-running query then cycling the 
> statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q 
> "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator: 
> http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the 
> statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ 
> /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored
>  -log_filename=statestored 
> -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster -v=1 
> -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001, 
> tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets of impalads reported as failed, e.g. 
> "Failed due to unreachable impalad(s): tarmstrong-box:22001"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6910:
-

Assignee: (was: Sailesh Mukil)

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Priority: Critical
>  Labels: broken-build, flaky, s3
> Fix For: Impala 3.2.0
>
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7877) Support Hive GenericUDF

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7877:
--
Issue Type: Improvement  (was: Bug)

> Support Hive GenericUDF
> ---
>
> Key: IMPALA-7877
> URL: https://issues.apache.org/jira/browse/IMPALA-7877
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.0
>Reporter: eugen yushin
>Priority: Major
>
> Running a Hive UDF that extends the GenericUDF interface results in a class cast 
> exception. Relevant [code 
> block|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java#L586]:
> {code}
> LOG.debug("Loading UDF '" + udfPath + "' from " + jarPath);
> loader = getClassLoader(jarPath);
> Class c = Class.forName(udfPath, true, loader);
> Class udfClass = c.asSubclass(UDF.class);
> {code}
> Reproduce steps:
> {code}
> create function my_lower(string) returns string location 
> '/path/to/hive-exec-1.1.0-cdh5.15.0.jar' 
> symbol='org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower';
> select my_lower('Some String NOT ALREADY LOWERCASE');
> {code}
> Stack trace:
> {code}
> I1121 11:58:29.509138 29092 Frontend.java:952] Analyzing query: select 
> my_lower('Some String NOT ALREADY LOWERCASE')
> I1121 11:58:29.513121 29092 UdfExecutor.java:581] Loading UDF 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower' from 
> /var/lib/impala/udfs/hive-exec-1.1.0-cdh5.15.0.83728.2.jar
> I1121 11:58:29.515535 29092 jni-util.cc:230] java.lang.ClassCastException: 
> class org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> at java.lang.Class.asSubclass(Class.java:3404)
> at 
> org.apache.impala.hive.executor.UdfExecutor.init(UdfExecutor.java:584)
> at 
> org.apache.impala.hive.executor.UdfExecutor.<init>(UdfExecutor.java:217)
> at 
> org.apache.impala.service.FeSupport.NativeEvalExprsWithoutRow(Native Method)
> at 
> org.apache.impala.service.FeSupport.EvalExprsWithoutRow(FeSupport.java:208)
> at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:163)
> at org.apache.impala.analysis.LiteralExpr.create(LiteralExpr.java:184)
> at 
> org.apache.impala.rewrite.FoldConstantsRule.apply(FoldConstantsRule.java:68)
> at 
> org.apache.impala.rewrite.ExprRewriter.applyRuleBottomUp(ExprRewriter.java:85)
> at 
> org.apache.impala.rewrite.ExprRewriter.applyRuleRepeatedly(ExprRewriter.java:71)
> at 
> org.apache.impala.rewrite.ExprRewriter.rewrite(ExprRewriter.java:55)
> at 
> org.apache.impala.analysis.SelectList.rewriteExprs(SelectList.java:97)
> at 
> org.apache.impala.analysis.SelectStmt.rewriteExprs(SelectStmt.java:894)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:432)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:393)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:962)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> I1121 11:58:29.523166 29092 status.cc:125] ClassCastException: class 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> @   0x96663a  impala::Status::Status()
> @   0xcedfdd  impala::JniUtil::GetJniExceptionMsg()
> @  0x109457f  impala::HiveUdfCall::OpenEvaluator()
> @   0x96d757  impala::ScalarExprEvaluator::Open()
> @   0xbedc2d  
> Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow
> @ 0x7fc705b49e6d  (unknown)
> {code}
> Marked as a bug because there are no notes related to this behaviour in the 
> docs (while the documentation claims Impala supports Hive UDFs, it should 
> support all possible Hive UDF formats unless stated otherwise).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8056) Impala accepts plus in front of string value

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8056:
--
Target Version: Product Backlog
Labels: incompatibility  (was: )

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>  Labels: incompatibility
>
> Impala accepts a plus in front of a string value and rejects a minus.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8056) Impala accepts plus in front of string value

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8056:
--
Labels: compatibility incompatibility  (was: incompatibility)

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>  Labels: compatibility, incompatibility
>
> Impala accepts a plus in front of a string value and rejects a minus.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8109) Impala cannot read the gzip files bigger than 2 GB

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8109:
--
Priority: Major  (was: Minor)

> Impala cannot read the gzip files bigger than 2 GB
> --
>
> Key: IMPALA-8109
> URL: https://issues.apache.org/jira/browse/IMPALA-8109
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: hakki
>Priority: Major
>
> When querying a partition containing gzip files, the query fails with the 
> error below: 
> WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: 
> hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz: 
> Error(255): Unknown error 255
> Root cause: EOFException: Cannot seek to negative offset
> The file hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz is 
> a delimited text file with a size bigger than 2 GB (approx. 2.4 GB). The 
> uncompressed size is ~13 GB.
> The impalad version is : 2.12.0-cdh5.15.0
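
The negative offset in the error message is the signature of a 32-bit overflow: 2 GB is 2^31 bytes, which does not fit in a signed 32-bit integer and typically wraps to -2147483648. A minimal illustration of just the arithmetic (not the Impala or HDFS client code):
{code}
#include <cstdint>
#include <iostream>

int main() {
  int64_t offset = 1LL << 31;                         // 2147483648 bytes, i.e. 2 GB
  int32_t truncated = static_cast<int32_t>(offset);   // does not fit in 32 bits
  std::cout << truncated << std::endl;                // typically prints -2147483648
  return 0;
}
{code}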



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-691) Process mem limit does not account for the JVM's memory usage

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-691:
-
Issue Type: Improvement  (was: Bug)

> Process mem limit does not account for the JVM's memory usage
> -
>
> Key: IMPALA-691
> URL: https://issues.apache.org/jira/browse/IMPALA-691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.1, Impala 2.0, Impala 2.1, Impala 2.2, Impala 
> 2.3.0
>Reporter: Skye Wanderman-Milne
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: incompatibility, resource-management
>
> The JVM doesn't appear to use malloc, so its memory usage is not reported by 
> tcmalloc and we do not count it in the process mem limit. I verified this by 
> adding a large allocation in the FE, and noting that the total memory usage 
> (virtual or resident) reported in /memz is not affected, but the virtual and 
> resident memory usage reported by top is.
> This is especially problematic because Impala caches table metadata in the FE 
> (JVM), which can become quite big (a few GBs) in extreme cases.
> *Workaround*
> As a workaround, we recommend reducing the process memory limit by 1-2GB to 
> "reserve" memory for the JVM. How much memory you should reserve typically 
> depends on the size of your catalog ( number of 
> tables/partitions/columns/blocks etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7540) Intern common strings in catalog

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7540:
--
Issue Type: Improvement  (was: Bug)

> Intern common strings in catalog
> 
>
> Key: IMPALA-7540
> URL: https://issues.apache.org/jira/browse/IMPALA-7540
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> Using jxray shows that there are many common duplicate strings in the 
> catalog. For example, each table repeats the database name, and metadata like 
> the HMS parameter maps reuse a lot of common strings like "EXTERNAL" or 
> "transient_lastDdlTime". We should intern these to save memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8152) Aggregate Commands on HBase Table Omit Null Values

2019-02-01 Thread Alan Jackoway (JIRA)
Alan Jackoway created IMPALA-8152:
-

 Summary: Aggregate Commands on HBase Table Omit Null Values
 Key: IMPALA-8152
 URL: https://issues.apache.org/jira/browse/IMPALA-8152
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.12.0
Reporter: Alan Jackoway


We have an HBase-backed Impala table, which has a string column (for the 
purpose of this jira, {{sCol}}).

There are records where that column is null, which we can observe with queries 
like {{select * from table where sCol is null limit 1}}.

However, when we run these commands, we get bad results:
{code:sql}
-- Returns 0
select count(*) from table where sCol is null;
-- Returns only rows for string values (we only have a few options in this
-- case), no row for null
select sCol, count(*) from table group by sCol;
{code}

These commands work as expected on parquet-backed tables. They also do not work 
in Hive; I will file a separate jira for that shortly.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8152) Aggregate Commands on HBase Table Omit Null Values

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758329#comment-16758329
 ] 

Tim Armstrong commented on IMPALA-8152:
---

Probably the same as IMPALA-283

> Aggregate Commands on HBase Table Omit Null Values
> --
>
> Key: IMPALA-8152
> URL: https://issues.apache.org/jira/browse/IMPALA-8152
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Alan Jackoway
>Priority: Major
>
> We have an HBase-backed Impala table, which has a string column (for the 
> purpose of this jira, {{sCol}}).
> There are records where that column is null, which we can observe with 
> queries like {{select * from table where sCol is null limit 1}}.
> However, when we run these commands, we get bad results:
> {code:sql}
> -- Returns 0
> select count(*) from table where sCol is null;
> -- Returns only rows for string values (we only have a few options in this
> -- case), no row for null
> select sCol, count(*) from table group by sCol;
> {code}
> These commands work as expected on parquet-backed tables. They also do not 
> work in Hive; I will file a separate jira for that shortly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8140) Grouping aggregation with limit breaks asan build

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8140.
-
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Grouping aggregation with limit breaks asan build
> -
>
> Key: IMPALA-8140
> URL: https://issues.apache.org/jira/browse/IMPALA-8140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: asan, crash
> Fix For: Impala 3.2.0
>
>
> Commit 4af3a7853e9 for IMPALA-7333 breaks the following query on ASAN:
> {code:sql}
> select count(*) from tpch_parquet.orders o group by o.o_clerk limit 10;
> {code}
> {noformat}
> ==30219==ERROR: AddressSanitizer: use-after-poison on address 0x631000c4569c 
> at pc 0x020163cc bp 0x7f73a12a5700 sp 0x7f73a12a56f8
> READ of size 1 at 0x631000c4569c thread T276
> #0 0x20163cb in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) 
> const /tmp/be/src/runtime/tuple.h:241:13
> #1 0x280c3d1 in 
> impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
> impala::SlotDescriptor const&, impala::Tuple*, void*) 
> /tmp/be/src/exprs/agg-fn-evaluator.cc:393:29
> #2 0x2777bc8 in 
> impala::AggFnEvaluator::Finalize(std::vector std::allocator > const&, impala::Tuple*, 
> impala::Tuple*) /tmp/be/src/exprs/agg-fn-evaluator.h:307:15
> #3 0x27add96 in 
> impala::GroupingAggregator::CleanupHashTbl(std::vector  std::allocator > const&, 
> impala::HashTable::Iterator) /tmp/be/src/exec/grouping-aggregator.cc:351:7
> #4 0x27ae2b2 in impala::GroupingAggregator::ClosePartitions() 
> /tmp/be/src/exec/grouping-aggregator.cc:930:5
> #5 0x27ae5f4 in impala::GroupingAggregator::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/grouping-aggregator.cc:383:3
> #6 0x27637f7 in impala::AggregationNode::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/aggregation-node.cc:139:32
> #7 0x206b7e9 in impala::FragmentInstanceState::Close() 
> /tmp/be/src/runtime/fragment-instance-state.cc:368:42
> #8 0x2066b1a in impala::FragmentInstanceState::Exec() 
> /tmp/be/src/runtime/fragment-instance-state.cc:99:3
> #9 0x2080e12 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> /tmp/be/src/runtime/query-state.cc:584:24
> #10 0x1d79036 in boost::function0::operator()() const 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #11 0x24bbe06 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) 
> /tmp/be/src/util/thread.cc:359:3
> #12 0x24c72f8 in void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:525:9
> #13 0x24c714b in boost::_bi::bind_t std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20:16
> #14 0x3c83949 in thread_proxy 
> (/home/lv/i4/be/build/debug/service/impalad+0x3c83949)
> #15 0x7f768ce73183 in start_thread 
> /build/eglibc-ripdx6/eglibc-2.19/nptl/pthread_create.c:312
> #16 0x7f768c98a03c in clone 
> /build/eglibc-ripdx6/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {noformat}
> The problem seems to be that we call 
> {{output_partition_->aggregated_row_stream->Close()}} in 
> be/src/exec/grouping-aggregator.cc:284 when hitting the limit, and then later 
> the tuple creation in {{CleanupHashTbl()}} in 
> be/src/exec/grouping-aggregator.cc:341 reads from poisoned memory.
> A similar query does not show the crash:
> {code:sql}
> select count(*) from functional_parquet.alltypes a group by a.string_col 
> limit 2;
> {code}
> [~tarmstrong] - Do you have an idea why the query on a much smaller dataset 
> wouldn't crash?
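
For context on what "use-after-poison" means here, a standalone sketch of the 
pattern described above (hypothetical names, not Impala code): a buffer is 
poisoned when its owning stream is closed early because the limit was hit, and 
a later read of tuple memory backed by that buffer trips the ASAN check.
{code}
// Hypothetical sketch of the failure mode; build with -fsanitize=address.
#include <sanitizer/asan_interface.h>

#include <cstdint>
#include <cstdio>
#include <vector>

struct FakeRowStream {
  std::vector<uint8_t> buf = std::vector<uint8_t>(1024, 0);
  // Closing the stream poisons its backing memory, so any later read of
  // tuples stored in it is reported by ASAN as use-after-poison.
  void Close() { ASAN_POISON_MEMORY_REGION(buf.data(), buf.size()); }
};

int main() {
  FakeRowStream stream;
  const uint8_t* tuple = stream.buf.data();  // tuple memory owned by the stream
  stream.Close();                 // limit hit: stream closed before cleanup runs
  std::printf("%d\n", tuple[0]);  // ASAN flags this read, like CleanupHashTbl()
  return 0;
}
{code}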



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8118) ASAN build failure: query_test/test_scanners.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8118.
-
   Resolution: Duplicate
Fix Version/s: Impala 3.2.0

> ASAN build failure: query_test/test_scanners.py
> ---
>
> Key: IMPALA-8118
> URL: https://issues.apache.org/jira/browse/IMPALA-8118
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0, Impala 3.1.0
>
>
> Build of latest master, with ASAN, failed with the following error, which to 
> my newbie eyes looks like a connection failure:
> {noformat}
> 05:42:04 === FAILURES 
> ===
> 05:42:04  TestQueriesTextTables.test_data_source_tables[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> 05:42:04 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:42:04 query_test/test_queries.py:174: in test_data_source_tables
> 05:42:04 self.run_test_case('QueryTest/data-source-tables', vector)
> 05:42:04 common/impala_test_suite.py:472: in run_test_case
> 05:42:04 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> ...
> 05:42:04 handle = self.execute_query_async(query_string, user=user)
> 05:42:04 beeswax/impala_beeswax.py:351: in execute_query_async
> 05:42:04 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> 05:42:04 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:42:04 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:42:04 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:42:04 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:42:04 E   MESSAGE: TSocket read 0 bytes
> 05:42:04 - Captured stderr call 
> -
> ...
> 05:42:04 -- executing against localhost:21000
> 05:42:04 select *
> 05:42:04 from alltypes_datasource
> 05:42:04 where float_col != 0 and
> 05:42:04   int_col >= 1990 limit 5;
> {noformat}
> A similar error appears for multiple other tests. Then:
> {noformat}
> 05:42:04 TTransportException: Could not connect to localhost:21050
> 05:42:04 !!! Interrupted: stopping after 10 failures 
> 
> {noformat}
> I wonder if these are just symptoms of a failure in the BE code due to ASAN 
> being enabled.
> Similar error in the latest build:
> {noformat}
> 05:20:05 === FAILURES 
> ===
> 05:20:05  TestHdfsQueries.test_hdfs_scan_node[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> hbase/none] 
> 05:20:05 [gw4] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:20:05 query_test/test_queries.py:240: in test_hdfs_scan_node
> 05:20:05 self.run_test_case('QueryTest/hdfs-scan-node', vector)
> ...
> 05:20:05 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:20:05 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:20:05 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:20:05 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:20:05 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:20:05 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:20:05 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:20:05 E   MESSAGE: TSocket read 0 bytes
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8118) ASAN build failure: query_test/test_scanners.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8118:

Affects Version/s: Impala 3.2.0

> ASAN build failure: query_test/test_scanners.py
> ---
>
> Key: IMPALA-8118
> URL: https://issues.apache.org/jira/browse/IMPALA-8118
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0, Impala 3.2.0
>
>
> Build of latest master, with ASAN, failed with the following error, which to 
> my newbie eyes looks like a connection failure:
> {noformat}
> 05:42:04 === FAILURES 
> ===
> 05:42:04  TestQueriesTextTables.test_data_source_tables[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> 05:42:04 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:42:04 query_test/test_queries.py:174: in test_data_source_tables
> 05:42:04 self.run_test_case('QueryTest/data-source-tables', vector)
> 05:42:04 common/impala_test_suite.py:472: in run_test_case
> 05:42:04 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> ...
> 05:42:04 handle = self.execute_query_async(query_string, user=user)
> 05:42:04 beeswax/impala_beeswax.py:351: in execute_query_async
> 05:42:04 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> 05:42:04 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:42:04 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:42:04 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:42:04 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:42:04 E   MESSAGE: TSocket read 0 bytes
> 05:42:04 - Captured stderr call 
> -
> ...
> 05:42:04 -- executing against localhost:21000
> 05:42:04 select *
> 05:42:04 from alltypes_datasource
> 05:42:04 where float_col != 0 and
> 05:42:04   int_col >= 1990 limit 5;
> {noformat}
> A similar error appears for multiple other tests. Then:
> {noformat}
> 05:42:04 TTransportException: Could not connect to localhost:21050
> 05:42:04 !!! Interrupted: stopping after 10 failures 
> 
> {noformat}
> I wonder if these are just symptoms of a failure in the BE code due to ASAN 
> being enabled.
> Similar error in the latest build:
> {noformat}
> 05:20:05 === FAILURES 
> ===
> 05:20:05  TestHdfsQueries.test_hdfs_scan_node[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> hbase/none] 
> 05:20:05 [gw4] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:20:05 query_test/test_queries.py:240: in test_hdfs_scan_node
> 05:20:05 self.run_test_case('QueryTest/hdfs-scan-node', vector)
> ...
> 05:20:05 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:20:05 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:20:05 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:20:05 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:20:05 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:20:05 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:20:05 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:20:05 E   MESSAGE: TSocket read 0 bytes
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8129) Build failure: query_test/test_observability.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8129.
-
   Resolution: Fixed
Fix Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> Build failure: query_test/test_observability.py
> ---
>
> Key: IMPALA-8129
> URL: https://issues.apache.org/jira/browse/IMPALA-8129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> {{query_test/test_observability.py}} failed in multiple builds:
> Erasure-coding build:
> {noformat}
> 18:49:01 === FAILURES 
> ===
> 18:49:01 ___ TestObservability.test_global_exchange_counters 
> 
> 18:49:01 [gw0] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/../infra/python/env/bin/python
> 18:49:01 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 18:49:01 assert "ExchangeScanRatio: 3.19" in profile
> 18:49:01 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=704d1f6b09400fba:b91dc70):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build...  - OptimizationTime: 32.000ms\n
>- PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 
> 26.000ms\n'
> {noformat}
> Core build:
> {noformat}
> 07:36:43 FAIL 
> query_test/test_observability.py::TestObservability::()::test_global_exchange_counters
> 07:36:43 === FAILURES 
> ===
> 07:36:43 ___ TestObservability.test_global_exchange_counters 
> 
> 07:36:43 [gw2] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/../infra/python/env/bin/python
> 07:36:43 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 07:36:43 assert "ExchangeScanRatio: 3.19" in profile
> 07:36:43 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=b546ddcfab65e431:471aa218):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...  - OptimizationTime: 32.000ms\n 
>   - PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 32.000ms\n'
> {noformat}
> Assigning to Lars since it may be related to the patch for IMPALA-7731: Add 
> Read/Exchange counters to profile



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-8129) Build failure: query_test/test_observability.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8129:

Affects Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> Build failure: query_test/test_observability.py
> ---
>
> Key: IMPALA-8129
> URL: https://issues.apache.org/jira/browse/IMPALA-8129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> {{query_test/test_observability.py}} failed in multiple builds:
> Erasure-coding build:
> {noformat}
> 18:49:01 === FAILURES 
> ===
> 18:49:01 ___ TestObservability.test_global_exchange_counters 
> 
> 18:49:01 [gw0] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/../infra/python/env/bin/python
> 18:49:01 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 18:49:01 assert "ExchangeScanRatio: 3.19" in profile
> 18:49:01 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=704d1f6b09400fba:b91dc70):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build...  - OptimizationTime: 32.000ms\n
>- PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 
> 26.000ms\n'
> {noformat}
> Core build:
> {noformat}
> 07:36:43 FAIL 
> query_test/test_observability.py::TestObservability::()::test_global_exchange_counters
> 07:36:43 === FAILURES 
> ===
> 07:36:43 ___ TestObservability.test_global_exchange_counters 
> 
> 07:36:43 [gw2] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/../infra/python/env/bin/python
> 07:36:43 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 07:36:43 assert "ExchangeScanRatio: 3.19" in profile
> 07:36:43 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=b546ddcfab65e431:471aa218):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...  - OptimizationTime: 32.000ms\n 
>   - PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 32.000ms\n'
> {noformat}
> Assigning to Lars since it may be related to the patch for IMPALA-7731: Add 
> Read/Exchange counters to profile



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8103) Plan hints show up as "--" comments in analysed query

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8103:
--
Target Version: Product Backlog
  Priority: Minor  (was: Major)

> Plan hints show up as "--" comments in analysed query
> -
>
> Key: IMPALA-8103
> URL: https://issues.apache.org/jira/browse/IMPALA-8103
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Minor
>
> I noticed that the hints added in IMPALA-5821 show up in the -- style rather 
> than /**/
> {code}
> Sql Statement: select * from tpch.lineitem join /*+ broadcast */ 
> tpch.part on l_partkey = p_partkey limit 
> ...
> Analyzed query: SELECT * FROM tpch.lineitem INNER JOIN
> -- +broadcast
> tpch.part ON l_partkey = p_partkey LIMIT CAST(5 AS TINYINT)
> {code}
> I guess this works and maybe it's fine, but I was really confused when I saw 
> it. It looks like getPlanHintsSql() uses this to generate views in such a way 
> that Hive will ignore the hints, but that concern doesn't seem relevant to 
> this use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7641) Memory Limit Exceeded

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7641.
---
Resolution: Cannot Reproduce

> Memory Limit Exceeded
> -
>
> Key: IMPALA-7641
> URL: https://issues.apache.org/jira/browse/IMPALA-7641
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.4
>Reporter: Ahshan
>Priority: Minor
>  Labels: resource-management
> Attachments: profile(8).txt
>
>
> We are using the CDH distribution with impalad version 2.6.0-cdh5.8.2 RELEASE.
>  
> As per my understanding, the per-host memory requirement is 288 MB and we have 
> a total of 18 Impala daemons, which sums up to 5184 MB of total memory 
> consumption. Considering the above details, this should not lead to a memory 
> issue when MEM_LIMIT is set to 20 GB.
> Hence, could you please let us know the cause of the memory limit being 
> exceeded for the following query:
> select * from emp_sales where job_id = 55451 and uploaded_month = 201808 
> limit 1
>  +---+
> |Explain String|
> +---+
> |Estimated Per-Host Requirements: Memory=288.00MB VCores=1|
> | |
> |01:EXCHANGE [UNPARTITIONED]|
> | |limit: 1|
> | | |
> |00:SCAN HDFS [fenet5.hmig_os_changes_details_malicious]|
> |partitions=1/25 files=3118 size=110.01GB|
> |predicates: job_id = 55451|
> |limit: 1|
> +---+
> WARNINGS: 
>  Memory limit exceeded
>  HdfsParquetScanner::ReadDataPage() failed to allocate 269074889 bytes for 
> dictionary.
> Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.23 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.63 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.27 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.39 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 16.09 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.74 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.74 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.74 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 15.20 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 14.61 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.11 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.47 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.47 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.47 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
> 

[jira] [Created] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-01 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8151:
-

 Summary: HiveUdfCall assumes StringValue is 16 bytes
 Key: IMPALA-8151
 URL: https://issues.apache.org/jira/browse/IMPALA-8151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.2.0
Reporter: Tim Armstrong
Assignee: Pooja Nilangekar


HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
{code}
  switch (GetChild(i)->type().type) {
case TYPE_BOOLEAN:
case TYPE_TINYINT:
  // Using explicit sizes helps the compiler unroll memcpy
  memcpy(input_ptr, v, 1);
  break;
case TYPE_SMALLINT:
  memcpy(input_ptr, v, 2);
  break;
case TYPE_INT:
case TYPE_FLOAT:
  memcpy(input_ptr, v, 4);
  break;
case TYPE_BIGINT:
case TYPE_DOUBLE:
  memcpy(input_ptr, v, 8);
  break;
case TYPE_TIMESTAMP:
case TYPE_STRING:
case TYPE_VARCHAR:
  memcpy(input_ptr, v, 16);
  break;
default:
  DCHECK(false) << "NYI";
  }
{code}

STRING and VARCHAR were only 16 bytes because of padding. This padding is 
removed by IMPALA-7367, so this will read past the end of the actual value. 
This could in theory lead to a crash.

We need to change the value, but we should probably also switch to 
sizeof(StringValue) so that it doesn't get broken by similar changes in the future.
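
As a rough, hypothetical illustration of the sizeof-based direction (the struct 
below is a stand-in, not Impala's real StringValue, and this is not the actual 
patch):
{code}
// Hypothetical stand-in for a pointer + length value with its padding removed.
#include <cstdio>
#include <cstring>

#pragma pack(push, 1)
struct FakeStringValue {
  char* ptr;  // 8 bytes on a 64-bit build
  int len;    // 4 bytes; packed, so sizeof(FakeStringValue) == 12, not 16
};
#pragma pack(pop)

int main() {
  FakeStringValue src{nullptr, 0};
  unsigned char dst[16] = {0};
  // A hard-coded memcpy(dst, &src, 16) would now read 4 bytes past 'src';
  // deriving the size from the type keeps the copy in sync with the layout.
  std::memcpy(dst, &src, sizeof(FakeStringValue));
  std::printf("copied %zu bytes\n", sizeof(FakeStringValue));
  return 0;
}
{code}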



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


