[jira] [Created] (IMPALA-8158) Use HS2 service to retrieve thrift profiles

2019-02-01 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-8158:
---

 Summary: Use HS2 service to retrieve thrift profiles
 Key: IMPALA-8158
 URL: https://issues.apache.org/jira/browse/IMPALA-8158
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Lars Volker
Assignee: Lars Volker


Once Impyla has been updated, we should retrieve Thrift profiles through HS2 
synchronously instead of scraping the debug web pages.

https://github.com/cloudera/impyla/issues/332
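
For illustration only (a sketch of the intended direction, assuming the updated Impyla exposes a profile accessor such as {{get_profile()}} on its HS2 cursor; the exact API is not confirmed here):

{code:python}
from impala.dbapi import connect

# Hypothetical usage: host/port and the get_profile() accessor are assumptions
# about the updated Impyla client, not a confirmed API.
conn = connect(host='localhost', port=21050)  # Impala's HS2 port
cursor = conn.cursor()
cursor.execute('SELECT count(*) FROM functional.alltypes')
cursor.fetchall()

# Retrieve the runtime profile for the query over HS2 instead of scraping the
# debug web pages.
profile = cursor.get_profile()
print(profile)
{code}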




[jira] [Resolved] (IMPALA-8142) ASAN build failure in query_test/test_nested_types.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8142.
-
   Resolution: Duplicate
 Assignee: Lars Volker  (was: Lenisha Gandhi)
Fix Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> ASAN build failure in query_test/test_nested_types.py
> -
>
> Key: IMPALA-8142
> URL: https://issues.apache.org/jira/browse/IMPALA-8142
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: asan, build-failure
> Fix For: Impala 3.2.0
>
>
> From the build log:
> {noformat}
> 05:23:33 === FAILURES 
> ===
> 05:23:33  TestNestedTypes.test_subplan[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 05:23:33 [gw7] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:23:33 query_test/test_nested_types.py:77: in test_subplan
> 05:23:33 self.run_test_case('QueryTest/nested-types-subplan', vector)
> 05:23:33 common/impala_test_suite.py:472: in run_test_case
> 05:23:33 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 05:23:33 common/impala_test_suite.py:699: in __execute_query
> 05:23:33 return impalad_client.execute(query, user=user)
> 05:23:33 common/impala_connection.py:174: in execute
> 05:23:33 return self.__beeswax_client.execute(sql_stmt, user=user)
> 05:23:33 beeswax/impala_beeswax.py:200: in execute
> 05:23:33 result = self.fetch_results(query_string, handle)
> 05:23:33 beeswax/impala_beeswax.py:445: in fetch_results
> 05:23:33 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:23:33 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:23:33 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:23:33 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:23:33 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:23:33 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:23:33 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:23:33 E   MESSAGE: TSocket read 0 bytes
> {noformat}
> From {{impalad.ERROR}}:
> {noformat}
> SUMMARY: AddressSanitizer: use-after-poison 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/tuple.h:241:13
>  in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) const
> ...
> ==119152==ABORTING
> {noformat}




[jira] [Updated] (IMPALA-8142) ASAN build failure in query_test/test_nested_types.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8142:

Affects Version/s: Impala 3.2.0

> ASAN build failure in query_test/test_nested_types.py
> -
>
> Key: IMPALA-8142
> URL: https://issues.apache.org/jira/browse/IMPALA-8142
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lenisha Gandhi
>Priority: Blocker
>  Labels: asan, build-failure
> Fix For: Impala 3.1.0
>
>
> From the build log:
> {noformat}
> 05:23:33 === FAILURES 
> ===
> 05:23:33  TestNestedTypes.test_subplan[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 05:23:33 [gw7] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:23:33 query_test/test_nested_types.py:77: in test_subplan
> 05:23:33 self.run_test_case('QueryTest/nested-types-subplan', vector)
> 05:23:33 common/impala_test_suite.py:472: in run_test_case
> 05:23:33 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 05:23:33 common/impala_test_suite.py:699: in __execute_query
> 05:23:33 return impalad_client.execute(query, user=user)
> 05:23:33 common/impala_connection.py:174: in execute
> 05:23:33 return self.__beeswax_client.execute(sql_stmt, user=user)
> 05:23:33 beeswax/impala_beeswax.py:200: in execute
> 05:23:33 result = self.fetch_results(query_string, handle)
> 05:23:33 beeswax/impala_beeswax.py:445: in fetch_results
> 05:23:33 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:23:33 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:23:33 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:23:33 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:23:33 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:23:33 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:23:33 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:23:33 E   MESSAGE: TSocket read 0 bytes
> {noformat}
> From {{impalad.ERROR}}:
> {noformat}
> SUMMARY: AddressSanitizer: use-after-poison 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/tuple.h:241:13
>  in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) const
> ...
> ==119152==ABORTING
> {noformat}






[jira] [Updated] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7961:
--
Description: 
When catalog server is under heavy load with concurrent updates to objects, 
queries with SYNC_DDL can fail with the following message.

*User facing error message:*
{noformat}
ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
SYNC_DDL operation after 3 attempts.The operation has been successfully 
executed but its effects may have not been broadcast to all the coordinators.
{noformat}
*Exception from the catalog server log:*
{noformat}
I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 1088
I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 12625
I1031 00:00:49.168851 1131986 jni-util.cc:230] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 3 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)

{noformat}
*What this means*

The Catalog operation is actually successful (the change has been committed to 
HMS and Catalog server cache) but the Catalog server noticed that it is taking 
longer than expected time for it to broadcast the changes (for whatever reason) 
and instead of hanging in there, it fails fast. The coordinators are expected 
to eventually sync up in the background.

*Problem*
 - This violates the contract of the SYNC_DDL query option since the query 
returns early.
 - This is a behavioral regression from pre IMPALA-5058 state where the queries 
would wait forever for SYNC_DDL based changes to propagate.

*Notes*
 - Introduced by IMPALA-5058
 - Based on the occurrences of this issue, we narrowed it down to a specific 
kind of DDLs (see Jira comments).
 - My understanding is that this also applies to the Catalog V2 (or 
LocalCatalog mode) since we still rely on the CatalogServer for DDL 
orchestration and hence it takes this codepath.

  was:
When catalog server is under heavy load with concurrent updates to objects, 
queries with SYNC_DDL can fail with the following message.

*User facing error message:*
{noformat}
ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
SYNC_DDL operation after 3 attempts.The operation has been successfully 
executed but its effects may have not been broadcast to all the coordinators.
{noformat}
*Exception from the catalog server log:*
{noformat}
I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 1088
I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 12625
I1031 00:00:49.168851 1131986 jni-util.cc:230] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 3 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)

{noformat}
*What this means*

The Catalog operation is actually successful (the change has been committed to 
HMS and Catalog server cache) but the Catalog server noticed that it is taking 
longer than expected time for it to broadcast the changes (for whatever reason) 
and instead of hanging in there, it fails fast. The coordinators are expected 
to eventually sync up in the background.

*Problem*
 - This violates the contract of the SYNC_DDL query option since the query 
returns early.
 - This is a behavioral regression from pre IMPALA-5058 state where the queries 
would wait forever for SYNC_DDL based changes to propagate.

*Notes*
 - Usual suspect here is heavily concurrent catalog operations with long 
running DDLs.
 - Introduced by IMPALA-5058
 - My understanding is that this also applies to the Catalog V2 (or 
LocalCatalog mode) since we still rely on the CatalogServer for DDL 
orchestration and hence it takes this codepath.

Please refer to the jira comment for technical explanation as to why this is 
happening (to be updated).

[jira] [Commented] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758860#comment-16758860
 ] 

bharath v commented on IMPALA-7961:
---

After digging into this a bit more, the issue here seems to be a race in the "IF 
NOT EXISTS" + sync_ddl=true combination. For example: "create table if not exists 
foo(..)" where the *table foo already exists* in the catalog server.

In such a case we do the following:
{noformat}
Table existingTbl = catalog_.getTableNoThrow(tableName.getDb(), tableName.getTbl());
if (params.if_not_exists && existingTbl != null) {
  addSummary(response, "Table already exists.");
  LOG.trace(String.format("Skipping table creation because %s already exists and " +
      "IF NOT EXISTS was specified.", tableName));
  existingTbl.getLock().lock();
  try {
    addTableToCatalogUpdate(existingTbl, response.getResult());
    return false;
  } finally {
    existingTbl.getLock().unlock();
  }
}
{noformat}
We add the *{{existingTbl}}* to the DDL response and wait on a topic update 
covering its version.
{noformat}
if (req.isSync_ddl()) {
  resp.getResult().setVersion(catalog_.waitForSyncDdlVersion(resp.getResult()));
}

public long waitForSyncDdlVersion(TCatalogUpdateResult result) throws
    CatalogException {
  ...
  long topicVersionForUpdates =
      getCoveringTopicUpdateVersion(result.getUpdated_catalog_objects());
  ...
{noformat}
The catch here is that since this "existingTbl" could have been unchanged for a 
while (no version bumps), its topic entry could be older than 
TOPIC_UPDATE_LOG_GC_FREQUENCY and potentially GC'ed, in which case, unless there 
is a version bump on this table, it wouldn't be added again to the 
{{topicUpdateLog_}}. This means that {{waitForSyncDdlVersion()}} would loop until 
it exhausts its retries, as nothing would add the table to the log unless it is 
modified. I could reproduce this with aggressive topicUpdateLog_ GCs using the 
attached patch.

"create table if not exists" is a specific example, and it is quite possible 
that there are other manifestations of this issue with the same theme: adding 
objects to DDL responses without adding them to the topicUpdateLog_.
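
For illustration only (a minimal Python sketch of the race described above, with made-up names and constants standing in for TOPIC_UPDATE_LOG_GC_FREQUENCY, the topicUpdateLog_, and the retry loop in waitForSyncDdlVersion(); this is not the actual Impala code):

{code:python}
# Stand-ins for the catalog topic update log and the SYNC_DDL wait loop.
TOPIC_UPDATE_LOG_GC_FREQUENCY = 5   # keep entries for the last 5 topic updates
MAX_SYNC_DDL_RETRIES = 3

topic_update_log = {}       # table name -> topic version of its last update
current_topic_version = 0

def process_topic_update(modified_tables):
    """One topic update: bump the version, record modified tables, GC old entries."""
    global current_topic_version
    current_topic_version += 1
    for tbl in modified_tables:
        topic_update_log[tbl] = current_topic_version
    for tbl, version in list(topic_update_log.items()):
        if version <= current_topic_version - TOPIC_UPDATE_LOG_GC_FREQUENCY:
            # Entry GC'ed; nothing re-adds it unless the table is modified again.
            del topic_update_log[tbl]

def wait_for_sync_ddl_version(table):
    """SYNC_DDL wait: look for a topic update log entry covering the table."""
    for attempt in range(MAX_SYNC_DDL_RETRIES):
        if table in topic_update_log:
            return topic_update_log[table]
        # The real server sleeps and waits for more topic updates; here other
        # (unrelated) updates keep arriving but never touch 'foo'.
        process_topic_update(["some_other_table_%d" % attempt])
    raise RuntimeError("Couldn't retrieve the catalog topic version for the "
                       "SYNC_DDL operation after %d attempts" % MAX_SYNC_DDL_RETRIES)

# 'foo' was created long ago and then left unmodified while other DDLs churn.
process_topic_update(["foo"])
for i in range(10):
    process_topic_update(["busy_table_%d" % i])   # heavy concurrent catalog load

# "CREATE TABLE IF NOT EXISTS foo" finds the existing table, adds it to the DDL
# response, and waits -- but foo's log entry was already GC'ed, so this raises.
wait_for_sync_ddl_version("foo")
{code}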

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail 
> fast
> ---
>
> Key: IMPALA-7961
> URL: https://issues.apache.org/jira/browse/IMPALA-7961
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
>
> When catalog server is under heavy load with concurrent updates to objects, 
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
> SYNC_DDL operation after 3 attempts.The operation has been successfully 
> executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 3 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> 
> {noformat}
> *What this means*
> The Catalog operation is actually successful (the change has been committed 
> to HMS and Catalog server cache) but the Catalog server noticed that it is 
> taking longer than expected time for it to broadcast the changes (for 
> whatever reason) and instead of hanging in there, it fails fast. The 
> coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query 
> returns early.
>  - This is a behavioral regression from pre IMPALA-5058 state where the 
> queries would wait forever for SYNC_DDL based changes to propagate.
> 

[jira] [Updated] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2019-02-01 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7961:
--
Attachment: 0001-Repro-of-IMPALA-7961.patch

> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail 
> fast
> ---
>
> Key: IMPALA-7961
> URL: https://issues.apache.org/jira/browse/IMPALA-7961
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: bharath v
>Assignee: bharath v
>Priority: Critical
> Attachments: 0001-Repro-of-IMPALA-7961.patch
>
>
> When catalog server is under heavy load with concurrent updates to objects, 
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
> SYNC_DDL operation after 3 attempts.The operation has been successfully 
> executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 3 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> 
> {noformat}
> *What this means*
> The Catalog operation is actually successful (the change has been committed 
> to HMS and Catalog server cache) but the Catalog server noticed that it is 
> taking longer than expected time for it to broadcast the changes (for 
> whatever reason) and instead of hanging in there, it fails fast. The 
> coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query 
> returns early.
>  - This is a behavioral regression from pre IMPALA-5058 state where the 
> queries would wait forever for SYNC_DDL based changes to propagate.
> *Notes*
>  - Usual suspect here is heavily concurrent catalog operations with long 
> running DDLs.
>  - Introduced by IMPALA-5058
>  - My understanding is that this also applies to the Catalog V2 (or 
> LocalCatalog mode) since we still rely on the CatalogServer for DDL 
> orchestration and hence it takes this codepath.
> Please refer to the jira comment for technical explanation as to why this is 
> happening (to be updated).






[jira] [Updated] (IMPALA-8157) Log exceptions from the front end

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8157:

Summary: Log exceptions from the front end  (was: Log exceptions from the 
FrontEnd)

> Log exceptions from the front end
> -
>
> Key: IMPALA-8157
> URL: https://issues.apache.org/jira/browse/IMPALA-8157
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Fang-Yu Rao
>Priority: Minor
>
> The BE calls into the FE for a variety of operations. Each of these may fail 
> in expected ways (invalid query, say) or unexpected ways (a code change 
> introduces a null pointer exception.)
> At present, the BE logs only the exception, and only at the INFO level. This 
> ticket asks to log all unexpected exceptions at the ERROR level. The basic 
> idea is to extend all FE entry points to do:
> {code:java}
> try {
>   // Do the operation
> } catch (ExpectedException e) {
>   // Don't log expected exceptions
>   throw e;
> } catch (Throwable e) {
>   LOG.error("Something went wrong", e);
>   throw e;
> }
> {code}
> The above code logs all exceptions except for those that are considered 
> expected. The job of this ticket is to:
> * Find all the entry points
> * Identify which, if any, exceptions are expected
> * Add logging code with an error message that identifies the operation
> This pattern was tested ad-hoc to find a bug during development and seems to 
> work fine. As a result, the change is mostly a matter of the above three 
> steps.






[jira] [Work started] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8156 started by Paul Rogers.
---
> Add format options to the EXPLAIN statement
> ---
>
> Key: IMPALA-8156
> URL: https://issues.apache.org/jira/browse/IMPALA-8156
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The EXPLAIN statement is very basic:
> {code:sql}
> EXPLAIN ;
> {code}
> Example:
> {code:sql}
> EXPLAIN SELECT * FROM alltypes;
> {code}
> Explain does provide some options set as session options:
> {code:sql}
> SET explain_level=extended;
> EXPLAIN ;
> {code}
> We have often found the need for additional information. For example, it 
> would be very useful to obtain the SELECT statement after view substitution.
> We wish to extend EXPLAIN to allow additional options, while retaining full 
> backward compatibility. The extended syntax is:
> {code:sql}
> EXPLAIN [FORMAT([opt(, opt)*])] ;
> {code}
> This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
> options to be added in the future without the need to define new keywords.
> Options are in the {{name=value}} form with {{name}} as an identifier and 
> {{value}} as a string literal. Both are case-insensitive. Example to set the 
> explain level:
> {code:sql}
> EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
> {code}
> The two options supported at present are:
> * {{level}} - Sets the explain level.
> * {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.
> The {{level}} option overrides the existing session options. If {{level}} is 
> not present, then the session option is used instead. Values are identical to 
> those accepted by {{SET explain_level}}.
> The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
> {{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
> Example:
> {noformat}
> functional> explain format(rewritten) SELECT * FROM view_view;
> ++
> | Explain String |
> ++
> | SELECT * FROM /* functional.view_view */ ( |
> | SELECT * FROM /* functional.alltypes_view */ ( |
> | SELECT * FROM functional.alltypes) |
> | )  |
> ++
> {noformat}
> Here, the names in comments are the view names. Views are then expanded 
> inline to show the full extent of the statement. This is very helpful for 
> resolving user issues.
> h4. Comparison with Other SQL Dialects
> The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a 
> vendor extension. MySQL defines {{EXPLAIN}} as:
> {noformat}
> {EXPLAIN | DESCRIBE | DESC}
> [explain_type]
> {explainable_stmt | FOR CONNECTION connection_id}
> explain_type: {
> FORMAT = format_name
> }
> format_name: {
> TRADITIONAL
>   | JSON
> }
> {noformat}
> That is, MySQL also uses the {{FORMAT}} keyword with only two choices.
> SqlServer uses a form much like Impala's present form with no options.
> Postgres uses options and keywords:
> {noformat}
> EXPLAIN [ ( option [, ...] ) ] statement
> EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
> where option can be one of:
> ANALYZE [ boolean ]
> VERBOSE [ boolean ]
> COSTS [ boolean ]
> BUFFERS [ boolean ]
> FORMAT { TEXT | XML | JSON | YAML }
> {noformat}
> Apache Drill uses a series of keywords to express options:
> {noformat}
> explain plan [ including all attributes ]
>  [ with implementation | without implementation ]
>  for  ;
> {noformat}
> We claim that, given the wide variety of vendor implementations, the proposed 
> Impala syntax is reasonable.
> h4. Futures
> IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose 
> to select JSON output using the "format" option:
> {code:sql}
> EXPLAIN FORMAT(format='json') 
> {code}
> The format can be combined with other options, such as level:
> {code:sql}
> EXPLAIN FORMAT(format='json', level='extended') 
> {code}
> h4. Details
> The key/value syntax is very general, but cumbersome for simple tasks. The 
> {{FORMAT}} option allows a number of simplifications.
> First, for the explain level, each level can be used as a Boolean option:
> {code:sql}
> EXPLAIN FORMAT(extended='true') 
> {code}
> Second, for Boolean options, the value is optional and "true" is assumed:
> {code:sql}
> EXPLAIN FORMAT(EXTENDED) 
> {code}
> Third, if only a value is given, the value is assumed to be for the "format" 
> key (which is not yet supported):
> {code:sql}
> EXPLAIN FORMAT('json') 
> {code}
> 

[jira] [Created] (IMPALA-8157) Log exceptions from the FrontEnd

2019-02-01 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8157:
---

 Summary: Log exceptions from the FrontEnd
 Key: IMPALA-8157
 URL: https://issues.apache.org/jira/browse/IMPALA-8157
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Fang-Yu Rao


The BE calls into the FE for a variety of operations. Each of these may fail in 
expected ways (invalid query, say) or unexpected ways (a code change introduces 
a null pointer exception.)

At present, the BE logs only the exception, and only at the INFO level. This 
ticket asks to log all unexpected exceptions at the ERROR level. The basic idea 
is to extend all FE entry points to do:

{code:java}
try {
  // Do the operation
} catch (ExpectedException e) {
  // Don't log expected exceptions
  throw e;
} catch (Throwable e) {
  LOG.error("Something went wrong", e);
  throw e;
}
{code}

The above code logs all exceptions except for those that are considered 
expected. The job of this ticket is to:

* Find all the entry points
* Identify which, if any, exceptions are expected
* Add logging code with an error message that identifies the operation

This pattern was tested ad-hoc to find a bug during development and seems to 
work fine. As a result, the change is mostly a matter of the above three steps.





[jira] [Created] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-8156:
---

 Summary: Add format options to the EXPLAIN statement
 Key: IMPALA-8156
 URL: https://issues.apache.org/jira/browse/IMPALA-8156
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Paul Rogers


The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

We wish to extend EXPLAIN to allow additional options, while retaining full 
backward compatibility. The extended syntax is:

{code:sql}
EXPLAIN [FORMAT([opt(, opt)*])] ;
{code}

This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
options to be added in the future without the need to define new keywords.

Options are in the {{name=value}} form with {{name}} as an identifier and 
{{value}} as a string literal. Both are case-insensitive. Example to set the 
explain level:

{code:sql}
EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
{code}

The two options supported at present are:

* {{level}} - Sets the explain level.
* {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.

The {{level}} option overrides the existing session options. If {{level}} is 
not present, then the session option is used instead.

The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
{{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
Example:

{noformat}
functional> explain format(rewritten) SELECT * FROM view_view;

++
| Explain String |
++
| SELECT * FROM /* functional.view_view */ ( |
| SELECT * FROM /* functional.alltypes_view */ ( |
| SELECT * FROM functional.alltypes) |
| )  |
++
{noformat}

Here, the names in comments are the view names. Views are then expanded inline 
to show the full extent of the statement. This is very helpful for resolving user 
issues.

h4. Comparison with Other SQL Dialects

The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a vendor 
extension. MySQL defines {{EXPLAIN}} as:

{noformat}
{EXPLAIN | DESCRIBE | DESC}
[explain_type]
{explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
FORMAT = format_name
}

format_name: {
TRADITIONAL
  | JSON
}
{noformat}

That is, MySQL also uses the {{FORMAT}} keyword with only two choices.

SqlServer uses a form much like Impala's present form with no options.

Postgres uses options and keywords:

{noformat}
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
{noformat}

Apache Drill uses a series of keywords to express options:

{noformat}
explain plan [ including all attributes ]
 [ with implementation | without implementation ]
 for  ;
{noformat}

We claim that, given the wide variety of vendor implementations, the proposed 
Impala syntax is reasonable.

h4. Futures

IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose to 
select JSON output using the "format" option:

{code:sql}
EXPLAIN FORMAT(format='json') 
{code}

The format can be combined with other options, such as level:

{code:sql}
EXPLAIN FORMAT(format='json', level='extended') 
{code}

h4. Details

The key/value syntax is very general, but cumbersome for simple tasks. The 
{{FORMAT}} option allows a number of simplifications.

First, for the explain level, each level can be used as a Boolean option:

{code:sql}
EXPLAIN FORMAT(extended='true') 
{code}

Second, for Boolean options, the value is optional and "true" is assumed:

{code:sql}
EXPLAIN FORMAT(EXTENDED) 
{code}

Third, if only a value is given, the value is assumed to be for the "format" 
key (which is not yet supported):

{code:sql}
EXPLAIN FORMAT('json') 
{code}

This would, when JSON format is available, emit the plan as JSON.

The astute reader will see opportunities for odd combinations of options. 
Rather than enforcing a strict set of rules, when given an odd combination of 
options, the {{FORMAT}} option simply does something reasonable. Example:

{code:sql}
EXPLAIN FORMAT(level='standard', extended, verbose='false') 
{code}

The short answer here is that when options are ambiguous, the behavior is 
undefined but reasonable.
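
For illustration only (a hypothetical Python sketch of the normalization rules described above, not the actual Impala frontend code; option and level names are assumptions):

{code:python}
# Normalize FORMAT(...) options: name=value pairs, bare level names, bare
# Boolean names (true assumed), and a bare string literal for the "format" key.
EXPLAIN_LEVELS = {"minimal", "standard", "extended", "verbose"}

def normalize_format_options(tokens):
    """tokens are the raw comma-separated items inside FORMAT(...), as strings."""
    options = {}
    for tok in tokens:
        tok = tok.strip()
        if "=" in tok:                     # name=value form
            name, value = tok.split("=", 1)
            options[name.strip().lower()] = value.strip().strip("'").lower()
        elif tok.startswith("'"):          # bare string literal -> "format" key
            options["format"] = tok.strip("'").lower()
        else:                              # bare identifier -> Boolean, true assumed
            options[tok.lower()] = "true"
    # A level given as a Boolean option (e.g. EXTENDED) becomes the level value.
    for level in EXPLAIN_LEVELS:
        if options.pop(level, "false") == "true":
            options.setdefault("level", level)
    return options

print(normalize_format_options(["level=extended"]))   # {'level': 'extended'}
print(normalize_format_options(["EXTENDED"]))         # {'level': 'extended'}
print(normalize_format_options(["'json'"]))           # {'format': 'json'}
print(normalize_format_options(["level='standard'", "extended", "verbose='false'"]))
# {'level': 'standard'} -- an ambiguous combination resolves to something reasonable
{code}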




[jira] [Updated] (IMPALA-8156) Add format options to the EXPLAIN statement

2019-02-01 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8156:

Description: 
The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

We wish to extend EXPLAIN to allow additional options, while retaining full 
backward compatibility. The extended syntax is:

{code:sql}
EXPLAIN [FORMAT([opt(, opt)*])] ;
{code}

This syntax reuses the existing FORMAT keyword, and allows an unlimited set of 
options to be added in the future without the need to define new keywords.

Options are in the {{name=value}} form with {{name}} as an identifier and 
{{value}} as a string literal. Both are case-insensitive. Example to set the 
explain level:

{code:sql}
EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
{code}

The two options supported at present are:

* {{level}} - Sets the explain level.
* {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.

The {{level}} option overrides the existing session options. If {{level}} is 
not present, then the session option is used instead. Values are identical to 
those accepted by {{SET explain_level}}.

The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, 
{{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. 
Example:

{noformat}
functional> explain format(rewritten) SELECT * FROM view_view;

++
| Explain String |
++
| SELECT * FROM /* functional.view_view */ ( |
| SELECT * FROM /* functional.alltypes_view */ ( |
| SELECT * FROM functional.alltypes) |
| )  |
++
{noformat}

Here, the names in comments are the view names. Views are then expanded inline 
to show the full extent of the statement. This is very helpful for resolving user 
issues.

h4. Comparison with Other SQL Dialects

The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a vendor 
extension. MySQL defines {{EXPLAIN}} as:

{noformat}
{EXPLAIN | DESCRIBE | DESC}
[explain_type]
{explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
FORMAT = format_name
}

format_name: {
TRADITIONAL
  | JSON
}
{noformat}

That is, MySQL also uses the {{FORMAT}} keyword with only two choices.

SqlServer uses a form much like Impala's present form with no options.

Postgres uses options and keywords:

{noformat}
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
{noformat}

Apache Drill uses a series of keywords to express options:

{noformat}
explain plan [ including all attributes ]
 [ with implementation | without implementation ]
 for  ;
{noformat}

We claim that, given the wide variety of vendor implementations, the proposed 
Impala syntax is reasonable.

h4. Futures

IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose to 
select JSON output using the "format" option:

{code:sql}
EXPLAIN FORMAT(format='json') 
{code}

The format can be combined with other options, such as level:

{code:sql}
EXPLAIN FORMAT(format='json', level='extended') 
{code}

h4. Details

The key/value syntax is very general, but cumbersome for simple tasks. The 
{{FORMAT}} option allows a number of simplifications.

First, for the explain level, each level can be used as a Boolean option:

{code:sql}
EXPLAIN FORMAT(extended='true') 
{code}

Second, for Boolean options, the value is optional and "true" is assumed:

{code:sql}
EXPLAIN FORMAT(EXTENDED) 
{code}

Third, if only a value is given, the value is assumed to be for the "format" 
key (which is not yet supported):

{code:sql}
EXPLAIN FORMAT('json') 
{code}

This would, when JSON format is available, emit the plan as JSON.

The astute reader will see opportunities for odd combinations of options. 
Rather than enforcing a strict set of rules, when given an odd combination of 
options, the {{FORMAT}} option simply does something reasonable. Example:

{code:sql}
EXPLAIN FORMAT(level='standard', extended, verbose='false') 
{code}

The short answer here is that when options are ambiguous, the behavior is 
undefined but reasonable.

  was:
The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN ;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET explain_level=extended;
EXPLAIN ;
{code}

We have often found the need for additional information. For example, it would 
be very useful to obtain the SELECT statement after view substitution.

[jira] [Created] (IMPALA-8155) Switch to Impala-lzo/2.x for Impala-2.x

2019-02-01 Thread Quanlong Huang (JIRA)
Quanlong Huang created IMPALA-8155:
--

 Summary: Switch to Impala-lzo/2.x for Impala-2.x
 Key: IMPALA-8155
 URL: https://issues.apache.org/jira/browse/IMPALA-8155
 Project: IMPALA
  Issue Type: Task
Reporter: Quanlong Huang


Impala-2.x is currently built against the Cloudera/Impala-lzo master branch. As 
that branch is updated, builds of Impala-2.x will fail. We need to switch to a 
branch that points to the original commit.






[jira] [Updated] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6479:
---
Fix Version/s: Impala 2.1

> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 2.1, Impala 3.0
>
>
> Currently if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and the DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has SELECT on a 
> subset of columns, it will only show the data for the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only the 
> columns they have access to.  There is also a security aspect to this fix in 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  






[jira] [Updated] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread Quanlong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6479:
---
Fix Version/s: (was: Impala 2.1)

> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 3.0
>
>
> Currently if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and the DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has SELECT on a 
> subset of columns, it will only show the data for the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only the 
> columns they have access to.  There is also a security aspect to this fix in 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  






[jira] [Created] (IMPALA-8154) Disable auth_to_local by default

2019-02-01 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8154:
--

 Summary: Disable auth_to_local by default
 Key: IMPALA-8154
 URL: https://issues.apache.org/jira/browse/IMPALA-8154
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.1.0, Impala 2.12.0
Reporter: Michael Ho
Assignee: Michael Ho


Before KRPC, the local name mapping was derived entirely from the principal 
name. When KRPC is enabled, Impala starts to use the system auth_to_local rules, 
because "use_system_auth_to_local" is enabled by default. This can cause a 
regression in cases where localauth is configured in krb5.conf, and may break 
connections between impalad daemons after [this 
commit|https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d].

The fix is to disable use_system_auth_to_local by default.





[jira] [Commented] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-01 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758629#comment-16758629
 ] 

Pooja Nilangekar commented on IMPALA-8151:
--

I agree. I believe it would make sense to use sizeof() for all other datatypes 
as well, since datatypes like TIMESTAMP may be modified in the future. Or would 
that be too much overhead?

> HiveUdfCall assumes StringValue is 16 bytes
> ---
>
> Key: IMPALA-8151
> URL: https://issues.apache.org/jira/browse/IMPALA-8151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: crash
>
> HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
> {code}
>   switch (GetChild(i)->type().type) {
> case TYPE_BOOLEAN:
> case TYPE_TINYINT:
>   // Using explicit sizes helps the compiler unroll memcpy
>   memcpy(input_ptr, v, 1);
>   break;
> case TYPE_SMALLINT:
>   memcpy(input_ptr, v, 2);
>   break;
> case TYPE_INT:
> case TYPE_FLOAT:
>   memcpy(input_ptr, v, 4);
>   break;
> case TYPE_BIGINT:
> case TYPE_DOUBLE:
>   memcpy(input_ptr, v, 8);
>   break;
> case TYPE_TIMESTAMP:
> case TYPE_STRING:
> case TYPE_VARCHAR:
>   memcpy(input_ptr, v, 16);
>   break;
> default:
>   DCHECK(false) << "NYI";
>   }
> {code}
> STRING and VARCHAR were only 16 bytes because of padding. This padding is 
> removed by IMPALA-7367, so this will read past the end of the actual value. 
> This could in theory lead to a crash.
> We need to change the value, but we should probably also switch to 
> sizeof(StringValue) so that it doesn't get broken by similar changes in 
> future.
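
For illustration only (a Python/ctypes sketch of the padding point above, not the Impala backend code): with default alignment a pointer-plus-length struct like StringValue rounds up to 16 bytes, but with the padding removed its real size is 12, so a hardcoded 16-byte memcpy reads past the end of the value.

{code:python}
import ctypes

class PaddedStringValue(ctypes.Structure):
    # 8-byte pointer + 4-byte length + 4 bytes of tail padding = 16 bytes
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int)]

class PackedStringValue(ctypes.Structure):
    # Same fields with padding removed (roughly what IMPALA-7367 does)
    _pack_ = 1
    _fields_ = [("ptr", ctypes.c_void_p), ("len", ctypes.c_int)]

print(ctypes.sizeof(PaddedStringValue))  # 16 on a 64-bit platform
print(ctypes.sizeof(PackedStringValue))  # 12 -> a hardcoded memcpy of 16 overreads
{code}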






[jira] [Commented] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Andrew Sherman (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758628#comment-16758628
 ] 

Andrew Sherman commented on IMPALA-6263:


I will take a look, assigning to me

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR:
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> The minidump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-6263:
--

Assignee: Andrew Sherman

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR: 
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> Mini dump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-02-01 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8137.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, the data is first sorted locally by each Impala 
> daemon and then streamed to the coordinator, which merges the sorted result sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-01 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758613#comment-16758613
 ] 

Michael Ho commented on IMPALA-6263:


Yet another instance of this was seen recently. Again, this is a run with 
thrift-0.11. [~asherman], are you interested in looking into it further?

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR: 
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> Mini dump is based off
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8153) Impala Doc: Add a section on Admission Debug page to Web UI doc

2019-02-01 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-8153:
---

 Summary: Impala Doc: Add a section on Admission Debug page to Web 
UI doc
 Key: IMPALA-8153
 URL: https://issues.apache.org/jira/browse/IMPALA-8153
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Affects Versions: Impala 3.2.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8137) Order by docs incorrectly state that order by happens on one node

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758592#comment-16758592
 ] 

ASF subversion and git services commented on IMPALA-8137:
-

Commit 6291d6063fe4ff9c483b60b8d9fc254298a51473 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6291d60 ]

IMPALA-8137: [DOCS] Order By does not happens on one node

Change-Id: If8d7bf26fffaf93982e67f8bc8f37742c81fda39
Reviewed-on: http://gerrit.cloudera.org:8080/12330
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 


> Order by docs incorrectly state that order by happens on one node
> -
>
> Key: IMPALA-8137
> URL: https://issues.apache.org/jira/browse/IMPALA-8137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_order_by.html
> "because the entire result set must be produced and transferred to one node 
> before the sorting can happen." is incorrect. If there is an "ORDER BY" 
> clause in a select block, the data is first sorted locally by each Impala 
> daemon and then streamed to the coordinator, which merges the sorted result sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7934) Switch to using Java 8's Base64 impl for incremental stats encoding

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758590#comment-16758590
 ] 

ASF subversion and git services commented on IMPALA-7934:
-

Commit b0942296ab5f24660473abc218d45978fc402d81 in impala's branch 
refs/heads/master from Fredy Wijaya
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b094229 ]

IMPALA-7934: Switch to java.util.Base64 implementation

It is shown that java.util.Base64 implementation seems to have better
performance compared to Apache Commons Codec's Base64 implementation,
which can benefit operations, such as incremental stats. This patch
switches the implementation of Base64 from Apache Commons
Codec to java.util.Base64 implementation.

This is the JMH benchmark result comparing java.util.Base64 vs Commons
Codec's Base64:

Result "base64.Base64Benchmark.javaBase64":
  31.149 ±(99.9%) 1.567 ms/op [Average]
  (min, avg, max) = (27.564, 31.149, 34.675), stdev = 2.091
  CI (99.9%): [29.583, 32.716] (assumes normal distribution)

Result "base64.Base64Benchmark.codecBase64":
  65.921 ±(99.9%) 4.762 ms/op [Average]
  (min, avg, max) = (58.072, 65.921, 80.470), stdev = 6.357
  CI (99.9%): [61.159, 70.683] (assumes normal distribution)

Benchmark                    Mode  Cnt   Score   Error Units
Base64Benchmark.javaBase64   avgt   25  31.149 ± 1.567 ms/op
Base64Benchmark.codecBase64  avgt   25  65.921 ± 4.762 ms/op

Testing:
- Ran all FE tests
- Created a table with incremental stats without a patch and read it
  with the patch

Change-Id: I2d43d4a4f073a800d963ce4c77f21c9efa8471ac
Reviewed-on: http://gerrit.cloudera.org:8080/12250
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Switch to using Java 8's Base64 impl for incremental stats encoding
> ---
>
> Key: IMPALA-7934
> URL: https://issues.apache.org/jira/browse/IMPALA-7934
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: Fredy Wijaya
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 3.2.0
>
> Attachments: base64.png
>
>
> Incremental stats are compressed and Base64 encoded before they are chunked 
> and written to the HMS' partition parameters map. When they are read back, we 
> need to Base64 decode and decompress. 
> For certain incremental-stats-heavy tables, we noticed that a significant 
> amount of time is spent in these Base64 classes (see the attached image for 
> the stack; unfortunately, I don't have a text version of it).
> Java 8 comes with its own Base64 implementation that has shown much better 
> performance [1] compared to Apache Commons Codec's implementation, so consider 
> switching to Java 8's Base64 implementation.
>  [1] http://java-performance.info/base64-encoding-and-decoding-performance/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7867) Expose collection interfaces, not implementations

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758591#comment-16758591
 ] 

ASF subversion and git services commented on IMPALA-7867:
-

Commit 396f542eda32dd92e80edbeb216a4cdeb7fe0ace in impala's branch 
refs/heads/master from paul-rogers
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=396f542 ]

IMPALA-7867 (Part 5): Collection cleanup in analyzer

Continues the work to clean up the code to:

* Use collection interfaces for variable and function declarations,
* Replace Guava newArrayList(), etc. calls with the direct
  use of Java collection classes.
* Clean up unused imports and add override annotations.

This commit cleans up remaining issues in the analyzer now that the
other modules use collection interfaces.

Tests: this is purely a refactoring with no functional change. Reran
existing tests.

Change-Id: I1d1c37beb926896f5e00faab0b06034aebb835c5
Reviewed-on: http://gerrit.cloudera.org:8080/12266
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Expose collection interfaces, not implementations
> -
>
> Key: IMPALA-7867
> URL: https://issues.apache.org/jira/browse/IMPALA-7867
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> When using Java collections, a common Java best practice is to expose the 
> collection interface, but hide the implementation choice. This pattern allows 
> us to start with a generic implementation (an {{ArrayList}}, say), but evolve 
> to a more specific implementation to achieve certain goals (a {{LinkedList}} 
> or {{ImmutableList}}, say.)
> For whatever reason, the Impala FE code exposes {{ArrayList}}, {{HashMap}} 
> and other implementation choices as variable types and in method signatures.
> This ticket tracks a gradual process of revising the declarations and 
> signatures to use the interfaces {{List}} instead of the implementation 
> {{ArrayList}}.
> Also, the FE code appears to predate Java 7, so that declarations of lists 
> tend to be in one of two forms (with or without Guava):
> {code:java}
> foo1 = new ArrayList();
> foo2 = Lists.newArrayList();
> {code}
> Since Java 7, the preferred form is:
> {code:java}
> foo = new ArrayList<>();
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-1730) Reduce the window of spinning for Parquet and base-sequence scanners

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758595#comment-16758595
 ] 

ASF subversion and git services commented on IMPALA-1730:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-7980) High system CPU time usage (and waste) when runtime filters filter out files

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758593#comment-16758593
 ] 

ASF subversion and git services commented on IMPALA-7980:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-8102) Impala/HBase recommendations need update

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758589#comment-16758589
 ] 

ASF subversion and git services commented on IMPALA-8102:
-

Commit 79e735a46df258395ea518a5cf6e22e851a91119 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=79e735a ]

IMPALA-8102: update Impala/HBase docs

Provide pointers to Kudu, which is generally better for analytics

Remove or reword advice that encourages people to use HBase for
analytics.

Remove incorrect information about joins resulting in single-row HBase
lookups - this simply doesn't happen.

Change-Id: If1d5f014722d35eab9b60f7a4e8479738f1bed5b
Reviewed-on: http://gerrit.cloudera.org:8080/12315
Tested-by: Impala Public Jenkins 
Reviewed-by: Alex Rodoni 


> Impala/HBase recommendations need update
> 
>
> Key: IMPALA-8102
> URL: https://issues.apache.org/jira/browse/IMPALA-8102
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> https://impala.apache.org/docs/build/html/topics/impala_hbase.html hasn't 
> been updated for a while. The recommendations are a bit out of date - 
> generally HBase is not the best format for analytic workloads, yet that page 
> seems to encourage using it.
> E.g.
> {quote}If you have join queries that do aggregation operations on large fact 
> tables and join the results against small dimension tables, consider using 
> Impala for the fact tables and HBase for the dimension tables.{quote}
> Assigning to myself to figure out what the best practice is, but I think we 
> need to include:
> * A statement that Kudu offers significantly better performance for 
> analytical workloads with mutable data
> * A statement that HDFS tables are also preferable unless data is frequently 
> mutated
> * A pointer to the Kudu docs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3798) Race condition may cause scanners to spin with runtime filters on Avro or Sequence files

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758594#comment-16758594
 ] 

ASF subversion and git services commented on IMPALA-3798:
-

Commit a8e30506aafef14646d95a56fb87cf7c28d259d6 in impala's branch 
refs/heads/master from Philip Zeyliger
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a8e3050 ]

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non-child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1

[jira] [Commented] (IMPALA-6479) Update DESCRIBE statement to respect column level privileges

2019-02-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758588#comment-16758588
 ] 

ASF subversion and git services commented on IMPALA-6479:
-

Commit b795a2c71cec33363fcce116fcb7e00364903c3a in impala's branch 
refs/heads/2.x from Adam Holley
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b795a2c ]

IMPALA-6479: Update DESCRIBE to respect column privileges

Modified the Frontend to filter columns from the DESCRIBE
statement.  Additionally, if a user has select on at least
one column, they can run DESCRIBE and see most metadata.
If they do not have full table access, they will not see
location or view query metadata.

Testing:
Added tests to validate users that have one or more column
access can run describe and that the output is filtered
accordingly.

Change-Id: Ic96ae184fccdc88ba970b5adcd501da1966accb9
Reviewed-on: http://gerrit.cloudera.org:8080/9276
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
Reviewed-on: http://gerrit.cloudera.org:8080/12292
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 


> Update DESCRIBE statement to respect column level privileges
> 
>
> Key: IMPALA-6479
> URL: https://issues.apache.org/jira/browse/IMPALA-6479
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 3.0
>
>
> Currently, if a user is granted select on a subset of columns on a table, the 
> DESCRIBE command will show them all columns, and DESCRIBE 
> FORMATTED/EXTENDED is not allowed.
> This change would update the DESCRIBE command so that if a user has select on 
> a subset of columns, it will only show the data from the columns the user has 
> access to.  For DESCRIBE FORMATTED/EXTENDED, if a user has access to some 
> columns but not all of them, the Location and View * Text would be removed 
> from the additional metadata.
> The purpose of this change is to increase consumability by allowing tools 
> that let users browse data, such as for creating reports, to present only 
> the columns they have access to.  There is also a security aspect to this fix: 
> not exposing additional data.  Other statements, such as SHOW COLUMN STATS, 
> will be handled by a separate Jira to be opened.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8092) Add a debug page to provide better observability for admission control

2019-02-01 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758581#comment-16758581
 ] 

Alex Rodoni commented on IMPALA-8092:
-

[~bikramjeet.vig] User facing?

> Add a debug page to provide better observability for admission control
> --
>
> Key: IMPALA-8092
> URL: https://issues.apache.org/jira/browse/IMPALA-8092
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: admission-control, observability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-02-01 Thread Greg Rahn (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758554#comment-16758554
 ] 

Greg Rahn commented on IMPALA-4018:
---

Your understanding is correct. 
 I see the plan as such:
 - Since CAST(...FORMAT...) is net-new syntax, it should support the ISO SQL 
masks only. Other engines that implement the ISO SQL standard will then have 
compatible SQL.  There will be no confusion, as "it follows the ISO standard".
 - Once the new ISO SQL masks are supported, we can explore a way to allow 
users to use the ISO SQL masks with the legacy conversion functions. This would 
allow legacy user SQL code to keep working while also enabling migrations from 
typical database systems. Running in mixed mode (some legacy Java masks, some 
new ISO SQL masks) should be possible, probably best controlled via a 
session-level query option. Details are to be worked out; also noting the 
impact on test combinations here.
 - Another challenge here is that once DATE is supported in Impala, to_date() 
needs to return a DATE, not STRING. The new CAST(...FORMAT...) syntax will 
allow CAST( ... AS DATE ...) to work as expected from the beginning, so we 
should only introduce support for that in the new syntax once DATE is supported.

On the point of different masks for to_timestamp() and friends vs CAST(... AS 
TIMESTAMP FORMAT ...): I see limiting the new SQL:2016 syntax to only the ISO 
date masks as less confusing, especially for folks less familiar with 
Hive/Impala SQL specifics. It will be new syntax, but it will have the expected 
behaviors for those familiar with ISO SQL. It also means that things that used 
to work still do, thus minimizing breaking or behavior changes.

The point on Hive/Impala compatible views and syntax is a fair one, and I'm 
aware of it. Several differences exist today and likely will in the future, but 
for things like this I see the right solution as all engines converging on 
ISO/ANSI SQL compatibility. We can certainly let the Hive community (and 
others) know of Impala's adoption of this new ISO SQL syntax.

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd, yyyy hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd, yyyy HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','yyyy-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','yyyy-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to chose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
> 

[jira] [Updated] (IMPALA-8143) Add features to DoRpcWithRetry()

2019-02-01 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated IMPALA-8143:
---
Description: 
DoRpcWithRetry() is a templated utility function, currently in 
control-service.h, that is used to retry synchronous Krpc calls. It calls a 
Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
controlled by the ‘times_to_try’ parameter.

Possible improvements:
 * Move code to rpc-mgr.inline.h
 * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
server’s queue is full.
 * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
 * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
can also use DoRpcWithRetry()
 * Replace FAULT_INJECTION_RPC_DELAY with DebugAction 

  was:
DoRpcWithRetry() is a templated utility function, currently in 
control-service.h, that is used to retry synchronous Krpc calls. It calls a 
Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
controlled by the ‘times_to_try’ parameter.

Possible improvements:
 * Move code to rpc-mgr.inline.h
 * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
server’s queue is full.
 * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
 * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
can also use DoRpcWithRetry()


> Add features to DoRpcWithRetry()
> 
>
> Key: IMPALA-8143
> URL: https://issues.apache.org/jira/browse/IMPALA-8143
> Project: IMPALA
>  Issue Type: Task
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
>
> DoRpcWithRetry() is a templated utility function, currently in 
> control-service.h, that is used to retry synchronous Krpc calls. It calls a 
> Krpc function that is passed in as a lambda. It sets the krpc timeout to the 
> ‘krpc_timeout‘ parameter and calls the Krpc function a number of times 
> controlled by the ‘times_to_try’ parameter.
> Possible improvements:
>  * Move code to rpc-mgr.inline.h
>  * Add a configurable sleep if RpcMgr::IsServerTooBusy() says the remote 
> server’s queue is full.
>  * Make QueryState::ReportExecStatus() use DoRpcWithRetry()
>  * Consider if asynchronous code like that in KrpcDataStreamSender::Channel  
> can also use DoRpcWithRetry()
>  * Replace FAULT_INJECTION_RPC_DELAY with DebugAction 
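
The general shape of such a retry helper, sketched with standard C++ only (the 
real DoRpcWithRetry() uses Impala's Status and KRPC proxy/controller types; 
'rpc' and 'is_retryable' here are hypothetical stand-ins, with the latter 
playing the role of a check like RpcMgr::IsServerTooBusy()):
{code}
#include <chrono>
#include <thread>

// Hypothetical sketch, not Impala's implementation: call 'rpc' up to
// 'times_to_try' times, sleeping between attempts only when the failure is
// retryable (e.g. the remote server's queue was full).
template <typename RpcFn, typename RetryableFn>
bool RetryRpc(RpcFn rpc, RetryableFn is_retryable, int times_to_try,
              std::chrono::milliseconds sleep_between_tries) {
  for (int attempt = 1; attempt <= times_to_try; ++attempt) {
    if (rpc()) return true;             // RPC succeeded.
    if (!is_retryable()) return false;  // Permanent failure: do not retry.
    if (attempt < times_to_try) std::this_thread::sleep_for(sleep_between_tries);
  }
  return false;                         // All attempts exhausted.
}

// Example use (hypothetical names):
//   bool ok = RetryRpc([&] { return TrySendReport(); },
//                      [&] { return ServerQueueWasFull(); },
//                      /*times_to_try=*/3, std::chrono::milliseconds(100));
{code}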



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5861) HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5861:
-

Assignee: Tim Armstrong

> HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts
> --
>
> Key: IMPALA-5861
> URL: https://issues.apache.org/jira/browse/IMPALA-5861
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Dan Hecht
>Assignee: Tim Armstrong
>Priority: Major
>
> It appears that this code is double counting into {{rows_read_counter()}}, 
> since {{row_group_rows_read_}} is already accumulating:
> {code:title=HdfsParquetScanner::GetNextInternal()}
>   } else if (scan_node_->IsZeroSlotTableScan()) {
> // There are no materialized slots and we are not optimizing count(*), 
> e.g.
> // "select 1 from alltypes". We can serve this query from just the file 
> metadata.
> // We don't need to read the column data.
> if (row_group_rows_read_ == file_metadata_.num_rows) {
>   eos_ = true;
>   return Status::OK();
> }
> assemble_rows_timer_.Start();
> DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
> int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_;
> int max_tuples = min(row_batch->capacity(), rows_remaining);
> TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
> int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
> Status status = CommitRows(row_batch, num_to_commit);
> assemble_rows_timer_.Stop();
> RETURN_IF_ERROR(status);
> row_group_rows_read_ += num_to_commit;
> COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);  
> <==
> return Status::OK();
>   }
> {code}
> Repro in impala-shell:
> {noformat}
> set batch_size=16; set num_nodes=1; select count(*) from 
> functional.alltypesmixedformat; profile
> 
>- RowsRead: 3.94K (3936)
>- RowsReturned: 1.20K (1200)
> {noformat}
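
A hedged sketch of the kind of fix the description points at (not necessarily 
the actual patch): because {{row_group_rows_read_}} accumulates across 
GetNext() calls, adding it to the counter on every batch re-counts rows from 
earlier batches, so the counter should only be bumped by the per-batch delta.
{code}
row_group_rows_read_ += num_to_commit;
// Add only the rows committed in this batch; adding the cumulative
// row_group_rows_read_ here would re-count rows already reported on
// earlier GetNext() calls.
COUNTER_ADD(scan_node_->rows_read_counter(), num_to_commit);
return Status::OK();
{code}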



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8150) AuditingTest.TestAccessEventsOnAuthFailure

2019-02-01 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8150 started by Fredy Wijaya.

> AuditingTest.TestAccessEventsOnAuthFailure
> --
>
> Key: IMPALA-8150
> URL: https://issues.apache.org/jira/browse/IMPALA-8150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Fredy Wijaya
>Priority: Blocker
>  Labels: broken-build
>
> {{org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure}} 
> started to fail recently with the following backtrace.
> [~fredyw], would you mind taking a look, as you seem to have touched this 
> test recently?
> {noformat}
> java.lang.IllegalStateException: Error refreshing authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: org.apache.impala.catalog.CatalogException: Error refreshing 
> authorization policy: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
>  
> Caused by: org.apache.impala.common.ImpalaRuntimeException: Error refreshing 
> authorization policy, current policy state may be inconsistent. Running 
> 'invalidate metadata' may resolve this problem: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.impala.common.SentryPolicyReaderException: 
> org.apache.impala.common.InternalException: Error creating Sentry Service 
> client: at 
> org.apache.impala.analysis.AuditingTest.TestAccessEventsOnAuthFailure(AuditingTest.java:373)
> Caused by: org.apache.impala.common.SentryPolicyReaderException: 
> org.apache.impala.common.InternalException: Error creating Sentry Service 
> client: 
> Caused by: org.apache.impala.common.InternalException: Error creating Sentry 
> Service client:
> Caused by: 
> org.apache.sentry.core.common.exception.MissingConfigurationException: 
> Property 'sentry.service.server.principal' is missing in configuration
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6734) Consider making the output type of all mathematical functions the same as the input type

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6734:
--
Issue Type: Improvement  (was: Bug)

> Consider making the output type of all mathematical functions the same as the 
> input type
> 
>
> Key: IMPALA-6734
> URL: https://issues.apache.org/jira/browse/IMPALA-6734
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Taras Bobrovytsky
>Priority: Major
>
> In IMPALA-6230 we made round() and several other related functions follow the 
> rule that the output type of a function should match the input type. We 
> should consider doing this for all other mathematical functions (such as 
> sign()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3831) Broken links in Impala debug webpage when query is starting up

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3831.
---
Resolution: Cannot Reproduce

> Broken links in Impala debug webpage when query is starting up
> --
>
> Key: IMPALA-3831
> URL: https://issues.apache.org/jira/browse/IMPALA-3831
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: debugging, ramp-up, supportability
> Attachments: impala-debug-webpage-invalid-query-id.png
>
>
> When you click on a query in the Impala debug webpage, I think before it 
> finishes planning, you get a blank plan webpage and an "Error: Invalid query 
> id: ae41282f79f47d9b:c117ce848f4ae9bd" if you click on the link from the 
> "queries" page. 
> This is ok in itself, but all of the links to other query pages are broken 
> and missing a query_id. E.g. 
> "http://tarmstrong-box.ca.cloudera.com:25000/query_summary?query_id=; This is 
> annoying because you have to refresh until the query finishes planning, then 
> click through to the other query pages. We already know the query id, so we 
> should be able to generate the correct links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6211) Query state shows FINISHED in webUI/25000/queries page while it shows CREATED in profile

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6211.
---
Resolution: Duplicate

> Query state shows FINISHED in webUI/25000/queries page while it shows CREATED 
> in profile
> 
>
> Key: IMPALA-6211
> URL: https://issues.apache.org/jira/browse/IMPALA-6211
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Mala Chikka Kempanna
>Priority: Major
> Attachments: Profile-query-state.png, webUI-query-state.png
>
>
> A query run from HUE shows inconsistent state in the Impala web UI and in the 
> profile.
> On the Impala debug web UI 25000/queries page, the query state is shown as 
> FINISHED, but in the profile the query state shows CREATED.
> These two states need to be in sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3922) Support yarn.scheduler.fair.allow-undeclared-pools option in fair-scheduler.xml

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3922:
--
Summary: Support yarn.scheduler.fair.allow-undeclared-pools option in 
fair-scheduler.xml  (was: Admission control pools are dynamically created even 
when dynamic queues are disabled in YARN)

> Support yarn.scheduler.fair.allow-undeclared-pools option in 
> fair-scheduler.xml
> ---
>
> Key: IMPALA-3922
> URL: https://issues.apache.org/jira/browse/IMPALA-3922
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Jim Halfpenny
>Priority: Minor
>  Labels: admission-control, resource-management
>
> When the YARN parameter yarn.scheduler.fair.allow-undeclared-pools is set to 
> false, Impala still creates resource pools dynamically. This leads to 
> unexpected behaviour when configuring admission control.
> Impala uses the resource pool root. if no pool is specified. 
> Queries that do not
> specify a pool will trigger creation of a new one, which is likely not what 
> the user intended. This behaviour occurs even when 
> yarn.scheduler.fair.user-as-default-queue is set to false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3573) Planner doesn't take into account runtime filter selectivity

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3573:
--
Issue Type: Improvement  (was: Bug)

> Planner doesn't take into account runtime filter selectivity
> 
>
> Key: IMPALA-3573
> URL: https://issues.apache.org/jira/browse/IMPALA-3573
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: bushy, planner
>
> Applying selective runtime filters can drastically change the cardinality of 
> scan nodes. The planner doesn't cost the runtime filters as filters; as a 
> result, it misses out on a more selective plan. 
> In this particular query there are three fact-to-dimension joins: 
> * (ss x d1) -> 389.28M rows
> * (sr x d2) -> 234.43M rows
> * (cs x d3) -> 12.85B rows
> The planner doesn't re-evaluate the cardinality estimates of ss, sr and cs 
> after the runtime filters are applied and puts ss as the leftmost node in the 
> plan where it should have been cs. 
> Ideally this should be a bushy plan. 
> Query 
> {code}
> select i_item_id
> ,i_item_desc
> ,s_store_id
> ,s_store_name
> ,sum(ss_quantity)as store_sales_quantity
> ,sum(sr_return_quantity) as store_returns_quantity
> ,sum(cs_quantity)as catalog_sales_quantity
>  from
> store_sales
>,store_returns
>,catalog_sales
>,date_dim d1
>,date_dim d2
>,date_dim d3
>,store
>,item
>  where
>  d1.d_moy   = 4 
>  and d1.d_year  = 1999
>  and d1.d_date_sk   = ss_sold_date_sk
>  and i_item_sk  = ss_item_sk
>  and s_store_sk = ss_store_sk
>  and ss_customer_sk = sr_customer_sk
>  and ss_item_sk = sr_item_sk
>  and ss_ticket_number   = sr_ticket_number
>  and sr_returned_date_sk= d2.d_date_sk
>  and d2.d_moy   between 4 and  4 + 3 
>  and d2.d_year  = 1999
>  and sr_customer_sk = cs_bill_customer_sk
>  and sr_item_sk = cs_item_sk
>  and cs_sold_date_sk= d3.d_date_sk 
>  and d3.d_year  in (1999,1999+1,1999+2)
>  group by
> i_item_id
>,i_item_desc
>,s_store_id
>,s_store_name
>  order by
> i_item_id 
>,i_item_desc
>,s_store_id
>,s_store_name
> limit 100
> {code}
> Plan 
> {code}
> 28:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  limit: 100
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 16:TOP-N [LIMIT=100]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  hosts=20 per-host-mem=21.89KB
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 27:AGGREGATE [FINALIZE]
> |  output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), 
> sum:merge(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=212.22MB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 15:AGGREGATE [STREAMING]
> |  output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=77.13GB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 14:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_item_sk = i_item_sk
> |  runtime filters: RF000 <- i_item_sk
> |  hosts=20 per-host-mem=5.24MB
> |  tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270
> |
> |--25:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=7 row-size=156B cardinality=32000
> |  |
> |  07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM]
> | partitions=1/1 files=1 size=3.14MB
> | table stats: 32000 rows total
> | column stats: all
> | hosts=1 per-host-mem=48.00MB
> | tuple-ids=7 row-size=156B cardinality=32000
> |
> 13:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_store_sk = s_store_sk
> |  runtime filters: RF001 <- s_store_sk
> |  hosts=20 per-host-mem=4.00KB
> |  tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270
> |
> |--24:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=6 row-size=60B cardinality=62
> |  |
> |  06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM]
> | partitions=1/1 files=1 size=11.92KB
> | table stats: 62 rows total
> | column stats: all
> | hosts=1 per-host-mem=48.00MB
> | tuple-ids=6 row-size=60B cardinality=62
> |
> 12:HASH JOIN [INNER JOIN, 

[jira] [Updated] (IMPALA-5035) Impala should not load INDEX_TABLEs from HMS

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5035:
--
Issue Type: Improvement  (was: Bug)

> Impala should not load INDEX_TABLEs from HMS
> 
>
> Key: IMPALA-5035
> URL: https://issues.apache.org/jira/browse/IMPALA-5035
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.8.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Peikai Zheng
>Priority: Major
>  Labels: catalog-server, ramp-up
>
> The catalog will retrieve and store entries for INDEX_TABLES that are created 
> in Hive using the CREATE INDEX statement. However, INDEX_TABLES cannot be 
> accessed/used in Impala, and accessing an INDEX_TABLE will always throw a 
> TableLoading exception. The catalog should not be loading INDEX_TABLEs from 
> HMS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6875) Reservations for ORC scanner

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6875:
--
Issue Type: Improvement  (was: Bug)

> Reservations for ORC scanner
> 
>
> Key: IMPALA-6875
> URL: https://issues.apache.org/jira/browse/IMPALA-6875
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> This tracks the work needed to get ORC to be a first-class citizen when it 
> comes to scanner reservations, i.e. reserving the memory required for holding 
> columns in memory instead of just mallocing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3631) Investigate why Decimal to Timestamp casting became slower

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3631.
---
Resolution: Later

I don't think this is really a priority now - we should probably just treat 
this the same as other performance improvements and wait until we have some 
evidence that it's important to improve.

> Investigate why Decimal to Timestamp casting became slower
> --
>
> Key: IMPALA-3631
> URL: https://issues.apache.org/jira/browse/IMPALA-3631
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Taras Bobrovytsky
>Priority: Minor
>  Labels: performance
>
> https://issues.cloudera.org/browse/IMPALA-3163 fixes the correctness issue 
> with Decimal to Timestamp casting, but worsens the performance by about 30%. 
> We want to understand why this happens and possibly fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-7771) Download page should not link to unreleased code

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758411#comment-16758411
 ] 

Tim Armstrong commented on IMPALA-7771:
---

"Only release artifacts that have been approved by the relevant PMC may be 
linked from the download page." does seem to imply that if you interpret the 
source repo as an "artifact". Anyway we might as well just remove it, the 
source code is linked from the navbar.


> Download page should not link to unreleased code
> 
>
> Key: IMPALA-7771
> URL: https://issues.apache.org/jira/browse/IMPALA-7771
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The download page must not link to unreleased code such as repos:
> http://www.apache.org/dev/release-download-pages.html#links
> Such links are only to be published on pages for developers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6777) Whitespace inconsistencies in pretty printer across units

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6777:
--
Target Version: Product Backlog
  Priority: Trivial  (was: Minor)

> Whitespace inconsistencies in pretty printer across units
> -
>
> Key: IMPALA-6777
> URL: https://issues.apache.org/jira/browse/IMPALA-6777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Lars Volker
>Priority: Trivial
>  Labels: newbie
>
> Depending on the unit, we sometimes print a space between a value and its 
> unit and sometimes we don't:
>  
> {noformat}
> "human_readable": "Count: 9, min / max: 13.000us / 22.000us, 25th %-ile: 
> 13.000us, 50th %-ile: 16.000us, 75th %-ile: 17.000us, 90th %-ile: 18.000us, 
> 95th %-ile: 22.000us, 99.9th %-ile: 22.000us",
> "human_readable": "Count: 9, min / max: 80.00 B / 80.00 B, 25th %-ile: 80.00 
> B, 50th %-ile: 80.00 B, 75th %-ile: 80.00 B, 90th %-ile: 80.00 B, 95th %-ile: 
> 80.00 B, 99.9th %-ile: 80.00 B",
> {noformat}
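
One way to make the two outputs above consistent would be a single helper that always emits exactly one space between the value and its unit. A minimal sketch, assuming a hypothetical helper (FormatWithUnit is not Impala's actual PrettyPrinter API):
{code}
#include <cstdio>
#include <string>

// Hypothetical helper: always print exactly one space between the value and
// its unit, so time units ("us") and size units ("B") get the same spacing.
std::string FormatWithUnit(double value, const char* unit) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.3f %s", value, unit);
  return std::string(buf);
}
{code}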



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7771) Download page should not link to unreleased code

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7771:
-

Assignee: Tim Armstrong

> Download page should not link to unreleased code
> 
>
> Key: IMPALA-7771
> URL: https://issues.apache.org/jira/browse/IMPALA-7771
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Tim Armstrong
>Priority: Major
>
> The download page must not link to unreleased code such as repos:
> http://www.apache.org/dev/release-download-pages.html#links
> Such links are only to be published on pages for developers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5861) HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758405#comment-16758405
 ] 

Tim Armstrong commented on IMPALA-5861:
---

This one looks so trivial we should just fix it.
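
For illustration, here is the shape of the bug in isolation (a minimal standalone sketch with made-up batch sizes, not Impala code): adding the running total to the counter on every batch overcounts, whereas adding only the per-batch delta (num_to_commit in the snippet quoted below) would not.
{code}
#include <cassert>

int main() {
  int rows_read_counter = 0;
  int row_group_rows_read = 0;
  const int batches[3] = {16, 16, 16};          // three batches of 16 rows each
  for (int n : batches) {
    row_group_rows_read += n;                   // running total: 16, 32, 48
    rows_read_counter += row_group_rows_read;   // bug: adds the total every time
  }
  assert(row_group_rows_read == 48);            // rows actually read
  assert(rows_read_counter == 96);              // counter overshoots (16+32+48)
  // The fix would presumably add only the per-batch delta to the counter.
  return 0;
}
{code}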

> HdfsParquetScanner::GetNextInternal() IsZeroSlotTableScan() case double counts
> --
>
> Key: IMPALA-5861
> URL: https://issues.apache.org/jira/browse/IMPALA-5861
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Dan Hecht
>Assignee: Tim Armstrong
>Priority: Major
>
> It appears that this code is double counting into {{rows_read_counter()}}, 
> since {{row_group_rows_read_}} is already accumulating:
> {code:title=HdfsParquetScanner::GetNextInternal()}
>   } else if (scan_node_->IsZeroSlotTableScan()) {
> // There are no materialized slots and we are not optimizing count(*), 
> e.g.
> // "select 1 from alltypes". We can serve this query from just the file 
> metadata.
> // We don't need to read the column data.
> if (row_group_rows_read_ == file_metadata_.num_rows) {
>   eos_ = true;
>   return Status::OK();
> }
> assemble_rows_timer_.Start();
> DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
> int64_t rows_remaining = file_metadata_.num_rows - row_group_rows_read_;
> int max_tuples = min(row_batch->capacity(), rows_remaining);
> TupleRow* current_row = row_batch->GetRow(row_batch->AddRow());
> int num_to_commit = WriteTemplateTuples(current_row, max_tuples);
> Status status = CommitRows(row_batch, num_to_commit);
> assemble_rows_timer_.Stop();
> RETURN_IF_ERROR(status);
> row_group_rows_read_ += num_to_commit;
> COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);  
> <==
> return Status::OK();
>   }
> {code}
> Repro in impala-shell:
> {noformat}
> set batch_size=16; set num_nodes=1; select count(*) from 
> functional.alltypesmixedformat; profile
> 
>- RowsRead: 3.94K (3936)
>- RowsReturned: 1.20K (1200)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4057:
--
Priority: Trivial  (was: Minor)

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Trivial
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758391#comment-16758391
 ] 

Tim Armstrong commented on IMPALA-4057:
---

I think this is a minor bug in impala_cluster.py - it assumes that the 
webserver will be listening on whatever socket.gethostname() maps to. For this 
to work generally, since you can get the webserver to listen on any interface, 
we'd need to plumb through the webserver port to the code that polls the 
webserver, or try all of the interfaces on the localhost.

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Minor
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6098) core on parquet select

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6098.
---
Resolution: Cannot Reproduce

> core on parquet select
> --
>
> Key: IMPALA-6098
> URL: https://issues.apache.org/jira/browse/IMPALA-6098
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
> Environment: Version
> catalogd version 2.9.0-cdh5.12.0 RELEASE (build 
> 03c6ddbdcec39238be4f5b14a300d5c4f576097e)
> Built on Thu Jun 29 04:17:31 PDT 2017
> Hardware Info
> Cpu Info:
>   Model: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>   Cores: 24
>   Max Possible Cores: 24
>   L1 Cache: 32.00 KB (Line: 64.00 B)
>   L2 Cache: 256.00 KB (Line: 64.00 B)
>   L3 Cache: 15.00 MB (Line: 64.00 B)
>   Hardware Supports:
> ssse3
> sse4_1
> sse4_2
> popcnt
> avx
> avx2
>   Numa Nodes: 2
>   Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->1 | 7->1 
> | 8->1 | 9->1 | 10->1 | 11->1 | 12->0 | 13->0 | 14->0 | 15->0 | 16->0 | 17->0 
> | 18->1 | 19->1 | 20->1 | 21->1 | 22->1 | 23->1 |
>  Physical Memory: 62.28 GB
>  Disk Info: 
>   Num disks 13: 
> sda (rotational=false)
> sdb (rotational=true)
> sdc (rotational=true)
> sdd (rotational=true)
> sde (rotational=true)
> sdk (rotational=true)
> sdf (rotational=true)
> sdl (rotational=true)
> sdm (rotational=true)
> sdg (rotational=true)
> sdi (rotational=true)
> sdj (rotational=true)
> sdh (rotational=true)
> OS Info
> OS version: Linux version 3.10.104-1-tlinux2-0041.tl1 (r...@te64.site) (gcc 
> version 4.4.6 20110731 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Oct 28 20:36:06 
> CST 2016
> Clock: clocksource: 'tsc', clockid_t: CLOCK_MONOTONIC
>Reporter: sw
>Priority: Major
>
> I create a table like this:
> {code:java}
> "CREATE EXTERNAL TABLE fact_vm_widetable  LIKE PARQUET  
> '/user/spark/parquet-vm/part-0-69d62acd-92a4-4774-ae6c-71be5c2dfcd0-c000.snappy.parquet'
> STORED AS PARQUET
> LOCATION '/user/spark/parquet-vm';"
> {code}
> Then running select count(1) from fact_host_widetable makes all Impala Daemons core.
> Info from the core dump:
> (gdb) bt
> #0  0x7f64bf6f3625 in raise () from /lib64/libc.so.6
> #1  0x7f64bf6f4e05 in abort () from /lib64/libc.so.6
> #2  0x7f64c016c07d in __gnu_cxx::__verbose_terminate_handler() () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #3  0x7f64c016a0e6 in ?? () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #4  0x7f64c016a131 in std::terminate() () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #5  0x7f64c016a348 in __cxa_throw () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #6  0x7f64c01c5976 in std::__throw_runtime_error(char const*) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #7  0x7f64c018cac4 in 
> std::locale::facet::_S_create_c_locale(__locale_struct*&, char const*, 
> __locale_struct*) ()
>from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #8  0x7f64c0181f69 in std::locale::_Impl::_Impl(char const*, unsigned 
> long) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #9  0x7f64c0183192 in std::locale::locale(char const*) () from 
> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/impala/lib/libstdc++.so.6
> #10 0x00e81de3 in boost::filesystem::path::codecvt() ()
> #11 0x00c6f8f2 in 
> impala::HdfsScanNodeBase::Prepare(impala::RuntimeState*) ()
> #12 0x00c67ce9 in 
> impala::HdfsScanNode::Prepare(impala::RuntimeState*) ()
> #13 0x00c50bf4 in impala::ExecNode::Prepare(impala::RuntimeState*) ()
> #14 0x00cf0037 in 
> impala::PartitionedAggregationNode::Prepare(impala::RuntimeState*) ()
> #15 0x00a7efcd in impala::FragmentInstanceState::Prepare() ()
> #16 0x00a7fb71 in impala::FragmentInstanceState::Exec() ()
> #17 0x00a6bab6 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #18 0x00bf0ac9 in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::Promise*) ()
> #19 0x00bf1484 in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::Promise*), 
> boost::_bi::list4 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value*> > > >::run() ()
> #20 0x00e592ea in 

[jira] [Commented] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT )

2019-02-01 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758389#comment-16758389
 ] 

Gabor Kaszab commented on IMPALA-4018:
--

Understood. Thanks for the explanation, Greg!

So as far as I understand, the plan is to introduce CAST(..FORMAT..) with the 
SQL format from the beginning and leave e.g. to_timestamp() and from_timestamp() 
using the Java format, with an additional flag to switch them to the SQL pattern.

If this is the case then I'm a bit worried about introducing inconsistency within 
Impala. I can imagine that using one pattern here but another there by default 
would cause some confusion for users.
[~grahn] Do you think we could avoid this somehow? Can we eventually deprecate the 
functions using the Java syntax and indicate to users at that point that they 
should migrate off?
(On a side note, I found no documentation about the current date-time formats in 
our docs, so this is a great moment to add them alongside the new format. 
[~arodoni_cloudera] This is just an FYI.)

Another concern was raised by [~Paul.Rogers] on the code review: as view 
definitions can be written to HMS, should we be worried that Hive won't be able 
to read them if they are written with the SQL pattern? In addition, Hive doesn't 
have a FORMAT clause for CAST, so it brings in another inconsistency between the 
systems. Should we initiate a conversation with the Hive community to handle 
the same on their side?

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT )
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates for currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd,  hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd,  HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','-MM-dd 
> HH:mm:ss.SS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to chose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
>  ::=
>   {  }...
>  ::=
> 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
>  ::=
> 
>   | 
>   | 
>   | 
>   | 
>   | 
>   | 
> | 
>  ::=
>    | YYY | YY | Y
>  ::=
>    | RR
>  ::=
>   MM
>  ::=
>   DD
>  ::=
>   DDD
>  ::=
>   HH | HH12
>  ::=
>   HH24
>  ::=
>   MI
>  ::=
>   SS
>  ::=
>   S
>  ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
>  ::=
>   A.M. | P.M.
>  ::=
>   TZH
>  ::=
>   TZM
> {noformat}
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
>  ::=
>   CAST 
>AS 
>   [ FORMAT  ]
>   
>  ::=
> 
>   | 
>  ::=
> 
> | 
>  ::=
>   
> {noformat}
> For example:
> {noformat}
> CAST( AS  [FORMAT ])
> CAST( AS  [FORMAT ])
> cast(dt as string format 'DD-MM-')
> cast('01-05-2017' as date format 'DD-MM-')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-4057) Start webserver with interface"127.0.0.1" failed.

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4057:
--
Priority: Minor  (was: Major)

> Start webserver with interface"127.0.0.1" failed.
> -
>
> Key: IMPALA-4057
> URL: https://issues.apache.org/jira/browse/IMPALA-4057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
> Environment: bash-4.1$ lsb_release -a
> LSB Version:  
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID:   CentOS
> Description:  CentOS release 6.7 (Final)
> Release:  6.7
> Codename: Final
> bash-4.1$
>Reporter: hewenting
>Assignee: hewenting
>Priority: Minor
>  Labels: impala, webserver
>
> Starting Impala with the option -webserver_interface=127.0.0.1 fails.
> Log displayed on the terminal:
> {noformat}
> bash-4.1$ ./bin/start-impala-cluster.py -s 1 --impalad_args 
> "-webserver_interface=127.0.0.1"
> Starting State Store logging to 
> /home/impala/incubator-impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to 
> /home/impala/incubator-impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to 
> /home/impala/incubator-impala/logs/cluster/impalad.INFO
> MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from nobida141:25000
> MainThread: Debug webpage not yet available.
> ...
> MainThread: Debug webpage not yet available.
> MainThread: Debug webpage did not become available in expected time.
> MainThread: Waiting for num_known_live_backends=1. Current value: None
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-691) Process mem limit should account for the JVM's memory usage

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-691:
-
Summary: Process mem limit should account for the JVM's memory usage  (was: 
Process mem limit does not account for the JVM's memory usage)

> Process mem limit should account for the JVM's memory usage
> ---
>
> Key: IMPALA-691
> URL: https://issues.apache.org/jira/browse/IMPALA-691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.1, Impala 2.0, Impala 2.1, Impala 2.2, Impala 
> 2.3.0
>Reporter: Skye Wanderman-Milne
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: incompatibility, resource-management
>
> The JVM doesn't appear to use malloc, so its memory usage is not reported by 
> tcmalloc and we do not count it in the process mem limit. I verified this by 
> adding a large allocation in the FE, and noting that the total memory usage 
> (virtual or resident) reported in /memz is not affected, but the virtual and 
> resident memory usage reported by top is.
> This is especially problematic because Impala caches table metadata in the FE 
> (JVM), which can become quite big (a few GBs) in extreme cases.
> *Workaround*
> As a workaround, we recommend reducing the process memory limit by 1-2GB to 
> "reserve" memory for the JVM. How much memory you should reserve typically 
> depends on the size of your catalog ( number of 
> tables/partitions/columns/blocks etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7895) Incorrect expected results for spillable-buffer-sizing.test

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7895.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Incorrect expected results for spillable-buffer-sizing.test
> ---
>
> Key: IMPALA-7895
> URL: https://issues.apache.org/jira/browse/IMPALA-7895
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> A recent change appears to have caused a test to expect the wrong rewritten 
> SQL in {{spillable-buffer-sizing.test}}.
> {noformat}
> # Mid NDV aggregation - should scale down buffers to intermediate size.
> select straight_join l_orderkey, o_orderstatus, count(*)
> from tpch_parquet.lineitem
> join tpch_parquet.orders on o_orderkey = l_orderkey
> group by 1, 2
> having count(*) = 1
>  DISTRIBUTEDPLAN
> Max Per-Host Resource Reservation: Memory=82.00MB Threads=7
> Per-Host Resource Estimates: Memory=244MB
> Analyzed query: SELECT 
> -- +straight_join
> l_orderkey, o_orderstatus, count(*) FROM tpch_parquet.lineitem INNER JOIN
> tpch_parquet.orders ON o_orderkey = l_orderkey GROUP BY CAST(1 AS 
> INVALID_TYPE),
> CAST(2 AS INVALID_TYPE) HAVING count(*) = CAST(1 AS BIGINT)
> {noformat}
> Correct rewritten SQL:
> {noformat}
> Analyzed query: SELECT 
> -- +straight_join
> l_orderkey, o_orderstatus, count(*) FROM tpch_parquet.lineitem INNER JOIN
> tpch_parquet.orders ON o_orderkey = l_orderkey GROUP BY l_orderkey,
> o_orderstatus HAVING count(*) = CAST(1 AS BIGINT)
> {noformat}
> The same problem occurs in {{max-rows-test.test}}.
> The problem is due to the existence of two copies of the grouping 
> expressions. The {{toSql()}} function used the original, unanalyzed copy, not 
> the rewritten copy with ordinal replacements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-7923) DecimalValue should be marked as packed

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7923:
--
Issue Type: Improvement  (was: Bug)

> DecimalValue should be marked as packed
> ---
>
> Key: IMPALA-7923
> URL: https://issues.apache.org/jira/browse/IMPALA-7923
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Major
>
> IMPALA-7473 was a symptom of a more general problem that DecimalValue is not 
> guaranteed to be aligned by the Impala runtime, but the class is not marked 
> as packed and, under some circumstances, GCC will emit code for aligned loads 
> to value_ when value_ is an int128. 
> Testing helps confirm that the compiler does not emit the problematic loads 
> in practice, but it would be better to mark the struct as packed.
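
A minimal sketch of what "marking the struct as packed" means, using a simplified stand-in type (PackedDecimal16 and ReadDecimal are hypothetical; the real DecimalValue is a template, so this only illustrates the attribute, not the actual class):
{code}
#include <cstring>

// Packed 16-byte decimal holder: the packed attribute tells the compiler the
// object may sit at any byte offset, so it must not emit 16-byte aligned
// loads/stores for value_ (the miscompile risk described above).
struct __attribute__((packed)) PackedDecimal16 {
  __int128 value_;
};

static_assert(sizeof(PackedDecimal16) == 16, "no padding expected");

// Reading such a value from an arbitrary (possibly unaligned) buffer:
__int128 ReadDecimal(const void* buf) {
  PackedDecimal16 d;
  std::memcpy(&d, buf, sizeof(d));
  return d.value_;
}
{code}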



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7665) Bringing up stopped statestore causes queries to fail

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7665:
--
Priority: Critical  (was: Major)

> Bringing up stopped statestore causes queries to fail
> -
>
> Key: IMPALA-7665
> URL: https://issues.apache.org/jira/browse/IMPALA-7665
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Critical
>  Labels: query-lifecycle, statestore
>
> I can reproduce this by running a long-running query then cycling the 
> statestore:
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh -q 
> "select distinct * from tpch10_parquet.lineitem"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> c486fb9ea4330e1008fa9b7ceaa60492e43ee120)
> Query: select distinct * from tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 17:06:48 (Coordinator: 
> http://tarmstrong-box:25000)
> {noformat}
> If I kill the statestore, the query runs fine, but if I start up the 
> statestore again, it fails.
> {noformat}
> # In one terminal, start up the statestore
> $ 
> /home/tarmstrong/Impala/incubator-impala/be/build/latest/statestore/statestored
>  -log_filename=statestored 
> -log_dir=/home/tarmstrong/Impala/incubator-impala/logs/cluster -v=1 
> -logbufsecs=5 -max_log_files=10
> # The running query then fails
> WARNINGS: Failed due to unreachable impalad(s): tarmstrong-box:22001, 
> tarmstrong-box:22002
> {noformat}
> Note that I've seen different subsets of impalads reported as failed, e.g. 
> "Failed due to unreachable impalad(s): tarmstrong-box:22001"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6910:
-

Assignee: (was: Sailesh Mukil)

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Priority: Critical
>  Labels: broken-build, flaky, s3
> Fix For: Impala 3.2.0
>
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7877) Support Hive GenericUDF

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7877:
--
Issue Type: Improvement  (was: Bug)

> Support Hive GenericUDF
> ---
>
> Key: IMPALA-7877
> URL: https://issues.apache.org/jira/browse/IMPALA-7877
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.0
>Reporter: eugen yushin
>Priority: Major
>
> Running a Hive UDF that extends the GenericUDF interface results in a class cast 
> exception. Relevant [code 
> block|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java#L586]:
> {code}
> LOG.debug("Loading UDF '" + udfPath + "' from " + jarPath);
> loader = getClassLoader(jarPath);
> Class c = Class.forName(udfPath, true, loader);
> Class udfClass = c.asSubclass(UDF.class);
> {code}
> Reproduce steps:
> {code}
> create function my_lower(string) returns string location 
> '/path/to/hive-exec-1.1.0-cdh5.15.0.jar' 
> symbol='org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower';
> select my_lower('Some String NOT ALREADY LOWERCASE');
> {code}
> Stack trace:
> {code}
> I1121 11:58:29.509138 29092 Frontend.java:952] Analyzing query: select 
> my_lower('Some String NOT ALREADY LOWERCASE')
> I1121 11:58:29.513121 29092 UdfExecutor.java:581] Loading UDF 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower' from 
> /var/lib/impala/udfs/hive-exec-1.1.0-cdh5.15.0.83728.2.jar
> I1121 11:58:29.515535 29092 jni-util.cc:230] java.lang.ClassCastException: 
> class org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> at java.lang.Class.asSubclass(Class.java:3404)
> at 
> org.apache.impala.hive.executor.UdfExecutor.init(UdfExecutor.java:584)
> at 
> org.apache.impala.hive.executor.UdfExecutor.<init>(UdfExecutor.java:217)
> at 
> org.apache.impala.service.FeSupport.NativeEvalExprsWithoutRow(Native Method)
> at 
> org.apache.impala.service.FeSupport.EvalExprsWithoutRow(FeSupport.java:208)
> at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:163)
> at org.apache.impala.analysis.LiteralExpr.create(LiteralExpr.java:184)
> at 
> org.apache.impala.rewrite.FoldConstantsRule.apply(FoldConstantsRule.java:68)
> at 
> org.apache.impala.rewrite.ExprRewriter.applyRuleBottomUp(ExprRewriter.java:85)
> at 
> org.apache.impala.rewrite.ExprRewriter.applyRuleRepeatedly(ExprRewriter.java:71)
> at 
> org.apache.impala.rewrite.ExprRewriter.rewrite(ExprRewriter.java:55)
> at 
> org.apache.impala.analysis.SelectList.rewriteExprs(SelectList.java:97)
> at 
> org.apache.impala.analysis.SelectStmt.rewriteExprs(SelectStmt.java:894)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:432)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:393)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:962)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> I1121 11:58:29.523166 29092 status.cc:125] ClassCastException: class 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> @   0x96663a  impala::Status::Status()
> @   0xcedfdd  impala::JniUtil::GetJniExceptionMsg()
> @  0x109457f  impala::HiveUdfCall::OpenEvaluator()
> @   0x96d757  impala::ScalarExprEvaluator::Open()
> @   0xbedc2d  
> Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow
> @ 0x7fc705b49e6d  (unknown)
> {code}
> Marked as a bug because there are no notes related to this behaviour in the 
> docs (while the documentation claims Impala supports Hive UDFs, it should 
> support all possible Hive UDF formats unless stated otherwise).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8056) Impala accepts plus in front of string value

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8056:
--
Target Version: Product Backlog
Labels: incompatibility  (was: )

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>  Labels: incompatibility
>
> Impala accepts a plus in front of a string value and rejects a minus.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8056) Impala accepts plus in front of string value

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8056:
--
Labels: compatibility incompatibility  (was: incompatibility)

> Impala accepts plus in front of string value
> 
>
> Key: IMPALA-8056
> URL: https://issues.apache.org/jira/browse/IMPALA-8056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Andrejs Dubovskis
>Priority: Minor
>  Labels: compatibility, incompatibility
>
> Impala accepts a plus in front of a string value and rejects a minus.
> See the output for the corresponding queries:
> {code}
> Server version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)
> Query: select "a",+"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> Query progress can be monitored at: 
> http://catdn009:25000/query_plan?query_id=2640632c29c812c7:905a9fcd
> a b
> Fetched 1 row(s) in 0.01s
> Query: select "a",-"b"
> Query submitted at: 2019-01-08 14:42:55 (Coordinator: http://catdn009:25000)
> ERROR: AnalysisException: Arithmetic operation requires numeric operands: -1 
> * 'b'
> Could not execute command: select "a",-"b"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8109) Impala cannot read the gzip files bigger than 2 GB

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8109:
--
Priority: Major  (was: Minor)

> Impala cannot read the gzip files bigger than 2 GB
> --
>
> Key: IMPALA-8109
> URL: https://issues.apache.org/jira/browse/IMPALA-8109
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: hakki
>Priority: Major
>
> When querying a partition containing gzip files, the query fails with the 
> error below: 
> WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: 
> hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz: 
> Error(255): Unknown error 255
> Root cause: EOFException: Cannot seek to negative offset
> The file hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz is 
> a delimited text file with a size bigger than 2 GB (approx. 2.4 GB). The 
> uncompressed size is ~13 GB.
> The impalad version is : 2.12.0-cdh5.15.0
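
The negative offset in the error message is the signature of a 32-bit overflow: 2 GB is 2^31 bytes, which does not fit in a signed 32-bit integer and typically wraps to -2147483648. A minimal illustration of just the arithmetic (not the Impala or HDFS client code):
{code}
#include <cstdint>
#include <iostream>

int main() {
  int64_t offset = 1LL << 31;                         // 2147483648 bytes, i.e. 2 GB
  int32_t truncated = static_cast<int32_t>(offset);   // does not fit in 32 bits
  std::cout << truncated << std::endl;                // typically prints -2147483648
  return 0;
}
{code}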



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-691) Process mem limit does not account for the JVM's memory usage

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-691:
-
Issue Type: Improvement  (was: Bug)

> Process mem limit does not account for the JVM's memory usage
> -
>
> Key: IMPALA-691
> URL: https://issues.apache.org/jira/browse/IMPALA-691
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.1, Impala 2.0, Impala 2.1, Impala 2.2, Impala 
> 2.3.0
>Reporter: Skye Wanderman-Milne
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: incompatibility, resource-management
>
> The JVM doesn't appear to use malloc, so its memory usage is not reported by 
> tcmalloc and we do not count it in the process mem limit. I verified this by 
> adding a large allocation in the FE, and noting that the total memory usage 
> (virtual or resident) reported in /memz is not affected, but the virtual and 
> resident memory usage reported by top is.
> This is especially problematic because Impala caches table metadata in the FE 
> (JVM), which can become quite big (a few GBs) in extreme cases.
> *Workaround*
> As a workaround, we recommend reducing the process memory limit by 1-2GB to 
> "reserve" memory for the JVM. How much memory you should reserve typically 
> depends on the size of your catalog ( number of 
> tables/partitions/columns/blocks etc.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7540) Intern common strings in catalog

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7540:
--
Issue Type: Improvement  (was: Bug)

> Intern common strings in catalog
> 
>
> Key: IMPALA-7540
> URL: https://issues.apache.org/jira/browse/IMPALA-7540
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> Using jxray shows that there are many common duplicate strings in the 
> catalog. For example, each table repeats the database name, and metadata like 
> the HMS parameter maps reuse a lot of common strings like "EXTERNAL" or 
> "transient_lastDdlTime". We should intern these to save memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8152) Aggregate Commands on HBase Table Omit Null Values

2019-02-01 Thread Alan Jackoway (JIRA)
Alan Jackoway created IMPALA-8152:
-

 Summary: Aggregate Commands on HBase Table Omit Null Values
 Key: IMPALA-8152
 URL: https://issues.apache.org/jira/browse/IMPALA-8152
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.12.0
Reporter: Alan Jackoway


We have an HBase-backed Impala table, which has a string column (for the 
purpose of this jira, {{sCol}}).

There are records where that column is null, which we can observe with queries 
like {{select * from table where sCol is null limit 1}}.

However, when we run these commands, we get bad results:
{code:sql}
-- Returns 0
select count(*) from table where sCol is null;
-- Returns only rows for string values (we only have a few options in this
-- case), no row for null
select sCol, count(*) from table group by sCol;
{code}

These commands work as expected on parquet-backed tables. They also do not work 
in Hive; I will file a separate jira for that shortly.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8152) Aggregate Commands on HBase Table Omit Null Values

2019-02-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758329#comment-16758329
 ] 

Tim Armstrong commented on IMPALA-8152:
---

Probably the same as IMPALA-283

> Aggregate Commands on HBase Table Omit Null Values
> --
>
> Key: IMPALA-8152
> URL: https://issues.apache.org/jira/browse/IMPALA-8152
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Alan Jackoway
>Priority: Major
>
> We have an HBase-backed Impala table, which has a string column (for the 
> purpose of this jira, {{sCol}}).
> There are records where that column is null, which we can observe with 
> queries like {{select * from table where sCol is null limit 1}}.
> However, when we run these commands, we get bad results:
> {code:sql}
> -- Returns 0
> select count(*) from table where sCol is null;
> -- Returns only rows for string values (we only have a few options in this
> -- case), no row for null
> select sCol, count(*) from table group by sCol;
> {code}
> These commands work as expected on parquet-backed tables. They also do not 
> work in Hive; I will file a separate jira for that shortly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8140) Grouping aggregation with limit breaks asan build

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8140.
-
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Grouping aggregation with limit breaks asan build
> -
>
> Key: IMPALA-8140
> URL: https://issues.apache.org/jira/browse/IMPALA-8140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: asan, crash
> Fix For: Impala 3.2.0
>
>
> Commit 4af3a7853e9 for IMPALA-7333 breaks the following query on ASAN:
> {code:sql}
> select count(*) from tpch_parquet.orders o group by o.o_clerk limit 10;
> {code}
> {noformat}
> ==30219==ERROR: AddressSanitizer: use-after-poison on address 0x631000c4569c 
> at pc 0x020163cc bp 0x7f73a12a5700 sp 0x7f73a12a56f8
> READ of size 1 at 0x631000c4569c thread T276
> #0 0x20163cb in impala::Tuple::IsNull(impala::NullIndicatorOffset const&) 
> const /tmp/be/src/runtime/tuple.h:241:13
> #1 0x280c3d1 in 
> impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
> impala::SlotDescriptor const&, impala::Tuple*, void*) 
> /tmp/be/src/exprs/agg-fn-evaluator.cc:393:29
> #2 0x2777bc8 in 
> impala::AggFnEvaluator::Finalize(std::vector std::allocator > const&, impala::Tuple*, 
> impala::Tuple*) /tmp/be/src/exprs/agg-fn-evaluator.h:307:15
> #3 0x27add96 in 
> impala::GroupingAggregator::CleanupHashTbl(std::vector  std::allocator > const&, 
> impala::HashTable::Iterator) /tmp/be/src/exec/grouping-aggregator.cc:351:7
> #4 0x27ae2b2 in impala::GroupingAggregator::ClosePartitions() 
> /tmp/be/src/exec/grouping-aggregator.cc:930:5
> #5 0x27ae5f4 in impala::GroupingAggregator::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/grouping-aggregator.cc:383:3
> #6 0x27637f7 in impala::AggregationNode::Close(impala::RuntimeState*) 
> /tmp/be/src/exec/aggregation-node.cc:139:32
> #7 0x206b7e9 in impala::FragmentInstanceState::Close() 
> /tmp/be/src/runtime/fragment-instance-state.cc:368:42
> #8 0x2066b1a in impala::FragmentInstanceState::Exec() 
> /tmp/be/src/runtime/fragment-instance-state.cc:99:3
> #9 0x2080e12 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> /tmp/be/src/runtime/query-state.cc:584:24
> #10 0x1d79036 in boost::function0::operator()() const 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #11 0x24bbe06 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) 
> /tmp/be/src/util/thread.cc:359:3
> #12 0x24c72f8 in void boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void (*&)(std::string const&, 
> std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), boost::_bi::list0&, int) 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:525:9
> #13 0x24c714b in boost::_bi::bind_t std::string const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list5, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() 
> /opt/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20:16
> #14 0x3c83949 in thread_proxy 
> (/home/lv/i4/be/build/debug/service/impalad+0x3c83949)
> #15 0x7f768ce73183 in start_thread 
> /build/eglibc-ripdx6/eglibc-2.19/nptl/pthread_create.c:312
> #16 0x7f768c98a03c in clone 
> /build/eglibc-ripdx6/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {noformat}
> The problem seems to be that we call 
> {{output_partition_->aggregated_row_stream->Close()}} in 
> be/src/exec/grouping-aggregator.cc:284 when hitting the limit, and then later 
> the tuple creation in {{CleanupHashTbl()}} in 
> be/src/exec/grouping-aggregator.cc:341 reads from poisoned memory.
> A similar query does not show the crash:
> {code:sql}
> select count(*) from functional_parquet.alltypes a group by a.string_col 
> limit 2;
> {code}
> [~tarmstrong] - Do you have an idea why the query on a much smaller dataset 
> wouldn't crash?
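
For context on what "use-after-poison" means here, a standalone sketch of the 
pattern described above (hypothetical names, not Impala code): a buffer is 
poisoned when its owning stream is closed early because the limit was hit, and 
a later read of tuple memory backed by that buffer trips the ASAN check.
{code}
// Hypothetical sketch of the failure mode; build with -fsanitize=address.
#include <sanitizer/asan_interface.h>

#include <cstdint>
#include <cstdio>
#include <vector>

struct FakeRowStream {
  std::vector<uint8_t> buf = std::vector<uint8_t>(1024, 0);
  // Closing the stream poisons its backing memory, so any later read of
  // tuples stored in it is reported by ASAN as use-after-poison.
  void Close() { ASAN_POISON_MEMORY_REGION(buf.data(), buf.size()); }
};

int main() {
  FakeRowStream stream;
  const uint8_t* tuple = stream.buf.data();  // tuple memory owned by the stream
  stream.Close();                 // limit hit: stream closed before cleanup runs
  std::printf("%d\n", tuple[0]);  // ASAN flags this read, like CleanupHashTbl()
  return 0;
}
{code}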



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8118) ASAN build failure: query_test/test_scanners.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8118.
-
   Resolution: Duplicate
Fix Version/s: Impala 3.2.0

> ASAN build failure: query_test/test_scanners.py
> ---
>
> Key: IMPALA-8118
> URL: https://issues.apache.org/jira/browse/IMPALA-8118
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0, Impala 3.1.0
>
>
> Build of latest master, with ASAN, failed with the following error, which to 
> my newbie eyes looks like a connection failure:
> {noformat}
> 05:42:04 === FAILURES 
> ===
> 05:42:04  TestQueriesTextTables.test_data_source_tables[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> 05:42:04 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:42:04 query_test/test_queries.py:174: in test_data_source_tables
> 05:42:04 self.run_test_case('QueryTest/data-source-tables', vector)
> 05:42:04 common/impala_test_suite.py:472: in run_test_case
> 05:42:04 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> ...
> 05:42:04 handle = self.execute_query_async(query_string, user=user)
> 05:42:04 beeswax/impala_beeswax.py:351: in execute_query_async
> 05:42:04 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> 05:42:04 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:42:04 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:42:04 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:42:04 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:42:04 E   MESSAGE: TSocket read 0 bytes
> 05:42:04 - Captured stderr call 
> -
> ...
> 05:42:04 -- executing against localhost:21000
> 05:42:04 select *
> 05:42:04 from alltypes_datasource
> 05:42:04 where float_col != 0 and
> 05:42:04   int_col >= 1990 limit 5;
> {noformat}
> A similar error appears for multiple other tests. Then:
> {noformat}
> 05:42:04 TTransportException: Could not connect to localhost:21050
> 05:42:04 !!! Interrupted: stopping after 10 failures 
> 
> {noformat}
> I wonder if these are just symptoms of a failure in the BE code due to ASAN 
> being enabled.
> Similar error in the latest build:
> {noformat}
> 05:20:05 === FAILURES 
> ===
> 05:20:05  TestHdfsQueries.test_hdfs_scan_node[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> hbase/none] 
> 05:20:05 [gw4] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:20:05 query_test/test_queries.py:240: in test_hdfs_scan_node
> 05:20:05 self.run_test_case('QueryTest/hdfs-scan-node', vector)
> ...
> 05:20:05 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:20:05 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:20:05 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:20:05 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:20:05 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:20:05 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:20:05 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:20:05 E   MESSAGE: TSocket read 0 bytes
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8118) ASAN build failure: query_test/test_scanners.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8118:

Affects Version/s: Impala 3.2.0

> ASAN build failure: query_test/test_scanners.py
> ---
>
> Key: IMPALA-8118
> URL: https://issues.apache.org/jira/browse/IMPALA-8118
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0, Impala 3.2.0
>
>
> Build of latest master, with ASAN, failed with the following error, which to 
> my newbie eyes looks like a connection failure:
> {noformat}
> 05:42:04 === FAILURES 
> ===
> 05:42:04  TestQueriesTextTables.test_data_source_tables[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> 05:42:04 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:42:04 query_test/test_queries.py:174: in test_data_source_tables
> 05:42:04 self.run_test_case('QueryTest/data-source-tables', vector)
> 05:42:04 common/impala_test_suite.py:472: in run_test_case
> 05:42:04 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> ...
> 05:42:04 handle = self.execute_query_async(query_string, user=user)
> 05:42:04 beeswax/impala_beeswax.py:351: in execute_query_async
> 05:42:04 handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> 05:42:04 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:42:04 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:42:04 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:42:04 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:42:04 E   MESSAGE: TSocket read 0 bytes
> 05:42:04 - Captured stderr call 
> -
> ...
> 05:42:04 -- executing against localhost:21000
> 05:42:04 select *
> 05:42:04 from alltypes_datasource
> 05:42:04 where float_col != 0 and
> 05:42:04   int_col >= 1990 limit 5;
> {noformat}
> A similar error appears for multiple other tests. Then:
> {noformat}
> 05:42:04 TTransportException: Could not connect to localhost:21050
> 05:42:04 !!! Interrupted: stopping after 10 failures 
> 
> {noformat}
> I wonder if these are just symptoms of a failure in the BE code due to ASAN 
> being enabled.
> Similar error in the latest build:
> {noformat}
> 05:20:05 === FAILURES 
> ===
> 05:20:05  TestHdfsQueries.test_hdfs_scan_node[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> hbase/none] 
> 05:20:05 [gw4] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdh6.x-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 05:20:05 query_test/test_queries.py:240: in test_hdfs_scan_node
> 05:20:05 self.run_test_case('QueryTest/hdfs-scan-node', vector)
> ...
> 05:20:05 exec_result = self.__fetch_results(query_handle, max_rows)
> 05:20:05 beeswax/impala_beeswax.py:456: in __fetch_results
> 05:20:05 results = self.__do_rpc(lambda: self.imp_service.fetch(handle, 
> False, fetch_rows))
> 05:20:05 beeswax/impala_beeswax.py:512: in __do_rpc
> 05:20:05 raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> 05:20:05 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 05:20:05 E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> 05:20:05 E   MESSAGE: TSocket read 0 bytes
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8129) Build failure: query_test/test_observability.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-8129.
-
   Resolution: Fixed
Fix Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> Build failure: query_test/test_observability.py
> ---
>
> Key: IMPALA-8129
> URL: https://issues.apache.org/jira/browse/IMPALA-8129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> {{query_test/test_observability.py}} failed in multiple builds:
> Erasure-coding build:
> {noformat}
> 18:49:01 === FAILURES 
> ===
> 18:49:01 ___ TestObservability.test_global_exchange_counters 
> 
> 18:49:01 [gw0] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/../infra/python/env/bin/python
> 18:49:01 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 18:49:01 assert "ExchangeScanRatio: 3.19" in profile
> 18:49:01 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=704d1f6b09400fba:b91dc70):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build...  - OptimizationTime: 32.000ms\n
>- PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 
> 26.000ms\n'
> {noformat}
> Core build:
> {noformat}
> 07:36:43 FAIL 
> query_test/test_observability.py::TestObservability::()::test_global_exchange_counters
> 07:36:43 === FAILURES 
> ===
> 07:36:43 ___ TestObservability.test_global_exchange_counters 
> 
> 07:36:43 [gw2] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/../infra/python/env/bin/python
> 07:36:43 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 07:36:43 assert "ExchangeScanRatio: 3.19" in profile
> 07:36:43 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=b546ddcfab65e431:471aa218):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...  - OptimizationTime: 32.000ms\n 
>   - PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 32.000ms\n'
> {noformat}
> Assigning to Lars since it may be related to the patch for IMPALA-7731: Add 
> Read/Exchange counters to profile



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-8129) Build failure: query_test/test_observability.py

2019-02-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8129:

Affects Version/s: (was: Impala 3.1.0)
   Impala 3.2.0

> Build failure: query_test/test_observability.py
> ---
>
> Key: IMPALA-8129
> URL: https://issues.apache.org/jira/browse/IMPALA-8129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Paul Rogers
>Assignee: Lars Volker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> {{query_test/test_observability.py}} failed in multiple builds:
> Erasure-coding build:
> {noformat}
> 18:49:01 === FAILURES 
> ===
> 18:49:01 ___ TestObservability.test_global_exchange_counters 
> 
> 18:49:01 [gw0] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/../infra/python/env/bin/python
> 18:49:01 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 18:49:01 assert "ExchangeScanRatio: 3.19" in profile
> 18:49:01 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=704d1f6b09400fba:b91dc70):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build...  - OptimizationTime: 32.000ms\n
>- PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 
> 26.000ms\n'
> {noformat}
> Core build:
> {noformat}
> 07:36:43 FAIL 
> query_test/test_observability.py::TestObservability::()::test_global_exchange_counters
> 07:36:43 === FAILURES 
> ===
> 07:36:43 ___ TestObservability.test_global_exchange_counters 
> 
> 07:36:43 [gw2] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/../infra/python/env/bin/python
> 07:36:43 query_test/test_observability.py:400: in 
> test_global_exchange_counters
> 07:36:43 assert "ExchangeScanRatio: 3.19" in profile
> 07:36:43 E   assert 'ExchangeScanRatio: 3.19' in 'Query 
> (id=b546ddcfab65e431:471aa218):\n  DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...  - OptimizationTime: 32.000ms\n 
>   - PeakMemoryUsage: 220.00 KB (225280)\n   - PrepareTime: 32.000ms\n'
> {noformat}
> Assigning to Lars since it may be related to the patch for IMPALA-7731: Add 
> Read/Exchange counters to profile



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8103) Plan hints show up as "--" comments in analysed query

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8103:
--
Target Version: Product Backlog
  Priority: Minor  (was: Major)

> Plan hints show up as "--" comments in analysed query
> -
>
> Key: IMPALA-8103
> URL: https://issues.apache.org/jira/browse/IMPALA-8103
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Minor
>
> I noticed that the hints added in IMPALA-5821 show up in the -- style rather 
> than /**/
> {code}
> Sql Statement: select * from tpch.lineitem join /*+ broadcast */ 
> tpch.part on l_partkey = p_partkey limit 
> ...
> Analyzed query: SELECT * FROM tpch.lineitem INNER JOIN
> -- +broadcast
> tpch.part ON l_partkey = p_partkey LIMIT CAST(5 AS TINYINT)
> {code}
> I guess this works and maybe it's fine, but I was really confused when I saw 
> it. It looks like getPlanHintsSql() uses this to generate views in such a way 
> that Hive will ignore the hints, but that concern doesn't seem relevant to 
> this use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7641) Memory Limit Exceeded

2019-02-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7641.
---
Resolution: Cannot Reproduce

> Memory Limit Exceeded
> -
>
> Key: IMPALA-7641
> URL: https://issues.apache.org/jira/browse/IMPALA-7641
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.4
>Reporter: Ahshan
>Priority: Minor
>  Labels: resource-management
> Attachments: profile(8).txt
>
>
> We are using the CDH distribution with impalad version 2.6.0-cdh5.8.2 RELEASE.
>  
> As per my understanding, the per-host memory requirement is 288 MB and we have 
> a total of 18 Impala daemons, which sums up to 5184 MB of total memory 
> consumption. Considering the above details, this should not lead to a memory 
> issue when MEM_LIMIT is set to 20 GB.
> Hence, could you please let us know the cause of the memory limit being 
> exceeded for the following query:
> select * from emp_sales where job_id = 55451 and uploaded_month = 201808 
> limit 1
>  +---+
> |Explain String|
> +---+
> |Estimated Per-Host Requirements: Memory=288.00MB VCores=1|
> | |
> |01:EXCHANGE [UNPARTITIONED]|
> | |limit: 1|
> | | |
> |00:SCAN HDFS [fenet5.hmig_os_changes_details_malicious]|
> |partitions=1/25 files=3118 size=110.01GB|
> |predicates: job_id = 55451|
> |limit: 1|
> +---+
> WARNINGS: 
>  Memory limit exceeded
>  HdfsParquetScanner::ReadDataPage() failed to allocate 269074889 bytes for 
> dictionary.
> Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.23 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.63 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.27 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.39 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 16.09 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.74 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.74 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.74 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 15.20 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 14.61 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.11 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.47 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.47 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.47 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
> 

[jira] [Created] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-01 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8151:
-

 Summary: HiveUdfCall assumes StringValue is 16 bytes
 Key: IMPALA-8151
 URL: https://issues.apache.org/jira/browse/IMPALA-8151
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.2.0
Reporter: Tim Armstrong
Assignee: Pooja Nilangekar


HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
{code}
  switch (GetChild(i)->type().type) {
case TYPE_BOOLEAN:
case TYPE_TINYINT:
  // Using explicit sizes helps the compiler unroll memcpy
  memcpy(input_ptr, v, 1);
  break;
case TYPE_SMALLINT:
  memcpy(input_ptr, v, 2);
  break;
case TYPE_INT:
case TYPE_FLOAT:
  memcpy(input_ptr, v, 4);
  break;
case TYPE_BIGINT:
case TYPE_DOUBLE:
  memcpy(input_ptr, v, 8);
  break;
case TYPE_TIMESTAMP:
case TYPE_STRING:
case TYPE_VARCHAR:
  memcpy(input_ptr, v, 16);
  break;
default:
  DCHECK(false) << "NYI";
  }
{code}

STRING and VARCHAR were only 16 bytes because of padding. This padding is 
removed by IMPALA-7367, so this will read past the end of the actual value. 
This could in theory lead to a crash.

We need to change the value, but we should probably also switch to 
sizeof(StringValue) so that it doesn't get broken by similar changes in the future.
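
As a rough, hypothetical illustration of the sizeof-based direction (the struct 
below is a stand-in, not Impala's real StringValue, and this is not the actual 
patch):
{code}
// Hypothetical stand-in for a pointer + length value with its padding removed.
#include <cstdio>
#include <cstring>

#pragma pack(push, 1)
struct FakeStringValue {
  char* ptr;  // 8 bytes on a 64-bit build
  int len;    // 4 bytes; packed, so sizeof(FakeStringValue) == 12, not 16
};
#pragma pack(pop)

int main() {
  FakeStringValue src{nullptr, 0};
  unsigned char dst[16] = {0};
  // A hard-coded memcpy(dst, &src, 16) would now read 4 bytes past 'src';
  // deriving the size from the type keeps the copy in sync with the layout.
  std::memcpy(dst, &src, sizeof(FakeStringValue));
  std::printf("copied %zu bytes\n", sizeof(FakeStringValue));
  return 0;
}
{code}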



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


