[jira] [Commented] (IMPALA-8271) Refactor the use of Thrift enums in query-options.cc

2019-03-04 Thread Fredy Wijaya (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784103#comment-16784103
 ] 

Fredy Wijaya commented on IMPALA-8271:
--

[~arodoni_cloudera] nope.

> Refactor the use of Thrift enums in query-options.cc
> 
>
> Key: IMPALA-8271
> URL: https://issues.apache.org/jira/browse/IMPALA-8271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Fredy Wijaya
>Priority: Minor
>  Labels: ramp-up
>
> Currently the logic for handling Thrift enums in query-options.cc is very 
> error-prone: a change to a Thrift enum requires updating query-options.cc by 
> hand. For example: 
> https://github.com/apache/impala/blob/master/be/src/service/query-options.cc#L276-L288.
>  This CR: https://gerrit.cloudera.org/c/12635/ is an attempt to fix such an 
> issue for compression_codec.
> This ticket aims to update the use of Thrift enums in a style similar to 
> https://gerrit.cloudera.org/c/12635/ to make them less error-prone. 
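The shape of the refactor can be sketched outside of C++ as well. Below is a minimal Python illustration (the `CompressionCodec` enum and function names are hypothetical, not Impala's actual Thrift types): the hand-written if-chain must be edited for every new enum value, while a lookup table derived from the enum itself needs no changes when the enum grows (Thrift's generated C++ exposes a comparable names-to-values map).

```python
from enum import Enum

# Hypothetical enum standing in for a Thrift-generated one.
class CompressionCodec(Enum):
    NONE = 0
    GZIP = 1
    SNAPPY = 2

# Error-prone style: every new enum value needs another branch here,
# so the file must be touched whenever the Thrift enum changes.
def parse_codec_manual(value):
    v = value.upper()
    if v == "NONE":
        return CompressionCodec.NONE
    if v == "GZIP":
        return CompressionCodec.GZIP
    if v == "SNAPPY":
        return CompressionCodec.SNAPPY
    raise ValueError("Invalid compression codec: %s" % value)

# Table-driven style: the lookup table is built from the enum itself,
# so adding a new enum value requires no change in this code.
_CODEC_BY_NAME = {member.name: member for member in CompressionCodec}

def parse_codec(value):
    try:
        return _CODEC_BY_NAME[value.upper()]
    except KeyError:
        raise ValueError("Invalid compression codec: %s" % value)
```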



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-6326) segfault during impyla HiveServer2Cursor.cancel_operation() over SSL

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784082#comment-16784082
 ] 

Tim Armstrong edited comment on IMPALA-6326 at 3/5/19 5:06 AM:
---

I have a strong suspicion that the root cause of at least some of the issues is 
the way run_query forks off a thread in _hash_result(), because that could end 
up with two threads accessing the same underlying thrift connection.

I might try to inject some failures there to see if the symptoms reproduce more 
frequently.


was (Author: tarmstrong):
I have a strong suspicious that the root cause of at least some of the issues 
is the way run_query forks off a thread in _hash_result(), because that could 
end up with two threads accessing the same underlying thrift connection.

I might try to inject some failures there to see if the symptoms reproduce more 
frequently.
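The suspected failure mode, two threads sharing one thrift connection, can be sketched with a standalone toy (plain Python; `FakeTransport` is hypothetical, not impyla's class): a frame write that is not atomic lets a context switch interleave the header of one message with the body of another, while serializing writes with a lock keeps frames intact.

```python
import threading

class FakeTransport:
    """Stand-in for a thrift transport; the wire is a list of strings."""
    def __init__(self):
        self.wire = []
        self.lock = threading.Lock()

    def write_frame_unsafe(self, header, body):
        # Non-atomic write: a context switch between these two appends
        # interleaves two threads' messages on the wire.
        self.wire.append(header)
        self.wire.append(body)

    def write_frame_locked(self, header, body):
        # Serialized write: header and body always land adjacently.
        with self.lock:
            self.wire.append(header)
            self.wire.append(body)

def run(write):
    transport = FakeTransport()
    def sender(name):
        for _ in range(200):
            write(transport, name + ":hdr", name + ":body")
    threads = [threading.Thread(target=sender, args=(n,))
               for n in ("cancel", "fetch")]
    for t in threads: t.start()
    for t in threads: t.join()
    # A frame is intact if its header and body come from the same thread.
    return all(
        transport.wire[i].split(":")[0] == transport.wire[i + 1].split(":")[0]
        for i in range(0, len(transport.wire), 2)
    )

# run(FakeTransport.write_frame_unsafe) may return False under contention;
# run(FakeTransport.write_frame_locked) always returns True.
```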

> segfault during impyla HiveServer2Cursor.cancel_operation() over SSL
> 
>
> Key: IMPALA-6326
> URL: https://issues.apache.org/jira/browse/IMPALA-6326
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.10.0, Impala 2.11.0
>Reporter: Matthew Mulder
>Priority: Major
> Attachments: test_fork_crash.py
>
>
> During a stress test on a secure cluster one of the clients crashed in 
> HiveServer2Cursor.cancel_operation().
> The stress test debug log shows{code}2017-12-13 16:50:52,624 21607 Query 
> Consumer DEBUG:concurrent_select[579]:Requesting memory reservation
> 2017-12-13 16:50:52,624 21607 Query Consumer 
> DEBUG:concurrent_select[245]:Reserved 102 MB; 1455 MB available; 95180 MB 
> overcommitted
> 2017-12-13 16:50:52,625 21607 Query Consumer 
> DEBUG:concurrent_select[581]:Received memory reservation
> 2017-12-13 16:50:52,658 21607 Query Consumer 
> DEBUG:concurrent_select[865]:Using tpcds_300_decimal_parquet database
> 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: 
> USE tpcds_300_decimal_parquet
> 2017-12-13 16:50:52,825 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: 
> SET ABORT_ON_ERROR=1
> 2017-12-13 16:50:53,060 21607 Query Consumer 
> DEBUG:concurrent_select[877]:Setting mem limit to 102 MB
> 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: 
> SET MEM_LIMIT=102M
> 2017-12-13 16:50:53,370 21607 Query Consumer 
> DEBUG:concurrent_select[881]:Running query with 102 MB mem limit at 
> vc0704.test with timeout secs 52:
> select
>   dt.d_year,
>   item.i_category_id,
>   item.i_category,
>   sum(ss_ext_sales_price)
> from
>   date_dim dt,
>   store_sales,
>   item
> where
>   dt.d_date_sk = store_sales.ss_sold_date_sk
>   and store_sales.ss_item_sk = item.i_item_sk
>   and item.i_manager_id = 1
>   and dt.d_moy = 11
>   and dt.d_year = 2000
> group by
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> order by
>   sum(ss_ext_sales_price) desc,
>   dt.d_year,
>   item.i_category_id,
>   item.i_category
> limit 100;
> 2017-12-13 16:51:08,491 21607 Query Consumer 
> DEBUG:concurrent_select[889]:Query id is b6425b84aa45f633:9ce7cad9
> 2017-12-13 16:51:15,337 21607 Query Consumer 
> DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:22,316 21607 Query Consumer 
> DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:27,266 21607 Fetch Results b6425b84aa45f633:9ce7cad9 
> DEBUG:concurrent_select[1009]:Fetching result for query with id 
> b6425b84aa45f633:9ce7cad9
> 2017-12-13 16:51:44,625 21607 Query Consumer 
> DEBUG:concurrent_select[940]:Attempting cancellation of query with id 
> b6425b84aa45f633:9ce7cad9
> 2017-12-13 16:51:44,627 21607 Query Consumer INFO:hiveserver2[259]:Canceling 
> active operation{code}The impalad log shows{code}I1213 16:50:54.287511 136399 
> admission-controller.cc:510] Schedule for 
> id=b6425b84aa45f633:9ce7cad9 in pool_name=root.systest 
> cluster_mem_needed=816.00 MB PoolConfig: max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I1213 16:50:54.289767 136399 admission-controller.cc:515] Stats: 
> agg_num_running=184, agg_num_queued=0, agg_mem_reserved=1529.63 GB,  
> local_host(local_mem_admitted=132.02 GB, num_admitted_running=21, 
> num_queued=0, backend_mem_reserved=194.58 GB)
> I1213 16:50:54.291550 136399 admission-controller.cc:531] Admitted query 
> id=b6425b84aa45f633:9ce7cad9
> I1213 16:50:54.296922 136399 coordinator.cc:99] Exec() 
> query_id=b6425b84aa45f633:9ce7cad9 stmt=/* Mem: 102 MB. Coordinator: 
> vc0704.test. */
> select
>   dt.d_year,
>   item.i_category_id,
>   item.i_category,
>   sum(ss_ext_sales_price)
> from
>   date_dim dt,
>   store_sales,
>   item
> where
>   dt.d_date_sk = store_sales.ss_sold_date_sk
>   and 

[jira] [Assigned] (IMPALA-6326) segfault during impyla HiveServer2Cursor.cancel_operation() over SSL

2019-03-04 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6326:
-

Assignee: Tim Armstrong

> segfault during impyla HiveServer2Cursor.cancel_operation() over SSL
> 
>
> Key: IMPALA-6326
> URL: https://issues.apache.org/jira/browse/IMPALA-6326
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.10.0, Impala 2.11.0
>Reporter: Matthew Mulder
>Assignee: Tim Armstrong
>Priority: Major
> Attachments: test_fork_crash.py
>
>

[jira] [Commented] (IMPALA-6326) segfault during impyla HiveServer2Cursor.cancel_operation() over SSL

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784082#comment-16784082
 ] 

Tim Armstrong commented on IMPALA-6326:
---

I have a strong suspicion that the root cause of at least some of the issues 
is the way run_query forks off a thread in _hash_result(), because that could 
end up with two threads accessing the same underlying thrift connection.

I might try to inject some failures there to see if the symptoms reproduce more 
frequently.

> segfault during impyla HiveServer2Cursor.cancel_operation() over SSL
> 
>
> Key: IMPALA-6326
> URL: https://issues.apache.org/jira/browse/IMPALA-6326
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.10.0, Impala 2.11.0
>Reporter: Matthew Mulder
>Priority: Major
> Attachments: test_fork_crash.py
>
>

[jira] [Commented] (IMPALA-8256) ImpalaServicePool::RejectTooBusy() should print more meaningful message

2019-03-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784051#comment-16784051
 ] 

ASF subversion and git services commented on IMPALA-8256:
-

Commit 63d45d59bae3fb37571088c0a2418a9df7630c51 in impala's branch 
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=63d45d5 ]

IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

An incoming request to an RPC service can be rejected due to either
exceeding the memory limit or the maximum allowed queue length.
It's unclear from the current error message which of those factors
contributed to the failure, as neither the actual queue length nor
the memory consumption is printed.

This patch fixes the problem by printing the estimated queue length
and memory consumption when an RPC request is dropped.

Testing done: verified the new error message with test_rpc_timeout.py

Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Reviewed-on: http://gerrit.cloudera.org:8080/12624
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


>  ImpalaServicePool::RejectTooBusy() should print more meaningful message
> 
>
> Key: IMPALA-8256
> URL: https://issues.apache.org/jira/browse/IMPALA-8256
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> An RPC request to a service can be rejected due to either exceeding the 
> memory limit or the maximum allowed queue length. It's unclear from the 
> current error message which of those factors contributed to the failure, as 
> neither the actual queue length nor the memory consumption is printed.
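The requested behavior can be sketched as follows (a minimal Python illustration with hypothetical names, not Impala's actual RPC code): the rejection message names the exhausted resource and includes the numbers behind the decision, so the operator can tell which limit was hit.

```python
def reject_reason(queue_len, max_queue_len, mem_consumed, mem_limit):
    """Return None if the request is admitted, otherwise a message that
    names the exhausted resource and the observed values."""
    if queue_len >= max_queue_len:
        return ("service queue full: %d requests queued (max %d)"
                % (queue_len, max_queue_len))
    if mem_consumed >= mem_limit:
        return ("service memory limit exceeded: %d bytes in use (limit %d)"
                % (mem_consumed, mem_limit))
    return None  # admitted
```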






[jira] [Commented] (IMPALA-6326) segfault during impyla HiveServer2Cursor.cancel_operation() over SSL

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784046#comment-16784046
 ] 

Tim Armstrong commented on IMPALA-6326:
---

{noformat}
18:01:56 2019-03-04 18:01:56,425 14376 Fetch Results 
164d2c564b750b6c:2cd8d9e2 ERROR:hiveserver2[943]:Failed to open 
transport (tries_left=3)
18:01:56 Traceback (most recent call last):
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 940, in _execute
18:01:56 return func(request)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py",
 line 505, in GetOperationStatus
18:01:56 return self.recv_GetOperationStatus()
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py",
 line 516, in recv_GetOperationStatus
18:01:56 (fname, mtype, rseqid) = self._iprot.readMessageBegin()
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/toolchain/thrift-0.9.3-p5/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
 line 126, in readMessageBegin
18:01:56 sz = self.readI32()
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/toolchain/thrift-0.9.3-p5/python/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py",
 line 206, in readI32
18:01:56 buff = self.trans.readAll(4)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/toolchain/thrift-0.9.3-p5/python/lib/python2.7/site-packages/thrift/transport/TTransport.py",
 line 58, in readAll
18:01:56 chunk = self.read(sz - have)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/thrift_sasl/__init__.py",
 line 159, in read
18:01:56 self._read_frame()
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/thrift_sasl/__init__.py",
 line 163, in _read_frame
18:01:56 header = read_all_compat(self._trans, 4)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/thrift_sasl/six.py",
 line 31, in 
18:01:56 read_all_compat = lambda trans, sz: trans.readAll(sz)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/toolchain/thrift-0.9.3-p5/python/lib/python2.7/site-packages/thrift/transport/TTransport.py",
 line 58, in readAll
18:01:56 chunk = self.read(sz - have)
18:01:56   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/toolchain/thrift-0.9.3-p5/python/lib/python2.7/site-packages/thrift/transport/TSocket.py",
 line 105, in read
18:01:56 buff = self.handle.recv(sz)
18:01:56   File "/usr/lib/python2.7/ssl.py", line 341, in recv
18:01:56 return self.read(buflen)
18:01:56   File "/usr/lib/python2.7/ssl.py", line 260, in read
18:01:56 return self._sslobj.read(len)
18:01:56 SSLError: [Errno 1] _ssl.c:1429: error:1408F081:SSL 
routines:SSL3_GET_RECORD:block cipher pad is wrong
18:01:56 Process Process-36:
18:01:56 Traceback (most recent call last):
18:01:56   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in 
_bootstrap
18:01:56 self.run()
18:01:56   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in 
run
18:01:56 self._target(*self._args, **self._kwargs)
18:01:56   File "tests/stress/concurrent_select.py", line 841, in 
_start_single_runner
18:01:56 mesg=error_msg))
18:01:56 Exception: Query tpcds_300_decimal_parquet_q51a ID None failed: Bad 
version in readMessageBegin: -614891738
18:01:56 Query runner (14376) exited with exit code 1
{noformat}
{noformat}
18:00:15 2019-03-04 18:00:15,073 14414 Fetch Results 
e743fed952c6e11a:6c88df9c ERROR:hiveserver2[943]:Failed to open 
transport (tries_left=3)
18:00:15 Traceback (most recent call last):
18:00:15   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 940, in _execute
18:00:15 return func(request)
18:00:15   File 
"/data0/jenkins/workspace/impala-cdh6.x-test-stress-secure-manual/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py",
 line 505, in GetOperationStatus
18:00:15 return self.recv_GetOperationStatus()
18:00:15   File 

[jira] [Created] (IMPALA-8281) Implement SHOW GRANT GROUP

2019-03-04 Thread Fredy Wijaya (JIRA)
Fredy Wijaya created IMPALA-8281:


 Summary: Implement SHOW GRANT GROUP 
 Key: IMPALA-8281
 URL: https://issues.apache.org/jira/browse/IMPALA-8281
 Project: IMPALA
  Issue Type: Sub-task
  Components: Catalog, Frontend
Reporter: Fredy Wijaya


Syntax:
{noformat}
SHOW GRANT GROUP <group_name> [ON <object>]
{noformat}

The command shows the list of privileges for a given group, with an optional ON 
clause.








[jira] [Created] (IMPALA-8280) Implement SHOW GRANT USER

2019-03-04 Thread Fredy Wijaya (JIRA)
Fredy Wijaya created IMPALA-8280:


 Summary: Implement SHOW GRANT USER 
 Key: IMPALA-8280
 URL: https://issues.apache.org/jira/browse/IMPALA-8280
 Project: IMPALA
  Issue Type: Sub-task
  Components: Catalog, Frontend
Reporter: Fredy Wijaya


Syntax:
{noformat}
SHOW GRANT USER <user_name> [ON <object>]
{noformat}

The command shows the list of privileges for a given user, with an optional ON 
clause.








[jira] [Updated] (IMPALA-8278) Fix MetastoreEventsProcessorTest flakiness

2019-03-04 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8278:

Summary: Fix MetastoreEventsProcessorTest flakiness  (was: Fix 
testEventProcessorFetchAfterHMSRestart)

> Fix MetastoreEventsProcessorTest flakiness
> --
>
> Key: IMPALA-8278
> URL: https://issues.apache.org/jira/browse/IMPALA-8278
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> The {{testEventProcessorFetchAfterHMSRestart}} test case in 
> {{MetastoreEventsProcessorTest}} causes flakiness because it creates a new 
> event processor pointing to the same catalog instance. This means all the 
> generated events are processed by two event processor instances, and both 
> try to modify the state of catalogd, causing race conditions. The failures 
> vary and depend a lot on timing. I see the following exception, which is 
> related to this issue.
> The easiest way to tell whether this is the problem is to look in the 
> FeSupport logs of the test and confirm whether an event ID is being processed 
> twice (i.e. you see two exactly similar log entries for a given event ID).
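The flakiness mechanism can be illustrated with a toy model (plain Python, not the actual catalogd classes): applying the same event feed twice, with unlucky interleaving, leaves the shared state different from a single pass.

```python
# A small feed of metastore-style events: create, drop, re-create a table.
events = [("CREATE_TABLE", "t1"), ("DROP_TABLE", "t1"), ("CREATE_TABLE", "t1")]

def apply_events(event_stream):
    """Apply a stream of (op, table) events to a fresh catalog."""
    catalog = set()
    for op, name in event_stream:
        if op == "CREATE_TABLE":
            catalog.add(name)
        elif op == "DROP_TABLE":
            catalog.discard(name)
    return catalog

# A single processor sees each event once: t1 ends up existing.
single = apply_events(events)

# With two processors over the same feed, every event is applied twice.
# One possible interleaving of the two passes ends with a DROP winning:
interleaved = [events[0], events[0], events[1],
               events[2], events[2], events[1]]
double = apply_events(interleaved)

# single contains "t1" but double is empty: the same feed, processed
# twice with unlucky timing, leaves the catalog in a different state.
```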






[jira] [Resolved] (IMPALA-8274) Missing update to index into profiles vector in Coordinator::BackendState::ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8274.

   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Missing update to index into profiles vector in 
> Coordinator::BackendState::ApplyExecStatusReport()
> --
>
> Key: IMPALA-8274
> URL: https://issues.apache.org/jira/browse/IMPALA-8274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>  Labels: crash
> Fix For: Impala 3.2.0
>
>
> {{idx}} isn't updated when we skip a duplicate or stale update of a fragment 
> instance. As a result, we may end up passing the wrong profile to 
> {{instance_stats->Update()}}. This may lead to random crashes in 
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
>   int idx = 0;
>   const bool has_profile = thrift_profiles.profile_trees.size() > 0;
>   TRuntimeProfileTree empty_profile;
>   for (const FragmentInstanceExecStatusPB& instance_exec_status :
>backend_exec_status.instance_exec_status()) {
> int64_t report_seq_no = instance_exec_status.report_seq_no();
> int instance_idx = 
> GetInstanceIdx(instance_exec_status.fragment_instance_id());
> DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
> InstanceStats* instance_stats = instance_stats_map_[instance_idx];
> int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
> DCHECK(instance_stats->exec_params_.instance_id ==
> ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> // Ignore duplicate or out-of-order messages.
> if (report_seq_no <= last_report_seq_no) {
>   VLOG_QUERY << Substitute("Ignoring stale update for query instance $0 
> with "
>   "seq no $1", PrintId(instance_stats->exec_params_.instance_id), 
> report_seq_no);
>   continue; <<--- // XXX bad
> }
> DCHECK(!instance_stats->done_);
> DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
> const TRuntimeProfileTree& profile =
> has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
> instance_stats->Update(instance_exec_status, profile, exec_summary,
> scan_range_progress);
> {noformat}
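The bug marked {{XXX bad}} above can be modeled in a few lines of Python (hypothetical helpers, not the actual C++): skipping a stale report without consuming its profile entry shifts every later instance onto the wrong profile. One plausible fix, assuming each report in the batch carries a profile entry, is to advance the index even when the report is ignored.

```python
def pair_profiles_buggy(reports, profiles, last_seq):
    """reports: list of (instance, seq_no); profiles: one entry per report."""
    pairs = []
    idx = 0
    for inst, seq in reports:
        if seq <= last_seq[inst]:
            continue  # BUG: skips without consuming profiles[idx]
        pairs.append((inst, profiles[idx]))
        idx += 1
    return pairs

def pair_profiles_fixed(reports, profiles, last_seq):
    pairs = []
    idx = 0
    for inst, seq in reports:
        cur = idx
        idx += 1  # always advance, even when the report is ignored
        if seq <= last_seq[inst]:
            continue
        pairs.append((inst, profiles[cur]))
    return pairs

# With a stale report for B in the middle, the buggy version hands
# instance C the profile that belonged to B.
reports = [("A", 1), ("B", 0), ("C", 1)]
profiles = ["profA", "profB", "profC"]
last_seq = {"A": 0, "B": 0, "C": 0}
```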








[jira] [Updated] (IMPALA-8248) Re-organize authorization tests

2019-03-04 Thread radford nguyen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

radford nguyen updated IMPALA-8248:
---
Description: 
We have authorization tests that are specific to Sentry and authorization tests 
that can be applicable to any authorization provider. We need to re-organize 
the authorization tests to easily differentiate between Sentry-specific tests 
vs generic authorization tests.

 
h3. Approach
 # Move `AuthorizationTest.java` and `AuthorizationStmtTest.java` to 
`org.apache.impala.authorization`
 # Rename `CustomClusterGroupMapper` and 
`CustomClusterResourceAuthorizationProvider` to `TestSentryGroupMapper` and 
`TestSentryAuthorizationProvider` since those two classes aren't specific to 
custom cluster anymore.
 # Move those two files into `org.apache.impala.testutil` instead since they're 
not actually test classes.

Note: all classes to remain in `test` sourceset

  was:We have authorization tests that are specific to Sentry and authorization 
tests that can be applicable to any authorization provider. We need to 
re-organize the authorization tests to easily differentiate between 
Sentry-specific tests vs generic authorization tests.


> Re-organize authorization tests
> ---
>
> Key: IMPALA-8248
> URL: https://issues.apache.org/jira/browse/IMPALA-8248
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: radford nguyen
>Priority: Major
>
> We have authorization tests that are specific to Sentry and authorization 
> tests that can be applicable to any authorization provider. We need to 
> re-organize the authorization tests to easily differentiate between 
> Sentry-specific tests vs generic authorization tests.
>  
> h3. Approach
>  # Move `AuthorizationTest.java` and `AuthorizationStmtTest.java` to 
> `org.apache.impala.authorization`
>  # Rename `CustomClusterGroupMapper` and 
> `CustomClusterResourceAuthorizationProvider` to `TestSentryGroupMapper` and 
> `TestSentryAuthorizationProvider` since those two classes aren't specific to 
> custom cluster anymore.
>  # Move those two files into `org.apache.impala.testutil` instead since 
> they're not actually test classes.
> Note: all classes to remain in `test` sourceset



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8248) Re-organize authorization tests

2019-03-04 Thread radford nguyen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

radford nguyen updated IMPALA-8248:
---
Description: 
We have authorization tests that are specific to Sentry and authorization tests 
that can be applicable to any authorization provider. We need to re-organize 
the authorization tests to easily differentiate between Sentry-specific tests 
vs generic authorization tests.

 
h3. Approach
 # Move {{AuthorizationTest.java}} and {{AuthorizationStmtTest.java}} to 
{{org.apache.impala.authorization}}
 # Rename {{CustomClusterGroupMapper}} and 
{{CustomClusterResourceAuthorizationProvider}} to {{TestSentryGroupMapper}} and 
{{TestSentryAuthorizationProvider}} since those two classes aren't specific to 
custom cluster anymore.
 # Move those two files into {{org.apache.impala.testutil}} instead since 
they're not actually test classes.

Note: all classes to remain in {{test}} sourceset

  was:
We have authorization tests that are specific to Sentry and authorization tests 
that can be applicable to any authorization provider. We need to re-organize 
the authorization tests to easily differentiate between Sentry-specific tests 
vs generic authorization tests.

 
h3. Approach
 # Move `AuthorizationTest.java` and `AuthorizationStmtTest.java` to 
`org.apache.impala.authorization`
 # Rename `CustomClusterGroupMapper` and 
`CustomClusterResourceAuthorizationProvider` to `TestSentryGroupMapper` and 
`TestSentryAuthorizationProvider` since those two classes aren't specific to 
custom cluster anymore.
 # Move those two files into `org.apache.impala.testutil` instead since they're 
not actually test classes.

Note: all classes to remain in `test` sourceset


> Re-organize authorization tests
> ---
>
> Key: IMPALA-8248
> URL: https://issues.apache.org/jira/browse/IMPALA-8248
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: radford nguyen
>Priority: Major
>
> We have authorization tests that are specific to Sentry and authorization 
> tests that can be applicable to any authorization provider. We need to 
> re-organize the authorization tests to easily differentiate between 
> Sentry-specific tests vs generic authorization tests.
>  
> h3. Approach
>  # Move {{AuthorizationTest.java}} and {{AuthorizationStmtTest.java}} to 
> {{org.apache.impala.authorization}}
>  # Rename {{CustomClusterGroupMapper}} and 
> {{CustomClusterResourceAuthorizationProvider}} to {{TestSentryGroupMapper}} 
> and {{TestSentryAuthorizationProvider}} since those two classes aren't specific 
> to custom cluster anymore.
>  # Move those two files into {{org.apache.impala.testutil}} instead since 
> they're not actually test classes.
> Note: all classes to remain in {{test}} sourceset



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8248) Re-organize authorization tests

2019-03-04 Thread radford nguyen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8248 started by radford nguyen.
--
> Re-organize authorization tests
> ---
>
> Key: IMPALA-8248
> URL: https://issues.apache.org/jira/browse/IMPALA-8248
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: radford nguyen
>Priority: Major
>
> We have authorization tests that are specific to Sentry and authorization 
> tests that can be applicable to any authorization provider. We need to 
> re-organize the authorization tests to easily differentiate between 
> Sentry-specific tests vs generic authorization tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8272) test_catalog_tablesfilesusage failing

2019-03-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783928#comment-16783928
 ] 

ASF subversion and git services commented on IMPALA-8272:
-

Commit 43adfac5078780cd939d8ba23d481529dbebf0aa in impala's branch 
refs/heads/master from Yongzhi Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=43adfac ]

IMPALA-8272: Fix test_catalog_tablesfilesusage failing

The test can run in any context, so it should not make any assumptions.

Change-Id: I41cfa59882edafcd5e61d2e119cd8e8bff08e544
Reviewed-on: http://gerrit.cloudera.org:8080/12649
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_catalog_tablesfilesusage failing
> -
>
> Key: IMPALA-8272
> URL: https://issues.apache.org/jira/browse/IMPALA-8272
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.2.0
>Reporter: Bikramjeet Vig
>Assignee: Yongzhi Chen
>Priority: Critical
>  Labels: broken-build
>
> test_catalog_tablesfilesusage fails in exhaustive builds because of the way 
> the test is set up: it expects a certain table to always show up in the top-3 
> list, but if the catalog at that time has already loaded data for tables with 
> more files than the expected table, the expected table does not show up and 
> the test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8274) Missing update to index into profiles vector in Coordinator::BackendState::ApplyExecStatusReport()

2019-03-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783929#comment-16783929
 ] 

ASF subversion and git services commented on IMPALA-8274:
-

Commit 110b362a52eda053caa0d177016e22a23a1a9612 in impala's branch 
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=110b362 ]

IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

The coordinator skips over any stale or duplicated status
reports of fragment instances. In the previous implementation,
the index pointing into the vector of Thrift profiles wasn't
updated when skipping over a status report. This breaks the
assumption that the status reports and thrift profiles vectors
have one-to-one correspondence. Consequently, we may end up
passing the wrong profile to InstanceStats::Update(), leading
to random crashes.

This change fixes the problem above by using iterators to
iterate through the status reports and thrift profiles vectors
and ensures that both iterators are updated on every iteration
of the loop.

Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Reviewed-on: http://gerrit.cloudera.org:8080/12651
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
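The lockstep-iteration fix described in the commit message can be sketched in Python (a hypothetical simplification; names and shapes are illustrative, not Impala's actual C++ coordinator API):

```python
def apply_exec_status_report(statuses, profiles, last_seen):
    """Pair each non-stale status report with its profile.

    statuses: list of (instance_id, report_seq_no) tuples
    profiles: list with one entry per status, in the same order
    last_seen: dict mapping instance_id -> last applied seq_no
    """
    applied = []
    # Iterating the two lists together keeps them in one-to-one
    # correspondence even when a stale report is skipped; the bug was
    # that `continue` advanced only the status loop, not the profile
    # index, so later statuses were paired with the wrong profiles.
    for (instance_id, seq_no), profile in zip(statuses, profiles):
        if seq_no <= last_seen.get(instance_id, -1):
            continue  # stale/duplicate: skips BOTH paired entries
        last_seen[instance_id] = seq_no
        applied.append((instance_id, profile))
    return applied
```

A stale report now drops its profile along with its status, preserving the pairing invariant for the remaining entries.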


> Missing update to index into profiles vector in 
> Coordinator::BackendState::ApplyExecStatusReport()
> --
>
> Key: IMPALA-8274
> URL: https://issues.apache.org/jira/browse/IMPALA-8274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>  Labels: crash
>
> {{idx}} isn't updated when we skip a duplicate or stale update 
> of a fragment instance. As a result, we may end up passing the wrong profile 
> to {{instance_stats->Update()}}. This may lead to random crashes in 
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
>   int idx = 0;
>   const bool has_profile = thrift_profiles.profile_trees.size() > 0;
>   TRuntimeProfileTree empty_profile;
>   for (const FragmentInstanceExecStatusPB& instance_exec_status :
>backend_exec_status.instance_exec_status()) {
> int64_t report_seq_no = instance_exec_status.report_seq_no();
> int instance_idx = 
> GetInstanceIdx(instance_exec_status.fragment_instance_id());
> DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
> InstanceStats* instance_stats = instance_stats_map_[instance_idx];
> int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
> DCHECK(instance_stats->exec_params_.instance_id ==
> ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> // Ignore duplicate or out-of-order messages.
> if (report_seq_no <= last_report_seq_no) {
>   VLOG_QUERY << Substitute("Ignoring stale update for query instance $0 
> with "
>   "seq no $1", PrintId(instance_stats->exec_params_.instance_id), 
> report_seq_no);
>   continue; <<--- // XXX bad
> }
> DCHECK(!instance_stats->done_);
> DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
> const TRuntimeProfileTree& profile =
> has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
> instance_stats->Update(instance_exec_status, profile, exec_summary,
> scan_range_progress);
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8249) End-to-end test framework doesn't read aggregated counters properly

2019-03-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783927#comment-16783927
 ] 

ASF subversion and git services commented on IMPALA-8249:
-

Commit dc1bc3ca03337d6d63d88261226047bb7a55493b in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dc1bc3c ]

IMPALA-8249: End-to-end test framework doesn't read aggregated counters properly

Updated the compute_aggregation() function to read not the pretty-printed
value from the runtime profile, but the accurate value, which is at
the end of the line in parentheses, e.g.:

RowsReturned: 2.14M (2142543)

The old regex tried to parse '2.14M' with '\d+', which matched only '2'
instead of 2142543.

I tested the change manually and added a test case to
'tests/unittests/test_result_verifier.py'.

Change-Id: I2a6fc0d3f7cbaa87aa848cdafffad21fb1514930
Reviewed-on: http://gerrit.cloudera.org:8080/12589
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
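The parsing fix can be illustrated with a small regex sketch (the pattern and function name below are illustrative, not the exact code in test_result_verifier.py):

```python
import re

def counter_value(line):
    """Extract the accurate counter value from a profile line.

    Pretty-printed counters >= 1000 carry the exact count in trailing
    parentheses, e.g. "RowsReturned: 2.14M (2142543)"; prefer that,
    and fall back to a plain trailing integer for small counters.
    """
    m = re.search(r'\((\d+)\)\s*$', line)
    if m:
        return int(m.group(1))
    m = re.search(r':\s*(\d+)\s*$', line)
    return int(m.group(1)) if m else None
```

A naive `\d+` search against the pretty-printed token would stop at the first digit run ('2' of '2.14M'), which is exactly the bug described below.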


> End-to-end test framework doesn't read aggregated counters properly
> ---
>
> Key: IMPALA-8249
> URL: https://issues.apache.org/jira/browse/IMPALA-8249
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> The test framework doesn't always read the correct value of counters from the 
> runtime profile. In the .test files we can have a RUNTIME_PROFILE section 
> where we can test our expectations against runtime profile data. We can even 
> calculate aggregates of runtime data, currently only SUM is supported over 
> integer data, e.g.:
> {code:java}
>  RUNTIME_PROFILE
> aggregation(SUM, RowsReturned): 2142543
> {code}
>  However, the counters are pretty-printed in the runtime profile, which means 
> that if they are greater than 1000, a shortened version is printed first, 
> then the accurate number comes in parentheses, e.g.:
> {code:java}
> RowsReturned: 2.14M (2142543){code}
>  When the test framework parses the value of an aggregated counter, it 
> wrongly tries to parse the short version as a number, which returns a wrong 
> value (2 instead of 2142543 in the example).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7190) Remove unsupported format write support

2019-03-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783926#comment-16783926
 ] 

ASF subversion and git services commented on IMPALA-7190:
-

Commit 597e378dce448b9488f1dd13c0668fc3c6f828a8 in impala's branch 
refs/heads/2.x from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=597e378 ]

IMPALA-7190: Remove unsupported format writer support

This patch removes write support for unsupported formats like Sequence,
Avro and compressed text. Also, the related query options
ALLOW_UNSUPPORTED_FORMATS and SEQ_COMPRESSION_MODE have been migrated
to the REMOVED query options type.

Testing:
Ran exhaustive build.

Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Reviewed-on: http://gerrit.cloudera.org:8080/10823
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/12642
Reviewed-by: Tim Armstrong 


> Remove unsupported format write support
> ---
>
> Key: IMPALA-7190
> URL: https://issues.apache.org/jira/browse/IMPALA-7190
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Let's remove the formats gated by ALLOW_UNSUPPORTED_FORMATS since progress 
> stalled a long time ago. It sounds like there's a consensus on the mailing 
> list to remove the code:
> [https://lists.apache.org/thread.html/749bef4914350ae0756bc88961db2dd39901a649a9cef6949eda5870@%3Cdev.impala.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8279) Revert IMPALA-6658 to avoid ETL performance regression

2019-03-04 Thread Andrew Sherman (JIRA)
Andrew Sherman created IMPALA-8279:
--

 Summary: Revert IMPALA-6658 to avoid ETL performance regression
 Key: IMPALA-8279
 URL: https://issues.apache.org/jira/browse/IMPALA-8279
 Project: IMPALA
  Issue Type: Bug
Reporter: Andrew Sherman


The fix for IMPALA-6658 seems to cause a measurable regression in the following 
workload:

{quote}
use tpcds;
create TABLE store_sales_unpart stored as parquet as SELECT * FROM 
tpcds.store_sales;
INSERT OVERWRITE TABLE store_sales_unpart SELECT * FROM store_sales;
{quote}

Revert the change to avoid the regression.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





[jira] [Assigned] (IMPALA-7826) Potential NPE in CatalogOpExecutor

2019-03-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7826:
---

Assignee: Paul Rogers

> Potential NPE in CatalogOpExecutor
> --
>
> Key: IMPALA-7826
> URL: https://issues.apache.org/jira/browse/IMPALA-7826
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> {{CatalogOpExecutor}} has two copies of the following:
> {code:java}
>Db db = catalog_.getDb(dbName);
>if (db == null) {
>  throw new CatalogException("Database: " + db.getName() + " does not 
> exist.");
>}
> {code}
> If {{db}} is null, we can’t call {{.getName()}} on that object. (The IDE 
> showed a warning for this which is why my attention was directed to it.) 
> We’ll get a null pointer exception (NPE) when creating the error message.
> IMPALA-7823 includes the obvious fix, change {{db.getName()}} to {{dbName}}.
> But, there may be deeper problems:
> # Perhaps someone thoughtfully wrapped this call stack in a try/catch block 
> and used the NPE to infer that the DB was not found.
> # Perhaps if-statement is wrong: perhaps the catalog_.getDb() method returns 
> a Db object even if not found, and the if-statement should be checking for “! 
> db.isValid()” or some such.
> # Perhaps the code is dead: it is simply never called.
> # Most likely: perhaps this code is used, but the semantics are such that we 
> already checked the DB earlier in the flow. The check here is superfluous: it 
> can never fail. The check, if we had one, should be an assertion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7826) Potential NPE in CatalogOpExecutor

2019-03-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7826.
-
Resolution: Fixed

Fixed as part of another patch.

> Potential NPE in CatalogOpExecutor
> --
>
> Key: IMPALA-7826
> URL: https://issues.apache.org/jira/browse/IMPALA-7826
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> {{CatalogOpExecutor}} has two copies of the following:
> {code:java}
>Db db = catalog_.getDb(dbName);
>if (db == null) {
>  throw new CatalogException("Database: " + db.getName() + " does not 
> exist.");
>}
> {code}
> If {{db}} is null, we can’t call {{.getName()}} on that object. (The IDE 
> showed a warning for this which is why my attention was directed to it.) 
> We’ll get a null pointer exception (NPE) when creating the error message.
> IMPALA-7823 includes the obvious fix, change {{db.getName()}} to {{dbName}}.
> But, there may be deeper problems:
> # Perhaps someone thoughtfully wrapped this call stack in a try/catch block 
> and used the NPE to infer that the DB was not found.
> # Perhaps if-statement is wrong: perhaps the catalog_.getDb() method returns 
> a Db object even if not found, and the if-statement should be checking for “! 
> db.isValid()” or some such.
> # Perhaps the code is dead: it is simply never called.
> # Most likely: perhaps this code is used, but the semantics are such that we 
> already checked the DB earlier in the flow. The check here is superfluous: it 
> can never fail. The check, if we had one, should be an assertion.
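The bug pattern generalizes beyond Java. A minimal Python analog of the fix (names are illustrative, not Impala's actual frontend API):

```python
class CatalogError(Exception):
    pass

def require_db(catalog, db_name):
    """Look up a database, raising a clean error if it is missing."""
    db = catalog.get(db_name)
    if db is None:
        # Correct: build the message from the name we looked up.
        # The buggy Java code dereferenced the null result
        # (db.getName()) here, which would blow up before the
        # intended "does not exist" error could be raised.
        raise CatalogError(f"Database: {db_name} does not exist.")
    return db
```

The error path must only touch values known to exist; anything derived from the failed lookup is off-limits.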



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8273) Change metastore configuration template so that table parameters do not exclude impala specific properties

2019-03-04 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8273 started by Vihang Karajgaonkar.
---
> Change metastore configuration template so that table parameters do not 
> exclude impala specific properties
> --
>
> Key: IMPALA-8273
> URL: https://issues.apache.org/jira/browse/IMPALA-8273
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> CDH Hive has a configuration 
> {{hive.metastore.notification.parameters.exclude.patterns}} which gives the 
> ability to exclude certain parameter keys from notification events. This is 
> mainly used as a safety valve in case there are huge values stored in these 
> parameter maps. The template file should make sure that the parameter 
> exclusion is disabled (or at least configured such that it does not exclude 
> {{impala.disableHmsSync}}), which is needed by this feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8002) Unstable join ordering for equivalent tables

2019-03-04 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783891#comment-16783891
 ] 

Paul Rogers commented on IMPALA-8002:
-

See IMPALA-8219 for one way to resolve this issue.

> Unstable join ordering for equivalent tables
> 
>
> Key: IMPALA-8002
> URL: https://issues.apache.org/jira/browse/IMPALA-8002
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the following test: {{PlannerTest.testJoins()}}:
> {noformat}
> select t1.d, t2.d
> from functional.nulltable t1, functional.nulltable t2, functional.nulltable t3
> where t1.d IS DISTINCT FROM t2.d
> and t3.a != t2.g
>  PLAN
> PLAN-ROOT SINK
> |
> 04:NESTED LOOP JOIN [INNER JOIN]
> |  predicates: t3.a != t2.g
> |
> |--02:SCAN HDFS [functional.nulltable t3]
> | partitions=1/1 files=1 size=18B
> |
> 03:NESTED LOOP JOIN [INNER JOIN]
> |  predicates: t1.d IS DISTINCT FROM t2.d
> |
> |--00:SCAN HDFS [functional.nulltable t1]
> | partitions=1/1 files=1 size=18B
> |
> 01:SCAN HDFS [functional.nulltable t2]
>partitions=1/1 files=1 size=18B
> {noformat}
> Despite no changes in the planner code, on one run the order flipped to the 
> above. Previously, 01 was t1 and 00 was t2.
> Likely, the behavior when two tables are equivalent has some kind of 
> non-determinism, perhaps storing candidates in a Java Set or Map with 
> undefined order.
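The suspected cause can be demonstrated in miniature: hash-based containers (Java's HashSet/HashMap, or Python sets) give no ordering guarantee, so equivalent-cost candidates can come out in a different order across runs. Sorting by a stable key, such as the table alias, is one way to make the choice repeatable (a sketch, not the planner's actual fix):

```python
# Equivalent-cost join candidates collected in an unordered set.
candidates = {"t2", "t1", "t3"}

# Iterating the set directly is not guaranteed to be stable across
# runs or JVM/interpreter versions; a deterministic tie-break sorts
# by a stable key before picking the first candidate.
deterministic_order = sorted(candidates)
chosen = deterministic_order[0]
```

With the tie-break in place, the plan (and hence the planner test's expected output) no longer flips between runs.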



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8050) IS [NOT] NULL gives wrong selectivity when null count is missing

2019-03-04 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783889#comment-16783889
 ] 

Paul Rogers commented on IMPALA-8050:
-

A patch for this is available, but it is some work to update various tests. 
Will offer the patch again once several other planner patches are merged to 
avoid excessive test case churn.

> IS [NOT] NULL gives wrong selectivity when null count is missing
> 
>
> Key: IMPALA-8050
> URL: https://issues.apache.org/jira/browse/IMPALA-8050
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Suppose we have the following query:
> {noformat}
> select *
> from tpch.customer c
> where c.c_mktsegment is null
> {noformat}
> If we have a null count, we can estimate selectivity based on that number. In 
> the case of the TPC-H test data, after a recent fix to add null count back, 
> null count is zero so the cardinality of the predicate {{c.c_mktsegment is 
> null}} is 0 and no rows should be returned. Yet, the query plan shows:
> {noformat}
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
>partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=15.00K
>predicates: c.c_comment IS NULL
> {noformat}
> So, the first bug is that the existing code which is supposed to consider 
> null count (found in {{IsNullPredicate.analyzeImpl()}} does not work. Reason: 
> the code in {{ColumnStats}} to check if we have nulls is wrong:
> {code:java}
>   public boolean hasNulls() { return numNulls_ > 0; }
> {code}
> Zero is a perfectly valid null count: it means a NOT NULL column. The marker 
> for a missing null count is -1 as shown in another method:
> {code:java}
>   public boolean hasStats() { return numNulls_ != -1 || numDistinctValues_ != 
> -1; }
> {code}
> This is probably an ambiguity in the name: does "has nulls" mean:
> * Do we have valid null count stats?
> * Do we have null count stats and we have at least some nulls?
> Fortunately, the only other use of this method is in (disabled) tests.
> h4. Handle Missing Null Counts
> Second, if the null count is not available (for older stats), the next-best 
> approximation is 1/NDV. The code currently guesses 0.1. The 0.1 estimate is 
> fine if NDV is not available either.
> Note that to properly test some of these cases requires new tables in the 
> test suite with no or partial stats.
> h4. Special Consideration for Outer Joins
> When this predicate is applied to the result of an outer join, the estimation 
> methods above *will not* work. Using the table null count to estimate an 
> outer join null count is clearly wrong, as is using the table NDV value. The 
> fall-back of .1 will tend to under-estimate an outer join.
> Instead, what is needed is a more complex estimate. Assume a left outer join 
> (all rows from left, plus matching rows from right.)
> {noformat}
> |join| = |left σ key is not null| * |right|/|key| + |left σ key is null|
> {noformat}
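Instantiating the formula above with illustrative TPC-H-like numbers (all values below are assumptions for the sketch, not actual table stats):

```python
# |join| = |left σ key is not null| * |right| / |key| + |left σ key is null|
left_not_null = 150_000   # customer rows with non-null c_custkey
left_null = 0             # c_custkey has no nulls in TPC-H
right = 1_500_000         # orders rows
key_card = 150_000        # assumed distinct-value count of the join key

# Matched rows from the equi-join, plus left rows whose key is null
# (which survive a left outer join unmatched).
join_card = left_not_null * right // key_card + left_null
```

This yields 1.5M estimated rows for the outer join itself; the point of the ticket is that an IS NULL predicate applied *after* this join needs a different selectivity estimate than one applied at the scan.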
> So, to estimate {{IS NULL}} or {{IS NOT NULL}} after an outer join must use a 
> different algorithm then when estimating it in a scan.
> This suggests that expression selectivity is not an independent exercise as 
> the code currently assumes it is. Instead, it must be aware of its context. 
> In this case, the underlying null count for the column in the predicate must 
> be adjusted when used in an outer join.
> The following TPC-H query gives a very clear example (see {{card-join.test}}):
> {code:sql}
> select c.c_custkey, o.o_orderkey
> from tpch.customer c
> left outer join tpch.orders o on c.c_custkey = o.o_custkey
> where o.o_clerk is null
> {code}
> The plan, with the {{IS NULL}} filter applied twice (correct structure, wrong 
> cardinality estimate):
> {noformat}
> PLAN-ROOT SINK
> |
> 02:HASH JOIN [RIGHT OUTER JOIN]
> |  hash predicates: o.o_custkey = c.c_custkey
> |  other predicates: o.o_clerk IS NULL
> |  runtime filters: RF000 <- c.c_custkey
> |  row-size=51B cardinality=0
> |
> |--00:SCAN HDFS [tpch.customer c]
> | partitions=1/1 files=1 size=23.08MB row-size=8B cardinality=150.00K
> |
> 01:SCAN HDFS [tpch.orders o]
>partitions=1/1 files=1 size=162.56MB row-size=43B cardinality=1.50M
>runtime filters: RF000 -> o.o_custkey
> {noformat}
> The math:
> * The query obtains all customer rows, {{|customer| = 150K}}.
> * The query obtains all order rows where the clerk is null, which is none.
> * The query then left outer joins the customer table with orders. Since only 
> 100K customers have orders, 50K do not. The result would be a join with 50K 
> null clerks.
> * But, because the {{IS NULL}} calculations after the join consider only the 
> {{orders}} null count, all the other rows are assumed discarded.
> So, it may be that 

[jira] [Updated] (IMPALA-8086) Check query option value when set

2019-03-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8086:

Summary: Check query option value when set   (was: Check the value during 
set )

> Check query option value when set 
> --
>
> Key: IMPALA-8086
> URL: https://issues.apache.org/jira/browse/IMPALA-8086
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Janaki Lahorani
>Priority: Major
>
> When a query option is set to some value, it is validated only when a query is 
> run. It should ideally be validated when SET is called.
> [localhost:21000] functional_kudu> set runtime_filter_mode=On;
> RUNTIME_FILTER_MODE set to On
> [localhost:21000] functional_kudu> select STRAIGHT_JOIN count(*) from 
> decimal_rtf_tbl a join [BROADCAST] decimal_rtf_tbl_tiny_d5_kudu b where 
> a.d5_0 = b.d5_0;
> Query: select STRAIGHT_JOIN count(*) from decimal_rtf_tbl a join [BROADCAST] 
> decimal_rtf_tbl_tiny_d5_kudu b where a.d5_0 = b.d5_0
> Query submitted at: 2018-12-07 20:00:55 (Coordinator: 
> http://janaki-OptiPlex-7050:25000)
> ERROR: Errors parsing query options
> Invalid runtime filter mode 'On'. Valid modes are OFF(0), LOCAL(1) or 
> GLOBAL(2).
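A sketch of the requested eager validation, in Python (the valid modes are taken from the error message above; the function name and option-map shape are hypothetical, not Impala's actual implementation):

```python
RUNTIME_FILTER_MODES = {"OFF": 0, "LOCAL": 1, "GLOBAL": 2}

def set_query_option(options, name, value):
    """Set a query option, validating the value immediately.

    Validating at SET time surfaces the error right away, instead of
    when the next query is submitted (the behavior shown above).
    """
    if name.upper() == "RUNTIME_FILTER_MODE":
        key = str(value).upper()
        if key not in RUNTIME_FILTER_MODES:
            raise ValueError(
                f"Invalid runtime filter mode '{value}'. "
                "Valid modes are OFF(0), LOCAL(1) or GLOBAL(2).")
        value = key
    options[name.upper()] = value
    return options
```

With this, `set runtime_filter_mode=On;` would fail at the SET statement rather than silently storing a value that breaks later queries.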



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8157) Log exceptions from the front end

2019-03-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-8157.
-
Resolution: Won't Fix

It turns out the back end does log exceptions; perhaps when I filed this I could 
not find them. Closing this bug for now; it can be reopened if I verify that the 
exceptions are not, in fact, getting logged.

> Log exceptions from the front end
> -
>
> Key: IMPALA-8157
> URL: https://issues.apache.org/jira/browse/IMPALA-8157
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The BE calls into the FE for a variety of operations. Each of these may fail 
> in expected ways (invalid query, say) or unexpected ways (a code change 
> introduces a null pointer exception.)
> At present, the BE logs only the exception, and only at the INFO level. This 
> ticket asks to log all unexpected exceptions at the ERROR level. The basic 
> idea is to extend all FE entry points to do:
> {code:java}
> try {
>   // Do the operation
> } catch (ExpectedException e) {
>   // Don't log expected exceptions
>   throw e;
> } catch (Throwable e) {
>   LOG.error("Something went wrong", e);
>   throw e;
> }
> {code}
> The above code logs all exceptions except for those that are considered 
> expected. The job of this ticket is to:
> * Find all the entry points
> * Identify which, if any, exceptions are expected
> * Add logging code with an error message that identifies the operation
> This pattern was tested ad hoc to find a bug during development and seems to 
> work fine. As a result, the change is mostly a matter of the above three 
> steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8278) Fix testEventProcessorFetchAfterHMSRestart

2019-03-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created IMPALA-8278:
---

 Summary: Fix testEventProcessorFetchAfterHMSRestart
 Key: IMPALA-8278
 URL: https://issues.apache.org/jira/browse/IMPALA-8278
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The {{testEventProcessorFetchAfterHMSRestart}} test case in 
{{MetastoreEventsProcessorTest}} causes flakiness because it creates a new 
event processor pointing to the same catalog instance. This means that all the 
events generated are processed by two event processor instances, and both try 
to modify the state of catalogd, causing race conditions. The failures vary 
and depend heavily on timing. I see the following exception, which is related 
to this issue.

The easiest way to figure out whether this is the problem is to look into the 
FeSupport logs of the test and confirm whether an event ID is being processed 
twice (i.e. you see two identical log entries for a given event ID).
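
The failure mode can be pictured with a toy model (purely illustrative, not the actual MetastoreEventsProcessor code): a single processor that tracks the last synced event ID is idempotent under replay, whereas two independent instances sharing one catalog each see every event as "new" and apply it twice.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class EventProcessor {
    // Highest event id already applied; this is the shared state that two
    // processor instances polling the same stream would race on.
    private final AtomicLong lastSyncedEventId = new AtomicLong(0);
    private final List<Long> applied = new ArrayList<>();

    // Apply events in order, skipping anything at or below the last synced
    // id so that a replayed batch is a no-op.
    synchronized void process(List<Long> eventIds) {
        for (long id : eventIds) {
            if (id <= lastSyncedEventId.get()) continue; // already processed
            applied.add(id);
            lastSyncedEventId.set(id);
        }
    }

    List<Long> appliedEvents() { return applied; }

    public static void main(String[] args) {
        EventProcessor p = new EventProcessor();
        p.process(List.of(1L, 2L, 3L));
        p.process(List.of(2L, 3L, 4L)); // replayed 2 and 3 are skipped
        System.out.println(p.appliedEvents()); // prints [1, 2, 3, 4]
    }
}
{code}

A second {{EventProcessor}} instance with its own {{lastSyncedEventId}} would re-apply events 1-3 to the same catalog, which is exactly the double-processing the logs reveal.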



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8258) Enhance functional tables with realistic star-schema simulation

2019-03-04 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8258 started by Paul Rogers.
---
> Enhance functional tables with realistic star-schema simulation
> ---
>
> Key: IMPALA-8258
> URL: https://issues.apache.org/jira/browse/IMPALA-8258
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The tables in the `functional` db provide many interesting cases. The tables 
> in TPC-H and TPC-DS simulate a well-behaved application.
> We also need some tables that show messy, real-world cases:
> * Correlated filters (same filter on multiple tables)
> * Correlated keys (same join key across multiple tables)
> * Extreme data skew
> A simple four-table star-schema structure can give us what we want.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8265) Reject INSERT/UPSERT queries with ORDER BY and no OFFSET/LIMIT

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783867#comment-16783867
 ] 

Tim Armstrong commented on IMPALA-8265:
---

Yeah I think I understand the problem now. It's definitely an unexpected 
interaction between multiple decisions that made sense in isolation. Definitely 
we don't want any surprising behaviour, but we also don't want to break any 
existing workflows, so ideally we would have a solution that avoided both kinds 
of issues. I think we'd have to consider making this a hard failure a breaking 
change, so not valid in a minor release. Here are some ideas:

# We could do nothing, which ensures no existing workflows are broken, and try 
to improve documentation. Potentially we could add a flag and/or switch the 
behaviour later in a major version. This leaves the potential for confusion 
among users.
# We could change it to a hard error immediately, maybe overridable by an 
option. I think this is unacceptable because of the potential for breakage.
# We could change the behaviour so that it has the expected behaviour. This 
solves the confusion and doesn't break existing workflows (aside from 
weirdly-written queries getting slower because of the sort).
## Ordering is enforced only between rows with the same primary key. I.e. we 
can still partition rows by the primary key and insert in parallel. This would 
mean that the side-effects of inserts are not strictly ordered.
## Ordering is enforced among all rows. This would force us to send all rows 
through the same node.

To me, options 1. and 3.1 seem viable. 3.1 requires some real work but avoids 
the biggest downsides and makes some new workloads possible. We already insert 
sorts before Kudu inserts/upserts but this changes the semantics a bit.
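
Option 3.1 can be sketched abstractly (hypothetical Java, not Impala's actual sink code): hash-partition rows by primary key so that rows sharing a key always reach the same writer in arrival order, while distinct keys still proceed in parallel.

{code:java}
import java.util.*;

public class KeyedRouter {
    // Route (key, value) rows to sinks by hashing the primary key. Rows with
    // the same key land on the same sink in arrival order, so per-key
    // ordering is preserved even though different keys are written in
    // parallel.
    static Map<Integer, List<String>> route(
            List<Map.Entry<String, String>> rows, int numSinks) {
        Map<Integer, List<String>> sinks = new HashMap<>();
        for (Map.Entry<String, String> row : rows) {
            int sink = Math.floorMod(row.getKey().hashCode(), numSinks);
            sinks.computeIfAbsent(sink, k -> new ArrayList<>()).add(row.getValue());
        }
        return sinks;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = List.of(
            Map.entry("k1", "a"), Map.entry("k2", "x"), Map.entry("k1", "b"));
        System.out.println(route(rows, 2));
    }
}
{code}

Under this scheme an UPSERT's "last write wins" semantics become deterministic per key, which is the property the dedup-by-timestamp workflow actually needs.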

> Reject INSERT/UPSERT  queries with ORDER BY and no OFFSET/LIMIT
> ---
>
> Key: IMPALA-8265
> URL: https://issues.apache.org/jira/browse/IMPALA-8265
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Andy Stadtler
>Priority: Critical
>
> Currently Impala doesn't honor an ORDER BY without a LIMIT or OFFSET in an 
> INSERT ... SELECT operation. While Impala currently emits a warning, it seems 
> like this query should be rejected with the same message. Especially now with 
> the UPSERT ability and Kudu, it's an obvious pattern to take a table of 
> duplicate rows and run the following query.
> {code:java}
> UPSERT INTO kudu_table SELECT col1, col2, col3 FROM duplicate_row_table ORDER 
> BY timestamp_column ASC;{code}
> Impala will happily take this query and write incorrect data. The same query 
> works fine as a SELECT only query and it's easy to see where users would make 
> the mistake of reusing it in an INSERT/UPSERT.
>  
> Rejecting the query with the warning message would make sure the user knew 
> the ORDER BY would not be honored and make sure they added a limit, changed 
> their query logic or removed the order by.
>  
> {quote}*Sorting considerations:* Although you can specify an {{ORDER BY}} 
> clause in an {{INSERT ... SELECT}} statement, any {{ORDER BY}} clause is 
> ignored and the results are not necessarily sorted. An {{INSERT ... SELECT}} 
> operation potentially creates many different data files, prepared on 
> different data nodes, and therefore the notion of the data being stored in 
> sorted order is impractical.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8273) Change metastore configuration template so that table parameters do not exclude impala specific properties

2019-03-04 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783865#comment-16783865
 ] 

Vihang Karajgaonkar commented on IMPALA-8273:
-

Adding the gerrit link

> Change metastore configuration template so that table parameters do not 
> exclude impala specific properties
> --
>
> Key: IMPALA-8273
> URL: https://issues.apache.org/jira/browse/IMPALA-8273
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> CDH Hive has a configuration 
> {{hive.metastore.notification.parameters.exclude.patterns}} which gives the 
> ability to exclude certain parameter keys from notification events. This is 
> mainly used as a safety valve in case there are huge values stored in these 
> parameter maps. The template file should make sure that the parameter 
> exclusion is disabled (or at least configured such that it does not exclude 
> {{impala.disableHmsSync}}, which is needed by this feature).
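
The intended semantics of such an exclusion setting can be illustrated with a small sketch (hypothetical Java; the real behavior is governed by Hive's {{hive.metastore.notification.parameters.exclude.patterns}} implementation): any parameter whose key matches one of the configured regexes is dropped from the event's parameter map, so the pattern list must never match {{impala.disableHmsSync}}.

{code:java}
import java.util.*;
import java.util.regex.Pattern;

public class ParamFilter {
    // Drop parameters whose keys match any exclusion pattern, mimicking how
    // an exclude-patterns setting could strip keys from notification events.
    static Map<String, String> filter(Map<String, String> params,
                                      List<String> patterns) {
        List<Pattern> compiled = new ArrayList<>();
        for (String p : patterns) compiled.add(Pattern.compile(p));
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            boolean excluded = compiled.stream()
                .anyMatch(pt -> pt.matcher(e.getKey()).matches());
            if (!excluded) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("impala.disableHmsSync", "true");
        params.put("large.blob.key", "...");
        // prints {impala.disableHmsSync=true}
        System.out.println(filter(params, List.of("large\\..*")));
    }
}
{code}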



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783854#comment-16783854
 ] 

Tim Armstrong commented on IMPALA-8276:
---

I heard [~Paul.Rogers] is taking over, so I guess that question should be 
directed to him (if he's not already doing it).

> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Assignee: Paul Rogers
>Priority: Blocker
>  Labels: correctness
>
> Reported with cdh5.12.1: a bogus "self equal to self" predicate "x = x" is 
> generated by Impala and causes incorrect query results, because this kind of 
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
> query, though the query body is the same for both, because the former 
> generated the bogus predicate and the latter doesn't.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
> a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because predicate {{a.z = a.z_dt}} was created (for reasons yet to be 
> understood; notice b.z_dt is an alias of b.z), exhibited as "table1.z = 
> table1.z" in the query plan in the Impala query profile, because a and b are 
> aliases of view1 and view2, both of which are views created in a very nested 
> way that involves table table1. 
> Though in cdh5.12.1 the SELECT and the COUNT query returned different results 
> in the initial case, an attempted reproduction shows that both queries get 
> bogus predicates, and cdh5.15.2 has the same problem. I was not able to try 
> this with the most recent master branch of Impala due to metadata 
> incompatibility.
>  
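
The reason a generated {{x = x}} predicate loses rows is SQL's three-valued logic: {{NULL = NULL}} evaluates to unknown, and a WHERE clause keeps only rows where the predicate is definitely TRUE. A minimal illustration (plain Java modeling the semantics, not Impala code):

{code:java}
public class ThreeValuedLogic {
    // SQL equality under three-valued logic: comparing anything with NULL
    // yields NULL (unknown), never TRUE.
    static Boolean sqlEquals(Integer a, Integer b) {
        if (a == null || b == null) return null; // unknown
        return a.equals(b);
    }

    // WHERE keeps a row only when the predicate is TRUE; both FALSE and
    // unknown (NULL) filter the row out. Hence "x = x" drops every row
    // where x is NULL.
    static boolean passesWhere(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        System.out.println(passesWhere(sqlEquals(1, 1)));       // prints true
        System.out.println(passesWhere(sqlEquals(null, null))); // prints false
    }
}
{code}

This is why the {{count(*)}} variant, which picked up the bogus predicate, returned fewer rows than the CTAS variant, which did not.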



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8276:
-

Assignee: Paul Rogers

> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Assignee: Paul Rogers
>Priority: Blocker
>  Labels: correctness
>
> Reported with cdh5.12.1: a bogus "self equal to self" predicate "x = x" is 
> generated by Impala and causes incorrect query results, because this kind of 
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
> query, though the query body is the same for both, because the former 
> generated the bogus predicate and the latter doesn't.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
> a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because predicate {{a.z = a.z_dt}} was created (for reasons yet to be 
> understood; notice b.z_dt is an alias of b.z), exhibited as "table1.z = 
> table1.z" in the query plan in the Impala query profile, because a and b are 
> aliases of view1 and view2, both of which are views created in a very nested 
> way that involves table table1. 
> Though in cdh5.12.1 the SELECT and the COUNT query returned different results 
> in the initial case, an attempted reproduction shows that both queries get 
> bogus predicates, and cdh5.15.2 has the same problem. I was not able to try 
> this with the most recent master branch of Impala due to metadata 
> incompatibility.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8277) CHECK can be hit when there are gaps in present CPU numbers (KUDU-2721)

2019-03-04 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8277:
-

 Summary: CHECK can be hit when there are gaps in present CPU 
numbers (KUDU-2721)
 Key: IMPALA-8277
 URL: https://issues.apache.org/jira/browse/IMPALA-8277
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.1.0, Impala 3.2.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


This is a placeholder to port KUDU-2721 to our gutil once it's fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Resolved] (IMPALA-7804) Various scanner tests intermittently failing on S3 on different runs

2019-03-04 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7804.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

Closing this, as we made some test changes that alleviated this issue in 3.2

> Various scanner tests intermittently failing on S3 on different runs
> 
>
> Key: IMPALA-7804
> URL: https://issues.apache.org/jira/browse/IMPALA-7804
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: David Knupp
>Assignee: Joe McDonnell
>Priority: Blocker
>  Labels: S3, broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> The failures have to do with getting AWS client credentials.
> *query_test/test_scanners.py:696: in test_decimal_encodings*
> _Stacktrace_
> {noformat}
> query_test/test_scanners.py:696: in test_decimal_encodings
> self.run_test_case('QueryTest/parquet-decimal-formats', vector, 
> unique_database)
> common/impala_test_suite.py:496: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:358: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:438: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:260: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E -255.00,-255.00,-255.00 == -255.00,-255.00,-255.00
> E -255.00,-255.00,-255.00 != -65535.00,-65535.00,-65535.00
> E -65535.00,-65535.00,-65535.00 != -999.99,-999.99,-999.99
> E -65535.00,-65535.00,-65535.00 != 
> 0.00,-.99,-.99
> E -999.99,-999.99,-999.99 != 0.00,0.00,0.00
> E -999.99,-999.99,-999.99 != 
> 0.00,.99,.99
> E 0.00,-.99,-.99 != 
> 255.00,255.00,255.00
> E 0.00,-.99,-.99 != 
> 65535.00,65535.00,65535.00
> E 0.00,0.00,0.00 != 999.99,999.99,999.99
> E 0.00,0.00,0.00 != None
> E 0.00,.99,.99 != None
> E 0.00,.99,.99 != None
> E 255.00,255.00,255.00 != None
> E 255.00,255.00,255.00 != None
> E 65535.00,65535.00,65535.00 != None
> E 65535.00,65535.00,65535.00 != None
> E 999.99,999.99,999.99 != None
> E 999.99,999.99,999.99 != None
> E Number of rows returned (expected vs actual): 18 != 9
> {noformat}
> _Standard Error_
> {noformat}
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_huge_num_rows_76a09ef1` CASCADE;
> -- 2018-11-01 09:42:41,140 INFO MainThread: Started query 
> 4c4bc0e7b69d7641:130ffe73
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_huge_num_rows_76a09ef1`;
> -- 2018-11-01 09:42:42,402 INFO MainThread: Started query 
> e34d714d6a62cba1:2a8544d0
> -- 2018-11-01 09:42:42,405 INFO MainThread: Created database 
> "test_huge_num_rows_76a09ef1" for test ID 
> "query_test/test_scanners.py::TestParquet::()::test_huge_num_rows[protocol: 
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> 18/11/01 09:42:43 DEBUG s3a.S3AFileSystem: Initializing S3AFileSystem for 
> impala-test-uswest2-1
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Propagating entries under 
> fs.s3a.bucket.impala-test-uswest2-1.
> 18/11/01 09:42:43 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/11/01 09:42:43 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 18/11/01 09:42:43 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: For URI s3a://impala-test-uswest2-1/, 
> using credentials AWSCredentialProviderList: BasicAWSCredentialsProvider 
> EnvironmentVariableCredentialsProvider 
> com.amazonaws.auth.InstanceProfileCredentialsProvider@15bbf42f
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.connection.maximum is 
> 1500
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of fs.s3a.attempts.maximum is 20
> 18/11/01 09:42:43 DEBUG s3a.S3AUtils: Value of 
> 

[jira] [Updated] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-03-04 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8189:
--
Fix Version/s: Impala 3.2.0

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 


[jira] [Resolved] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-03-04 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8189.
--
Resolution: Fixed

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
[jira] [Resolved] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-03-04 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8189.
--
Resolution: Fixed

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> In parquet-resolution-by-name.test a parquet file is copied. 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 
> 
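The failure pattern in the log above comes from the copy-to-temporary-then-rename idiom: `hadoop fs -cp` streams the data to a `._COPYING_`-suffixed name and only renames it to the final name once the stream closes. If the store cannot yet see the just-written object (S3 did not offer strong read-after-write consistency at the time, particularly after earlier negative lookups like the 404s in this log), the copy fails with "not found" even though the write itself succeeded. A rough sketch of the idiom (hypothetical Python, not Hadoop code):

```python
import os

def copy_with_temp(src_path, dst_path):
    """Copy src to dst by writing a '._COPYING_' temp file, then renaming.

    On a local filesystem the existence check always passes; on an
    eventually consistent object store the check after close could
    report the object missing, and the copy would fail at that point.
    """
    tmp_path = dst_path + "._COPYING_"
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        dst.write(src.read())
    if not os.path.exists(tmp_path):  # the step that failed on S3
        raise FileNotFoundError(tmp_path)
    os.replace(tmp_path, dst_path)  # promote temp file to the final name
```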

[jira] [Commented] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783765#comment-16783765
 ] 

Tim Armstrong commented on IMPALA-8276:
---

[~yzhangal] do you have some view definitions that are sufficient to reproduce 
the issue?

> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Blocker
>  Labels: correctness
>
> Reported with cdh5.12.1: a bogus "self equals self" predicate "x = x" is
> generated by Impala and causes incorrect query results, because this kind of
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS
> query, even though the query body is the same for both, because the former
> generated the bogus predicate and the latter did not.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
> a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because the predicate {{a.z = a.z_dt}} was created (for reasons not yet
> understood; note that b.z_dt is an alias of b.z), shown as "table1.z =
> table1.z" in the query plan of the Impala query profile, because a and b are
> aliases of view1 and view2, both of which are views created in a deeply
> nested way that involves table table1.
> Though in cdh5.12.1 the select and the count queries returned different
> results in the initial case, an attempted reproduction shows that both
> queries get bogus predicates. cdh5.15.2 has the same problem. The most
> recent master branch of Impala could not be tried due to metadata
> incompatibility.
>  
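A toy Python model of the reported symptom (illustration only, not Impala code): under SQL three-valued logic, `x = x` is not a tautology, because `NULL = NULL` evaluates to unknown rather than true, so an injected `x = x` predicate silently drops rows with NULL in that column:

```python
def sql_eq(a, b):
    """SQL-style equality: any comparison involving NULL yields unknown (None)."""
    if a is None or b is None:
        return None
    return a == b

def count_star(rows, predicate=None):
    """count(*) over rows; a row survives only if the predicate is strictly True."""
    if predicate is None:
        return len(rows)
    return sum(1 for x in rows if predicate(x) is True)

z_values = [1, None, 3, None]         # a column with NULL entries
without_bogus = count_star(z_values)  # no predicate: all 4 rows counted
with_bogus = count_star(z_values, lambda x: sql_eq(x, x))  # "x = x": 2 rows
```

This matches the report: the query with the injected predicate returns fewer rows than the same query without it.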



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8269) Clean up authorization test package structure

2019-03-04 Thread Mahendra Korepu (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8269 started by Mahendra Korepu.
---
> Clean up authorization test package structure
> -
>
> Key: IMPALA-8269
> URL: https://issues.apache.org/jira/browse/IMPALA-8269
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: Mahendra Korepu
>Priority: Minor
>  Labels: ramp-up
>
> The task is to do some cleanup of the authorization test package structure:
> 1. Move AuthorizationTest.java and AuthorizationStmtTest.java to the
> authorization test package.
> 2. Rename CustomClusterGroupMapper and
> CustomClusterResourceAuthorizationProvider to TestSentryGroupMapper and
> TestSentryResourceAuthorizationProvider, since those two classes aren't
> specific to custom clusters anymore.
> 3. Move those two files into `testutil` instead, since they're not actually
> test classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8248) Re-organize authorization tests

2019-03-04 Thread radford nguyen (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

radford nguyen reassigned IMPALA-8248:
--

Assignee: radford nguyen

> Re-organize authorization tests
> ---
>
> Key: IMPALA-8248
> URL: https://issues.apache.org/jira/browse/IMPALA-8248
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: radford nguyen
>Priority: Major
>
> We have authorization tests that are specific to Sentry and authorization 
> tests that can be applicable to any authorization provider. We need to 
> re-organize the authorization tests to easily differentiate between 
> Sentry-specific tests vs generic authorization tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Reopened] (IMPALA-8269) Clean up authorization test package structure

2019-03-04 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reopened IMPALA-8269:
--

This isn't fixed yet.

> Clean up authorization test package structure
> -
>
> Key: IMPALA-8269
> URL: https://issues.apache.org/jira/browse/IMPALA-8269
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: Mahendra Korepu
>Priority: Minor
>  Labels: ramp-up
>
> The task is to do some cleanup of the authorization test package structure:
> 1. Move AuthorizationTest.java and AuthorizationStmtTest.java to the
> authorization test package.
> 2. Rename CustomClusterGroupMapper and
> CustomClusterResourceAuthorizationProvider to TestSentryGroupMapper and
> TestSentryResourceAuthorizationProvider, since those two classes aren't
> specific to custom clusters anymore.
> 3. Move those two files into `testutil` instead, since they're not actually
> test classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8269) Clean up authorization test package structure

2019-03-04 Thread Mahendra Korepu (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahendra Korepu resolved IMPALA-8269.
-
Resolution: Fixed

Changes pushed up for review: https://gerrit.cloudera.org/#/c/12654/

> Clean up authorization test package structure
> -
>
> Key: IMPALA-8269
> URL: https://issues.apache.org/jira/browse/IMPALA-8269
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: Mahendra Korepu
>Priority: Minor
>  Labels: ramp-up
>
> The task is to do some cleanup of the authorization test package structure:
> 1. Move AuthorizationTest.java and AuthorizationStmtTest.java to the
> authorization test package.
> 2. Rename CustomClusterGroupMapper and
> CustomClusterResourceAuthorizationProvider to TestSentryGroupMapper and
> TestSentryResourceAuthorizationProvider, since those two classes aren't
> specific to custom clusters anymore.
> 3. Move those two files into `testutil` instead, since they're not actually
> test classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (IMPALA-8274) Missing update to index into profiles vector in Coordinator::BackendState::ApplyExecStatusReport()

2019-03-04 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-8274:
--
Labels: crash  (was: )

> Missing update to index into profiles vector in 
> Coordinator::BackendState::ApplyExecStatusReport()
> --
>
> Key: IMPALA-8274
> URL: https://issues.apache.org/jira/browse/IMPALA-8274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>  Labels: crash
>
> {{idx}} isn't updated when we skip a duplicate or stale update of a fragment
> instance. As a result, we may end up passing the wrong profile to
> {{instance_stats->Update()}}. This may lead to random crashes in
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
>   int idx = 0;
>   const bool has_profile = thrift_profiles.profile_trees.size() > 0;
>   TRuntimeProfileTree empty_profile;
>   for (const FragmentInstanceExecStatusPB& instance_exec_status :
>backend_exec_status.instance_exec_status()) {
> int64_t report_seq_no = instance_exec_status.report_seq_no();
> int instance_idx = 
> GetInstanceIdx(instance_exec_status.fragment_instance_id());
> DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
> InstanceStats* instance_stats = instance_stats_map_[instance_idx];
> int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
> DCHECK(instance_stats->exec_params_.instance_id ==
> ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> // Ignore duplicate or out-of-order messages.
> if (report_seq_no <= last_report_seq_no) {
>   VLOG_QUERY << Substitute("Ignoring stale update for query instance $0 
> with "
>   "seq no $1", PrintId(instance_stats->exec_params_.instance_id), 
> report_seq_no);
>   continue; <<--- // XXX bad
> }
> DCHECK(!instance_stats->done_);
> DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
> const TRuntimeProfileTree& profile =
> has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
> instance_stats->Update(instance_exec_status, profile, exec_summary,
> scan_range_progress);
> {noformat}
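A minimal Python sketch of the indexing bug above (a hypothetical model, not Impala code; it assumes every report, stale or not, ships one profile entry): when a stale report is skipped without consuming its profile slot, every later instance gets paired with an earlier instance's profile. The fix is to advance the index before the `continue`.

```python
def pair_reports_with_profiles(reports, profiles):
    """Pair each fresh report with its profile, skipping stale reports.

    The original bug was a 'continue' that skipped a stale report
    WITHOUT consuming its profile slot (no idx increment), so all later
    reports were paired with the wrong, earlier profile.
    """
    paired = []
    idx = 0
    for report in reports:
        if report["seq_no"] <= report["last_seen_seq_no"]:
            idx += 1  # the fix: consume this report's profile slot anyway
            continue
        paired.append((report["id"], profiles[idx]))
        idx += 1
    return paired

reports = [
    {"id": "f0", "seq_no": 2, "last_seen_seq_no": 1},  # fresh
    {"id": "f1", "seq_no": 1, "last_seen_seq_no": 1},  # stale -> skipped
    {"id": "f2", "seq_no": 3, "last_seen_seq_no": 2},  # fresh
]
profiles = ["profile_f0", "profile_f1", "profile_f2"]
# With the fix, f2 is paired with profile_f2; without the idx increment
# it would silently receive profile_f1.
```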



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8276:
--
Priority: Blocker  (was: Major)

> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Blocker
>
> Reported with cdh5.12.1: a bogus "self equals self" predicate "x = x" is
> generated by Impala and causes incorrect query results, because this kind of
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS
> query, even though the query body is the same for both, because the former
> generated the bogus predicate and the latter did not.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
> a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because the predicate {{a.z = a.z_dt}} was created (for reasons not yet
> understood; note that b.z_dt is an alias of b.z), shown as "table1.z =
> table1.z" in the query plan of the Impala query profile, because a and b are
> aliases of view1 and view2, both of which are views created in a deeply
> nested way that involves table table1.
> Though in cdh5.12.1 the select and the count queries returned different
> results in the initial case, an attempted reproduction shows that both
> queries get bogus predicates. cdh5.15.2 has the same problem. The most
> recent master branch of Impala could not be tried due to metadata
> incompatibility.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8276:
--
Labels: correctness  (was: )

> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Blocker
>  Labels: correctness
>
> Reported with cdh5.12.1: a bogus "self equals self" predicate "x = x" is
> generated by Impala and causes incorrect query results, because this kind of
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS
> query, even though the query body is the same for both, because the former
> generated the bogus predicate and the latter did not.
> For example,
> {code:java}
> select count(*) from 
> (select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
> a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because the predicate {{a.z = a.z_dt}} was created (for reasons not yet
> understood; note that b.z_dt is an alias of b.z), shown as "table1.z =
> table1.z" in the query plan of the Impala query profile, because a and b are
> aliases of view1 and view2, both of which are views created in a deeply
> nested way that involves table table1.
> Though in cdh5.12.1 the select and the count queries returned different
> results in the initial case, an attempted reproduction shows that both
> queries get bogus predicates. cdh5.15.2 has the same problem. The most
> recent master branch of Impala could not be tried due to metadata
> incompatibility.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-8276:
--
Description: 
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query body is the same for both, because the former generated 
the bogus predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 

  was:
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query body is the same for both, because the former generated 
the bogus predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 


> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Major
>
> Reported with cdh5.12.1: a bogus "self equals self" predicate "x = x" is
> generated by Impala and causes incorrect query results, because this kind of
> predicate returns false for "null" entries.
> It was observed that a {{count(*)}} query returned fewer rows than a CTAS
> query, even though the query body is the same for both, because the former
> generated the bogus predicate and the latter did not.
> For example,
> {code:java}
> select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
> view2 b on a.p = b.q) a{code}
> returned fewer rows than
> {code:java}
> create table abc as 
> select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = 
> b.q{code}
>  because the predicate {{a.z = a.z_dt}} was created (for reasons not yet
> understood; note that b.z_dt is an alias of b.z), shown as "table1.z =
> table1.z" in the query plan of the Impala query profile, because a and b are
> aliases of view1 and view2, both of which are views created in a deeply
> nested way that involves table table1.
> Though in cdh5.12.1 the select and the count queries returned different
> results in the initial case, an attempted reproduction shows that both
> queries get bogus predicates. cdh5.15.2 has the same problem. The most
> recent master branch of Impala could not be tried due to metadata
> incompatibility.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created IMPALA-8276:
-

 Summary: Self equal to self predicate "x = x" generated by Impala 
caused incorrect query result
 Key: IMPALA-8276
 URL: https://issues.apache.org/jira/browse/IMPALA-8276
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Yongjun Zhang


Reported with cdh5.12.1: a bogus "self equals self" predicate "x = x" is 
generated by Impala and causes incorrect query results, because this kind of 
predicate returns false for "null" entries.

It was observed that a count(*) query returned fewer rows than a CTAS query, 
even though the query is the same, because the former generated the bogus 
predicate and the latter did not.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because the predicate {{a.z = a.z_dt}} was created (for reasons not yet 
understood; note that b.z_dt is an alias of b.z), shown as "table1.z = 
table1.z" in the query plan of the Impala query profile, because a and b are 
aliases of view1 and view2, both of which are views created in a deeply 
nested way that involves table table1.

Though in cdh5.12.1 the select and the count queries returned different 
results in the initial case, an attempted reproduction shows that both 
queries get bogus predicates. cdh5.15.2 has the same problem. The most 
recent master branch of Impala could not be tried due to metadata 
incompatibility.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-8276:
--
Description: 
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query body is the same for both, because the former generated 
the bogus predicate and the latter doesn't.

For example,
{code:java}
select count(*) from 
(select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q) 
a{code}
returned fewer rows than
{code:java}
create table abc as 
select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 

  was:
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query body is the same for both, because the former generated 
the bogus predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select a.*, b.x, b.y, b.z_dt,  from view1 a left join view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 


> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Major
>






[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-8276:
--
Description: 
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query body is the same for both, because the former generated 
the bogus predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 

  was:
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query is the same, because the former generated the bogus 
predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 


> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Major
>




[jira] [Updated] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-8276:
--
Description: 
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a {{count(*)}} query returned fewer rows than a CTAS 
query, though the query is the same, because the former generated the bogus 
predicate and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 

  was:
Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a count(*) query returned fewer rows than a CTAS query, 
though the query is the same, because the former generated the bogus predicate 
and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 


> Self equal to self predicate "x = x" generated by Impala caused incorrect 
> query result
> --
>
> Key: IMPALA-8276
> URL: https://issues.apache.org/jira/browse/IMPALA-8276
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Yongjun Zhang
>Priority: Major
>






[jira] [Created] (IMPALA-8276) Self equal to self predicate "x = x" generated by Impala caused incorrect query result

2019-03-04 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created IMPALA-8276:
-

 Summary: Self equal to self predicate "x = x" generated by Impala 
caused incorrect query result
 Key: IMPALA-8276
 URL: https://issues.apache.org/jira/browse/IMPALA-8276
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Yongjun Zhang


Reported with cdh5.12.1, that "self equal to self" kind of bogus predicate "x = 
x" is generated by Impala and caused incorrect query result, because this kind 
of predicate return false for "null" entries.

It was observed that a count(*) query returned fewer rows than a CTAS query, 
though the query is the same, because the former generated the bogus predicate 
and the latter doesn't.

For example,
{code:java}
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q) a{code}
returned fewer rows than
{code:java}
create table abc as 
select count(*) from (select a.*, b.x, b.y, b.z_dt,  from view1 a left join 
view2 b on a.p = b.q{code}
 because predicate {{a.z = a.z_dt}} was created (for reasons to understand, 
notice b.z_dt is an alias of b.z), exhibited as "table1.z = table1.z" in the 
query plan in Impala query profile because a and b are aliases of view1 and 
view2,  both of which are views created in a very nested way that involves 
table table1. 

Though in cdh5.12.1 the select and the count query returns different result in 
the initial case, an attempted reproduction shows that both queries get bogus 
predicates. And cdh5.15.2 has the same problem.  Was not able to try out with 
most recent master branch of impala due to meta data incompatibility.

 






[jira] [Commented] (IMPALA-8274) Missing update to index into profiles vector in Coordinator::BackendState::ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783086#comment-16783086
 ] 

Michael Ho commented on IMPALA-8274:


FWIW, the bug above led to crashes like the following:
{noformat}
F0302 10:21:04.562525 22393 coordinator-backend-state.cc:571] Check failed: 
per_fragment_instance_idx < exec_summary.exec_stats.size() (62 vs. 1)  
name=HDFS_SCAN_NODE (id=3) instance_id=e54a26423c426f58:ecf1f6b400b5 
fragment_idx=4
{noformat}

{noformat}
(gdb) bt
#0  0x7f0e215c0207 in raise () from ./sysroot/lib64/libc.so.6
#1  0x7f0e215c18f8 in abort () from ./sysroot/lib64/libc.so.6
#2  0x047fe4d4 in google::DumpStackTraceAndExit() ()
#3  0x047f4f2d in google::LogMessage::Fail() ()
#4  0x047f67d2 in google::LogMessage::SendToLog() ()
#5  0x047f4907 in google::LogMessage::Flush() ()
#6  0x047f7ece in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x0275dd6a in 
impala::Coordinator::BackendState::InstanceStats::Update (this=0x17d393910, 
exec_status=..., thrift_profile=..., exec_summary=0x1a72a940, 
scan_range_progress=0x1a72a8d8)
at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator-backend-state.cc:571
#8  0x0275b0cf in 
impala::Coordinator::BackendState::ApplyExecStatusReport (this=0x2e71f0100, 
backend_exec_status=..., thrift_profiles=..., exec_summary=0x1a72a940, 
scan_range_progress=0x1a72a8d8,
dml_exec_state=0x1a72aa80) at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator-backend-state.cc:337
#9  0x027474bb in impala::Coordinator::UpdateBackendExecStatus 
(this=0x1a72a880, request=..., thrift_profiles=...) at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/runtime/coordinator.cc:713
#10 0x020d5c46 in impala::ClientRequestState::UpdateBackendExecStatus 
(this=0xe3a1c000, request=..., thrift_profiles=...) at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/service/client-request-state.cc:1303
#11 0x02038291 in impala::ControlService::ReportExecStatus 
(this=0x1596cad0, request=0x7835ba70, response=0x47894bfa0, 
rpc_context=0x47894aea0)
at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/src/service/control-service.cc:152
#12 0x020dbac4 in 
impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
const&, scoped_refptr 
const&)::{lambda(google::protobuf::Message const*, google::protobuf::Message*, 
kudu::rpc::RpcContext*)#2}::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const ()
at 
/usr/src/debug/impala-3.2.0-cdh6.2.x-SNAPSHOT/be/generated-sources/gen-cpp/control_service.service.cc:62
{noformat}
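The pairing bug can be shown in miniature. The sketch below (hypothetical data and names, not the Impala implementation) pairs a list of instance status reports with a parallel list of profiles: if a stale report is skipped without consuming its profile entry, every subsequent instance is assigned the wrong profile; consuming the entry before the skip keeps the two lists aligned.

```python
# Two parallel lists: each status report ships with exactly one profile,
# so each iteration must consume one profile entry even if the report
# itself is ignored as stale. Data is hypothetical.
statuses = [("A", 1), ("B", 1), ("C", 2)]   # (instance, report_seq_no)
profiles = ["prof_A", "prof_B", "prof_C"]   # parallel to statuses
last_seen = {"A": 0, "B": 5, "C": 0}        # B's report is stale

def pair_buggy():
    idx, out = 0, {}
    for inst, seq in statuses:
        if seq <= last_seen[inst]:
            continue                 # BUG: skips without consuming profiles[idx]
        out[inst] = profiles[idx]
        idx += 1
    return out

def pair_fixed():
    idx, out = 0, {}
    for inst, seq in statuses:
        cur = profiles[idx]
        idx += 1                     # always consume the paired profile entry
        if seq <= last_seen[inst]:
            continue
        out[inst] = cur
    return out

print(pair_buggy())  # instance C wrongly receives B's profile
print(pair_fixed())  # instance C receives its own profile
```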

> Missing update to index into profiles vector in 
> Coordinator::BackendState::ApplyExecStatusReport()
> --
>
> Key: IMPALA-8274
> URL: https://issues.apache.org/jira/browse/IMPALA-8274
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Blocker
>
> {{idx}} isn't updated when we skip a duplicate or stale update of a fragment 
> instance. As a result, we may end up passing the wrong profile 
> to {{instance_stats->Update()}}. This may lead to random crashes in 
> {{Coordinator::BackendState::InstanceStats::Update}}.
> {noformat}
>   int idx = 0;
>   const bool has_profile = thrift_profiles.profile_trees.size() > 0;
>   TRuntimeProfileTree empty_profile;
>   for (const FragmentInstanceExecStatusPB& instance_exec_status :
>backend_exec_status.instance_exec_status()) {
> int64_t report_seq_no = instance_exec_status.report_seq_no();
> int instance_idx = 
> GetInstanceIdx(instance_exec_status.fragment_instance_id());
> DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
> InstanceStats* instance_stats = instance_stats_map_[instance_idx];
> int64_t last_report_seq_no = instance_stats->last_report_seq_no_;
> DCHECK(instance_stats->exec_params_.instance_id ==
> ProtoToQueryId(instance_exec_status.fragment_instance_id()));
> // Ignore duplicate or out-of-order messages.
> if (report_seq_no <= last_report_seq_no) {
>   VLOG_QUERY << Substitute("Ignoring stale update for query instance $0 
> with "
>   "seq no $1", PrintId(instance_stats->exec_params_.instance_id), 
> report_seq_no);
>   continue; <<--- // XXX bad
> }
> DCHECK(!instance_stats->done_);
> DCHECK(!has_profile || idx < thrift_profiles.profile_trees.size());
> const TRuntimeProfileTree& profile =
> has_profile ? thrift_profiles.profile_trees[idx++] : empty_profile;
> instance_stats->Update(instance_exec_status, profile, exec_summary,
>