[jira] [Commented] (IMPALA-10564) No error returned when inserting an overflowed value into a decimal column

2021-03-25 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309127#comment-17309127 ]

ASF subversion and git services commented on IMPALA-10564:
--

Commit 281a47caad1c7c53033ee1ce0affa35b8fd4d2d7 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=281a47c ]

IMPALA-10564 (part 2): Fixed test_ctas_exprs failure for S3 build

New test case TestDecimalOverflowExprs::test_ctas_exprs was added
in the first patch for IMPALA-10564, but it failed in the S3 build with
the Parquet format because the table was not successfully created when
the CTAS query failed.
This patch fixes the test failure by skipping the check that NULL was
inserted into the table after a failed CTAS in the S3 build with Parquet.
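
A hedged sketch of what such a conditional skip could look like in a pytest-style helper. The flag IS_S3, the file-format string, and the helper name are assumptions for illustration, not the actual names in the patch:

```python
IS_S3 = True  # assumed flag; the real suite derives this from the filesystem config


def check_no_rows_after_failed_ctas(file_format, query_result_rows):
    """After a failed CTAS, verify the table holds no rows -- but skip the
    check on S3 with Parquet, where the table may not exist at all."""
    if IS_S3 and file_format == "parquet":
        return "skipped"
    assert query_result_rows == []
    return "checked"


# On S3 + Parquet the check is skipped; elsewhere the empty result is verified.
assert check_no_rows_after_failed_ctas("parquet", None) == "skipped"
assert check_no_rows_after_failed_ctas("text", []) == "checked"
```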

Testing:
 - Reproduced the test failure on a local box with defaultFS set to s3a, and
   verified the fix works with defaultFS set to s3a.
 - Passed EE_TEST.

Change-Id: Ia627ca70ed41764e86be348a0bc19e330b3334d2
Reviewed-on: http://gerrit.cloudera.org:8080/17228
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> No error returned when inserting an overflowed value into a decimal column
> --
>
> Key: IMPALA-10564
> URL: https://issues.apache.org/jira/browse/IMPALA-10564
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.0
>
>
> When using CTAS or INSERT-SELECT statements to insert rows into a table 
> with decimal columns, Impala inserts NULL for overflowed decimal values 
> instead of returning an error. The issue occurs when the data expression for 
> the decimal column in the SELECT sub-query contains at least one alias. It 
> is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where the 
> data expression for the decimal column is a constant, so the overflowed 
> decimal value could be detected by the frontend during expression analysis. 
> If there is an alias (a variable) in the data expression for the decimal 
> column, the frontend cannot evaluate the expression during analysis; only 
> the backend can evaluate it when executing fragment instances for the 
> SELECT sub-query. The log messages showed that the executor detected the 
> decimal overflow error but did not propagate it to the coordinator, so the 
> error was not returned to the client.
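
To illustrate the expected behavior, here is a small Python sketch of a DECIMAL(precision, scale) bounds check. The function is illustrative only; Impala performs this check in its C++ backend, not like this:

```python
from decimal import Decimal


def check_decimal_fits(value, precision, scale):
    """Return True if value fits DECIMAL(precision, scale). An engine should
    raise an error (not silently produce NULL) when this check fails."""
    # Round to the column's scale first.
    quantized = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # A DECIMAL(p, s) column holds at most p - s digits before the point.
    max_abs = Decimal(10) ** (precision - scale)
    return abs(quantized) < max_abs


# 123.45 fits DECIMAL(5,2); 1234.5 overflows it and should be rejected.
assert check_decimal_fits("123.45", 5, 2)
assert not check_decimal_fits("1234.5", 5, 2)
```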



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10598) test_cache_reload_validation is flaky

2021-03-25 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309125#comment-17309125 ]

ASF subversion and git services commented on IMPALA-10598:
--

Commit 5b27b7ca7232a17d2a099f8567553004248989f2 in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5b27b7c ]

IMPALA-10598: Deflake test_cache_reload_validation

This patch deflakes the e2e test test_cache_reload_validation in
test_hdfs_caching.py. The util method the test relies on to count cache
directives by parsing the output of the command
"hdfs cacheadmin -listDirectives -stats" does not account for trailing new
lines or headers in that output, so the test fails when the expected number
of cache directives does not match the number of output lines.

The fix parses the "Found <n> entries" line in the output when available
and returns the count from that line. If the line is not found, it falls back
to the earlier implementation of counting output lines.
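
The parsing strategy described above might look roughly like this (a Python sketch; the function name and exact regex are assumptions, not the actual patch):

```python
import re


def count_cache_directives(stdout):
    """Count HDFS cache directives from 'hdfs cacheadmin -listDirectives -stats'
    output, preferring the 'Found N entries' summary line."""
    match = re.search(r"Found (\d+) entries", stdout)
    if match:
        return int(match.group(1))
    # Fall back to the old line-count heuristic if the summary line is absent.
    return len(stdout.split("\n"))


# The summary line wins even when trailing lines or headers are present.
assert count_cache_directives("Found 4 entries\n ID POOL REPL ...\n\n") == 4
```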

Testing:
1. The test was failing for me when run individually. After the patch, I looped
the test 10 times without any errors.

Change-Id: I2d491e90af461d5db3575a5840958d17ca90901c
Reviewed-on: http://gerrit.cloudera.org:8080/17210
Reviewed-by: Vihang Karajgaonkar 
Tested-by: Impala Public Jenkins 


> test_cache_reload_validation is flaky
> -
>
> Key: IMPALA-10598
> URL: https://issues.apache.org/jira/browse/IMPALA-10598
> Project: IMPALA
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>  Labels: flaky-test
>
> I noticed that when I run 
> {noformat}
> bin/impala-py.test tests/query_test/test_hdfs_caching.py -k 
> test_cache_reload_validation
> {noformat}
> I see the following failure on the master branch. 
> {noformat}
>  TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation
> assert num_entries_pre + 4 == get_num_cache_requests(), \
> E   AssertionError: Adding the tables should be reflected by the number of 
> cache directives.
> E   assert (2 + 4) == 7
> E+  where 7 = get_num_cache_requests()
> {noformat}
> This failure is reproducible for me every time, but I am not sure why the 
> Jenkins jobs don't show it. When I looked into this I found that the test 
> depends on the following method to get the number of cache directives on 
> HDFS.
> {noformat}
>   def get_num_cache_requests_util():
>     rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives -stats")
>     assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, stderr)
>     return len(stdout.split('\n'))
> {noformat}
> The output of this command when there are no entries is 
> {noformat}
> Found 0 entries
> {noformat}
> when there are entries the output looks like 
> {noformat}
> Found 4 entries
>   ID POOL   REPL EXPIRY  PATH 
>BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
>  225 testPool  8 never   /test-warehouse/cachedb.db/cached_tbl_reload 
>   0 0 0 0
>  226 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part  0  
>0 0 0
>  227 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1  0  
>0 0 0
>  228 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2  0  
>0 0 0
> {noformat}
> When there are no entries there is also an additional new line which is 
> counted. So when there are no entries the method returns 2, and when there 
> are 4 entries it returns 7, which causes the failure because the test 
> expects 2 + 4 = 6.






[jira] [Commented] (IMPALA-10397) TestAutoScaling.test_single_workload failed in exhaustive release build

2021-03-25 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309128#comment-17309128 ]

ASF subversion and git services commented on IMPALA-10397:
--

Commit 0b79464d9c74d3cc89230a5a3ec3c3955ea2a953 in impala's branch 
refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0b79464 ]

IMPALA-10397: Fix test_single_workload

The logs on failed runs indicated that the autoscaler never started
another cluster. This can only happen if it never notices a queued
query which is possible since this test was only failing in release
builds. This patch increases the runtime of the sample query to
make execution more predictable.

Testing:
Looped the test locally on a release build.

Change-Id: Ide3c7fb4509ce9a797b4cbdd141b2a319b923d4e
Reviewed-on: http://gerrit.cloudera.org:8080/17218
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TestAutoScaling.test_single_workload failed in exhaustive release build
> ---
>
> Key: IMPALA-10397
> URL: https://issues.apache.org/jira/browse/IMPALA-10397
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Zoltán Borók-Nagy
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.0
>
>
> TestAutoScaling.test_single_workload failed in an exhaustive release build.
> *Error details*
> AssertionError: Number of backends did not reach 5 within 45 s assert 
> any( at 0x7f772c155e10>)
> *Stack trace*
> {noformat}
> custom_cluster/test_auto_scaling.py:95: in test_single_workload
>  assert any(self._get_num_backends() >= cluster_size or sleep(1)
> E AssertionError: Number of backends did not reach 5 within 45 s
> E assert any( at 0x7f772c155e10>){noformat}
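
The failing assertion is an instance of a common poll-until-true idiom. A self-contained sketch of that idiom follows; the helper name, timeout, and predicate are illustrative, not the test's actual code:

```python
from time import sleep, time


def wait_until(predicate, timeout_s, interval_s=1):
    """Poll predicate until it returns True or timeout_s seconds elapse.
    Returns True on success, False on timeout."""
    deadline = time() + timeout_s
    while time() < deadline:
        if predicate():
            return True
        sleep(interval_s)
    return False


# Example: a predicate that becomes true on the third poll.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


assert wait_until(ready, timeout_s=10, interval_s=0)
```

A failure of such an assertion means the condition (here, the expected number of backends) never became true within the timeout, which is why extending the query runtime made the autoscaler's reaction observable.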






[jira] [Commented] (IMPALA-10607) TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build

2021-03-25 Thread Wenzhe Zhou (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309054#comment-17309054 ]

Wenzhe Zhou commented on IMPALA-10607:
--

Verified that this issue does not happen when the query option
S3_SKIP_INSERT_STAGING is set to FALSE. When this query option is set to TRUE,
INSERT writes to S3 go directly to their final location rather than being
copied there by the coordinator. If CTAS finishes with an error, the Parquet
partition file is left un-finalized. To fix it, we could call
WriteFileFooter() before HdfsParquetTableWriter::AppendRows() returns with an
error, or delete the HDFS file when AppendRows() returns an error and
ShouldSkipStaging() returns TRUE.
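
The second option (deleting the partially written file when the write fails) can be sketched in Python; the function names here are illustrative, since the actual fix would live in Impala's C++ HdfsParquetTableWriter:

```python
import os
import tempfile


def write_with_cleanup(path, rows, write_row):
    """Write rows to path; if any write fails, remove the partial file so no
    un-finalized output is left behind, then re-raise the error."""
    try:
        with open(path, "w") as f:
            for row in rows:
                write_row(f, row)
    except Exception:
        if os.path.exists(path):
            os.remove(path)
        raise


def failing_writer(f, row):
    # Simulate a mid-write failure such as a decimal overflow error.
    if row == "bad":
        raise ValueError("decimal overflow")
    f.write(row + "\n")


path = os.path.join(tempfile.mkdtemp(), "part.parq")
try:
    write_with_cleanup(path, ["ok", "bad"], failing_writer)
except ValueError:
    pass
# The partially written file was cleaned up.
assert not os.path.exists(path)
```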

> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> 
>
> Key: IMPALA-10607
> URL: https://issues.apache.org/jira/browse/IMPALA-10607
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> Stack trace:
> Stack trace for S3 build. 
> [https://master-03.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/34/]
> query_test.test_decimal_queries.TestDecimalOverflowExprs.test_ctas_exprs[protocol:
>  beeswax | exec_option: \\{'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#34 )
> Took 13 sec.
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
>  has an invalid file length: 4
> Stacktrace
> query_test/test_decimal_queries.py:170: in test_ctas_exprs
> "SELECT count(*) FROM %s" % TBL_NAME_1)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:814:
>  in wrapper
> return function(*args, **kwargs)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:822:
>  in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:923:
>  in __execute_query
> return impalad_client.execute(query, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_connection.py:205:
>  in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:187:
>  in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:365:
>  in __execute_query
> self.wait_for_finished(handle)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:386:
>  in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
>  has an invalid file length: 4
> Standard Error
> SET 
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:\{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> – executing against localhost:21000
> DROP DATABASE IF EXISTS `test_ctas_exprs_7304e515` CASCADE;
> – 2021-03-24 03:56:00,840 INFO MainThread: Started query 
> 574a532f47ac7c80:c1c62ae0
> SET 
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:\{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> – executing against localhost:21000
> CREATE DATABASE `test_ctas_exprs_7304e515`;
> – 2021-03-24 03:56:03,120 INFO MainThread: Started query 
> 424b970f206e271f:ade0b524
> – 2021-03-24 03:56:03,121 INFO MainThread: Created database 
> "test_ctas_exprs_7304e515" for test ID 
> 

[jira] [Created] (IMPALA-10614) HBase scans fail with NotServingRegionException

2021-03-25 Thread Abhishek Rawat (Jira)
Abhishek Rawat created IMPALA-10614:
---

 Summary: HBase scans fail with NotServingRegionException
 Key: IMPALA-10614
 URL: https://issues.apache.org/jira/browse/IMPALA-10614
 Project: IMPALA
  Issue Type: Bug
Reporter: Abhishek Rawat


A number of HBase queries in the ASAN build are failing during scans with 
NotServingRegionException:

*Error:*
{code:java}
query_test/test_queries.py:301: in test_file_partitions
    self.run_test_case('QueryTest/hdfs-partitions', vector)
common/impala_test_suite.py:665: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:365: in __execute_query
    self.wait_for_finished(handle)
beeswax/impala_beeswax.py:386: in wait_for_finished
    raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E   Query aborted:RetriesExhaustedException: Failed after attempts=36, exceptions:
E   2021-03-25T00:47:46.305Z, java.net.SocketTimeoutException: callTimeout=6, callDuration=68568: org.apache.hadoop.hbase.NotServingRegionException: functional_hbase.alltypesagg,,1616624507550.4f6445d522027c7db04be651c7241b75. is not online on localhost,16022,1616626409823
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3389)
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3366)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1467)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3067)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3400)
E     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
E     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
E     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
E     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
E     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
E     row '' on table 'functional_hbase.alltypesagg' at region=functional_hbase.alltypesagg,,1616624507550.4f6445d522027c7db04be651c7241b75., hostname=localhost,16022,1616624163705, seqNum=2
E
E   CAUSED BY: SocketTimeoutException: callTimeout=6, callDuration=68568: org.apache.hadoop.hbase.NotServingRegionException: functional_hbase.alltypesagg,,1616624507550.4f6445d522027c7db04be651c7241b75. is not online on localhost,16022,1616626409823
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3389)
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3366)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1467)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3067)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3400)
E     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278)
E     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
E     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
E     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
E     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
E     row '' on table 'functional_hbase.alltypesagg' at region=functional_hbase.alltypesagg,,1616624507550.4f6445d522027c7db04be651c7241b75., hostname=localhost,16022,1616624163705, seqNum=2
E   CAUSED BY: NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: functional_hbase.alltypesagg,,1616624507550.4f6445d522027c7db04be651c7241b75. is not online on localhost,16022,1616626409823
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3389)
E     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3366)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1467)
E     at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3067)
E     at 

[jira] [Created] (IMPALA-10613) Expose table and partition metadata over HMS API

2021-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10613:


 Summary: Expose table and partition metadata over HMS API
 Key: IMPALA-10613
 URL: https://issues.apache.org/jira/browse/IMPALA-10613
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Catalogd caches table and partition metadata. If an external FE is to be
supported for querying through Impala, it would need to get this metadata from
catalogd to compile queries and generate plans. While a subset of the metadata
cached in catalogd is sourced from the Hive metastore, catalogd also caches
file metadata that the Impala backend needs to create the Impala plan. It
would be good to expose the table and partition metadata cached in catalogd
over the HMS API so that any Hive metastore client (e.g. Spark, Hive) can
potentially use this metadata to create a plan. This JIRA tracks the work
needed to expose this information from catalogd.







[jira] [Created] (IMPALA-10612) Catalogd changes to support external FE

2021-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10612:


 Summary: Catalogd changes to support external FE
 Key: IMPALA-10612
 URL: https://issues.apache.org/jira/browse/IMPALA-10612
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


This issue tracks the work needed to expose metadata in catalogd over the HMS
API so that any HMS-compatible client can use catalogd as a metadata cache.







[jira] [Created] (IMPALA-10611) test_wide_row fails with 'Failed to allocate row batch'

2021-03-25 Thread Abhishek Rawat (Jira)
Abhishek Rawat created IMPALA-10611:
---

 Summary: test_wide_row fails with 'Failed to allocate row batch'
 Key: IMPALA-10611
 URL: https://issues.apache.org/jira/browse/IMPALA-10611
 Project: IMPALA
  Issue Type: Bug
Reporter: Abhishek Rawat


It appears that we may not be accounting for memory properly somewhere, and
as a result we end up with allocations much larger than reservations,
resulting in an OOM error. It's also possible we are not freeing memory
properly (this could be a leak, or memory accumulating due to smart pointers,
etc.).

It is also possible that something (table data?) changed in the test case,
and as a result the memory limit is no longer sufficient.

*Error:*
{code:java}
query_test/test_scanners.py:276: in test_wide_row
    self.run_test_case('QueryTest/wide-row', new_vector)
common/impala_test_suite.py:665: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:365: in __execute_query
    self.wait_for_finished(handle)
beeswax/impala_beeswax.py:386: in wait_for_finished
    raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E   Query aborted:Memory limit exceeded: Failed to allocate row batch
E   EXCHANGE_NODE (id=1) could not allocate 16.00 MB without exceeding limit.
E   Error occurred on backend ip-172-31-14-211:27000
E   Memory left in process limit: 9.09 GB
E   Memory left in query limit: 9.95 MB
E   Query(7647dab034192a59:e4f29729): Limit=100.00 MB Reservation=40.00 MB ReservationLimit=68.00 MB OtherMemory=50.05 MB Total=90.05 MB Peak=90.05 MB
E     Unclaimed reservations: Reservation=32.00 MB OtherMemory=0 Total=32.00 MB Peak=40.00 MB
E     Fragment 7647dab034192a59:e4f297290001: Reservation=8.00 MB OtherMemory=50.04 MB Total=58.04 MB Peak=58.05 MB
E       HDFS_SCAN_NODE (id=0): Reservation=8.00 MB OtherMemory=30.01 MB Total=38.01 MB Peak=58.03 MB
E       KrpcDataStreamSender (dst_id=1): Total=9.84 KB Peak=9.84 KB
E       CodeGen: Total=1004.00 B Peak=118.00 KB
E     Fragment 7647dab034192a59:e4f29729: Reservation=0 OtherMemory=16.00 KB Total=16.00 KB Peak=16.00 KB
E       EXCHANGE_NODE (id=1): Reservation=8.00 KB OtherMemory=0 Total=8.00 KB Peak=8.00 KB
E         KrpcDeferredRpcs: Total=0 Peak=0
E       PLAN_ROOT_SINK: Total=0 Peak=0
E       CodeGen: Total=90.00 B Peak=65.00 KB{code}
*StackTrace:*
{code:java}
query_test/test_scanners.py:276: in test_wide_row
self.run_test_case('QueryTest/wide-row', new_vector)
common/impala_test_suite.py:665: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:603: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:923: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:365: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:386: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:Memory limit exceeded: Failed to allocate row batch
E   EXCHANGE_NODE (id=1) could not allocate 16.00 MB without exceeding limit.
E   Error occurred on backend ip-172-31-14-211:27000
E   Memory left in process limit: 9.09 GB
E   Memory left in query limit: 9.95 MB
E   Query(7647dab034192a59:e4f29729): Limit=100.00 MB Reservation=40.00 
MB ReservationLimit=68.00 MB OtherMemory=50.05 MB Total=90.05 MB Peak=90.05 MB
E Unclaimed reservations: Reservation=32.00 MB OtherMemory=0 Total=32.00 MB 
Peak=40.00 MB
E Fragment 7647dab034192a59:e4f297290001: Reservation=8.00 MB 
OtherMemory=50.04 MB Total=58.04 MB Peak=58.05 MB
E   HDFS_SCAN_NODE (id=0): Reservation=8.00 MB OtherMemory=30.01 MB 
Total=38.01 MB Peak=58.03 MB
E   KrpcDataStreamSender (dst_id=1): Total=9.84 KB Peak=9.84 KB
E CodeGen: Total=1004.00 B Peak=118.00 KB
E Fragment 7647dab034192a59:e4f29729: Reservation=0 
OtherMemory=16.00 KB Total=16.00 KB Peak=16.00 KB
E   EXCHANGE_NODE (id=1): Reservation=8.00 KB OtherMemory=0 Total=8.00 KB 
Peak=8.00 KB
E KrpcDeferredRpcs: Total=0 Peak=0
E   PLAN_ROOT_SINK: Total=0 Peak=0
E CodeGen: Total=90.00 B Peak=65.00 KB{code}

[jira] [Updated] (IMPALA-10326) PrincipalPrivilegeTree doesn't handle empty string and wildcards correctly

2021-03-25 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Gaal updated IMPALA-10326:
-
Summary: PrincipalPrivilegeTree doesn't handle empty string and wildcards 
correctly  (was: PrincipalPrivilegeTree doesn't handle emtry string and 
wildcards correctly)

> PrincipalPrivilegeTree doesn't handle empty string and wildcards correctly
> --
>
> Key: IMPALA-10326
> URL: https://issues.apache.org/jira/browse/IMPALA-10326
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: Csaba Ringhofer
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: sentry
>
> Two bugs slipped through the tests in IMPALA-9242:
> 1. It assumes that unfilled elements in a privilege (e.g. table in a database 
> privilege) are null, which is true in FE tests, but they are empty strings 
> when the privilege comes from SentryPrivilege.
> 2. Wildcards (e.g. * in server=server1->db=tpcds->table=*) are not handled 
> correctly in the tree: the privilege is added as a table-level privilege, while 
> it should be added as a database-wide privilege.
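Both bugs come down to normalizing the privilege's scope parts before inserting it into the tree. A hedged sketch of that normalization in Python (a hypothetical simplified model; the real fix lives in Impala's Java PrincipalPrivilegeTree, and these names are invented for illustration):

```python
# Bug 1: Sentry sends '' where FE tests send null, so both must mean "unset".
# Bug 2: a trailing wildcard widens the scope one level (table=* is really a
# database-wide privilege). Illustrative model, not Impala's actual code.
def normalize(part):
    """Treat both None and '' as unset."""
    return None if part in (None, "") else part

def effective_level(server, db, table):
    db, table = normalize(db), normalize(table)
    if table == "*":
        table = None          # table=* -> database-wide privilege
    if db == "*":
        db = table = None     # db=* -> server-wide privilege
    if table is not None:
        return "table"
    if db is not None:
        return "database"
    return "server"
```

Under this model, `server=server1->db=tpcds->table=*` lands at the database level instead of the table level, and a privilege with empty-string parts is treated the same as one with null parts.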



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10597) Enable setting 'iceberg.file_format'

2021-03-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10597:
---
Description: 
Currently we prohibit setting the following properties:
 * iceberg.catalog
 * iceberg.catalog_location
 * iceberg.file_format
 * iceberg.table_identifier

Impala needs these properties to be able to use the table correctly. However, 
if the table was created by another engine, e.g. Hive, then the table won't 
have these properties.

We need to allow setting at least 'iceberg.file_format'.

  was:
Currently we prohibit setting the following properties:
 * iceberg.catalog
 * iceberg.catalog_location
 * iceberg.file_format
 * icceberg.table_identifier

Impala needs these properties to be able to correctly use the table. However, 
if the table was created by an other engine. e.g. by Hive, then the table won't 
have these properties.

Therefore to make such tables usable, we must allow setting the above 
properties.


> Enable setting 'iceberg.file_format'
> 
>
> Key: IMPALA-10597
> URL: https://issues.apache.org/jira/browse/IMPALA-10597
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently we prohibit setting the following properties:
>  * iceberg.catalog
>  * iceberg.catalog_location
>  * iceberg.file_format
>  * iceberg.table_identifier
> Impala needs these properties to be able to use the table correctly. However, 
> if the table was created by another engine, e.g. Hive, then the table 
> won't have these properties.
> We need to allow setting at least 'iceberg.file_format'.
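The relaxed validation this issue asks for can be sketched as follows (illustrative Python, not Impala's actual frontend code; the property names come from the list above, the function name is invented):

```python
# Hypothetical sketch: keep rejecting the iceberg.* properties Impala must
# control itself, but allow 'iceberg.file_format' so tables created by other
# engines (e.g. Hive) can be fixed up with ALTER TABLE.
PROHIBITED_ICEBERG_PROPERTIES = {
    "iceberg.catalog",
    "iceberg.catalog_location",
    "iceberg.table_identifier",
}

def can_set_property(key: str) -> bool:
    """True if an ALTER TABLE SET TBLPROPERTIES on this key should be allowed."""
    return key not in PROHIBITED_ICEBERG_PROPERTIES
```

The design choice here is an explicit deny-list rather than an allow-list, so non-Iceberg table properties continue to pass through unchanged.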






[jira] [Updated] (IMPALA-10597) Enable setting 'iceberg.file_format'

2021-03-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10597:
---
Summary: Enable setting 'iceberg.file_format'  (was: Enable setting Iceberg 
table properties)

> Enable setting 'iceberg.file_format'
> 
>
> Key: IMPALA-10597
> URL: https://issues.apache.org/jira/browse/IMPALA-10597
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently we prohibit setting the following properties:
>  * iceberg.catalog
>  * iceberg.catalog_location
>  * iceberg.file_format
>  * iceberg.table_identifier
> Impala needs these properties to be able to use the table correctly. However, 
> if the table was created by another engine, e.g. Hive, then the table 
> won't have these properties.
> Therefore to make such tables usable, we must allow setting the above 
> properties.






[jira] [Created] (IMPALA-10610) Support multiple file formats in a single Iceberg Table

2021-03-25 Thread Jira
Zoltán Borók-Nagy created IMPALA-10610:
--

 Summary: Support multiple file formats in a single Iceberg Table
 Key: IMPALA-10610
 URL: https://issues.apache.org/jira/browse/IMPALA-10610
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Frontend
Reporter: Zoltán Borók-Nagy


Iceberg allows having different file formats in a single table. It stores the 
file format information for each data file.

Impala only allows a single file format per partition, and Iceberg tables are 
handled as non-partitioned HMS tables (Iceberg partitioning is more or less 
hidden from Impala). Therefore Impala currently doesn't allow different file 
formats in a single table.
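Since Iceberg records the format of each data file in its manifests, supporting mixed-format tables means choosing the scanner per file rather than per partition. A minimal hypothetical sketch (the scanner class names exist in Impala, but this dispatch function is invented for illustration):

```python
# Per-file scanner dispatch a mixed-format Iceberg table would need.
# Illustrative only; this is not Impala's scanner-creation code.
def scanner_for(file_format: str) -> str:
    scanners = {
        "PARQUET": "HdfsParquetScanner",
        "ORC": "HdfsOrcScanner",
        "AVRO": "HdfsAvroScanner",
    }
    try:
        # Iceberg manifests store the format of each data file.
        return scanners[file_format.upper()]
    except KeyError:
        raise ValueError("Unsupported Iceberg file format: " + file_format)
```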







[jira] [Commented] (IMPALA-10578) Big Query influence other query seriously when hardware not reach limit

2021-03-25 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308564#comment-17308564
 ] 

Quanlong Huang commented on IMPALA-10578:
-

Thanks for uploading the profiles. Are the machines under high load? E.g., what 
is the "load average" shown by top/htop? In the profile of the small query, the 
timeline shows
{code:java}
- Ready to start on 56 backends: 15.271ms (1.649ms)
- All 56 execution backends (171 fragment instances) started: 18s505ms 
(18s489ms){code}
I think this indicates a high load on the machines, so it takes time to launch 
the fragment instance threads.

For the Trace log, I think it's possible that acquiring the lock here takes 
more time if the OS is scheduling too many threads:
 
[https://github.com/apache/impala/blob/e3bafcbef4fd7152ecfcbc7d331e41e9778caf15/be/src/runtime/krpc-data-stream-recvr.cc#L454]

I think we need more metrics in {{KrpcDataStreamRecvr::SenderQueue::AddBatch}} 
to drill down, as commented by this TODO: 
[https://github.com/apache/impala/blob/e3bafcbef4fd7152ecfcbc7d331e41e9778caf15/be/src/runtime/krpc-data-stream-recvr.cc#L433]

Maybe we should prioritize IMPALA-10137 for this.

cc [~drorke], [~rizaon], [~joemcdonnell], [~twmarshall]
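The drill-down metric that TODO asks for is essentially "time spent waiting to acquire the queue lock", so contention shows up as a counter instead of unexplained startup latency. A minimal sketch (a Python stand-in for the C++ in krpc-data-stream-recvr.cc; the class and field names are invented):

```python
import threading
import time

# Minimal stand-in for instrumenting lock acquisition in AddBatch(): record
# how long each caller waits for the sender-queue lock. Illustrative only.
class SenderQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self.lock_wait_ns = 0  # cumulative time spent blocked on the lock
        self.batches = []

    def add_batch(self, batch):
        start = time.monotonic_ns()
        with self._lock:
            # Time elapsed before acquisition is pure queueing delay.
            self.lock_wait_ns += time.monotonic_ns() - start
            self.batches.append(batch)

q = SenderQueue()
q.add_batch("row_batch_0")
```

Exposed in the runtime profile, such a counter would distinguish "the OS is slow to schedule threads" from "senders are serialized on the lock".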

> Big Query influence other query seriously when hardware not reach limit 
> 
>
> Key: IMPALA-10578
> URL: https://issues.apache.org/jira/browse/IMPALA-10578
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
> Environment: impala-3.4
> 80 machines with 96 cpu and 256GB mem
> scratch-dir is on separate disk different from HDFS data dir
>Reporter: wesleydeng
>Priority: Major
> Attachments: big_query.txt.bz2, image-2021-03-10-19-59-24-188.png, 
> image-2021-03-16-16-32-37-862.png, small_query_be_influenced_very_slow.txt.bz2
>
>
> When a big query is running (with mt_dop=8), other queries are very difficult 
> to start. 
> A small query (select distinct of one field from a small table) may take about 
> 1 minute, while normally it takes only about 1~3 seconds.
>  From the impalad log, I found an incomprehensible log like this:
> !image-2021-03-16-16-32-37-862.png|width=836,height=189!
> !image-2021-03-10-19-59-24-188.png|width=892,height=435!
>  
> When the big query is running, data spilling happens because mem_limit 
> was set and this big query uses a lot of memory.
>  
> In the attachment, I attached the profiles of the big query and the small 
> query. The small query normally finishes in seconds. The timeline of the small 
> query is shown below:
> Query Timeline: 21m39s
>  - Query submitted: 48.846us (48.846us)
>  - Planning finished: 2.934ms (2.886ms)
>  - Submit for admission: 12.572ms (9.637ms)
>  - Completed admission: 13.622ms (1.050ms)
>  - Ready to start on 56 backends: 15.271ms (1.649ms)
>  *- All 56 execution backends (171 fragment instances) started: 18s505ms 
> (18s489ms)*
>  - Rows available: 51s770ms (33s265ms)
>  - First row fetched: 57s220ms (5s449ms)
>  - Last row fetched: 59s119ms (1s899ms)
>  - Released admission control resources: 1m1s (2s223ms)
>  - AdmissionControlTimeSinceLastUpdate: 80.000ms
>  - ComputeScanRangeAssignmentTimer: 439.749us
>  
>  
>  
>  






[jira] [Work started] (IMPALA-10609) NullPointerException in loading tables introduced by ranger masking policies

2021-03-25 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10609 started by Quanlong Huang.
---
> NullPointerException in loading tables introduced by ranger masking policies
> 
>
> Key: IMPALA-10609
> URL: https://issues.apache.org/jira/browse/IMPALA-10609
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Found some NullPointerException logs in logs/fe_tests/FeSupport.INFO when 
> running ranger unit tests, e.g. AuthorizationStmtTest#testInsert:
> {code:java}
> I0325 17:03:06.550729 11365 AuthorizationTestBase.java:494] Testing authzOk 
> for insert into functional.zipcode_incomes(id) values('123')
> E0325 17:03:06.551160 11365 StmtMetadataLoader.java:317] Failed to collect 
> policy tables for functional.zipcode_incomes
> Java exception follows:
> java.lang.NullPointerException
> at 
> org.apache.impala.authorization.ranger.RangerAuthorizationChecker.evalColumnMask(RangerAuthorizationChecker.java:503)
> at 
> org.apache.impala.authorization.ranger.RangerAuthorizationChecker.needsMaskingOrFiltering(RangerAuthorizationChecker.java:350)
> at 
> org.apache.impala.authorization.TableMask.needsMaskingOrFiltering(TableMask.java:67)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.collectPolicyTables(StmtMetadataLoader.java:351)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:315)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:156)
> at 
> org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:132)
> at 
> org.apache.impala.common.FrontendTestBase.parseAndAnalyze(FrontendTestBase.java:320)
> at 
> org.apache.impala.authorization.AuthorizationTestBase.authzOk(AuthorizationTestBase.java:495)
> at 
> org.apache.impala.authorization.AuthorizationTestBase.authzOk(AuthorizationTestBase.java:488)
> at 
> org.apache.impala.authorization.AuthorizationTestBase.access$300(AuthorizationTestBase.java:74)
> at 
> org.apache.impala.authorization.AuthorizationTestBase$AuthzTest.ok(AuthorizationTestBase.java:331)
> at 
> org.apache.impala.authorization.AuthorizationStmtTest.testInsert(AuthorizationStmtTest.java:794)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> 

[jira] [Created] (IMPALA-10609) NullPointerException in loading tables introduced by ranger masking policies

2021-03-25 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-10609:
---

 Summary: NullPointerException in loading tables introduced by 
ranger masking policies
 Key: IMPALA-10609
 URL: https://issues.apache.org/jira/browse/IMPALA-10609
 Project: IMPALA
  Issue Type: Bug
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Found some NullPointerException logs in logs/fe_tests/FeSupport.INFO when 
running ranger unit tests, e.g. AuthorizationStmtTest#testInsert:
{code:java}
I0325 17:03:06.550729 11365 AuthorizationTestBase.java:494] Testing authzOk for 
insert into functional.zipcode_incomes(id) values('123')
E0325 17:03:06.551160 11365 StmtMetadataLoader.java:317] Failed to collect 
policy tables for functional.zipcode_incomes
Java exception follows:
java.lang.NullPointerException
at 
org.apache.impala.authorization.ranger.RangerAuthorizationChecker.evalColumnMask(RangerAuthorizationChecker.java:503)
at 
org.apache.impala.authorization.ranger.RangerAuthorizationChecker.needsMaskingOrFiltering(RangerAuthorizationChecker.java:350)
at 
org.apache.impala.authorization.TableMask.needsMaskingOrFiltering(TableMask.java:67)
at 
org.apache.impala.analysis.StmtMetadataLoader.collectPolicyTables(StmtMetadataLoader.java:351)
at 
org.apache.impala.analysis.StmtMetadataLoader.getMissingTables(StmtMetadataLoader.java:315)
at 
org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:156)
at 
org.apache.impala.analysis.StmtMetadataLoader.loadTables(StmtMetadataLoader.java:132)
at 
org.apache.impala.common.FrontendTestBase.parseAndAnalyze(FrontendTestBase.java:320)
at 
org.apache.impala.authorization.AuthorizationTestBase.authzOk(AuthorizationTestBase.java:495)
at 
org.apache.impala.authorization.AuthorizationTestBase.authzOk(AuthorizationTestBase.java:488)
at 
org.apache.impala.authorization.AuthorizationTestBase.access$300(AuthorizationTestBase.java:74)
at 
org.apache.impala.authorization.AuthorizationTestBase$AuthzTest.ok(AuthorizationTestBase.java:331)
at 
org.apache.impala.authorization.AuthorizationStmtTest.testInsert(AuthorizationStmtTest.java:794)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 

[jira] [Comment Edited] (IMPALA-10607) TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build

2021-03-25 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308405#comment-17308405
 ] 

Wenzhe Zhou edited comment on IMPALA-10607 at 3/25/21, 6:06 AM:


When we tried to read the table after the CTAS failed, we got the following error: "Query 
aborted:Parquet file 
s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
 has an invalid file length: 4". The query ended up with a corrupt table on S3 
when the CTAS finished with an error. It seems the Parquet file is not finalized 
on S3 when the query is aborted.

That sounds like a bug. It's low priority since the table isn't expected to 
have meaningful contents anyway.

Before the patch for IMPALA-10564 was merged, a CTAS that selects from another 
source table (for example, create table t11 as select id, cast(a*b*c as decimal 
(28,10)) from t10) fails when there is a decimal overflow. We verified that we get 
the same error on S3 when trying to access the table after the CTAS failed, so this 
is NOT a new issue.

When HdfsParquetTableWriter::AppendRows() returns an error, 
HdfsTableSink::WriteRowsToPartition() returns the error without calling 
HdfsTableSink::FinalizePartitionFile(), so 
HdfsParquetTableWriter::Finalize() is never called. This can cause data-file 
corruption on S3. The issue is tricky to fix: if 
HdfsParquetTableWriter::Finalize() were called, NULL would be written to the table, 
but we don't expect to insert NULL into the table when the query is aborted.
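The error path described above can be sketched as follows. This is a minimal, hypothetical C++ mock of the sink/writer interaction, not the real Impala code: the class and method names mirror the real ones, but the bodies (and the file-length values) are simplified stand-ins to show why an early return on an append error leaves the Parquet file without its footer.

```cpp
#include <cassert>

// Hypothetical stand-in for HdfsParquetTableWriter.
struct MockParquetWriter {
  long file_len = 0;     // bytes written so far
  bool finalized = false;

  // Simulates AppendRows() hitting a decimal-overflow error after only
  // a few bytes (e.g. the Parquet magic) have been written.
  bool AppendRows() {
    file_len = 4;        // partial write: file is not a valid Parquet file
    return false;        // error status propagated to the sink
  }

  // Simulates Finalize(): writes the footer, making the file valid.
  void Finalize() {
    finalized = true;
    file_len = 1024;     // placeholder for a complete, valid file
  }
};

// Mirrors the control flow of HdfsTableSink::WriteRowsToPartition():
// on an append error it returns early, so FinalizePartitionFile() /
// Finalize() never run and the partial file is left behind.
bool WriteRowsToPartition(MockParquetWriter& writer) {
  if (!writer.AppendRows()) return false;  // early return skips finalization
  writer.Finalize();
  return true;
}
```

Under these assumptions, a failed append leaves `finalized == false` and the file at its partial length, matching the "invalid file length: 4" symptom seen on S3, where the incomplete object is still visible to readers.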




[jira] [Commented] (IMPALA-10607) TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build

2021-03-25 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308405#comment-17308405
 ] 

Wenzhe Zhou commented on IMPALA-10607:
--

When we tried to read the table after the CTAS failed, we got the following error: "Query 
aborted:Parquet file 
s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
 has an invalid file length: 4". The query ended up with a corrupt table on S3 
when the CTAS finished with an error. It seems the Parquet file is not finalized 
on S3 when the query is aborted.

That sounds like a bug. It's low priority since the table isn't expected to 
have meaningful contents anyway.

Before the patch for IMPALA-10564 was merged, a CTAS that selects from another 
source table (for example, create table t11 as select id, cast(a*b*c as decimal 
(28,10)) from t10) fails when there is a decimal overflow. We verified that we get 
the same error on S3 when trying to access the table after the CTAS failed, so this 
is NOT a new issue.

When HdfsParquetTableWriter::AppendRows() returns an error, 
HdfsTableSink::WriteRowsToPartition() returns the error without calling 
HdfsTableSink::FinalizePartitionFile(), so 
HdfsParquetTableWriter::Finalize() is never called. This can cause data-file 
corruption. The issue is tricky to fix: if HdfsParquetTableWriter::Finalize() 
were called, NULL would be written to the table, but we don't expect anything 
to be inserted into the table.

> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> 
>
> Key: IMPALA-10607
> URL: https://issues.apache.org/jira/browse/IMPALA-10607
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> TestDecimalOverflowExprs::test_ctas_exprs failed in S3 build
> Stack trace:
> Stack trace for S3 build. 
> [https://master-03.jenkins.cloudera.com/job/impala-cdpd-master-staging-core-s3/34/]
> query_test.test_decimal_queries.TestDecimalOverflowExprs.test_ctas_exprs[protocol:
>  beeswax | exec_option: \\{'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> Failing for the past 1 build (Since Failed#34 )
> Took 13 sec.
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
>  has an invalid file length: 4
> Stacktrace
> query_test/test_decimal_queries.py:170: in test_ctas_exprs
> "SELECT count(*) FROM %s" % TBL_NAME_1)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:814:
>  in wrapper
> return function(*args, **kwargs)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:822:
>  in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_test_suite.py:923:
>  in __execute_query
> return impalad_client.execute(query, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/common/impala_connection.py:205:
>  in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:187:
>  in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:365:
>  in __execute_query
> self.wait_for_finished(handle)
> /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/tests/beeswax/impala_beeswax.py:386:
>  in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Parquet file 
> s3a://impala-test-uswest2-1/test-warehouse/test_ctas_exprs_7304e515.db/overflowed_decimal_tbl_1/b74f0ce129189cf1-4c3c5bd6_1609291350_data.0.parq
>  has an invalid file length: 4
> Standard Error
> SET 
> client_identifier=query_test/test_decimal_queries.py::TestDecimalOverflowExprs::()::test_ctas_exprs[protocol:beeswax|exec_option:\{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0};
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_ctas_exprs_7304e515`