[jira] [Resolved] (IMPALA-9074) Add support for zstd in ORC

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-9074.

Target Version: Impala 4.0.0
Resolution: Fixed

> Add support for zstd in ORC
> ---
>
> Key: IMPALA-9074
> URL: https://issues.apache.org/jira/browse/IMPALA-9074
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: id_name_zstd.orc
>
>
> The ORC library already supports reading and writing zstd-compressed ORC files. 
> However, a quick attempt to read one in Impala failed:
> {code:sql}
> hive> create table orc_zstd (id int, name string) stored as orc;
> $ hdfs dfs -put id_name_zstd.orc 
> hdfs://localhost:20500/test-warehouse/orc_zstd
> impala-shell> invalidate metadata orc_zstd;
> impala-shell> select * from orc_zstd;
> ERROR: Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_zstd/id_name_zstd.orc: Unknown 
> compression codec 5
> {code}
> The ORC file is generated by the csv-import tool: 
> https://github.com/apache/orc/blob/rel/release-1.6.0/tools/src/CSVFileImport.cc
> (Manually changing the compression from ZLIB to ZSTD in it)
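
The number in the error message is the ORC `CompressionKind` value stored in the file tail. The sketch below assumes the enum values from the ORC protobuf definition (NONE=0 through ZSTD=5). The `SUPPORTED` set and `describe_codec` are hypothetical names, used only to illustrate how a reader built without zstd could end up reporting "Unknown compression codec 5".

```python
from enum import IntEnum


class CompressionKind(IntEnum):
    # Values as assumed from the ORC protobuf definition.
    NONE = 0
    ZLIB = 1
    SNAPPY = 2
    LZO = 3
    LZ4 = 4
    ZSTD = 5


# Hypothetical: codecs a reader handled before zstd support was added.
SUPPORTED = {CompressionKind.NONE, CompressionKind.ZLIB,
             CompressionKind.SNAPPY, CompressionKind.LZ4}


def describe_codec(kind_id, supported=SUPPORTED):
    """Return the codec name, or an error string if the reader cannot handle it."""
    try:
        kind = CompressionKind(kind_id)
    except ValueError:
        return f"Unknown compression codec {kind_id}"
    if kind not in supported:
        return f"Unknown compression codec {kind_id} ({kind.name} not enabled)"
    return kind.name
```

With zstd added to the supported set, `describe_codec(5, SUPPORTED | {CompressionKind.ZSTD})` returns the codec name instead of an error string.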



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10808) Crash of illegal decimal schema in test_fuzz_decimal_tbl

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-10808.
-
Resolution: Fixed

> Crash of illegal decimal schema in test_fuzz_decimal_tbl
> 
>
> Key: IMPALA-10808
> URL: https://issues.apache.org/jira/browse/IMPALA-10808
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.1.0
>
>
> Recently, two unrelated jobs failed with the same crash:
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14369]
>  * [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14381]
> For example in the second job, the test that crashes impalad is {code}
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[protocol:beeswax|exec_option:{'debug_action':'-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5';'abort_on_error':False;'mem_limit':'512m';'num_nodes':0}|table_format:parquet/none
> {code}
> The failure is
> {code:java}
> I0720 03:34:53.168516 126039 runtime-state.cc:196] 
> 8a42e69ff49106c8:d2096a71] Error from query 
> 8a42e69ff49106c8:d2096a70: File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_4a8e12be.db/decimal_tbl/d6=1/copy1_6b48619353a75ffb-66460f74_973668612_data.0.parq'
>  column 'd1' does not have the decimal precision set.
> F0720 03:34:53.168567 126039 types.h:282] 8a42e69ff49106c8:d2096a71] 
> Check failed: precision > 0 (0 vs. 0)
> {code}
> CC [~boroknagyz] who owns the first job.
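
The log above shows the missing precision being reported as a query error and then tripping a fatal `precision > 0` check. A minimal sketch of the general idea, validating the decimal schema before a type is built so that corrupt metadata produces an error instead of an abort. `validate_decimal` is a hypothetical helper, and the upper bound of 38 is an assumption about Impala's maximum DECIMAL precision.

```python
MAX_PRECISION = 38  # assumption: maximum DECIMAL precision


def validate_decimal(precision, scale):
    """Return an error string for an illegal decimal schema, or None if usable."""
    if precision is None or precision <= 0:
        return "column does not have the decimal precision set"
    if precision > MAX_PRECISION:
        return f"precision {precision} exceeds {MAX_PRECISION}"
    if scale is None or scale < 0 or scale > precision:
        return f"invalid scale {scale} for precision {precision}"
    return None
```

A caller would skip or fail the file when the function returns a message, and only construct the decimal type when it returns `None`.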






[jira] [Updated] (IMPALA-10381) Fix overloading of --ldap_passwords_in_clear_ok

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10381:

Fix Version/s: Impala 4.0.0

> Fix overloading of --ldap_passwords_in_clear_ok
> ---
>
> Key: IMPALA-10381
> URL: https://issues.apache.org/jira/browse/IMPALA-10381
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.0.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> The --ldap_passwords_in_clear_ok flag was originally intended to allow 
> configurations where Impala connects to LDAP without SSL, for testing 
> purposes.
> Since then, two other uses of the flag have been added: 1) for controlling 
> whether cookies include the 'Secure' attribute and 2) for controlling whether 
> the webserver allows LDAP auth to be enabled if SSL isn't.
> Some use cases may prefer to control these values separately - for example, 
> in a Kubernetes environment there may be SSL termination that happens at the 
> ingress such that SSL isn't enabled on the webserver but it's still safe to 
> have LDAP auth enabled, in which case the 'Secure' attribute is still desired 
> for cookies.
> We should separate this out into 3 different flags. Because the flag was 
> marked 'for testing only', I don't think this needs to be considered a 
> breaking change.
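
A rough sketch of the separation described above, with one switch per concern. All three field names are hypothetical stand-ins, not the flag names chosen in the actual fix.

```python
from dataclasses import dataclass


@dataclass
class AuthFlags:
    # Hypothetical names for the three independent switches.
    allow_ldap_without_tls: bool = False            # LDAP connection without SSL
    allow_insecure_cookies: bool = False            # omit the 'Secure' attribute
    allow_webserver_ldap_without_ssl: bool = False  # LDAP auth on a non-SSL webserver


def cookie_is_secure(flags):
    """Cookies carry 'Secure' unless explicitly relaxed."""
    return not flags.allow_insecure_cookies


def webserver_ldap_allowed(flags, webserver_ssl_enabled):
    """LDAP auth on the webserver needs SSL unless explicitly relaxed."""
    return webserver_ssl_enabled or flags.allow_webserver_ldap_without_ssl
```

For the Kubernetes case with SSL terminated at the ingress, `AuthFlags(allow_webserver_ldap_without_ssl=True)` permits LDAP auth on the webserver while cookies keep the 'Secure' attribute.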






[jira] [Updated] (IMPALA-10447) Missing \n every 1024 or 2048 lines when exporting output from shell to a file

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10447:

Fix Version/s: Impala 4.0.0

> Missing \n every 1024 or 2048 lines when exporting output from shell to a file
> --
>
> Key: IMPALA-10447
> URL: https://issues.apache.org/jira/browse/IMPALA-10447
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 4.0.0
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> When Impala shell exports output to a file, for example 
> {code:java}
> impala-shell -B  -V  -q "select * from tpcds.item" -o filex.csv 
> --output_delimiter=';'{code}
> then every 1024 (or maybe 2048) rows a newline is missing.
> I think the problem is here: 
> [https://github.com/apache/impala/blob/9bb7157bf014282c95ab3e233b80d77e00c95b52/shell/shell_output.py#L119]
>  where we now use write instead of print to output the rows.
>  It may be sufficient to add a
> {code:java}
> out_file.write('\n')
> {code}
> here
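
A small reproduction of the symptom, assuming rows arrive in fetch batches and each batch is joined with newlines and passed to `write()`. The function names are hypothetical, and the real batch size would be 1024 or similar. Because `write()` adds no trailing newline the way `print` does, the last row of one batch runs into the first row of the next.

```python
import io


def write_batches_buggy(out_file, batches, delim=";"):
    for batch in batches:
        # Rows within a batch are separated, but nothing terminates the batch.
        out_file.write("\n".join(delim.join(row) for row in batch))


def write_batches_fixed(out_file, batches, delim=";"):
    for batch in batches:
        out_file.write("\n".join(delim.join(row) for row in batch))
        out_file.write("\n")  # terminate the last row of every batch


def render(writer, batches):
    buf = io.StringIO()
    writer(buf, batches)
    return buf.getvalue()
```

With two batches of two rows each, the buggy writer yields three lines and the fixed one yields four.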






[jira] [Assigned] (IMPALA-10472) Add a flag to control Kudu client connection negotiation timeout in backend

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10472:
---

Assignee: Alexey Serbin

> Add a flag to control Kudu client connection negotiation timeout in backend
> ---
>
> Key: IMPALA-10472
> URL: https://issues.apache.org/jira/browse/IMPALA-10472
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>  Labels: Kudu, client, rpc
> Fix For: Impala 4.0.0
>
>
> Since [KUDU-2966|http://issues.apache.org/jira/browse/KUDU-2966] is 
> addressed, it's now possible to control the RPC connection negotiation 
> timeout from the Kudu client's side (C++ client).  To use the newly 
> introduced functionality in Impala's back-end, it's necessary to provide a 
> control knob for that.
> This should help to address cases where busy/overloaded cluster nodes hosting 
> Kudu tablet servers aren't fast enough to negotiate new connections within 
> the default timeout interval (in most cases that's about Kudu server's 
> connection negotiation threads being scheduled as needed and getting enough 
> CPU time).  In practice, it's necessary to customize the corresponding 
> setting on the Kudu server side: it's controlled by the 
> {{\-\-rpc_negotiation_timeout_ms}} flag for {{kudu-master}} and 
> {{kudu-tserver}}.
> The idea is to add a gflag named like 
> {{\-\-kudu_client_connection_negotiation_timeout_ms}} with default value of 
> 3000.  The default value 3000 is to keep the new code backwards-compatible.
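
Impala defines its flags with gflags in C++; the Python sketch below is only a stand-in to show the proposed behaviour. The flag name and the default of 3000 come from the description above, while `negotiation_timeout_ms` and the positivity check are assumptions.

```python
import argparse

DEFAULT_TIMEOUT_MS = 3000  # default proposed above, keeps existing behaviour


def negotiation_timeout_ms(argv):
    """Parse the proposed flag from argv and return the timeout in milliseconds."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--kudu_client_connection_negotiation_timeout_ms",
                        type=int, default=DEFAULT_TIMEOUT_MS)
    flags, _unknown = parser.parse_known_args(argv)
    timeout = flags.kudu_client_connection_negotiation_timeout_ms
    if timeout <= 0:
        # Assumption: a non-positive timeout is treated as a configuration error.
        raise ValueError("negotiation timeout must be positive")
    return timeout
```

An operator on a busy cluster would pass a larger value, for example `--kudu_client_connection_negotiation_timeout_ms=10000`.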






[jira] [Updated] (IMPALA-10471) Make deadline configurable for SIGTERMIN graceful shutdown

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10471:

Fix Version/s: Impala 4.0.0

> Make deadline configurable for SIGTERMIN graceful shutdown 
> ---
>
> Key: IMPALA-10471
> URL: https://issues.apache.org/jira/browse/IMPALA-10471
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Tamas Mate
>Assignee: Amogh Margoor
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0.0
>
>
> While the graceful shutdown deadline can be configured when the SHUTDOWN() 
> statement is executed, a shutdown initiated with the SIGRTMIN signal 
> uses a default one-year deadline. The related 
> {{ImpalaShutdownSignalHandler}} can be found 
> [here|https://github.com/apache/impala/blame/a81c6a78294d1da72b57ed90ec4e365de8c4e54b/be/src/common/init.cc#L179].
> {code:java}
> ...
> const int ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60;
> Status status = impala_server->StartShutdown(ONE_YEAR_IN_SECONDS, 
> _status);
> ...
> {code}
> The {{--shutdown_deadline_s}} flag should be respected in this case as well.
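
A sketch of the requested behaviour for signal-initiated shutdown, written in Python for brevity. The constant mirrors the hard-coded value quoted above. The function name is hypothetical, and falling back to one year when the flag is unset or negative is an assumption.

```python
ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def signal_shutdown_deadline_s(shutdown_deadline_s=None):
    """Deadline to use when shutdown is triggered by a signal.

    Prefers the configured --shutdown_deadline_s value; the one-year
    fallback for an unset or negative value is an assumption.
    """
    if shutdown_deadline_s is None or shutdown_deadline_s < 0:
        return ONE_YEAR_IN_SECONDS
    return shutdown_deadline_s
```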






[jira] [Updated] (IMPALA-10472) Add a flag to control Kudu client connection negotiation timeout in backend

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10472:

Fix Version/s: Impala 4.0.0

> Add a flag to control Kudu client connection negotiation timeout in backend
> ---
>
> Key: IMPALA-10472
> URL: https://issues.apache.org/jira/browse/IMPALA-10472
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: Kudu, client, rpc
> Fix For: Impala 4.0.0
>
>
> Since [KUDU-2966|http://issues.apache.org/jira/browse/KUDU-2966] is 
> addressed, it's now possible to control the RPC connection negotiation 
> timeout from the Kudu client's side (C++ client).  To use the newly 
> introduced functionality in Impala's back-end, it's necessary to provide a 
> control knob for that.
> This should help to address cases where busy/overloaded cluster nodes hosting 
> Kudu tablet servers aren't fast enough to negotiate new connections within 
> the default timeout interval (in most cases that's about Kudu server's 
> connection negotiation threads being scheduled as needed and getting enough 
> CPU time).  In practice, it's necessary to customize the corresponding 
> setting on the Kudu server side: it's controlled by the 
> {{\-\-rpc_negotiation_timeout_ms}} flag for {{kudu-master}} and 
> {{kudu-tserver}}.
> The idea is to add a gflag named like 
> {{\-\-kudu_client_connection_negotiation_timeout_ms}} with default value of 
> 3000.  The default value 3000 is to keep the new code backwards-compatible.






[jira] [Updated] (IMPALA-10499) test_misc failing

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10499:

Fix Version/s: Impala 4.0.0

> test_misc failing
> -
>
> Key: IMPALA-10499
> URL: https://issues.apache.org/jira/browse/IMPALA-10499
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Bikramjeet Vig
>Assignee: Tamas Mate
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.0.0
>
>
> IMPALA-10379 added this test recently.
> {noformat}
> query_test/test_queries.py:187: in test_misc
> self.run_test_case('QueryTest/misc', vector)
> common/impala_test_suite.py:691: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:527: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:409: in verify_raw_results
> verify_results(expected_types, actual_types, order_matters=True)
> common/test_result_verifier.py:305: in verify_results
> assert expected_results == actual_results
> E   assert ['INT'] == ['TINYINT']
> E At index 0 diff: 'INT' != 'TINYINT'
> E Use -v to get the full diff
> {noformat}






[jira] [Updated] (IMPALA-10505) Avoid creating misleading audit logs when a requesting user does not have privileges on the underlying tables of a view

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10505:

Fix Version/s: Impala 4.0.0

> Avoid creating misleading audit logs when a requesting user does not have 
> privileges on the underlying tables of a view
> ---
>
> Key: IMPALA-10505
> URL: https://issues.apache.org/jira/browse/IMPALA-10505
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> We found that misleading audit logs could be generated in Impala if a 
> requesting user who has been granted privileges on a view does not have privileges 
> on the table(s) on which the view is based. The issue can be reproduced 
> as follows.
>  # Start an authorization-enabled Impala cluster.
>  # As the user {{admin}}, execute "{{CREATE VIEW 
> default.v_functional_alltypestiny AS SELECT id, bool_col FROM 
> functional.alltypestiny;}}".
>  # As the user {{admin}}, execute "{{GRANT SELECT ON TABLE 
> default.v_functional_alltypestiny TO USER non_owner;}}".
>  # As the user {{admin}}, execute "{{REFRESH AUTHORIZATION;}}".
>  # Add a break point at 
> [RangerBufferAuditHandler#flush()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/ranger/RangerBufferAuditHandler.java#L122]
>  to observe the {{AuthzAuditEvent}}s added to '{{auditEvents_}}' after the 
> following statement.
>  # As the user {{non_owner}}, execute "{{SELECT COUNT(\*) FROM 
> default.v_functional_alltypestiny;}}"
> We will find that only 1 {{AuthzAuditEvent}} was logged. Specifically, the 
> field of '{{resourcePath}}' is "{{functional/alltypestiny}}" and the field of 
> '{{accessResult}}' is 0, indicating this is a failed authorization for the 
> underlying table of the view. But actually the user '{{non_owner}}' is and 
> should be allowed to execute the statement since it was granted the privilege 
> on the view.
> Therefore, we should remove such a confusing log entry and also retain the 
> audit log entry corresponding to the privilege check for the view, i.e., 
> {{default.v_functional_alltypestiny}}.
> I have the following findings after an initial investigation.
> Under the hood, Impala performed 2 privilege checks: one for the view and the 
> other for the table on which the view is based. Since the user has been 
> granted the {{SELECT}} privilege on the view, the first privilege check would 
> succeed, whereas the second privilege check would fail since the user does 
> not have the {{SELECT}} privilege on the underlying table.
> Each privilege check resulted in one audit log entry generated by the Ranger 
> server. Thus the first audit log entry would be a successful audit event 
> because it corresponds to the privilege check for the view. However, the 
> second privilege check resulted in a failed audit event since it corresponds 
> to the privilege check for the underlying table and the requesting user does 
> not have the {{SELECT}} privilege on the table. Impala performed the 2nd 
> check for a reason. In short, the requesting user is not allowed to access 
> the runtime profile if the user does not have the privileges on the 
> underlying table(s). Refer to 
> [BaseAuthorizationChecker#authorize()|https://github.com/apache/impala/blob/aeeff53e884a67ee7f5980654a1d394c6e3e34ac/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L175-L190]
>  for further details.
> On the other hand, for a list of audit events resulting from a query, if 
> there exists a failed audit event, only the first failed audit event would be 
> kept by Impala and then sent to Ranger. That is the reason why in the end we 
> only saw that failed audit event.
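
The filtering described above can be modelled in a few lines. `events_to_send` follows the description (if any check failed, only the first failed event is kept). `events_to_send_for_view` is a hypothetical alternative, not the actual fix, and the view's resource path format is assumed by analogy with "functional/alltypestiny".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    resource_path: str
    access_result: int  # 0 = denied (as in the description); 1 = allowed (assumed)


def events_to_send(events):
    """Described behaviour: if any check failed, keep only the first failure."""
    for ev in events:
        if ev.access_result == 0:
            return [ev]
    return list(events)


def events_to_send_for_view(events, view_path):
    """Hypothetical alternative: if the view check succeeded, report only it."""
    view_events = [ev for ev in events if ev.resource_path == view_path]
    if view_events and all(ev.access_result == 1 for ev in view_events):
        return view_events
    return events_to_send(events)
```

For the reproduction steps, the first function returns only the denied event for the underlying table, while the second returns the allowed event for the view.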






[jira] [Updated] (IMPALA-10550) Add External Frontend service port

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10550:

Fix Version/s: Impala 4.0.0

> Add External Frontend service port
> --
>
> Key: IMPALA-10550
> URL: https://issues.apache.org/jira/browse/IMPALA-10550
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> As part of external frontend support - we want to expose the additional 
> thrift calls on a separate port for security purposes. Users may want to 
> expose the normal hs2 service(s) to users while exposing the external 
> frontend thrift calls to only an external frontend.






[jira] [Updated] (IMPALA-10593) Skip runtime filter for outer joins when Expr not constant after null substitution

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10593:

Fix Version/s: Impala 4.0.0

> Skip runtime filter for outer joins when Expr not constant after null 
> substitution
> --
>
> Key: IMPALA-10593
> URL: https://issues.apache.org/jira/browse/IMPALA-10593
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
> Fix For: Impala 4.0.0
>
>
> Currently there is code that asserts that an Expr is not constant after 
> substituting SlotRefs with constant nulls.
> A third party tool needs this restriction to be weakened.  In a case where an 
> Expr is checked and the Expr is not constant even after substituting nulls, 
> the result will be to not generate a runtime filter for that Expr.






[jira] [Assigned] (IMPALA-10593) Skip runtime filter for outer joins when Expr not constant after null substitution

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10593:
---

Assignee: Steve Carlin

> Skip runtime filter for outer joins when Expr not constant after null 
> substitution
> --
>
> Key: IMPALA-10593
> URL: https://issues.apache.org/jira/browse/IMPALA-10593
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>
> Currently there is code that asserts that an Expr is not constant after 
> substituting SlotRefs with constant nulls.
> A third party tool needs this restriction to be weakened.  In a case where an 
> Expr is checked and the Expr is not constant even after substituting nulls, 
> the result will be to not generate a runtime filter for that Expr.






[jira] [Updated] (IMPALA-10640) Support reading Parquet Bloom filters - most common types

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10640:

Fix Version/s: Impala 4.1.0

> Support reading Parquet Bloom filters - most common types
> -
>
> Key: IMPALA-10640
> URL: https://issues.apache.org/jira/browse/IMPALA-10640
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: parquet
> Fix For: Impala 4.1.0
>
>
> Support reading Parquet Bloom filters for the most common types: integers, 
> float, double and Impala strings. Supporting these types is relatively easy 
> in comparison to most other types. Support for other types may be added later.
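
For readers unfamiliar with how a reader uses such a filter, here is a deliberately simplified sketch. It is a plain Bloom filter built on SHA-256, not Parquet's split-block layout or its hash function, and every name in it is hypothetical. It only illustrates the decision the scanner makes: an equality predicate can skip a row group when the filter rules the value out.

```python
import hashlib


class SimpleBloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        # num_bits is assumed to be a multiple of 8.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        data = repr(value).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def can_skip_row_group(bloom, literal):
    """`col = literal` can skip the row group if the filter excludes literal."""
    return not bloom.might_contain(literal)
```

False positives are possible, so a `might_contain` hit still requires reading the row group; only a miss allows skipping.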






[jira] [Updated] (IMPALA-10642) Write support for Parquet Bloom filters - most common types

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10642:

Fix Version/s: Impala 4.1.0

> Write support for Parquet Bloom filters - most common types
> ---
>
> Key: IMPALA-10642
> URL: https://issues.apache.org/jira/browse/IMPALA-10642
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Support writing Parquet Bloom filters for the most common types: integers, 
> float, double and Impala strings. Support for other types may be added later.






[jira] [Updated] (IMPALA-10654) Improve the precision of DecimalValue::ToDouble

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10654:

Fix Version/s: Impala 4.0.0

> Improve the precision of DecimalValue::ToDouble
> --
>
> Key: IMPALA-10654
> URL: https://issues.apache.org/jira/browse/IMPALA-10654
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Amogh Margoor
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 4.0.0
>
>
> From discussion of IMPALA-10350, it was noted that 
> [DecimalValue::ToDouble|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/runtime/decimal-value.inline.h#L725]
>  is not accurate.
> Current approach is: 
> {code:java}
>  static_cast<double>(value_) / pow(10.0, scale).
> {code}
> The inaccuracy arises because only integers from −2^53 to 2^53 can be 
> represented exactly in double precision. Hence, the above 
> approach does not work for numbers like -0.43149576573887316. For a 
> DecimalValue representing -0.43149576573887316, value_ would be 
> -43149576573887316 and scale would be 17. As value_ < -2^53, the result would not 
> be accurate. 
> Hence, following the discussion in IMPALA-10350, we propose to use the third-party 
> library https://github.com/lemire/fast_double_parser, which handles the above 
> scenario in a performant manner. The library's internal representation of a decimal 
> is similar to Impala's DecimalValue, and the function 
> [compute_float_64|https://github.com/lemire/fast_double_parser/blob/e4f6319bfa9cbc829f7f99ae88c1d2fb205c15e8/include/fast_double_parser.h#L232]
>  can be used for the conversion.
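
The precision argument can be checked numerically. The sketch below is not fast_double_parser: it uses Python's integer true division, which rounds only once, as a correctly rounded reference, and compares it with the divide-by-power-of-ten approach using exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def to_double_naive(value, scale):
    """Convert the unscaled integer to double first, then divide (two roundings)."""
    return float(value) / pow(10.0, scale)


def to_double_reference(value, scale):
    """Python's int / int true division is correctly rounded (one rounding)."""
    return value / 10 ** scale


value, scale = -43149576573887316, 17
true_value = Fraction(value, 10 ** scale)

# Exact absolute errors of both results, computed with rational arithmetic.
naive_err = abs(Fraction(to_double_naive(value, scale)) - true_value)
reference_err = abs(Fraction(to_double_reference(value, scale)) - true_value)

# The unscaled value lies outside the range of exactly representable integers.
unscaled_is_exact = int(float(value)) == value
```

Here `unscaled_is_exact` is `False`, and `reference_err` can never exceed `naive_err`, because a correctly rounded result is by definition the closest double to the true value.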






[jira] [Updated] (IMPALA-10680) Replace StringToFloatInternal that converts String to Float using fast_double_parser library

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10680:

Fix Version/s: Impala 4.1.0

> Replace StringToFloatInternal that converts String to Float using 
> fast_double_parser library
> 
>
> Key: IMPALA-10680
> URL: https://issues.apache.org/jira/browse/IMPALA-10680
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Amogh Margoor
>Assignee: Amogh Margoor
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Based on the comment made by [~csringhofer] 
> [here|https://issues.apache.org/jira/browse/IMPALA-10350?focusedCommentId=17324270&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17324270],
>  we can use fast_double_parser (introduced by IMPALA-10654) to do String to 
> Double conversion 
> [here|https://github.com/apache/impala/blob/master/be/src/util/string-parser.h#L459].
>  This would ensure precision loss can be avoided in cases like below:
> {code:java}
> select cast("0.43149576573887316" as double);
> result: 0.4314957657388731
> {code}






[jira] [Updated] (IMPALA-10683) TestHdfsParquetTableWriter.test_double_precision broken on S3

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10683:

Fix Version/s: Impala 4.1.0
   Impala 4.0.0

> TestHdfsParquetTableWriter.test_double_precision broken on S3
> -
>
> Key: IMPALA-10683
> URL: https://issues.apache.org/jira/browse/IMPALA-10683
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.0.0
>Reporter: Csaba Ringhofer
>Assignee: Amogh Margoor
>Priority: Major
> Fix For: Impala 4.0.0, Impala 4.1.0
>
>
> The issue is that this new test uses Hive, which doesn't work with S3 in the 
> Impala test environment. Other tests use a bunch of skipIfs to avoid this:
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_parquet.py#L549
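
The skip pattern can be illustrated with the standard library's `unittest.skipIf`. Impala's tests use pytest markers, so this is only a stand-in. Reading the target filesystem from an environment variable named `TARGET_FILESYSTEM` is an assumption, and the test body is a placeholder.

```python
import io
import os
import unittest

# Assumption: the target filesystem is exposed through this environment variable.
IS_S3 = os.environ.get("TARGET_FILESYSTEM", "hdfs") == "s3"


class TestDoublePrecision(unittest.TestCase):
    @unittest.skipIf(IS_S3, "Hive is not available with S3 in this test environment")
    def test_double_precision(self):
        # Placeholder for a test that would write data through Hive.
        self.assertEqual(float("0.5"), 0.5)


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDoublePrecision)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

A skipped test is still reported as a successful run, so the suite stays green on S3 while keeping coverage on HDFS.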






[jira] [Updated] (IMPALA-10696) Minor size differences breaks metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10696:

Fix Version/s: Impala 4.1.0
   Impala 4.0.0

> Minor size differences breaks 
> metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
> 
>
> Key: IMPALA-10696
> URL: https://issues.apache.org/jira/browse/IMPALA-10696
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.0.0
> Environment: Ubuntu 16.04, jenkins.impala.io
>Reporter: Jim Apple
>Assignee: liuyao
>Priority: Blocker
> Fix For: Impala 4.0.0, Impala 4.1.0
>
>
> One test is breaking in the 4.0.0 RC2, hence I marked this as blocker. 
> [~liuyao] , I picked your name as the assignee since I thought you might be 
> knowledgeable about this part of the codebase. Here's the test output:
> {noformat}
> E   assert Items in expected results not found in actual results:
> E ' partitions: 0/24 rows=17.91K'
> E Items in actual results:
> E 'Per-Host Resource Estimates: Memory=20MB'
> E '|  output exprs: id'
> E ''
> E '   HDFS partitions=24/24 files=36 size=281.43KB'
> E ' table: rows=unavailable size=unavailable'
> E '   stored statistics:'
> E '|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0'
> E ' columns: unavailable'
> E '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]'
> E '   tuple-ids=0 row-size=4B cardinality=17.90K'
> E '|'
> E 'Max Per-Host Resource Reservation: Memory=4.01MB Threads=2'
> E 'Analyzed query: SELECT id FROM 
> test_stats_extrapolation_5c6bdfd.alltypes'
> E 'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
> E ' partitions: 0/24 rows=17.90K'
> E 'test_stats_extrapolation_5c6bdfd.alltypes'
> E 'PLAN-ROOT SINK'
> E '   in pipelines: 00(GETNEXT)'
> E '   extrapolated-rows=unavailable max-scan-range-rows=unavailable'
> E 'WARNING: The following tables are missing relevant table and/or column 
> statistics.'
> E '|  Per-Host Resources: mem-estimate=20.00MB mem-reservation=4.01MB 
> thread-reservation=2'
> E '   mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1'
> {noformat}
>  
>  [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13812/consoleText]
>  
> CC [~boroknagyz]






[jira] [Updated] (IMPALA-10721) MetastoreServiceHandler should extend AbstractThriftHiveMetastore

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10721:

Fix Version/s: Impala 4.1.0

> MetastoreServiceHandler should extend AbstractThriftHiveMetastore
> -
>
> Key: IMPALA-10721
> URL: https://issues.apache.org/jira/browse/IMPALA-10721
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
> Fix For: Impala 4.1.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MetastoreServiceHandler should extend AbstractThriftHiveMetastore,
>  which has default implementations of all the HMS APIs.
>  This avoids broken builds in Impala whenever it
>  dynamically picks up a Hive GBN that has new HMS APIs.
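
The design choice can be illustrated with a Python analogue using abstract base classes. All class and method names here are hypothetical stand-ins for the Java types. A handler that implements the interface directly breaks as soon as the interface gains a method, whereas one that extends a base class with defaults keeps working.

```python
from abc import ABC, abstractmethod


class MetastoreIface(ABC):
    @abstractmethod
    def get_table(self, db, tbl): ...

    @abstractmethod
    def newly_added_api(self): ...  # stands for an API added by a newer Hive build


class AbstractDefaults(MetastoreIface):
    """Provides a default for every API, like the proposed base class."""

    def get_table(self, db, tbl):
        raise NotImplementedError("get_table")

    def newly_added_api(self):
        raise NotImplementedError("newly_added_api")


class DirectHandler(MetastoreIface):
    """Implements the interface directly and misses the new API."""

    def get_table(self, db, tbl):
        return (db, tbl)


class DefaultingHandler(AbstractDefaults):
    """Overrides only what it supports; new APIs fall back to the defaults."""

    def get_table(self, db, tbl):
        return (db, tbl)


def can_instantiate(cls):
    try:
        cls()
        return True
    except TypeError:
        return False
```

In Java the direct implementation fails at compile time rather than at instantiation, but the trade-off is the same: unsupported calls fail at call time instead of breaking the build.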






[jira] [Updated] (IMPALA-10703) PrintPath() crashes with ARRAY in ORC format

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10703:

Fix Version/s: Impala 4.1.0

> PrintPath() crashes with ARRAY in ORC format
> 
>
> Key: IMPALA-10703
> URL: https://issues.apache.org/jira/browse/IMPALA-10703
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Gabor Kaszab
>Assignee: Amogh Margoor
>Priority: Major
>  Labels: complextype, orc
> Fix For: Impala 4.1.0
>
>
> Repro steps:
>  - The issue only happens in debug builds, as apparently a DCHECK is failing.
>  - You have to launch Impala with the --log_level=3 option to increase the log 
> level.
>  - Running this query then crashes Impala:
> {code:java}
> select inner_arr.ITEM.e from functional_orc_def.complextypestbl tbl, 
> functional_orc_def.complextypestbl.nested_struct.c.d.ITEM inner_arr;
> {code}
>  
> Backtrace (relevant part):
> {code:java}
> #7  0x0280c2b4 in 
> impala::PrintPath[abi:cxx11](impala::TableDescriptor const&, std::vector std::allocator > const&) (tbl_desc=..., path=...) at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/util/debug-util.cc:237
> #8  0x02a69eeb in impala::HdfsOrcScanner::ResolveColumns 
> (this=0x10e79000, tuple_desc=..., 
> selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:452
> #9  0x02a69cf7 in impala::HdfsOrcScanner::ResolveColumns 
> (this=0x10e79000, tuple_desc=..., 
> selected_nodes=0x7fe54980a7d0, pos_slots=0x7fe54980a780)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:449
> #10 0x02a6a547 in impala::HdfsOrcScanner::SelectColumns 
> (this=0x10e79000, tuple_desc=...)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:497
> #11 0x02a67720 in impala::HdfsOrcScanner::Open (this=0x10e79000, 
> context=0x7fe54980b260)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-orc-scanner.cc:237
> #12 0x029f19c9 in 
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0xd280800, 
> partition=0xaac3d80, 
> context=0x7fe54980b260, scanner=0x7fe54980b258)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node-base.cc:874
> #13 0x02baab86 in impala::HdfsScanNode::ProcessSplit (this=0xd280800, 
> filter_ctxs=..., 
> expr_results_pool=0x7fe54980b500, scan_range=0xac59c00, 
> scanner_thread_reservation=0x7fe54980b428)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:480
> #14 0x02baa28a in impala::HdfsScanNode::ScannerThread 
> (this=0xd280800, first_thread=true, 
> scanner_thread_reservation=8192) at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:418
> #15 0x02ba95f2 in impala::HdfsScanNodeoperator()(void) 
> const (__closure=0x7fe54980bc28)
> at 
> /home/gaborkaszab/shadow/Impala-upstream/be/src/exec/hdfs-scan-node.cc:339
> {code}
> This DCHECK fails:
>  
> [https://github.com/apache/impala/blob/a47700ed790c2415e52a85e40063bed53a7cb9e8/be/src/util/debug-util.cc#L237]
> {code:java}
> Check failed: path[i] == 1 (5 vs. 1)
> {code}
> There was a similar issue recently, but here a different DCHECK fails:
> https://issues.apache.org/jira/browse/IMPALA-9918






[jira] [Updated] (IMPALA-10724) Add mutable validWriteIdList

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10724:

Fix Version/s: Impala 4.1.0

> Add mutable validWriteIdList
> 
>
> Key: IMPALA-10724
> URL: https://issues.apache.org/jira/browse/IMPALA-10724
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Although the current implementation of validWriteIdList is not strictly 
> immutable, it is meant to provide a read-only snapshot view. This 
> change adds another class that provides functionality for manipulating 
> the writeIdList. We could use it to keep the writeIdList up to date for 
> event-based metadata refreshing.
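
The idea described above can be sketched roughly as follows. This is only an illustration in Python, not the class added by the change: the name MutableWriteIdList, its methods, and the validity rule (at or below the high watermark and neither open nor aborted) are assumptions made for the example.

```python
class MutableWriteIdList:
    """Hypothetical mutable counterpart of a read-only write id snapshot."""

    def __init__(self, table_name, high_watermark=0):
        self.table_name = table_name
        self.high_watermark = high_watermark
        self.open_ids = set()     # allocated, not yet committed or aborted
        self.aborted_ids = set()  # rolled back, never visible to readers

    def add_open_write_ids(self, write_ids):
        # Newly allocated ids raise the high watermark and start out open.
        for wid in write_ids:
            self.open_ids.add(wid)
            self.high_watermark = max(self.high_watermark, wid)

    def mark_committed(self, write_ids):
        # A committed id is no longer an exception, so it becomes visible.
        for wid in write_ids:
            self.open_ids.discard(wid)

    def mark_aborted(self, write_ids):
        for wid in write_ids:
            self.open_ids.discard(wid)
            self.aborted_ids.add(wid)

    def is_write_id_valid(self, wid):
        return (wid <= self.high_watermark
                and wid not in self.open_ids
                and wid not in self.aborted_ids)
```

An event processor could call these methods as open, commit, and abort events arrive, instead of re-fetching a fresh snapshot each time.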






[jira] [Updated] (IMPALA-10739) Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10739:

Fix Version/s: Impala 4.1.0

> Add support for ALTER TABLE tbl SET PARTITION SPEC for Iceberg tables
> -
>
> Key: IMPALA-10739
> URL: https://issues.apache.org/jira/browse/IMPALA-10739
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Impala should support partition evolution for Iceberg tables, i.e. it should 
> be able to set a new partition spec for an Iceberg table via DDL.
> The command should be
> {noformat}
> ALTER TABLE <tbl> SET PARTITION SPEC(<partition-spec>)
> {noformat}
> to be aligned with Hive.






[jira] [Updated] (IMPALA-10817) Share metastoreHmsDDL lock b/w CatalogOpExecutor and Catalog metastore server

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10817:

Fix Version/s: Impala 4.1.0

> Share metastoreHmsDDL lock b/w CatalogOpExecutor and Catalog metastore server
> -
>
> Key: IMPALA-10817
> URL: https://issues.apache.org/jira/browse/IMPALA-10817
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Currently, when doing create/drop table/db from catalogD, CatalogOpExecutor 
> (via Impala Shell) and the metastore server (via HS2) each acquire a lock on 
> their own respective lock objects to prevent concurrent create/drop operations 
> in HMS. But that does not prevent concurrent operations across 
> CatalogOpExecutor and the metastore server. For example, a user can currently 
> perform create/drop HMS operations from Impala shell and the catalog metastore 
> server concurrently, which is not the desired behavior.
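
A minimal sketch of the proposed sharing, written in Python for illustration only. The class names, the lock name metastore_ddl_lock, and the log list are hypothetical; the point is simply that both entry points are handed the same lock object rather than each creating its own.

```python
import threading

# One lock object shared by both entry points (hypothetical name).
metastore_ddl_lock = threading.Lock()


class CatalogOpExecutorSketch:
    def __init__(self, ddl_lock, log):
        self._ddl_lock = ddl_lock
        self._log = log

    def create_table(self, name):
        # Blocks while the metastore server path holds the same lock.
        with self._ddl_lock:
            self._log.append(("executor", "create", name))


class MetastoreServerSketch:
    def __init__(self, ddl_lock, log):
        self._ddl_lock = ddl_lock
        self._log = log

    def drop_table(self, name):
        # Blocks while the executor path holds the same lock.
        with self._ddl_lock:
            self._log.append(("metastore", "drop", name))
```

With separate lock objects, the two `with` blocks could run at the same time; with one shared object they are serialized.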






[jira] [Updated] (IMPALA-10801) Check the latest compaction Id before serving request

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10801:

Fix Version/s: Impala 4.1.0

> Check the latest compaction Id before serving request
> -
>
> Key: IMPALA-10801
> URL: https://issues.apache.org/jira/browse/IMPALA-10801
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Cache compaction Id for a given table/file-metadata in CatalogD.
> Whenever there is a read request to CatalogD, get the latest compaction event 
> Id from HMS, compare it with what is cached in CatalogD, and based on that 
> decide whether to serve the data from cache or to refresh it from the 
> filesystem. This can avoid notification-based cache invalidation.
> Also, since there will be an open txn for the current long-running query 
> which is being served from CatalogD, we can be sure that the 
> file-metadata currently being served has not already been deleted by the cleaner.
> This proposal will use a new HMS API 
> (https://issues.apache.org/jira/browse/HIVE-24828) to get the latest 
> compaction id for a table.
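
The check described above could look roughly like this. It is a sketch under stated assumptions: FileMetadataCache and the two callables passed to it are hypothetical stand-ins for the HMS lookup and the filesystem listing, not Impala or Hive APIs.

```python
class FileMetadataCache:
    """Sketch: serve cached file metadata unless a newer compaction exists."""

    def __init__(self, get_latest_compaction_id, load_from_filesystem):
        # Both callables are stand-ins: the first for the HMS lookup,
        # the second for listing files on the filesystem.
        self._get_latest_compaction_id = get_latest_compaction_id
        self._load_from_filesystem = load_from_filesystem
        self._entries = {}  # table name -> (compaction id, file metadata)

    def get(self, table):
        latest_id = self._get_latest_compaction_id(table)
        cached = self._entries.get(table)
        if cached is not None and cached[0] >= latest_id:
            # No compaction has happened since the entry was cached.
            return cached[1]
        # Cache miss or stale entry: reload and remember the compaction id.
        metadata = self._load_from_filesystem(table)
        self._entries[table] = (latest_id, metadata)
        return metadata
```

Every read pays for one compaction id lookup, but the expensive file listing only happens after a compaction has actually changed the files.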






[jira] [Updated] (IMPALA-10811) RPC to submit query getting stuck for AWS NLB forever.

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10811:

Fix Version/s: Impala 4.1.0

> RPC to submit query getting stuck for AWS NLB forever.
> --
>
> Key: IMPALA-10811
> URL: https://issues.apache.org/jira/browse/IMPALA-10811
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Amogh Margoor
>Assignee: Qifan Chen
>Priority: Major
> Fix For: Impala 4.1.0
>
> Attachments: profile+(13).txt
>
>
> The initial RPC to submit a query and fetch the query handle can take quite a 
> long time to return, as it can do various operations for planning and submission 
> that involve executing Catalog operations like Rename or Alter Table Recover 
> Partitions, which can take time on tables with many partitions 
> ([https://github.com/apache/impala/blob/1231208da7104c832c13f272d1e5b8f554d29337/be/src/exec/catalog-op-executor.cc#L92]).
>  Attached is the profile of one such DDL query (with a few fields hidden).
> These RPCs are: 
> 1. Beeswax:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-beeswax-server.cc#L57]
> 2. HS2:
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-hs2-server.cc#L462]
>  
> One side effect of such an RPC taking a long time is that clients such as 
> impala-shell using AWS NLB can get stuck forever. The reason is that NLB tracks 
> connections and closes them after 350s of idle time, and this timeout cannot be 
> configured. But after closing the connection it doesn't send a TCP RST to the 
> client. Only when the client tries to send data or packets does NLB send back a 
> TCP RST to indicate that the connection is no longer alive. Documentation is here: 
> [https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout].
>  Hence the impala-shell waiting for the RPC to return gets stuck indefinitely.
> Hence, we may need to evaluate techniques for RPCs to return the query handle 
> after
>  # Creating Driver: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1150]
>  # Register Query: 
> [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/service/impala-server.cc#L1168]
>  and execute the later parts of the RPC asynchronously in a different thread 
> without blocking the RPC. That way clients can get the query handle and poll it 
> for state and results.
>  
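
The submit-then-poll pattern proposed above might be sketched as follows. This is an illustration in Python, not Impala's RPC code: QueryServerSketch, the state names PENDING, FINISHED, and ERROR, and the use of a plain thread per query are all assumptions made for the example.

```python
import threading
import uuid


class QueryServerSketch:
    def __init__(self):
        self._lock = threading.Lock()
        self._queries = {}  # handle -> {"state": str, "result": object}

    def submit(self, work):
        """Register the query and return a handle without waiting for `work`."""
        handle = uuid.uuid4().hex
        with self._lock:
            self._queries[handle] = {"state": "PENDING", "result": None}
        # The slow part (planning, catalog operations) runs off the RPC thread.
        threading.Thread(target=self._run, args=(handle, work), daemon=True).start()
        return handle

    def _run(self, handle, work):
        try:
            result, state = work(), "FINISHED"
        except Exception as exc:
            result, state = exc, "ERROR"
        with self._lock:
            self._queries[handle] = {"state": state, "result": result}

    def poll(self, handle):
        """Cheap call the client repeats until the state leaves PENDING."""
        with self._lock:
            entry = self._queries[handle]
            return entry["state"], entry["result"]
```

Because each poll is a short request that sends data on the connection, a client behind a load balancer with an idle timeout would learn about a dropped connection on its next poll instead of waiting on a single long-running call.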






[jira] [Assigned] (IMPALA-10817) Share metastoreHmsDDL lock b/w CatalogOpExecutor and Catalog metastore server

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10817:
---

Assignee: Sourabh Goyal

> Share metastoreHmsDDL lock b/w CatalogOpExecutor and Catalog metastore server
> -
>
> Key: IMPALA-10817
> URL: https://issues.apache.org/jira/browse/IMPALA-10817
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>
> Currently, when doing create/drop table/db from catalogD, CatalogOpExecutor 
> (via Impala Shell) and the metastore server (via HS2) each acquire a lock on 
> their own respective lock objects to prevent concurrent create/drop operations 
> in HMS. But that does not prevent concurrent operations across 
> CatalogOpExecutor and the metastore server. For example, a user can currently 
> perform create/drop HMS operations from Impala shell and the catalog metastore 
> server concurrently, which is not the desired behavior.






[jira] [Updated] (IMPALA-10821) TestTPCHJoinQueries.test_outer_joins failed in s3 build

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10821:

Fix Version/s: Impala 4.1.0

> TestTPCHJoinQueries.test_outer_joins failed in s3 build
> ---
>
> Key: IMPALA-10821
> URL: https://issues.apache.org/jira/browse/IMPALA-10821
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Wenzhe Zhou
>Assignee: Yida Wu
>Priority: Major
>  Labels: broken-build
> Fix For: Impala 4.1.0
>
>
> The unit test TestTPCHJoinQueries.test_outer_joins failed in the following build:
> [https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/63/]
>  
> The failed test case was added recently by patch: 
> [https://gerrit.cloudera.org/#/c/17610/]
> Stacktrace
> query_test/test_join_queries.py:155: in test_outer_joins
>  self.run_test_case('tpch-outer-joins', new_vector)
> common/impala_test_suite.py:709: in run_test_case
>  self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:545: in __verify_results_and_errors
>  replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
>  VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:246: in verify_query_result_is_subset
>  assert expected_literal_strings <= actual_literal_strings
> E assert Items in expected results not found in actual results:
> E '| 00:SCAN HDFS [default.t1 b]'
> E '01:SCAN HDFS [default.t2 a]'
> E Items in actual results:
> E '05:EXCHANGE [UNPARTITIONED]'
> E '|--04:EXCHANGE [HASH(b.`INSERT`,b.`insert`)]'
> E '| |'
> E ' runtime filters: RF000 -> a.`SELECT`, RF001 -> a.`select`'
> E 'PLAN-ROOT SINK'
> E ''
> E '02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]'
> E 'default.t1, default.t2'
> E '| row-size=8B cardinality=37'
> E '03:EXCHANGE [HASH(a.`SELECT`,a.`select`)]'
> E '|'
> E '| row-size=16B cardinality=37'
> E ' row-size=8B cardinality=78.25K'
> E '| hash predicates: a.`SELECT` = b.`INSERT`, a.`select` = b.`insert`'
> E 'WARNING: The following tables are missing relevant table and/or column 
> statistics.'
> E '| runtime filters: RF000 <- b.`INSERT`, RF001 <- b.`insert`'
> E '| S3 partitions=1/1 files=1 size=292B'
> E '| 00:SCAN S3 [default.t1 b]'
> E 'Per-Host Resource Estimates: Memory=75MB'
> E '01:SCAN S3 [default.t2 a]'
> E ' S3 partitions=1/1 files=1 size=611.34KB'
> E 'Max Per-Host Resource Reservation: Memory=10.95MB Threads=6'
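
The assertion in the trace is a subset check between expected and actual plan lines. The sketch below reproduces that kind of check on a few lines copied from the output above; the helper name missing_from_actual is made up for the example and is not part of the Impala test framework.

```python
def missing_from_actual(expected, actual):
    """Mimic a subset check: which expected lines are absent from actual?"""
    return sorted(set(expected) - set(actual))


expected = ["| 00:SCAN HDFS [default.t1 b]", "01:SCAN HDFS [default.t2 a]"]
actual = [
    "| 00:SCAN S3 [default.t1 b]",
    "01:SCAN S3 [default.t2 a]",
    "PLAN-ROOT SINK",
]

# Both expected lines are reported missing because the scan nodes are
# labelled with the filesystem they read from (HDFS vs. S3).
missing = missing_from_actual(expected, actual)
```

This suggests the expected plan text names HDFS explicitly, so the comparison cannot pass as written on an S3 build.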






[jira] [Updated] (IMPALA-10879) Add parquet stats to iceberg manifest

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10879:

Fix Version/s: Impala 4.1.0

> Add parquet stats to iceberg manifest
> -
>
> Key: IMPALA-10879
> URL: https://issues.apache.org/jira/browse/IMPALA-10879
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Parquet stats should be written to iceberg manifest as per-datafile metrics.
> This task is specifically about the following metrics:
> - column_sizes : Map from column id to the total size on disk of all regions 
> that store the column. Does not include bytes necessary to read other 
> columns, like footers. Leave null for row-oriented formats
> - null_value_counts : Map from column id to number of null values in the 
> column.
> - lower_bounds : Map from column id to lower bound in the column serialized 
> as binary. Each value must be less than or equal to all non-null, non-NaN 
> values in the column for the file.
> - upper_bounds : Map from column id to upper bound in the column serialized 
> as binary. Each value must be greater than or equal to all non-null, non-NaN 
> values in the column for the file.
> Iceberg manifest doc: 
> https://iceberg.apache.org/spec/#manifests
> lower_bounds and upper_bounds values should be Single-value serialized to 
> binary:
> https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
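
As a rough illustration of the metrics listed above, the sketch below computes null counts and serialized bounds for one column. The function names are hypothetical. The encodings shown (int as 4 bytes little-endian, string as UTF-8 bytes) follow the single-value serialization appendix linked above; column_sizes and NaN handling for floating-point columns are left out.

```python
import struct


def serialize_int(value):
    # An int is stored as 4 bytes, little-endian.
    return struct.pack("<i", value)


def serialize_string(value):
    # A string is stored as its UTF-8 bytes, without a length prefix.
    return value.encode("utf-8")


def column_metrics(column_id, values, serialize):
    """Compute per-file metrics for one column (None marks a null value)."""
    non_null = [v for v in values if v is not None]
    metrics = {"null_value_counts": {column_id: len(values) - len(non_null)}}
    if non_null:
        # Bounds are only defined when the file has at least one non-null value.
        metrics["lower_bounds"] = {column_id: serialize(min(non_null))}
        metrics["upper_bounds"] = {column_id: serialize(max(non_null))}
    return metrics
```

For example, `column_metrics(1, [3, None, -1, 7], serialize_int)` reports one null and bounds of -1 and 7 in their 4-byte form.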






[jira] [Updated] (IMPALA-10975) Minor refactoring in alter table DDL operation in catalogd

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10975:

Fix Version/s: Impala 4.1.0

> Minor refactoring in alter table DDL operation in catalogd
> --
>
> Key: IMPALA-10975
> URL: https://issues.apache.org/jira/browse/IMPALA-10975
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog, Frontend
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> For almost all alter table DDL operations in catalogOpExecutor, we add the table 
> to the catalog update at the end if reloadMetadata is true. However, for certain 
> sub-DDL operations like ADD and DROP partitions, adding the table to the catalog 
> update is done locally. This Jira is to refactor addTableToCatalogUpdate() and 
> call it from one place for all the sub-DDLs. This refactoring will be 
> helpful when we introduce code changes for syncing db/table to the latest event id 
> from catalogOpExecutor in the future. 
>  
> cc - [~vihangk1]






[jira] [Updated] (IMPALA-10973) Empty scan nodes are scheduled to the (exclusive) coordinator

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10973:

Fix Version/s: Impala 4.1.0

> Empty scan nodes are scheduled to the (exclusive) coordinator
> -
>
> Key: IMPALA-10973
> URL: https://issues.apache.org/jira/browse/IMPALA-10973
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
>  Labels: scalability, scheduler
> Fix For: Impala 4.1.0
>
>
> Currently fragments with scan nodes that have no scan ranges are scheduled to 
> the coordinator, even if it is an exclusive coordinator:
> https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L805
> As "parent" fragments are often scheduled to be collocated with their 
> children, the condition of "being scheduled to the coordinator" can spread 
> through the plan tree.
> This can be disastrous for scalability in clusters with a lot of executors but 
> few coordinators, and it is also very counter-intuitive, as scanning an empty 
> table shouldn't have a major effect on the query. 
>  
> To reproduce locally:
> bin/start-impala-cluster.py --use_exclusive_coordinators -c 1
> in Impala shell:
> select id from functional.alltypes;
> profile; -- scan nodes will be scheduled to 2 hosts
> select f2 from functional.emptytable union all select id from 
> functional.alltypes;
> profile; --  scan nodes will be scheduled to 3 hosts






[jira] [Updated] (IMPALA-10984) Improve performance of FROM_UNIXTIME function.

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10984:

Fix Version/s: Impala 4.1.0

> Improve performance of FROM_UNIXTIME function.
> --
>
> Key: IMPALA-10984
> URL: https://issues.apache.org/jira/browse/IMPALA-10984
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> The FROM_UNIXTIME function is implemented by calling TimestampValue::ToString() 
> in TimestampFunctions::FromUnix().
> We found that evaluating TimestampValue::ToString() can get stuck waiting on the 
> tcmalloc::CentralFreeList lock, as shown in this pstack output:
>  
> {code:java}
> #0 0x0277d81a in base::internal::SpinLockDelay(int volatile*, int, 
> int) ()
> #1 0x027d17f9 in SpinLock::SlowLock() ()
> #2 0x0287a399 in tcmalloc::CentralFreeList::RemoveRange(void**, 
> void**, int) ()
> #3 0x028882f3 in 
> tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
> #4 0x029c5e88 in tc_newarray ()
> #5 0x7faedc677169 in std::string::_Rep::_S_create(unsigned long, unsigned 
> long, std::allocator const&) () from 
> /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p4948.16676264/lib/impala/lib/libstdc++.so.6
> #6 0x00f769de in impala::TimestampValue::ToString() const ()
> #7 0x7faeb317e08e in ?? ()
> #8 0x7fad62af6068 in ?? ()
> #9 0x7faedc8c20c0 in ?? () from 
> /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p4948.16676264/lib/impala/lib/libstdc++.so.6
> #10 0x in ?? (){code}
>  
> This is presumably due to the combined use of stringstream, 
> boost::gregorian::to_iso_extended_string and 
> boost::posix_time::to_simple_string, which involves multiple string allocations 
> and copies.
> This can be problematic when FROM_UNIXTIME is evaluated for millions of 
> rows.
> We should come up with a better implementation that involves fewer string 
> allocations and less copying.
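
One possible direction is to write the digits straight into a fixed-size buffer instead of building intermediate strings. The sketch below shows the date arithmetic and digit writing such a formatter could use. It is written in Python purely for illustration (Python does not give control over allocations), all names are hypothetical, it handles only the default 'YYYY-MM-DD HH:MM:SS' layout in UTC for years 0000 to 9999, and it is not the implementation chosen for Impala.

```python
def civil_from_days(days):
    """Convert days since 1970-01-01 to (year, month, day), proleptic Gregorian."""
    z = days + 719468                 # shift the epoch to 0000-03-01
    era = z // 146097                 # 400-year cycles (floor division)
    doe = z - era * 146097            # day of era, in [0, 146096]
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # March-based day of year
    mp = (5 * doy + 2) // 153         # month index, 0 = March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def put_digits(buf, pos, value, width):
    """Write `value` as `width` zero-padded ASCII digits at buf[pos:pos+width]."""
    for i in range(pos + width - 1, pos - 1, -1):
        buf[i] = 48 + value % 10      # 48 is ASCII '0'
        value //= 10


def format_unix_time(buf, seconds):
    """Fill the 19-byte buffer with 'YYYY-MM-DD HH:MM:SS' (UTC) and return it."""
    days, rem = divmod(seconds, 86400)
    year, month, day = civil_from_days(days)
    put_digits(buf, 0, year, 4)
    put_digits(buf, 5, month, 2)
    put_digits(buf, 8, day, 2)
    put_digits(buf, 11, rem // 3600, 2)
    put_digits(buf, 14, rem % 3600 // 60, 2)
    put_digits(buf, 17, rem % 60, 2)
    return buf


# The separators are written once; only the digits change per row.
TEMPLATE = bytearray(b"0000-00-00 00:00:00")
```

A caller would copy TEMPLATE once per batch, for example `buf = bytearray(TEMPLATE)`, and reuse `buf` for every row.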






[jira] [Updated] (IMPALA-10998) Backend test scratch-tuple-batch-test failed in ASAN build

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10998:

Fix Version/s: Impala 4.1.0

> Backend test scratch-tuple-batch-test failed in ASAN build
> --
>
> Key: IMPALA-10998
> URL: https://issues.apache.org/jira/browse/IMPALA-10998
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Wenzhe Zhou
>Assignee: Amogh Margoor
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 4.1.0
>
>
> The backend test scratch-tuple-batch-test failed in my recent ASAN builds run on 
> impala-private-parameterized. It also failed in a recent cdw-master-staging 
> ASAN build 
> [https://master-03.jenkins.cloudera.com/job/impala-cdw-master-staging-core-asan/47/] 
> and in the asf-master core ASAN build 
> [https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-asan/94/].
> This new backend test was added by a recent commit 
> [https://gerrit.cloudera.org/#/c/17860/].
> Here is the console output:
> 21:11:01 ==19341==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x603000390d74 at pc 0x01d18abf bp 0x7ffd7f9fef30 sp 0x7ffd7f9fef28
> 21:11:01 READ of size 4 at 0x603000390d74 thread T0
> 21:11:01 #0 0x1d18abe in 
> ScratchTupleBatchTest_TestRandomGeneratedMicroBatches_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/scratch-tuple-batch-test.cc:159:13
> 21:11:01 #1 0x5ea0299 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5ea0299)
> 21:11:01 #2 0x5e99079 in testing::Test::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5e99079)
> 21:11:01 #3 0x5e9915b in testing::TestInfo::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5e9915b)
> 21:11:01 #4 0x5e99294 in testing::TestCase::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5e99294)
> 21:11:01 #5 0x5e9993f in testing::internal::UnitTestImpl::RunAllTests() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5e9993f)
> 21:11:01 #6 0x5e99a76 in testing::UnitTest::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5e99a76)
> 21:11:01 #7 0x1d18cab in main 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/scratch-tuple-batch-test.cc:187:10
> 21:11:01 #8 0x7feb3bf0bc04 in __libc_start_main (/lib64/libc.so.6+0x21c04)
> 21:11:01 #9 0x1c20796 in _start 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x1c20796)
> 21:11:01 
> 21:11:01 0x603000390d74 is located 0 bytes to the right of 20-byte region 
> [0x603000390d60,0x603000390d74)
> 21:11:01 allocated by thread T0 here:
> 21:11:01 #0 0x1d135b0 in operator new(unsigned long) 
> /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92
> 21:11:01 #1 0x1d26ecf in void std::vector 
> >::_M_range_initialize(int const*, int const*, 
> std::forward_iterator_tag) 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:1328:35
> 21:11:01 #2 0x1d1ac6f in std::vector 
> >::vector(std::initializer_list, std::allocator const&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_vector.h:387:2
> 21:11:01 #3 0x1d18767 in 
> ScratchTupleBatchTest_TestRandomGeneratedMicroBatches_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/scratch-tuple-batch-test.cc:156:22
> 21:11:01 #4 0x5ea0299 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/scratch-tuple-batch-test+0x5ea0299)
> 21:11:01 
> 21:11:01 SUMMARY: AddressSanitizer: heap-buffer-overflow 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/scratch-tuple-batch-test.cc:159:13
>  in 

[jira] [Updated] (IMPALA-11000) DHECK hit in FillScratchMicroBatches

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11000:

Fix Version/s: Impala 4.1.0

> DHECK hit in FillScratchMicroBatches
> 
>
> Key: IMPALA-11000
> URL: https://issues.apache.org/jira/browse/IMPALA-11000
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Amogh Margoor
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Happened during exhaustive tests. 
> Log:
> {code}
> F1101 16:05:08.671600  1042 hdfs-parquet-scanner.cc:2397] 
> 66490317b236519e:1cc2e03a] Check failed: scratch_batch_->AtEnd() 
> {code}
> Query:
> {code}
> I1101 16:05:02.796250   367 Frontend.java:1637] 
> 66490317b236519e:1cc2e03a] Analyzing query: create table ctas_cancel 
> stored as parquetfile as SELECT STRAIGHT_JOIN *
>FROM lineitem
>   JOIN /*+broadcast*/ orders ON o_orderkey = l_orderkey
>   JOIN supplier ON s_suppkey = l_suppkey
>WHERE o_orderstatus = 'F'
>ORDER BY l_orderkey
>LIMIT 1 db: tpch_parquet
> {code}
> The query should come from around here:
> https://github.com/apache/impala/blob/master/tests/query_test/test_cancellation.py#L149
> Callstack:
> {code} 
> 3 
> impalad!impala::HdfsParquetScanner::FillScratchMicroBatches(std::vector  std::allocator > const&, impala::RowBatch*, 
> bool*, impala::ScratchMicroBatch const*, int, int, int*) 
> [hdfs-parquet-scanner.cc : 2397 + 0xf]
>  4  impalad!impala::Status 
> impala::HdfsParquetScanner::AssembleRows(impala::RowBatch*, bool*) 
> [hdfs-parquet-scanner.cc : 2287 + 0x6d]
>  5  impalad!impala::HdfsParquetScanner::GetNextInternal(impala::RowBatch*) 
> [hdfs-parquet-scanner.cc : 539 + 0x2b]
>  6  impalad!impala::HdfsParquetScanner::ProcessSplit() 
> [hdfs-parquet-scanner.cc : 427 + 0x39]
>  7  
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*) [hdfs-scan-node.cc : 500 + 0x28]
> {code}






[jira] [Updated] (IMPALA-11007) Webserver should not log errors when handling HTTP HEAD

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11007:

Fix Version/s: Impala 4.1.0

> Webserver should not log errors when handling HTTP HEAD 
> 
>
> Key: IMPALA-11007
> URL: https://issues.apache.org/jira/browse/IMPALA-11007
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> If you send a HEAD request to Impala's webserver, for example
> {code}
> curl -I http://localhost:25000/metrics
> {code}
> then the logs will contain scary messages:
> {code}
> I1025 10:39:52.337021 3578299 webserver.cc:591] Webserver: error reading: 
> Connection reset by peer
> {code}
> This does not happen with 
> {code}
> curl http://localhost:25000/metrics
> {code}
> Fix this by making the Impala webserver not send any content in replies to a 
> HEAD request.
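
The proposed fix can be illustrated with a minimal sketch. It uses Python's standard http.server rather than Impala's actual webserver code, and the handler name, helper names and payload are hypothetical. The HEAD handler sends the same headers as GET but writes no body, and a raw socket client reads everything the server sends so that a stray body would be visible.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"metrics placeholder\n"  # hypothetical payload, not Impala's real output


class SketchHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: GET sends headers and body, HEAD sends headers only."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same headers as GET, but deliberately no body.
        self._send_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def raw_request(method, port):
    """Send a bare HTTP/1.0 request and return (header bytes, body bytes)."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(f"{method} /metrics HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    headers, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return headers, body


def run_sketch():
    """Start the server on a free port and issue one HEAD and one GET request."""
    server = HTTPServer(("127.0.0.1", 0), SketchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return raw_request("HEAD", port), raw_request("GET", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_sketch() returns the raw header and body bytes for both requests, so the HEAD reply can be compared directly against the GET reply.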






[jira] [Updated] (IMPALA-11011) Impala crashes in OrcStructReader::NumElements()

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11011:

Fix Version/s: Impala 4.1.0

> Impala crashes in OrcStructReader::NumElements()
> 
>
> Key: IMPALA-11011
> URL: https://issues.apache.org/jira/browse/IMPALA-11011
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Running the query
> {code:java}
> select inner_arr.ITEM
> from functional_orc_def.complextypestbl.nested_struct.c.d.ITEM as 
> inner_arr;{code}
> in a non-full-acid version/copy of functional_orc_def.complextypestbl 
> crashes Impala because in OrcStructReader::NumElements() 'vbatch_' is NULL 
> and we dereference it.
> Steps to reproduce:
> 1. Use Hive to create a non-full-acid copy of the table:
>  * Enter the Hive cmd line:
> {code:java}
> hive beeline -u 'jdbc:hive2://localhost:11050/default'{code}
>  * Copy the table with this command:
> {code:java}
> create table complextypestbl_non_acid stored as orc tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from complextypestbl;{code}
> 2.  In Impala, run the query on the copied table:
> {code:java}
> set disable_codegen=true;
> select inner_arr.ITEM
> from functional_orc_def.complextypestbl_non_acid.nested_struct.c.d.ITEM as 
> inner_arr;{code}
>  
> Call stack from GDB:
> {code:java}
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7fd5e49e9921 in __GI_abort () at abort.c:79
> #2  0x7fd5e7929589 in os::abort(bool) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #3  0x7fd5e7b04fb3 in VMError::report_and_die() () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #4  0x7fd5e7933ce4 in JVM_handle_linux_signal () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #5  0x7fd5e79263b8 in signalHandler(int, siginfo_t*, void*) () from 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> #6  
> #7  0x02c3bd7f in impala::OrcStructReader::NumElements 
> (this=0xf043290) at be/src/exec/orc-column-readers.h:603
> #8  0x02c371b7 in impala::OrcListReader::NumElements 
> (this=0x11009420) at be/src/exec/orc-column-readers.cc:563
> #9  0x02c371b7 in impala::OrcListReader::NumElements 
> (this=0x11009340) at be/src/exec/orc-column-readers.cc:563
> #10 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf043200) at be/src/exec/orc-column-readers.h:606
> #11 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf042ea0) at be/src/exec/orc-column-readers.h:606
> #12 0x02c3be5b in impala::OrcStructReader::NumElements 
> (this=0xf042e10) at be/src/exec/orc-column-readers.h:606
> #13 0x02c3497f in impala::OrcStructReader::EndOfBatch 
> (this=0xf042e10) at be/src/exec/orc-column-readers.cc:294
> #14 0x02bf5389 in impala::HdfsOrcScanner::GetNextInternal 
> (this=0xeca4000, row_batch=0xf1c95a0) at be/src/exec/hdfs-orc-scanner.cc:648
> #15 0x02bf46b7 in impala::HdfsOrcScanner::ProcessSplit 
> (this=0xeca4000) at be/src/exec/hdfs-orc-scanner.cc:588
> #16 0x02d427ff in impala::HdfsScanNode::ProcessSplit (this=0xff85800, 
> filter_ctxs=..., expr_results_pool=0x7fd41a29b4e0, scan_range=0xf2bde00, 
> scanner_thread_reservation=0x7fd41a29b408) at 
> be/src/exec/hdfs-scan-node.cc:500
> #17 0x02d41b80 in impala::HdfsScanNode::ScannerThread 
> (this=0xff85800, first_thread=false, scanner_thread_reservation=16384) at 
> be/src/exec/hdfs-scan-node.cc:418
> #18 0x02d40ee8 in impala::HdfsScanNodeoperator()(void) 
> const (__closure=0x7fd41a29bc08) at be/src/exec/hdfs-scan-node.cc:339
> #19 0x02d43afb in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
>     at 
> /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #20 0x022de8ca in boost::function0::operator() 
> (this=0x7fd41a29bc00) at 
> /opt/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #21 0x02aa43a0 in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (name=..., category=..., 
> functor=..., parent_thread_info=0x7fd40f8858a0, 
> thread_started=0x7fd40f8846a0) at be/src/util/thread.cc:360
> #22 0x02aacd01 in 
> boost::_bi::list5 

[jira] [Updated] (IMPALA-11020) CHECK failure in AttachStdoutStderrLocked

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11020:

Fix Version/s: Impala 4.1.0

> CHECK failure in AttachStdoutStderrLocked
> -
>
> Key: IMPALA-11020
> URL: https://issues.apache.org/jira/browse/IMPALA-11020
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Riza Suminto
>Priority: Critical
> Fix For: Impala 4.1.0
>
>
> Stack trace is
> {code}
> #0  0x7fa1b4fb51f7 in raise () from /lib64/libc.so.6
> #1  0x7fa1b4fb68e8 in abort () from /lib64/libc.so.6
> #2  0x056d5e84 in google::DumpStackTraceAndExit() ()
> #3  0x056cb2bd in google::LogMessage::Fail() ()
> #4  0x056ccb6d in google::LogMessage::SendToLog() ()
> #5  0x056cac1b in google::LogMessage::Flush() ()
> #6  0x056ce7d9 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x01f8e6b8 in AttachStdoutStderrLocked () at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/be/src/common/logging.cc:113
> #8  0x01f8ef2d in impala::AttachStdoutStderr () at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/be/src/common/logging.cc:203
> #9  0x01f8703f in LogMaintenanceThread () at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/be/src/common/init.cc:179
> #10 0x01f8d925 in 
> boost::detail::function::void_function_invoker0::invoke 
> (function_ptr=...) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:117
> #11 0x022a9eb4 in boost::function0::operator() 
> (this=0x7fa1b2c64b20) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/function/function_template.hpp:763
> #12 0x02a7557d in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) (name=..., category=..., 
> functor=..., parent_thread_info=0x0, thread_started=0x7fffbeeac2e0) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/be/src/util/thread.cc:360
> #13 0x02a7decd in 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::type, void 
> (*&)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), boost::_bi::list0&, int) (this=0x9f4c4c0, 
> f=@0x9f4c4b8: 0x2a7523a 
>  std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*)>, a=...) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:531
> #14 0x02a7ddf1 in boost::_bi::bind_t (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > 
> >::operator()() (this=0x9f4c4b8) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/bind/bind.hpp:1294
> #15 0x02a7ddb2 in boost::detail::thread_data void (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > 
> >::run() (this=0x9f4c380) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.74.0-p1/include/boost/thread/detail/thread.hpp:120
> #16 0x04374bd1 in thread_proxy ()
> #17 0x7fa1b8649e25 in start_thread () 

[jira] [Updated] (IMPALA-11025) Creation of functional.insert_only_transactional_table fails wIth 'illegal location for managed table'

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11025:

Fix Version/s: Impala 4.1.0

> Creation of  functional.insert_only_transactional_table fails wIth 'illegal 
> location for managed table'
> ---
>
> Key: IMPALA-11025
> URL: https://issues.apache.org/jira/browse/IMPALA-11025
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Critical
> Fix For: Impala 4.1.0
>
> Attachments: IMPALA-11025_stack.txt
>
>
> Hive complains 'Illegal location for managed table' although location 
> '/test-warehouse/managed/insert_only_transactional_table' appears to be 
> within database's managed location.
> {code}
> INFO  : Compiling 
> command(queryId=jenkins_2026193803_e342124e-7a94-4024-b11a-58578cdf2ce4): 
> CREATE  TABLE IF NOT EXISTS functional.insert_only_transactional_table (
> col1 int
> )
> STORED AS TEXTFILE
> LOCATION '/test-warehouse/managed/insert_only_transactional_table'
> TBLPROPERTIES (
> 'transactional_properties' = 'insert_only',
> 'transactional' = 'true'
> )
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=jenkins_2026193803_e342124e-7a94-4024-b11a-58578cdf2ce4); 
> Time taken: 0.025 seconds
> INFO  : Executing 
> command(queryId=jenkins_2026193803_e342124e-7a94-4024-b11a-58578cdf2ce4): 
> CREATE  TABLE IF NOT EXISTS functional.insert_only_transactional_table (
> col1 int
> )
> STORED AS TEXTFILE
> LOCATION '/test-warehouse/managed/insert_only_transactional_table'
> TBLPROPERTIES (
> 'transactional_properties' = 'insert_only',
> 'transactional' = 'true'
> )
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Illegal location for managed table, it has to be within 
> database's managed location)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1170) 
> ~[hive-exec-3.1.3000.7.1.8.0-393.jar:3.1.3000.7.1.8.0-393]
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1175) 
> ~[hive-exec-3.1.3000.7.1.8.0-393.jar:3.1.3000.7.1.8.0-393]
> {code}
> (For full hive stack see attachment)
> HMS log:
> {code}
> 2021-11-16T19:38:03,185  INFO [pool-9-thread-58] 
> metastore.MetastoreDefaultTransformer: Starting translation for 
> transformDatabase for processor HMSClient-@localhost with [EXTWRITE, EXTREAD, 
> HIVEBUCKET2, HIVEFULLACIDREAD, HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, 
> HIVEMANAGESTATS, HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, 
> HIVEMQT, HIVEONLYMQTWRITE] on database functional 
> locationUri=hdfs://localhost:20500/test-warehouse/functional.db 
> managedLocationUri=hdfs://localhost:20500/test-warehouse/managed/functional.db
> 2021-11-16T19:38:03,185  INFO [pool-9-thread-58] 
> metastore.MetastoreDefaultTransformer: Transformer returning 
> database:Database(name:functional, description:null, 
> locationUri:hdfs://localhost:20500/test-warehouse/functional.db, 
> parameters:{}, ownerName:jenkins, ownerType:USER, catalogName:hive, 
> createTime:1637119984, 
> managedLocationUri:hdfs://localhost:20500/test-warehouse/managed/functional.db)
> 2021-11-16T19:38:03,323  INFO [pool-9-thread-58] metastore.HiveMetaStore: 63: 
> source:127.0.0.1 create_table_req: 
> Table(tableName:insert_only_transactional_table, dbName:functional, 
> owner:jenkins, createTime:1637120283, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/managed/insert_only_transactional_table,
>  inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
> partitionKeys:[], parameters:{bucketing_version=2, 
> transactional_properties=insert_only, transactional=true}, 
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, 
> privileges:PrincipalPrivilegeSet(userPrivileges:{jenkins=[PrivilegeGrantInfo(privilege:INSERT,
>  createTime:-1, grantor:jenkins, grantorType:USER, grantOption:true), 
> PrivilegeGrantInfo(privilege:SELECT, createTime:-1, grantor:jenkins, 
> grantorType:USER, grantOption:true), 

[jira] [Updated] (IMPALA-11027) Support for ShellBasedUnixGroupMapping for Impala's user delegation via groups

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11027:

Fix Version/s: Impala 4.1.0

> Support for ShellBasedUnixGroupMapping for Impala's user delegation via groups
> --
>
> Key: IMPALA-11027
> URL: https://issues.apache.org/jira/browse/IMPALA-11027
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Amogh Margoor
>Assignee: Amogh Margoor
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> When impala.doAs.user is set for user delegation, Impala checks whether the 
> delegation is allowed based on either of the following:
>  # user mapping: specified using {{authorized_proxy_user_config}}
>  # group mapping: specified using {{authorized_proxy_group_config}}
>  
>  
> For checking the group mapping, JNIBasedUnixGroupMapping is currently 
> supported but ShellBasedUnixGroupMapping is not. Ref: 
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/GroupsMapping.html]
> This is because ShellBasedUnixGroupMapping spawns a new shell command to 
> look up the groups for 'impala.doAs.user' when a group mapping is specified. 
> Numerous shell commands could cause issues such as resource crunch, file 
> descriptor issues and zombie processes, so it is discouraged. However, we 
> should support it for users who understand these caveats well and still want 
> to use it. One reason could be that other components of Impala might not 
> have moved to JNI-based group mapping and still use the shell-based one.
> Regarding the caveats, a few things help:
>  # The chances of zombie processes are very low.
>  # Because vfork is used, process spawning does not consume many resources. 
> It takes around 8KB of memory and the process lasts around 16-17ms.
>  # An immediate exec after vfork ensures that other resources that might get 
> cloned via vfork are present only for a very short duration.
>  
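
The two delegation checks described above can be sketched as follows. This is an illustration under stated assumptions, not Impala's implementation: the function names are hypothetical, the "proxy=target1,target2;other=*" config format is assumed, and shell_lookup_groups (which runs `id -Gn`) only stands in for what a shell-based group mapping does.

```python
import subprocess


def parse_proxy_config(config):
    """Parse an assumed 'proxy=a,b;other=*' string into {proxy: set_of_targets}."""
    mapping = {}
    for entry in config.split(";"):
        if not entry.strip():
            continue
        proxy, _, targets = entry.partition("=")
        mapping[proxy.strip()] = {t.strip() for t in targets.split(",") if t.strip()}
    return mapping


def shell_lookup_groups(user):
    """Hypothetical shell-based lookup: spawns one `id -Gn` process per call."""
    result = subprocess.run(
        ["id", "-Gn", user], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def is_delegation_allowed(proxy_user, do_as_user, user_config, group_config,
                          lookup_groups=shell_lookup_groups):
    """Allow delegation if either the user mapping or the group mapping matches."""
    users = parse_proxy_config(user_config).get(proxy_user, set())
    if "*" in users or do_as_user in users:
        return True
    groups = parse_proxy_config(group_config).get(proxy_user, set())
    if not groups:
        return False
    if "*" in groups:
        return True
    # Only here is the (possibly expensive) group lookup needed.
    return bool(groups & set(lookup_groups(do_as_user)))
```

Passing the lookup as a parameter keeps the process-spawning cost confined to one place, and lets a JNI-based or shell-based resolver be swapped in without touching the check itself.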






[jira] [Updated] (IMPALA-11021) Impala throw IllegalStateException when use predicate hint in query

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11021:

Fix Version/s: Impala 4.1.0

> Impala throw IllegalStateException when use predicate hint in query
> ---
>
> Key: IMPALA-11021
> URL: https://issues.apache.org/jira/browse/IMPALA-11021
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0.0
>Reporter: Sheng Wang
>Assignee: Sheng Wang
>Priority: Minor
> Fix For: Impala 4.1.0
>
>
> Hi [~stigahuang],[~amargoor], while working on IMPALA-7942 recently, I found 
> a bug when using a predicate hint in a query. Here is a query that reproduces 
> the exception:
> {code:java}
> select * from tpch.lineitem where /* +ALWAYS_TRUE_TEST */ l_shipdate <= 
> (select '1998-09-02')  limit 10;
> {code}
> Here is the stack:
> {code:java}
> I1117 00:42:38.977468 23408 jni-util.cc:286] 
> b14b98e13bd40747:07fc212b] java.lang.IllegalStateException: Failed 
> analysis after expr substitution.
> at org.apache.impala.analysis.Expr.substituteList(Expr.java:1118)
> at 
> org.apache.impala.analysis.SelectStmt.materializeRequiredSlots(SelectStmt.java:1069)
> at 
> org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:701)
> at 
> org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:278)
> at 
> org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:170)
> at 
> org.apache.impala.planner.Planner.createPlanFragments(Planner.java:121)
> at org.apache.impala.planner.Planner.createPlans(Planner.java:248)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1543)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1885)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1733)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1625)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1595)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:162)
> {code}
> I've read the code: the predicate fails analysis after being cloned, because 
> this check fails:
> {code:java}
> Preconditions.checkState(!globalState_.warningsRetrieved)
> {code}
>  I'm not sure whether this bug is already fixed on the master branch. If not, 
> I'd like to work on it.






[jira] [Resolved] (IMPALA-11037) Bump ORC to 1.7-p4 to contain the improvement of ORC-1020

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11037.
-
Resolution: Fixed

> Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
> -
>
> Key: IMPALA-11037
> URL: https://issues.apache.org/jira/browse/IMPALA-11037
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: orc_1.7.0-p3_random_int32.svg, 
> orc_1.7.0-p3_random_int64.svg, orc_1.7.0-p4_random_int32.svg, 
> orc_1.7.0-p4_random_int64.svg
>
>
> ORC-1020 improves read performance of the ORC library in scanning random 
> integers. Columns that are encoded as integers, e.g. dictionary-encoded 
> strings, will also benefit from this.
> This Jira aims to add ORC-1020 to our native-toolchain and bump our orc 
> version to 1.7-p4 to contain it.
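
To illustrate why string columns can benefit from faster integer decoding, here is a minimal sketch of dictionary encoding in general. It is not ORC's actual on-disk layout, and the function names are hypothetical. Each distinct string is stored once, and the column itself becomes a sequence of integer codes.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order plus indices."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column by looking each integer code up in the dictionary."""
    return [dictionary[code] for code in codes]
```

Reading such a column back means decoding the integer codes first, so an improvement in integer scanning carries over to the strings.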






[jira] [Updated] (IMPALA-11037) Bump ORC to 1.7-p4 to contain the improvement of ORC-1020

2021-12-02 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11037:

Fix Version/s: Impala 4.1.0

> Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
> -
>
> Key: IMPALA-11037
> URL: https://issues.apache.org/jira/browse/IMPALA-11037
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Fix For: Impala 4.1.0
>
> Attachments: orc_1.7.0-p3_random_int32.svg, 
> orc_1.7.0-p3_random_int64.svg, orc_1.7.0-p4_random_int32.svg, 
> orc_1.7.0-p4_random_int64.svg
>
>
> ORC-1020 improves read performance of the ORC library in scanning random 
> integers. Columns that are encoded as integers, e.g. dictionary-encoded 
> strings, will also benefit from this.
> This Jira aims to add ORC-1020 to our native-toolchain and bump our orc 
> version to 1.7-p4 to contain it.






[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails

2021-12-02 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452715#comment-17452715
 ] 

Quanlong Huang commented on IMPALA-10886:
-

Uploaded a fix for review: https://gerrit.cloudera.org/c/18066/

> TestReusePartitionMetadata.test_reuse_partition_meta fails
> --
>
> Key: IMPALA-10886
> URL: https://issues.apache.org/jira/browse/IMPALA-10886
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: test_local_catalog.patch
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/
> {code}
> custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta
> self.check_missing_partitions(unique_database, 1)
> custom_cluster/test_local_catalog.py:595: in check_missing_partitions
> assert match.group(1) == str(partition_misses)
> E   assert '0' == '1'
> E - 0
> E + 1
> {code}






[jira] [Commented] (IMPALA-10801) Check the latest compaction Id before serving request

2021-12-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452699#comment-17452699
 ] 

ASF subversion and git services commented on IMPALA-10801:
--

Commit 4077bc849ae14bb92a463aeeb6c8f5c1fca658c9 in impala's branch 
refs/heads/master from Yu-Wen Lai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4077bc8 ]

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction in Hive (on a Hive ACID table), queries made in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that.
Coordinators have their own local cache, so we have to do the
same check for coordinators as well. Besides, we also need to attach
writeIdList to requests that need to fetch file metadata. Since this
check adds overhead to queries, we introduce a flag
auto_check_compaction and set it to false by default for now. We will
look for more efficient ways to do compaction checking in the future.

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Reviewed-on: http://gerrit.cloudera.org:8080/18043
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Check the latest compaction Id before serving request
> -
>
> Key: IMPALA-10801
> URL: https://issues.apache.org/jira/browse/IMPALA-10801
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> Cache compaction Id for a given table/file-metadata in CatalogD.
> Whenever there is a read request to CatalogD, get the latest compaction event 
> Id from HMS, compare it with what is cached in CatalogD, and based on that 
> decide whether to serve the data from cache or to refresh it from the 
> filesystem. This can avoid notification based cache invalidation.
> Also, since there will be an open txn for the current long running query 
> which is being served from CatalogD, we can be sure that current 
> file-metadata being served is not already deleted by the cleaner.
> This proposal will use a new HMS API 
> (https://issues.apache.org/jira/browse/HIVE-24828) to get the latest 
> compaction id for a table.
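
The proposal above can be sketched as a small cache keyed by table. This is a hedged illustration only: the class name and the two callables are hypothetical stand-ins for the HMS call that returns the latest compaction id and for the filesystem reload, and it does not reflect catalogd's real data structures.

```python
class CompactionAwareCache:
    """Serve cached file metadata only while the table's compaction id is unchanged."""

    def __init__(self, fetch_latest_compaction_id, load_file_metadata):
        # Both callables are assumptions: one asks HMS for the latest
        # compaction id, the other reloads file metadata from the filesystem.
        self._fetch_latest_compaction_id = fetch_latest_compaction_id
        self._load_file_metadata = load_file_metadata
        self._entries = {}  # table name -> (compaction id, file metadata)

    def get(self, table):
        latest_id = self._fetch_latest_compaction_id(table)
        cached = self._entries.get(table)
        if cached is not None and cached[0] == latest_id:
            return cached[1]  # no compaction since the entry was cached
        # First access, or a compaction happened: refresh from the filesystem.
        metadata = self._load_file_metadata(table)
        self._entries[table] = (latest_id, metadata)
        return metadata
```

Every read pays for one id lookup, which is the per-request overhead that trades against notification-based invalidation.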






[jira] [Commented] (IMPALA-10764) Web UI shows error in the /logs page, if stdout/stderr is not redirected to INFO/ERROR logs

2021-12-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452701#comment-17452701
 ] 

ASF subversion and git services commented on IMPALA-10764:
--

Commit 9d61bc450eddee46fb9a4e6d9acef992c753988b in impala's branch 
refs/heads/master from Andrew Sherman
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9d61bc4 ]

IMPALA-10764: hide /logs link in webui if --logtostderr=true

If you start an Impala daemon with the flags
 "--logtostderr=true --redirect_stdout_stderr=false"
then log files are not created. After this the webui will display an
error if you click on the "/logs" link. Fix this by not adding
the "/logs" link in this case.

TESTING
- Added a new test that validates the navbar links in the webui.
- Ran exhaustive tests

Change-Id: I65234213f32902caa1f4368181b49f012a4dbcb3
Reviewed-on: http://gerrit.cloudera.org:8080/18062
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Web UI shows error in the /logs page, if stdout/stderr is not redirected to 
> INFO/ERROR logs
> ---
>
> Key: IMPALA-10764
> URL: https://issues.apache.org/jira/browse/IMPALA-10764
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Andrew Sherman
>Priority: Major
>
> The error message looks like this: 
> {code:java}
> Error: Couldn't open INFO log file /statestored.INFO{code}
> We should probably hide the /logs page in the Web UI if stdout/stderr is not 
> redirected to INFO/ERROR logs
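
The fix described in the commit message can be sketched as a conditional when building the navbar. The function names and the link list are hypothetical, and the rule for when log files exist is an assumption drawn only from the flag combination named in the commit message ("--logtostderr=true --redirect_stdout_stderr=false").

```python
def log_files_created(logtostderr, redirect_stdout_stderr):
    """Assumed rule: no log files only when logging to stderr without redirection."""
    return not (logtostderr and not redirect_stdout_stderr)


def navbar_links(logtostderr, redirect_stdout_stderr):
    """Return the navbar paths, omitting /logs when there would be nothing to show."""
    links = ["/", "/metrics"]  # illustrative subset, not the real navbar
    if log_files_created(logtostderr, redirect_stdout_stderr):
        links.append("/logs")
    return links
```

Deciding at registration time means the link never appears, rather than appearing and leading to the error page.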






[jira] [Commented] (IMPALA-11037) Bump ORC to 1.7-p4 to contain the improvement of ORC-1020

2021-12-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452700#comment-17452700
 ] 

ASF subversion and git services commented on IMPALA-11037:
--

Commit d467a2f96d0ca03c77aae1b30b2bcacfff20a8e1 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d467a2f ]

IMPALA-11037: Bump ORC to 1.7.0-p4

This patch bumps the ORC version from 1.7.0-p3 to 1.7.0-p4 to contain
the improvement of ORC-1020.

Change-Id: I8444c6f8ff4addbaa33dbb64d8bfd937ab1db1bf
Reviewed-on: http://gerrit.cloudera.org:8080/18060
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> Bump ORC to 1.7-p4 to contain the improvement of ORC-1020
> -
>
> Key: IMPALA-11037
> URL: https://issues.apache.org/jira/browse/IMPALA-11037
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: orc_1.7.0-p3_random_int32.svg, 
> orc_1.7.0-p3_random_int64.svg, orc_1.7.0-p4_random_int32.svg, 
> orc_1.7.0-p4_random_int64.svg
>
>
> ORC-1020 improves read performance of the ORC library in scanning random 
> integers. Columns that are encoded as integers, e.g. dictionary-encoded 
> strings, will also benefit from this.
> This Jira aims to add ORC-1020 to our native-toolchain and bump our orc 
> version to 1.7-p4 to contain it.






[jira] [Commented] (IMPALA-11032) Automatic Refresh of Metadata for Local Catalog after Compaction

2021-12-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452698#comment-17452698
 ] 

ASF subversion and git services commented on IMPALA-11032:
--

Commit 4077bc849ae14bb92a463aeeb6c8f5c1fca658c9 in impala's branch 
refs/heads/master from Yu-Wen Lai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4077bc8 ]

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction in Hive (on a Hive ACID table), queries made in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that.
Coordinators have their own local cache, so we have to do the
same check for coordinators as well. Besides, we also need to attach
writeIdList to requests that need to fetch file metadata. Since this
check adds overhead to queries, we introduce a flag
auto_check_compaction and set it to false by default for now. We will
look for more efficient ways to do compaction checking in the future.

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Reviewed-on: http://gerrit.cloudera.org:8080/18043
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Automatic Refresh of Metadata for Local Catalog after Compaction
> 
>
> Key: IMPALA-11032
> URL: https://issues.apache.org/jira/browse/IMPALA-11032
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> After Hive compaction of tables created in the Hive warehouse (Hive ACID 
> tables), queries made in Impala may fail with a FileNotFoundException if a 
> file has been removed by the Hive cleaner.
> In IMPALA-10801, we check the latest compaction id before serving metadata from 
> catalogd. However, coordinators don't take advantage of that. Coordinators 
> have a local cache, so we have to do the same compaction check for 
> coordinators as well.






[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails

2021-12-02 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452686#comment-17452686
 ] 

Quanlong Huang commented on IMPALA-10886:
-

I think I know what happens now. For self-event detection, we don't detect all 
DROP_PARTITION events:
https://github.com/apache/impala/blob/097b10104f23e0927d5b21b43a79f6cc10425f59/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L1983
{code:java}
  public static class DropPartitionEvent extends MetastoreTableEvent {
    ...
    protected SelfEventContext getSelfEventContext() {
      throw new UnsupportedOperationException("self-event evaluation is not needed for "
          + "this event type");
    }
{code}

I think this is by design. Instead, we have logic in 
{{CatalogOpExecutor#canDropPartitionFromEvent()}} to skip DROP_PARTITION 
events: 
https://github.com/apache/impala/blob/cc6f6d5c91ba1db3fca83c65f7d2f87c98077025/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4138

The issue is that we lose the CreateEventId of the newly added partitions when 
reloading them in updateCatalog(), i.e. the {{partitionToEventId}} is not used 
in the following {{loadTableMetadata}} call. So their CreateEventId is -1 and 
they get dropped by the DROP_PARTITION event.
https://github.com/apache/impala/blob/cc6f6d5c91ba1db3fca83c65f7d2f87c98077025/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L6260
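
To make the failure mode concrete, here is a minimal standalone sketch of the comparison described above. It is not the Impala code: the class name, method name and constant are hypothetical, and only the "create event id greater than drop event id" rule is taken from this comment.

```java
final class DropPartitionCheck {
  /** Hypothetical sentinel for "create event id unknown" (the -1 described above). */
  static final long UNKNOWN_EVENT_ID = -1L;

  private DropPartitionCheck() {}

  /**
   * Returns true if a DROP_PARTITION event with id dropEventId is allowed to
   * drop a partition whose create event id is partitionCreateEventId.
   */
  static boolean canDrop(long partitionCreateEventId, long dropEventId) {
    // A partition created after the drop event was generated must be kept.
    return partitionCreateEventId <= dropEventId;
  }

  public static void main(String[] args) {
    // Partition recreated by the INSERT, but its create event id was lost.
    System.out.println(canDrop(UNKNOWN_EVENT_ID, 100L));
    // Same partition if the create event id had been preserved.
    System.out.println(canDrop(120L, 100L));
  }
}
```

With the id lost, the sentinel -1 is never greater than a real event id, so the stale DROP_PARTITION event is applied to the freshly inserted partition.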

> TestReusePartitionMetadata.test_reuse_partition_meta fails
> --
>
> Key: IMPALA-10886
> URL: https://issues.apache.org/jira/browse/IMPALA-10886
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: test_local_catalog.patch
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/
> {code}
> custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta
> self.check_missing_partitions(unique_database, 1)
> custom_cluster/test_local_catalog.py:595: in check_missing_partitions
> assert match.group(1) == str(partition_misses)
> E   assert '0' == '1'
> E - 0
> E + 1
> {code}






[jira] [Assigned] (IMPALA-11046) When GetTupleIdx fails, it should return INVALID_IDX, not bring down impalad

2021-12-02 Thread Steve Carlin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Carlin reassigned IMPALA-11046:
-

Assignee: Steve Carlin

> When GetTupleIdx fails, it should return INVALID_IDX, not bring down impalad
> 
>
> Key: IMPALA-11046
> URL: https://issues.apache.org/jira/browse/IMPALA-11046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>
> The following code exists in runtime/descriptors.cc:
>  int RowDescriptor::GetTupleIdx(TupleId id) const {
>    DCHECK_LT(id, tuple_idx_map_.size()) << "RowDescriptor: " << DebugString();
>    return tuple_idx_map_[id];
>  }
>  
> If the id doesn't exist in the map, it returns INVALID_IDX. However, if the 
> id >= tuple_idx_map_.size(), it crashes the server.
> I was working on a frontend issue where I passed an incorrect index. The query 
> failed in both cases, but it is much preferable to fail only the query than to 
> crash the server. So the proposal here is to get rid of the DCHECK_LT and 
> "return INVALID_IDX" instead when the check fails.






[jira] [Created] (IMPALA-11046) When GetTupleIdx fails, it should return INVALID_IDX, not bring down impalad

2021-12-02 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-11046:
-

 Summary: When GetTupleIdx fails, it should return INVALID_IDX, not 
bring down impalad
 Key: IMPALA-11046
 URL: https://issues.apache.org/jira/browse/IMPALA-11046
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Steve Carlin


The following code exists in runtime/descriptors.cc:

 int RowDescriptor::GetTupleIdx(TupleId id) const {
   DCHECK_LT(id, tuple_idx_map_.size()) << "RowDescriptor: " << DebugString();
   return tuple_idx_map_[id];
 }

If the id doesn't exist in the map, it returns INVALID_IDX. However, if the id 
>= tuple_idx_map_.size(), it crashes the server.

I was working on a frontend issue where I passed an incorrect index. The query 
failed in both cases, but it is much preferable to fail only the query than to 
crash the server. So the proposal here is to get rid of the DCHECK_LT and 
"return INVALID_IDX" instead when the check fails.






[jira] [Created] (IMPALA-11045) Should start transaction when auto_check_compaction is enabled

2021-12-02 Thread Yu-Wen Lai (Jira)
Yu-Wen Lai created IMPALA-11045:
---

 Summary: Should start transaction when auto_check_compaction is 
enabled
 Key: IMPALA-11045
 URL: https://issues.apache.org/jira/browse/IMPALA-11045
 Project: IMPALA
  Issue Type: Bug
Reporter: Yu-Wen Lai
Assignee: Yu-Wen Lai


This is a follow-up of IMPALA-11032. Currently Impala doesn't open a transaction 
for select queries, so we might still get a FileNotFound error if a compaction 
happens after the compaction check and the files get cleaned by the cleaner. 
To provide a strong guarantee for this compaction checking approach, we need 
to open a transaction even for select queries so that the cleaner won't clean 
the files.






[jira] [Commented] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.

2021-12-02 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452610#comment-17452610
 ] 

Vihang Karajgaonkar commented on IMPALA-8592:
-

One of the use cases here is multiple Impala clusters: a LOAD DATA statement 
in one Impala cluster will not generate any events, and hence the table will 
need to be refreshed on all the Impala clusters.

> Add support for insert events for 'LOAD DATA..' statements from Impala.
> ---
>
> Key: IMPALA-8592
> URL: https://issues.apache.org/jira/browse/IMPALA-8592
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Anurag Mantripragada
>Priority: Major
>
> Hive generates INSERT events for LOAD DATA.. statements. We should support 
> the same in Impala.






[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails

2021-12-02 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452520#comment-17452520
 ] 

Vihang Karajgaonkar commented on IMPALA-10886:
--

Do we know why we don't detect DROP_PARTITION as self-event in this case?

> TestReusePartitionMetadata.test_reuse_partition_meta fails
> --
>
> Key: IMPALA-10886
> URL: https://issues.apache.org/jira/browse/IMPALA-10886
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: test_local_catalog.patch
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/
> {code}
> custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta
> self.check_missing_partitions(unique_database, 1)
> custom_cluster/test_local_catalog.py:595: in check_missing_partitions
> assert match.group(1) == str(partition_misses)
> E   assert '0' == '1'
> E - 0
> E + 1
> {code}






[jira] [Resolved] (IMPALA-9857) Batch ALTER_PARTITION events

2021-12-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-9857.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> When Hive inserts data into partitioned tables, it generates a lot of 
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. 
> Currently, such events are processed one by one by EventsProcessor, which can 
> be slow and can cause EventsProcessor to lag behind. This JIRA proposes to 
> batch such ALTER_PARTITION events so that all successive ALTER_PARTITION 
> events for the same table are combined into one ALTER_PARTITIONS event and 
> then processed together to refresh all the partitions from the events. This 
> can significantly speed up event processing in such cases.
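
The batching idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: the Event class and the batch() helper are hypothetical and are not the EventsProcessor API. The sketch groups each maximal run of consecutive ALTER_PARTITION events on the same table and leaves every other event in a batch of its own.

```java
import java.util.ArrayList;
import java.util.List;

final class AlterPartitionBatcher {
  /** Hypothetical minimal event: id, event type and fully qualified table name. */
  static final class Event {
    final long id;
    final String type;
    final String table;

    Event(long id, String type, String table) {
      this.id = id;
      this.type = type;
      this.table = table;
    }
  }

  private AlterPartitionBatcher() {}

  /** Groups consecutive ALTER_PARTITION events on the same table into one batch. */
  static List<List<Event>> batch(List<Event> events) {
    List<List<Event>> batches = new ArrayList<>();
    List<Event> current = null;
    for (Event e : events) {
      // An event extends the current batch only if both are ALTER_PARTITION
      // events and both refer to the same table.
      boolean extendsRun = current != null
          && "ALTER_PARTITION".equals(e.type)
          && "ALTER_PARTITION".equals(current.get(0).type)
          && current.get(0).table.equals(e.table);
      if (!extendsRun) {
        current = new ArrayList<>();
        batches.add(current);
      }
      current.add(e);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Event> events = new ArrayList<>();
    events.add(new Event(1, "ALTER_PARTITION", "db.t1"));
    events.add(new Event(2, "ALTER_PARTITION", "db.t1"));
    events.add(new Event(3, "ALTER_PARTITION", "db.t2"));
    System.out.println(batch(events).size() + " batches");
  }
}
```

Each batch with more than one element would then be refreshed in a single pass instead of once per event. Event order is preserved, so events of other types still act as batch boundaries.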






[jira] [Resolved] (IMPALA-11028) Table loading could fail if metastore cleans up old events

2021-12-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-11028.
--
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Table loading could fail if metastore cleans up old events
> --
>
> Key: IMPALA-11028
> URL: https://issues.apache.org/jira/browse/IMPALA-11028
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> After IMPALA-10502, Catalogd tracks the table's create event id. When the 
> table is loaded for the first time, it updates the create event id of the 
> table. But if the table is loaded for the first time after a long delay 
> (more than 24 hrs), it is possible that the metastore has already cleaned up 
> old notification log entries which catalogd requires during the table load.
> See this snippet from TableLoader.java:
> {noformat}
>   if (eventId != -1 && catalog_.isEventProcessingActive()) {
>     // If the eventId is not -1 it means this table was likely created by Impala.
>     // However, since the load operation of the table can happen much later, it is
>     // possible that the table was recreated outside Impala and hence the eventId
>     // which is stored in the loaded table needs to be updated to the latest.
>     // we are only interested in fetching the events if we have a valid eventId
>     // for a table. For tables where eventId is unknown are not created by
>     // this catalogd and hence the self-event detection logic does not apply.
>     events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId,
>         notificationEvent -> CreateTableEvent.CREATE_TABLE_EVENT_TYPE
>             .equals(notificationEvent.getEventType())
>             && notificationEvent.getDbName().equalsIgnoreCase(db.getName())
>             && notificationEvent.getTableName().equalsIgnoreCase(tblName));
>   }
> {noformat}
> The {{getNextMetastoreEvents}} method can throw the following exception if the 
> metastore has cleaned up older entries (after 24 hrs by default). This is 
> controlled by the configuration {{hive.metastore.event.db.listener.timetolive}} 
> on the metastore side.
> I could reproduce the problem by setting the following metastore configs.
> {noformat}
> hive.metastore.event.db.listener.clean.interval=10s
> hive.metastore.event.db.listener.timetolive=120s
> {noformat}
> Now run the following Impala script
> {noformat}
> create table t1 (c1 int);
> create table t2 (c1 int);
> select sleep(24);
> create table t3 (c1 int);
> select * from t1;
> {noformat}






[jira] [Comment Edited] (IMPALA-11042) Special characters are not escaped during LDAP search bind authentication

2021-12-02 Thread Tamas Mate (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452501#comment-17452501
 ] 

Tamas Mate edited comment on IMPALA-11042 at 12/2/21, 4:42 PM:
---

It is a bit tricky: when I add an extra escape {{\}} within Impala, the 
following works:
{code}
(uniqueMember=cn=Doe\\, John,ou=Users2,dc=myorg,dc=com)
{code}

However, when I use {{ldapsearch}} with bash, I have to add an additional {{\}}:
{code}
(uniqueMember=cn=Doe\\\, John,ou=Users2,dc=myorg,dc=com)
{code}

So far I have tested this with AD and the LDAP which is embedded in the unit 
tests; in both cases the double backslash worked.


was (Author: tmate):
It is a bit tricky, when I add an extra escape {{\}} within Impala, the 
following works:
{code}
(uniqueMember=cn=Doe\\, John,ou=Users2,dc=myorg,dc=com)
{code}

However, when I use {{ldapsearch}} with bash, I have to add an additional {{\}}:
{code}
(uniqueMember=cn=Doe\\\, John,ou=Users2,dc=myorg,dc=com)
{code}

So far I have tested this with AD and the LDAP which is embedded in the unit 
tests.

> Special characters are not escaped during LDAP search bind authentication
> -
>
> Key: IMPALA-11042
> URL: https://issues.apache.org/jira/browse/IMPALA-11042
> Project: IMPALA
>  Issue Type: Bug
>  Components: Security
>Affects Versions: Impala 4.0.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Major
>
> For search bind authentication, the {1} notation is allowed during group 
> search; it represents the user's distinguished name, which is extracted from 
> the result of the user search. In certain use cases this can contain special 
> characters, for example this is a valid {{dn: cn=Doe\, 
> John,ou=Users2,dc=myorg,dc=com}}. This string is then used to create a group 
> search filter. However, these characters have to be escaped properly on the 
> client side; without that the following happens:
> {code}
> W1201 15:27:45.801143 32013 ldap-util.cc:196] LDAP search failed with base 
> DN=ou=Groups,dc=myorg,dc=com and filter=(uniqueMember=cn=Doe\, 
> John,ou=Users2,dc=myorg,dc=com) : Bad search filter
> {code}






[jira] [Commented] (IMPALA-11042) Special characters are not escaped during LDAP search bind authentication

2021-12-02 Thread Tamas Mate (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452501#comment-17452501
 ] 

Tamas Mate commented on IMPALA-11042:
-

It is a bit tricky: when I add an extra escape {{\}} within Impala, the 
following works:
{code}
(uniqueMember=cn=Doe\\, John,ou=Users2,dc=myorg,dc=com)
{code}

However, when I use {{ldapsearch}} with bash, I have to add an additional {{\}}:
{code}
(uniqueMember=cn=Doe\\\, John,ou=Users2,dc=myorg,dc=com)
{code}

So far I have tested this with AD and the LDAP which is embedded in the unit 
tests.

> Special characters are not escaped during LDAP search bind authentication
> -
>
> Key: IMPALA-11042
> URL: https://issues.apache.org/jira/browse/IMPALA-11042
> Project: IMPALA
>  Issue Type: Bug
>  Components: Security
>Affects Versions: Impala 4.0.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Major
>
> For search bind authentication, the {1} notation is allowed during group 
> search; it represents the user's distinguished name, which is extracted from 
> the result of the user search. In certain use cases this can contain special 
> characters, for example this is a valid {{dn: cn=Doe\, 
> John,ou=Users2,dc=myorg,dc=com}}. This string is then used to create a group 
> search filter. However, these characters have to be escaped properly on the 
> client side; without that the following happens:
> {code}
> W1201 15:27:45.801143 32013 ldap-util.cc:196] LDAP search failed with base 
> DN=ou=Groups,dc=myorg,dc=com and filter=(uniqueMember=cn=Doe\, 
> John,ou=Users2,dc=myorg,dc=com) : Bad search filter
> {code}






[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails

2021-12-02 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452389#comment-17452389
 ] 

Quanlong Huang commented on IMPALA-10886:
-

Yeah, this issue happens when the INSERT finishes before the DROP_PARTITION 
event is processed.

I think there are two approaches to fix this. One is to detect the 
DROP_PARTITION event as a self-event and skip it. The other is to make sure 
the new partition has a larger createEventId so the DROP_PARTITION event won't 
be processed:
{code:java}
  private boolean canDropPartitionFromEvent(long eventId, HdfsTable hdfsTable,
      List values) throws CatalogException {
    ...
    // if the partition has been created since the event was generated, skip
    // dropping the event.
    // The CreateEventId is -1 in this case.
    if (hdfsPartition.getCreateEventId() > eventId) {
      LOG.info("Not dropping partition {} of table {} since it's create event id {} is "
          + "higher than eventid {}", hdfsPartition.getPartitionName(),
          hdfsTable.getFullName(), hdfsPartition.getCreateEventId(), eventId);
      return false;
    }
    return true;
  }
{code}
In this case, the partition is reloaded while executing the updateCatalog() 
request for the INSERT. Its CreateEventId is -1, so the check fails.

[~vihangk1], what do you think?

> TestReusePartitionMetadata.test_reuse_partition_meta fails
> --
>
> Key: IMPALA-10886
> URL: https://issues.apache.org/jira/browse/IMPALA-10886
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: test_local_catalog.patch
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/
> {code}
> custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta
> self.check_missing_partitions(unique_database, 1)
> custom_cluster/test_local_catalog.py:595: in check_missing_partitions
> assert match.group(1) == str(partition_misses)
> E   assert '0' == '1'
> E - 0
> E + 1
> {code}


