[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13:

(2 comments)

> Patch Set 13: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/

To reproduce the compilation error, you need to execute "export 
USE_APACHE_HIVE=true" before using buildall.sh.

http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File 
fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java@955
PS13, Line 955: MetaStoreServerUtils
This class doesn't exist in apache-hive-3.1.3. I think we should still use 
MetaStoreUtils#validateColumnName().

https://github.com/apache/hive/blob/rel/release-3.1.3/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L505-L507

If the implementation of this method in two MetastoreShim classes are the same, 
we can put this method in Hive3MetastoreShimBase.


http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/20506/13/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@125
PS13, Line 125: import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
nit: already imported at L87



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 08 Dec 2023 07:36:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14618/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 08 Dec 2023 03:01:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20485/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20485/3//COMMIT_MSG@7
PS3, Line 7: IMPALA-10949: Improve batching logic of partition
   : events
nit: put the title in one line


http://gerrit.cloudera.org:8080/#/c/20485/3/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/20485/3/tests/custom_cluster/test_events_custom_configs.py@403
PS3, Line 403: EventProcessorUtils.wait_for_event_processing(self)
Can we also verify all events are skipped? The metric of "events-skipped" can 
be checked.



--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 08 Dec 2023 02:34:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14617/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 08 Dec 2023 02:35:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-07 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1556
PS3, Line 1556: wasEventSyncTurnedOn
> Wouldn't it be better to move this block earlier, before self event check?
In the above scenario, first (2) is executed, and while reloading table 
metadata it'll fetch altered table property from HMS.
At (3), if the table is loaded, it'll mark the table as stale (See L#809). When 
(4) arrives, inflight versions will not have (2) since the table was marked as 
stale in the previous step, so the event is processed and marked that event 
sync is enabled.
If there is no external event before (4), then the event would have been 
skipped as self-event.

I have added an end-to-end test to mimic this scenario.


http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1557
PS3, Line 1557: handleEventSyncTurnedOn();
> I think that moving this block to a separate function like handleEventSyncT
Ack


http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1562
PS3, Line 1562: oad.
> Are we prepared for throwing an exception here? Using getTableNoThrow seems
Good point. will implement your suggestion



--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 08 Dec 2023 02:22:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-07 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20648

to look at the new patch set (#4).

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..

IMPALA-10987: Changing impala.disableHmsSync in
Hive should not break event processing

Currently we require a global invalidate to reset the events processor
if the events sync is re-enabled on a table from HMS. This patch
eliminates the need to reset the catalog cache when events sync is
re-enabled.

Implementation details: when events sync is re-enabled on table via HMS
1) If the table exists in Impala,
  a) We can just invalidate the table, if the current event is greater
than the create event id of the table, so that it is reloaded the first
time query accesses it.
  b) Otherwise we can just ignore the event.
2) If the table doesn't exist in Impala, create a Incomplete table, if
there is no entry in the event delete log for this table.

Note: If the eventSync is disabled on a table, for all subsequent table
events, if the table object is loaded, mark the table as stale, so that
it is reloaded the next time query accesses it.

Testing:
1) manually verified few scenarios.
2) Added test case for the above scenarios.

Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 157 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/20648/4
--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-07 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20491

to look at the new patch set (#4).

Change subject: IMPALA-12443: Add catalog timeline for all ddl profiles
..

IMPALA-12443: Add catalog timeline for all ddl profiles

This is a follow-up work of IMPALA-12024 where we add the catalog
timeline for CreateTable statements. Using the same mechanism, this
patch adds catalog timeline for all DDL/DML profiles, including
REFRESH and INSERT.

The goal is to add timeline markers after each step that could be
blocked, e.g. acquiring locks, external RPCs. Tried to add some constant
strings for widely used events, e.g. "Fetched table from Metastore".
Didn't do so for events that only occurs once.

Most of the catalog methods have a new argument for tracking the
execution timeline. To avoid adding null checks everywhere, for code
paths that don't need a catalog profile, e.g. EventProcessor, creates an
unused catalogTimeline as the argument. We can use them in future works,
e.g. expose execution timeline of a slow processing on an HMS event.

This patch also removes some unused overloads of HdfsTable#load() and
HdfsTable#reloadPartitionsFromNames().

Tests:
 - Add e2e test to verify the catalog timeline in some DDLs.

Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/EventSequence.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M tests/query_test/test_observability.py
33 files changed, 924 insertions(+), 603 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/20491/4
--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14616/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:55:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:34:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39
PS5, Line 39: const int ENGINE_CACHE_SIZE = 990;
> This is larger than the value that was here before. Was the test failing un
Yeah, this is the real number of the cache containing one compiled function. 
The earlier count was misleading as the cache was empty. I've adjusted the 
testcase, and added LlvmCodeGenCacheTest::CheckObjCacheExists to confirm the 
existence of contents within the cache.


http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176
PS5, Line 176: // The function is to create and return a CodeGenObjectCache 
which contains a specific
> nit: "contains" not "contians"
Done



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, after the compilation of the module, the compiled
codegened functions of the module would be set to the execution
engine, therefore funtions could be used by Impala. During function
compilation within the module, if an ObjectCache is set to the
execution engine, the compiled codegened functions would be also
written into the cache. This way, if we keep the cache, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc
File be/src/codegen/llvm-codegen-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@39
PS5, Line 39: const int ENGINE_CACHE_SIZE = 990;
This is larger than the value that was here before. Was the test failing until 
this patch set?


http://gerrit.cloudera.org:8080/#/c/20733/5/be/src/codegen/llvm-codegen-cache-test.cc@176
PS5, Line 176: // The function to create and return a CodeGenObjectCache which 
contians a specific
nit: "contains" not "contians"



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 08 Dec 2023 00:07:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 23:41:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

2023-12-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20681 )

Change subject: IMPALA-12322: Support converting UTC timestamps read from Kudu 
to local time
..


Patch Set 12: Code-Review+1

(3 comments)

Great work! Only minor comments.

http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/20681/12/common/function-registry/impala_functions.py@257
PS12, Line 257:   [['to_utc_timestamp'], 'TIMESTAMP', ['TIMESTAMP', 'STRING', 
'BOOLEAN'],
  :"impala::TimestampFunctions::ToUtcUnambiguous"],
Maybe this should rather go to the invisible functions list?
I am not sure here - while I would prefer to not make this a supported 
functions, it can be useful for users in some cases.


http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20681/12/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@744
PS12, Line 744:*/
Can you mention that now this can return a list for timestamps?


http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test:

http://gerrit.cloudera.org:8080/#/c/20681/12/testdata/workloads/functional-query/queries/QueryTest/kudu_predicate_with_timestamp_conversion.test@18
PS12, Line 18:  order by id
here and at other queries: "order by id" is not necessary - the rows in the 
results section + the actual results are ordered before comparison by default 
in EE tests

otherwise most tests would need order by, as Impala has no guarantee for the 
order of returned rows if there are multiple files in a table



--
To view, visit http://gerrit.cloudera.org:8080/20681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Gerrit-Change-Number: 20681
Gerrit-PatchSet: 12
Gerrit-Owner: Zihao Ye 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 07 Dec 2023 22:39:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14615/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 07 Dec 2023 20:01:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 07 Dec 2023 19:49:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20485

to look at the new patch set (#3).

Change subject: IMPALA-10949: Improve batching logic of partition events
..

IMPALA-10949: Improve batching logic of partition
events

Currently the batching logic for partitions evaluates self-event check
by looking at the event id of the last event in the batch. This patch
improves the batching logic by evaluating table's lastSyncEventId to
the current event's event id and decide whether to batch the event or
not. This way we can have fewer batch sizes and avoid self events if
any, beforehand.

Testing: Added an end-to-end test to verfiy the batch size for
ALTER_PARTITION batch events.

Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 38 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/20485/3
--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@741
PS2, Line 741: Older
> the name doesn't match the semantics, the function returns true if the even
Ack


http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@747
PS2, Line 747:
> Exception handling could be skipped by using getTableNoThrow instead of get
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 07 Dec 2023 19:33:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 19:13:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9985/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 19:13:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables

2023-12-07 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20677 )

Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
..


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h
File be/src/exec/iceberg-delete-sink.h:

http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h@81
PS9, Line 81:   /// Verifies that the row batch does not contain duplicated 
rows.
Could you add an explanation of intent, it is not clear for me why this is 
needed.


http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/iceberg-delete-sink.h@82
PS9, Line 82: VerifyRowBatch
Nit: For readability could you rename to something like: 
VerifyRowsNotDuplicated/VerifyRowDistinctiveness


http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/multi-table-sink.cc
File be/src/exec/multi-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/exec/multi-table-sink.cc@52
PS9, Line 52: tsinkBase
nit: tsink_base


http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/20677/9/be/src/runtime/dml-exec-state.h@72
PS9, Line 72:   void UpdatePartition(const std::string& partition_name,
:   int64_t num_rows_delta, const DmlStatsPB* insert_stats,
:   bool is_delete = false);
nit: should fit into 2 lines


http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java@120
PS9, Line 120: id
nit: ++nextTableId_ and return nextTableId_


http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java:

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@87
PS9, Line 87:   public List getDeletePartitionExprs(Analyzer analyzer)
:   throws AnalysisException {
nit: should fit into 1 line


http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/IcebergModifyImpl.java@94
PS9, Line 94:   public List getDeleteResultExprs(Analyzer analyzer)
:   throws AnalysisException {
nit: should fit into 1 line


http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/planner/Planner.java@185
PS9, Line 185:
nit: indentation



--
To view, visit http://gerrit.cloudera.org:8080/20677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08
Gerrit-Change-Number: 20677
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Dec 2023 17:55:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10987: Changing impala.disableHmsSync in Hive should not break event processing

2023-12-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20648 )

Change subject: IMPALA-10987: Changing impala.disableHmsSync in Hive should not 
break event processing
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1556
PS3, Line 1556: wasEventSyncTurnedOn
Wouldn't it be better to move this block earlier, before self event check?

I am unsure about the handling of enabling metadata sync as self event. Assume 
the following scenario:
0. event processing starts disabled for a table
1. another component changes a random table prop with ALTER TABLE
2. before processing the event from 1 impala enables hms sync on the table
3. the event from 1 arrives, but catalogd skips it as hms sync was still 
disabled at that point
4. the event from 2 arrives, but it is skipped at it is a self event

I think that at 4 catalogd should mark the table as stale. as it didn't process 
event from 1


http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1557
PS3, Line 1557: // check if the table exists or not. 1) if the table 
doesn't exist create an
I think that moving this block to a separate function like 
handleEventSyncTurnedOn() would make the main path clearer.


http://gerrit.cloudera.org:8080/#/c/20648/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1562
PS3, Line 1562: getTable
Are we prepared for throwing an exception here? Using getTableNoThrow seems 
more fitting to avoid exception when the DB also doesn't exist (not sure if 
this is possible at this point)



--
To view, visit http://gerrit.cloudera.org:8080/20648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37055990be49e91462ebc98aa97009ca768a0072
Gerrit-Change-Number: 20648
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 07 Dec 2023 15:41:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20760 )

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14614/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Dec 2023 15:36:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

2023-12-07 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20760

to look at the new patch set (#3).

Change subject: IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.
..

IMPALA-12313: (part 3) Add UPDATE support for Iceberg tables.

Part 2 had some limitations, i.e. it could not update Iceberg
tables if any of the following were true:
 * UPDATE value of partitioning column
 * UPDATE table that went through partition evolution
 * Table has SORT BY properties

The problem with partitions is that the delete record and new
data record might belong to different partitions and records
are shuffled across based on the partitions of the delete
records, hence the data files might not get written efficiently.

The problem with SORT BY properties, is that we need to write
the position delete files ordered by (file_path, position).

To address the above problems, this patch set introduces a new
backend operator: IcebergBufferedDeleteSink. This new operator
extracts and aggregates the delete record information from
the incoming row batches, then in FlushFinal it orders the
position delete records and writes them out to files. This
mechanism is similar to Hive's approach:
https://github.com/apache/hive/pull/3251

Now records can get shuffled around based on the new data records'
partition values, and the SORT operator sorts the records based on
the SORT BY properties.

There's only one case we don't allow the UPDATE statement:
 * UPDATE partition column AND
 * Right-hand side of assignment is non-constant expression AND
 * UPDATE statement has a JOIN

When all of the above conditions meet, it would be possible to
have an incorrect JOIN condition that has multiple matches for the
data records, then the duplicated records would be shuffled
independently (based on the new partition value) to different
backend SINKs, and the different backend SINK would not be able
to detect the duplicates.

If any of the above conditions was false, then the duplicated records
would be shuffled together to the same SINK, that could do the
duplicate check.

This patch also moves some code from IcebergDeleteSink to the
newly introduced IcebergDeleteSinkBase.

TODO:
 * planner tests
 * e2e tests
 * concurrent tests
 * Impala/Hive interop tests

Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
A be/src/exec/iceberg-buffered-delete-sink.cc
A be/src/exec/iceberg-buffered-delete-sink.h
A be/src/exec/iceberg-delete-sink-base.cc
A be/src/exec/iceberg-delete-sink-base.h
A be/src/exec/iceberg-delete-sink-config.cc
A be/src/exec/iceberg-delete-sink-config.h
M be/src/exec/iceberg-delete-sink.cc
M be/src/exec/iceberg-delete-sink.h
M be/src/exec/table-sink-base.cc
M be/src/exec/table-sink-base.h
M be/src/exprs/slot-ref.h
M be/src/runtime/dml-exec-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
A fe/src/main/java/org/apache/impala/planner/IcebergBufferedDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-update.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-basic.test
29 files changed, 1,168 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/20760/3
--
To view, visit http://gerrit.cloudera.org:8080/20760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2bb97b4454165a292975d88dc9c23adb22ff7315
Gerrit-Change-Number: 20760
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 14:51:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14613/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 14:43:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14612/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Dec 2023 14:26:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 13:

(1 comment)

> Uploaded patch set 13.

http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py
File tests/metadata/test_column_unicode.py:

http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py@35
PS12, Line 35: 
cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
> This test still runs "for different file formats" which is completely redun
Done



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 14:16:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..

IMPALA-12465: Unicode column name support

Impala depends on Hive functions for column name validation and uses
validateName() function for the same. Since Hive already supports
unicode column names, the patch just updates the column name validation
function to validateColumnName(). validateName() checks for a certain
conformance based on pattern matching standards while
validateColumnName() places no restrictions on column names at the
Metadata level.

Testing: The support is tested and cross-checked with Hive. The tests
can be found in unicode-column-name.test.

Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
A tests/metadata/test_column_unicode.py
M tests/shell/test_shell_interactive.py
6 files changed, 374 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/13
--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-07 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20491

to look at the new patch set (#3).

Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles
..

WIP IMPALA-12443: Add catalog timeline for all ddl profiles

This patch adds catalog timeline for all DDL/DML profiles, including
REFRESH and INSERT.

The goal is to add timeline markers after each step that could be
blocked, e.g. acquiring locks, external RPCs.

Removed some unused overloads of HdfsTable#load() and
HdfsTable#reloadPartitionsFromNames().

Tried to add some constant strings for widely used events, e.g. "Fetched
table from Metastore". Didn't do so for events that only occurs once.

Most of the catalog methods have a new argument for tracking the
execution timeline. To avoid adding null checks everywhere, for code
paths that don't have a catalog profile, creates an unused
catalogTimeline as the argument.

Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogdTableInvalidator.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/MaterializedViewHdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/EventSequence.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
28 files changed, 783 insertions(+), 544 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/20491/3
--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-12443: Add catalog timeline for all ddl profiles

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20491 )

Change subject: WIP IMPALA-12443: Add catalog timeline for all ddl profiles
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20491/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20491/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1200
PS3, Line 1200:   fileFormatParams.getFile_format(), 
numUpdatedPartitions, catalogTimeline);
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbceefaeb24c66eb1a064c449d6f56077ea347c5
Gerrit-Change-Number: 20491
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Dec 2023 14:00:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14611/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:52:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py
File tests/metadata/test_column_unicode.py:

http://gerrit.cloudera.org:8080/#/c/20506/12/tests/metadata/test_column_unicode.py@35
PS12, Line 35: 
cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
This test still runs "for different file formats" which is completely redundant 
because the test file already contains different file formats, so the exact 
same test will run multiple times.



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:46:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@17
PS4, Line 17:  after the compila
> Could you clarify the relationship of the module and exec engine here? I.e.
Yes. Added some comments.



--
To view, visit http://gerrit.cloudera.org:8080/20733
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3c1b46bb9018ed0320817141785a3bdc41fa677
Gerrit-Change-Number: 20733
Gerrit-PatchSet: 5
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:38:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables

2023-12-07 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20677 )

Change subject: IMPALA-12313: (part 2) Limited UPDATE support for Iceberg tables
..


Patch Set 9:

(4 comments)

I went through this patch mostly for my own education. Left some minor 
questions or comments, nothing serious. Great work Zoli!

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
File fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java:

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java@54
PS9, Line 54:   abstract void substituteResultExprs(ExprSubstitutionMap smap, 
Analyzer analyzer);
nit: could you add a comment for this function


http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/20677/9/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@1214
PS9, Line 1214:   private static Expr 
getIcebergPartitionTransformExpr(IcebergPartitionField partField,
Peter's DROP PARTITION patch added some new functionality for converting 
strings into partition values (?) if I'm not mistaken. Just mentioning it here 
to see if any of those could be re-used.


http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
File 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test:

http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test@753
PS9, Line 753: AuthorizationException: User '$USER' does not have privileges to 
access: functional_parquet.iceberg_v2_delete_positional
Isn't the error msg misleading. It says the user doesn't have permissions on 
the table, but in fact the issue is that there is column masking on the table.

This use case anyway made me think: wouldn't it make sense to allow the user to 
update these tables if they don't change the masked columns? I can understand 
that from implementation pow this could be difficult.


http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/20677/9/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@647
PS9, Line 647: masked tables
nit: I guess these are not masked tables, but tables with row filtering.



--
To view, visit http://gerrit.cloudera.org:8080/20677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0ef6075a2b6ebe130d15daa389ac1a505a7a08
Gerrit-Change-Number: 20677
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Dec 2023 13:36:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/3//COMMIT_MSG@28
PS3, Line 28: compared to the previous use of ExecutionEngine. Post-change,
> Doesn't the benchmarking framework print a grand total? If not, we can let
It seems our benchmarking framework doesn't print the total. Will double check 
and file a jira if we don't have it.


http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20733/4//COMMIT_MSG@18
PS4, Line 18:  the executi
> Why are they pre-compiled? Aren't they compiled already? Or you mean they a
I think the "pre-compiled" means it is compiled in the last round and ready for 
use. Yeah, maybe call it "compiled" is better, because in this case it means 
the first round to write the compiled stuff into the cache.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.h@164
PS4, Line 164: cached_engine
> Thinking about this, wouldn't 'cached_engine...' express the meaning better
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-cache.cc@155
PS4, Line 155: y
> I think removing the negation and inverting the branches is better, one les
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc
File be/src/codegen/llvm-codegen-object-cache.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@39
PS4, Line 39: uffer().size();
> Is a copy necessary here? Can't we move the buffer somehow?
CodeGenObjectCache doesn't have the ownership of ObjBuffer, the llvm engine may 
use the ObjBuffer later, so it is safer to make a copy. (And MemoryBufferRef 
may not allow to move the real Buffer, the example I can see is to do a copy)


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@42
PS4, Line 42: // can conserve memory.
> Shouldn't we add a DCHECK here? If this happens it is a bug.
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@57
PS4, Line 57:   // ID. If the ID doesn't match the one in the object cache, a 
warning is logged for
> Will it always copy the buffer? Is there a way to access it without copying
>From the definition of the interface getObject(), "Returns a pointer to a 
>newly allocated MemoryBuffer that contains the object which corresponds with 
>Module M", it would be safer to return the copy of the buffer. 
>https://llvm.org/doxygen/classllvm_1_1ObjectCache.html#details


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@60
PS4, Line 60:   std::lock_guard l(mutex_);
> Shouldn't we add a DCHECK here? If this happens it is a bug.
Done


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen-object-cache.cc@63
PS4, Line 63:   << cached_obj_->getMemBufferRef().getBufferSize();
> Can this function be called when 'key_' is empty? Isn't that a bug?
If there is no cache yet, would return nullptr, for example, the llvm engine 
would look up the cache to see if it contains the pre-compiled module, if it 
doesn't contain any, the llvm engine would proceed to compile the module and 
then write compiled module to the cache. This happens in cache look up and 
doesn't hit any.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@330
PS4, Line 330:
> See comment on llvm-codegen-cache.h:164.
Changed the one in llvm-codegen-cache.h, but would prefer to keep this, because 
this one is the cache for llvm engine to write the compiled functions to, it 
doesn't have to be anything "cached". Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@925
PS4, Line 925:
> Do I understand it correctly that this is for writing to the cache (storing
Yes, we can say that the engine_cache_ is for writing, and engine_cache_cached_ 
is for reading. Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.h@929
PS4, Line 929:
> We could extend the comment with what is written on llvm-codegen.cc:1499, i
Added some comments.


http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20733/4/be/src/codegen/llvm-codegen.cc@1192
PS4, 

[Impala-ASF-CR] IMPALA-11805: Use llvm ObjectCache for codegen caching

2023-12-07 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/20733 )

Change subject: IMPALA-11805: Use llvm ObjectCache for codegen caching
..

IMPALA-11805: Use llvm ObjectCache for codegen caching

Currently, we employ llvm::ExecutionEngine for codegen caching,
providing access to compiled functions within the cached engine.
However, the real challenge is the ExecutionEngine uses a lot of
memory which largely exceeds our memory estimates and it is very
hard to predict.

This patch addresses this issue by using llvm::ObjectCache for
codegen caching. In our case, each execution engine would have
only one module, after the compilation of the module, the compiled
codegened functions of the module would be set to the execution
engine, therefore funtions could be used by Impala. During function
compilation within the module, if an ObjectCache is set to the
execution engine, the compiled codegened functions would be also
written into the cache. This way, if we keep the cache, when
revisiting the same module (fragment), we can efficiently reuse
the specific ObjectCache, loading pre-compiled codegened functions
and saving time.

The tpch performance test indicates no significant regression
compared to the previous use of ExecutionEngine. Post-change,
the actual memory usage of each codegen caching entry is notably
reduced.

+--+--+---++-++---++---++-+---+
| Workload | Query| File Format   | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval  |
+--+--+---++-++---++---++-+---+
| TPCH(1)  | TPCH-Q15 | parquet / none / none | 0.48   | 0.46|   +5.16% 
  |   4.31%   |   4.78%| 8 |   +3.51%   | 2.11| 2.22  |
| TPCH(1)  | TPCH-Q17 | parquet / none / none | 1.28   | 1.25|   +1.76% 
  |   1.83%   |   1.88%| 8 |   +3.53%   | 1.34| 1.88  |
| TPCH(1)  | TPCH-Q8  | parquet / none / none | 0.77   | 0.75|   +3.11% 
  |   3.88%   |   2.75%| 8 |   +0.79%   | 0.83| 1.81  |
| TPCH(1)  | TPCH-Q7  | parquet / none / none | 0.72   | 0.71|   +1.89% 
  |   3.48%   |   3.15%| 8 |   +1.61%   | 0.83| 1.13  |
| TPCH(1)  | TPCH-Q22 | parquet / none / none | 0.43   | 0.41|   +3.40% 
  |   1.21%   |   6.11%| 8 |   +0.05%   | 0.45| 1.54  |
| TPCH(1)  | TPCH-Q10 | parquet / none / none | 0.74   | 0.73|   +2.04% 
  |   0.75%   |   2.57%| 8 |   +1.13%   | 2.11| 2.16  |
| TPCH(1)  | TPCH-Q20 | parquet / none / none | 0.55   | 0.54|   +1.42% 
  |   2.95%   |   1.14%| 8 |   +0.19%   | 0.57| 1.25  |
| TPCH(1)  | TPCH-Q11 | parquet / none / none | 0.25   | 0.25|   +0.37% 
  |   2.41%   |   3.09%| 8 |   +0.82%   | 0.45| 0.27  |
| TPCH(1)  | TPCH-Q19 | parquet / none / none | 0.62   | 0.61|   +0.92% 
  |   3.17%   |   1.53%| 8 |   +0.05%   | 0.19| 0.73  |
| TPCH(1)  | TPCH-Q4  | parquet / none / none | 0.42   | 0.42|   +0.37% 
  |   1.24%   |   1.08%| 8 |   +0.03%   | 0.19| 0.63  |
| TPCH(1)  | TPCH-Q5  | parquet / none / none | 0.72   | 0.72|   +0.19% 
  |   3.08%   |   3.23%| 8 |   +0.07%   | 0.57| 0.12  |
| TPCH(1)  | TPCH-Q18 | parquet / none / none | 1.20   | 1.19|   +0.04% 
  |   0.12%   |   0.15%| 8 |   +0.03%   | 0.57| 0.53  |
| TPCH(1)  | TPCH-Q9  | parquet / none / none | 1.13   | 1.13|   +0.02% 
  |   2.05%   |   2.04%| 8 |   -0.01%   | -0.57   | 0.02  |
| TPCH(1)  | TPCH-Q3  | parquet / none / none | 0.60   | 0.60|   +0.03% 
  |   3.71%   |   3.99%| 8 |   -0.04%   | -0.06   | 0.01  |
| TPCH(1)  | TPCH-Q2  | parquet / none / none | 0.44   | 0.44|   -0.70% 
  |   8.56%   |   7.42%| 8 |   +0.61%   | 0.45| -0.18 |
| TPCH(1)  | TPCH-Q6  | parquet / none / none | 0.31   | 0.31|   -0.07% 
  |   1.73%   |   1.56%| 8 |   -0.06%   | -1.34   | -0.09 |
| TPCH(1)  | TPCH-Q14 | parquet / none / none | 0.48   | 0.48|   -0.25% 
  |   1.20%   |   1.47%| 8 |   -0.09%   | -0.57   | -0.37 |
| TPCH(1)  | TPCH-Q12 | parquet / none / none | 0.54   | 0.54|   -0.86% 
  |   1.90%   |   4.48%| 8 |   +0.11%   | 0.32| -0.50 |
| TPCH(1)  | TPCH-Q21 | parquet / none / none | 1.35   | 1.37|   -1.56% 
  |   3.11%   |   2.55%| 8 |   -0.19%   | -0.70   | -1.11 |
| TPCH(1)  | TPCH-Q16 | parquet / none / none | 0.26   | 0.27|   -1.74% 
  |   9.53%   | * 13.61% * | 8 

[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 2:

(7 comments)

Left some minor comments, but the code looks good to me.

http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@126
PS2, Line 126: Recusrive
Recursive


http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@130
PS2, Line 130: inot
into


http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@87
PS2, Line 87:   // Create field accessors
:   for (SlotDescriptor* slot_desc: tuple_desc_->slots()) {
: if (slot_desc->type().IsStructType()) {
:   // Recursively get the field ids for struct fields
:   RETURN_IF_ERROR(CollectStructFieldAccessors(env, 
slot_desc));
: } else if (slot_desc->col_path().size() > 1) {
:   // Slot that is child of a struct without tuple, can occur 
when a struct member is
:   // in the select list
:   int root_type_index = slot_desc->col_path()[0];
:   ColumnType current_type =
:   
tuple_desc_->table_desc()->col_descs()[root_type_index].type();
:   for (int i = 1; i < slot_desc->col_path().size() - 1; ++i) {
: current_type = 
current_type.children[slot_desc->col_path()[i]];
:   }
:   int field_id = 
current_type.field_ids[slot_desc->col_path().back()];
:   RETURN_IF_ERROR(AddAccessorForFieldId(env, field_id, 
slot_desc->id()));
: } else {
:   // For primitive types outside STRUCT use the 
ColumnDescriptors
:   int field_id = 
tuple_desc_->table_desc()->GetColumnDesc(slot_desc).field_id();
:   RETURN_IF_ERROR(AddAccessorForFieldId(env, field_id, 
slot_desc->id()));
: }
:   }
nit: this could be moved to its own method.


http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/20759/2/be/src/exec/iceberg-metadata/iceberg-row-reader.h@44
PS2, Line 44: Materlilize
Materialize


http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@230
PS2, Line 230: rootTable instanceof IcebergMetadataTable) return;
 : if (!(rootTable instanceof FeFsTable
nit: we could have a helper method for readability like 
'TableTypeSupportsStruct(FeTable tbl)'


http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java:

http://gerrit.cloudera.org:8080/#/c/20759/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@73
PS2, Line 73: "Adding column: \"" + col.getName() + "\" with type: \"" + 
col.getType() +
:   "\" to metadata table."
LOG.trace() supports formatting with placeholders {}.


http://gerrit.cloudera.org:8080/#/c/20759/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20759/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@496
PS2, Line 496: 
Can we also add a SELECT * query with EXPAND_COMPLEX_TYPES=true?



--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Dec 2023 12:27:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14610/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 11:14:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12597: Basic Equality delete read support for Iceberg tables

2023-12-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20753 )

Change subject: IMPALA-12597: Basic Equality delete read support for Iceberg 
tables
..


Patch Set 1:

(17 comments)

Left a few comments, but the change looks great!

http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/20753/1/be/src/exec/partitioned-hash-join-builder.h@87
PS1, Line 87: treat_nulls_equal_
IS NOT DISTINCT FROM is a well-known SQL term, I think it would be better to 
keep that, but also add the additional comments about NULL-handling.


http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/CatalogObjects.thrift@625
PS1, Line 625: equality_ids
This might have a better place in TIcebergTable. Though I see this is probably 
temporary, and later we might have the equality_ids in THdfsFileDesc.


http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/20753/1/common/thrift/PlanNodes.thrift@404
PS1, Line 404: treat_nulls_equal
Again, I think we should keep the SQL terminology here, but keep the additional 
comment about NULLs.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@58
PS1, Line 58: except it returns True if the rhs is NULL
Can we update this comment: except it returns True of both sides are NULLs.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@106
PS1, Line 106: TODO
Could you please add IMPALA-12598?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@108
PS1, Line 108: column
nit: columns


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@160
PS1, Line 160: || isDeleteRowsJoin_
This wouldn't be needed if we passed Operator.NOT_DISTINCT in the equality 
predicates.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@197
PS1, Line 197: positionDeleteFiles_.isEmpty() && equalityDeleteFiles_.isEmpty()
nit: for readability, it might be worth to extract this condition to a 
'noDeleteFiles()' method


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@258
PS1, Line 258: addVirtualDataSeqNumSlot(tblRef_);
nit: this is always needed for equality deletes, so this method call could be 
moved to addEqualityColumnSlots().


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@323
PS1, Line 323: tblRef.getDesc().getSlots().stream()
 : .filter(s -> s.getVirtualColumnType() ==
 : TVirtualColumnType.ICEBERG_DATA_SEQUENCE_NUMBER)
 : .findFirst()
Maybe SingleNodePlanner.addSlotRefToDesc() could return he slot desc.


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@405
PS1, Line 405: dataSlotDesc.getColumn() instanceof IcebergColumn
This must be always true for non-virtual columns, right?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@424
PS1, Line 424: data file
table?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@429
PS1, Line 429: Operator.EQ
Could we have Operator.NOT_DISTINCT here?


http://gerrit.cloudera.org:8080/#/c/20753/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@464
PS1, Line 464: if (getIceTable().getIcebergApiTable().schemas().size() > 1) 
{
 :   throw new ImpalaRuntimeException("Equality delete files 
are not supported for " +
 :   "tables with schema evolution");
 : }
Why do we have this restriction? We 

[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..


Patch Set 12:

(5 comments)

> Uploaded patch set 12.

http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@3
PS10, Line 3: testtbl1(
> Actually you don't need it in SHOW TABLES either, you can just write
I've removed it from most formats except orc and avro, it seems we have a bug 
where we use HIVE_QUERY which we can file as a bug in another jira.


http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@8
PS10, Line 8: # Make sure creating a table with the same name doesn't throw an 
error when
> Ok, we can leave this test in.
Done


http://gerrit.cloudera.org:8080/#/c/20506/10/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@45
PS10, Line 45:  QUERY
> Ok.
Ack


http://gerrit.cloudera.org:8080/#/c/20506/5/tests/metadata/column-unicode.py
File tests/metadata/column-unicode.py:

http://gerrit.cloudera.org:8080/#/c/20506/5/tests/metadata/column-unicode.py@12
PS5, Line 12:
> That being said, it's not required.
Ack


http://gerrit.cloudera.org:8080/#/c/20506/10/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/20506/10/tests/shell/test_shell_interactive.py@408
PS10, Line 408: test_tbl
> Ok, you can leave it like that. But in that case extracting the table name
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 10:50:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12465: Unicode column name support

2023-12-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/20506 )

Change subject: IMPALA-12465: Unicode column name support
..

IMPALA-12465: Unicode column name support

Impala depends on Hive functions for column name validation and uses
validateName() function for the same. Since Hive already supports
unicode column names, the patch just updates the column name validation
function to validateColumnName(). validateName() checks for a certain
conformance based on pattern matching standards while
validateColumnName() places no restrictions on column names at the
Metadata level.

Testing: The support is tested and cross-checked with Hive. The tests
can be found in unicode-column-name.test.

Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
A testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
A tests/metadata/test_column_unicode.py
M tests/shell/test_shell_interactive.py
6 files changed, 369 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/20506/12
--
To view, visit http://gerrit.cloudera.org:8080/20506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1ad9d63ac1b9631a0f4a433798bd5109aa2ed718
Gerrit-Change-Number: 20506
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/14609/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Dec 2023 10:36:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-07 Thread Tamas Mate (Code Review)
Tamas Mate has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..

IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

As the slots have already been created on the frontend this change
focuses on populating them on the backend side. There are two major
parts of this commit. Obtaining the right Accessors for the slot and
recursively filling the tuples with data.

The field ids are present in the struct slot's ColumnType field as a
list of integers. This list can be indexed with the correct element of
the SchemaPath to obtain the field id for a struct member and with that
the Accessor.

Once the Accessors are available the IcebergRowReader's MaterializeTuple
method can be called recursively to write the primitive slots of a
struct slot.

Testing:
 - Added E2E tests

Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
M be/src/exec/iceberg-metadata/iceberg-row-reader.cc
M be/src/exec/iceberg-metadata/iceberg-row-reader.h
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
8 files changed, 164 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/20759/2
--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12205: Add support to STRUCT type Iceberg Metadata table columns

2023-12-07 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20759 )

Change subject: IMPALA-12205: Add support to STRUCT type Iceberg Metadata table 
columns
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.h@45
PS1, Line 45:   Status MaterializeTuple(JNIEnv* env, jobject struct_like_row, 
const TupleDescriptor* tuple_desc,
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@79
PS1, Line 79:   accessed_value = env->CallObjectMethod(accessor, 
iceberg_accessor_get_, struct_like_row);
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@104
PS1, Line 104: RETURN_IF_ERROR(WriteStructSlot(env, struct_like_row, 
slot_desc, tuple, tuple_data_pool));
> line too long (98 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20759/1/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@172
PS1, Line 172: Status IcebergRowReader::WriteStructSlot(JNIEnv* env, jobject 
struct_like_row, SlotDescriptor* slot_desc, Tuple* tuple,
> line too long (119 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20759
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I953ad7253b270f2855bfcaee4ad023d1c4469273
Gerrit-Change-Number: 20759
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Dec 2023 10:07:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11501: Add flag to allow catalog-cache operations on masked tables

2023-12-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Dec 2023 09:30:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10949: Improve batching logic of partition events

2023-12-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20485 )

Change subject: IMPALA-10949: Improve batching logic of partition events
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@741
PS2, Line 741: Newer
the name doesn't match the semantics, the function returns true if the event's 
event id is NOT newer than lastSyncedEventId


http://gerrit.cloudera.org:8080/#/c/20485/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@747
PS2, Line 747: catch
Exception handling could be skipped by using getTableNoThrow instead of 
getTable. It returns null if the db or table is not found.



--
To view, visit http://gerrit.cloudera.org:8080/20485
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e79510739347cbe669719a9e4cb9cabd5daa7d3
Gerrit-Change-Number: 20485
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 07 Dec 2023 09:06:55 +
Gerrit-HasComments: Yes