[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.

2024-04-05 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20921 )

Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts.
..


Patch Set 15: Code-Review+2

Carry Zihao's +1. Thanks for improving the packaging support!


--
To view, visit http://gerrit.cloudera.org:8080/20921
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147
Gerrit-Change-Number: 20921
Gerrit-PatchSet: 15
Gerrit-Owner: Xiang Yang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: ttz <2433038...@qq.com>
Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20921 )

Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts.
..


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10498/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20921
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147
Gerrit-Change-Number: 20921
Gerrit-PatchSet: 16
Gerrit-Owner: Xiang Yang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: ttz <2433038...@qq.com>
Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12362: (part-1/4) Refactor service management scripts.

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20921 )

Change subject: IMPALA-12362: (part-1/4) Refactor service management scripts.
..


Patch Set 16: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20921
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f4dcad9cfa12d351d562e7ef8c0a8957d3ca147
Gerrit-Change-Number: 20921
Gerrit-PatchSet: 16
Gerrit-Owner: Xiang Yang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Reviewer: ttz <2433038...@qq.com>
Gerrit-Comment-Date: Sat, 06 Apr 2024 06:14:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21249 )

Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 06 Apr 2024 01:14:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12921: [WIP] Support locally built Ranger

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21160 )

Change subject: IMPALA-12921: [WIP] Support locally built Ranger
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15788/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Gerrit-Change-Number: 21160
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 06 Apr 2024 00:46:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12921: [WIP] Support locally built Ranger

2024-04-05 Thread Fang-Yu Rao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21160

to look at the new patch set (#3).

Change subject: IMPALA-12921: [WIP] Support locally built Ranger
..

IMPALA-12921: [WIP] Support locally built Ranger

This patch adds the support for locally built Ranger in Impala's
minicluster. The instructions on how to build Ranger locally and point
Impala to the locally built Ranger server are provided as below.

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We
execute the following to build Ranger. By default, the compressed tar
files are produced under $RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

We also need to build Impala's Java code so that Impala's Java code
could consume the locally produced Ranger classes.

It then suffices to execute the following to point
Impala to the locally built Ranger server.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin

3. source $IMPALA_HOME/bin/impala-config.sh

4. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

5. $IMPALA_HOME/bin/create-test-configuration.sh

6. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

7. $IMPALA_HOME/testdata/bin/run-ranger-server.sh

8. $IMPALA_HOME/testdata/bin/setup-ranger.sh

To-do:
 - Wait for RANGER-4770 to be resolved.
 - Wait for RANGER-4771 to be resolved.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
---
M bin/impala-config.sh
M testdata/bin/setup-ranger.sh
R testdata/cluster/ranger/setup/all_database_policy_revised.json.template
M testdata/cluster/ranger/setup/impala_user_non_owner.json.template
M testdata/cluster/ranger/setup/impala_user_owner.json.template
5 files changed, 25 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/21160/3
--
To view, visit http://gerrit.cloudera.org:8080/21160
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Gerrit-Change-Number: 21160
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12291 impala checks hdfs ranger policy

2024-04-05 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20221 )

Change subject: IMPALA-12291 impala checks hdfs ranger policy
..


Patch Set 9:

> Patch Set 9:
>
> Hi all, based on our discussion so far, there are 2 approaches discussed. Any 
> any better alternative is also appreciated!
> - Set the access level of a loaded table to both READ and WRITE as long as 
> Ranger is used as the authorization provider.
> - Introduce a startup flag to allow the administrator to decide whether to 
> skip the file system permissions check during table loading.
>
> There are some implementation details to consider for each approach.
> - How to determine whether Ranger is enabled (corresponding to the 1st 
> approach)? It seems checking the value of the key 
> 'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the 
> instance of Configuration as done in the patch set 9 could not be verified 
> easily via a new test due to HDFS Ranger plug-in not being configured 
> correctly. To be more specific, if we try to add the following configuration 
> via  
> https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py,
>  the name node of HDFS could not be started. More plumbing has to be done to 
> set up the plug-in. On the other hand, using the instance of BackendConfig 
> allows us to add an end-to-end test to briefly verify Impala's behavior after 
> the patch more easily.
>
> 
>  dfs.namenode.inode.attributes.provider.class
>  org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer
> 
>
> - What should be the default behavior with respect to setting the access 
> level of a loaded table after this patch (corresponding to the 2nd approach)? 
> It looks like making Impala assume the READ and WRITE access by default may 
> be better especially in the legacy catalog mode. This relieves the Impala 
> administrator of having to manually tweak the HDFS access control entries of 
> the target HDFS paths that are not writable to the Impala service every time 
> when an end user encounters such a problem.
>
> I have also collected the related tests that need to be revised if we decide 
> to adopt the 2nd approach and make Impala assume the READ and WRITE access by 
> default.
> - 
> https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56
>  (TestHdfsPermissions.test_insert_into_read_only_table()).
> - 
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563
>  (TestInsertBehaviour.test_multiple_group_acls).
> - 
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331
>  (TestInsertBehaviour.test_readonly_table_dir).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362
>  (TestInsertBehaviour.test_insert_acl_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439
>  (TestInsertBehaviour.test_load_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252
>  (TestInsertBehaviour.test_mixed_partition_permissions).
> -  
> https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202
>  (TestInsertBehaviour.test_insert_file_permissions).
> -  
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008
>  (AnalyzeStmtsTest.TestLoadData).
> -  
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047
>  (AnalyzeStmtsTest.TestInsert).
> - 
> https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119
>  (CatalogObjectToFromThriftTest.TestPartitionedTable).

The first option sounds good to me:
- Set the access level of a loaded table to both READ and WRITE as long as 
Ranger is used as the authorization provider.

The second option of introducing a startup flag would create confusion because 
it would only apply to the legacy catalog mode, not the local catalog mode. 
Further, for the cloud object stores, we are in any case doing the first 
approach.

The downside of the first approach is that it will defer the full access level 
checks to execution time and suppose some partitions allow the write operation 
while other partitions don't, we can end up with partial writes into the table. 
 However, this issue already exists with the local catalog mode (which is the 
default mode).


--
To view, visit http://gerrit.cloudera.org:8080/20221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Gerrit-Change-Number: 20221
Gerrit-PatchSet: 

[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21029 )

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15787/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 05 Apr 2024 23:17:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12872: Use Calcite for optimization - part 1: simple queries

2024-04-05 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21109 )

Change subject: IMPALA-12872: Use Calcite for optimization - part 1: simple 
queries
..


Patch Set 17:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/rel/util/CreateExprVisitor.java@1
PS17, Line 1: /*
In some files, the License header is enclosed in the multi statement syntax /*. 
 .. */ whereas in others it is the single line comment.
Pls make it consistent in all the files and use whatever is being used in 
existing Impala files.
Impala uses the single line comment style.  '// '


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@30
PS17, Line 30:  queryCtx;
Pls add underscore


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/ImpalaCalciteCatalogReader.java@31
PS17, Line 31: stmtTableCache
Add underscore


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@94
PS16, Line 94: :
nit: add space or newline after :  so that the query string is separated out.


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@148
PS16, Line 148:   if (e == null) {
> I wanted the ability to print the stack trace hree.  But I also throw an In
So this should be e != null then.


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@166
PS16, Line 166:   if (line.trim().startsWith("--") || 
line.trim().equals("")) {
Comments can be multi-statement also e.g.
 /*  ignore
   this
*/
SELECT a, b, c.
There could be other patterns not handled here.  Can we not use the parser 
methods directly ? If this is supposed to be a temporary function or if this is 
expected to go through revision later, pls add the relevant comments.


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@209
PS16, Line 209: LOG.info("Using Impala planner");
Can you make this consistent with the corresponding message for Calcite planner 
on line 94.  Something like:
LOG.info("Using Impala Planner for the following query: " + queryCtx.getStmt());


http://gerrit.cloudera.org:8080/#/c/21109/16/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@245
PS16, Line 245: Frontend Timeline (Calcite Planner)")
This QueryContext is used in both cases right ?  i.e if Calcite planner fails, 
fall back to Impala planner.  So this message should be modified if the fall 
back occurs.


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@116
PS17, Line 116: logical plan
nit: to be precise, can we say 'initial logical plan'


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@121
PS17, Line 121: optimized plan
nit: to be precise, can we say 'optimized logical plan' .


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteJniFrontend.java@234
PS17, Line 234:   // hack to match the FrontEnd code
Can you add some more context to this .. which part of the FE code is this 
referring to ?


http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java
File 
java/calcite-planner/src/main/java/org/apache/impala/calcite/service/CalciteQueryParser.java:

http://gerrit.cloudera.org:8080/#/c/21109/17/java/calcite-planner/src/main/java/org/apache/impala

[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21029 )

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..


Patch Set 20:

(1 comment)

ps20 is a bigger refactoring where InProgressTableModification is being used in 
more places. I will rerun exhaustive tests overnight.

http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2671
PS19, Line 2671: y {
> nit: It might be worth to change this and 2 other overload of this method i
Moved it inside InProgressTableModification class.



--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 05 Apr 2024 22:55:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Jason Fehr, Sai Hemanth Gantasala, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21029

to look at the new patch set (#20).

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..

IMPALA-12543: Detect self-events before finishing DDL

test_iceberg_self_events has been flaky for not having
tbls_refreshed_before equal to tbls_refreshed_after in-between query
executions.

Further investigation reveals concurrency bug due to db/table level
lock is not taken during db/table self-events check (IMPALA-12461
part1). The order of ALTER TABLE operation is as follow:

1. alter table starts in CatalogOpExecutor
2. table level lock is taken
3. HMS RPC starts (CatalogOpExecutor.applyAlterTable())
4. HMS generates the event
5. HMS RPC returns
6. table is reloaded
7. catalog version is added to inflight event list
8. table level lock is releases

Meanwhile the event processor thread fetches the new event after 4 and
before 7. Because of IMPALA-12461 (part 1), it can also finish
self-events checking before reaching 7. Before IMPALA-12461, self-events
would have needed to wait for 8. Note that this issue is only relevant
for table level events, as self-events checking for partition level
events still takes table lock.

This patch fix the issue by adding newCatalogVersion to the table's
inflight event list before updating HMS using helper class
InProgressTableModification. If HMS update does not complete (ie., an
exception is thrown), the new newCatalogVersion that was added is then
removed.

This patch also fix few smaller issues, including:
- Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in
  MetastoreEventFactory.getFilteredEvents().
- Increment EVENTS_SKIPPED_METRIC in
  MetastoreTableEvent.reloadTableFromCatalog() if table is already in
  the middle of reloading (revealed through flaky
  test_skipping_older_events).
- Rephrase misleading log message in
  MetastoreEventProcessor.getNextMetastoreEvents().

Testing:
- Add TestEventProcessingWithImpala, run it with debug_action
  and sync_ddl dimensions.
- Pass exhaustive tests.

Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 886 insertions(+), 591 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21029/20
--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-05 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27
PS6, Line 27: class TupleFileReader;
> Here's what I think is happening:
Setting default destructor in tuple-cache-node.cc sounds like a good safety to 
add.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@33
PS6, Line 33: namespace impala {
> No, removed
Ack


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@180
PS6, Line 180: if (status.ok()) {
> I don't think we need to support Reset(), so I changed it's definition to t
Ack


http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-read-write-test.cc
File be/src/exec/tuple-file-read-write-test.cc:

http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-read-write-test.cc@182
PS8, Line 182:   // Read back the empy file and make sure the reader can handle 
it
typo: empy -> empty


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc
File be/src/exec/tuple-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@89
PS6, Line 89:
> Good point, we should check after each read() call. I switch to use Kudu's
Ack


http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-reader.cc
File be/src/exec/tuple-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-reader.cc@99
PS8, Line 99:   // Now, we know the total size of the variable-length data, and 
we can read
ReadV already supports populating multiple kudu::Slice, should we do this as an 
array of 3 slices rather than splitting them up after the read?


http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-writer.h
File be/src/exec/tuple-file-writer.h:

http://gerrit.cloudera.org:8080/#/c/21171/8/be/src/exec/tuple-file-writer.h@109
PS8, Line 109:   bool aborted_ = false;
Could these be a state enum instead?


http://gerrit.cloudera.org:8080/#/c/21171/8/tests/custom_cluster/test_tuple_cache.py
File tests/custom_cluster/test_tuple_cache.py:

http://gerrit.cloudera.org:8080/#/c/21171/8/tests/custom_cluster/test_tuple_cache.py@101
PS8, Line 101:   start_args=CACHE_START_ARGS + " 
--tuple_cache_capacity=64MB", cluster_size=1,
This arg doesn't exist anymore, right?



--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 05 Apr 2024 20:58:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21249 )

Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15786/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 05 Apr 2024 20:31:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-05 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21249


Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..

IMPALA-12894: Addendum: Re-enable test_plain_count_star_optimization

test_plain_count_star_optimization was disabled by IMPALA-12894
part 1, and part 2 didn't re-enable it. This patch re-enables it.

Change-Id: I30629632742c0d402a6bb852a169359edac59eba
---
M tests/query_test/test_iceberg.py
1 file changed, 0 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21249/1
--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12894: Addendum: Re-enable test plain count star optimization

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21249 )

Change subject: IMPALA-12894: Addendum: Re-enable 
test_plain_count_star_optimization
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10497/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30629632742c0d402a6bb852a169359edac59eba
Gerrit-Change-Number: 21249
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 05 Apr 2024 20:08:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12291 impala checks hdfs ranger policy

2024-04-05 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20221 )

Change subject: IMPALA-12291 impala checks hdfs ranger policy
..


Patch Set 9:

Hi all, based on our discussion so far, there are 2 approaches discussed. Any 
any better alternative is also appreciated!
- Set the access level of a loaded table to both READ and WRITE as long as 
Ranger is used as the authorization provider.
- Introduce a startup flag to allow the administrator to decide whether to skip 
the file system permissions check during table loading.

There are some implementation details to consider for each approach.
- How to determine whether Ranger is enabled (corresponding to the 1st 
approach)? It seems checking the value of the key 
'dfs.namenode.inode.attributes.provider.class' in core-site.xml via the 
instance of Configuration as done in the patch set 9 could not be verified 
easily via a new test due to HDFS Ranger plug-in not being configured 
correctly. To be more specific, if we try to add the following configuration 
via  
https://github.com/apache/impala/blob/master/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py,
 the name node of HDFS could not be started. More plumbing has to be done to 
set up the plug-in. On the other hand, using the instance of BackendConfig 
allows us to add an end-to-end test to briefly verify Impala's behavior after 
the patch more easily.


 dfs.namenode.inode.attributes.provider.class
 org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer


- What should be the default behavior with respect to setting the access level 
of a loaded table after this patch (corresponding to the 2nd approach)? It 
looks like making Impala assume the READ and WRITE access by default may be 
better especially in the legacy catalog mode. This relieves the Impala 
administrator of having to manually tweak the HDFS access control entries of 
the target HDFS paths that are not writable to the Impala service every time 
when an end user encounters such a problem.

I have also collected the related tests that need to be revised if we decide to 
adopt the 2nd approach and make Impala assume the READ and WRITE access by 
default.
- 
https://github.com/apache/impala/blob/master/tests/metadata/test_hdfs_permissions.py#L56
 (TestHdfsPermissions.test_insert_into_read_only_table()).
- 
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L563
 (TestInsertBehaviour.test_multiple_group_acls).
- 
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L331
 (TestInsertBehaviour.test_readonly_table_dir).
-  
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L362
 (TestInsertBehaviour.test_insert_acl_permissions).
-  
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L439
 (TestInsertBehaviour.test_load_permissions).
-  
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L252
 (TestInsertBehaviour.test_mixed_partition_permissions).
-  
https://github.com/apache/impala/blob/master/tests/query_test/test_insert_behaviour.py#L202
 (TestInsertBehaviour.test_insert_file_permissions).
-  
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4008
 (AnalyzeStmtsTest.TestLoadData).
-  
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java#L4042-L4047
 (AnalyzeStmtsTest.TestInsert).
- 
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java#L110-L119
 (CatalogObjectToFromThriftTest.TestPartitionedTable).


--
To view, visit http://gerrit.cloudera.org:8080/20221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Gerrit-Change-Number: 20221
Gerrit-PatchSet: 9
Gerrit-Owner: Halim Kim 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Halim Kim 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 05 Apr 2024 18:30:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21029 )

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15785/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 05 Apr 2024 18:24:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21029 )

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..


Patch Set 19:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21029/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21029/18//COMMIT_MSG@51
PS18, Line 51: - Pass exhaustive tests.
> Rerun exhaustive tests overnight and it is now failing at some tests. I'll
Done


http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3321
PS18, Line 3321: ns.checkState(mod
> Should replace this with modification.newVersionNumber()
Done


http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21029/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2671
PS19, Line 2671: addCatalogServiceIdentifiers
nit: It might be worth to change this and 2 other overload of this method into 
static and provide way to access it via InProgressTableModification object.



--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 05 Apr 2024 18:05:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Jason Fehr, Sai Hemanth Gantasala, Csaba Ringhofer, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21029

to look at the new patch set (#19).

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..

IMPALA-12543: Detect self-events before finishing DDL

test_iceberg_self_events has been flaky for not having
tbls_refreshed_before equal to tbls_refreshed_after in-between query
executions.

Further investigation reveals concurrency bug due to db/table level
lock is not taken during db/table self-events check (IMPALA-12461
part1). The order of ALTER TABLE operation is as follow:

1. alter table starts in CatalogOpExecutor
2. table level lock is taken
3. HMS RPC starts (CatalogOpExecutor.applyAlterTable())
4. HMS generates the event
5. HMS RPC returns
6. table is reloaded
7. catalog version is added to inflight event list
8. table level lock is releases

Meanwhile the event processor thread fetches the new event after 4 and
before 7. Because of IMPALA-12461 (part 1), it can also finish
self-events checking before reaching 7. Before IMPALA-12461, self-events
would have needed to wait for 8. Note that this issue is only relevant
for table level events, as self-events checking for partition level
events still takes table lock.

This patch fix the issue by adding newCatalogVersion to the table's
inflight event list before updating HMS. If HMS update does not
complete (ie., an exception is thrown), the new newCatalogVersion that
was added is then removed.

This patch also fix few smaller issues, including:
- Avoid incrementing EVENTS_SKIPPED_METRIC if numFilteredEvents == 0 in
  MetastoreEventFactory.getFilteredEvents().
- Increment EVENTS_SKIPPED_METRIC in
  MetastoreTableEvent.reloadTableFromCatalog() if table is already in
  the middle of reloading (revealed through flaky
  test_skipping_older_events).
- Rephrase misleading log message in
  MetastoreEventProcessor.getNextMetastoreEvents().

Testing:
- Add TestEventProcessingWithImpala, run it with debug_action
  and sync_ddl dimensions.
- Pass exhaustive tests.

Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 811 insertions(+), 549 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/21029/19
--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12543: Detect self-events before finishing DDL

2024-04-05 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21029 )

Change subject: IMPALA-12543: Detect self-events before finishing DDL
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21029/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3321
PS18, Line 3321: newCatalogVersion
Should replace this with modification.newVersionNumber()



--
To view, visit http://gerrit.cloudera.org:8080/21029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79
Gerrit-Change-Number: 21029
Gerrit-PatchSet: 18
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 05 Apr 2024 17:06:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15784/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 05 Apr 2024 16:40:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-05 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21171/7/be/src/exec/tuple-file-writer.cc
File be/src/exec/tuple-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/21171/7/be/src/exec/tuple-file-writer.cc@130
PS7, Line 130: Substitute("Write of size $0 would cause $1 to exceed 
the maximum file size of $2",
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py
File tests/custom_cluster/test_tuple_cache.py:

http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@60
PS7, Line 60: }
> flake8: E123 closing bracket does not match indentation of opening bracket'
Done


http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@118
PS7, Line 118: @
> flake8: E303 too many blank lines (2)
Done


http://gerrit.cloudera.org:8080/#/c/21171/7/tests/custom_cluster/test_tuple_cache.py@173
PS7, Line 173: e
> flake8: F841 local variable 'e' is assigned to but never used
Done



--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 05 Apr 2024 16:15:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-05 Thread Joe McDonnell (Code Review)
Hello Kurt Deschler, Yida Wu, Alexey Serbin, Michael Smith, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21171

to look at the new patch set (#8).

Change subject: IMPALA-12905: Disk-based tuple caching
..

IMPALA-12905: Disk-based tuple caching

This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.

The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.

Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
 1. It can find a COMPLETE entry and read it.
 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
cannot read or write the entry.
 3. It finds no entry and inserts its own IN_PROGRESS entry
to start a write.

The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.

This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.

Testing:
 - This adds basic custom cluster tests for the tuple cache.

Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
---
M be/src/exec/CMakeLists.txt
M be/src/exec/tuple-cache-node.cc
M be/src/exec/tuple-cache-node.h
A be/src/exec/tuple-file-read-write-test.cc
A be/src/exec/tuple-file-reader.cc
A be/src/exec/tuple-file-reader.h
A be/src/exec/tuple-file-writer.cc
A be/src/exec/tuple-file-writer.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
A be/src/runtime/tuple-cache-mgr-test.cc
A be/src/runtime/tuple-cache-mgr.cc
A be/src/runtime/tuple-cache-mgr.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
A tests/custom_cluster/test_tuple_cache.py
17 files changed, 2,225 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/21171/8
--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12925: Fix decimal data type for external JDBC table

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21218 )

Change subject: IMPALA-12925: Fix decimal data type for external JDBC table
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Gerrit-Change-Number: 21218
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Fri, 05 Apr 2024 09:16:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12925: Fix decimal data type for external JDBC table

2024-04-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21218 )

Change subject: IMPALA-12925: Fix decimal data type for external JDBC table
..

IMPALA-12925: Fix decimal data type for external JDBC table

Decimal type is a primitive data type for Impala. Current code returns
wrong values for columns with decimal data type in external JDBC tables.

This patch fixes wrong values returned from JDBC data source, and
supports pushing down decimal type of predicates to remote database
and remote Impala.
The decimal precision and scale of the columns in external JDBC table
must be no less than the decimal precision and scale of the
corresponding columns in the table of remote database. Otherwise,
Impala fails with an error since it may cause truncation of decimal
data.

Testing:
 - Added Planner test for pushing down decimal type of predicates.
 - Added end-to-end unit-tests for tables with decimal type of columns
   for Postgres, MySQL, and Impala-to-Impala.
 - Passed core-tests.

Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Reviewed-on: http://gerrit.cloudera.org:8080/21218
Reviewed-by: Michael Smith 
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
---
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
M 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M testdata/bin/clean-mysql-env.sh
M testdata/bin/create-ext-data-source-table.sql
M testdata/bin/load-ext-data-sources.sh
M testdata/bin/setup-mysql-env.sh
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M 
testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables-predicates.test
M 
testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test
M testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
M 
testdata/workloads/functional-query/queries/QueryTest/mysql-ext-jdbc-tables.test
14 files changed, 576 insertions(+), 21 deletions(-)

Approvals:
  Michael Smith: Looks good to me, but someone else must approve
  Abhishek Rawat: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8c9d2e0667c42c0e52436b158e3dfe3ec14b9e3b
Gerrit-Change-Number: 21218
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh