[Impala-ASF-CR] IMPALA-11661: Added new api in MetastoreServiceHandler for find_next_compact2 method

2022-10-15 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19140 )

Change subject: IMPALA-11661: Added new api in MetastoreServiceHandler for 
find_next_compact2 method
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19140/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/19140/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2290
PS1, Line 2290:   
To follow the convention, we should use a 4-space indent here.



--
To view, visit http://gerrit.cloudera.org:8080/19140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f1663c16d2649c9c455e6dffde02894819b2761
Gerrit-Change-Number: 19140
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Sat, 15 Oct 2022 18:27:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-10-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..

IMPALA-8592: Add support for insert events for 'LOAD DATA' statements
from Impala

In this patch, we use TUpdateCatalogRequest to refresh metadata after
'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the
code for 'INSERT' statements. It fires an insert event, just as we
do for 'INSERT' statements.

We also fix the inconsistent indentation in event_processor_utils.py.

Testing:
- Run existing test_load.py
- Added test_load_data_from_impala() in test_event_processing.py

Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
7 files changed, 194 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/6
--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-10-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19052/4/tests/metadata/test_event_processing.py
File tests/metadata/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/19052/4/tests/metadata/test_event_processing.py@408
PS4, Line 408: into table {1}.{2}".format(staging_dir, unique_database, tbl_nopart))
> I think we need to mark this test using @pytest.mark.execute_serially. Othe
Thanks for pointing out this mark!



--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 04 Oct 2022 00:31:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-10-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..

IMPALA-8592: Add support for insert events for 'LOAD DATA' statements
from Impala

In this patch, we use TUpdateCatalogRequest to refresh metadata after
'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the
code for 'INSERT' statements. It fires an insert event, just as we
do for 'INSERT' statements.

We also fix the inconsistent indentation in event_processor_utils.py.

Testing:
- Run existing test_load.py
- Added test_load_data_from_impala() in test_event_processing.py

Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
7 files changed, 195 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/5
--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-10-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..

IMPALA-8592: Add support for insert events for 'LOAD DATA' statements
from Impala

In this patch, we use TUpdateCatalogRequest to refresh metadata after
'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the
code for 'INSERT' statements. It fires an insert event, just as we
do for 'INSERT' statements.

We also fix the inconsistent indentation in event_processor_utils.py.

Testing:
- Run existing test_load.py
- Added test_load_data_from_impala() in test_event_processing.py

Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
7 files changed, 193 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/4
--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-10-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG@16
PS1, Line 16: - Run existing test_load.py
> I see. Can we use the hive_client to fetch and verify the INSERT events dir
Cool. Let me try that.
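A minimal sketch of the direct check suggested above, assuming the test records the latest notification event id before running LOAD DATA and afterwards fetches the newer events through the HMS Thrift `get_next_notification` call. The `NotificationEvent` dataclass is a simplified stand-in for the Thrift struct, and `insert_events_for` is a hypothetical helper, not an existing Impala test utility.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Simplified stand-in for the HMS Thrift NotificationEvent struct; only the
# fields this check reads are modeled.
@dataclass
class NotificationEvent:
    eventId: int
    eventType: str
    dbName: str
    tableName: str


def insert_events_for(events: Iterable[NotificationEvent], db: str,
                      table: str, after_event_id: int) -> List[NotificationEvent]:
    """Return the INSERT events on db.table that are newer than after_event_id."""
    return [e for e in events
            if e.eventId > after_event_id
            and e.eventType == "INSERT"
            and e.dbName.lower() == db.lower()
            and e.tableName.lower() == table.lower()]


# Usage with hand-built events; a real test would fetch them from HMS.
events = [
    NotificationEvent(10, "ALTER_TABLE", "db1", "tbl"),
    NotificationEvent(11, "INSERT", "db1", "tbl"),
    NotificationEvent(12, "INSERT", "db1", "other_tbl"),
]
assert [e.eventId for e in insert_events_for(events, "db1", "tbl", 10)] == [11]
```

Filtering by event id as well as by table keeps the check robust against events left behind by earlier statements on the same table.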


http://gerrit.cloudera.org:8080/#/c/19052/3/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/19052/3/be/src/service/client-request-state.cc@2047
PS3, Line 2047:
> nit: 2 spaces indent here
Ack



--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 03 Oct 2022 16:57:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-09-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..


Patch Set 3:

(3 comments)

> Patch Set 1:
>
> (3 comments)
>
> This is a pretty nice fix!

http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19052/1//COMMIT_MSG@16
PS1, Line 16: - Run existing test_load.py
> We also need tests to verify the INSERT events. Could you add some tests in
I realized that replication cannot be used to verify insert events for
external tables, because Hive replication of external tables relies on
distcp instead of insert events. Given that LOAD DATA is only applicable to
external tables, we need another way to verify the INSERT events.
Therefore, I added a test that uses the number of skipped events as an
implicit indicator. Let me know if you have a better idea.
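A rough sketch of the "skipped events as an implicit indicator" idea: read a cumulative counter, run the statement, read the counter again, and compare. The assumption is that the event processor recognizes the INSERT event fired by Impala itself as a self-event and skips it, so the count grows. The helper `metric_delta` and the metric key are hypothetical; the real test reads the metric from catalogd and waits for event processing to catch up.

```python
from typing import Callable


def metric_delta(read_metric: Callable[[], int],
                 action: Callable[[], None]) -> int:
    """Return how much a cumulative counter grew while `action` ran."""
    before = read_metric()
    action()  # e.g. run LOAD DATA, then wait for event processing to catch up
    return read_metric() - before


# Usage with a fake counter; "skipped_events" is a hypothetical metric name.
counters = {"skipped_events": 5}


def fake_load_data_and_wait() -> None:
    # Pretend the event processor skipped the self-generated INSERT event.
    counters["skipped_events"] += 1


assert metric_delta(lambda: counters["skipped_events"],
                    fake_load_data_and_wait) == 1
```

Because such a counter is shared by the whole catalogd, another test running at the same time could also bump it, which is presumably one reason for running this kind of test serially.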


http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc@806
PS1, Line 806: string for unpartitione
> nit: Could you add a comment mentioning that the partition_name is an empty
Done


http://gerrit.cloudera.org:8080/#/c/19052/1/be/src/service/client-request-state.cc@809
PS1, Line 809:   catalog_update.__set_sync_ddl(exec_request_->query_options.sync_ddl);
 :   catalog_update.__set_header(GetCatalogServiceRequestHeader());
 :   catalog_update.target_table = exec_request_->load_data_request.table_name.table_name;
 :   catalog_update.db_name = exec_request_->load_data_request.table_name.db_name;
 :   catalog_update.is_overwrite = exec_request_->load_data_request.overwrite;
 :
 :   const TNetworkAddress& address =
> nit: these duplicate the code in ClientRequestState::ExecLoadDataRequestImp
Done
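The duplication flagged above can be avoided by copying the shared fields in one helper. This is a Python sketch under stated assumptions, not the C++ code in client-request-state.cc: `LoadDataRequest`, `CatalogUpdate` and `build_catalog_update` are simplified, hypothetical stand-ins for the Thrift structs, and the `partition_files` field is invented for illustration. The empty-string partition name for unpartitioned tables follows the comment on Line 806.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LoadDataRequest:
    # Mirrors the load_data_request fields read in the quoted C++ code.
    db_name: str
    table_name: str
    overwrite: bool


@dataclass
class CatalogUpdate:
    # Simplified, hypothetical stand-in for TUpdateCatalogRequest.
    db_name: str
    target_table: str
    is_overwrite: bool
    sync_ddl: bool
    # Hypothetical field: partition name -> files added by the statement.
    partition_files: Dict[str, List[str]] = field(default_factory=dict)


def build_catalog_update(req: LoadDataRequest, sync_ddl: bool,
                         partition_name: str,
                         new_files: List[str]) -> CatalogUpdate:
    """Copy the shared fields in one place so callers don't duplicate them."""
    update = CatalogUpdate(db_name=req.db_name, target_table=req.table_name,
                           is_overwrite=req.overwrite, sync_ddl=sync_ddl)
    # Unpartitioned tables use the empty string as their partition name.
    update.partition_files[partition_name] = list(new_files)
    return update


update = build_catalog_update(LoadDataRequest("db1", "tbl", False),
                              sync_ddl=True, partition_name="",
                              new_files=["f1.parq"])
assert update.partition_files == {"": ["f1.parq"]}
```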



--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Sat, 01 Oct 2022 01:59:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-09-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/19052 )

Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..

IMPALA-8592: Add support for insert events for 'LOAD DATA' statements
from Impala

In this patch, we use TUpdateCatalogRequest to refresh metadata after
'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the
code for 'INSERT' statements. It fires an insert event, just as we
do for 'INSERT' statements.

Testing:
- Run existing test_load.py
- Added test_load_data_from_impala() in test_event_processing.py

Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/metadata/test_event_processing.py
6 files changed, 129 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/3
--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11627: Build Impala with cdw dependencies

2022-09-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18994 )

Change subject: IMPALA-11627: Build Impala with cdw dependencies
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18994/6/java/pom.xml
File java/pom.xml:

http://gerrit.cloudera.org:8080/#/c/18994/6/java/pom.xml@109
PS6, Line 109:
nit: wrong indentation?



--
To view, visit http://gerrit.cloudera.org:8080/18994

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id379030f4b314e139c875584eee438b7416d89a4
Gerrit-Change-Number: 18994
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 29 Sep 2022 23:03:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8592: Add support for insert events for 'LOAD DATA' statements from Impala

2022-09-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19052


Change subject: IMPALA-8592: Add support for insert events for 'LOAD DATA' 
statements from Impala
..

IMPALA-8592: Add support for insert events for 'LOAD DATA' statements
from Impala

In this patch, we use TUpdateCatalogRequest to refresh metadata after
'LOAD DATA' instead of TResetMetadataRequest so that we can reuse the
code for 'INSERT' statements. It fires an insert event, just as we
do for 'INSERT' statements.

Testing:
- Run existing test_load.py

Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
---
M be/src/service/client-request-state.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
4 files changed, 67 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/19052/1
--
To view, visit http://gerrit.cloudera.org:8080/19052

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f1b470f40e0aaf891c9f3f327af393b2f9c74bc
Gerrit-Change-Number: 19052
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11160: Ignore stale ALTER_PARTITION events on transactional tables

2022-09-20 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19020 )

Change subject: IMPALA-11160: Ignore stale ALTER_PARTITION events on 
transactional tables
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19020/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19020/1//COMMIT_MSG@25
PS1, Line 25: Tests
> The solution looks good, but one thing bugs me: shouldn't the original bug
Thanks Quanlong for catching this. I agree with Csaba that we should add more 
tests around event processing. I just created a follow-up Jira IMPALA-11598.



--
To view, visit http://gerrit.cloudera.org:8080/19020

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5bb8cfc213093f3bbd0359c7084b277a3bd5264a
Gerrit-Change-Number: 19020
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 20 Sep 2022 17:10:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload

2022-09-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18926 )

Change subject: IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow 
metadata reload
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18926

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf344e6b423f88c9635ca8d61d53385b88ba4dce
Gerrit-Change-Number: 18926
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 08 Sep 2022 15:38:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow metadata reload

2022-08-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18926 )

Change subject: IMPALA-11540: Add logs for ALTER_TABLE events that trigger slow 
metadata reload
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18926

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf344e6b423f88c9635ca8d61d53385b88ba4dce
Gerrit-Change-Number: 18926
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Aug 2022 12:03:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests

2022-06-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18626 )

Change subject: IMPALA-9670: Fix unloaded views are shown as tables for 
GET_TABLES requests
..


Patch Set 4: Code-Review+1

(2 comments)

Thank you for helping me understand the context!

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1903
PS3, Line 1903: String tableName = tblMeta.getTableName().toLowerCase();
> Yeah, we can save the other toLowerCase() calls. As mentioned in our doc:
Ack


http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py@119
PS3, Line 119: IMPALAD_HOSTNAME_LIST[i] + ':' +
> We calculate the hs2 ports and hs2-http ports based on the specified beeswa
Ack



--
To view, visit http://gerrit.cloudera.org:8080/18626

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I528bb20272ebdd66a0118c30efc2b0566f2b0e2f
Gerrit-Change-Number: 18626
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 20 Jun 2022 03:03:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9670: Fix unloaded views are shown as tables for GET_TABLES requests

2022-06-17 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18626 )

Change subject: IMPALA-9670: Fix unloaded views are shown as tables for 
GET_TABLES requests
..


Patch Set 3:

(4 comments)

The patch looks good to me in general! I just left some questions and minor
comments.

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@46
PS3, Line 46: import org.apache.hadoop.hive.metastore.TableType;
Maybe TableType can be removed as well?


http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1903
PS3, Line 1903: String tableName = tblMeta.getTableName().toLowerCase();
Could you confirm that converting tableName to lower case is OK? I ask only
because it was not converted to lower case before. If it is OK, we probably
don't need to call toLowerCase() on the next line.


http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java
File fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java:

http://gerrit.cloudera.org:8080/#/c/18626/3/fe/src/main/java/org/apache/impala/catalog/Hive3MetastoreShimBase.java@290
PS3, Line 290:   case MATERIALIZED_VIEW:
I figure I should ask the question here because we sometimes treat materialized
views as tables. Does it matter in the context of this patch?
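To make the question concrete, here is a sketch of mapping HMS table types to the TABLE_TYPE reported by GET_TABLES. The HMS `TableType` names (`MANAGED_TABLE`, `EXTERNAL_TABLE`, `VIRTUAL_VIEW`, `MATERIALIZED_VIEW`) are real, but `hs2_table_type` and its `materialized_view_as_view` flag are hypothetical and do not reflect what Hive3MetastoreShimBase actually does.

```python
def hs2_table_type(hms_table_type: str,
                   materialized_view_as_view: bool = True) -> str:
    """Map an HMS table type name to the TABLE_TYPE shown by GET_TABLES."""
    if hms_table_type == "VIRTUAL_VIEW":
        return "VIEW"
    if hms_table_type == "MATERIALIZED_VIEW":
        # The open question from the review: report it as a view or a table?
        return "VIEW" if materialized_view_as_view else "TABLE"
    # MANAGED_TABLE, EXTERNAL_TABLE and anything else are reported as tables.
    return "TABLE"


assert hs2_table_type("VIRTUAL_VIEW") == "VIEW"
assert hs2_table_type("MATERIALIZED_VIEW", materialized_view_as_view=False) == "TABLE"
```

Making the choice an explicit parameter keeps the materialized-view decision in one place instead of scattering it across callers.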


http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/18626/3/tests/common/impala_test_suite.py@119
PS3, Line 119: str(IMPALAD_BEESWAX_PORT_LIST[i] - IMPALAD_BEESWAX_PORT + IMPALAD_HS2_PORT)
I might have missed something. Could you explain why we need to calculate the
port here? (same question for line 127)
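The expression in question derives each impalad's HS2 port from its Beeswax port by a fixed offset. A small sketch of that arithmetic, assuming Impala's default base ports (Beeswax 21000, HS2 21050, HS2-HTTP 28000) and a minicluster whose impalads use consecutive ports; `derive_hostports` is a hypothetical helper, not part of impala_test_suite.py.

```python
from typing import List

# Assumed default base ports; the test suite reads its real values from config.
IMPALAD_BEESWAX_PORT = 21000
IMPALAD_HS2_PORT = 21050
IMPALAD_HS2_HTTP_PORT = 28000


def derive_hostports(hostnames: List[str], beeswax_ports: List[int],
                     base_port: int) -> List[str]:
    """Shift each impalad's Beeswax port onto another service's base port."""
    return ["%s:%d" % (host, port - IMPALAD_BEESWAX_PORT + base_port)
            for host, port in zip(hostnames, beeswax_ports)]


hosts = ["localhost"] * 3
beeswax = [21000, 21001, 21002]
print(derive_hostports(hosts, beeswax, IMPALAD_HS2_PORT))
# ['localhost:21050', 'localhost:21051', 'localhost:21052']
```

The offset only holds if every service's ports are assigned with the same per-impalad increment, which is the assumption this calculation relies on.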



--
To view, visit http://gerrit.cloudera.org:8080/18626

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I528bb20272ebdd66a0118c30efc2b0566f2b0e2f
Gerrit-Change-Number: 18626
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 17 Jun 2022 23:47:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking

2022-03-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18324 )

Change subject: IMPALA-11181: Improving performance of compaction checking
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@705
PS3, Line 705:   if (partNameToCompactionId.containsKey(entry.getKey().getName())) {
 : stalePartitions.add(entry.getKey());
 : iter.remove();
> nit: Can we optimize this to the following case?
Done
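For readers less familiar with the Java loop quoted above, here is a Python sketch of the same step: move every cached partition that has a newer compaction into a stale list and drop it from the cache. Keys are simplified to partition names (the Java code keys by partition objects and calls `getName()`), and `take_stale_partitions` is a hypothetical name.

```python
from typing import Dict, List


def take_stale_partitions(partitions: Dict[str, object],
                          part_name_to_compaction_id: Dict[str, int]) -> List[str]:
    """Remove partitions with a newer compaction from `partitions` in place.

    Returns the names of the removed (stale) partitions.
    """
    # Collect first: deleting from a dict while iterating over it raises
    # RuntimeError in Python; Java's Iterator.remove() avoids the same problem.
    stale = [name for name in partitions if name in part_name_to_compaction_id]
    for name in stale:
        del partitions[name]
    return stale


cached = {"p=1": "meta1", "p=2": "meta2", "p=3": "meta3"}
assert take_stale_partitions(cached, {"p=2": 7}) == ["p=2"]
assert sorted(cached) == ["p=1", "p=3"]
```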


http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/18324/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@295
PS3, Line 295:
> nit: Could you add a blank line before this?
Done



--
To view, visit http://gerrit.cloudera.org:8080/18324

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
Gerrit-Change-Number: 18324
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 30 Mar 2022 16:20:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking

2022-03-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18324 )

Change subject: IMPALA-11181: Improving performance of compaction checking
..

IMPALA-11181: Improving performance of compaction checking

After HIVE-25753, we don't need to explicitly set all partitions' names
to get the latest compaction id. In addition, we can send the last
compaction id over to HMS so that HMS sends back compaction info
only if there are newer compactions. In this way, we can avoid
transmitting unnecessary data between HMS and Catalogd.
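A toy model of the protocol described above, assuming compaction ids grow monotonically: the catalog remembers the highest compaction id it has seen and sends it to HMS, which only returns compactions newer than that id. `FakeHms`, `CachedTable` and their methods are hypothetical stand-ins, not the real HMS or catalogd API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Compaction:
    id: int
    partition_name: str


class FakeHms:
    """Hypothetical stand-in for HMS's committed-compaction bookkeeping."""

    def __init__(self, compactions: List[Compaction]) -> None:
        self.compactions = compactions

    def latest_compactions(self, last_id: Optional[int]) -> Dict[str, int]:
        # Newest compaction id per partition, counting only compactions the
        # caller has not seen yet (id > last_id).
        latest: Dict[str, int] = {}
        for c in self.compactions:
            if last_id is not None and c.id <= last_id:
                continue
            if c.id > latest.get(c.partition_name, -1):
                latest[c.partition_name] = c.id
        return latest


class CachedTable:
    """Catalog-side cache that remembers the highest compaction id seen."""

    def __init__(self) -> None:
        self.last_compaction_id: Optional[int] = None

    def check_compactions(self, hms: FakeHms) -> Dict[str, int]:
        newer = hms.latest_compactions(self.last_compaction_id)
        if newer:
            self.last_compaction_id = max(newer.values())
        return newer


hms = FakeHms([Compaction(1, "p=1"), Compaction(2, "p=2"), Compaction(3, "p=1")])
table = CachedTable()
assert table.check_compactions(hms) == {"p=1": 3, "p=2": 2}
assert table.check_compactions(hms) == {}  # nothing newer: empty reply
```

The second call shows the saving: when nothing has been compacted since the last check, the reply carries no per-partition data at all.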

Testing:
existing tests

Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
3 files changed, 31 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/4
--
To view, visit http://gerrit.cloudera.org:8080/18324

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
Gerrit-Change-Number: 18324
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking

2022-03-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18324 )

Change subject: IMPALA-11181: Improving performance of compaction checking
..

IMPALA-11181: Improving performance of compaction checking

After HIVE-25753, we don't need to explicitly set all partitions' names
to get the latest compaction id. In addition, we can send the last
compaction id over to HMS so that HMS sends back compaction info
only if there are newer compactions. In this way, we can avoid
transmitting unnecessary data between HMS and Catalogd.

Testing:
existing tests

Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
3 files changed, 30 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/3
--
To view, visit http://gerrit.cloudera.org:8080/18324

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
Gerrit-Change-Number: 18324
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking

2022-03-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/18324 )

Change subject: IMPALA-11181: Improving performance of compaction checking
..

IMPALA-11181: Improving performance of compaction checking

After HIVE-25753, we don't need to explicitly set all partitions' names
to get the latest compaction id. In addition, we can send the last
compaction id over to HMS so that HMS sends back compaction info
only if there are newer compactions. In this way, we can avoid
transmitting unnecessary data between HMS and Catalogd.

Testing:
existing tests

Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
3 files changed, 29 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/2
--
To view, visit http://gerrit.cloudera.org:8080/18324

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
Gerrit-Change-Number: 18324
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11181: Improving performance of compaction checking

2022-03-15 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18324


Change subject: IMPALA-11181: Improving performance of compaction checking
..

IMPALA-11181: Improving performance of compaction checking

After HIVE-25753, we don't need to explicitly set all partitions' names
to get the latest compaction id. In addition, we can send the last
compaction id over to HMS so that HMS sends back compaction info
only if there are newer compactions. In this way, we can avoid
transmitting unnecessary data between HMS and Catalogd.

Testing:
existing tests

Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
3 files changed, 24 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/18324/1
--
To view, visit http://gerrit.cloudera.org:8080/18324

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32e30ec418ad09bef862e61163539a910c96c44c
Gerrit-Change-Number: 18324
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-12 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18296 )

Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
3 files changed, 24 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/7
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-11 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/18296 )

Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
3 files changed, 18 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/6
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-09 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18296 )

Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

In addition, we need to fix a ClassCastException that appeared after
SerializableTable was added to Iceberg. Since BaseTable is always
converted to SerializableTable for serialization, we cannot restore a
BaseTable after deserializing it.
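A minimal, generic Java sketch of this failure mode, assuming the conversion happens through Java serialization's writeReplace hook. BaseTableLike, SerializableTableLike and Table are hypothetical stand-ins, not the Iceberg classes, and this is not the fix in this patch. It shows that after a serialization round trip the object is the replacement type, so a downcast to the original class fails while code written against the shared interface still works:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationProxyDemo {
  /** Shared interface; callers that depend only on this survive the round trip. */
  interface Table {
    String name();
  }

  /** Hypothetical stand-in for a full table object. */
  static final class BaseTableLike implements Table, Serializable {
    private final String name;
    BaseTableLike(String name) { this.name = name; }
    public String name() { return name; }

    // Java serialization calls this hook and writes the returned object instead.
    private Object writeReplace() {
      return new SerializableTableLike(name);
    }
  }

  /** Hypothetical lightweight snapshot written in place of BaseTableLike. */
  static final class SerializableTableLike implements Table, Serializable {
    private final String name;
    SerializableTableLike(String name) { this.name = name; }
    public String name() { return name; }
  }

  /** Serializes and deserializes an object in memory. */
  static Object roundTrip(Object o) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns true if casting the object to the original concrete class fails. */
  static boolean castToBaseFails(Object o) {
    try {
      ((BaseTableLike) o).name();
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }
}
```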

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
10 files changed, 68 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/5
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18296 )

Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

In addition, we need to fix a ClassCastException that appeared after
SerializableTable was added to Iceberg. Since BaseTable is always
converted to SerializableTable for serialization, we cannot restore a
BaseTable after deserializing it.

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
10 files changed, 67 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/4
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18296 )

Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

In addition, we need to fix a ClassCastException that appeared after
SerializableTable was added to Iceberg. Since BaseTable is always
converted to SerializableTable for serialization, we cannot restore a
BaseTable after deserializing it.

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
10 files changed, 63 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/3
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 23144489

2022-03-07 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18296 )


Change subject: Bump up CDP_BUILD_NUMBER to 23144489
..

Bump up CDP_BUILD_NUMBER to 23144489

This patch pulls in HIVE-25753, which is needed to improve the
performance of retrieving the latest committed compaction for a table.

Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
---
M bin/impala-config.sh
1 file changed, 12 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/18296/2
--
To view, visit http://gerrit.cloudera.org:8080/18296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifd4ae0cba48217483a40a51f97156fabfb00cf27
Gerrit-Change-Number: 18296
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata

2022-02-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18175 )

Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh 
table file metadata
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18175/3/tests/metadata/test_event_processing.py
File tests/metadata/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/18175/3/tests/metadata/test_event_processing.py@a39
PS3, Line 39:
> @Yu-Wen: Please confirm the following:
Yes, the test fails intermittently without fine-grained table refreshing.
Previously, we refreshed file metadata when processing the alter partition
event, but at that point the transaction might not have been committed yet.
If it had been committed, we would get the new file metadata; otherwise, we
would still see stale file metadata. With my patch, we now refresh file
metadata when processing the commit event.



--
To view, visit http://gerrit.cloudera.org:8080/18175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
Gerrit-Change-Number: 18175
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 03 Feb 2022 18:06:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata

2022-01-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18175 )

Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh 
table file metadata
..

IMPALA-11093: Fine grained table refreshing doesn't refresh table file
metadata

If we insert data into an ACID partitioned table from Hive, the
generated events look like open_txn -> alter_partition -> commit_txn.

Previously, we assumed that the partition object in the alter_partition
event has a write id < the current write id. However, that is not a
valid assumption: the partition object actually carries the write id
allocated in this transaction. That means that in the commit_txn event,
we will have a partition whose write id equals the write id of the
cached partition. So we need to change the '<' condition to '<='.
Tests:
After IMPALA-10923, we now refresh file metadata while processing
commit events. Therefore, we can add back the test disabled in
IMPALA-9057.

Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M tests/metadata/test_event_processing.py
2 files changed, 1 insertion(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/18175/3
--
To view, visit http://gerrit.cloudera.org:8080/18175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
Gerrit-Change-Number: 18175
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata

2022-01-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18175 )

Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh 
table file metadata
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18175/1/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
File fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java:

http://gerrit.cloudera.org:8080/#/c/18175/1/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java@255
PS1, Line 255: updateMinOpenWriteId();
> how is this related to this change?
minOpenWriteId is not actually used, so it does no harm for now. I will
remove it from this change and refactor it in a separate patch.



--
To view, visit http://gerrit.cloudera.org:8080/18175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
Gerrit-Change-Number: 18175
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 28 Jan 2022 21:42:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11093: Fine grained table refreshing doesn't refresh table file metadata

2022-01-27 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18175 )


Change subject: IMPALA-11093: Fine grained table refreshing doesn't refresh 
table file metadata
..

IMPALA-11093: Fine grained table refreshing doesn't refresh table file
metadata

If we insert data into an ACID partitioned table from Hive, the
generated events look like open_txn -> alter_partition -> commit_txn.

Previously, we assumed that the partition object in the alter_partition
event has a write id < the current write id. However, that is not a
valid assumption: the partition object actually carries the write id
allocated in this transaction. That means that in the commit_txn event,
we will have a partition whose write id equals the write id of the
cached partition. So we need to change the '<' condition to '<='.

Tests:
Manual testing

Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
2 files changed, 2 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/18175/1
--
To view, visit http://gerrit.cloudera.org:8080/18175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idabeb522525c45f000ca0992348660fa5a5d4d2d
Gerrit-Change-Number: 18175
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 5:

One test failed at the "Rows Processed" check in Dockerised-test, but it
looks similar to https://issues.apache.org/jira/browse/IMPALA-6004 and
seems unrelated to this patch.

The other failures in "ubuntu-16.04-from-scratch" did not occur in a
previous build, so they might be flaky. A previous run of the same patch
passed at: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/15371/.


--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 23:47:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction happens in Hive (on a Hive ACID table), queries in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that, since they
have their own local cache, so we have to do the same check on
coordinators as well. In addition, we need to attach the writeIdList to
requests that fetch file metadata. Since this check adds overhead to
queries, we introduce a flag, auto_check_compaction, which is set to
false by default for now. We will look for more efficient ways to do
compaction checking in the future.
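A minimal Java sketch of the coordinator-side check, assuming a per-table local cache that records the compaction id observed when file metadata was loaded. LocalCache, getFiles and the two supplier arguments are hypothetical stand-ins for the CatalogdMetaProvider cache and its RPCs; only the auto_check_compaction flag name comes from this change:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class CoordinatorCompactionCheckSketch {
  /** Cached file list plus the compaction id observed when it was loaded. */
  static final class CachedFiles {
    final long compactionId;
    final List<String> files;
    CachedFiles(long compactionId, List<String> files) {
      this.compactionId = compactionId;
      this.files = files;
    }
  }

  /** Hypothetical coordinator-local cache of file metadata, keyed by table name. */
  static final class LocalCache {
    // Mirrors the auto_check_compaction flag, which defaults to false in the patch.
    private final boolean autoCheckCompaction;
    private final Map<String, CachedFiles> cache = new HashMap<>();
    int loads = 0; // number of (re)loads, for illustration

    LocalCache(boolean autoCheckCompaction) {
      this.autoCheckCompaction = autoCheckCompaction;
    }

    /**
     * latestCompactionId: hypothetical call returning the table's newest compaction id.
     * loader: hypothetical call fetching the table's file metadata (e.g. from catalogd).
     */
    List<String> getFiles(String table, LongSupplier latestCompactionId,
        Supplier<List<String>> loader) {
      CachedFiles cached = cache.get(table);
      // Flag off: serve the cached entry with no extra RPC; it may be stale.
      if (cached != null && !autoCheckCompaction) return cached.files;
      long latest = latestCompactionId.getAsLong();
      // Flag on: the cached entry is still valid if no newer compaction happened.
      if (cached != null && latest <= cached.compactionId) return cached.files;
      loads++;
      List<String> files = loader.get();
      cache.put(table, new CachedFiles(latest, files));
      return files;
    }
  }
}
```

With the flag off, a cached entry is returned without any extra RPC, which is cheaper but can hand out files the Hive cleaner has already removed; with the flag on, every lookup pays for one compaction-id check.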

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
---
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
12 files changed, 356 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/5
--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@348
PS3, Line 348: conduct
> nit, conducted
Ack


http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@349
PS3, Line 349: m
> move to previous line?
Ack


http://gerrit.cloudera.org:8080/#/c/18043/3/be/src/service/impala-server.cc@349
PS3, Line 349: ala makes "
 : "additional RPCs to hive metastore for each table
> suggest you to change it to more generic since end users may not understand
Ack



--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 30 Nov 2021 00:03:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction happens in Hive (on a Hive ACID table), queries in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that, since they
have their own local cache, so we have to do the same check on
coordinators as well. In addition, we need to attach the writeIdList to
requests that fetch file metadata. Since this check adds overhead to
queries, we introduce a flag, auto_check_compaction, which is set to
false by default for now. We will look for more efficient ways to do
compaction checking in the future.

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
---
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
12 files changed, 337 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/4
--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction happens in Hive (on a Hive ACID table), queries in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that, since they
have their own local cache, so we have to do the same check on
coordinators as well. In addition, we need to attach the writeIdList to
requests that fetch file metadata. Since this check adds overhead to
queries, we introduce a flag, auto_check_compaction, which is set to
false by default for now. We will look for more efficient ways to do
compaction checking in the future.

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
---
M be/src/service/impala-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
13 files changed, 335 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/3
--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18043/2//COMMIT_MSG@10
PS2, Line 10: After compaction happened in Hive(HIVE ACID table), queries made in
: Impala possibly fail with a FileNotFoundException if files already
: removed by the Hive cleaner.
> IIRC, Impala only open transactions for DDL/DML operations. Do you know how
Thanks to Vihang and Quanlong for pointing out the problem. Impala does NOT
open transactions for SELECT queries, so this approach doesn't always work.

Hive has a config that can delay the cleaner for some period of time, but we
don't know exactly how long the delay should be.
Given that this is time-sensitive, I'm thinking we could make this feature
optional for now. If a flag, say auto_check_compaction, is set, Impala would
open transactions for all queries on ACID tables and do the compaction
checking. Any thoughts?


http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/18043/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@898
PS2, Line 898: List stalePartitions = directProvider_.checkLatestCompaction(
 : refImpl.dbName_, refImpl.tableName_, refImpl, refToMeta);
> I think this introduces several HMS RPCs per query (some queries may call t
Taking the performance numbers on DWX as an example, this API call
currently takes 10 to 40 ms per table, depending on the number of
partitions. I will fix an issue with this API on the HMS side: currently we
need to pass all the partition names. That should bring the execution time
of every call close to 10 ms.

Even though we can improve this API, I understand that it still introduces
overhead that might not be negligible. It might be better to introduce this
feature behind a flag, along with a table property to skip this check, as
Quanlong suggested.



--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 29 Nov 2021 02:56:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/18043 )

Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction happens in Hive (on a Hive ACID table), queries in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that, since they
have their own local cache, so we have to do the same check on
coordinators as well. In addition, we need to attach the writeIdList to
requests that fetch file metadata.

Tests:
Added unit tests to CatalogdMetaProviderTest

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
8 files changed, 308 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/2
--
To view, visit http://gerrit.cloudera.org:8080/18043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after Compaction

2021-11-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18043 )


Change subject: IMPALA-11032: Automatic Refresh of Metadata for Local Catalog 
after Compaction
..

IMPALA-11032: Automatic Refresh of Metadata for Local Catalog after
Compaction

After a compaction happens in Hive (on a Hive ACID table), queries in
Impala may fail with a FileNotFoundException if the files have already
been removed by the Hive cleaner.

In IMPALA-10801, catalogd checks the latest compaction id before serving
metadata. However, coordinators don't take advantage of that, since they
have their own local cache, so we have to do the same check on
coordinators as well. In addition, we need to attach the writeIdList to
requests that fetch file metadata.

Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
8 files changed, 303 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/18043/1
--
To view, visit http://gerrit.cloudera.org:8080/18043

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I173ea848917b6a41139b25b80677111463bfdc4b
Gerrit-Change-Number: 18043
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-16 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17858/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17858/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4341
PS15, Line 4341: // Aborted write id is not allowed. The write id can 
be committed if the table
> ok. Please add a comment for it
Done


http://gerrit.cloudera.org:8080/#/c/17858/16/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17858/16/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2608
PS16, Line 2608:   List FDafter = 
tbl.getPartitionsForNames(
> From what I understand, this asserts that underlying file metadata remaine
Yes, you are right. The file descriptors will be reused if the files are not 
changed. I added a metric to check if file metadata is reloaded.
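To illustrate the reuse described in this reply, here is a minimal sketch assuming a descriptor is considered unchanged when its length and modification time match the new listing. FileStatus, FileDesc and reloadedCount are hypothetical stand-ins (the counter plays the role of the metric); the actual comparison used by HdfsTable is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reuse a cached file descriptor when the file's length and
// modification time are unchanged, and count how many descriptors were reloaded.
class FileDescReuseSketch {
  // Illustrative listing entry, as a file system listing might return it.
  static final class FileStatus {
    final String path; final long length; final long modTime;
    FileStatus(String path, long length, long modTime) {
      this.path = path; this.length = length; this.modTime = modTime;
    }
  }

  // Illustrative cached descriptor built from a listing entry.
  static final class FileDesc {
    final String path; final long length; final long modTime;
    FileDesc(FileStatus st) {
      this.path = st.path; this.length = st.length; this.modTime = st.modTime;
    }
  }

  private Map<String, FileDesc> cached = new HashMap<>();
  long reloadedCount = 0; // stands in for a "file metadata reloaded" metric

  List<FileDesc> refresh(List<FileStatus> listing) {
    Map<String, FileDesc> next = new HashMap<>();
    List<FileDesc> result = new ArrayList<>();
    for (FileStatus st : listing) {
      FileDesc old = cached.get(st.path);
      boolean unchanged = old != null && old.length == st.length && old.modTime == st.modTime;
      FileDesc fd = unchanged ? old : new FileDesc(st); // reuse the same object if unchanged
      if (!unchanged) reloadedCount++;
      next.put(st.path, fd);
      result.add(fd);
    }
    cached = next; // files missing from the new listing are dropped
    return result;
  }
}
```

A test can then assert both object identity of the reused descriptors and that the reload counter did not move.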



--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 17
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 16 Nov 2021 22:08:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-16 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs
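A simplified sketch of item 1, assuming a per-table tracker with a high watermark plus sets of open and aborted write ids. WriteIdTrackerSketch and its methods are hypothetical; this is not MutableValidReaderWriteIdList, whose API is not shown here. It only illustrates how the three event types could update validity.

```java
import java.util.TreeSet;

// Hypothetical, simplified per-table write id tracker. It shows how AllocWriteId,
// CommitTxn and AbortTxn events could update a valid write id list.
class WriteIdTrackerSketch {
  private long highWatermark;                            // highest allocated write id
  private final TreeSet<Long> open = new TreeSet<>();    // allocated, not yet committed
  private final TreeSet<Long> aborted = new TreeSet<>(); // aborted, never visible

  WriteIdTrackerSketch(long highWatermark) { this.highWatermark = highWatermark; }

  // AllocWriteIdEvent: newly allocated write ids start out open (invalid for readers).
  void onAllocWriteId(long writeId) {
    // Ids skipped between the old high watermark and this one are conservatively
    // treated as open too, until a commit event is seen for them.
    for (long id = highWatermark + 1; id <= writeId; id++) open.add(id);
    highWatermark = Math.max(highWatermark, writeId);
  }

  // CommitTxnEvent: the write id becomes visible.
  void onCommit(long writeId) { open.remove(writeId); }

  // AbortTxnEvent: an open write id is marked aborted and never becomes visible.
  void onAbort(long writeId) {
    if (open.remove(writeId)) aborted.add(writeId);
  }

  boolean isWriteIdValid(long writeId) {
    return writeId <= highWatermark && !open.contains(writeId) && !aborted.contains(writeId);
  }
}
```

Keeping this state in catalogd is what lets a partition-level event be applied without reloading every partition of the table.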

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,216 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/17
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 17
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-12 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs
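Items 2 and 3 can be sketched as a dispatcher that, when the flag is on, touches only the partitions named in an event and otherwise falls back to reloading the whole table. This is a hypothetical illustration: PartitionEventRefreshSketch, EventType and the recorded action strings are made up, and the whole-table fallback is an assumption inferred from the "Before" timings rather than something the commit message states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dispatcher for partition-level events on a transactional table.
// The boolean models hms_event_incremental_refresh_transactional_table.
class PartitionEventRefreshSketch {
  enum EventType { ADD_PARTITION, DROP_PARTITION, ALTER_PARTITION }

  private final boolean incrementalRefreshEnabled;
  final List<String> actions = new ArrayList<>(); // records what would be (re)loaded

  PartitionEventRefreshSketch(boolean incrementalRefreshEnabled) {
    this.incrementalRefreshEnabled = incrementalRefreshEnabled;
  }

  void onEvent(EventType type, String table, List<String> partitions) {
    if (!incrementalRefreshEnabled) {
      actions.add("reload table " + table); // assumed fallback: refresh the whole table
      return;
    }
    // Fine-grained path: only the partitions named in the event are touched.
    for (String p : partitions) {
      switch (type) {
        case ADD_PARTITION:   actions.add("add " + table + "/" + p); break;
        case DROP_PARTITION:  actions.add("drop " + table + "/" + p); break;
        case ALTER_PARTITION: actions.add("reload " + table + "/" + p); break;
      }
    }
  }
}
```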

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,204 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/16
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-09 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3560
PS13, Line 3560:   LOG.debug("Not adding write ids to table {}.{} for event 
{} " +
> nit: add more details in the log message like table name, event id being pr
Done


http://gerrit.cloudera.org:8080/#/c/17858/11/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
File fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java:

http://gerrit.cloudera.org:8080/#/c/17858/11/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java@252
PS11, Line 252:   exceptions.add(currentId);
> Looked at the implementation of BitSet.get() and I think the following sequ
Yes, that already works because BitSet by default returns false if it is not 
set.
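The reply relies on a documented property of java.util.BitSet: get(int) returns false for any index that was never set, including indices past the bits currently allocated (only a negative index throws). The variable name below is illustrative; how MutableValidReaderWriteIdList maps write ids to bit positions is not shown here.

```java
import java.util.BitSet;

// BitSet.get returns false for any non-negative index that was never set,
// including indices beyond the currently allocated bits, so no bounds check is needed.
class BitSetDefaultDemo {
  static boolean[] probe() {
    BitSet abortedBits = new BitSet(); // illustrative name
    abortedBits.set(3);
    return new boolean[] {abortedBits.get(3), abortedBits.get(2), abortedBits.get(1_000)};
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(probe())); // prints [true, false, false]
  }
}
```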


http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17858/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4368
PS13, Line 4368:   throw new CatalogException(
> nit: Would be good to add a log message with details about the rollback.
Done


http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2487
PS12, Line 2487: } finally {
> nit: Original config should be restored in finally block
Done
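The save-and-restore pattern the reviewer asked for can be sketched as follows. StubBackendConfig is a hypothetical stand-in for the test's stubbed config: only the setter name mirrors the one quoted in this review, and the getter name and helper method are assumptions for illustration.

```java
// Hypothetical stand-in for the test's stubbed backend config. Only the setter
// name mirrors the one quoted in the review; the class itself is illustrative.
class StubBackendConfig {
  private boolean incrementalRefreshTxnTable = false;
  boolean isHms_event_incremental_refresh_transactional_table() {
    return incrementalRefreshTxnTable;
  }
  void setHms_event_incremental_refresh_transactional_table(boolean v) {
    incrementalRefreshTxnTable = v;
  }
}

class ConfigRestoreSketch {
  // Runs a test body with the flag enabled and always restores the original value,
  // even if the body throws, so later tests see an unmodified config.
  static void withIncrementalRefresh(StubBackendConfig cfg, Runnable testBody) {
    boolean original = cfg.isHms_event_incremental_refresh_transactional_table();
    try {
      cfg.setHms_event_incremental_refresh_transactional_table(true);
      testBody.run();
    } finally {
      cfg.setHms_event_incremental_refresh_transactional_table(original);
    }
  }
}
```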


http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2499
PS12, Line 2499:   
stubCfg.setHms_event_incremental_refresh_transactional_table(true);
> nit: can include test name in the table name for example: test_abort_transa
Done


http://gerrit.cloudera.org:8080/#/c/17858/12/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2818
PS12, Line 2818:   }
> Instead of creating a new method createTransactionalTable, we can enhance g
I tried and verified that we need to set table params for creating 
transactional tables. Please see 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L174.
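For reference, Hive marks a table as transactional through its table parameters: "transactional"="true", plus "transactional_properties"="insert_only" for insert-only tables. Full ACID tables omit that property (or set it to "default") and, in Hive, must be stored as ORC. The helper below is a hypothetical sketch of building such a parameter map for a test; the resulting map would then be set on the metastore Table object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical test helper: builds the table parameters that mark a table as
// transactional in Hive. The property keys are standard Hive table properties;
// the helper itself is illustrative.
class TxnTableParams {
  static Map<String, String> forTable(boolean insertOnly) {
    Map<String, String> params = new HashMap<>();
    params.put("transactional", "true");
    if (insertOnly) {
      // Insert-only ACID table; full ACID tables leave this property out.
      params.put("transactional_properties", "insert_only");
    }
    return params;
  }
}
```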



--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 10 Nov 2021 00:14:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-09 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,130 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/15
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,093 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/14
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,089 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/13
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 13
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-11-01 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
17 files changed, 1,002 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/12
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 12
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-26 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
18 files changed, 1,000 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/11
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-26 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.

Performance Tests:
A simple test was performed by running an insert into one partition of
a partitioned ACID table (50,000 partitions). Below is the time taken
to refresh this table by the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
17 files changed, 956 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/10
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-25 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, this commit makes three main
changes.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.
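The first change can be pictured with a simplified model of a table's valid write-id list. This is a minimal sketch, assuming the list is tracked as a high-water mark plus sets of open and aborted write ids; `SimpleWriteIdList` and its methods are hypothetical and do not reflect the actual `MutableValidWriteIdList` API.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model of a per-table valid write-id list.
 * Not the real MutableValidWriteIdList API; for illustration only.
 */
class SimpleWriteIdList {
  // Largest write id seen so far for this table.
  private long highWatermark = 0;
  // Write ids allocated to transactions that have not ended yet.
  private final Set<Long> open = new HashSet<>();
  // Write ids whose transactions were aborted.
  private final Set<Long> aborted = new HashSet<>();

  /** AllocWriteIdEvent: the newly allocated write id starts out open. */
  void allocate(long writeId) {
    open.add(writeId);
    highWatermark = Math.max(highWatermark, writeId);
  }

  /** CommitTxnEvent: the write id becomes visible to readers. */
  void commit(long writeId) {
    open.remove(writeId);
  }

  /** AbortTxnEvent: the write id must never become visible. */
  void abort(long writeId) {
    open.remove(writeId);
    aborted.add(writeId);
  }

  /**
   * A write id is readable if it is at or below the high-water mark and is
   * neither open nor aborted.
   */
  boolean isValid(long writeId) {
    return writeId <= highWatermark
        && !open.contains(writeId)
        && !aborted.contains(writeId);
  }
}
```

Because ids at or below the high-water mark count as committed unless listed as open or aborted, the model only needs to store the exceptions. This keeps it small for tables with many committed writes.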

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
17 files changed, 938 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/9
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 9
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-22 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#8). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, this commit makes three main
changes.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  keep track of write id changes for partitioned tables through
  AllocWriteIdEvents, CommitTxnEvents, and AbortTxnEvents.
2. Conduct partition-level refreshing for transactional tables on
  addPartitionEvents, dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config,
  hms_event_incremental_refresh_transactional_table, which switches
  the fine-grained table refreshing on or off.
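The config in the third change amounts to choosing between a full table reload and a partition-level refresh. The following is a hedged sketch. The boolean stands in for `hms_event_incremental_refresh_transactional_table`. `RefreshPlanner` and its method are hypothetical and are not Impala's event-processing code.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical: decides how much of a transactional table a partition-level
 * HMS event should reload.
 */
class RefreshPlanner {
  // Stand-in for the hms_event_incremental_refresh_transactional_table flag.
  private final boolean incrementalRefreshEnabled;

  RefreshPlanner(boolean incrementalRefreshEnabled) {
    this.incrementalRefreshEnabled = incrementalRefreshEnabled;
  }

  /**
   * Returns the partitions to refresh, or Optional.empty() when the whole
   * table should be reloaded instead.
   */
  Optional<List<String>> partitionsToRefresh(List<String> eventPartitions) {
    if (!incrementalRefreshEnabled) {
      return Optional.empty();  // flag off: fall back to a full table reload
    }
    return Optional.of(eventPartitions);  // flag on: touch only these partitions
  }
}
```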

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 928 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/8
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-20 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ =
:   new ConcurrentHashMap<>();
> Thanks for the clarification. "the new HMS API getAllWriteEventInfo only re
getAllWriteEventInfo just returns the data stored in the table
TXN_WRITE_NOTIFICATION_LOG. AFAIK, HS2 calls add_write_notification_log, which
inserts records into TXN_WRITE_NOTIFICATION_LOG only for DML on transactional
tables. I tried a few queries locally, like "drop constraint", and they advance
the write id but don't add a write notification log.

I tried to reduce the memory footprint here by saving write ids for
transactional partitioned tables only. Besides, this map's size is only
proportional to the number of simultaneously open transactions. Although I
don't have any real data points, we are unlikely to have a huge number of
simultaneously "open" transactions, right?


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ =
:   new ConcurrentHashMap<>();
> @Yu-Wen: In addition to what Vihang asked, how would we handle the followin
@Sourabh
Good question. Since I don't see a way to retrieve the missing write id,
we might have to accept that this write id remains open. The next time a
request arrives with a writeIdList in which this write id is committed, we will
reload the whole table, because the request's writeIdList is considered more
recent. In that sense, the table cache is considered stale as long as the write
id is not marked committed.
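The staleness rule described above can be expressed as a small comparison. This is a minimal sketch, assuming each write-id list is reduced to a high-water mark plus a set of ids. `WriteIdStaleness.isCacheStale` and its parameters are hypothetical names, not Impala or Hive APIs.

```java
import java.util.Set;

class WriteIdStaleness {
  /**
   * Hypothetical check: returns true if the request treats some write id as
   * committed while the cached table does not (still open, or never seen).
   * requestInvalid holds the ids the request considers open or aborted.
   */
  static boolean isCacheStale(long cachedHwm, Set<Long> cachedOpen,
                              long requestHwm, Set<Long> requestInvalid) {
    // Write ids the cache has not seen yet but the request sees as committed.
    for (long id = cachedHwm + 1; id <= requestHwm; id++) {
      if (!requestInvalid.contains(id)) return true;
    }
    // Write ids still open in the cache but committed from the request's view,
    // e.g. because the cache missed the commit event.
    for (long id : cachedOpen) {
      if (id <= requestHwm && !requestInvalid.contains(id)) return true;
    }
    return false;
  }
}
```

A `true` result corresponds to the case in the reply where the whole table is reloaded.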


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891
PS6, Line 2891: case
> do we need a default: clause which throws an exception?
Done
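A `default:` branch that throws guards against event types the refresh logic does not expect. The following is a minimal sketch of the pattern. It uses a hypothetical `PartitionEventType` enum and `PartitionRefresher` class, not the code actually changed in `HdfsTable.java`:

```java
/** Hypothetical event kinds; INSERT is included only to reach the default branch. */
enum PartitionEventType { ADD_PARTITION, DROP_PARTITION, ALTER_PARTITION, INSERT }

class PartitionRefresher {
  static String describeRefresh(PartitionEventType type) {
    switch (type) {
      case ADD_PARTITION:
        return "load new partitions";
      case DROP_PARTITION:
        return "remove partitions from the cache";
      case ALTER_PARTITION:
        return "reload altered partitions";
      default:
        // Fail loudly instead of silently ignoring an unexpected event type.
        throw new IllegalStateException("Unsupported event type: " + type);
    }
  }
}
```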


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java@142
PS6, Line 142: volatile
> not sure why we need this?
@Vihang
I call getCreateEventId() in AllocWriteIdEvent without acquiring the lock. Is
there any chance that createEventId is set after the table is loaded? If not,
we don't need this.
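Background for the `volatile` question: under the Java memory model, a write to a `volatile` field happens-before every subsequent read of that field. A lock-free reader therefore sees the latest value. A plain field gives that guarantee only if it is written before the object is safely published, for example through a lock or a concurrent map. The class below is a hypothetical stand-in, not Impala's `Table`.

```java
/** Hypothetical stand-in for a catalog table whose create event id is read lock-free. */
class CachedTable {
  // volatile: a later write by the loader thread is visible to lock-free readers.
  private volatile long createEventId = -1;

  void setCreateEventId(long id) {
    createEventId = id;
  }

  /** Safe to call without holding the table lock. */
  long getCreateEventId() {
    return createEventId;
  }
}
```

If `createEventId` is only ever assigned before the table is published to other threads, the `volatile` modifier can be dropped. This matches the reasoning in the reply.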


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2066
PS6, Line 2066: catalog_.removeWriteIds(txnId_);
> This line must be in finally block otherwise we are leaking memory in case
Thank you for catching this.
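The reviewer's point is that the per-transaction entry must be removed even when processing the commit event throws. The following is a minimal sketch of the try/finally pattern. `CommitTxnHandler`, its map layout, and the `process` callback are hypothetical and only approximate the role of `catalog_.removeWriteIds(txnId_)` in the patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical: processes a commit event and always drops the txn's bookkeeping. */
class CommitTxnHandler {
  // txnId -> hypothetical "db.table:writeId" strings allocated in that txn
  final Map<Long, Set<String>> txnToWriteIds = new ConcurrentHashMap<>();

  void onCommitTxn(long txnId, Consumer<Set<String>> process) {
    try {
      Set<String> writeIds = txnToWriteIds.get(txnId);
      if (writeIds != null) process.accept(writeIds);  // may throw
    } finally {
      // Runs on success and on exception, so the entry cannot leak.
      txnToWriteIds.remove(txnId);
    }
  }
}
```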


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2057
PS7, Line 2057: commitTxnMessage_.addWriteEventInfo(writeEventInfoList);
> Why are we modifying commitTxnMesage? Can't we get all the required info fr
@Sourabh
I actually modeled this on the code from Hive replication:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/CommitTxnHandler.java#L166.
The upside is that I don't have to parse the table and partition objects
myself; CommitTxnMessage does it. Judging from the Hive code, this function
appears to be designed to be used this way.


http://gerrit.cloudera.org:8080/#/c/17858/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2080
PS7, Line 2080:   Preconditions.checkNotNull(commitTxnMessage_.getPartitions());
> Why are we checking for non null partitions? Wouldn't unpartitioned table h
As long as we have called addWriteEventInfo, this is an empty list even for an
unpartitioned table. So this check only verifies that we have added the write
event info. I can change the check to use another variable to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File 

[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#7). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes by AllocWriteIdEvents,
  CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 776 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/7
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#6). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes by AllocWriteIdEvents,
  CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 773 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/6
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes by AllocWriteIdEvents,
  CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 745 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/5
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..


Patch Set 4:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9
PS1, Line 9:
> +1
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219
PS1, Line 219:   97: required bool hms_event_incremental_refresh_transactional_table
> What is really the reason of having a config for this? Is there a case wher
The initial intent was just to make it easy to toggle the feature on/off for
experiments. From the users' perspective, they would want to turn it off only
when the feature has problems. Therefore, the goal is to make this feature
robust enough that we can eventually get rid of this flag.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap> txnToWriteIds_ =
:   new ConcurrentHashMap<>();
> Do we really need this? Based on my understanding when the ALLOC_WRITE_ID e
The difficulty here is that some DDLs advance the write id without changing
data, but the new HMS API getAllWriteEventInfo only returns info for WRITE
events. Let's say we have a DDL for table foo in txn 3, and this DDL allocates
write id 3 for table foo. We can mark write id 3 as open for table foo when
catalogd receives the AllocWriteIdEvent. However, when it receives the
CommitTxnEvent for txn 3, we don't know that write id 3 for table foo is
associated with this transaction unless we keep a mapping table in the catalog.

We cannot simply reload the writeIdList for a CommitTxnEvent either, because
other transactions may have committed after this event, and simply reloading
the writeIdList would make the table inconsistent. Any thoughts or alternative
approaches?

Sorry that my previous patch was incomplete. The entry for a transaction should
be deleted whenever the transaction ends (committed or aborted).
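The mapping described above can be sketched as an index from transaction id to the table write ids that transaction allocated. This simplified illustration assumes at most one write id per table per transaction. `TxnWriteIdIndex` and its methods are hypothetical and are not the `TableWriteId` class added by the patch.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical index: txnId -> (fully qualified table name -> write id). */
class TxnWriteIdIndex {
  private final Map<Long, Map<String, Long>> txnToWriteIds = new ConcurrentHashMap<>();

  /** On AllocWriteIdEvent: remember which table write id belongs to the txn. */
  void onAllocWriteId(long txnId, String tableName, long writeId) {
    txnToWriteIds.computeIfAbsent(txnId, k -> new ConcurrentHashMap<>())
        .put(tableName, writeId);
  }

  /**
   * On CommitTxnEvent or AbortTxnEvent: return the write ids to mark committed
   * or aborted, and drop the entry because the transaction has ended.
   */
  Map<String, Long> onTxnEnd(long txnId) {
    Map<String, Long> ids = txnToWriteIds.remove(txnId);
    return ids == null ? Collections.emptyMap() : ids;
  }

  /** Number of transactions still tracked; bounded by concurrently open txns. */
  int trackedTxnCount() {
    return txnToWriteIds.size();
  }
}
```

The example from the comment works as follows. `onAllocWriteId(3, "default.foo", 3)` records the DDL's write id. Later, `onTxnEnd(3)` returns that id so it can be marked committed or aborted, even though no WRITE event was logged.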


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@777
PS1, Line 777:   }
 :
 :   public void removeWriteIds(Long txnId) {
 : Preconditions.checkNotNull(txnId);
 : txnToWriteIds_.remove(txnId);
 :   }
 : }
> this could be simplified as
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@788
PS1, Line 788:
> do we need to check for existence of txnId?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604
PS1, Line 2604: return true;
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610
PS1, Line 2610:*/
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578
PS1, Line 3578:
> pls add java doc
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3584
PS1, Line 3584:  {
> this can throw an NPE since one of the conditions for this if is tbl==null.
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3597
PS1, Line 3597:
> change to debug?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3599
PS1, Line 3599: ibleForTesting
> This preconditions check is unnecessary. Also, use something like Unloc
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2802
PS1, Line 2802: ddTimer(CAT
> Please add java doc for this.
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2804
PS1, Line 2804: me
> nit, we can change this to a simple switch-case statement to reduce the if
Done



[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes by AllocWriteIdEvents,
  CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 738 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/4
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). (http://gerrit.cloudera.org:8080/17858)

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events
for transactional tables

To enable fine-grained table refreshing, there are three main changes
in this commit.
1. Maintain validWriteIdList in Catalogd for transactional tables. We
  will keep track of write id changes by AllocWriteIdEvents,
  CommitTxnEvents, and AbortTxnEvents.
2. Trigger partition level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config
  hms_event_incremental_refresh_transactional_table, which can switch
  on/off the fine-grained table refreshing.

Performance Tests:
A simple test was performed by running an insert into one partition of
partitioned ACID tables (50,000 partitions). Below are the times taken
to refresh the table via the event.

Storage  Before   After
=======  =======  ========
S3       50 secs  50 msecs
local    3 secs   3 msecs

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
16 files changed, 735 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/3
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has restored this change. ( http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..


Restored
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-10-13 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has abandoned this change. ( http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables
..


Abandoned

Some missed modifications
--
To view, visit http://gerrit.cloudera.org:8080/17858

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10958: Decouple getConstraintsInformation from hive.ql.metadata.Table

2021-10-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17910


Change subject: IMPALA-10958: Decouple getConstraintsInformation from 
hive.ql.metadata.Table
..

IMPALA-10958: Decouple getConstraintsInformation from
hive.ql.metadata.Table

After HIVE-22782, the ql.metadata.Table object no longer has separate
methods to set PrimaryKeyInfo and ForeignKeyInfo. However, we call these
two methods in DescribeResultFactory to set constraints and then pass the
table into HiveMetadataFormatUtils. Instead of calling the methods on the
table, we can pass PrimaryKeyInfo and ForeignKeyInfo directly to
HiveMetadataFormatUtils so that Impala is not affected even if the Table
class changes its interface.

Additionally, we can get rid of ql.metadata.Table for
getTableInformation altogether since it just needs
metastore.api.Table internally.

Tests:
Ran core tests.

Change-Id: I2dfc54ae2f995dc4ab735d17dbbad9a48f6633da
---
M 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
3 files changed, 15 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/17910/1
--
To view, visit http://gerrit.cloudera.org:8080/17910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2dfc54ae2f995dc4ab735d17dbbad9a48f6633da
Gerrit-Change-Number: 17910
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10959: Reload MV as ACID tables

2021-10-08 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17911


Change subject: IMPALA-10959: Reload MV as ACID tables
..

IMPALA-10959: Reload MV as ACID tables

We observed that the event processor breaks after receiving a
partition event for a materialized view (MV). This is because Impala
treats MVs as views, but Hive generates partition events for MVs,
which the current event processor cannot handle.

In this patch, we let partition events of MVs follow the code path of
ACID tables, which reloads the view. In the long term, we will need
IMPALA-10723 to treat materialized views as tables.
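A minimal sketch of the routing described above. `TableKind`, `EventAction` and `PartitionEventRouter` are hypothetical names, not Impala's API, and the sketch assumes (as the commit message implies) that the ACID-table path reloads the whole object rather than applying the event partition by partition.

```java
// Hypothetical classification of the target of a partition event.
enum TableKind { NON_ACID_TABLE, ACID_TABLE, MATERIALIZED_VIEW }

// Hypothetical outcome of handling a partition event.
enum EventAction { APPLY_PARTITION_CHANGE, RELOAD_TABLE }

final class PartitionEventRouter {
  private PartitionEventRouter() {}

  // Materialized views are catalogued as views in Impala, yet Hive emits
  // partition events for them. Instead of applying those events partition
  // by partition, send them down the ACID-table path, which reloads the
  // whole object.
  static EventAction route(TableKind kind) {
    switch (kind) {
      case ACID_TABLE:
      case MATERIALIZED_VIEW:
        return EventAction.RELOAD_TABLE;
      default:
        return EventAction.APPLY_PARTITION_CHANGE;
    }
  }
}
```

Routing on the table kind keeps the MV special case in one place, which should make it easy to drop once IMPALA-10723 treats MVs as tables.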

Tests:
- manual testing

Change-Id: Ibeab8cc53ad47d24df8baba81e1ec6ea4c80a084
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
1 file changed, 26 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17911/1
--
To view, visit http://gerrit.cloudera.org:8080/17911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibeab8cc53ad47d24df8baba81e1ec6ea4c80a084
Gerrit-Change-Number: 17911
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] Bump up the GBN to 17296101

2021-09-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17874 )

Change subject: Bump up the GBN to 17296101
..


Patch Set 2:

> Patch Set 1:
>
> (1 comment)
>
> LGTM. I left a minor comment below. I can +2 this once it is addressed.

Added. Could you please check again?


--
To view, visit http://gerrit.cloudera.org:8080/17874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6
Gerrit-Change-Number: 17874
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 29 Sep 2021 16:21:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump up the GBN to 17296101

2021-09-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17874 )

Change subject: Bump up the GBN to 17296101
..

Bump up the GBN to 17296101

This patch bumps up the GBN to 17296101. This build
includes HIVE-25137, which introduces a new HMS API
to get the ACID write events of a transaction.

Additionally, it excludes ranger-plugins-audit from
the dependencies of ranger-plugins-common so that
Maven can resolve dependencies.

Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6
---
M bin/impala-config.sh
M fe/pom.xml
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
3 files changed, 29 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17874/2
--
To view, visit http://gerrit.cloudera.org:8080/17874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6
Gerrit-Change-Number: 17874
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] Bump up the GBN to 17296101

2021-09-27 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17874


Change subject: Bump up the GBN to 17296101
..

Bump up the GBN to 17296101

This patch bumps up the GBN to 17296101. This build
includes HIVE-25137, which introduces a new HMS API
to get the ACID write events of a transaction.

Additionally, it excludes ranger-plugins-audit from
the dependencies of ranger-plugins-common so that
Maven can resolve dependencies.

Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6
---
M bin/impala-config.sh
M fe/pom.xml
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
3 files changed, 26 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/17874/1
--
To view, visit http://gerrit.cloudera.org:8080/17874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I87a497882e80dbfc87077bdbc2f05216182003d6
Gerrit-Change-Number: 17874
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

2021-09-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
..


Patch Set 6:

(2 comments)

Thanks Vihang for introducing a way for batch event processing. I will rebase 
my patch for IMPALA-10923 on top of this patch.

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@270
PS6, Line 270: i=0, j=1
nit: spaces around assignment operator


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1803
PS6, Line 1803: for (T event : batchedEvents_) {
It seems ignoredPartitions are still being processed here.
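For readers new to the batching idea in this change, a hypothetical sketch: consecutive events of the same type on the same table are grouped so they can be applied in one pass. `SimpleEvent` and `EventBatcher` are illustrative stand-ins, not the classes in MetastoreEvents.java, and the sketch does not model the ignored-partition filtering raised in the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified event: only the fields needed for batching.
final class SimpleEvent {
  final long id;
  final String type;      // e.g. "ALTER_PARTITION"
  final String tableName; // e.g. "db.tbl"

  SimpleEvent(long id, String type, String tableName) {
    this.id = id;
    this.type = type;
    this.tableName = tableName;
  }
}

final class EventBatcher {
  private EventBatcher() {}

  // Splits the event stream into runs of consecutive events that share
  // the same type and target table. Event order is preserved, and any
  // change of type or table starts a new batch.
  static List<List<SimpleEvent>> batch(List<SimpleEvent> events) {
    List<List<SimpleEvent>> batches = new ArrayList<>();
    List<SimpleEvent> current = new ArrayList<>();
    for (SimpleEvent e : events) {
      if (!current.isEmpty()) {
        SimpleEvent last = current.get(current.size() - 1);
        boolean sameRun = last.type.equals(e.type)
            && last.tableName.equals(e.tableName);
        if (!sameRun) {
          batches.add(current);
          current = new ArrayList<>();
        }
      }
      current.add(e);
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Only consecutive events are merged, so the relative order of events on different tables is never changed.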



--
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 22 Sep 2021 03:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables

2021-09-20 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17858


Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
..

IMPALA-10923: Fine grained table refreshing at partition level events for 
transactional tables

To enable fine-grained table refreshing, this commit makes three main changes.
1. Maintain validWriteIdList in Catalogd for transactional tables. We keep track
  of write id changes through AllocWriteIdEvents, CommitTxnEvents, and
  AbortTxnEvents.
2. Trigger partition-level refreshing for addPartitionEvents,
  dropPartitionEvents, and AlterPartitionEvents.
3. Introduce a config, incremental_refresh_acid, that switches fine-grained
  table refreshing on or off.
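A minimal sketch of how a per-table write-id state could be maintained from these events. `WriteIdTracker` is hypothetical, and it assumes the transaction ids carried by CommitTxn/AbortTxn events have already been mapped to this table's write ids. A write id is treated as valid when it is at or below the high watermark and is neither open nor aborted, which is how a ValidWriteIdList is read.

```java
import java.util.TreeSet;

// Hypothetical per-table tracker of write-id state.
final class WriteIdTracker {
  private long highWatermark = 0;
  private final TreeSet<Long> openWriteIds = new TreeSet<>();
  private final TreeSet<Long> abortedWriteIds = new TreeSet<>();

  // AllocWriteIdEvent: the id now exists but is not committed yet.
  void onAllocWriteId(long writeId) {
    highWatermark = Math.max(highWatermark, writeId);
    openWriteIds.add(writeId);
  }

  // CommitTxnEvent (after mapping the txn to this table's write id).
  void onCommit(long writeId) {
    openWriteIds.remove(writeId);
  }

  // AbortTxnEvent: the id must never become visible to readers.
  void onAbort(long writeId) {
    openWriteIds.remove(writeId);
    abortedWriteIds.add(writeId);
  }

  long getHighWatermark() {
    return highWatermark;
  }

  // Valid = at or below the high watermark, not open, not aborted.
  boolean isValid(long writeId) {
    return writeId <= highWatermark
        && !openWriteIds.contains(writeId)
        && !abortedWriteIds.contains(writeId);
  }
}
```

Keeping the open and aborted ids as exception sets mirrors the structure of a ValidWriteIdList, so a catalog could compare this state with a client's list without reloading from HMS.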

Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
---
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
M fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/test/java/org/apache/impala/catalog/CatalogTableWriteIdTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 672 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/17858/1
--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-10 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: te the table when the updatedTbl has a higher ValidWriteIdList
   :   // if we just rely on catalog version comparison which 
would break the logic to
   :   // reload on stale ValidWri
> I synced up with Yu-Wen offline. Based on the discussion, I understand the
Thanks Vihang for putting this together. I've added the comment.



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 10 Aug 2021 20:34:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-10 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@533
PS15, Line 533: " (c1 int) partitioned by (part int) stored as orc" +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@547
PS15, Line 547: executeHiveSql("create table " + getTestFullAcidTblName() +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@598
PS15, Line 598: TPartialPartitionInfo afterPartitionInfo =
> nit, this comment can be removed now.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 10 Aug 2021 20:32:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-10 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). Before serving,
CatalogD compares the cached compaction id with the latest compaction
id. If a newer compaction has happened, CatalogD caches the latest
compaction id and refreshes the file metadata.
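A minimal sketch of the compaction-id check described above. `CompactionChecker` is hypothetical, and the HMS call (HIVE-24828) is stubbed as a `ToLongFunction` that is assumed to return 0 when no compaction has been recorded; the real patch works per table/partition through CompactionInfoLoader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical cache of the last compaction id seen per partition.
final class CompactionChecker {
  private final Map<String, Long> cachedCompactionIds = new HashMap<>();
  // Stand-in for the HMS call returning the latest committed compaction
  // id for a partition (assumed 0 if none has been recorded).
  private final ToLongFunction<String> latestCompactionId;

  CompactionChecker(ToLongFunction<String> latestCompactionId) {
    this.latestCompactionId = latestCompactionId;
  }

  // Returns true, and caches the new id, if the partition has been
  // compacted since the id we last cached; the caller should then
  // reload the partition's file metadata before serving it.
  boolean needsFileMetadataRefresh(String partitionName) {
    long latest = latestCompactionId.applyAsLong(partitionName);
    long cached = cachedCompactionIds.getOrDefault(partitionName, 0L);
    if (latest > cached) {
      cachedCompactionIds.put(partitionName, latest);
      return true;
    }
    return false;
  }
}
```

Comparing ids rather than write ids is the point: a compaction rewrites files without producing a new write id, so only the compaction id reveals that cached file metadata is stale.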

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables add complexity because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace. Non-transactional tables keep the original
behavior.
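A simplified sketch of the replacement rule, assuming each writeIdList can be reduced to a single high watermark (the real patch compares full ValidWriteIdLists via AcidUtils). `TableReplacePolicy` is hypothetical, and the sketch treats an equal watermark as safe to replace; the exact tie-breaking in the patch is not shown here.

```java
// Hypothetical decision helper: may a freshly reloaded table replace the
// cached one?
final class TableReplacePolicy {
  private TableReplacePolicy() {}

  static boolean shouldReplace(boolean transactional,
      long catalogVersionAtLoadStart, long currentCatalogVersion,
      long updatedWriteIdHwm, long existingWriteIdHwm) {
    if (transactional) {
      // A concurrent file-metadata refresh may have bumped the catalog
      // version, so decide by write ids instead: replace when the
      // reloaded table is not older than the cached one.
      return updatedWriteIdHwm >= existingWriteIdHwm;
    }
    // Non-transactional tables keep the original rule: replace only if
    // nothing modified the table while it was being reloaded.
    return catalogVersionAtLoadStart == currentCatalogVersion;
  }
}
```

For example, if a file-metadata refresh bumps the catalog version from 10 to 12 while a full reload is running, a transactional table with equal write ids is still replaced, whereas a non-transactional table would keep the cached copy.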

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 369 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/16
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-09 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). Before serving,
CatalogD compares the cached compaction id with the latest compaction
id. If a newer compaction has happened, CatalogD caches the latest
compaction id and refreshes the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables add complexity because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace. Non-transactional tables keep the original
behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 362 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/15
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-05 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 14:

(3 comments)

> Patch Set 14:
>
> (2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: AcidUtils.compare((HdfsTable) existingTbl,
   :updatedTbl.getValidWriteIds(), 
tableId)
   : >= 0)
> Is it guaranteed that the existingTbl will have compacted files if it has a
No, it is not guaranteed. In this case, the table comes from a full table
load, and I assume that by this time the client who sent the query has
already acquired a read/write lock on the table. Therefore, the loaded file
metadata won't be cleaned up during the lifetime of the query. We don't need
to worry about other queries because the file metadata will be updated the
next time there is a compaction.

If I understand the original design correctly, the conditions here make sure
we update the table only once when there are many requests for it. After my
change, refreshing file metadata also changes catalogVersion. Suppose a full
table load and a file metadata refresh happen concurrently. If the file
metadata refresh finishes first and updates catalogVersion, we should not
discard the result of the full table load here, since we need to return the
table with the most recent validWriteIdList.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@38
PS14, Line 38:   CatalogServiceCatalog catalog, 
GetLatestCommittedCompactionInfoRequest request)
> nit: Only pass MetastoreClient instead of whole catalog Object ?
We pass the catalog here so that we get a client from the pool only when needed.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@564
PS14, Line 564: Assert.assertEquals
> A comment above this line saying that compacted tables/partitions should on
Not sure about ORC tables. Will check.



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 05 Aug 2021 21:39:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). Before serving,
CatalogD compares the cached compaction id with the latest compaction
id. If a newer compaction has happened, CatalogD caches the latest
compaction id and refreshes the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables add complexity because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace. Non-transactional tables keep the original
behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 324 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/14
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-08-03 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). Before serving,
CatalogD compares the cached compaction id with the latest compaction
id. If a newer compaction has happened, CatalogD caches the latest
compaction id and refreshes the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables add complexity because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace. Non-transactional tables keep the original
behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 322 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/13
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 13
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] Bump up the GBN to 15549253

2021-08-02 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17746


Change subject: Bump up the GBN to 15549253
..

Bump up the GBN to 15549253

This patch bumps up the GBN to 15549253. It also includes Fang-Yu's fix
to use the correct policy id when updating the "all - database" policy,
which is needed due to a change on the Ranger side.

Testing:
* Ran create-load-data.sh

Change-Id: Ie7776e62dad0b9bec6c03fb9ee8f1b8728ff0e69
---
M bin/impala-config.sh
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
4 files changed, 26 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/17746/1
--
To view, visit http://gerrit.cloudera.org:8080/17746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7776e62dad0b9bec6c03fb9ee8f1b8728ff0e69
Gerrit-Change-Number: 17746
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-30 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2131
PS10, Line 2131:   partsToBeRefreshed =
> @Yu-Wen: If there are multiple getOrLoad requests that end up at line 2131
@Sourabh Thanks for the suggestion. I will see whether I can do something
like loadAsync.

For refreshFileMetadata(), Vihang pointed out a potential race condition: we
cannot be sure whether the full table reload happened after a compaction, so
we may still end up serving stale file metadata. To avoid further race
conditions, we need to refresh the file metadata even if there is a
concurrent full table reload.



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 30 Jul 2021 16:58:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-29 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.
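The replacement rule can be sketched as follows, using a hypothetical, simplified writeIdList (a high watermark plus a set of open or aborted write ids) rather than Hive's `ValidWriteIdList`. `SimpleWriteIdList` and `TableReplacementPolicy` are illustrative names only, and this sketch treats an equal writeIdList as safe to replace, which may differ from what the patch does.

```java
import java.util.Set;

// Hypothetical simplified writeIdList: write ids <= highWatermark are
// committed unless they appear in invalidIds (open or aborted).
final class SimpleWriteIdList {
  final long highWatermark;
  final Set<Long> invalidIds;

  SimpleWriteIdList(long highWatermark, Set<Long> invalidIds) {
    this.highWatermark = highWatermark;
    this.invalidIds = Set.copyOf(invalidIds);
  }

  boolean isValid(long writeId) {
    return writeId <= highWatermark && !invalidIds.contains(writeId);
  }
}

// Hypothetical replacement policy mirroring the description above.
final class TableReplacementPolicy {
  /** True if every write visible in `existing` is also visible in `updated`. */
  static boolean isAtLeastAsRecent(SimpleWriteIdList updated, SimpleWriteIdList existing) {
    if (updated.highWatermark < existing.highWatermark) {
      return false;
    }
    // With a higher or equal watermark, only ids still invalid in `updated`
    // can hide a write that `existing` already sees.
    for (long id : updated.invalidIds) {
      if (existing.isValid(id)) {
        return false;
      }
    }
    return true;
  }

  static boolean shouldReplace(boolean transactional, long versionAtLoadStart,
      long currentVersion, SimpleWriteIdList updated, SimpleWriteIdList existing) {
    if (!transactional) {
      // Original behavior: replace only if the catalog version is unchanged.
      return versionAtLoadStart == currentVersion;
    }
    return isAtLeastAsRecent(updated, existing);
  }
}
```

Comparing visibility of write ids, rather than catalog versions, lets a full reload win over a concurrent file-metadata refresh as long as it does not hide any write the cached table already exposes.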

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
12 files changed, 326 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/11
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS10, Line 2127: Get non-ACID table with writeIdList:
> This text here comes as a message for the IllegalStateException thrown and
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3479
PS10, Line 3479: If there is an ongoing loading task, we don't reload file metadata but wait for the
   :* loading task completed and return the table just loaded.
> this comment is stale now that removed the loadReq logic.
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@41
PS10, Line 41: Log
> is this import needed?
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@345
PS10, Line 345: hdfsTable_.writeLock().lock();
> why is this needed? The table locks are generally taken at CatalogOpExecuto
This load fails after adding a precondition check for locking in
HdfsTable.loadFileMetadataForPartitions. I suppose the lock is not taken in
CatalogOpExecutor because hdfsTable_ is an internal object of icebergTable.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@115
PS10, Line 115: able.writeLock().lock();
> Why is this needed?
Same as for icebergTable. After adding a precondition check for locking in
HdfsTable.loadFileMetadataForPartitions, this function fails without taking
the lock.
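The precondition mentioned in these two replies can be illustrated with a plain `ReentrantReadWriteLock`: the load method checks that the current thread holds the write lock, so callers outside the usual CatalogOpExecutor path must take it themselves. `LockedTable` and its methods are hypothetical; Impala code would typically use Guava's `Preconditions.checkState`, which this self-contained sketch replaces with an explicit exception.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a catalog table guarded by a read/write lock.
final class LockedTable {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  ReentrantReadWriteLock.WriteLock writeLock() {
    return lock.writeLock();
  }

  // Precondition: the caller must already hold this table's write lock.
  int loadFileMetadata() {
    if (!lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("Write lock must be held to load file metadata");
    }
    return 1; // placeholder for "number of partitions reloaded"
  }

  // A caller that bypasses the usual locking path (e.g. a loader, or a wrapper
  // around an internal table) takes and releases the lock itself.
  int loadWithLock() {
    writeLock().lock();
    try {
      return loadFileMetadata();
    } finally {
      writeLock().unlock();
    }
  }
}
```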


http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json
File testdata/cluster/ranger/setup/policy_5_revised.json:

http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json@8
PS10, Line 8: 5
> how is this change related? If it is not can we remove it from this patch a
This patch bumps up the CDP version, and the new version of Ranger causes
create-load-data.sh to fail. Without this change I cannot get a green test
run. Or should I create a separate commit to bump up the CDP version?



--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 28 Jul 2021 21:40:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-28 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
12 files changed, 387 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/10
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-27 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
11 files changed, 383 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/9
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 9
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-27 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 376 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/8
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-27 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 370 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/7
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-26 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7
PS4, Line 7: ACID ta
> nit, May be change this to say "ACID table" to be more specific.
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125
PS4, Line 2125: Preconditions.checkSta
> Can you add a Preconditions check before this line to make sure that the ta
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS4, Line 2127: l.readLock().lock();
> nit, can we rename this variable to something like "partsToBeRefreshed" to
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187
PS4, Line 2187:
> change to "ACID tables" since external tables are also HdfsTables
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486
PS4, Line 3486:
  : if (!tryWriteLock(hdfsTable)) {
  :   throw new CatalogException(String.format(
  :   "Error during refreshing file metadata for table %s due to lock contention",
  :   hdfsTable.getFullName()));
  : }
  : long newVersion = incrementAndGetCatalogVersion();
  : v
> This logic seems to have a race condition. How do we know that the loadReq
Thanks for pointing this out. It was only an optimization, so I've removed it.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824
PS4, Line 824: if (isPartitioned()) {
 :   for (CompactionInfoStruct ci : resp.getCompactions()) {
 : HdfsPartition.Builder partBuilder = nameToPartBuilder.get(ci.getPa
> If you move this to line 805 you can avoid iterating the partBuilders twice
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827
PS4, Line 827: Preconditions.checkNotNull(partBuilder);
 : partBuilder.setLastCompactionId(ci.getId());
 :   }
 : } else {
 :   CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions());
 :
> I think the code readability can be improved if you handle the non-partitio
Done
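A sketch of applying compaction ids in a single pass over the response, with the non-partitioned case handled separately as suggested in the review. `CompactionRecord` is a hypothetical stand-in for `CompactionInfoStruct`, and returning a map (instead of setting ids on partition builders) is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CompactionInfoStruct: only the fields used here.
final class CompactionRecord {
  final String partitionName; // null for unpartitioned tables
  final long id;

  CompactionRecord(String partitionName, long id) {
    this.partitionName = partitionName;
    this.id = id;
  }
}

// Hypothetical helper; not part of Impala.
final class CompactionIdAssigner {
  // Sentinel meaning "no compaction recorded".
  static final long NO_COMPACTION = -1L;

  /** Partitioned tables: partition name -> latest compaction id, in one pass. */
  static Map<String, Long> forPartitionedTable(List<CompactionRecord> records) {
    Map<String, Long> result = new HashMap<>();
    for (CompactionRecord r : records) {
      // Keep the largest id if a partition appears more than once.
      result.merge(r.partitionName, r.id, Long::max);
    }
    return result;
  }

  /** Unpartitioned tables: at most one record is expected. */
  static long forUnpartitionedTable(List<CompactionRecord> records) {
    if (records.size() > 1) {
      throw new IllegalArgumentException("expected at most one compaction record");
    }
    return records.isEmpty() ? NO_COMPACTION : records.get(0).id;
  }
}
```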


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File 
fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@520
PS4, Line 520: TGetPartialCatalogObjectResponse response =
> line too long (107 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@539
PS4, Line 539: response = sendRequest(request);
> line too long (114 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@569
PS4, Line 569: Assert.assertTrue(prePartitionInfo.getFile_descriptors().size() > 1);
> line too long (110 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@583
PS4, Line 583:   .wantFiles()
> line too long (92 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 26 Jul 2021 17:50:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

2021-07-26 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
ACID table
..

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 367 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/6
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-26 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 367 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/5
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-22 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance the write id, we cannot tell whether a
table/partition has been compacted by comparing the writeIdList. As a
result, CatalogD may serve obsolete file metadata and cause a runtime
error.

To fix this issue, we introduced an HMS API that returns the latest
compaction record for a table/partition (HIVE-24828). In CatalogD, we
cache the compaction id while loading partitions and compare the cached
id with the latest compaction id before serving. If a newer compaction
has happened, the file metadata is refreshed.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. Transactional tables are more complex because
file metadata refreshing and full table reloading can happen
concurrently. For transactional tables, we can instead use the
writeIdList to decide whether to replace the table: as long as the
updated table has a more recent writeIdList than the existing one, it
is safe to replace the table. For non-transactional tables, we keep
the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 378 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/4
--
To view, visit http://gerrit.cloudera.org:8080/17697

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation

2021-07-22 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17703 )

Change subject: [WIP]: Initial commit to acquire table/database lock in 
metastore server before any HMS operation
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@192
PS3, Line 192:   LOG.debug("Successfully executed HMS API: " + apiName);
> @Kishen: Sure, will add a UT. For now it would be a no-op since we haven't
@Sourabh: I was thinking some methods might not need to wait until events are
synced up, and could instead let eventProcessor do that in the background.
Given that there should be only a small number of updates, I agree this way is
better and cleaner, since it keeps similar logic in each function.



--
To view, visit http://gerrit.cloudera.org:8080/17703

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361
Gerrit-Change-Number: 17703
Gerrit-PatchSet: 4
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Thu, 22 Jul 2021 20:38:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811
PS2, Line 811: Map partNameToCompactionId = new HashMap<>();
> Thanks for the suggestion. A batch size of 1K makes sense to me. I will tes
In my local environment, the execution time of this API is ~1 ms for 1K
partitions, ~10 ms for 10K partitions and ~30 ms for 50K partitions. Although it
might take a bit longer in a production environment, we can expect it to stay in
the range of tens of ms, which I suppose is a tolerable latency.



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 21 Jul 2021 22:18:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance the write ID, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata, which can cause a
runtime error.

To fix this issue, we introduced an HMS API that returns the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache the compaction ID while loading partitions and compare
the cached ID with the latest compaction ID before serving a request. If a
newer compaction has happened, CatalogD refreshes the file metadata.
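The check described above can be sketched as follows, assuming the latest committed compaction id per partition has already been fetched from HMS (the review thread shows getLatestCommittedCompactionInfo being used for this). CompactionIdCache, recordLoaded and partitionsToRefresh are hypothetical names; the real logic lives in HdfsTable/AcidUtils and may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a CatalogD-side cache of the compaction id seen when
// each partition's file metadata was loaded.
class CompactionIdCache {
  // Marker for "no compaction seen yet for this partition".
  static final long NO_COMPACTION = -1L;

  private final Map<String, Long> cachedIdByPartition = new HashMap<>();

  // Called while loading a partition's file metadata.
  void recordLoaded(String partName, long compactionId) {
    cachedIdByPartition.put(partName, compactionId);
  }

  // Given the latest committed compaction id per partition (fetched from HMS),
  // return the partitions whose cached file metadata is stale, sorted by name.
  List<String> partitionsToRefresh(Map<String, Long> latestIdByPartition) {
    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, Long> e : latestIdByPartition.entrySet()) {
      long cached = cachedIdByPartition.getOrDefault(e.getKey(), NO_COMPACTION);
      if (e.getValue() > cached) {
        stale.add(e.getKey());
      }
    }
    Collections.sort(stale);
    return stale;
  }
}
```

Only partitions whose latest compaction id is newer than the cached one are reloaded; partitions never seen with a compaction are treated as stale as soon as HMS reports one.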

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. For transactional tables, this is more complex
because a file metadata refresh and a full table reload can happen
concurrently. For transactional tables we can instead use the
writeIdList to decide whether to replace the table: as long as the
reloaded table has a more recent writeIdList than the existing one, it
is safe to replace the table. Non-transactional tables keep the
original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 370 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/3
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] [WIP]: Initial commit to acquire table/database lock in metastore server before any HMS operation

2021-07-21 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17703 )

Change subject: [WIP]: Initial commit to acquire table/database lock in 
metastore server before any HMS operation
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17703/3/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServiceHandler.java@192
PS3, Line 192:   LOG.debug("Successfully executed HMS API: " + apiName);
> Can you add one sample test case, where CatalogOpExecutor and MetastoreServ
Do we need to sync the table/database to the latest event in this class? If we
don't directly update the cache here, is it possible to delay the sync-up
operation until the next read?
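For comparison, the deferred alternative raised in this question could look roughly like the sketch below: the write path only marks the table stale, and the first subsequent read performs the sync. LazySyncCatalog and its methods are hypothetical, the sync is a stub, and the class is not thread-safe.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: defer the event sync-up until the next read of a table.
// Not thread-safe; a real catalog would need locking around the stale set.
class LazySyncCatalog {
  private final Set<String> staleTables = new HashSet<>();
  int syncCount = 0;  // number of syncs actually performed

  // Write path: after the HMS call succeeds, only mark the table stale.
  void afterHmsWrite(String fullTblName) {
    staleTables.add(fullTblName);
  }

  // Read path: sync first if any write happened since the last sync.
  // Returns true if a sync was performed.
  boolean read(String fullTblName) {
    if (staleTables.remove(fullTblName)) {
      syncToLatestEvent(fullTblName);
      return true;
    }
    return false;
  }

  // Stub for "apply pending metastore events to this table".
  private void syncToLatestEvent(String fullTblName) {
    syncCount++;
  }
}
```

Two writes followed by two reads trigger a single sync: the cost moves from the write path to the first read, which is the trade-off being weighed in this thread.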



--
To view, visit http://gerrit.cloudera.org:8080/17703
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I085eab20db61282daf4549ddbcc018aaf63cc361
Gerrit-Change-Number: 17703
Gerrit-PatchSet: 4
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 21 Jul 2021 18:59:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-20 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811
PS2, Line 811:   metaStoreClient.getHiveClient().getLatestCommittedCompactionInfo(request);
> Should we parallelize this HMS api if table has large number of partitions
Thanks for the suggestion. A batch size of 1K makes sense to me. I will test it 
out.
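A minimal sketch of splitting a table's partition names into batches of at most 1K, so that each batch could be sent as one getLatestCommittedCompactionInfo request (possibly in parallel, as suggested). PartitionBatcher is a hypothetical helper; building the actual HMS request object is intentionally left out because its exact fields are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionBatcher {
  // Split partition names into consecutive batches of at most batchSize names.
  static List<List<String>> toBatches(List<String> partNames, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be > 0");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < partNames.size(); start += batchSize) {
      int end = Math.min(start + batchSize, partNames.size());
      // Copy so each batch is independent of the source list.
      batches.add(new ArrayList<>(partNames.subList(start, end)));
    }
    return batches;
  }
}
```

For example, 2,500 partitions with a batch size of 1,000 yield three requests of 1,000, 1,000 and 500 names.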


http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@837
PS2, Line 837:   LOG.debug("Cached compaction id for {}: {} but the latest compaction id: {}",
> Log partition name as well?
Will add.



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Tue, 20 Jul 2021 16:55:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2135
PS1, Line 2135: CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics()
> line too long (129 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2140
PS1, Line 2140:   // Update the cache miss metric, as the valid write id list did not match and we
> line too long (158 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@797
PS1, Line 797:   private void getAndSetLastCompactionId(IMetaStoreClient client,
> line too long (113 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@828
PS1, Line 828:   String partName = lci.getPartitionname() == null ? DEFAULT_PARTITION_NAME :
> line too long (105 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2193
PS1, Line 2193:   return client.getHiveClient()
> line too long (100 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@561
PS1, Line 561: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@563
PS1, Line 563: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@578
PS1, Line 578: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@580
PS1, Line 580: testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
> line too long (96 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:15:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance the write ID, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata, which can cause a
runtime error.

To fix this issue, we introduced an HMS API that returns the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache the compaction ID while loading partitions and compare
the cached ID with the latest compaction ID before serving a request. If a
newer compaction has happened, CatalogD refreshes the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. For transactional tables, this is more complex
because a file metadata refresh and a full table reload can happen
concurrently. For transactional tables we can instead use the
writeIdList to decide whether to replace the table: as long as the
reloaded table has a more recent writeIdList than the existing one, it
is safe to replace the table. Non-transactional tables keep the
original behavior.
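The writeIdList-based replacement rule can be sketched with a simplified model in which every write id up to the high watermark is committed unless it is listed as invalid (open or aborted). SimpleWriteIdList and shouldReplace are hypothetical; Hive's ValidWriteIdList carries more state, and this sketch also treats an equal list as safe to replace, which is an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified write id list: every id in [1, highWatermark] is
// committed unless it is in 'invalid' (open or aborted).
final class SimpleWriteIdList {
  final long highWatermark;
  final Set<Long> invalid;

  SimpleWriteIdList(long highWatermark, Set<Long> invalid) {
    this.highWatermark = highWatermark;
    this.invalid = new TreeSet<>(invalid);
  }

  boolean isCommitted(long writeId) {
    return writeId >= 1 && writeId <= highWatermark && !invalid.contains(writeId);
  }

  // True if every write id committed in 'other' is also committed here.
  boolean isAtLeastAsRecentAs(SimpleWriteIdList other) {
    if (highWatermark < other.highWatermark) return false;
    // Ids committed in 'other' are all <= this.highWatermark, so the only way
    // to "lose" one of those commits is to list it as invalid here.
    for (long id : invalid) {
      if (other.isCommitted(id)) return false;
    }
    return true;
  }
}

class TableReplacementSketch {
  // Replace the cached table only if the reloaded one is not older.
  static boolean shouldReplace(SimpleWriteIdList existing, SimpleWriteIdList reloaded) {
    return reloaded.isAtLeastAsRecentAs(existing);
  }
}
```

Unlike a catalog-version check, this comparison stays meaningful even when a file metadata refresh has bumped the version while the full reload was in flight.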

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
8 files changed, 368 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/2
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

2021-07-19 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17697


Change subject: IMPALA-10801: Check the latest compaction Id before serving 
request
..

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance the write ID, we cannot tell whether a
table/partition has been compacted by comparing writeIdLists. As a
result, CatalogD may serve obsolete file metadata, which can cause a
runtime error.

To fix this issue, we introduced an HMS API that returns the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache the compaction ID while loading partitions and compare
the cached ID with the latest compaction ID before serving a request. If a
newer compaction has happened, CatalogD refreshes the file metadata.

This patch also changes how the existing table is replaced after a
full table reload. Currently, the table is replaced only if the catalog
version has not changed. For transactional tables, this is more complex
because a file metadata refresh and a full table reload can happen
concurrently. For transactional tables we can instead use the
writeIdList to decide whether to replace the table: as long as the
reloaded table has a more recent writeIdList than the existing one, it
is safe to replace the table. Non-transactional tables keep the
original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
8 files changed, 362 insertions(+), 43 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/1
--
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-07-12 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..

IMPALA-10724: Add mutable validWriteIdList

In this patch, we add a new class for manually updating a writeIdList.
To update the writeIdList, we introduce three methods:
addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds.

We will use this class in MetastoreEventProcessor for fine-grained
table refreshing. With control over the writeIdList, we will be able to
update a transactional table partially and keep it consistent.

MutableValidWriteIdList has the following restrictions:
1. A writeId must be marked open before it is marked committed/aborted.
2. Only two writeId state transitions are allowed: open -> committed and
open -> aborted. Any other transition is NOT allowed.
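A minimal sketch of the two transition rules, using a plain map from write id to state rather than the high-watermark representation the real class is built on. The method names follow the commit message, but their signatures here (a single long, or a Collection of Long) are assumptions, and WriteIdStateTracker is a hypothetical name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: enforce open -> committed and open -> aborted only.
class WriteIdStateTracker {
  enum State { OPEN, COMMITTED, ABORTED }

  private final Map<Long, State> states = new HashMap<>();

  // Rule 1: a write id enters the tracker only by being opened.
  void addOpenWriteId(long writeId) {
    if (states.containsKey(writeId)) {
      throw new IllegalStateException("write id " + writeId + " already known");
    }
    states.put(writeId, State.OPEN);
  }

  void addCommittedWriteIds(Collection<Long> writeIds) {
    transition(writeIds, State.COMMITTED);
  }

  void addAbortedWriteIds(Collection<Long> writeIds) {
    transition(writeIds, State.ABORTED);
  }

  State getState(long writeId) {
    return states.get(writeId);
  }

  // Rule 2: only OPEN ids may move to COMMITTED/ABORTED. All ids are
  // validated first, so a rejected batch leaves the tracker unchanged.
  private void transition(Collection<Long> writeIds, State target) {
    for (long id : writeIds) {
      if (states.get(id) != State.OPEN) {
        throw new IllegalStateException(
            "write id " + id + " must be open before it is marked " + target);
      }
    }
    for (long id : writeIds) {
      states.put(id, target);
    }
  }
}
```

Committing an id that was never opened, or aborting one that is already committed, both throw IllegalStateException.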

Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
4 files changed, 573 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/5
--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10740: MetastoreServiceHandler should extend DefaultThriftHiveMetastore

2021-07-07 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17569 )

Change subject: IMPALA-10740: MetastoreServiceHandler should extend 
DefaultThriftHiveMetastore
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17569/2/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17569/2/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@344
PS2, Line 344: ,
nit: put a space after comma



--
To view, visit http://gerrit.cloudera.org:8080/17569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7e3f74dd96a7fec2ed13b0e5929f2b0a6b66e39f
Gerrit-Change-Number: 17569
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 07 Jul 2021 21:38:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10751: new API in CatalogD to be used by Event processor for caching txn to table write id mapping

2021-06-16 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17599 )

Change subject: IMPALA-10751: new API in CatalogD to be used by Event processor 
for caching txn to table write id mapping
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@776
PS1, Line 776: (
nit: add white space


http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@779
PS1, Line 779:
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@787
PS1, Line 787:
nit: white space should be removed


http://gerrit.cloudera.org:8080/#/c/17599/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@789
PS1, Line 789: }
nit: redundant right curly bracket?



--
To view, visit http://gerrit.cloudera.org:8080/17599
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2058fdf591b2655a10a92192d5f629b72a85f08a
Gerrit-Change-Number: 17599
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Wed, 16 Jun 2021 23:57:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-15 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..

IMPALA-10724: Add mutable validWriteIdList

In this patch, we add a new class for manually updating a writeIdList.
To update the writeIdList, we introduce three methods:
addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds.

We will use this class in MetastoreEventProcessor for fine-grained
table refreshing. With control over the writeIdList, we will be able to
update a transactional table partially and keep it consistent.

MutableValidWriteIdList has the following restrictions:
1. A writeId must be marked open before it is marked committed/aborted.
2. Only two writeId state transitions are allowed: open -> committed and
open -> aborted. Any other transition is NOT allowed.
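To illustrate the kind of fine-grained refreshing this enables, here is a hypothetical sketch of how an event processor might track which partitions each open write id touched and reload only those partitions on commit. PartialRefreshPlanner and its methods are invented for illustration; it assumes nothing needs reloading on abort because readers skip aborted write ids through the writeIdList.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: remember which partitions each open write id wrote to,
// so that a commit only triggers a reload of those partitions.
class PartialRefreshPlanner {
  private final Map<Long, Set<String>> partsByOpenWriteId = new HashMap<>();

  // Record that an open write id wrote files into a partition.
  void onWrite(long writeId, String partName) {
    partsByOpenWriteId.computeIfAbsent(writeId, k -> new TreeSet<>()).add(partName);
  }

  // On commit, return the partitions whose file metadata should be reloaded.
  Set<String> onCommit(long writeId) {
    Set<String> parts = partsByOpenWriteId.remove(writeId);
    if (parts == null) {
      return Collections.emptySet();
    }
    return parts;
  }

  // On abort, nothing is reloaded (assumption: readers skip aborted write ids
  // via the writeIdList); only the bookkeeping is dropped.
  void onAbort(long writeId) {
    partsByOpenWriteId.remove(writeId);
  }
}
```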

Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
4 files changed, 576 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/4
--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-11 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17538/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17538/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2701
PS1, Line 2701:   public MutableValidWriteIdList getMutableValidWriteIds() {
> Why do you have to change the method signature ?
Oh I see. We can explicitly cast it to MutableValidWriteIdList when we need to 
update it. Will remove this.
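The approach in this reply (keep the getter's read-only return type and cast only where an update is needed) could look like the sketch below. All types here are hypothetical stand-ins for ValidWriteIdList and MutableValidWriteIdList; the instanceof check makes a non-mutable list fail with a clear error instead of a ClassCastException.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-only view, standing in for ValidWriteIdList.
interface WriteIdListView {
  boolean isWriteIdValid(long writeId);
}

// Hypothetical mutable extension, standing in for MutableValidWriteIdList.
interface MutableWriteIdListView extends WriteIdListView {
  void addCommitted(long writeId);
}

class SimpleMutableList implements MutableWriteIdListView {
  private final Set<Long> committed = new HashSet<>();

  @Override
  public boolean isWriteIdValid(long writeId) { return committed.contains(writeId); }

  @Override
  public void addCommitted(long writeId) { committed.add(writeId); }
}

class TableSketch {
  private final WriteIdListView validWriteIds;

  TableSketch(WriteIdListView validWriteIds) { this.validWriteIds = validWriteIds; }

  // The getter keeps the read-only return type; its signature does not change.
  WriteIdListView getValidWriteIds() { return validWriteIds; }
}

class EventProcessorSketch {
  // Cast only where an update is needed, and fail clearly if it is not mutable.
  static void markCommitted(TableSketch tbl, long writeId) {
    WriteIdListView ids = tbl.getValidWriteIds();
    if (!(ids instanceof MutableWriteIdListView)) {
      throw new IllegalStateException("writeIdList of table is not mutable");
    }
    ((MutableWriteIdListView) ids).addCommitted(writeId);
  }
}
```

Callers that only read the list are unaffected, and the mutable type is visible only at the single place that performs the update.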



--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 
Gerrit-Comment-Date: Fri, 11 Jun 2021 21:53:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-10 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..

IMPALA-10724: Add mutable validWriteIdList

In this patch, we add a new class for manually updating a writeIdList.
To update the writeIdList, we introduce three methods:
addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds.

We will use this class in MetastoreEventProcessor for fine-grained
table refreshing. With control over the writeIdList, we will be able to
update a transactional table partially and keep it consistent.

MutableValidWriteIdList has the following restrictions:
1. A writeId must be marked open before it is marked committed/aborted.
2. Only two writeId state transitions are allowed: open -> committed and
open -> aborted. Any other transition is NOT allowed.

Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
4 files changed, 580 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/3
--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-10 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..

IMPALA-10724: Add mutable validWriteIdList

In this patch, we add a new class for manually updating a writeIdList.
To update the writeIdList, we introduce three methods:
addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds.

We will use this class in MetastoreEventProcessor for fine-grained
table refreshing. With control over the writeIdList, we will be able to
update a transactional table partially and keep it consistent.

MutableValidWriteIdList has the following restrictions:
1. A writeId must be marked open before it is marked committed/aborted.
2. Only two writeId state transitions are allowed: open -> committed and
open -> aborted. Any other transition is NOT allowed.

Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
A fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
4 files changed, 577 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/2
--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yu-Wen Lai 

