Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21065 )

Change subject: [WIP]IMPALA-12832: Implicit invalidate metadata on event 
failures
......................................................................


Patch Set 8:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/21065/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21065/8//COMMIT_MSG@25
PS8, Line 25: - Passed FE tests
We also need manual tests on failures in handling the failure, i.e. when an 
event fail, we invalidate the tables. The invalidation could still failed by 
bugs like IMPALA-12831.

Currently, IMPALA-12831 is unresolved yet. We can use it in manual tests.
Or we can add another flag to inject failure in the error handling, and see if 
auto_global_invalidate_metadata works in this scenario.


http://gerrit.cloudera.org:8080/#/c/21065/8/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/21065/8/be/src/catalog/catalog-server.cc@174
PS8, Line 174: DEFINE_string
nit: define this as hidden using DEFINE_string_hidden()


http://gerrit.cloudera.org:8080/#/c/21065/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/21065/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1292
PS7, Line 1292:             e);
              :         catalog_.invalidateTableIfExists(dbName_, tblName_);
              :       }
nit: we can use warnLog()


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@a2811
PS8, Line 2811:
Could you explain why we remove this?


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@677
PS8, Line 677: BackendConfig.INSTANCE.getProcessEventFailureEventTypes()
nit: this can also be replaced with catalog_.getFailureEventsForTesting()


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@679
PS8, Line 679: Double
nit: use primitive type 'double'


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1368
PS8, Line 1368:       return false;
Maybe we can use CatalogServiceCatalog.invalidateDb() here. But it seems low 
priority since we haven't found bugs on db events yet?


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1787
PS8, Line 1787:             tableBefore_.getDbName(), 
tableBefore_.getTableName());
Please add a comment saying the new table will be invalidated in 
MetastoreTableEvent#onFailure(), i.e. the parent method.


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2902
PS8, Line 2902:         is not processed by then. So try to get the table again 
here if it is null */
This seems fixing an existing bug. Can we split this into another patch?
CommitCompactionEvent and ReloadEvent also have this issue, i.e. using 
catalog_.getTable() in the constructor.
https://github.com/apache/impala/blob/0c0a3fff39839b400370433568f37a317b7d4800/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L3072
https://github.com/apache/impala/blob/0c0a3fff39839b400370433568f37a317b7d4800/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L2879


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@3105
PS8, Line 3105:             tableWriteId.getDbName(), 
tableWriteId.getTblName());
I think we should deduplicate the table names before invalidating them. They 
could be the same table.


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1006
PS8, Line 1006:       try {
Can we add a log for this?


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1188
PS8, Line 1188:               if (!event.onFailure(e)) { throw e; }
Can we add a log when throwing the exception? e.g. "Unable to handle failure in 
processing event".


http://gerrit.cloudera.org:8080/#/c/21065/8/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@1190
PS8, Line 1190:               LOG.error(event.debugString("Failed to handle 
event processing failure"));
The invalidation in onFailure() could still fail by bugs like IMPALA-12831. 
Please add 'ex' in the log.

  LOG.error(event.debugString("Failed to handle event processing failure"), ex);


http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py
File tests/metadata/test_event_processing_error.py:

http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py@28
PS8, Line 28:                   "ALTER_PARTITION,INSERT,ABORT_TXN,COMMIT_TXN'"
Can we add all event types that we currently can handle? e.g. DROP_PARTITION, 
RELOAD, ALLOC_WRITE_ID_EVENT, COMMIT_COMPACTION are missing here.

How about batched event types like ALTER_PARTITIONS, INSERT_PARTITIONS? Should 
we add them as well?

For unsupported event types like ALTER_DATABASE, can we test the feature of 
auto_global_invalidate_metadata?

On the other hand, it's uncertain whether all events are covered in these 
tests. Can we add individual test for each event type? So we are sure the 
failure handling of them are all covered. E.g.

  @CustomClusterTestSuite.with_args(
      impalad_args="--use_local_catalog=true",
      catalogd_args="--catalog_topic_mode=minimal 
--inject_process_event_failure_event_types=ALTER_TABLE")
  def test_alter_table_event_failure(self, unique_database):
    # Run Hive statements to generate ALTER_TABLE events
    EventProcessorUtils.wait_for_event_processing(self)
    assert EventProcessorUtils.get_event_processor_status() == "ACTIVE"


http://gerrit.cloudera.org:8080/#/c/21065/8/tests/metadata/test_event_processing_error.py@61
PS8, Line 61:   def test_sync_event_based_replication(self):
IIUC, we just want some mix workloads. If so, we can merge these 3 into one so 
don't need to restart the cluster 3 times.

  @CustomClusterTestSuite.with_args(
      impalad_args="--use_local_catalog=true",
      catalogd_args=CATALOGD_ARGS)
  def test_event_processing_failures(self, unique_database):
    TestEventProcessingBase._run_test_insert_events_impl(self.hive_client, 
self.client,
      self.cluster, unique_database, True)
    assert EventProcessorUtils.get_event_processor_status() == "ACTIVE"
    TestEventProcessingBase._run_test_insert_events_impl(self.hive_client, 
self.client,
      self.cluster, unique_database, False)
    assert EventProcessorUtils.get_event_processor_status() == "ACTIVE"
    
TestEventProcessingBase._run_event_based_replication_tests_impl(self.hive_client,
      self.client, self.cluster, self.filesystem_client)
    assert EventProcessorUtils.get_event_processor_status() == "ACTIVE"



--
To view, visit http://gerrit.cloudera.org:8080/21065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia67fc04c995802d3b6b56f79564bf0954b012c6c
Gerrit-Change-Number: 21065
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Tue, 27 Feb 2024 08:42:40 +0000
Gerrit-HasComments: Yes

Reply via email to