pkuwm commented on a change in pull request #1066:
URL: https://github.com/apache/helix/pull/1066#discussion_r460207219



##########
File path: helix-core/src/main/java/org/apache/helix/HelixProperty.java
##########
@@ -148,7 +154,12 @@ public String toString() {
     }
   }
 
-  private Stat _stat;
+  protected Stat _stat;
+
+  public HelixProperty() {
+    _record = DEFAULT_ZNRECORD;
+    _stat = DEFAULT_STAT;
+  }

Review comment:
       If we don't have this default constructor, in Message's constructor, we 
need to call `super(ZNRecord)`. HelixProperty defines `ZNRecord` but not 
`SessionAwareZNRecord`. Then if we want Message's SessionAwareZNRecord, we may 
override `getRecord()` to convert `ZNRecord` to `SessionAwareZNRecord`: 
   ```
   Override
   public ZNRecord getRecord() {
     return new SessionAwareZNRecord(_record);
   }
   ```
   The problem is this creates a new record from the original one `_record`. We 
could not access the original one: `message.getRecord().setSimpleField()`.
   
   So we need to create SessionAwareZNRecord in Message's constructor and 
override parent's ZNRecord. Without the default constructor, we won't be able 
to do `_record = new SessionAwareZNRecord()` in Message.

##########
File path: 
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
##########
@@ -757,9 +762,22 @@ private void handleEvent(ClusterEvent event, 
BaseControllerDataProvider dataProv
     }
     event.addAttribute(AttributeName.ControllerDataProvider.name(), 
dataProvider);
 
-    logger.info(String.format("START: Invoking %s controller pipeline for 
cluster %s event: %s  %s",
-        manager.getClusterName(), dataProvider.getPipelineName(), 
event.getEventType(),
-        event.getEventId()));
+    String eventSessionId = 
event.getAttribute(AttributeName.EVENT_SESSION.name());
+    if (eventSessionId != null) {
+      String managerSessionId = manager.getSessionId();
+      if (!eventSessionId.equals(managerSessionId)) {

Review comment:
       I got your point. To achieve one level if statement, we could move 
managerSessionId outside. Then the scope of managerSessionId is bigger.
   ```
   String eventSessionId = 
event.getAttribute(AttributeName.EVENT_SESSION.name());
   String managerSessionId = manager.getSessionId();
   if (eventSessionId != null && !eventSessionId.equals(managerSessionId)) {
   ...
   }
   ```
   
   But after this if statement we don't need managerSessionId. This what I try 
to do, following `Effective Java Item - 45 : Minimize the scope of local 
variables`: https://gist.github.com/kpgalligan/4cf93abd88044df71da5

##########
File path: 
helix-core/src/main/java/org/apache/helix/controller/stages/MessageDispatchStage.java
##########
@@ -78,7 +78,17 @@ protected void processEvent(ClusterEvent event, 
MessageOutput messageOutput) thr
         batchMessage(dataAccessor.keyBuilder(), messagesToSend, resourceMap, 
liveInstanceMap,
             manager.getProperties());
 
+    String expectedSession = 
event.getAttribute(AttributeName.EVENT_SESSION.name());
+    // An early check for expected leader session. If the sessions don't 
match, it means the
+    // controller lost leadership, then messages should not be sent and 
pipeline should stop.
+    // This avoids potential double masters for a single partition.
+    if (expectedSession != null && 
!expectedSession.equals(manager.getSessionId())) {
+      throw new StageException(
+          "Controller: " + manager.getInstanceName() + " lost leadership! 
Expected session: "
+              + expectedSession + ", actual: " + manager.getSessionId());
+    }

Review comment:
       What I was thinking about is, we also check this stage in 
`ReadClusterDataStage()` because this place is the highly possible one that 
session expires. In our issue, the session expiry also happened in 
`ReadClusterDataStage`.
   
   If we don't check it at every stage, it just costs extra 
resource(CPU/memory/network IO) which we could tolerate. I think checking at 
every stage may be excessive because most of stages are pretty 
fast(`ReadClusterDataStage` takes longer time) and very tiny possibility that 
session changes. As long as we protect sending message, correctness is ensured. 
We wanted to make minimum code change.

##########
File path: helix-core/src/main/java/org/apache/helix/model/Message.java
##########
@@ -953,4 +970,35 @@ public boolean isValid() {
     }
     return true;
   }
+
+  // A class represents session aware ZNRecord for message. The message should 
be written to zk
+  // by the expected session.
+  // TODO: remove this class once public session-aware ZNRecord is available
+  private static class SessionAwareZNRecord extends ZNRecord implements 
SessionAwareZkWriteData {

Review comment:
       We still want to make minimum changes in this PR. Putting it private 
gives us more flexibility to change in future: may add more stuff to this 
class. If in future we find that we need `SessionAwareZNRecord` in other 
places, we should create such thing as public. We need to discuss more and have 
a more comprehensive design for public `SessionAwareZNRecord`. What do you 
think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to