Re: [PR] Fix global lock contention in `DistClusterControllerStateModel` caused by `Optional.empty()` singleton [helix]

via GitHub Mon, 07 Jul 2025 09:27:34 -0700


GrantPSpencer commented on code in PR #3052:
URL: https://github.com/apache/helix/pull/3052#discussion_r2190550821



##########
helix-core/src/test/java/org/apache/helix/participant/TestDistControllerStateModel.java:
##########
@@ -124,4 +132,69 @@ public void testReset() {
     stateModel.reset();
   }
 
+  /**
+   * Test to verify that different DistClusterControllerStateModel instances
+   * use separate lock objects, ensuring no cross-instance blocking.
+   */
+  @Test()
+  public void testNoSharedLockAcrossInstances() throws Exception {
+    LOG.info("Testing that lock objects are not shared across 
DistClusterControllerStateModel instances");
+
+    // Verify different instances have different lock objects
+    DistClusterControllerStateModel instance1 = new 
DistClusterControllerStateModel(ZK_ADDR);
+    DistClusterControllerStateModel instance2 = new 
DistClusterControllerStateModel(ZK_ADDR);
+
+    Field lockField = 
DistClusterControllerStateModel.class.getDeclaredField("_controllerLock");
+    lockField.setAccessible(true);
+
+    Object lock1 = lockField.get(instance1);
+    Object lock2 = lockField.get(instance2);
+
+    Assert.assertNotNull(lock1, "First instance should have a lock object");
+    Assert.assertNotNull(lock2, "Second instance should have a lock object");
+    Assert.assertNotSame(lock1, lock2, "Different instances must have 
different lock objects");
+
+    // Verify concurrent access doesn't block across instances
+    final int NUM_INSTANCES = 10;
+    ExecutorService executor = Executors.newFixedThreadPool(NUM_INSTANCES);
+    CountDownLatch startLatch = new CountDownLatch(1);
+    CountDownLatch completionLatch = new CountDownLatch(NUM_INSTANCES);
+    AtomicInteger completedInstances = new AtomicInteger(0);
+
+    for (int i = 0; i < NUM_INSTANCES; i++) {
+      final int instanceId = i;
+      final DistClusterControllerStateModel instance = new 
DistClusterControllerStateModel(ZK_ADDR);
+
+      executor.submit(() -> {
+        try {
+          startLatch.await(); // wait for all threads to be ready
+
+          // Simulate state transition operations that would use the lock
+          synchronized (lockField.get(instance)) {
+            // hold the lock here briefly to simulate real state transition 
work
+            Thread.sleep(100);
+            completedInstances.incrementAndGet();
+          }
+
+        } catch (Exception e) {
+          LOG.error("Instance {} failed during concurrent test", instanceId, 
e);
+        } finally {
+          completionLatch.countDown();
+        }
+      });
+    }
+
+    // start all threads simultaneously
+    startLatch.countDown();
+
+    // All instances should complete within reasonable time since they don't 
block each other
+    boolean allCompleted = completionLatch.await(2, TimeUnit.SECONDS);

Review Comment:
   nit:
   Will this always pass? num_instances = 10, so expecting only about 
100ms*10=1second of wait if locks were executed sequentially due to shared lock
   Should we reduce timeout or maybe refactor to just assert that while lock1 
synchronized block loops endlessly, lock2 can still complete? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org

Re: [PR] Fix global lock contention in `DistClusterControllerStateModel` caused by `Optional.empty()` singleton [helix]

Reply via email to