maedhroz commented on a change in pull request #1221:
URL: https://github.com/apache/cassandra/pull/1221#discussion_r722423093



##########
File path: test/unit/org/apache/cassandra/schema/SchemaKeyspaceTest.java
##########
@@ -183,8 +269,32 @@ public void testSchemaNoColumn()
                                     KeyspaceParams.simple(1),
                                     SchemaLoader.standardCFMD(testKS, 
testTable));
         // Delete all colmns in the schema
-        String query = String.format("DELETE FROM %s.%s WHERE keyspace_name=? 
and table_name=?", SchemaConstants.SCHEMA_KEYSPACE_NAME, 
SchemaKeyspace.COLUMNS);
+        String query = String.format("DELETE FROM %s.%s WHERE keyspace_name=? 
and table_name=?", SchemaConstants.SCHEMA_KEYSPACE_NAME, 
SystemKeyspaceConstants.COLUMNS);
         executeOnceInternal(query, testKS, testTable);
         SchemaKeyspace.fetchNonSystemKeyspaces();
     }
+
+    private void testMethodSynchronicity(Runnable method) throws Exception
+    {
+        ExecutorService pool = Executors.newFixedThreadPool(2);
+        CountDownLatch pendingStarts = new CountDownLatch(2);
+
+        threadBlock = new CountDownLatch(2);
+        pool.execute(() -> {
+            pendingStarts.countDown();
+            method.run();
+        });
+        pool.execute(() -> {
+            pendingStarts.countDown();
+            method.run();
+        });
+
+        // We have started 2 threads. 1 thread should be held by the latch 
threadBlock, the other should be held as
+        // proof of thread safety.
+        pendingStarts.await(10, TimeUnit.SECONDS);
+        Util.spinAssertEquals(1L, () -> threadBlock.getCount(), 10);
+
+        // Do not execute the tasks. Just kill it.
+        pool.shutdownNow();
+    }

Review comment:
       Forgive me for the longe comment here, but hopefully it brings together 
all my comments since the initial questions on CASSANDRA-16856...
   
   When we encounter an issue like this, our first instinct should always be to 
try to reproduce the suboptimal outcome in an environment that reasonably 
represents the one that generated the original problem. In this case, we 
weren't going to try to exactly reproduce the very old Fallout test, but the 
good thing is that we seemed to have a pretty good understanding of the 
problem: that `Schema` and `SchemaKeyspace` could clearly be used in a way that 
would allow us to read partially applied schema transformations from the tables 
in the schema keyspace.
   
   So, in this case, a small bit of refactoring occurred to a few of us, namely 
that making `Schema` the single place to access and modify schema would 
simplify things and allow us to make a more coherent fix. With that in place, 
but without yet adding proper synchronization, a reasonable step would be to 
test the public contract of `Schema` in the face of concurrent access to 
reproduce the problem before making the actual fix.
   
   For example, a useful test would be proving that 
`Schema.schemaKeyspaceAsMutations()` from one thread can see a partially 
applied schema change from `Schema#transform()` in another thread without 
proper synchronization. The tests as they exist here (if I'm understanding them 
correctly) don't quite do this, and further, they seem to all just check single 
method for mutual exclusion.
   
   The reason this matters is that the synchronization, as it currently exists 
here, might be broken. In the example above, `Schema#transform()` holds the 
monitor lock on the singleton `Schema` instance, while 
`Schema.schemaKeyspaceAsMutations()` holds the monitor lock on the `Schema` 
class. Mutual exclusion won't be enforced. We can probably fix that by 
accessing the new methods via `Schema.instance`, but my larger point is that 
the core of this whole process is a test that verifies the thread-safety 
guarantees provided by `Schema` in a realistic way, given the level of 
abstraction.
   
   (I'd be perfectly happy if we simply tested the scenario above and called it 
a day, but it's possible truncation might not be that hard to extend from it. 
We've done this a few different ways in places like 
`DigestResolverTest#multiThreadedNoRepairNeededReadCallback()` and 
`RepairDigestTrackingTest#testLocalDataAndRemoteRequestConcurrency()`, although 
you've also got some building blocks in this test above as well if you want to 
go the unit test rather than in-JVM test route, which is probably simpler.)
   
   Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to