narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464771973
##########
File path:
zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
* Tests getChildren() when there are an excessive number of children and
connection loss happens,
* the operation should terminate and exit retry loop.
*/
- @Test
+ @Test(timeOut = 30 * 1000L)
public void testGetChildrenOnLargeNumChildren() throws Exception {
- // Default packetLen is 4M. It is static final and initialized
- // when first zkClient is created.
- // So we could not just set "jute.maxbuffer" to change the value.
- // Reflection is needed to change the value.
- // Remove "final" modifier
- Field modifiersField = Field.class.getDeclaredField("modifiers");
- boolean isModifierAccessible = modifiersField.isAccessible();
- modifiersField.setAccessible(true);
-
- Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
- Field childrenLimitField =
-
org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
- modifiersField.setInt(packetLenField, packetLenField.getModifiers() &
~Modifier.FINAL);
- modifiersField.setInt(childrenLimitField,
childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
- boolean isPacketLenAccessible = packetLenField.isAccessible();
- packetLenField.setAccessible(true);
- int originPacketLen = packetLenField.getInt(null);
- // Keep 150 bytes for successfully creating each child node.
- packetLenField.set(null, 150);
-
- boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
- childrenLimitField.setAccessible(true);
- int originChildrenLimit = childrenLimitField.getInt(null);
- childrenLimitField.set(null, 2);
-
- String path = "/" + TestHelper.getTestMethodName();
- // Create 5 children to make packet length of children exceed 150 bytes
+ // Create 110K children to make packet length of children exceed 4 MB
// and cause connection loss for getChildren() operation
- for (int i = 0; i < 5; i++) {
- _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(),
true);
+ String path = "/" + TestHelper.getTestMethodName();
+
+ _zkClient.createPersistent(path);
+
+ for (int i = 0; i < 110; i++) {
+ List<Op> ops = new ArrayList<>(1000);
+ for (int j = 0; j < 1000; j++) {
+ String child = path + "/" + UUID.randomUUID().toString();
+ ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
CreateMode.EPHEMERAL));
+ }
+ // Reduce total creation time by batch creating znodes
+ _zkClient.multi(ops);
}
try {
_zkClient.getChildren(path);
- Assert.fail("Should not successfully get children.");
+ Assert.fail("Should not successfully get children because of connection
loss.");
} catch (ZkException expected) {
Assert.assertEquals(expected.getMessage(),
"org.apache.zookeeper.KeeperException$MarshallingErrorException: "
+ "KeeperErrorCode = MarshallingError");
} finally {
- packetLenField.set(null, originPacketLen);
- packetLenField.setAccessible(isPacketLenAccessible);
-
- childrenLimitField.set(null, originChildrenLimit);
- childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
- modifiersField.setAccessible(isModifierAccessible);
+ _zkClient.close();
+ _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
Assert.assertTrue(TestHelper.verify(() -> {
- _zkClient.deleteRecursively(path);
- return !_zkClient.exists(path);
+ try {
+ return _zkClient.delete(path);
Review comment:
I think this is fine for testing purposes, but here are some scenarios
we can think about that gets us from, say, 95% to 99% stability:
1. doesn't the packet len error cause the session to be terminated? Or does
your fix change that behavior? In this case if my assumption is true, the
client that created the ephemeral nodes will lose its session and the
ephemerals tied to the session will be gone anyway, so the nodes will be gone
before zkclient.close()?
2. this is a rare scenario, but sometimes we observe in-memory local clients
lose connection for some unknown reason. What impact would that have on the
test?
If we really wanted to be right and thorough, I would write persistent
nodes, and check that the numChildren field comes out to 110K from Stats with
an Assert statement, and then do the explicit deletion at the end. This would
guard against any potential client-side glitches and failures.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]