[ https://issues.apache.org/jira/browse/ZOOKEEPER-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731380#action_12731380 ]
Henry Robinson commented on ZOOKEEPER-460:
------------------------------------------

I need a little help getting to the bottom of this (I might be misreading Hudson's logs).

The code in question is, I think, 'ok' (although a bit dodgy). The idea is to test whether a client that is waiting because the max cnxns limit has been reached can reconnect once a slot becomes free on the server. So ideally, for this test, close(1) should happen after createClient(2) has made its connection attempt. As you say, this assumption is false: the close might happen before createClient(2) succeeds, in which case there is no contention. But that should only give false positives (the test passes without exercising the reconnect path); the second assert should still eventually succeed. To improve this, I need to replace createClient with a call that blocks at least until we know the connection attempt has been made, if that's possible.

However, the most recent Hudson failures don't seem to be related. From build 375:

    [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion
    [exec] Zookeeper_watchers::testDefaultSessionWatcher1 : OK
    [exec] Zookeeper_watchers::testDefaultSessionWatcher2 : OK
    [exec] Zookeeper_watchers::testObjectSessionWatcher1 : OK
    [exec] Zookeeper_watchers::testObjectSessionWatcher2 : OK
    [exec] Zookeeper_watchers::testNodeWatcher1 : OK
    [exec] Zookeeper_watchers::testChildWatcher1 : OK
    [exec] Zookeeper_watchers::testChildWatcher2 : OK
    [exec]
    [exec] /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestClient.cc:289: Assertion: equality assertion failed [Expected: -101, Actual : -4]
    [exec] Failures !!!
    [exec] Run: 32 Failure total: 1 Failures: 1 Errors: 0
    [exec] make: *** [run-check] Error 1

The same failure appears in build 376 (yesterday's build). Both are failing in TestClient, specifically testAsyncWatcherAutoReset: a stat completion callback is being called with ZCONNECTIONLOSS (-4) when it expects ZNONODE (-101), so the assert fails.
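The replacement for createClient proposed here, a call that returns only once client 2's connection attempt is known to have been made, could look roughly like the following. This is a minimal sketch under stated assumptions: AttemptLatch and startClientAsync are hypothetical names, not part of the ZooKeeper C client or of the test's watchctx_t helper, and the sketch assumes some hook in the connection path can call markAttempted() once the attempt happens. Whether the C client exposes such a hook is exactly the open question.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not ZooKeeper API): records that a connection attempt
// has been made and lets the test block until then, with a timeout so that a
// missing signal makes the test fail instead of hang.
class AttemptLatch {
public:
    // Assumed to be called from the connection path once the attempt is made.
    void markAttempted() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            attempted_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until markAttempted() has been called or the timeout expires.
    // Returns true if the attempt was observed in time.
    bool waitForAttempt(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [this] { return attempted_; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool attempted_ = false;
};

// Hypothetical stand-in for an asynchronous createClient(): it returns at once
// and makes its connection attempt later on a background thread.
std::thread startClientAsync(AttemptLatch& latch) {
    return std::thread([&latch] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        latch.markAttempted();
    });
}
```

In the test, the sequence would become: start client 2, call waitForAttempt, and only then call zookeeper_close(zk1), so the reconnect path is actually exercised. The timeout matters on a shared Hudson slave: if the hook never fires, the test fails cleanly rather than hanging the build.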
This test runs fine for me locally, so could the problem be a heavily loaded Hudson machine causing the connection loss? Similarly, the failed build you point to, 371, fails TestClientRetry with a broken pipe error, which to my novice eye looks like something falling over under load.

Right now it looks to me like the TestClientRetry code needs improving but is benign, since it should only cause false positives, and that we need to understand why TestClient is failing. Does that sound right?

> bad testRetry in cppunit tests (hudson failure)
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-460
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-460
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, tests
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>             Fix For: 3.2.1, 3.3.0
>
>
> the following code failed on hudson:
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/371/
>
>     watchctx_t ctx1, ctx2;
>     zhandle_t *zk1 = createClient(&ctx1);
>     CPPUNIT_ASSERT_EQUAL(true, ctx1.waitForConnected(zk1));
>     zhandle_t *zk2 = createClient(&ctx2);
>     zookeeper_close(zk1);
>     CPPUNIT_ASSERT_EQUAL(true, ctx2.waitForConnected(zk2));
>
> there's a problem with this test: it assumes that close(1) can be called
> before createClient(2) gets connected. this is not correct: createClient is
> an async call and in some cases the connection can be established before
> createClient returns.
>
> this shows a failure in this case because client1 was created, then client2
> attempted to connect but failed due to this on the server (max conn exceeded):
>
>     sprintf(cmd, "export ZKMAXCNXNS=1;%s startClean %s", ZKSERVER_CMD, getHostPorts());
>
> conn 2 failed and therefore the following assert eventually failed.
> this code should not assume that close(1) will beat connect(2).
>
> Henry, can you take a look?
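The ordering argument in this thread can be checked with a toy, single-process model. This is a sketch under loud assumptions: OneSlotServer, connectClient2 and runScenario are hypothetical, no ZooKeeper code is involved, and the retry flag stands in for whatever the real C client does after the server refuses a connection because ZKMAXCNXNS=1 is exceeded. In the connect-first ordering, close(1) is issued only after client 2 has announced its first attempt, which is the kind of blocking proposed in the comment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy model of a server started with ZKMAXCNXNS=1: exactly one connection slot.
// Hypothetical code for illustration only; it is not ZooKeeper.
class OneSlotServer {
public:
    // Takes the slot if it is free; false models "max cnxns exceeded".
    bool tryConnect() {
        bool expected = false;
        return busy_.compare_exchange_strong(expected, true);
    }
    // Frees the slot, modeling zookeeper_close() on the client holding it.
    void disconnect() { busy_.store(false); }

private:
    std::atomic<bool> busy_{false};
};

// Hypothetical connect loop for client 2. Sets firstAttemptDone after each
// attempt; with retry == false it gives up after a single refused attempt.
bool connectClient2(OneSlotServer& server, bool retry,
                    std::chrono::milliseconds deadline,
                    std::atomic<bool>& firstAttemptDone) {
    const auto stop = std::chrono::steady_clock::now() + deadline;
    for (;;) {
        const bool ok = server.tryConnect();
        firstAttemptDone.store(true);
        if (ok) return true;
        if (!retry || std::chrono::steady_clock::now() >= stop) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}

// closeFirst == true:  close(1) beats connect(2).
// closeFirst == false: close(1) happens only after client 2's first attempt.
// Returns whether client 2 ends up connected.
bool runScenario(bool closeFirst, bool retry) {
    OneSlotServer server;
    std::atomic<bool> attempted{false};
    const std::chrono::milliseconds deadline(5000);

    const bool client1 = server.tryConnect();  // client 1 takes the only slot
    assert(client1);
    (void)client1;

    if (closeFirst) {
        server.disconnect();
        return connectClient2(server, retry, deadline, attempted);
    }
    std::thread closer([&] {
        while (!attempted.load()) std::this_thread::yield();  // wait for attempt
        server.disconnect();
    });
    const bool ok = connectClient2(server, retry, deadline, attempted);
    closer.join();
    return ok;
}
```

In this model the ordering only decides whether the reconnect path is exercised. A retrying client 2 connects in both orderings, matching the "false positives only" argument, while a client that gives up after one refused attempt fails only when connect(2) beats close(1), matching the failure described in the issue: runScenario(false, false) returns false and the other three combinations return true.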