2008/8/26 Martin Ritchie <[EMAIL PROTECTED]>:
> Hi,
>
> Just raised a bug as a result of a CI failure for the
> SyncWaitTimeoutDelayTest.
>
> It appears to me to be a protocol bug. Is anyone fluent in 0-10 able to
> say if the bug is also in 0-10?
>
> Is there going to be a 0-9 update that might address this?
>
> https://issues.apache.org/jira/browse/QPID-1262
>
> The problem in a nutshell:
>
> TxCommitOk is not correlated with the TxCommit that initiated the work
> on the broker.
> So if our broker takes a long time (using SlowMessageStore) to perform
> the commit and the client times out waiting for the TxCommitOk (as in the
> SyncWaitTimeoutDelayTest), then if a subsequent TxCommit is sent, the
> late TxCommitOk that is returned can signal the new wait by mistake.
>
> AMQP Method Sequence:
> [C]lient
> [B]roker
> [S]end
> [R]eceive
>
> CS: TxCommit (a)
> BR: TxCommit (a)
> // Broker takes a lot of time
> // Client times out waiting for TxCommitOk (a)
> CS: TxCommit (b)
> BS: TxCommitOk (a)
> CR: TxCommitOk (a)
> // At this point the client thinks that its commit (b) has
> succeeded, it hasn't.
>
> My only thoughts were
> a) add correlation ids to the TxCommit TxCommitOk pairs, as was done
> above for clarity in the explanation.
> b) close the session in the event of a timeout and re-establish session.
>
Option b) is the only safe alternative for 0-8/0-9. Completion of
commands is correlated in 0-10 so this is no longer an issue...

-- Rob

> thoughts?
> --
> Martin Ritchie
>
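
The race described in this thread, and the two proposed remedies, can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: the class names, the integer channel numbers, and the `correlation_id` argument are all hypothetical and are not part of Qpid or of any AMQP client API. It just replays the method sequence from the original message against three kinds of client-side waiter.

```python
class UncorrelatedWaiter:
    """Hypothetical model of the 0-8/0-9 client: TxCommitOk carries no id,
    so it completes whichever commit the client is currently waiting on."""

    def __init__(self):
        self.current = None    # commit the client is waiting on, if any
        self.completed = []    # commits the client believes have succeeded

    def send_commit(self, name):
        self.current = name
        return 1               # assumed: everything goes over one channel

    def time_out(self):
        self.current = None    # give up waiting; the broker is not told

    def on_commit_ok(self, channel, correlation_id):
        # correlation_id is not on the wire in this model, so it is ignored.
        if self.current is not None:
            self.completed.append(self.current)
            self.current = None


class CorrelatedWaiter(UncorrelatedWaiter):
    """Option a): TxCommitOk echoes an id, so stale replies can be ignored."""

    def on_commit_ok(self, channel, correlation_id):
        if correlation_id == self.current:
            self.completed.append(self.current)
            self.current = None


class SessionResetWaiter(UncorrelatedWaiter):
    """Option b): a timeout closes the session and opens a fresh one;
    replies addressed to the closed session are dropped."""

    def __init__(self):
        super().__init__()
        self.channel = 1

    def send_commit(self, name):
        self.current = name
        return self.channel

    def time_out(self):
        self.current = None
        self.channel += 1      # old session closed, new one established

    def on_commit_ok(self, channel, correlation_id):
        if channel != self.channel:
            return             # reply belongs to the closed session
        super().on_commit_ok(channel, correlation_id)


def replay(waiter):
    """Replay the sequence from the bug report against one waiter."""
    channel_a = waiter.send_commit("a")    # CS: TxCommit (a)
    waiter.time_out()                      # client times out waiting
    waiter.send_commit("b")                # CS: TxCommit (b)
    waiter.on_commit_ok(channel_a, "a")    # CR: TxCommitOk (a), arriving late
    return waiter.completed


results = {
    "uncorrelated": replay(UncorrelatedWaiter()),
    "correlated": replay(CorrelatedWaiter()),
    "session_reset": replay(SessionResetWaiter()),
}
print(results)
# {'uncorrelated': ['b'], 'correlated': [], 'session_reset': []}
```

Only the uncorrelated waiter records commit (b) as complete on the strength of the late reply to (a). Note that neither remedy tells the client whether commit (a) eventually succeeded; they only stop the stale TxCommitOk from being credited to the wrong commit.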