On 01/13/2012 09:07 AM, Tom M wrote:
Hi Gordon,
sorry I didn't get back to you yesterday, but I needed a chance to start a
separate broker that I could stop on one of our systems.

We had seen these different results with our deployed system, and I wanted
to make sure that this test client behaved the same way.
(On our deployed system, most of our failover testing had been done by
killing the broker, and we would see the client detect the lost connection.
But later, when a host died, we found this other condition, where the client
did not detect it.)

So, I ran the test:
* started a separate broker (see below)
* started the same test client, as before, with the larger msg size
* stopped the broker with: kill -STOP <pid>
* saw the same results as you did: the client detected the lost connection
in about twice the heartbeat interval
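The first test (suspend the broker, watch the client, then resume the broker) can be scripted along these lines. This is a minimal, Linux-only sketch: it reads the process state from /proc, and the function names `pid_state`, `freeze_pid` and `thaw_pid` are hypothetical. SIGSTOP suspends the broker without closing its sockets, which is different from the broker process exiting, where the kernel closes its connections.

```shell
#!/bin/sh
# Sketch: suspend a process with SIGSTOP and resume it with SIGCONT.
# Linux-only: the process state is read from /proc/<pid>/stat.

# Print the one-letter state of a PID (T = stopped by a signal).
pid_state() {
    # Strip everything up to the ") " that closes the command name, so a
    # name containing spaces cannot shift the fields.
    sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f1
}

# Send SIGSTOP, then wait (up to about 5 s) until the kernel reports the
# process as stopped. Returns non-zero if it never reaches state T.
freeze_pid() {
    kill -STOP "$1" || return 1
    n=0
    while [ "$(pid_state "$1")" != T ] && [ "$n" -lt 5 ]; do
        sleep 1
        n=$((n + 1))
    done
    [ "$(pid_state "$1")" = T ]
}

# Resume a process previously stopped with freeze_pid.
thaw_pid() {
    kill -CONT "$1"
}
```

For example, `freeze_pid "$(pgrep -f 'qpidd -p 18102')"`, wait for the client to report the lost connection, then call `thaw_pid` with the same PID.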

Then, to verify my earlier results, I ran exactly the same test, except that
this time I pulled the network cable:
* started a separate broker (same as previous run above)
* started the same test client, as before, with the larger msg size
* pulled network cable
* saw the same results as in my previous tests: the client continued to
"send" well past the point where the heartbeat timeout should have fired
(I saw the same trace messages) until, about 80 seconds later, it locked up.

Note: I've also seen this result (the client sending the larger message
fails to detect the lost connection) when the broker host dies abruptly,
which is how we first found the problem.


Note: I used the following to start the broker:
/usr/sbin/qpidd -p 18102 --log-to-syslog no \
    --log-to-file /export/hps/dda/qpidd_x/log/qpidd_x.log \
    --worker-threads 3 --data-dir /export/hps/dda/qpidd_x/data \
    --pid-dir /export/hps/dda/qpidd_x/pid-dir --auth no --config /dev/null

So, please let me know if you can run the test again, but this time pulling
the network cable (I'm pulling the cable between the broker and the switch,
but I'm pretty sure I've seen the same thing when pulling the cable between
the switch and the client).
thanks,
Tom

You can simulate a network cable pull by telling iptables to drop packets. Attached is an old script (no warranty). WARNING: if you only have remote access to the machine in question, be careful not to lock yourself out! The attached script drops only corosync/openais packets, so you can still ssh etc.
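#!/bin/sh
# Simulate a network failure for openais/corosync by dropping its UDP
# traffic (ports 5404 and 5405). Must be run as root.
#   script drop   replace the filter table with the rules below
#   script        flush the filter table: this removes ALL rules, not only these
if [ "$1" = "drop" ]; then
    # iptables-save format: chain policies first, then rules, then COMMIT.
    # Without --noflush, iptables-restore replaces the whole filter table.
    # The here-document body is not indented, so the _EOT_ terminator is
    # still recognised if tabs have been turned into spaces (as in mail).
    iptables-restore <<_EOT_
*filter
:INPUT ACCEPT [40:3040]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [21:2164]
-A INPUT -m state --state NEW -p udp --dport 5404 -j DROP
-A INPUT -m state --state NEW -p udp --dport 5405 -j DROP
-A OUTPUT -m state --state NEW -p udp --dport 5404 -j DROP
-A OUTPUT -m state --state NEW -p udp --dport 5405 -j DROP
COMMIT
_EOT_
else
    iptables -F
fi
/sbin/iptables -L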
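The test in this thread uses AMQP over TCP rather than corosync's UDP, so the rules above would need adapting. A minimal sketch, assuming the broker port 18102 from the qpidd command line above; the script name amqp-partition.sh and the functions `amqp_rule_specs` and `apply_rules` are hypothetical. It inserts and deletes only its own rules with `iptables -I`/`-D` instead of replacing or flushing the filter table, and it drops traffic in both directions so it can be run on either the broker host or the client host. Like the original script, it only approximates a cable pull: the network interface itself stays up.

```shell
#!/bin/sh
# Hypothetical helper: approximate a cable pull for AMQP by dropping every
# TCP packet to or from the broker port. Must be run as root.
#   sh amqp-partition.sh drop      insert the DROP rules
#   sh amqp-partition.sh restore   delete only those rules
PORT="${PORT:-18102}"

# Print one rule spec per line: chain, then match and target.
# No "-m state --state NEW": packets of an already-open TCP connection are
# ESTABLISHED, so a NEW-only rule would let them through.
amqp_rule_specs() {
    for chain in INPUT OUTPUT; do
        for dir in --dport --sport; do
            echo "$chain -p tcp $dir $1 -j DROP"
        done
    done
}

# Apply every rule spec with the given iptables action (-I or -D).
apply_rules() {
    amqp_rule_specs "$PORT" | while read -r chain spec; do
        # $spec is deliberately unquoted so it splits into separate arguments.
        iptables "$1" "$chain" $spec
    done
}

case "${1:-}" in
    drop)    apply_rules -I ;;
    restore) apply_rules -D ;;
    "")      ;;  # no argument: only define the functions
    *)       echo "usage: $0 drop|restore" >&2; exit 2 ;;
esac
```

For example, run `sh amqp-partition.sh drop`, start the client test, then `sh amqp-partition.sh restore`; `PORT=5672 sh amqp-partition.sh drop` would target the default AMQP port instead.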

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
