On 01/13/2012 09:07 AM, Tom M wrote:
Hi Gordon,
Sorry I didn't get back to you yesterday; I needed a chance to set up a
separate broker on one of our systems that I could stop.
We had seen these differing results on our deployed system, and I wanted to
make sure that this test client behaved the same way.
(On our deployed system, most of the failover testing had been done with a kill
on the broker, and we would see the client detect the lost connection. But
later, when a host died, we found this other condition, where the client
did not detect it.)
So, I ran the test:
* started a separate broker (see below)
* started the same test client, as before, with the larger msg size
* did a kill on the broker: kill -STOP <pid>
* saw the same results as you did: the client detected the lost connection
in about 2x the heartbeat interval
Then, to verify my earlier results, I ran the exact same test, except this
time I pulled the network cable:
* started a separate broker (same as previous run above)
* started the same test client, as before, with the larger msg size
* pulled network cable
* saw the same results as in my previous tests: the client continued to "send"
well past the point where the heartbeat timeout should have fired (with the
same trace messages), until it locked up about 80 seconds later.
Note: this result (the client sending the larger msg fails to detect the lost
connection) also occurs if the broker host abruptly dies, which is how we
first detected the problem.
Note: I used the following to start the broker:
/usr/sbin/qpidd -p 18102 --log-to-syslog no --log-to-file
/export/hps/dda/qpidd_x/log/qpidd_x.log --worker-threads 3 --data-dir
/export/hps/dda/qpidd_x/data --pid-dir /export/hps/dda/qpidd_x/pid-dir
--auth no --config /dev/null
So, please let me know if you can run the test again, this time pulling the
network cable. (I'm pulling the cable between the broker and the switch, but
I'm pretty sure I've seen the same when pulling it between the switch and the
client.)
thanks,
Tom
You can simulate a network cable pull by telling iptables to drop packets.
I've attached an old script, no warranty.
WARNING: if you have only remote access to the machine in question, be careful
not to lock yourself out! The attached script drops only corosync/openais
packets, so you can still ssh etc.
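#!/bin/sh
# Simulate a network failure for openais/corosync.
# With the argument "drop": drop new UDP traffic on ports 5404 and 5405.
# With any other argument (or none): flush the rules to restore traffic.
if [ "$1" = "drop" ]; then
    # Note: iptables-restore replaces any existing filter rules.
    cat <<- _EOT_ | iptables-restore
*filter
:INPUT ACCEPT [40:3040]
-A INPUT -m state --state NEW -p udp --dport 5404 -j DROP
-A INPUT -m state --state NEW -p udp --dport 5405 -j DROP
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [21:2164]
-A OUTPUT -m state --state NEW -p udp --dport 5404 -j DROP
-A OUTPUT -m state --state NEW -p udp --dport 5405 -j DROP
COMMIT
_EOT_
else
    # Note: this flushes every rule in the filter table, not only the ones above.
    iptables -F
fi
# Show the resulting rule set.
/sbin/iptables -L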
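
For the qpidd test described in this thread, the same idea could be pointed at the broker's TCP port instead of the corosync UDP ports. What follows is only a minimal sketch, untested against qpidd and with no warranty either. It assumes it is run on the broker host, that the broker listens on port 18102 (the -p value from the qpidd command line quoted above), and the function name qpid_drop_cmds is made up for illustration. By default it only prints the iptables commands, so nothing is changed until the output is deliberately executed.

```shell
#!/bin/sh
# Sketch: emulate a cable pull for the qpidd test by silently dropping the
# broker's TCP traffic. Intended for the broker host.
# Assumptions: port 18102 comes from the qpidd command line in this thread;
# qpid_drop_cmds is a hypothetical helper name, not part of qpid or iptables.

# Print the iptables commands that insert ("-I") or delete ("-D") the DROP rules.
qpid_drop_cmds() {
    action=$1          # "-I" to insert the rules, "-D" to delete them again
    port=${2:-18102}   # broker listen port (defaults to the port used above)
    # Inbound packets addressed to the broker port.
    printf 'iptables %s INPUT -p tcp --dport %s -j DROP\n' "$action" "$port"
    # Outbound packets sent from the broker port.
    printf 'iptables %s OUTPUT -p tcp --sport %s -j DROP\n' "$action" "$port"
}

# Dry run: show what would be executed to start dropping traffic.
qpid_drop_cmds -I 18102
```

Piping the printed lines to sh as root (qpid_drop_cmds -I 18102 | sh) would apply the rules, and the same call with -D would remove just those two rules again, without flushing anything else the way iptables -F does. DROP is used rather than REJECT so that the peer gets no reply at all, which is closer to what a pulled cable looks like. The ssh warning above still applies if the port is changed.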
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]