Thanks Kim,
1) I'm using qpid from RedHat MRG 0.5.752581-34 for i386 and the store
module is RHM 0.5.3206-27, also for i386.
2) Yes, my messages are persistent. I am aware that transient
messages are discarded on recovery. Your comment that durable
messages "have their message content discarded when the policy limit
is reached" is surprising. I have an 80 GB store and a 1 GB queue
with flow-to-disk. My expectation was that I would be able to
continue to enqueue durable messages past the 1GB mark (up to 80% of
80 GB) and only the first (or latest, not sure) 1GB of queue data
would be held in memory. I expected the dequeues would sort of
"backfill" the 1GB memory buffer until the "flowed-to-disk" messages
were consumed from the store. I recognize that that this is sort of
dual-purposing the backing store to do both journaling and spillover,
but that doesn't seem too far fetched. I'm beginning to think this is
incorrect. Is there some kind of identity relating available memory
to store geometry to queue size (ie. is the following a good rule of
thumb "available memory" = "store size" = "sum(max-queue-size of all
queues)")?
Also, in an attempt to (slowly) recover the queue I allocated 2 GB
swap. This staved off the "Cannot read from child process" error with
a c++ bad_alloc exception which I could see as a debug statement in
the log. However, 2 hours later the system was still in the same
unresponsive state (seemingly trying to complete recovery). It had
not yet begun listening on 5672. I captured the lsof for qpidd user
in case that might also be helpful. Yes, the sfbroker.log file is big
and no, truncating it doesn't alleviate the problem :)
~]# lsof -u qpidd
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
qpidd 12365 qpidd cwd DIR 8,1 4096 573441 /root
qpidd 12365 qpidd rtd DIR 8,1 4096 2 /
qpidd 12365 qpidd txt REG 8,1 76165 1299146 /usr/
sbin/qpidd
qpidd 12365 qpidd mem REG 8,1 18031759 1215641 /usr/
lib/libqpidcommon.so.0.1.0
qpidd 12365 qpidd mem REG 8,1 14644 786449 /lib/
libdl-2.5.so
qpidd 12365 qpidd mem REG 8,1 97512 1214114 /usr/
lib/libsasl2.so.2.0.22
qpidd 12365 qpidd mem REG 8,1 206396 786499 /lib/
i686/nosegneg/libm-2.5.so
qpidd 12365 qpidd mem REG 8,1 44992 786703 /lib/
libgcc_s-4.1.2-20080825.so.1
qpidd 12365 qpidd mem REG 8,1 1620400 789604 /lib/
i686/nosegneg/libc-2.5.so
qpidd 12365 qpidd mem REG 8,1 127720 786500 /lib/
i686/nosegneg/libpthread-2.5.so
qpidd 12365 qpidd mem REG 8,1 74616 786483 /lib/
libresolv-2.5.so
qpidd 12365 qpidd mem REG 8,1 43544 786447 /lib/
libcrypt-2.5.so
qpidd 12365 qpidd mem REG 8,1 2632 1213919 /usr/
lib/libaio.so.1.0.1
qpidd 12365 qpidd mem REG 8,1 15584 1247077 /usr/
lib/openais/libcpg.so.2.0.0
qpidd 12365 qpidd mem REG 8,1 14144 786738 /lib/
libuuid.so.1.2
qpidd 12365 qpidd mem REG 8,1 16068 1215646 /usr/
lib/libcman.so.2.0.115
qpidd 12365 qpidd mem REG 8,1 42068 786501 /lib/
i686/nosegneg/librt-2.5.so
qpidd 12365 qpidd mem REG 8,1 62328 1215614 /usr/
lib/libboost_filesystem.so.1.33.1
qpidd 12365 qpidd mem REG 8,1 124432 786503 /lib/
ld-2.5.so
qpidd 12365 qpidd mem REG 8,1 625546 1266311 /usr/
lib/qpid/daemon/replicating_listener.so
qpidd 12365 qpidd mem REG 8,1 213760 1215620 /usr/
lib/libboost_program_options.so.1.33.1
qpidd 12365 qpidd mem REG 8,1 1127841 1266315 /usr/
lib/qpid/daemon/msgstore.so
qpidd 12365 qpidd mem REG 8,1 946263 1215643 /usr/
lib/libsslcommon.so.0.1.0
qpidd 12365 qpidd mem REG 8,1 96924 1214052 /usr/
lib/libnssutil3.so
qpidd 12365 qpidd mem REG 8,1 925520 1214126 /usr/
lib/libstdc++.so.6.0.8
qpidd 12365 qpidd mem REG 8,1 1109436 1213951 /usr/
lib/libdb_cxx-4.3.so
qpidd 12365 qpidd mem REG 8,1 203340 1214124 /usr/
lib/libssl3.so
qpidd 12365 qpidd mem REG 8,1 839781 1266317 /usr/
lib/qpid/daemon/xml.so
qpidd 12365 qpidd mem REG 8,1 16636786 1215692 /usr/
lib/libqpidbroker.so.0.1.0
qpidd 12365 qpidd mem REG 8,1 226512 1214049 /usr/
lib/libnspr4.so
qpidd 12365 qpidd mem REG 8,1 6270148 1266314 /usr/
lib/qpid/daemon/cluster.so
qpidd 12365 qpidd mem REG 8,1 9912 1214097 /usr/
lib/libplds4.so
qpidd 12365 qpidd mem REG 8,1 6782769 1215639 /usr/
lib/libqpidclient.so.0.1.0
qpidd 12365 qpidd mem REG 8,1 587779 1261584 /usr/
lib/qpid/client/sslconnector.so
qpidd 12365 qpidd mem REG 8,1 14008 1214096 /usr/
lib/libplc4.so
qpidd 12365 qpidd mem REG 8,1 515215 1266312 /usr/
lib/qpid/daemon/replication_exchange.so
qpidd 12365 qpidd mem REG 8,1 558508 1266316 /usr/
lib/qpid/daemon/ssl.so
qpidd 12365 qpidd mem REG 8,1 1187124 1214050 /usr/
lib/libnss3.so
qpidd 12365 qpidd mem REG 8,1 4155068 1215591 /usr/
lib/libxerces-c.so.27.0
qpidd 12365 qpidd mem REG 8,1 3957632 1215596 /usr/
lib/libxqilla.so.3.0.0
qpidd 12365 qpidd 0u CHR 136,0 2 /dev/
pts/0
qpidd 12365 qpidd 1u CHR 136,0 2 /dev/
pts/0
qpidd 12365 qpidd 2u CHR 136,0 2 /dev/
pts/0
qpidd 12365 qpidd 3r FIFO 0,6 240193 pipe
qpidd 12365 qpidd 4w FIFO 0,6 240193 pipe
qpidd 12365 qpidd 5w REG 8,1 3556296254 98581 /var/
log/qpid/sfbroker.log
qpidd 12365 qpidd 6r CHR 1,9 1065 /dev/
urandom
qpidd 12365 qpidd 7r FIFO 0,6 240195 pipe
qpidd 12366 qpidd cwd DIR 8,1 4096 557058 /var/
lib/qpidd/.qpidd
qpidd 12366 qpidd rtd DIR 8,1 4096 2 /
qpidd 12366 qpidd txt REG 8,1 76165 1299146 /usr/
sbin/qpidd
qpidd 12366 qpidd mem REG 8,1 18031759 1215641 /usr/
lib/libqpidcommon.so.0.1.0
qpidd 12366 qpidd mem REG 8,1 14644 786449 /lib/
libdl-2.5.so
qpidd 12366 qpidd mem REG 8,1 97512 1214114 /usr/
lib/libsasl2.so.2.0.22
qpidd 12366 qpidd mem REG 8,1 206396 786499 /lib/
i686/nosegneg/libm-2.5.so
qpidd 12366 qpidd mem REG 8,1 44992 786703 /lib/
libgcc_s-4.1.2-20080825.so.1
qpidd 12366 qpidd mem REG 8,1 1620400 789604 /lib/
i686/nosegneg/libc-2.5.so
qpidd 12366 qpidd mem REG 8,1 127720 786500 /lib/
i686/nosegneg/libpthread-2.5.so
qpidd 12366 qpidd mem REG 8,1 74616 786483 /lib/
libresolv-2.5.so
qpidd 12366 qpidd mem REG 8,1 43544 786447 /lib/
libcrypt-2.5.so
qpidd 12366 qpidd mem REG 8,1 2632 1213919 /usr/
lib/libaio.so.1.0.1
qpidd 12366 qpidd mem REG 8,1 15584 1247077 /usr/
lib/openais/libcpg.so.2.0.0
qpidd 12366 qpidd mem REG 8,1 14144 786738 /lib/
libuuid.so.1.2
qpidd 12366 qpidd mem REG 8,1 16068 1215646 /usr/
lib/libcman.so.2.0.115
qpidd 12366 qpidd mem REG 8,1 42068 786501 /lib/
i686/nosegneg/librt-2.5.so
qpidd 12366 qpidd mem REG 8,1 62328 1215614 /usr/
lib/libboost_filesystem.so.1.33.1
qpidd 12366 qpidd mem REG 8,1 124432 786503 /lib/
ld-2.5.so
qpidd 12366 qpidd mem REG 8,1 625546 1266311 /usr/
lib/qpid/daemon/replicating_listener.so
qpidd 12366 qpidd mem REG 8,1 213760 1215620 /usr/
lib/libboost_program_options.so.1.33.1
qpidd 12366 qpidd mem REG 8,1 1127841 1266315 /usr/
lib/qpid/daemon/msgstore.so
qpidd 12366 qpidd mem REG 8,1 946263 1215643 /usr/
lib/libsslcommon.so.0.1.0
qpidd 12366 qpidd mem REG 8,1 96924 1214052 /usr/
lib/libnssutil3.so
qpidd 12366 qpidd mem REG 8,1 925520 1214126 /usr/
lib/libstdc++.so.6.0.8
qpidd 12366 qpidd mem REG 8,1 1109436 1213951 /usr/
lib/libdb_cxx-4.3.so
qpidd 12366 qpidd mem REG 8,1 203340 1214124 /usr/
lib/libssl3.so
qpidd 12366 qpidd mem REG 8,1 839781 1266317 /usr/
lib/qpid/daemon/xml.so
qpidd 12366 qpidd mem REG 8,1 16636786 1215692 /usr/
lib/libqpidbroker.so.0.1.0
qpidd 12366 qpidd mem REG 8,1 226512 1214049 /usr/
lib/libnspr4.so
qpidd 12366 qpidd mem REG 8,1 6270148 1266314 /usr/
lib/qpid/daemon/cluster.so
qpidd 12366 qpidd mem REG 8,1 9912 1214097 /usr/
lib/libplds4.so
qpidd 12366 qpidd mem REG 8,1 6782769 1215639 /usr/
lib/libqpidclient.so.0.1.0
qpidd 12366 qpidd mem REG 8,1 587779 1261584 /usr/
lib/qpid/client/sslconnector.so
qpidd 12366 qpidd mem REG 8,1 14008 1214096 /usr/
lib/libplc4.so
qpidd 12366 qpidd mem REG 8,1 515215 1266312 /usr/
lib/qpid/daemon/replication_exchange.so
qpidd 12366 qpidd mem REG 8,1 558508 1266316 /usr/
lib/qpid/daemon/ssl.so
qpidd 12366 qpidd mem REG 8,1 1187124 1214050 /usr/
lib/libnss3.so
qpidd 12366 qpidd mem REG 8,1 4155068 1215591 /usr/
lib/libxerces-c.so.27.0
qpidd 12366 qpidd mem REG 8,1 3957632 1215596 /usr/
lib/libxqilla.so.3.0.0
qpidd 12366 qpidd mem REG 8,96 16384 402653317 /vol/
qpidd/rhm/dat/__db.005
qpidd 12366 qpidd mem REG 8,96 450560 402653316 /vol/
qpidd/rhm/dat/__db.004
qpidd 12366 qpidd mem REG 8,96 294912 402653315 /vol/
qpidd/rhm/dat/__db.003
qpidd 12366 qpidd mem REG 8,96 278528 402653314 /vol/
qpidd/rhm/dat/__db.002
qpidd 12366 qpidd mem REG 8,96 24576 402653313 /vol/
qpidd/rhm/dat/__db.001
qpidd 12366 qpidd 0u CHR 1,3 778 /dev/
null
qpidd 12366 qpidd 1u CHR 1,3 778 /dev/
null
qpidd 12366 qpidd 2u CHR 1,3 778 /dev/
null
qpidd 12366 qpidd 3r FIFO 0,6 240193 pipe
qpidd 12366 qpidd 4w FIFO 0,6 240193 pipe
qpidd 12366 qpidd 5w REG 8,1 3556296254 98581 /var/
log/qpid/sfbroker.log
qpidd 12366 qpidd 6r CHR 1,9 1065 /dev/
urandom
qpidd 12366 qpidd 7r 0000 0,10 0 240197
eventpoll
qpidd 12366 qpidd 8w FIFO 0,6 240195 pipe
qpidd 12366 qpidd 9u unix 0xed2f5040 240196 socket
qpidd 12366 qpidd 10wW REG 8,96 0 131 /vol/
qpidd/lock
qpidd 12366 qpidd 11r FIFO 0,6 240198 pipe
qpidd 12366 qpidd 12w FIFO 0,6 240198 pipe
qpidd 12366 qpidd 13u REG 8,96 8192 402653319 /vol/
qpidd/rhm/dat/queues.db
qpidd 12366 qpidd 14u REG 8,96 8192 402653320 /vol/
qpidd/rhm/dat/config.db
qpidd 12366 qpidd 15u REG 8,96 8192 402653321 /vol/
qpidd/rhm/dat/exchanges.db
qpidd 12366 qpidd 16u REG 8,96 8192 402653322 /vol/
qpidd/rhm/dat/messages.db
qpidd 12366 qpidd 17u REG 8,96 8192 402653323 /vol/
qpidd/rhm/dat/mappings.db
qpidd 12366 qpidd 18u REG 8,96 8192 402653324 /vol/
qpidd/rhm/dat/bindings.db
qpidd 12366 qpidd 19u REG 8,96 8192 402653325 /vol/
qpidd/rhm/dat/general.db
qpidd 12366 qpidd 20u REG 8,96 10485760 402653318 /vol/
qpidd/rhm/dat/log.0000000001
qpidd 12366 qpidd 21r REG 8,96 1342177792 503316610 /vol/
qpidd/rhm/jrnl/000c/hits/JournalData.0001.jdat
On Mar 11, 2010, at 12:15 PM, Kim van der Riet wrote:
A few questions:
1. What versions of the store and qpid are you using?
2. Are your messages persistent? Flow-to-disk will force transient
messages to disk, but will discard them on recovery; persistent
messages
are written to disk anyway, but have their message content discarded
when the policy limit is reached. I guess the latter is the case here.
This behavior should be checked out... it looks suspicious.
On Thu, 2010-03-11 at 11:01 -0800, Charles Woerner wrote:
A bit more information, at about the time Journal recovery phase 1 is
complete I see one of the two qpidd processes begin to acquire vast
amounts of memory (I assume this is the phase where it begins reading
up to max-queue-size of persistent queue data into memory). Once the
process acquires about 97% of available memory the system usually
dies
with the "critical Broker start-up failed: Cannot read from child
process.".
I'm not surprised that the system dies when it runs out of memory.
But I hope someone can help me understand why qpidd is using so much
memory and whether I have my max-queue-size and store geometry
configured in a reasonable manner. I would hope that there isn't
some
inherent limitation where one can't recover more data from a
persistent journal than one has available memory.
On Mar 10, 2010, at 10:55 PM, Charles Woerner wrote:
To follow up, it always fails at startup with this large queue,
but I
don't always see the "Broker startup failed: cannot read from child
process" error. Sometimes it just dies.
__
Charles Woerner | [email protected] | demandbase
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]
__
Charles Woerner | [email protected] | demandbase |
415.683.2669