Thanks Kim,

1) I'm using qpid from RedHat MRG 0.5.752581-34 for i386 and the store module is RHM 0.5.3206-27, also for i386.

2) Yes, my messages are persistent. I am aware that transient messages are discarded on recovery. Your comment that durable messages "have their message content discarded when the policy limit is reached" is surprising. I have an 80 GB store and a 1 GB queue with flow-to-disk. My expectation was that I would be able to continue to enqueue durable messages past the 1GB mark (up to 80% of 80 GB) and only the first (or latest, not sure) 1GB of queue data would be held in memory. I expected the dequeues would sort of "backfill" the 1GB memory buffer until the "flowed-to-disk" messages were consumed from the store. I recognize that that this is sort of dual-purposing the backing store to do both journaling and spillover, but that doesn't seem too far fetched. I'm beginning to think this is incorrect. Is there some kind of identity relating available memory to store geometry to queue size (ie. is the following a good rule of thumb "available memory" = "store size" = "sum(max-queue-size of all queues)")?

Also, in an attempt to (slowly) recover the queue I allocated 2 GB swap. This staved off the "Cannot read from child process" error with a c++ bad_alloc exception which I could see as a debug statement in the log. However, 2 hours later the system was still in the same unresponsive state (seemingly trying to complete recovery). It had not yet begun listening on 5672. I captured the lsof for qpidd user in case that might also be helpful. Yes, the sfbroker.log file is big and no, truncating it doesn't alleviate the problem :)

~]# lsof -u qpidd
COMMAND   PID  USER   FD   TYPE     DEVICE       SIZE      NODE NAME
qpidd   12365 qpidd  cwd    DIR        8,1       4096    573441 /root
qpidd   12365 qpidd  rtd    DIR        8,1       4096         2 /
qpidd 12365 qpidd txt REG 8,1 76165 1299146 /usr/ sbin/qpidd qpidd 12365 qpidd mem REG 8,1 18031759 1215641 /usr/ lib/libqpidcommon.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 14644 786449 /lib/ libdl-2.5.so qpidd 12365 qpidd mem REG 8,1 97512 1214114 /usr/ lib/libsasl2.so.2.0.22 qpidd 12365 qpidd mem REG 8,1 206396 786499 /lib/ i686/nosegneg/libm-2.5.so qpidd 12365 qpidd mem REG 8,1 44992 786703 /lib/ libgcc_s-4.1.2-20080825.so.1 qpidd 12365 qpidd mem REG 8,1 1620400 789604 /lib/ i686/nosegneg/libc-2.5.so qpidd 12365 qpidd mem REG 8,1 127720 786500 /lib/ i686/nosegneg/libpthread-2.5.so qpidd 12365 qpidd mem REG 8,1 74616 786483 /lib/ libresolv-2.5.so qpidd 12365 qpidd mem REG 8,1 43544 786447 /lib/ libcrypt-2.5.so qpidd 12365 qpidd mem REG 8,1 2632 1213919 /usr/ lib/libaio.so.1.0.1 qpidd 12365 qpidd mem REG 8,1 15584 1247077 /usr/ lib/openais/libcpg.so.2.0.0 qpidd 12365 qpidd mem REG 8,1 14144 786738 /lib/ libuuid.so.1.2 qpidd 12365 qpidd mem REG 8,1 16068 1215646 /usr/ lib/libcman.so.2.0.115 qpidd 12365 qpidd mem REG 8,1 42068 786501 /lib/ i686/nosegneg/librt-2.5.so qpidd 12365 qpidd mem REG 8,1 62328 1215614 /usr/ lib/libboost_filesystem.so.1.33.1 qpidd 12365 qpidd mem REG 8,1 124432 786503 /lib/ ld-2.5.so qpidd 12365 qpidd mem REG 8,1 625546 1266311 /usr/ lib/qpid/daemon/replicating_listener.so qpidd 12365 qpidd mem REG 8,1 213760 1215620 /usr/ lib/libboost_program_options.so.1.33.1 qpidd 12365 qpidd mem REG 8,1 1127841 1266315 /usr/ lib/qpid/daemon/msgstore.so qpidd 12365 qpidd mem REG 8,1 946263 1215643 /usr/ lib/libsslcommon.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 96924 1214052 /usr/ lib/libnssutil3.so qpidd 12365 qpidd mem REG 8,1 925520 1214126 /usr/ lib/libstdc++.so.6.0.8 qpidd 12365 qpidd mem REG 8,1 1109436 1213951 /usr/ lib/libdb_cxx-4.3.so qpidd 12365 qpidd mem REG 8,1 203340 1214124 /usr/ lib/libssl3.so qpidd 12365 qpidd mem REG 8,1 839781 1266317 /usr/ lib/qpid/daemon/xml.so qpidd 12365 qpidd mem REG 8,1 16636786 1215692 /usr/ lib/libqpidbroker.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 226512 1214049 /usr/ lib/libnspr4.so qpidd 12365 qpidd mem REG 8,1 6270148 1266314 /usr/ lib/qpid/daemon/cluster.so qpidd 12365 qpidd mem REG 8,1 9912 1214097 /usr/ lib/libplds4.so qpidd 12365 qpidd mem REG 8,1 6782769 1215639 /usr/ lib/libqpidclient.so.0.1.0 qpidd 12365 qpidd mem REG 8,1 587779 1261584 /usr/ lib/qpid/client/sslconnector.so qpidd 12365 qpidd mem REG 8,1 14008 1214096 /usr/ lib/libplc4.so qpidd 12365 qpidd mem REG 8,1 515215 1266312 /usr/ lib/qpid/daemon/replication_exchange.so qpidd 12365 qpidd mem REG 8,1 558508 1266316 /usr/ lib/qpid/daemon/ssl.so qpidd 12365 qpidd mem REG 8,1 1187124 1214050 /usr/ lib/libnss3.so qpidd 12365 qpidd mem REG 8,1 4155068 1215591 /usr/ lib/libxerces-c.so.27.0 qpidd 12365 qpidd mem REG 8,1 3957632 1215596 /usr/ lib/libxqilla.so.3.0.0 qpidd 12365 qpidd 0u CHR 136,0 2 /dev/ pts/0 qpidd 12365 qpidd 1u CHR 136,0 2 /dev/ pts/0 qpidd 12365 qpidd 2u CHR 136,0 2 /dev/ pts/0
qpidd   12365 qpidd    3r  FIFO        0,6               240193 pipe
qpidd   12365 qpidd    4w  FIFO        0,6               240193 pipe
qpidd 12365 qpidd 5w REG 8,1 3556296254 98581 /var/ log/qpid/sfbroker.log qpidd 12365 qpidd 6r CHR 1,9 1065 /dev/ urandom
qpidd   12365 qpidd    7r  FIFO        0,6               240195 pipe
qpidd 12366 qpidd cwd DIR 8,1 4096 557058 /var/ lib/qpidd/.qpidd
qpidd   12366 qpidd  rtd    DIR        8,1       4096         2 /
qpidd 12366 qpidd txt REG 8,1 76165 1299146 /usr/ sbin/qpidd qpidd 12366 qpidd mem REG 8,1 18031759 1215641 /usr/ lib/libqpidcommon.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 14644 786449 /lib/ libdl-2.5.so qpidd 12366 qpidd mem REG 8,1 97512 1214114 /usr/ lib/libsasl2.so.2.0.22 qpidd 12366 qpidd mem REG 8,1 206396 786499 /lib/ i686/nosegneg/libm-2.5.so qpidd 12366 qpidd mem REG 8,1 44992 786703 /lib/ libgcc_s-4.1.2-20080825.so.1 qpidd 12366 qpidd mem REG 8,1 1620400 789604 /lib/ i686/nosegneg/libc-2.5.so qpidd 12366 qpidd mem REG 8,1 127720 786500 /lib/ i686/nosegneg/libpthread-2.5.so qpidd 12366 qpidd mem REG 8,1 74616 786483 /lib/ libresolv-2.5.so qpidd 12366 qpidd mem REG 8,1 43544 786447 /lib/ libcrypt-2.5.so qpidd 12366 qpidd mem REG 8,1 2632 1213919 /usr/ lib/libaio.so.1.0.1 qpidd 12366 qpidd mem REG 8,1 15584 1247077 /usr/ lib/openais/libcpg.so.2.0.0 qpidd 12366 qpidd mem REG 8,1 14144 786738 /lib/ libuuid.so.1.2 qpidd 12366 qpidd mem REG 8,1 16068 1215646 /usr/ lib/libcman.so.2.0.115 qpidd 12366 qpidd mem REG 8,1 42068 786501 /lib/ i686/nosegneg/librt-2.5.so qpidd 12366 qpidd mem REG 8,1 62328 1215614 /usr/ lib/libboost_filesystem.so.1.33.1 qpidd 12366 qpidd mem REG 8,1 124432 786503 /lib/ ld-2.5.so qpidd 12366 qpidd mem REG 8,1 625546 1266311 /usr/ lib/qpid/daemon/replicating_listener.so qpidd 12366 qpidd mem REG 8,1 213760 1215620 /usr/ lib/libboost_program_options.so.1.33.1 qpidd 12366 qpidd mem REG 8,1 1127841 1266315 /usr/ lib/qpid/daemon/msgstore.so qpidd 12366 qpidd mem REG 8,1 946263 1215643 /usr/ lib/libsslcommon.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 96924 1214052 /usr/ lib/libnssutil3.so qpidd 12366 qpidd mem REG 8,1 925520 1214126 /usr/ lib/libstdc++.so.6.0.8 qpidd 12366 qpidd mem REG 8,1 1109436 1213951 /usr/ lib/libdb_cxx-4.3.so qpidd 12366 qpidd mem REG 8,1 203340 1214124 /usr/ lib/libssl3.so qpidd 12366 qpidd mem REG 8,1 839781 1266317 /usr/ lib/qpid/daemon/xml.so qpidd 12366 qpidd mem REG 8,1 16636786 1215692 /usr/ lib/libqpidbroker.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 226512 1214049 /usr/ lib/libnspr4.so qpidd 12366 qpidd mem REG 8,1 6270148 1266314 /usr/ lib/qpid/daemon/cluster.so qpidd 12366 qpidd mem REG 8,1 9912 1214097 /usr/ lib/libplds4.so qpidd 12366 qpidd mem REG 8,1 6782769 1215639 /usr/ lib/libqpidclient.so.0.1.0 qpidd 12366 qpidd mem REG 8,1 587779 1261584 /usr/ lib/qpid/client/sslconnector.so qpidd 12366 qpidd mem REG 8,1 14008 1214096 /usr/ lib/libplc4.so qpidd 12366 qpidd mem REG 8,1 515215 1266312 /usr/ lib/qpid/daemon/replication_exchange.so qpidd 12366 qpidd mem REG 8,1 558508 1266316 /usr/ lib/qpid/daemon/ssl.so qpidd 12366 qpidd mem REG 8,1 1187124 1214050 /usr/ lib/libnss3.so qpidd 12366 qpidd mem REG 8,1 4155068 1215591 /usr/ lib/libxerces-c.so.27.0 qpidd 12366 qpidd mem REG 8,1 3957632 1215596 /usr/ lib/libxqilla.so.3.0.0 qpidd 12366 qpidd mem REG 8,96 16384 402653317 /vol/ qpidd/rhm/dat/__db.005 qpidd 12366 qpidd mem REG 8,96 450560 402653316 /vol/ qpidd/rhm/dat/__db.004 qpidd 12366 qpidd mem REG 8,96 294912 402653315 /vol/ qpidd/rhm/dat/__db.003 qpidd 12366 qpidd mem REG 8,96 278528 402653314 /vol/ qpidd/rhm/dat/__db.002 qpidd 12366 qpidd mem REG 8,96 24576 402653313 /vol/ qpidd/rhm/dat/__db.001 qpidd 12366 qpidd 0u CHR 1,3 778 /dev/ null qpidd 12366 qpidd 1u CHR 1,3 778 /dev/ null qpidd 12366 qpidd 2u CHR 1,3 778 /dev/ null
qpidd   12366 qpidd    3r  FIFO        0,6               240193 pipe
qpidd   12366 qpidd    4w  FIFO        0,6               240193 pipe
qpidd 12366 qpidd 5w REG 8,1 3556296254 98581 /var/ log/qpid/sfbroker.log qpidd 12366 qpidd 6r CHR 1,9 1065 /dev/ urandom qpidd 12366 qpidd 7r 0000 0,10 0 240197 eventpoll
qpidd   12366 qpidd    8w  FIFO        0,6               240195 pipe
qpidd   12366 qpidd    9u  unix 0xed2f5040               240196 socket
qpidd 12366 qpidd 10wW REG 8,96 0 131 /vol/ qpidd/lock
qpidd   12366 qpidd   11r  FIFO        0,6               240198 pipe
qpidd   12366 qpidd   12w  FIFO        0,6               240198 pipe
qpidd 12366 qpidd 13u REG 8,96 8192 402653319 /vol/ qpidd/rhm/dat/queues.db qpidd 12366 qpidd 14u REG 8,96 8192 402653320 /vol/ qpidd/rhm/dat/config.db qpidd 12366 qpidd 15u REG 8,96 8192 402653321 /vol/ qpidd/rhm/dat/exchanges.db qpidd 12366 qpidd 16u REG 8,96 8192 402653322 /vol/ qpidd/rhm/dat/messages.db qpidd 12366 qpidd 17u REG 8,96 8192 402653323 /vol/ qpidd/rhm/dat/mappings.db qpidd 12366 qpidd 18u REG 8,96 8192 402653324 /vol/ qpidd/rhm/dat/bindings.db qpidd 12366 qpidd 19u REG 8,96 8192 402653325 /vol/ qpidd/rhm/dat/general.db qpidd 12366 qpidd 20u REG 8,96 10485760 402653318 /vol/ qpidd/rhm/dat/log.0000000001 qpidd 12366 qpidd 21r REG 8,96 1342177792 503316610 /vol/ qpidd/rhm/jrnl/000c/hits/JournalData.0001.jdat

On Mar 11, 2010, at 12:15 PM, Kim van der Riet wrote:

A few questions:

1. What versions of the store and qpid are you using?

2. Are your messages persistent? Flow-to-disk will force transient
messages to disk, but will discard them on recovery; persistent messages
are written to disk anyway, but have their message content discarded
when the policy limit is reached. I guess the latter is the case here.

This behavior should be checked out... it looks suspicious.

On Thu, 2010-03-11 at 11:01 -0800, Charles Woerner wrote:
A bit more information, at about the time Journal recovery phase 1 is
complete I see one of the two qpidd processes begin to acquire vast
amounts of memory (I assume this is the phase where it begins reading
up to max-queue-size of persistent queue data into memory).  Once the
process acquires about 97% of available memory the system usually dies
with the "critical Broker start-up failed: Cannot read from child
process.".

I'm not surprised that the system dies when it runs out of memory.
But I hope someone can help me understand why qpidd is using so much
memory and whether I have my max-queue-size and store geometry
configured in a reasonable manner. I would hope that there isn't some
inherent limitation where one can't recover more data from a
persistent journal than one has available memory.

On Mar 10, 2010, at 10:55 PM, Charles Woerner wrote:

To follow up, it always fails at startup with this large queue, but I
don't always see the "Broker startup failed: cannot read from child
process" error.  Sometimes it just dies.


__

Charles Woerner  | [email protected] |   demandbase



---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]


__

Charles Woerner | [email protected] | demandbase | 415.683.2669






Reply via email to