Hello,
I've run into a problem running some tests to probe various aspects of
QPID when all consumers have been down for extended periods of time (2
hours). I'm enqueuing data at a steady rate of about 450 enqueues per
second for 2 hours. My queue is a large durable queue (~ 80GB) using
the msgstore persistence mechanism. I have set max-session-rate=0 and
limit-policy=flow-to-disk. I should point out that QPID has been
running smoothly for 6 months, so I'm confident that the system is
stable under it's normal load. Also, my expectation based on my past
experience with QPID is that something is simply misconfigured, not
that there is a bug in QPID. We're using QPID 0.5.752581-34 provided
by RedHat (MRG).
The two problems I'm seeing are:
1) After enqueuing 2 hours worth of data at 450 enqueues/sec I am
unable to startup the broker after cleanly shutting it down. After
~10 minutes into the recovery sequence at startup (after Journal
recovery phase 1) it fails with the message "Broker startup failed:
cannot read from child process". Below is the relevant trace output
from the log.
2) qpid seems to be using more memory than I expected. I have
configured the max-queue-size as 108576000 (100x the default of 10MB,
or 1GB) with a policy of flow-to-disk. My understanding is that flow-
to-disk should allow me to continue to accept and serialize enqueues
beyond the max-queue-size limit and that everything up to the ${max-
queue-size} amount of data will be held in memory. Indeed, it appears
to work that way, best I can tell. Given those settings and my
experience with QPID I expected to see a little over 1 GB of memory
being used, but I see it's way more - over 1.6 GB of memory. Is there
a simple formula for approximating how much memory a given QPID
instance will use based on the number of queues and various
configuration options, or should I just experiment?
]$ top
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7699 qpidd 16 0 2548m 1.6g 1408 S 0.3 94.0 7:39.27 qpidd
...
]$ free
total used free shared buffers
cached
Mem: 1747764 1739964 7800 0 116
7580
-/+ buffers/cache: 1732268 15496
Swap: 917496 909268 8228
I would be very thankful if someone more knowledgeable than I about
QPID were able to take a look at my configuration and help me
determine what the problem is and how to work around it. Thanks in
advance.
Our qpidd.conf file is:
daemon=yes
log-to-file=/var/log/qpid/sfbroker.log
load-module=/usr/lib/qpid/daemon/msgstore.so
data-dir=/vol/qpidd
num-jfiles=64
jfile-size-pgs=20480
wait=1800
auth=no
trace=yes
max-session-rate=0
The broker queue is called "hits" and was created using the following
code. My understanding is that max-queue-size + flow-to-disk should
give me approximately 1GB of queue data in memory, with the older
items flowing to a disk store. The actual limit of the entire queue
should be limited to 80% of the journal size. If anyone is obliged
I'd love some feedback as to whether alternate disk geometries would
provide for a faster queue setup, recovery, and startup time.
qpid-config add queue hits --durable --file-count 64 --file-size
20480 --max-queue-size 1048576000 --limit-policy flow-to-disk
qpid-config bind amq.direct hits /hits
A trace of the log at startup follows:
Starting Qpid AMQP daemon: 2010-mar-10 22:02:12 info Loaded Module: /
usr/lib/qpid/daemon/msgstore.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/
replicating_listener.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/
msgstore.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/
replicating_listener.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/
replication_exchange.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/
msgstore.so
2010-mar-10 22:02:13 info SSL connector not enabled, you must set
QPID_SSL_CERT_DB to enable it.
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/client/
sslconnector.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/cluster.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/
replication_exchange.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/
msgstore.so
2010-mar-10 22:02:13 info SSL connector not enabled, you must set
QPID_SSL_CERT_DB to enable it.
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/client/
sslconnector.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/cluster.so
2010-mar-10 22:02:13 debug Forked daemon child process
2010-mar-10 22:02:13 info Management enabled
2010-mar-10 22:02:13 debug ManagementBroker restored broker ID:
d87970a3-39b8-41ae-bf7f-38c460ae9795
2010-mar-10 22:02:13 debug ManagementBroker boot sequence: 6
2010-mar-10 22:02:13 debug ManagementBroker added package
org.apache.qpid.broker
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:system
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:broker
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:agent
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:vhost
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:queue
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:exchange
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:binding
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:connection
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:link
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:bridge
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:session
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:clientConnect
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:clientConnectFail
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:clientDisconnect
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:brokerLinkUp
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:brokerLinkDown
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:queueDeclare
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:queueDelete
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:exchangeDeclare
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:exchangeDelete
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:bind
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:unbind
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:subscribe
2010-mar-10 22:02:13 debug ManagementBroker added class
org.apache.qpid.broker:unsubscribe
2010-mar-10 22:02:13 notice Journal "TplStore": Created
2010-mar-10 22:02:13 debug Journal "TplStore": Journal directory = "/
vol/qpidd/rhm/tpl/"; Base file name = "tpl"
2010-mar-10 22:02:13 notice Store module initialized; dir=/vol/qpidd
2010-mar-10 22:02:13 info > Default files per journal: 64
2010-mar-10 22:02:13 info > Default journal file size: 20480 (wpgs)
2010-mar-10 22:02:13 info > Default write cache page size: 32 (Kib)
2010-mar-10 22:02:13 info > Default number of write cache pages: 32
2010-mar-10 22:02:13 info > TPL files per journal: 8
2010-mar-10 22:02:13 info > TPL journal file size: 24 (wpgs)
2010-mar-10 22:02:13 info > TPL write cache page size: 4 (Kib)
2010-mar-10 22:02:13 info > TPL number of write cache pages: 64
2010-mar-10 22:02:13 debug Configured queue hits with no-local=0
2010-mar-10 22:02:13 debug Configured queue hits with qpid.trace.id=''
and qpid.trace.exclude='' i.e. 0 elements
2010-mar-10 22:02:13 notice Journal "hits": Created
2010-mar-10 22:02:13 debug Journal "hits": Journal directory = "/vol/
qpidd/rhm/jrnl/000c/hits/"; Base file name = "JournalData"
2010-mar-10 22:02:13 debug Journal "hits": Recover; num_jfiles=64
jfsize_sblks=2621440 queue_id = 0x2 wcache_pgsize_sblks=64
wcache_num_pages=32
2010-mar-10 22:09:59 debug Journal "hits": Recover file analysis
(jid="hits"): njf=64 ae=F aemjf=0 owi=F frot=T jempty=F ffid=0
fro=0x200 (4 dblks) lfid=2 eo=0x3a1fc800 (7618448 dblks)
h_rid=0x4322eb lffull=F jfull=F Enqueued records (txn & non-txn):
[ fid_00=0 fid_01=1208884 fid_02=903881 fid_03=0 fid_04=0 fid_05=0
fid_06=0 fid_07=0 fid_08=0 fid_09=0 fid_10=0 fid_11=0 fid_12=0
fid_13=0 fid_14=0 fid_15=0 fid_16=0 fid_17=0 fid_18=0 fid_19=0
fid_20=0 fid_21=0 fid_22=0 fid_23=0 fid_24=0 fid_25=0 fid_26=0
fid_27=0 fid_28=0 fid_29=0 fid_30=0 fid_31=0 fid_32=0 fid_33=0
fid_34=0 fid_35=0 fid_36=0 fid_37=0 fid_38=0 fid_39=0 fid_40=0
fid_41=0 fid_42=0 fid_43=0 fid_44=0 fid_45=0 fid_46=0 fid_47=0
fid_48=0 fid_49=0 fid_50=0 fid_51=0 fid_52=0 fid_53=0 fid_54=0
fid_55=0 fid_56=0 fid_57=0 fid_58=0 fid_59=0 fid_60=0 fid_61=0
fid_62=0 fid_63=0 ]
2010-mar-10 22:09:59 debug Journal "hits": Recover phase 1 complete;
highest rid found = 0x4322eb; emap.size=2112765; tmap.size=0; journal
now read-only.
2010-mar-10 22:11:44 debug Exception constructed: Cannot read from
child process.
2010-mar-10 22:11:44 critical Broker start-up failed: Cannot read from
child process.
__
Charles Woerner | [email protected] | demandbase |
415.683.2669