Hello,

I've run into a problem running some tests to probe various aspects of QPID when all consumers have been down for extended periods of time (2 hours). I'm enqueuing data at a steady rate of about 450 enqueues per second for 2 hours. My queue is a large durable queue (~ 80GB) using the msgstore persistence mechanism. I have set max-session-rate=0 and limit-policy=flow-to-disk. I should point out that QPID has been running smoothly for 6 months, so I'm confident that the system is stable under it's normal load. Also, my expectation based on my past experience with QPID is that something is simply misconfigured, not that there is a bug in QPID. We're using QPID 0.5.752581-34 provided by RedHat (MRG).

The two problems I'm seeing are:

1) After enqueuing 2 hours worth of data at 450 enqueues/sec I am unable to startup the broker after cleanly shutting it down. After ~10 minutes into the recovery sequence at startup (after Journal recovery phase 1) it fails with the message "Broker startup failed: cannot read from child process". Below is the relevant trace output from the log.

2) qpid seems to be using more memory than I expected. I have configured the max-queue-size as 108576000 (100x the default of 10MB, or 1GB) with a policy of flow-to-disk. My understanding is that flow- to-disk should allow me to continue to accept and serialize enqueues beyond the max-queue-size limit and that everything up to the ${max- queue-size} amount of data will be held in memory. Indeed, it appears to work that way, best I can tell. Given those settings and my experience with QPID I expected to see a little over 1 GB of memory being used, but I see it's way more - over 1.6 GB of memory. Is there a simple formula for approximating how much memory a given QPID instance will use based on the number of queues and various configuration options, or should I just experiment?

        ]$ top
        ...
        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        7699 qpidd     16   0 2548m 1.6g 1408 S  0.3 94.0   7:39.27 qpidd
        ...

        ]$ free
total used free shared buffers cached Mem: 1747764 1739964 7800 0 116 7580
        -/+ buffers/cache:    1732268      15496
        Swap:       917496     909268       8228

I would be very thankful if someone more knowledgeable than I about QPID were able to take a look at my configuration and help me determine what the problem is and how to work around it. Thanks in advance.

Our qpidd.conf file is:

        daemon=yes
        log-to-file=/var/log/qpid/sfbroker.log
        load-module=/usr/lib/qpid/daemon/msgstore.so
        data-dir=/vol/qpidd
        num-jfiles=64
        jfile-size-pgs=20480
        wait=1800
        auth=no
        trace=yes
        max-session-rate=0

The broker queue is called "hits" and was created using the following code. My understanding is that max-queue-size + flow-to-disk should give me approximately 1GB of queue data in memory, with the older items flowing to a disk store. The actual limit of the entire queue should be limited to 80% of the journal size. If anyone is obliged I'd love some feedback as to whether alternate disk geometries would provide for a faster queue setup, recovery, and startup time.

qpid-config add queue hits --durable --file-count 64 --file-size 20480 --max-queue-size 1048576000 --limit-policy flow-to-disk
        qpid-config bind amq.direct hits /hits

A trace of the log at startup follows:

Starting Qpid AMQP daemon: 2010-mar-10 22:02:12 info Loaded Module: / usr/lib/qpid/daemon/msgstore.so 2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ replicating_listener.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ msgstore.so 2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ replicating_listener.so
2010-mar-10 22:02:12 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/ replication_exchange.so 2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/ msgstore.so 2010-mar-10 22:02:13 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it. 2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/client/ sslconnector.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/cluster.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/ replication_exchange.so 2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/ msgstore.so 2010-mar-10 22:02:13 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it. 2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/client/ sslconnector.so
2010-mar-10 22:02:13 info Loaded Module: /usr/lib/qpid/daemon/cluster.so
2010-mar-10 22:02:13 debug Forked daemon child process
2010-mar-10 22:02:13 info Management enabled
2010-mar-10 22:02:13 debug ManagementBroker restored broker ID: d87970a3-39b8-41ae-bf7f-38c460ae9795
2010-mar-10 22:02:13 debug ManagementBroker boot sequence: 6
2010-mar-10 22:02:13 debug ManagementBroker added package org.apache.qpid.broker 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:system 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:broker 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:agent 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:vhost 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:queue 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:exchange 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:binding 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:connection 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:link 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:bridge 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:session 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:clientConnect 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:clientConnectFail 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:clientDisconnect 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:brokerLinkUp 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:brokerLinkDown 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:queueDeclare 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:queueDelete 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:exchangeDeclare 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:exchangeDelete 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:bind 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:unbind 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:subscribe 2010-mar-10 22:02:13 debug ManagementBroker added class org.apache.qpid.broker:unsubscribe
2010-mar-10 22:02:13 notice Journal "TplStore": Created
2010-mar-10 22:02:13 debug Journal "TplStore": Journal directory = "/ vol/qpidd/rhm/tpl/"; Base file name = "tpl"
2010-mar-10 22:02:13 notice Store module initialized; dir=/vol/qpidd
2010-mar-10 22:02:13 info > Default files per journal: 64
2010-mar-10 22:02:13 info > Default journal file size: 20480 (wpgs)
2010-mar-10 22:02:13 info > Default write cache page size: 32 (Kib)
2010-mar-10 22:02:13 info > Default number of write cache pages: 32
2010-mar-10 22:02:13 info > TPL files per journal: 8
2010-mar-10 22:02:13 info > TPL journal file size: 24 (wpgs)
2010-mar-10 22:02:13 info > TPL write cache page size: 4 (Kib)
2010-mar-10 22:02:13 info > TPL number of write cache pages: 64
2010-mar-10 22:02:13 debug Configured queue hits with no-local=0
2010-mar-10 22:02:13 debug Configured queue hits with qpid.trace.id='' and qpid.trace.exclude='' i.e. 0 elements
2010-mar-10 22:02:13 notice Journal "hits": Created
2010-mar-10 22:02:13 debug Journal "hits": Journal directory = "/vol/ qpidd/rhm/jrnl/000c/hits/"; Base file name = "JournalData" 2010-mar-10 22:02:13 debug Journal "hits": Recover; num_jfiles=64 jfsize_sblks=2621440 queue_id = 0x2 wcache_pgsize_sblks=64 wcache_num_pages=32 2010-mar-10 22:09:59 debug Journal "hits": Recover file analysis (jid="hits"): njf=64 ae=F aemjf=0 owi=F frot=T jempty=F ffid=0 fro=0x200 (4 dblks) lfid=2 eo=0x3a1fc800 (7618448 dblks) h_rid=0x4322eb lffull=F jfull=F Enqueued records (txn & non-txn): [ fid_00=0 fid_01=1208884 fid_02=903881 fid_03=0 fid_04=0 fid_05=0 fid_06=0 fid_07=0 fid_08=0 fid_09=0 fid_10=0 fid_11=0 fid_12=0 fid_13=0 fid_14=0 fid_15=0 fid_16=0 fid_17=0 fid_18=0 fid_19=0 fid_20=0 fid_21=0 fid_22=0 fid_23=0 fid_24=0 fid_25=0 fid_26=0 fid_27=0 fid_28=0 fid_29=0 fid_30=0 fid_31=0 fid_32=0 fid_33=0 fid_34=0 fid_35=0 fid_36=0 fid_37=0 fid_38=0 fid_39=0 fid_40=0 fid_41=0 fid_42=0 fid_43=0 fid_44=0 fid_45=0 fid_46=0 fid_47=0 fid_48=0 fid_49=0 fid_50=0 fid_51=0 fid_52=0 fid_53=0 fid_54=0 fid_55=0 fid_56=0 fid_57=0 fid_58=0 fid_59=0 fid_60=0 fid_61=0 fid_62=0 fid_63=0 ] 2010-mar-10 22:09:59 debug Journal "hits": Recover phase 1 complete; highest rid found = 0x4322eb; emap.size=2112765; tmap.size=0; journal now read-only. 2010-mar-10 22:11:44 debug Exception constructed: Cannot read from child process. 2010-mar-10 22:11:44 critical Broker start-up failed: Cannot read from child process.


__

Charles Woerner | [email protected] | demandbase | 415.683.2669






Reply via email to