Thank you. Yes, we have multiple producers and multiple consumers accessing
the same queue at the same time. My message size is around ~80 KB. One small
point of confusion regarding AIO: if the store uses AIO, it shouldn't have to
wait on I/O, right? Why is it waiting?

I was reading about AIO and found the following. Am I hitting this scenario?

"AIO read and write on files opened without O_DIRECT (i.e. normal buffered
filesystem AIO). On ext2, ext3, jfs, xfs and nfs, these do not return an
explicit error, but quietly default to sync" -- from
http://lse.sourceforge.net/io/aio.html
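
To see whether that is what we are hitting, here is a minimal libaio sketch of
the kind of standalone test I have in mind (my own sketch, not taken from the
broker code; the file name, 4 KiB alignment and counts are just assumptions).
The idea is that io_submit() on a file opened without O_DIRECT is buffered AIO
and, per the page above, may quietly do the work synchronously inside the call,
which would show up as io wait even though the API is asynchronous:

// build with: g++ aio-test.cpp -laio
#include <libaio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    // Drop O_DIRECT here to get the buffered-AIO behaviour described above.
    int fd = ::open("/tmp/aio-test.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    io_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0) { std::fprintf(stderr, "io_setup failed\n"); return 1; }

    // O_DIRECT requires block-aligned buffer/offset/length (4096 assumed here).
    void* buf = nullptr;
    if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
    std::memset(buf, 'x', 4096);

    struct iocb cb;
    struct iocb* cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, 4096, 0);

    // With O_DIRECT this should return almost immediately and complete later;
    // with buffered AIO it may block until the write has effectively been done.
    if (io_submit(ctx, 1, cbs) != 1) { std::fprintf(stderr, "io_submit failed\n"); return 1; }

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);   // reap the single completion

    io_destroy(ctx);
    std::free(buf);
    ::close(fd);
    return 0;
}

Timing io_submit() with and without O_DIRECT on our filesystem should show
which case we are in.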

We have enabled trace logging, and here is the full exception stack; see if
it helps you identify the issue.

2018-11-06 18:04:33 [Broker] debug clean(): 10 messages remain; head is now
0
2018-11-06 18:04:38 [Store] critical Linear Store: Journal
"ax-q-axgroup-001-consumer-group-001": get_events() returned
JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=31 pc=2 po=0 aer=1 edac=TFFF
ps=[-------------------------------A]

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [Broker] warning Exchange <exchange-name> cannot
deliver to  Queue <queue-name>: Queue <queue-name>:
MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Broker] error Connection exception: framing-error:
Queue <queue-name>: async_dequeue() failed: jexception 0x0000
(RHM_IORES_BUSY: dequeue while part way through another op: _enq_busy=T
_abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Broker] error Connection exception: framing-error:
Queue <queue-name>: async_dequeue() failed: jexception 0x0000
(RHM_IORES_BUSY: dequeue while part way through another op: _enq_busy=T
_abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<server>:33484 closed by error: Queue <queue-name>:
async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY: dequeue while
part way through another op: _enq_busy=T _abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)(501)

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<client>:40333 closed by error: Queue <queue-name>:
async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY: dequeue while
part way through another op: _enq_busy=T _abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)(501)

2018-11-06 18:04:39 [System] debug Exception constructed: Value for
replyText is too large

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<client>:40333 closed by error: illegal-argument: Value
for replyText is too large(320)
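
One more question, based on your earlier note about the write cache: since our
messages are ~80 KB, would it make sense to try a larger page size? Something
along these lines (the option names are the ones you mentioned; the values are
only a guess on my part, and I would confirm them against qpidd --help with the
linearstore module loaded):

    qpidd ... --wcache-page-size 128 --wcache-num-pages 32

Or is there a better starting point for those values?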


Thanks a lot for your help,
Ram


On Thu, Nov 8, 2018 at 11:26 AM Kim van der Riet <[email protected]>
wrote:

> Resending, did not show up on the list the first time I sent it...
>
>
>
> -------- Forwarded Message --------
> Subject:        Re: qpid-cpp-0.35 errors
> Date:   Thu, 8 Nov 2018 09:30:24 -0500
> From:   Kim van der Riet <[email protected]>
> To:     [email protected]
>
>
>
> On 11/7/18 3:18 PM, rammohan ganapavarapu wrote:
> > Kim,
> >
> > Ok, I am still trying to see what part of my Java application is causing
> > that issue; yes, it is happening intermittently. Regarding the
> > "JERR_WMGR_ENQDISCONT" error, maybe they are chained exceptions from the
> > previous error, JERR_JCNTL_AIOCMPLWAIT?
> In my mind, it is more likely the other way around. But the logs should
> tell you that. It would be best to start with a clean store before each
> test so you don't inherit issues from a previous test or run.
> >
> > Does message size contribute to this issue?
>
> Yes, but only in the sense that the size alters the packing of the write
> buffers, and the timing of when they are written. Also, the number of
> simultaneous producers and consumers will affect this. In particular,
> two producers simultaneously sending messages to the same queue, or a
> consumer consuming from a queue while a producer is also sending, are
> going to be the main factors in any race condition such as I suspect
> this is. Playing with those will give clues as to what is
> happening. You could try the following, each time starting with a clean
> store:
>
> 1. Only allowing a single producer, followed by a single consumer (ie
> not at the same time);
>
> 2. Allowing a single producer and a single consumer to operate
> simultaneously;
>
> 3. Allowing multiple producers only (I don't know if your use-case has
> this);
>
> 4. Allowing multiple consumers.
>
> Once you have isolated which scenarios cause the problem, then try
> varying the message size. The answers to these will help isolate where
> the issue is happening.
>
> >
> > Thanks,
> > Ram
> >
> > On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <[email protected]>
> > wrote:
> >
> >> No, they are not.
> >>
> >> These two defines govern the number of sleeps and the sleep time while
> >> waiting before an exception is thrown, and they apply during recovery
> >> only. They do not play a role during normal operation.
> >>
> >> If you are able to compile the broker code, you can try playing with
> >> these values. But I don't think they will make much difference to the
> >> overall problem. I think some of the other errors you have been seeing
> >> prior to this one are closer to where the real problem lies - such as
> >> the JERR_WMGR_ENQDISCONT error.
> >>
> >> Do you have a reproducer of any kind? Does this error occur predictably
> >> under some or other conditions?
> >>
> >> Thanks,
> >>
> >> Kim van der Riet
> >>
> >> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
> >>> Kim,
> >>>
> >>> I see these two settings in the code; can they be made configurable?
> >>>
> >>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>
> >>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
> >>>
> >>>
> >>> Ram
> >>>
> >>> On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
> >>> [email protected]> wrote:
> >>>
> >>>> Thank you, Kim. I will try your suggestions.
> >>>>
> >>>> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <[email protected]
> >> wrote:
> >>>>> This error is a linearstore issue. It looks as though there is a
> >>>>> single
> >>>>> write operation to disk that has become stuck, and is holding up all
> >>>>> further write operations. This happens because there is a fixed
> >> circular
> >>>>> pool of memory pages used for the AIO operations to disk, and when
> one
> >>>>> of these is "busy" (indicated by the A letter in the page state map),
> >>>>> write operations cannot continue until it is cleared. If it does not
> >>>>> clear within a certain time, then an exception is thrown, which
> >>>>> usually
> >>>>> results in the broker closing the connection.
> >>>>>
> >>>>> The events leading up to a "stuck" write operation are complex and
> >>>>> sometimes difficult to reproduce. If you have a reproducer, then I
> >> would
> >>>>> be interested to see it! Even so, the ability to reproduce on another
> >>>>> machine is hard as it depends on such things as disk write speed, the
> >>>>> disk controller characteristics, the number of threads in the thread
> >>>>> pool (ie CPU type), memory and other hardware-related things.
> >>>>>
> >>>>> There are two linearstore parameters that you can try playing with to
> >>>>> see if you can change the behavior of the store:
> >>>>>
> >>>>> wcache-page-size: This sets the size of each page in the write
> buffer.
> >>>>> Larger page size is good for large messages; a smaller size will help
> >> if
> >>>>> you have small messages.
> >>>>>
> >>>>> wcache-num-pages: The total number of pages in the write buffer.
> >>>>>
> >>>>> Use the --help on the broker with the linearstore loaded to see more
> >>>>> details on this. I hope that helps a little.
> >>>>>
> >>>>> Kim van der Riet
> >>>>>
> >>>>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> >>>>>> Any help in understanding why/when the broker throws those errors and
> >>>>>> stops receiving messages would be appreciated.
> >>>>>>
> >>>>>> Not sure if any kernel tuning or broker tuning needs to be done to
> >>>>>> solve this issue.
> >>>>>>
> >>>>>> Thanks in advance,
> >>>>>> Ram
> >>>>>>
> >>>>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> Also from this log message (store level) it seems like waiting for
> >> AIO
> >>>>> to
> >>>>>>> complete.
> >>>>>>>
> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
> "<journal
> >>>>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> >>>>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> >>>>>>> ps=[-------------------------A------]
> >>>>>>>
> >>>>>>> page_state ps=[-------------------------A------] where A is
> >>>>> AIO_PENDING
> >>>>>>> aer=1 _aio_evt_rem; ///< Remaining AIO events
> >>>>>>>
> >>>>>>> When there are pending AIOs, does the broker close the
> >>>>> connection?
> >>>>>>> Is there any tuning that can be done to resolve this?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Ram
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> >>>>>>> [email protected]> wrote:
> >>>>>>>
> >>>>>>>> I was checking the code and I see these lines for that AIO timeout.
> >>>>>>>>
> >>>>>>>> case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>>>>>>     if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>>>>>>         THROW_STORE_EXCEPTION("Timeout waiting for AIO in MessageStoreImpl::recoverMessages()");
> >>>>>>>>     ::usleep(AIO_SLEEP_TIME_US);
> >>>>>>>>     break;
> >>>>>>>>
> >>>>>>>> And these are the defaults
> >>>>>>>>
> >>>>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>>>>>>
> >>>>>>>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is waiting for AIO.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> So did the page get blocked, and is it waiting for page availability?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ram
> >>>>>>>>
> >>>>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after
> >>>>>>>>> that
> >>>>> we
> >>>>>>>>> see this message
> >>>>>>>>>
> >>>>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> >>>>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
> >>>>> offs=0x107680
> >>>>>>>>> (likely journal overwrite boundary); 19 filler record(s)
> required.
> >>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
> >>>>> fid=0x4605b
> >>>>>>>>> offs=0x107680
> >>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover
> >>>>>>>>> phase
> >>>>> logs
> >>>>>>>>> It worked fine for a day and started throwing this message:
> >>>>>>>>>
> >>>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
> >> "<name>":
> >>>>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
> >>>>> pi=25 pc=8
> >>>>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot
> >>>>>>>>> deliver
> >>>>> to
> >>>>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
> >>>>> failed:
> >>>>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
> >>>>> JERR_JCNTL_AIOCMPLWAIT:
> >>>>>>>>> Timeout waiting for AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
> >>>>> framing-error:
> >>>>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
> >>>>> 0x0202
> >>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>>>> waiting for
> >>>>>>>>> AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
> >>>>> <queue-name>:
> >>>>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
> >>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>>>> waiting for
> >>>>>>>>> AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
> >>>>> illegal-argument:
> >>>>>>>>> Value for replyText is too large(320)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Ram
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> >>>>>>>>> [email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> No, local disk.
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <[email protected]>
> >> wrote:
> >>>>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>>>>>>> Gordon,
> >>>>>>>>>>>>
> >>>>>>>>>>>> We are using Java client version 0.28 and qpid-cpp version 1.35
> >>>>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what
> >>>>>>>>>>>> scenario it is happening, but after I restart the broker and we
> >>>>>>>>>>>> wait a few days, it happens again. From the above logs, do you
> >>>>>>>>>>>> have any pointers to check?
> >>>>>>>>>>>
> >>>>>>>>>>> Are you using NFS?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
>
