Thank you Kim, i will try your suggestions.

On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <[email protected] wrote:

> This error is a linearstore issue. It looks as though there is a single
> write operation to disk that has become stuck, and is holding up all
> further write operations. This happens because there is a fixed circular
> pool of memory pages used for the AIO operations to disk, and when one
> of these is "busy" (indicated by the A letter in the  page state map),
> write operations cannot continue until it is cleared. It it does not
> clear within a certain time, then an exception is thrown, which usually
> results in the broker closing the connection.
>
> The events leading up to a "stuck" write operation are complex and
> sometimes difficult to reproduce. If you have a reproducer, then I would
> be interested to see it! Even so, the ability to reproduce on another
> machine is hard as it depends on such things as disk write speed, the
> disk controller characteristics, the number of threads in the thread
> pool (ie CPU type), memory and other hardware-related things.
>
> There are two linearstore parameters that you can try playing with to
> see if you can change the behavior of the store:
>
> wcache-page-size: This sets the size of each page in the write buffer.
> Larger page size is good for large messages, a smaller size will help if
> you have small messages.
>
> wchache-num-pages: The total number of pages in the write buffer.
>
> Use the --help on the broker with the linearstore loaded to see more
> details on this. I hope that helps a little.
>
> Kim van der Riet
>
> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> > Any help in understand why/when broker throws those errors and stop
> > receiving message would be appreciated.
> >
> > Not sure if any kernel tuning or broker tuning needs to be done to
> > solve this issue.
> >
> > Thanks in advance,
> > Ram
> >
> > On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> > [email protected]> wrote:
> >
> >> Also from this log message (store level) it seems like waiting for AIO
> to
> >> complete.
> >>
> >> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
> >> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> >> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> >> ps=[-------------------------A------]
> >>
> >> page_state ps=[-------------------------A------]  where A is AIO_PENDING
> >> aer=1 _aio_evt_rem;          ///< Remaining AIO events
> >>
> >> When there is or there are pending AIO, does broker close the
> connection?
> >> is there any tuning that can be done to resolve this?
> >>
> >> Thanks,
> >> Ram
> >>
> >>
> >>
> >>
> >> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> >> [email protected]> wrote:
> >>
> >>> I was check the code and i see these lines for that AIO timeout.
> >>>
> >>>                case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>                  if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>                      THROW_STORE_EXCEPTION("Timeout waiting for AIO in
> >>> MessageStoreImpl::recoverMessages()");
> >>>                  ::usleep(AIO_SLEEP_TIME_US);
> >>>                  break;
> >>>
> >>> And these are the defaults
> >>>
> >>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>
> >>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
> >>>
> >>>
> >>>    RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
> >>> waiting for AIO.
> >>>
> >>>
> >>>
> >>> So does page got blocked and its waiting for page availability?
> >>>
> >>>
> >>> Ram
> >>>
> >>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> >>> [email protected]> wrote:
> >>>
> >>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we
> >>>> see this message
> >>>>
> >>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> >>>> "<journal-name>": Bad record alignment found at fid=0x4605b
> offs=0x107680
> >>>> (likely journal overwrite boundary); 19 filler record(s) required.
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>> "<journal-name>": Recover phase write: Wrote filler record:
> fid=0x4605b
> >>>> offs=0x107680
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
> logs
> >>>>
> >>>> It worked fine for a day and started throwing this message:
> >>>>
> >>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
> >>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
> pi=25 pc=8
> >>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
> >>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
> failed:
> >>>> jexception 0x0202 jcntl::handle_aio_wait() threw
> JERR_JCNTL_AIOCMPLWAIT:
> >>>> Timeout waiting for AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
> framing-error:
> >>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
> 0x0202
> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> waiting for
> >>>> AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
> <queue-name>:
> >>>> MessageStoreImpl::store() failed: jexception 0x0202
> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> waiting for
> >>>> AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
> >>>> Value for replyText is too large(320)
> >>>>
> >>>> Thanks,
> >>>> Ram
> >>>>
> >>>>
> >>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> No, local disk.
> >>>>>
> >>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <[email protected]> wrote:
> >>>>>
> >>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>> Gordon,
> >>>>>>>
> >>>>>>> We are using java client 0.28 version and qpidd-cpp 1.35 version
> >>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64), i dont know at what scenario
> >>>>>> its
> >>>>>>> happening but after i restart broker and if we wait for few days
> its
> >>>>>>> happening again. From the above logs do you have any pointers to
> >>>>>> check?
> >>>>>>
> >>>>>> Are you using NFS?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: [email protected]
> >>>>>> For additional commands, e-mail: [email protected]
> >>>>>>
> >>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Reply via email to