Thank you Kim, I will try your suggestions.

On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet wrote:
> This error is a linearstore issue. It looks as though a single write
> operation to disk has become stuck and is holding up all further write
> operations. This happens because there is a fixed circular pool of memory
> pages used for the AIO operations to disk, and when one of these pages is
> "busy" (indicated by the letter A in the page state map), write
> operations cannot continue until it is cleared. If it does not clear
> within a certain time, an exception is thrown, which usually results in
> the broker closing the connection.
>
> The events leading up to a "stuck" write operation are complex and
> sometimes difficult to reproduce. If you have a reproducer, I would be
> interested to see it! Even so, reproducing the problem on another machine
> is hard, as it depends on things such as disk write speed, disk controller
> characteristics, the number of threads in the thread pool (i.e. the CPU
> type), memory, and other hardware-related factors.
>
> There are two linearstore parameters you can experiment with to see
> whether they change the behavior of the store:
>
> wcache-page-size: The size of each page in the write buffer. A larger
> page size is good for large messages; a smaller size will help if you
> have small messages.
>
> wcache-num-pages: The total number of pages in the write buffer.
>
> Run the broker with --help while the linearstore is loaded to see more
> details on these options. I hope that helps a little.
>
> Kim van der Riet
>
> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> > Any help in understanding why and when the broker throws those errors
> > and stops receiving messages would be appreciated.
> >
> > I'm not sure whether any kernel tuning or broker tuning is needed to
> > solve this issue.
> >
> > Thanks in advance,
> > Ram
> >
> > On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu wrote:
> >
> >> Also, from this (store-level) log message, it seems to be waiting for
> >> AIO to complete.
> >>
> >> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>
> >> page_state ps=[-------------------------A------] where A is AIO_PENDING
> >> aer=1 _aio_evt_rem; ///< Remaining AIO events
> >>
> >> When there are pending AIO operations, does the broker close the
> >> connection? Is there any tuning that can be done to resolve this?
> >>
> >> Thanks,
> >> Ram
> >>
> >> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu wrote:
> >>
> >>> I was checking the code, and I see these lines for that AIO timeout:
> >>>
> >>>     case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>         if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>             THROW_STORE_EXCEPTION("Timeout waiting for AIO in MessageStoreImpl::recoverMessages()");
> >>>         ::usleep(AIO_SLEEP_TIME_US);
> >>>         break;
> >>>
> >>> And these are the defaults:
> >>>
> >>>     #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>     #define AIO_SLEEP_TIME_US 10 // 0.01 ms
> >>>
> >>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is waiting for AIO.
> >>>
> >>> So did a page get blocked, and is the store waiting for that page to
> >>> become available?
> >>>
> >>> Ram
> >>>
> >>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu wrote:
> >>>
> >>>> Actually, we upgraded from qpid-cpp 0.28 to 1.35, and after that we
> >>>> see this message:
> >>>>
> >>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal "<journal-name>": Bad record alignment found at fid=0x4605b offs=0x107680 (likely journal overwrite boundary); 19 filler record(s) required.
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>": Recover phase write: Wrote filler record: fid=0x4605b offs=0x107680
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>": Recover phase write: Wr... (a few more Recover phase logs)
> >>>>
> >>>> It worked fine for a day and then started throwing this message:
> >>>>
> >>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for AIOs to complete.
> >>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Broker] error Connection exception: framing-error: Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for AIOs to complete.
> >>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for AIOs to complete.
> >>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument: Value for replyText is too large(320)
> >>>>
> >>>> Thanks,
> >>>> Ram
> >>>>
> >>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu wrote:
> >>>>
> >>>>> No, local disk.
> >>>>>
> >>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim wrote:
> >>>>>
> >>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>> Gordon,
> >>>>>>>
> >>>>>>> We are using Java client version 0.28 and qpid-cpp version 1.35
> >>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what
> >>>>>>> scenario it happens, but after I restart the broker, if we wait a
> >>>>>>> few days it happens again. From the above logs, do you have any
> >>>>>>> pointers on what to check?
> >>>>>>
> >>>>>> Are you using NFS?
