There's a fair bit of detail there which wasn't present originally. So it sounds like you are indeed only using the single thread, by scheduling a regular (how often is regular?) periodic task on it; since that wasn't mentioned, it was unclear whether that was the case (usually that means it isn't). That periodic task then effectively schedules return tasks for the (individual message?) responses to the batch of queued requests, by passing tasks via the connection work queue.
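Just to make sure we are picturing the same thing, below is a rough sketch of the shape I understand from that description, using the C++ container API. To be clear this is only my guess at your setup: the 100ms interval, the names, and the way the reply sender is obtained (a single sender captured in on_sender_open) are all invented for illustration, so substitute whatever your code actually does.

    #include <proton/connection.hpp>
    #include <proton/container.hpp>
    #include <proton/delivery.hpp>
    #include <proton/duration.hpp>
    #include <proton/message.hpp>
    #include <proton/messaging_handler.hpp>
    #include <proton/sender.hpp>
    #include <proton/work_queue.hpp>
    #include <deque>
    #include <string>

    class server : public proton::messaging_handler {
        struct pending { proton::sender snd; proton::message req; };
        std::deque<pending> queue_;     // only ever touched on the container thread
        proton::sender reply_sender_;   // however replies are actually routed in your code
        proton::container* cont_ = nullptr;

        void on_container_start(proton::container& c) override {
            cont_ = &c;
            c.listen("0.0.0.0:5672");
            tick();                     // start the periodic task
        }

        void tick() {
            cont_->schedule(proton::duration(100), [this]() {  // "regular" interval, guessed
                drain();
                tick();                 // re-arm for the next period
            });
        }

        void on_sender_open(proton::sender& s) override {
            reply_sender_ = s;          // simplified to a single reply link
        }

        void on_message(proton::delivery&, proton::message& m) override {
            queue_.push_back(pending{reply_sender_, m});   // queue and return immediately
        }

        void drain() {
            while (!queue_.empty()) {
                pending p = queue_.front();
                queue_.pop_front();
                proton::message reply;
                reply.to(p.req.reply_to());
                reply.correlation_id(p.req.correlation_id());
                reply.body(std::string("response"));
                // The extra hop below is the bit that looks redundant when it is
                // the same lone thread doing everything anyway:
                p.snd.connection().work_queue().add([p, reply]() mutable {
                    p.snd.send(reply);
                });
            }
        }
    };

    int main() {
        server handler;
        proton::container(handler).run();   // single container thread
    }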
Passing the responses back through the work queue like that seems unnecessary if you are already running everything on the connection/container thread anyway. The connection work queue is typically used to pass work to a connection's [container] thread from a different thread, rather than for the thread to pass work to itself. It's not clear you really gain anything from the work queue here if it's the same lone thread doing all the work; mainly it is just doing the same work at a different time (i.e. later) than it might otherwise have done it, plus there is then additional scheduling overhead and work needed to service the extra tasks that would not otherwise exist.

Plus, if there is a batched backlog of incoming requests to then periodically respond to, that could increase the amount of time the thread needs to spend working through that single sequence and then processing the response tasks, perhaps making it less available to accept new things while it handles that now-grouped work, at which point the acceptor backlog might come more into play.

I would have a play with either changing/removing the use of the work queue for responses if you really are using a single thread, or changing the way the handling is done so that another thread actually handles things and only the send is passed back (maybe even a batch of sends in one task); there is a rough sketch of that shape below. Or perhaps use multiple container threads so connections aren't all on the same one, but again dropping the work queue usage and just processing the response inline.

All that said, if your 'processing X is quick' is true (how quick is quick?) then I still wouldn't really expect to see delays of anything like the length you are talking about, unless something else odd was going on, such as the prior talk of reconnection. Although if that was previously occurring and you have significantly raised the backlog, it should not really be in play any more, which should be very simple to identify from the timings of what is happening.

I'd suggest instrumenting your code to get a more precise handle on where the time is going: you should be able to tell exactly how long it takes for every connection to open from the client's perspective, for a message to arrive at the server, for it to be acknowledged, for the request to be processed, for the request to arrive at the consumer, etc. Related to that, also note you can run the client and/or server ends with protocol tracing enabled (PN_TRACE_FRM=1) to visualise what traffic is or isn't happening. If you have clients seeing one-minute delays, that might be fairly visible just watching it as it runs, e.g. run a bunch of clients without tracing as normal, and also manually run some with tracing on and observe. Perhaps you can narrow down an issue in your code's handling, or perhaps you can establish a case where you think there actually is an issue in Proton, providing a minimal reproducer that can show it.
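To illustrate the 'another thread does the handling, only the send is passed back' option, a rough sketch follows. The worker-side function and the way you capture the sender and work queue are invented here (they would be grabbed on the connection's thread, e.g. in a handler callback, before being handed to the worker), but the work queue usage itself is the same pattern the multithreaded examples use. Batching several replies into one task keeps the per-task overhead down:

    #include <proton/message.hpp>
    #include <proton/sender.hpp>
    #include <proton/work_queue.hpp>
    #include <vector>

    // Called from the worker thread once the (quick) processing has produced
    // the replies for a connection. 'wq' and 'snd' were captured earlier on
    // that connection's own thread.
    void send_batch(proton::work_queue& wq, proton::sender snd,
                    std::vector<proton::message> replies) {
        wq.add([snd, replies]() mutable {
            // This lambda runs on the connection's container thread, so it is
            // safe to touch the sender here.
            for (const auto& r : replies) {
                snd.send(r);
            }
        });
    }

If you go down that route you could also give the container more than one thread (container::run takes a thread count) so connections aren't all pinned to the same one.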
On Mon, 6 Jun 2022 at 20:07, Fredrik Hallenberg <[email protected]> wrote:
>
> Maybe my wording was not correct, responses to clients are handled fine when the connection is achieved. The issue is only about the time before the connection is made and the initial client message shows up in the server handler. When this happens I will push the message to a queue and return immediately. The queue is handled by a fiber running at regular intervals, this is done by using the qpid scheduler. Each message will get its own fiber which will use the qpid work queue to send a reply when processing is done. This processing should happen quickly.
>
> I am pretty sure this system is safe, I have done a lot of testing on it. If you think it will cause delays in qpid I will try to improve it using threads etc. I have tried running the message queue consumer on a separate thread but as I mentioned I did not see any obvious improvements so I opted to go for a single thread solution.
>
> On Mon, Jun 6, 2022 at 11:29 AM Robbie Gemmell <[email protected]> wrote:
> >
> > Personally from the original mail I think it's as likely the issue lies in just how the messages are being handled and responses generated. If adding threads is not helping any, it would only reinforce that view for me.
> >
> > Note is made that a single thread is being used, and that messages are only queued by the thread and "handled elsewhere" "quickly", but "responses" take a long time. What type of queuing is being done? Can the queue block (which would stop the container doing _anything_)? How are the messages actually then being handled and responses generated exactly? Unless that process is using the appropriate mechanisms for passing work back to the connection (/container if single) thread, it would be both unsafe and may very well result in delays, because no IO would actually happen until something else entirely caused that connection to process again later (e.g. heartbeat checking).
> >
> > On Fri, 3 Jun 2022 at 18:04, Cliff Jansen <[email protected]> wrote:
> > >
> > > Adding threads should allow connection setup (socket creation, accept, and initial malloc of data structures) to run in parallel with connection processing (socket read/write, TLS overhead, AMQP encode/decode, your application on_message callback).
> > >
> > > The epoll proactor scales better with additional threads than the libuv implementation. If you are seeing no benefit with extra threads, trying the libuv proactor is a worthwhile idea.
> > >
> > > On Fri, Jun 3, 2022 at 2:38 AM Fredrik Hallenberg <[email protected]> wrote:
> > > >
> > > > Yes, the fd limit is already raised a lot. Increasing the backlog has improved performance and more file descriptors are in use, but still I feel connection times are too long. Is there anything else to tune in the proactor? Should I try with the libuv proactor instead of epoll? I have tried with multiple threads in the past but did not notice any difference, but perhaps it is worth trying again with the current backlog setting?
> > > >
> > > > On Thu, Jun 2, 2022 at 5:11 PM Cliff Jansen <[email protected]> wrote:
> > > > >
> > > > > Please try raising your fd limit too. Perhaps doubling it or more.
> > > > >
> > > > > I would also try running your proton::container with more threads, say 4 and then 16, and see if that makes a difference. It shouldn't if your processing within Proton is as minimal as you describe. However, if there is lengthy lock contention as you pass work out and then back into Proton, that may introduce delays.
> > > > >
> > > > > On Thu, Jun 2, 2022 at 7:43 AM Fredrik Hallenberg <[email protected]> wrote:
> > > > > >
> > > > > > I have done some experiments raising the backlog value, and it is possibly a bit better, I have to test it more. Even if it works I would of course like to avoid having to rely on a patched qpid.
> > > > > > Also, maybe some internal queues or similar should be modified to handle this?
> > > > > >
> > > > > > I have not seen transport errors in the clients, but this may be because reconnection is enabled. I am unsure what the reconnection feature actually does; I have never seen an on_connection_open where connection.reconnection() returns true. Perhaps it is only useful when a connection is established and then lost?
> > > > > >
> > > > > > On Thu, Jun 2, 2022 at 1:44 PM Ted Ross <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Jun 2, 2022 at 9:06 AM Fredrik Hallenberg <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi, my application tends to get a lot of short lived incoming connections. Messages are very short sync messages that usually can be responded to with very little processing on the server side. It works fine but I feel that the performance is a bit lacking when many connections happen at the same time and would like advice on how to improve it. I am using qpid proton c++ 0.37 with the epoll proactor.
> > > > > > > > My current design uses a single thread for the listener but it will immediately push incoming messages in on_message to a queue that is handled elsewhere. I can see that clients have to wait for a long time (up to a minute) until they get a response, but I don't believe there is an issue on my end as I will quickly deal with any client messages as soon as they show up. Rather the issue seems to be that messages are not pushed into the queue quickly enough.
> > > > > > > > I have noticed that pn_proactor_listen is hardcoded to use a backlog of 16 in the default container implementation, this seems low, but I am not sure if it is correct to change it.
> > > > > > > > Any advice appreciated. My goal is that a client should never need to wait more than a few seconds for a response even under reasonably high load, maybe a few hundred connections per second.
> > > > > > >
> > > > > > > I would try increasing the backlog. 16 seems low to me as well. Do you know if any of your clients are re-trying the connection setup because they overran the server's backlog?
> > > > > > >
> > > > > > > -Ted
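PS: for reference on the backlog point in the quoted thread above, at the proactor level the backlog is just the final argument to pn_proactor_listen, as in the rough fragment below (the 1024 is an arbitrary example value, not a recommendation). The default container implementation is what supplies the hardcoded 16, hence the talk of patching to change it.

    // C proactor API, callable from C++. Only illustrating where the backlog
    // parameter lives; error handling and the rest of the proactor loop omitted.
    #include <proton/listener.h>
    #include <proton/proactor.h>

    void listen_with_larger_backlog(pn_proactor_t* proactor) {
        pn_listener_t* listener = pn_listener();   // owned by the proactor once listening
        pn_proactor_listen(proactor, listener, "0.0.0.0:5672", 1024 /* backlog */);
    }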
