On 07/04/2012 05:34 PM, Amos Jeffries wrote:
>>> 3124 - Cache manager stops responding when multiple workers used
>>> ** requires implementing non-blocking IPC packets between workers and
>>> coordinator.
>>
>> Has this been discussed somewhere? IPC communication is already
>> non-blocking so I suspect some other issue is at play here. The specific
>> examples of mgr commands in the bug report (userhash, sourcehash,
>> client_list, and netdb) seem like non-essential in most environments
>> and, hence, not justifying the "major" designation, but perhaps they
>> indicate some major implementation problem that must be fixed.
>
> UNIX sockets apparently guarantee the write() is blocked until recipient
> process has read() the packet.

That is not true in general. I just wrote a basic UDS client and server to
test this (attached), and I can send packets much faster than the server
reads them. Linux keeps a queue of messages. The I/O may become blocking if
the queue is full, but I suspect select(2) or an equivalent will not let us
send a new message under that condition (or the send will fail rather than
block). A sysctl option (net.unix.max_dgram_qlen in recent kernels)
controls the number of messages that can be queued between the client and
the server.

It is possible that UNIX domain sockets behave differently in some
environments that I have not tested, but I doubt it.

Why do you think that UNIX sockets block write() until the recipient has
read() the packet?

> Last I looked the coordinator handling function also called component
> handler functions synchronously for them to create the response IPC
> packet.

Ipc::Coordinator::handleCacheMgrRequest() starts an async job to satisfy
the received cache manager request. Some Ipc::Coordinator::handle*()
methods do create the final response synchronously, but they should all be
very fast and not worth creating an async job for. Are you talking about
some other coordinator handling functions that block for a long time?

> AFAIK this is waiting on the Subscription and generic (immediate-ACK)
> IPC packets, which will free up the coordinator and workers for other
> async operations even if a large process is underway.

IIRC, subscription was needed to resolve IPC linking problems. It may be
needed for this bug as well, but since I cannot tell what this bug is, I
do not know whether subscription is the solution. I assumed you knew,
given your "requires implementing non-blocking IPC packets" solution
summary. That is why I started asking questions...

Alex.

P.S.
Output of the attached UDS server, which sleeps between reads so that it
is slower than the client. All sent messages are received, some of them
after the client is gone:

> $ ./uds-server.pl /tmp/uds
> 1341466849 waiting for messages
> 1341466853 got msg #01 after 4.00 seconds ... sleeping for 3.00 seconds
> 1341466856 got msg #02 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466859 got msg #03 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466862 got msg #04 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466865 got msg #05 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466868 got msg #06 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466871 got msg #07 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466874 got msg #08 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466877 got msg #09 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466880 got msg #10 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466883 got msg #11 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466886 got msg #12 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466889 got msg #13 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466892 got msg #14 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466895 got msg #15 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466898 got msg #16 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466901 got msg #17 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466904 got msg #18 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466907 got msg #19 after 3.00 seconds ... sleeping for 3.00 seconds

Output of the attached UDS client, which sends as fast as it can.
Note the blocking once the queue is full, right after msg #12 (the client
uses blocking I/O here):

> $ ./uds-client.pl /tmp/uds
> 1341466853 sending with max queue length of 10 messages
> 1341466853 sent msg #01 after 0.00 seconds
> 1341466853 sent msg #02 after 0.00 seconds
> 1341466853 sent msg #03 after 0.00 seconds
> 1341466853 sent msg #04 after 0.00 seconds
> 1341466853 sent msg #05 after 0.00 seconds
> 1341466853 sent msg #06 after 0.00 seconds
> 1341466853 sent msg #07 after 0.00 seconds
> 1341466853 sent msg #08 after 0.00 seconds
> 1341466853 sent msg #09 after 0.00 seconds
> 1341466853 sent msg #10 after 0.00 seconds
> 1341466853 sent msg #11 after 0.00 seconds
> 1341466853 sent msg #12 after 0.00 seconds
> 1341466856 sent msg #13 after 3.00 seconds
> 1341466859 sent msg #14 after 3.00 seconds
> 1341466862 sent msg #15 after 3.00 seconds
> 1341466865 sent msg #16 after 3.00 seconds
> 1341466868 sent msg #17 after 3.00 seconds
> 1341466871 sent msg #18 after 3.00 seconds
> 1341466874 sent msg #19 after 3.00 seconds
> ^C
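For readers who prefer not to run Perl, here is a minimal single-process Python sketch of the same experiment. It is not a port of the attached scripts, and the helper names (probe_unix_dgram_queue, writable, max_dgram_qlen) are made up for illustration. It assumes Linux and datagram (SOCK_DGRAM) UNIX sockets. Unlike the attached client, it uses non-blocking I/O, so it can check the two guesses above: that a send into a full queue fails instead of blocking, and that select(2) stops reporting the socket as writable. Nothing is read until the sender gives up, so every successful send() shows that write() does not wait for the recipient's read().

```python
import errno
import os
import select
import socket
import tempfile


def max_dgram_qlen():
    """Linux-only (assumption): the sysctl capping queued datagrams."""
    try:
        with open("/proc/sys/net/unix/max_dgram_qlen") as f:
            return int(f.read())
    except (OSError, ValueError):
        return None  # not Linux, or /proc is unavailable


def writable(sock):
    """Ask select(2), with a zero timeout, whether sock is writable now."""
    _, w, _ = select.select([], [sock], [], 0)
    return bool(w)


def probe_unix_dgram_queue(max_msgs=100_000):
    """Queue datagrams that nobody reads until the kernel refuses more."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "uds")  # keep it short: sun_path is small
    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        server.bind(path)
        client.connect(path)
        client.setblocking(False)  # send() must never wait for the reader
        server.setblocking(False)

        # The server reads nothing yet; each send() returns as soon as the
        # kernel has queued the datagram.
        sent = 0
        while sent < max_msgs:
            try:
                client.send(b"msg #%02d" % (sent + 1))
            except OSError as e:
                # Full queue: a non-blocking send fails instead of blocking.
                if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS):
                    break
                raise
            sent += 1

        writable_when_full = writable(client)

        # Reading one datagram frees a slot in the server's queue.
        server.recv(64)
        received = 1
        writable_after_read = writable(client)

        # Drain the rest: every queued datagram is still there.
        while True:
            try:
                server.recv(64)
            except BlockingIOError:
                break
            received += 1

        return {
            "sent": sent,
            "writable_when_full": writable_when_full,
            "writable_after_read": writable_after_read,
            "received": received,
        }
    finally:
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print("net.unix.max_dgram_qlen:", max_dgram_qlen())
    print(probe_unix_dgram_queue())
```

On a stock Linux box, "sent" should land near net.unix.max_dgram_qlen (the kernel appears to accept one message beyond the limit, which would be consistent with the 12 sends in the client output with one message already read), "writable_when_full" should be False, and "received" should equal "sent". Other platforms may report a full queue with a different errno, hence the ENOBUFS check.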
uds-server.pl
Description: Perl program
uds-client.pl
Description: Perl program
