RE: Problem with cygserver and sysv message queues: msgsnd() blocks forever.
Yes, I can patch and build the sources, and will test the patch. I can
see that this will work, and is probably the least disruptive way to fix
it.

I'm bothered a little bit by the fixed timeout value, although this is
an exceptional case, which shouldn't occur in a properly tuned and
managed system.

My thoughts for a fix were centered around replacing the msqptr ident
parameter with a resource-specific identifier that would allow freeing a
resource by one queue to wake another. However, such a fix would require
much regression testing, and STILL might need a timeout like this as an
ultimate safety net. Besides, we likely want to continue tracking the
BSD source.

I'm currently building and testing using the cygwin-1.5.25-12 release
tarball. Would it be more helpful for me to pull the CVS head down to
test this?

Thanks for the quick reply. I'm glad to be of some help.

Dave Williams

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Corinna Vinschen
Sent: Wednesday, April 30, 2008 3:59 AM
To: cygwin@cygwin.com
Subject: Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

On Apr 29 17:57, Williams, David wrote:
> I've been debugging a problem with msgsnd() hanging. If there are no
> free msghdrs available, msgsnd() blocks with msleep(). Unfortunately,
> the only way it can unblock is if that specific queue frees a msghdr.
> If the queue in question is empty, this never occurs.
> [...]
> It's possible to work around this by using the flag IPC_NOWAIT in
> msgsnd, and polling until the message is sent, but my feeling is that
> the library call should not hang like this.
> [...]
> The call to msleep() above passes msqptr (the queue handle) as the
> ident pointer. Each of the calls to wakeup() in sysv_msg.cc also passes
> msqptr as the ident. This means that if the msghdr resource is freed by
> a queue other than the one blocked, it won't wake up msgsnd(). Since
> doqueue's queue is empty, there is no way to wake up msgsnd().
> [...]
> I haven't been able to spot a way to fix this behavior without
> significantly changing the block/release mechanism. Has anyone seen
> this before? Have I missed something? Is this simply a known
> limitation, with IPC_NOWAIT the only way to deal with it?

Right now, yes. As you have probably seen when examining the sources,
the code is pretty much the FreeBSD version, just with a thin and almost
tasteless Cygwin topping.

The code is basically the version 1.52 of the original FreeBSD code with
a few patches applied up to version 1.60. FreeBSD is at 1.71. I
inspected the FreeBSD ChangeLogs and found this change in version 1.65:

    Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by
    timing out msgsnd and rechecking resources. This problem was found
    while I was running Linux Test Project test suite (test cases:
    msgctl08, msgctl09).
    [...]

This appears to be their solution to the above problem. The basic change
is the call to msleep. The last parameter is changed from 0 (no timeout)
to a value called hz. See

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sysv_msg.c.diff?r1=1.64;r2=1.65

hz is an external variable in the code which is the system's clock
frequency.

Are you set up to build the Cygwin sources? Would you mind rebuilding
cygserver with this patch applied and testing without IPC_NOWAIT again?

Index: sysv_msg.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygserver/sysv_msg.cc,v
retrieving revision 1.3
diff -u -p -r1.3 sysv_msg.cc
--- sysv_msg.cc 9 Jan 2006 15:10:14 -0000 1.3
+++ sysv_msg.cc 30 Apr 2008 10:57:58 -0000
@@ -722,10 +722,14 @@ msgsnd(struct thread *td, struct msgsnd_
             }
             DPRINTF(("goodnight\n"));
             error = msleep(msqptr, &msq_mtx, (PZERO - 4) | PCATCH,
-                "msgwait", 0);
+                "msgsnd", 50);
             DPRINTF(("good morning, error=%d\n", error));
             if (we_own_it)
                 msqptr->msg_perm.mode &= ~MSG_LOCKED;
+            if (error == EWOULDBLOCK) {
+                DPRINTF(("timed out\n"));
+                continue;
+            }
             if (error != 0) {
                 DPRINTF(("msgsnd: interrupted system call\n"));
 #ifdef __CYGWIN__
@@ -1079,11 +1083,11 @@ msgrcv(struct thread *td, struct msgrcv_
             DPRINTF(("msgrcv: goodnight\n"));
             error = msleep(msqptr, &msq_mtx, (PZERO - 4) | PCATCH,
-                "msgwait", 0);
+                "msgrcv", 0);
             DPRINTF(("msgrcv: good morning (error=%d)\n", error));
 
             if (error != 0) {
-                DPRINTF(("msgsnd: interrupted system call\n"));
+                DPRINTF(("msgrcv: interrupted system call\n"));
 #ifdef __CYGWIN__
                 if (error != EIDRM
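The control flow that the patch adds to msgsnd() can be sketched in isolation. What follows is only an illustration under stated assumptions, not cygserver code: take_msghdr(), sleep_fn, fake_sleep() and struct fake_state are hypothetical names invented for this sketch, and the sleep function is a stand-in that follows the msleep() convention used in the patch (0 when woken, EWOULDBLOCK on timeout, another errno value when interrupted).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for msleep(): returns 0 when woken by wakeup(),
   EWOULDBLOCK when the timeout expires, or another errno value (for
   example EINTR) when the sleep is interrupted. */
typedef int (*sleep_fn)(void *ident, int timo);

/*
 * Wait until *free_hdrs is non-zero, then take one header.
 * With a timeout, the loop rechecks the shared pool after every timed-out
 * sleep, so a header freed by another queue is noticed even though nobody
 * called wakeup() on our ident.
 * Returns 0 on success, or the error reported by the sleep function.
 */
static int
take_msghdr(int *free_hdrs, void *ident, sleep_fn do_sleep, int timo)
{
    assert(free_hdrs != NULL && do_sleep != NULL);

    while (*free_hdrs == 0) {
        int error = do_sleep(ident, timo);

        if (error == EWOULDBLOCK)
            continue;           /* timed out: recheck the resources */
        if (error != 0)
            return error;       /* interrupted: report it to the caller */
        /* woken normally: the loop condition rechecks the pool */
    }
    (*free_hdrs)--;
    return 0;
}

/* Test double (hypothetical): pretends that some other queue frees a
   header during the third sleep without ever waking our ident. */
struct fake_state {
    int *free_hdrs;
    int calls;
};

static int
fake_sleep(void *ident, int timo)
{
    struct fake_state *st = ident;

    (void)timo;                 /* the fake always "times out" */
    if (++st->calls == 3)
        (*st->free_hdrs)++;
    return EWOULDBLOCK;
}
```

Calling take_msghdr() with fake_sleep and an empty pool returns 0 after the third timed-out sleep. A sleep function that never times out, which corresponds to the old timeout argument of 0, would leave the same caller asleep indefinitely in this scenario.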
RE: Problem with cygserver and sysv message queues: msgsnd() blocks forever.
Corinna,

I can report that the patch works perfectly, both with the examples and
with our original application program that brought the bug to our
attention.

Dave Williams

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Corinna Vinschen
Sent: Wednesday, April 30, 2008 11:58 AM
To: cygwin@cygwin.com
Subject: Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

On Apr 30 10:16, Williams, David wrote:
> Yes, I can patch and build the sources, and will test the patch. I can
> see that this will work, and is probably the least disruptive way to
> fix it.
>
> I'm bothered a little bit by the fixed timeout value, although this is
> an exceptional case, which shouldn't occur in a properly tuned and
> managed system.

I'm not that concerned. A fixed value of 50 will interrupt a maximum of
20 times per second. The hz value in BSD is usually higher. I think 50
is a good compromise.

> My thoughts for a fix were centered around replacing the msqptr ident
> parameter with a resource-specific identifier that would allow freeing
> a resource by one queue to wake another. However, such a fix would
> require much regression testing, and STILL might need a timeout like
> this as an ultimate safety net. Besides, we likely want to continue
> tracking the BSD source.

There's surely some better way to solve this problem but if there's an
upstream fix, I'd like to use it. My goal is to keep the code as much
upstream centered as possible.

> I'm currently building and testing using the cygwin-1.5.25-12 release
> tarball. Would it be more helpful for me to pull the CVS head down to
> test this?

Shouldn't matter, actually. There's no difference in the message queue
code between 1.5.25 and CVS HEAD. However, the bugfix will only go into
CVS HEAD. If you need this bugfix desperately, please maintain your
local version for now.

> Thanks for the quick reply. I'm glad to be of some help.

You're welcome. Thanks for the debugging effort and the testcase.
You almost did all the work yourself already, I just had to look what
upstream is doing about it :)

I'll check this in in a couple of minutes.

Thanks again,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
Problem with cygserver and sysv message queues: msgsnd() blocks forever.
I've been debugging a problem with msgsnd() hanging. If there are no
free msghdrs available, msgsnd() blocks with msleep(). Unfortunately,
the only way it can unblock is if that specific queue frees a msghdr. If
the queue in question is empty, this never occurs.

I was able to isolate this behavior in a set of examples, which I've
attached. The program doqueue.c establishes a queue and, upon each
ENTER, calls msgsnd() followed by msgrcv(). Start this one first and
verify that messages cycle normally.

In another window, run overflow.c. This establishes a different queue,
and calls msgsnd() until the queue is full, then exits. In the default
configuration, that will happen at 40 messages.

Go back to the first window and press ENTER. The call to msgsnd() will
block. Then, either run drainq to remove messages from overflow's queue,
or use ipcrm -q 4660 to delete the queue entirely. Doqueue will remain
blocked.

It's possible to work around this by using the flag IPC_NOWAIT in
msgsnd, and polling until the message is sent, but my feeling is that
the library call should not hang like this.

Here is the code in question:

From sysv_msg.cc, function msgsnd():
=====================================================================
    if (free_msghdrs == NULL) {
        DPRINTF(("no more msghdrs\n"));
        need_more_resources = 1;
    }

    if (need_more_resources) {
        int we_own_it;

        if ((msgflg & IPC_NOWAIT) != 0) {
            DPRINTF(("need more resources but caller doesn't want to wait\n"));
            error = EAGAIN;
            goto done2;
        }

        if ((msqptr->msg_perm.mode & MSG_LOCKED) != 0) {
            DPRINTF(("we don't own the msqid_ds\n"));
            we_own_it = 0;
        } else {
            /* Force later arrivals to wait for our request */
            DPRINTF(("we own the msqid_ds\n"));
            msqptr->msg_perm.mode |= MSG_LOCKED;
            we_own_it = 1;
        }
        DPRINTF(("goodnight\n"));
        error = msleep(msqptr, &msq_mtx, (PZERO - 4) | PCATCH,
            "msgwait", 0);
        DPRINTF(("good morning, error=%d\n", error));
=====================================================================

The call to msleep() above passes msqptr (the queue handle) as the ident
pointer. Each of the calls to wakeup() in sysv_msg.cc also passes msqptr
as the ident. This means that if the msghdr resource is freed by a queue
other than the one blocked, it won't wake up msgsnd(). Since doqueue's
queue is empty, there is no way to wake up msgsnd().

Here is a snippet from /var/log/messages:

Apr 29 13:38:16 motonao cygserver: call to msgsnd(131072, 0x22CCD8, 1, 0)
Apr 29 13:38:16 motonao cygserver: msgsz=1, msgssz=8, segs_needed=1
Apr 29 13:38:16 motonao cygserver: no more msghdrs
Apr 29 13:38:16 motonao cygserver: we own the msqid_ds
Apr 29 13:38:16 motonao cygserver: goodnight
Apr 29 13:38:24 motonao cygserver: msgget(0x1234, 00)
Apr 29 13:38:24 motonao cygserver: found public key
Apr 29 13:38:24 motonao cygserver: call to msgrcv(196609, 0x22CCD8, 1, 0, 0)
Apr 29 13:38:24 motonao cygserver: found a message, msgsz=1, msg_ts=1

The first line is doqueue's last call to msgsnd(). It finds there are no
free msghdrs and logs the message "no more msghdrs", then logs
"goodnight" and calls msleep().

The call to msgrcv is drainq removing a message from the overflow queue.
This is the point where we would like to see the "good morning" message
from msgsnd(), but we don't.

I haven't been able to spot a way to fix this behavior without
significantly changing the block/release mechanism. Has anyone seen this
before? Have I missed something? Is this simply a known limitation, with
IPC_NOWAIT the only way to deal with it?

~
David Williams
Solekai Systems


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>

#define MSGQ_KEY 0x1234
#define PERM 0660

struct mymsg {
    long mtype;
    char mtext[1];
};

int main()
{
    int msg_id;         /* message Queue id. */
    struct mymsg msg;   /* message to send. */
    int nmsg;           /* number of msg send. */
    int status;         /* status returned by msgsnd(). */

    /*
     * Create Message Queue
     */
    if ( (msg_id = msgget (MSGQ_KEY, PERM | IPC_CREAT)) == -1 ) {
        perror ("msgget: ");
        exit (EXIT_FAILURE);
    }

    /*
     * now read messages one by one until the user enters 'q'
     */
    msg.mtype = 1;
    msg.mtext[0] = 'A';
    nmsg = 0;

    do {
        printf ("Receiving #%d\n", nmsg);
        status = msgrcv (msg_id, &msg, sizeof(msg.mtext), 0, 0);
        printf ("Received msg #%d\n", nmsg);
        nmsg++;
    } while ( tolower(getchar()) != 'q' );

    exit (EXIT_SUCCESS);
}


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>

#define MSGQ_KEY 0x1234
#define PERM 0660

struct mymsg {
    long mtype;
    char mtext[1];
};

int main()
{
    int msg_id;         /* message Queue id. */
    struct mymsg msg;   /* message to send. */
    int nmsg;           /* number of msg send. */
    int status;         /*
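The IPC_NOWAIT workaround mentioned in the report can be sketched as a small polling helper. This is a minimal sketch under stated assumptions, not code from the attached programs: msgsnd_poll() and its max_tries and delay_ms parameters are hypothetical names, and the retry count and delay are arbitrary choices a caller would have to tune.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX/XSI declarations used below */

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/*
 * Send a message without ever blocking inside msgsnd().
 * Each attempt uses IPC_NOWAIT; while msgsnd() fails with EAGAIN the helper
 * sleeps delay_ms milliseconds and tries again, up to max_tries attempts.
 * Returns 0 on success. Returns -1 with errno set to EAGAIN when every
 * attempt found the queue or the system short of resources, or with
 * msgsnd()'s own errno for any other failure.
 */
static int
msgsnd_poll(int msqid, const void *msgp, size_t msgsz,
            int max_tries, long delay_ms)
{
    assert(max_tries > 0 && delay_ms >= 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        struct timespec ts;

        if (msgsnd(msqid, msgp, msgsz, IPC_NOWAIT) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;          /* a real error: report it unchanged */

        /* Out of resources right now: wait a little, then poll again. */
        ts.tv_sec = delay_ms / 1000;
        ts.tv_nsec = (delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
    }
    errno = EAGAIN;
    return -1;
}
```

A caller such as doqueue could replace its blocking call with something like msgsnd_poll(msg_id, &msg, sizeof(msg.mtext), 100, 50), which gives up after roughly five seconds instead of sleeping forever. The cost is that the caller now has to handle the EAGAIN case itself.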