Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

2008-05-01 Thread Corinna Vinschen
On Apr 30 16:10, Williams, David wrote:
 Corinna,
 
 I can report that the patch works perfectly. Both with the examples
 and with our original application program that brought the bug to
 our attention.

Thanks for the feedback!


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

2008-04-30 Thread Corinna Vinschen
On Apr 29 17:57, Williams, David wrote:
 I've been debugging a problem with msgsnd() hanging. If
 there are no free msghdrs available, msgsnd() blocks with
 msleep(). Unfortunately, the only way it can unblock is
 if that specific queue frees a msghdr. If the queue in
 question is empty, this never occurs.
 [...]
 It's possible to work around this by using the flag IPC_NOWAIT
 in msgsnd, and polling until the message is sent, but my feeling
 is that the library call should not hang like this.
 [...]
 The call to msleep() above passes msqptr (the queue handle)
 as the Ident pointer. Each of the calls to wakeup() in
 sysv_msg.cc also passes msgptr as the ident. This means that
 if the msghdr resource is free'd by a queue other than the one
 blocked, it won't wake up msgsnd(). Since doqueue's queue is
 empty, there is no way to wake up msgsnd().
 [...]
 I haven't been able to spot a way to fix this behavior without
 significantly changing the block/release mechanism. Has anyone
 seen this before? Have I missed something? Is this simply a known
 limitation, with IPC_NOWAIT the only way to deal with it?

Right now, yes.  As you have probably seen when examining the sources,
the code is pretty much the FreeBSD version, just with a thin and almost
tasteless Cygwin topping.  The code is basically the version 1.52 of 
the original FreeBSD code with a few patches applied up to version 1.60.
FreeBSD is at 1.71.  I inspected the FreeBSD ChangeLogs and found this
change in version 1.65:

  Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by
  timing out msgsnd and rechecking resources.  This problem was found
  while I was running Linux Test Project test suite (test cases:
  msgctl08, msgctl09).  [...]

This appears to be their solution to the above problem.  The basic
change is the call to msleep.  The last parameter is changed from 0 (no
timeout) to a value called hz.  See
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sysv_msg.c.diff?r1=1.64;r2=1.65
hz is an external variable in the code which is the system's clock
frequency.

Are you set up to build the Cygwin sources?  Would you mind to rebuild
cygserver with this patch applied and test without IPC_NOWAIT again?

Index: sysv_msg.cc
===
RCS file: /cvs/src/src/winsup/cygserver/sysv_msg.cc,v
retrieving revision 1.3
diff -u -p -r1.3 sysv_msg.cc
--- sysv_msg.cc 9 Jan 2006 15:10:14 -   1.3
+++ sysv_msg.cc 30 Apr 2008 10:57:58 -
@@ -722,10 +722,14 @@ msgsnd(struct thread *td, struct msgsnd_
}
DPRINTF((goodnight\n));
error = msleep(msqptr, msq_mtx, (PZERO - 4) | PCATCH,
-   msgwait, 0);
+   msgsnd, 50);
DPRINTF((good morning, error=%d\n, error));
if (we_own_it)
msqptr-msg_perm.mode = ~MSG_LOCKED;
+   if (error == EWOULDBLOCK) {
+   DPRINTF((timed out\n));
+   continue;
+   }
if (error != 0) {
DPRINTF((msgsnd:  interrupted system call\n));
 #ifdef __CYGWIN__
@@ -1079,11 +1083,11 @@ msgrcv(struct thread *td, struct msgrcv_
 
DPRINTF((msgrcv:  goodnight\n));
error = msleep(msqptr, msq_mtx, (PZERO - 4) | PCATCH,
-   msgwait, 0);
+   msgrcv, 0);
DPRINTF((msgrcv:  good morning (error=%d)\n, error));
 
if (error != 0) {
-   DPRINTF((msgsnd:  interrupted system call\n));
+   DPRINTF((msgrcv:  interrupted system call\n));
 #ifdef __CYGWIN__
if (error != EIDRM)
 #endif /* __CYGWIN__ */


Thanks for the report,
Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



RE: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

2008-04-30 Thread Williams, David
Yes, I can patch and build the sources, and will test the patch. I
can see that this will work, and is probably the least disruptive
way to fix it. I'm bothered a little bit by the fixed timeout value,
although this is an exceptional case, which shouldn't occur in a
properly tuned and managed system.

My thoughts for a fix were centered around replacing the msqptr
ident parameter with a resource specific identifier that would
allow freeing a resource by one queue to wake another. However,
such a fix would require much regression testing, and STILL might
need a timeout like this as an ultimate safety net. Besides, we
likely want to continue tracking the BSD source.

I'm currently building and testing using the cygwin-1.5.25-12 release
tarball. Would it be more helpful for me to pull the CVS head down
to test this?

Thanks for the quick reply. I'm glad to be of some help.

Dave Williams

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Corinna Vinschen
Sent: Wednesday, April 30, 2008 3:59 AM
To: cygwin@cygwin.com
Subject: Re: Problem with cygserver and sysv message queues: msgsnd() blocks 
forever.

On Apr 29 17:57, Williams, David wrote:
 I've been debugging a problem with msgsnd() hanging. If
 there are no free msghdrs available, msgsnd() blocks with
 msleep(). Unfortunately, the only way it can unblock is
 if that specific queue frees a msghdr. If the queue in
 question is empty, this never occurs.
 [...]
 It's possible to work around this by using the flag IPC_NOWAIT
 in msgsnd, and polling until the message is sent, but my feeling
 is that the library call should not hang like this.
 [...]
 The call to msleep() above passes msqptr (the queue handle)
 as the Ident pointer. Each of the calls to wakeup() in
 sysv_msg.cc also passes msgptr as the ident. This means that
 if the msghdr resource is free'd by a queue other than the one
 blocked, it won't wake up msgsnd(). Since doqueue's queue is
 empty, there is no way to wake up msgsnd().
 [...]
 I haven't been able to spot a way to fix this behavior without
 significantly changing the block/release mechanism. Has anyone
 seen this before? Have I missed something? Is this simply a known
 limitation, with IPC_NOWAIT the only way to deal with it?

Right now, yes.  As you have probably seen when examining the sources,
the code is pretty much the FreeBSD version, just with a thin and almost
tasteless Cygwin topping.  The code is basically the version 1.52 of
the original FreeBSD code with a few patches applied up to version 1.60.
FreeBSD is at 1.71.  I inspected the FreeBSD ChangeLogs and found this
change in version 1.65:

  Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by
  timing out msgsnd and rechecking resources.  This problem was found
  while I was running Linux Test Project test suite (test cases:
  msgctl08, msgctl09).  [...]

This appears to be their solution to the above problem.  The basic
change is the call to msleep.  The last parameter is changed from 0 (no
timeout) to a value called hz.  See
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sysv_msg.c.diff?r1=1.64;r2=1.65
hz is an external variable in the code which is the system's clock
frequency.

Are you set up to build the Cygwin sources?  Would you mind to rebuild
cygserver with this patch applied and test without IPC_NOWAIT again?

Index: sysv_msg.cc
===
RCS file: /cvs/src/src/winsup/cygserver/sysv_msg.cc,v
retrieving revision 1.3
diff -u -p -r1.3 sysv_msg.cc
--- sysv_msg.cc 9 Jan 2006 15:10:14 -   1.3
+++ sysv_msg.cc 30 Apr 2008 10:57:58 -
@@ -722,10 +722,14 @@ msgsnd(struct thread *td, struct msgsnd_
}
DPRINTF((goodnight\n));
error = msleep(msqptr, msq_mtx, (PZERO - 4) | PCATCH,
-   msgwait, 0);
+   msgsnd, 50);
DPRINTF((good morning, error=%d\n, error));
if (we_own_it)
msqptr-msg_perm.mode = ~MSG_LOCKED;
+   if (error == EWOULDBLOCK) {
+   DPRINTF((timed out\n));
+   continue;
+   }
if (error != 0) {
DPRINTF((msgsnd:  interrupted system call\n));
 #ifdef __CYGWIN__
@@ -1079,11 +1083,11 @@ msgrcv(struct thread *td, struct msgrcv_

DPRINTF((msgrcv:  goodnight\n));
error = msleep(msqptr, msq_mtx, (PZERO - 4) | PCATCH,
-   msgwait, 0);
+   msgrcv, 0);
DPRINTF((msgrcv:  good morning (error=%d)\n, error));

if (error != 0) {
-   DPRINTF((msgsnd:  interrupted system call\n));
+   DPRINTF((msgrcv:  interrupted system call\n));
 #ifdef __CYGWIN__
if (error != EIDRM

Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

2008-04-30 Thread Corinna Vinschen
On Apr 30 10:16, Williams, David wrote:
 Yes, I can patch and build the sources, and will test the patch. I
 can see that this will work, and is probably the least disruptive
 way to fix it. I'm bothered a little bit by the fixed timeout value,
 although this is an exceptional case, which shouldn't occur in a
 properly tuned and managed system.

I'm not that concerned.  A fixed value of 50 will interrupt a maximum of
20 times per second.  The hz value in BSD is usually higher.  I think 50
is a good compromise.

 My thoughts for a fix were centered around replacing the msqptr
 ident parameter with a resource specific identifier that would
 allow freeing a resource by one queue to wake another. However,
 such a fix would require much regression testing, and STILL might
 need a timeout like this as an ultimate safety net. Besides, we
 likely want to continue tracking the BSD source.

There's surely some better way to solve this problem but if there's
an upstream fix, I'd like to use it.  My goal is to keep the code as
much upstream centered as possible.

 I'm currently building and testing using the cygwin-1.5.25-12 release
 tarball. Would it be more helpful for me to pull the CVS head down
 to test this?

Shouldn't matter, actually.  There's no difference in the message queue
code between 1.5.25 and CVS HEAD.  However, the bugfix will only go into
CVS HEAD.  If you need this bugfix desperately, please maintain your
local version for now.

 Thanks for the quick reply. I'm glad to be of some help.

You're welcome.  Thanks for the debugging effort and the testcase.
You almost did all the work yourself already, I just had to look
what upstream is doing about it :)

I'll check this in in a couple of minutes.


Thanks again,
Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



RE: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

2008-04-30 Thread Williams, David
Corinna,

I can report that the patch works perfectly. Both with the examples
and with our original application program that brought the bug to
our attention.

Dave Williams

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Corinna Vinschen
Sent: Wednesday, April 30, 2008 11:58 AM
To: cygwin@cygwin.com
Subject: Re: Problem with cygserver and sysv message queues: msgsnd() blocks 
forever.

On Apr 30 10:16, Williams, David wrote:
 Yes, I can patch and build the sources, and will test the patch. I
 can see that this will work, and is probably the least disruptive
 way to fix it. I'm bothered a little bit by the fixed timeout value,
 although this is an exceptional case, which shouldn't occur in a
 properly tuned and managed system.

I'm not that concerned.  A fixed value of 50 will interrupt a maximum of
20 times per second.  The hz value in BSD is usually higher.  I think 50
is a good compromise.

 My thoughts for a fix were centered around replacing the msqptr
 ident parameter with a resource specific identifier that would
 allow freeing a resource by one queue to wake another. However,
 such a fix would require much regression testing, and STILL might
 need a timeout like this as an ultimate safety net. Besides, we
 likely want to continue tracking the BSD source.

There's surely some better way to solve this problem but if there's
an upstream fix, I'd like to use it.  My goal is to keep the code as
much upstream centered as possible.

 I'm currently building and testing using the cygwin-1.5.25-12 release
 tarball. Would it be more helpful for me to pull the CVS head down
 to test this?

Shouldn't matter, actually.  There's no difference in the message queue
code between 1.5.25 and CVS HEAD.  However, the bugfix will only go into
CVS HEAD.  If you need this bugfix desperately, please maintain your
local version for now.

 Thanks for the quick reply. I'm glad to be of some help.

You're welcome.  Thanks for the debugging effort and the testcase.
You almost did all the work yourself already, I just had to look
what upstream is doing about it :)

I'll check this in in a couple of minutes.


Thanks again,
Corinna

--
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/


--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/