Public bug reported:

Binary package hint: gdm

Since upgrading to Dapper final, we have seen gdm break frequently on our
amd64 (64-bit Dapper) XDMCP server, which regularly serves around 40
clients.

Reproducibility: it happens when, e.g., many people log out at the same
time (roughly once every few days). Afterwards gdm must be killed and
restarted manually, which kills all the existing sessions as well.

Symptoms: the slave gdm processes continue to work, but the main gdm
process does not spawn new slaves, does not ping the existing ones every
15 s as it normally does (according to the debug syslog), and does not
respond to TERM (it must be KILLed), as if it were waiting for something
(a race?).

The logs show that the only difference between the normal situation and
the buggy one is the timeout on the gdm socket (I will attach the full log):

Sep 22 14:16:40 [gdm] Sending LOGGED_IN == 0 for slave 317
Sep 22 14:16:40 [gdm] Timeout occurred for sending message LOGGED_IN 317 0

What might be the reason? In slave.c:gdm_slave_send, up to 10 attempts
are made to deliver the message (each one a select on &rfds with a
1-second timeout), but select apparently returns an error, since the
timeout never actually elapses (otherwise, 10 s would have to pass between
sending the message and the timeout, whereas both log lines above carry
the same timestamp).
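The reasoning above can be illustrated with a small standalone sketch. It is not gdm code: wait_readable() and struct wait_stats are hypothetical names, and the per-attempt timeout is a parameter instead of gdm's fixed 1 s. It retries select() on one descriptor and counts the three possible outcomes separately. A select() that times out (returns 0) consumes the whole timeout, while one that fails (returns -1) comes back at once, so ten failing attempts can finish in well under ten seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical per-call statistics, not a gdm structure. */
struct wait_stats {
    int ready;      /* attempts on which the fd became readable */
    int timeouts;   /* attempts where select() returned 0 */
    int errors;     /* attempts where select() returned -1 */
    int last_errno; /* errno saved from the most recent failure */
};

/* Try up to `attempts` times, waiting `timeout_us` microseconds each time,
 * for `fd` to become readable.  Returns 1 as soon as it is readable,
 * 0 if every attempt timed out or failed. */
static int wait_readable(int fd, int attempts, long timeout_us,
                         struct wait_stats *st)
{
    st->ready = st->timeouts = st->errors = st->last_errno = 0;

    for (int i = 0; i < attempts; i++) {
        fd_set rfds;
        struct timeval tv;
        int rv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* select() may modify tv, so it is reset on every attempt. */
        tv.tv_sec = timeout_us / 1000000;
        tv.tv_usec = timeout_us % 1000000;

        rv = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rv > 0) {
            st->ready++;
            return 1;
        } else if (rv == 0) {
            st->timeouts++;          /* the full timeout elapsed */
        } else {
            st->last_errno = errno;  /* save the reason right away */
            st->errors++;            /* returns immediately, no waiting */
        }
    }
    return 0;
}
```

For example, calling it on a descriptor that has already been closed makes every attempt fail at once with EBADF, whereas an open pipe with no pending data just accumulates timeouts. Saving errno immediately after select() is what lets a trace report why the call failed.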

PS. I compiled gdm with a few added lines for tracing the message sending
(daemon/slave.c) and will post the results if they are relevant:

@@ -2767,6 +2766,7 @@
                if (in_usr2_signal > 0) {
                        fd_set rfds;
                        struct timeval tv;
+                       int select_retval;

                        FD_ZERO (&rfds);
                        FD_SET (d->slave_notify_fd, &rfds);
@@ -2775,9 +2775,10 @@
                        tv.tv_sec = 1;
                        tv.tv_usec = 0;

-                       if (select (d->slave_notify_fd+1, &rfds, NULL, NULL, &tv) > 0) {
+                       if ((select_retval = select (d->slave_notify_fd+1, &rfds, NULL, NULL, &tv)) > 0) {
                                gdm_slave_handle_usr2_message ();
                        }
+                       if (select_retval < 0) gdm_debug("TRACE (%s,%d): select returned errno %d (%s)",__FILE__,__LINE__,errno,strerror(errno));
                } else {
                        struct timeval tv;
                        /* Wait 1 second. */
@@ -2787,6 +2788,7 @@
                        /* don't want to use sleep since we're using alarm
                           for pinging */
                }
+               gdm_debug ("TRACE (%s,%d): Passed gdm_slave_send cycle, i=%d, in_usr2_signal=%d, wait_for_ack=%d, gdm_got_ack=%d.",__FILE__,__LINE__,i,in_usr2_signal,wait_for_ack,gdm_got_ack);
        }

        if G_UNLIKELY (wait_for_ack  &&
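
When select() fails it always returns -1 and leaves the reason in errno, so a trace of the failure is most useful when it logs errno rather than the return value. One plausible candidate, given that the slave uses alarm() for pinging and handles SIGUSR2, is EINTR: a signal delivered while select() is blocked makes it return -1 before its timeout expires. That is an assumption, not something the logs confirm. The standalone sketch below uses a hypothetical helper, select_with_alarm(), which is not taken from gdm, to show the effect with SIGALRM.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Set by the handler so the caller can see that the alarm fired. */
static volatile sig_atomic_t got_alarm = 0;

static void on_alarm(int sig)
{
    (void) sig;
    got_alarm = 1;
}

/* Block in select() on `fd` for up to `timeout_s` seconds while an alarm
 * is scheduled to fire after `alarm_s` seconds.  Returns select()'s result
 * and stores the errno it left behind (0 if it did not fail) in *err. */
static int select_with_alarm(int fd, int timeout_s, unsigned alarm_s, int *err)
{
    struct sigaction sa;
    struct timeval tv;
    fd_set rfds;
    int rv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               /* no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_s;
    tv.tv_usec = 0;

    got_alarm = 0;
    alarm(alarm_s);
    rv = select(fd + 1, &rfds, NULL, NULL, &tv);
    *err = (rv < 0) ? errno : 0;   /* read errno before anything else runs */
    alarm(0);                      /* cancel the alarm if it has not fired */
    return rv;
}
```

With a 1 s alarm and a 5 s select() timeout on an empty pipe, the call returns after about one second with -1 and EINTR. In the real slave, a trace that logs errno would show whether this, or something else such as EBADF, is what actually happens.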

** Affects: gdm (Ubuntu)
     Importance: Untriaged
         Status: Unconfirmed

-- 
gdm hangs altogether after timeout on the gdm socket
https://launchpad.net/bugs/62139

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
