Robin:
Just to report what I did last night. I did a relogin with truss on
gnome-session-real (since the /usr/bin/gnome-session was renamed to
gnome-session-real). The system froze and was stuck as I described in my
previous post. After waiting for about 2 hours I interrupted my attempt.
The truss file reached 20MB when I interrupted the operation. I will
retry this once I get confirmation that the change I did in
"/usr/bin/gnome-session" is correct and how long it should take.
Yes the truss command should have been:
#/bin/bash
truss -faldo /tmp/truss.out.$$ gnome-session-real --debug
Sorry for providing the incorrect command in my last email.
Using truss on gnome-session like this should slow things down a bit,
but shouldn't hang the system for hours. It should make login take
a few minutes longer at most. However, if the bug you are experiencing
is caused by a process spinning and making a lot of system calls, truss
could get bogged down and hang.
The truss command is just a tool to see what system calls a process
makes, and is one way to analyze what a program is doing. If using
truss on gnome-session isn't working well, then we may need to diagnose
the problem using other methods.
Really what we are trying to pin down is specifically what process is
causing the problem. This problem would be easier to debug if we can
diagnose which process started by gnome-session is causing the hang.
If you can telnet into the machine from a second machine and run the
"top" command, and then log in and watch the top output when the hang
happens. I would keep an eye out for which processes are consuming the
most CPU and for programs whose memory size is growing at a fast rate.
If we could track down the problem to a specific program, then it would
likely be easier to diagnose what is going wrong.
For example, in bug #14857 it seems that GConf may be responsible for
a similar problem. However, this is just speculation at this point.
But if the problem is with /usr/lib/gconfd-2, then perhaps running
truss on just the gconfd-2 process might give better output and work
better than trying to run truss on gnome-session.
I also tried without my correction. I rebooted the system and all I saw
was a black screen with a busy pointer. When logging in I saw that the
truss files in the /tmp/ dir were literaly exploding in size. Initially
they were 600MB in size and reached over a gig within a few minutes so I
stopped this before things would go haywire.
Yes, telling the gnome-session script to call itself would result in
the system just hanging. Again, apologies for being confusing with
my previous email.
I would also like to know how long time this trussing is supposed to
take. I understand that patience is a virtue but it's too much for me to
babysit the computer for 12 hours waiting for the login screen to appear.
It shouldn't take more than a few minutes, really. If it is taking more
than 10 minutes, then a different approach to diagnosing the problem is
necessary.
The dtrace program is another program that might be useful in debugging
this sort of issue. I'm not much of a dtrace expert, but perhaps
someone else on this list could recommend ways to debug this sort of
issue with dtrace.
Brian
On 2010-05-11 05:11, Brian Cameron wrote:
Robin:
On 05/ 8/10 02:14 PM, Robin Axelsson wrote:
The delay is 3 minutes long (~180 seconds) and not 2 or 30 seconds. This
is delay easy to measure since there's a clock in the bottom right
corner of the login splash that is frozen which "jumps" 3 minutes
forward when the input text field for the user login pops up. I supply a
picture of what the login soplash screen looks like when it's frozen and
another /var/adm/messages logfile that is linked to this image and
contains more "before and after" information.
Looking at the messages20100508-1959.log.txt file that you provided, I
notice the following:
- The GDM service started at 19:59:27
- At 19:59:33 it looks like the slave daemon informs the greeter to
prompt for the username
- At 20:06:14 it looks like the prompt was answered with a username.
This looks like a 6 minute delay, much longer than 3 minutes, but
perhaps it took you some additional time to actually enter a username?
Unfortunately, the syslog doesn't seem to contain any information
between 19:59:33 and 20:06:14 to indicate what might be causing this
hang.
Later in the log, I also notice that GDM service started at 20:08:21
and that the greeter was informed to prompt for a username at 20:08:31,
but that is where the log ends, so I don't see any delay here.
Also the 0-greeter*log files (which shows stdout/stderr when the
greeter is running) shows the same warning/error messages that I see on
my machine but I do not have a slowdown. So this log does not seem to
highlight anything unusual that is causing the hang for you.
Since this problem seems to only be affecting a few people, I wonder if
you might be able to identify what about your setup might be different
that could be causing the slowdown. Are you using a particular locale,
IM (Input Method) setup, or somesuch that might be causing GDM to behave
differently for you than for others? If you are using a locale, then
does GDM not hang if you switch to using the default C locale?
I notice that bug #14857 in defect.opensolaris.org seems like it might
be related. I notice the same unusual GConf error in the "out" file
that you provided. Your problem sounds different than bug #14857 since
it seems to be affecting you on the login screen, while the slowdown in
bug #14857 seems to affect the user session starting. However, if this
is a locale issue, then this might be explained if you are setting your
default locale systemwide (so GDM picks it up) while the person who
reported bug #14857 might be setting their locale via the GDM login
screen.
http://defect.opensolaris.org/bz/show_bug.cgi?id=14857
It might be helpful to get some truss output to debug this problem.
If you rename /usr/bin/gnome-session to /usr/bin/gnome-session-real
and then create a script (with execute permissions) named
/usr/bin/gnome-session which contains these two lines
#/bin/bash
truss -faldo /tmp/truss.out.$$ gnome-session --debug
Then restart the "gdm" service. This will cause GDM to run the
script to start gnome-session, which will launch the gnome-session-real
with truss turned on.
This will do two things. It will make things even slower, and it will
create a /tmp/truss.out.(pid) file which will show what gnome-session
is doing, and might highlight the problem. The truss output timestamps
each line so it might better highlight where the problem is happening.
Truss output is large, so it might make sense to attach the output to
doo bug #14857 rather than as an email attachment.
> I forgot to say: I don't know what "face browser" means. If it is the
> feature that enables the possibility of "choosing" users at the login
> and/or poweroff/restart the the answer to this question is no.
Yes, that is what I meant. If you don't have the Face Browser enabled,
then there is no need to load the users. However, according to the
syslog, this slowdown is only a couple of seconds, and probably not the
issue which is causing the long slowdown that you are seeing.
---
Brian
.
_______________________________________________
opensolaris-discuss mailing list
[email protected]