Re: Screen 4.0.2 frozen, can't remote detach either

2009-07-17 Thread Juergen Weigert
On Jul 14, 09 22:17:19 -0700, Joe Zbiciak wrote:
> 
> Hello again,
> 
> I realize I'm top-posting my reply to my own email.  Sorry for the bad form.  
> Yahoo doesn't make it easy.
> 
> Anyway, I still have all these processes running and frozen.  Would anyone 
> like me to dig around or otherwise perform some forensics on this?  Or should 
> I just kill all the processes?
> 

pid 21652 is the frontend. It is just stuck because the backend is not
listening on its socket, as it should. You can kill 21652 without harm.

Backend 13738 appears to be stuck in write to the user display:
> write(4, "tr_new = 0 num_cmd = 28  metric "..., 348 
> lrwx------  1 jzbiciak vivid 64 Jul 13 07:25 4 -> /dev/pts/0

We have no good cure for this situation.
You can try a):
close the other end of /dev/pts/0 (your xterm, ssh daemon, or whatever it was)
$ kill -CHLD 13738
and check if it is still stuck in the write. If so,

try b):
$ rm /tmp/uscreens/S-jzbiciak/13738.pts-2.durable03
$ kill -CHLD 13738
and see if it recreates the socket. If it does, the signal handler was able
to pull it out of the write.  If not, there is not much more to do
except
$ kill -9 13738

All those write() calls in screen should have an alarm() on them,
so that screen has a chance to regain control whenever the kernel keeps us
stuck.
This apparently happens quite frequently to some people, while it has only
happened once or twice to me. The exact cause is still a mystery to me.

sorry,
JW-

-- 
 o \  Juergen Weigert  unix-software __/ _===.===_ 
 | j...@cs.fau.de creator__/_---|\/
 \  |0179/2069677  __/  (//\
(/) | /  _/ \_ vim:set sw=2 wm=8




Re: Screen 4.0.2 frozen, can't remote detach either

2009-07-14 Thread Joe Zbiciak

Hello again,

I realize I'm top-posting my reply to my own email.  Sorry for the bad form.  
Yahoo doesn't make it easy.

Anyway, I still have all these processes running and frozen.  Would anyone like 
me to dig around or otherwise perform some forensics on this?  Or should I just 
kill all the processes?

I'd like to help catch a screen bug in the act.

--Joe



 --
We sell Spatulas, and that's all!
http://spatula-city.org/~im14u2c/
http://sdk-1600.spatula-city.org/
http://spacepatrol.info/



----- Original Message ----
From: Joe Zbiciak 
To: screen-users@gnu.org
Sent: Monday, July 13, 2009 3:51:07 PM
Subject: Screen 4.0.2 frozen, can't remote detach either


Howdy!

I seem to have caused Screen 4.0.2 to freeze while sending commands to it in 
a too-rapid-fire fashion.  I was flipping rapidly between screens, going into 
Copy Mode, and the screen I was going into Copy Mode on was also busily 
scrolling by messages.  (My purpose in going into Copy Mode was to scroll back 
and catch something that had scrolled by before I could read it.)

I've captured some diagnostic information, and I haven't killed any processes 
yet, in the hope I can either help diagnose the bug, or at least determine that this 
is an already known (and hopefully fixed) bug.  Attaching the foreground 
process (screen in lowercase, pid 21652) with gdb -p gives the following 
backtrace:

(gdb) bt
#0  0x002ed7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x003973c6 in __pause_nocancel () from /lib/tls/libc.so.6
#2  0x0806c016 in Attacher () at attacher.c:567
#3  0x0804f242 in main (ac=0, av=0xbfffe2ec) at screen.c:1097
(gdb) fr 2
#2  0x0806c016 in Attacher () at attacher.c:567
567   pause();

Because it was stuck in pause(), I decided to try to be a hero and send it 
SIGCONT, resulting in this stack backtrace:

(gdb) bt
#0  0x002ed7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x003c6273 in __write_nocancel () from /lib/tls/libc.so.6
#2  0x0806b55e in WriteMessage (s=4, m=0xbfff677c) at attacher.c:101
#3  0x0806be45 in AttacherFinit (sigsig=1) at attacher.c:435
#4  <signal handler called>
#5  0x002ed7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#6  0x003c6273 in __write_nocancel () from /lib/tls/libc.so.6
#7  0x0806b55e in WriteMessage (s=3, m=0xbfff9bb0) at attacher.c:101
#8  0x0806bc11 in Attach (how=6) at attacher.c:206
#9  0x0806c198 in Attacher () at attacher.c:624
#10 0x0804f242 in main (ac=0, av=0xbfffe2ec) at screen.c:1097


Attaching it at this point with strace -p shows:

$ strace -p 21652
Process 21652 attached - interrupt to quit
write(4, "\0gsm\7\0\0\0/dev/pts/0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 12336 

Process 21652 detached


If I attach to the background process (SCREEN in ALL CAPS, pid 13738), I see 
the following backtrace:

(gdb) bt
#0  0x002ed7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x003c6273 in __write_nocancel () from /lib/tls/libc.so.6
#2  0x0807c517 in Flush () at display.c:3159
#3  0x080803b3 in MakeStatus (
msg=0xbfff88c0 "Copy mode - Column 3 Line 95(+1) (80,95)")
at display.c:2306
#4  0x080860df in LMsg (err=0, 
fmt=0xfe00 ) at layer.c:
#5  0x080586e7 in MarkRoutine () at mark.c:462
#6  0x0807761c in DoAction (act=0x809cca4, key=27) at process.c:2088
#7  0x0807a970 in ProcessInput2 (ibuf=0x8f6e2bb "\033OA", ilen=1)
at process.c:832
#8  0x0808645f in sched () at sched.c:208
#9  0x0804e79a in main (ac=0, av=0xbfffddbc) at screen.c:1362

If I attach to that with strace -p, I see that it's stuck writing some text 
that was part of the display that was scrolling by:

$ strace -p 13738
Process 13738 attached - interrupt to quit
write(4, "tr_new = 0 num_cmd = 28  metric "..., 348 
Process 13738 detached


Other information:

$ ls -l /proc/13738/fd
total 9
lr-x------  1 jzbiciak vivid 64 Jul 13 15:32 0 -> /dev/null
l-wx------  1 jzbiciak vivid 64 Jul 13 15:32 1 -> /dev/null
lr-x------  1 jzbiciak vivid 64 Jul 13 15:32 11 -> /var/run/utmp
l-wx------  1 jzbiciak vivid 64 Jul 13 15:32 2 -> /dev/null
lr-x------  1 jzbiciak vivid 64 Jul 13 15:32 3 -> /tmp/uscreens/S-jzbiciak/13738.pts-2.durable03|
lrwx------  1 jzbiciak vivid 64 Jul 13 07:25 4 -> /dev/pts/0
lrwx------  1 jzbiciak vivid 64 Jul 13 15:32 5 -> /dev/ptmx
lrwx------  1 jzbiciak vivid 64 Jul 13 15:32 7 -> /dev/ptmx
lrwx------  1 jzbiciak vivid 64 Jul 13 15:32 9 -> /dev/ptmx

$ ls -l /proc/21652/fd
total 5
lrwx------  1 jzbiciak vivid 64 Jul 13 15:04 0 -> /dev/pts/0
lrwx------  1 jzbiciak vivid 64 Jul 13 15:34 1 -> /dev/pts/0
lrwx------  1 jzbiciak vivid 64 Jul 13 15:06 2 -> /dev/pts/0
l-wx------  1 jzbiciak vivid 64 Jul 13 15:34 3 -> /tmp/uscreens/S-jzbiciak/13738.pts-2.durable03|
l-wx------  1 jzbiciak vivid 64 Jul 13 15:34 4 -> /tmp/uscreens/S-jzbiciak/13738.pts-2.durable03|

This was a self-compiled binary under RHEL WS 4.

$ screen -v
Screen version 4.00.02 (FAU) 5-Dec-03

Is there more information that I can provide to help diagnose this?   Things to 
probe around with GDB?  A way to get this screen u