Re: Problem with pty allocation code, race condition?
On Wed, May 19, 2004 at 06:08:01PM -0400, Igor Pechtchanski wrote:
>On Mon, 17 May 2004, Christopher Faylor wrote:
>> On Mon, May 17, 2004 at 09:29:44AM -0400, John P. Rouillard wrote:
>> >Does the last cygwin snapshot contain any code changes in the pty
>> >allocation area?
>>
>> Yes, the very latest snapshot attempts to fix this problem.  Please give
>> it a try and report the results here.
>
>I'm sorry to report that the problem I described in
>http://cygwin.com/ml/cygwin/2004-05/msg00596.html (multiple xterms started
>in quick succession get the same PTY) still exists in the 20040519
>snapshot.  The output of ps after reproducing it with 4 xterms is below,
>if it's of any help (the last bash, PID 1948, is a console).

And, now I finally receive this email...  Amazing.

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
Re: Problem with pty allocation code, race condition?
On Mon, 17 May 2004, Christopher Faylor wrote:

> On Mon, May 17, 2004 at 09:29:44AM -0400, John P. Rouillard wrote:
> >Does the last cygwin snapshot contain any code changes in the pty
> >allocation area?
>
> Yes, the very latest snapshot attempts to fix this problem.  Please give
> it a try and report the results here.

I'm sorry to report that the problem I described in
http://cygwin.com/ml/cygwin/2004-05/msg00596.html (multiple xterms started
in quick succession get the same PTY) still exists in the 20040519
snapshot.  The output of ps after reproducing it with 4 xterms is below,
if it's of any help (the last bash, PID 1948, is a console).

$ ps
      PID    PPID    PGID     WINPID  TTY  UID    STIME COMMAND
     1732       1    1732       1748  con 1001 17:52:43 /usr/bin/xterm
     2812       1    2812       1268  con 1001 17:52:43 /usr/bin/xterm
     1160       1    1160       1152  con 1001 17:52:43 /usr/bin/xterm
     1772       1    1772       1844  con 1001 17:52:43 /usr/bin/xterm
S    1996    1732    1996       2036    0 1001 17:52:53 /usr/bin/bash
S    1244    2812    1244       2120    0 1001 17:52:53 /usr/bin/bash
I    1972    1160    1972       2132    0 1001 17:52:53 /usr/bin/bash
S    1988    1772    1988       2128    0 1001 17:52:53 /usr/bin/bash
     1948       1    1948       1948  con 1001 17:53:16 /usr/bin/bash
     2088    1948    2088        968  con 1001 17:53:19 /usr/bin/ps

Note that now it's the third xterm that got a working bash, so I think my
prior conclusion that the last one always wins was wrong.  Note also that,
for some reason, there was a 10-second delay between starting the xterms
and having the bash shells started.

As before, any ideas on how to debug this are appreciated.  FWIW, cygcheck
output is attached.
	Igor
--
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		[EMAIL PROTECTED]
ZZZzz /,`.-'`'    -.  ;-;;,_		[EMAIL PROTECTED]
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

I have since come to realize that being between your mentor and his route
to the bathroom is a major career booster.
  -- Patrick Naughton

Cygwin Configuration Diagnostics
Current System Time: Wed May 19 18:02:24 2004

Windows 2000 Professional Ver 5.0 Build 2195 Service Pack 3

Path:	C:\cygwin\home\igor\bin
	C:\cygwin\usr\local\bin
	C:\cygwin\bin
	C:\cygwin\bin
	C:\cygwin\usr\local\games
	C:\cygwin\usr\X11R6\bin
	c:\Program Files\IBM\Java14\bin
	c:\Program Files\IBM\Infoprint Select
	c:\ActivePerl\bin
	c:\MikTeX\miktex\bin
	c:\WINNT\system32
	c:\WINNT
	c:\WINNT\system32\wbem
	c:\Program Files\IBM\Trace Facility
	c:\Program Files\Personal Communications
	c:\Notes
	c:\Utilities
	c:\Program Files\ThinkPad\Utilities
	c:\cygwin\bin
	.\

Output from C:\cygwin\bin\id.exe (nontsec)
UID: 1001(igor) GID: 544(Administrators)
544(Administrators) 10953([EMAIL PROTECTED])

Output from C:\cygwin\bin\id.exe (ntsec)
UID: 1001(igor) GID: 544(Administrators)
513(None) 544(Administrators) 545(Users)

SysDir: C:\WINNT\system32
WinDir: C:\WINNT

CYGWIN = `check_case:strict ntsec notitle binmode nosmbntsec notty'
HOME = `C:\cygwin\home\igor'
MAKE_MODE = `unix'
PWD = `/home/igor'
USER = `igor'

ALLUSERSPROFILE = `C:\Documents and Settings\All Users'
APPDATA = `C:\Documents and Settings\igor\Application Data'
COMMONPROGRAMFILES = `C:\Program Files\Common Files'
COMPUTERNAME = `PECHTCHA'
COMSPEC = `C:\WINNT\system32\cmd.exe'
CVSREAD = `true'
CVS_RSH = `/bin/ssh'
HOMEDRIVE = `C:'
HOMEPATH = `\Documents and Settings\igor'
HOSTNAME = `pechtcha'
JAVA_HOME = `/usr/contrib/java'
JIKESPATH = `.:/usr/contrib/java/jre/lib/core.jar:/usr/contrib/java/jre/lib/charsets.jar'
LOGONSERVER = `\\PECHTCHA'
MANPATH = `/usr/local/man:/usr/man:/usr/autotool/devel/man:/usr/contrib/jikes/man::/usr/ssl/man:/usr/X11R6/man'
NUMBER_OF_PROCESSORS = `1'
OLDPWD = `/tmp'
OS2LIBPATH = `C:\WINNT\system32\os2\dll;'
OS = `Windows_NT'
PATHEXT = `.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH'
PCOMM_ROOT = `C:\Program Files\Personal Communications'
PDBASE = `C:\Program Files\IBM\Infoprint Select'
PDHOST = `ushawsrv01'
PD_SOCKET = `6874'
PKG_CONFIG_PATH = `/usr/X11R6/lib/pkgconfig'
PROCESSOR_ARCHITECTURE = `x86'
PROCESSOR_IDENTIFIER = `x86 Family 6 Model 8 Stepping 6, GenuineIntel'
PROCESSOR_LEVEL = `6'
PROCESSOR_REVISION = `0806'
PROGRAMFILES = `C:\Program Files'
PROMPT = `$P$G'
PS1 = `[\[\033[32m\]\h\[\033[0m\]:\[\033[33m\]\w\[\033[0m\]] \[\]'
SHLVL = `1'
SYSTEMDRIVE = `C:'
SYSTEMROOT = `C:\WINNT'
TEMP = `C:\cygwin\tmp'
TERM = `cygwin'
TMP = `C:\cygwin\tmp'
USERDOMAIN = `PECHTCHA'
USERNAME = `igor'
USERPROFILE = `C:\Documents and Settings\igor'
WINDIR = `C:\WINNT'
_ =
Problem with pty allocation code, race condition?
Hello:

I have noticed a problem when I start X windows. As part of my startup, I
fire up three xterms, but only one of them actually completes and displays
a prompt. I believe there may be a race condition in the pty allocation
code, as the three bash processes all share the same tty. ps -ef shows:

     UID     PID    PPID TTY     STIME COMMAND
jrouilla    2332    2252   1  09:04:19 /usr/bin/bash
jrouilla    2340    2248   1  09:04:19 /usr/bin/bash
jrouilla    2348    2168   1  09:04:19 /usr/bin/bash
jrouilla    2632    2332   1  09:05:01 /usr/bin/ps

The one with pid 2332 I believe was the first to start based on the PID,
but I also remember that PIDs are not monotonically increasing under
cygwin, so YMMV. However, pid 2332 is the one (verified using echo $$) that
I can interact with. The other two are frozen with no output or input (I
entered a ^D, which should have exited the shell).

This failure usually occurs when I first log in to Windows and run all my
startup scripts. It is less likely to occur if I start up X after all the
rest of the login processes have run, though I can provoke it then as
well, at a lower frequency.

A proper startup with three running bash/xterms looks like:

     UID     PID    PPID TTY     STIME COMMAND
jrouilla    2400    2216   1  09:11:05 /usr/bin/bash
jrouilla    2680    2204   3  09:11:05 /usr/bin/bash
jrouilla    2732    2188   4  09:11:06 /usr/bin/bash
jrouilla    2712    2400   1  09:11:09 /usr/bin/ps

Does the last cygwin snapshot contain any code changes in the pty
allocation area? If so I can try it and see if it helps. I am already
running a snapshot from 20040412-23:00:24, but both 1.5.9 and this
snapshot have the same issue AFAICT.

It's an intermittent problem for me, but I will be happy to provide any
info I can. I have attached the cygcheck output, lightly edited to hide IP
addresses and internal groups. If you need that info to debug the
problem, I will send unedited output on request.

				-- rouilj

John Rouillard
===
My employers don't acknowledge my existence much less my opinions.
Cygwin Win95/NT Configuration Diagnostics
Current System Time: Mon May 17 09:16:25 2004

Windows 2000 Professional Ver 5.0 Build 2195 Service Pack 4

Path:	h:\local\bin
	C:\progra~1\cygwin\usr\X11R6\bin
	C:\progra~1\cygwin\tools\local\bin
	C:\progra~1\cygwin\usr\local\bin
	C:\progra~1\cygwin\bin
	C:\progra~1\cygwin\bin
	C:\progra~1\cygwin\bin
	C:\progra~1\cygwin\usr\X11R6\bin
	c:\WINNT\system32
	c:\WINNT
	c:\WINNT\System32\Wbem
	c:\Program Files\Hummingbird\Connectivity\9.00\Security\Kerberos\

Output from C:\progra~1\cygwin\bin\id.exe (nontsec)
UID: 165(jrouilla) GID: 507(hnm)
507(hnm)

Output from C:\progra~1\cygwin\bin\id.exe (ntsec)
UID: 165(jrouilla) GID: 507(hnm)
0(root) 544(Administrators) 545(Users) 10513(Domain Users) 33507(hnm)

SysDir: C:\WINNT\system32
WinDir: C:\WINNT

CYGWIN = `server'
HOME = `h:\'
MAKE_MODE = `unix'
PWD = `/h'
USER = `jrouilla'

ALLUSERSPROFILE = `C:\Documents and Settings\All Users'
APPDATA = `C:\Documents and Settings\jrouilla\Application Data'
COLORFGBG = `default;default;0'
COLORTERM = `rxvt-xpm'
COMMONPROGRAMFILES = `C:\Program Files\Common Files'
COMPUTERNAME = `SC028764'
COMSPEC = `C:\WINNT\system32\cmd.exe'
CVSROOT = `:ext:127.0.0.1:/twiki/src/cvsroot'
CVS_RSH = `ssh'
DISPLAY = `:1'
HOMEDRIVE = `C:'
HOMEPATH = `\Documents and Settings\jrouilla'
HOMESHARE = `\\mlbapp2\jrouilla$'
LESS = `-eiMqX -h5 -j3'
MANPATH = `:/usr/ssl/man:/usr/X11R6/man'
NUMBER_OF_PROCESSORS = `1'
OLDPWD = `/c/Program Files/cygwin/bin'
OS2LIBPATH = `C:\WINNT\system32\os2\dll;'
OS = `Windows_NT'
PAGER = `less'
PATHEXT = `.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH'
PKG_CONFIG_PATH = `/usr/X11R6/lib/pkgconfig'
PROCESSOR_ARCHITECTURE = `x86'
PROCESSOR_IDENTIFIER = `x86 Family 6 Model 11 Stepping 1, GenuineIntel'
PROCESSOR_LEVEL = `6'
PROCESSOR_REVISION = `0b01'
PROGRAMFILES = `C:\Program Files'
PS1 = `$PWD \! 
'
SCVER = `3.0b'
SHLVL = `1'
SMS_LOCAL_DIR = `C:\WINNT'
SSH_AGENT_PID = `2076'
SSH_AUTH_SOCK = `/tmp/ssh-fPZbxu1728/agent.1728'
SYSTEMDRIVE = `C:'
SYSTEMROOT = `C:\WINNT'
TEMP = `c:\DOCUME~1\jrouilla\LOCALS~1\Temp'
TERM = `rxvt'
TMP = `c:\DOCUME~1\jrouilla\LOCALS~1\Temp'
USERDNSDOMAIN = `cs.myharris.net'
USERDOMAIN = `HARRIS'
USERNAME = `jrouilla'
USERPROFILE = `C:\Documents and Settings\jrouilla'
VISUAL = `/h/local/bin/ec'
WINDIR = `C:\WINNT'
WINDOWID = `168329344'
_ = `/usr/bin/cygcheck'
POSIXLY_CORRECT = `1'

HKEY_CURRENT_USER\Software\Cygnus Solutions
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\mounts v2
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\Program Options
HKEY_LOCAL_MACHINE\SOFTWARE\Cygnus Solutions
HKEY_LOCAL_MACHINE\SOFTWARE\Cygnus Solutions\Cygwin
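The symptom in the report above (several bash processes listed on one tty) can be checked mechanically. The following is only a rough sketch in Python, not part of any Cygwin tool: the function name `shared_ttys` is made up for illustration, and it assumes the `ps` output has a header row containing `PID` and `TTY` columns, as in the listings quoted in this thread.

```python
from collections import defaultdict


def shared_ttys(ps_text, command="bash"):
    """Return {tty: [pids]} for ptys held by more than one `command` process.

    Hypothetical helper: assumes a header row with PID and TTY columns.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col, tty_col = header.index("PID"), header.index("TTY")
    cmd_col = len(header) - 1  # COMMAND is the last header column

    by_tty = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        # Plain `ps` can prefix a row with a one-letter status flag (S, I)
        # that has no header of its own; drop it so the columns line up.
        if header[0] == "PID" and len(fields[0]) == 1 and fields[0].isalpha():
            fields = fields[1:]
        if len(fields) < len(header):
            continue  # skip rows that do not match the header
        tty = fields[tty_col]
        name = fields[cmd_col].rsplit("/", 1)[-1]
        # Only numbered ptys count; "con" is a console, not a pty.
        if tty.isdigit() and name == command:
            by_tty[tty].append(int(fields[pid_col]))

    return {tty: pids for tty, pids in by_tty.items() if len(pids) > 1}
```

An empty result means every bash got its own pty; any entry in the returned dict names a pty that was handed out more than once.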
Re: Problem with pty allocation code, race condition?
On Mon, 17 May 2004, John P. Rouillard wrote:

> Hello:
>
> I have noticed a problem when I start X windows. As part of my startup, I
> fire up three xterms, but only one of them actually completes and displays
> a prompt. I believe there may be a race condition in the pty allocation
> code, as the three bash processes all share the same tty. ps -ef shows:
>
>      UID     PID    PPID TTY     STIME COMMAND
> jrouilla    2332    2252   1  09:04:19 /usr/bin/bash
> jrouilla    2340    2248   1  09:04:19 /usr/bin/bash
> jrouilla    2348    2168   1  09:04:19 /usr/bin/bash
> jrouilla    2632    2332   1  09:05:01 /usr/bin/ps
>
> The one with pid 2332 I believe was the first to start based on the PID,
> but I also remember that PIDs are not monotonically increasing under
> cygwin, so YMMV. However, pid 2332 is the one (verified using echo $$) that
> I can interact with. The other two are frozen with no output or input (I
> entered a ^D, which should have exited the shell).
>
> This failure usually occurs when I first log in to Windows and run all my
> startup scripts. It is less likely to occur if I start up X after all the
> rest of the login processes have run, though I can provoke it then as
> well, at a lower frequency.
>
> A proper startup with three running bash/xterms looks like:
>
>      UID     PID    PPID TTY     STIME COMMAND
> jrouilla    2400    2216   1  09:11:05 /usr/bin/bash
> jrouilla    2680    2204   3  09:11:05 /usr/bin/bash
> jrouilla    2732    2188   4  09:11:06 /usr/bin/bash
> jrouilla    2712    2400   1  09:11:09 /usr/bin/ps
>
> Does the last cygwin snapshot contain any code changes in the pty
> allocation area? If so I can try it and see if it helps. I am already
> running a snapshot from 20040412-23:00:24, but both 1.5.9 and this
> snapshot have the same issue AFAICT.
>
> It's an intermittent problem for me, but I will be happy to provide any
> info I can. I have attached the cygcheck output, lightly edited to hide IP
> addresses and internal groups. If you need that info to debug the
> problem, I will send unedited output on request.
>
> 				-- rouilj

FWIW, I can confirm that this problem has existed for a while (as long as
I can remember) -- if you fire up two xterms in quick succession,
especially under heavy load, there are good chances that they will share a
pty.  The output of regular ps will show that the bash processes are in
the suspended state (S), and sending SIGCONT doesn't work.  The first
xterm (judging by the window position) is always the one getting the
suspended bash.  Also, it seems to happen more often when the shortcut to
start the xterm hasn't been used in a while (evicted from disk cache?), so
this makes it hard to reproduce the problem twice in a row.

I believe I've reported this before, but couldn't come up with a small
reproducible testcase (although I just managed to reproduce it on my
machine -- Win2kPro SP3, Cygwin 1.5.9 -- again, using the above recipe).

It's annoying enough that I'd like to try debugging it.  Of course, as
with most races, running the xterms under strace fixes it...  Attaching to
a hung bash is, IMO, useless, as all of the pty assignments have already
happened by that point.  Any pointers on how to catch this in the act are
appreciated.
	Igor
--
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		[EMAIL PROTECTED]
ZZZzz /,`.-'`'    -.  ;-;;,_		[EMAIL PROTECTED]
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

I have since come to realize that being between your mentor and his route
to the bathroom is a major career booster.
  -- Patrick Naughton
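For readers following the thread, here is a minimal Python sketch of the kind of race being suspected: two callers both look for a free slot before either marks it used, so both end up with the same one. This is purely an illustration under that assumption. It is not Cygwin's pty allocation code, and `SlotTable`, `racy_interleaving`, and `atomic_allocation` are made-up names. The unlucky interleaving is written out by hand so the outcome is deterministic.

```python
import threading


class SlotTable:
    """Toy table of numbered slots standing in for pty numbers (hypothetical)."""

    def __init__(self, size):
        self.in_use = [False] * size
        self.lock = threading.Lock()

    def find_free(self):
        """Step 1: look for the first unused slot, without claiming it."""
        for index, used in enumerate(self.in_use):
            if not used:
                return index
        raise RuntimeError("no free slot")

    def claim(self, index):
        """Step 2: mark the slot as used."""
        self.in_use[index] = True

    def allocate_atomic(self):
        """Do both steps under one lock so no other caller can slip in between."""
        with self.lock:
            index = self.find_free()
            self.claim(index)
            return index


def racy_interleaving():
    """Replay the bad ordering: both callers search before either one claims."""
    table = SlotTable(4)
    first = table.find_free()
    second = table.find_free()  # runs before the first caller has claimed
    table.claim(first)
    table.claim(second)
    return first, second


def atomic_allocation(callers=4):
    """Let several threads allocate through the locked path."""
    table = SlotTable(callers)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(table.allocate_atomic()))
        for _ in range(callers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)
```

In the hand-ordered case both callers receive slot 0, which matches the shape of the `ps` listings where several shells sit on one tty. With the search and the claim done under one lock, each caller gets a distinct slot. It would also fit the observation that strace hides the problem, since slowing each process down makes the bad ordering less likely.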
Re: Problem with pty allocation code, race condition?
On Mon, May 17, 2004 at 09:29:44AM -0400, John P. Rouillard wrote:
>Does the last cygwin snapshot contain any code changes in the pty
>allocation area?

Yes, the very latest snapshot attempts to fix this problem.  Please give
it a try and report the results here.

--
Christopher Faylor			spammer? - [EMAIL PROTECTED]
Cygwin Co-Project Leader		[EMAIL PROTECTED]
TimeSys, Inc.
Re: Problem with pty allocation code, race condition?
Christopher Faylor wrote:

> Yes, the very latest snapshot attempts to fix this problem.  Please give
> it a try and report the results here.

If this is of any help as far as testing goes, I normally start up
multiple sessions at login using the following cygwin.bat file:

@echo off
SET MAKE_MODE=unix
SET CYGWIN=binmode ntsec nostrip_title title tty
SET PATH=d:\cygwin\usr\local\bin;d:\cygwin\usr\bin;d:\cygwin\bin;%PATH%;.
D:
chdir \cygwin
start CMD /c bin\rxvt -geometry 90x30 -fg grey -bg midnightblue -cr red -sr -sl 2000 -fn Lucida Console-12 -tn rxvt -e /usr/bin/login
start CMD /c bin\login.exe
start CMD /c bin\login.exe
start CMD /c bin\login.exe
exit

Prior to the current snapshot, depending on system load, I might or might
not get 4 sessions started as they should be, each with a login prompt.
With the current snapshot 20040517 this has worked repeatedly, even under
some heavy loads on my laptop.

bk
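The batch file above amounts to "start several terminal sessions back to back without waiting". A rough Python equivalent, offered only as a sketch for anyone who wants to script repeated test runs, might look like the following. `launch_burst`, `wait_all`, and `PLACEHOLDER_CMD` are hypothetical names, and the default command is a harmless stand-in rather than anything from the thread.

```python
import subprocess
import sys

# Harmless stand-in so the sketch runs anywhere. To exercise pty allocation,
# pass a terminal command instead, e.g. ["xterm", "-e", "/usr/bin/bash"].
PLACEHOLDER_CMD = [sys.executable, "-c", "pass"]


def launch_burst(argv=PLACEHOLDER_CMD, count=4):
    """Start `count` copies of `argv` back to back, without waiting in between."""
    return [subprocess.Popen(argv) for _ in range(count)]


def wait_all(procs, timeout=60):
    """Wait for every process and return the exit codes in launch order."""
    return [proc.wait(timeout=timeout) for proc in procs]
```

Because `Popen` returns as soon as each child is started, the launches land close together, which is the condition the thread associates with the failure. Whether that is close enough to trigger it on a given machine is not something this sketch can promise.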