UV command failing mystery - RESOLUTION
I had the following kernel parameters changed over the weekend:

    SDELSIM  256 to 1024
    SEMOPM   64 to 100
    SEMUME   64 to 1024
    SHMMNI   2048 to 4096
    SEMMNI   2048 to 4096
    SEMMSL   256 to 1024

And the problem has not occurred since!

I have reviewed the documentation on each of the above parameters in hopes of determining which parameter was set incorrectly.

SDELSIM - The default number of file descriptors per process. (Referred to as NOFILES in the UV documentation.) I had set SDELSIM to exceed MFILES plus the 8 internal UV files. HDESLIM (the maximum number of file descriptors a user is allowed to have open) was left at the default value of 1024.

SEMOPM - The maximum number of semaphore operations per semop(2) call. Although this parameter is not mentioned in the UV documentation, I had concluded from some Unix documentation that the value of 100 was unnecessarily large and had reduced the value to 64. The default value is 10.

SEMUME - The maximum number of undo entries per undo structure. This parameter is also not mentioned in the UV documentation; a reading (mis-reading) of the Unix documentation led me to conclude that the value was unnecessarily large. Since I have now learned that the default value is 1024, this change is suspect.

SHMMNI - The maximum number of shared memory segments that may exist in the system at one time. The UV documentation recommends setting this to a number greater than the maximum number of concurrent Universe users plus 2. Our UV user count has not exceeded 700 users in the last few months.

SEMMNI - The maximum number of semaphore sets. The UV documentation states that Universe requires 2 semaphore sets. The ipcs -s command reports that our system uses 7.

SEMMSL - The maximum number of semaphores per set. The UV documentation states that this should be at least equal to FSEMNUM + GSEMNUM + 5. On our system that total is 152.

I still am not sure which parameter(s) were the source of the problem. And I intend to leave well enough alone, so I may never know for sure. In any case I hope that if someone else encounters the same problem they can benefit from my experience.

I want to thank everyone who offered help and took the time to make suggestions.

Vance Dailey

-Original Message-
From: Vance Dailey [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 03, 2004 5:23 PM
To: '[EMAIL PROTECTED]'
Subject: UV command failing mystery

We are having a very strange intermittent problem with the UV command not working from Unix. Occasionally, after a user logs into Unix (without noticing anything unusual), typing "UV" simply returns the user to the Unix shell almost instantly. When the problem occurs it seems to affect everyone who logs in and attempts to go into Universe for a period of time, and then the problem seems to resolve itself. Any users who logged into Unix during that period of time still cannot go into Universe, but new logins work fine.

The problem seems to be with Universe. Unix commands work fine, and when we have tried executing other Universe commands which normally can be run from Unix they fail also. The Unix login script seems to run fine. When the problem occurs, users already in Universe notice no problems. No unusual locks or performance problems have been noticed. The problem does not seem to be load related since it happens at apparently random times, including times when very few users are logged in.

We have been running 9.6.2.2 on DG/UX for several years and have never had the problem until the last couple of months. The only thing that may be suspicious is some changes we made to some kernel and UV config settings a few weeks prior to the first reported problem. The following changes were made:

(KERNEL)
    SDELSIM  2048 TO 256
    SEMOPM   100 TO 64
    SEMUME   1024 TO 64
    SHMMNI   4096 TO 2048
    SEMMNI   4096 TO 2048

(UV CONFIG)
    MFILES   56 TO 200
    T30FILE  8000 TO 200 (we have no dynamic files)
    FSEMNUM  101 TO 50
    GSEMNUM  211 TO 97
    GLTABSZ  150 TO 75
    RLTABSZ  150 TO 75
    MAXRLOCK 100 TO 74

Any help solving this puzzle would be greatly appreciated.

Thanks,
Vance Dailey

--
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
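The two sizing rules quoted from the UV documentation in the resolution message (SEMMSL at least FSEMNUM + GSEMNUM + 5, and SHMMNI greater than peak users plus 2) can be worked through with a few lines of shell. This is only a sketch: the function names are made up for illustration, and the inputs are the figures reported in this thread (FSEMNUM 50, GSEMNUM 97, about 700 peak users, SEMMSL 256 and SHMMNI 2048 before the change back).

```shell
#!/bin/sh
# Sizing rules as quoted in the thread from the UV documentation:
#   SEMMSL >= FSEMNUM + GSEMNUM + 5
#   SHMMNI >  peak concurrent UniVerse users + 2

# min_semmsl FSEMNUM GSEMNUM -> smallest SEMMSL that satisfies the rule
min_semmsl() {
    echo $(( $1 + $2 + 5 ))
}

# min_shmmni PEAK_USERS -> smallest SHMMNI strictly greater than users + 2
min_shmmni() {
    echo $(( $1 + 2 + 1 ))
}

# check NAME CONFIGURED REQUIRED -> report whether CONFIGURED is large enough
check() {
    if [ "$2" -ge "$3" ]; then
        echo "$1=$2 ok (needs >= $3)"
    else
        echo "$1=$2 TOO SMALL (needs >= $3)"
    fi
}

# Figures taken from the messages in this thread.
check SEMMSL 256  "$(min_semmsl 50 97)"
check SHMMNI 2048 "$(min_shmmni 700)"
```

With these inputs both checks pass (152 and 703 are the minimums), so the two documented rules alone would not have flagged the reduced settings. That fits the author's own conclusion that the culprit could not be pinned down from the documentation.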
RE: UV command failing mystery
We upgraded our system from a 16 processor AV2 to an 8 processor AV25000. Because each block of 4 CPUs can only support 4 GB of memory, our upgrade was going to cut our memory in half. So the main purpose of the changes was to cut the memory requirements of the Universe lock tables. And in fact we successfully cut the per-user memory usage in half. The other changes were made as part of the overall review of both uv.config and the kernel parameters prior to the upgrade.

Because several people suggested that the semaphore and/or shared memory changes could be the cause of the problem, I had them changed back to their original values yesterday. So far we have not had a failure, but it's a little too soon to celebrate.

Thanks for the advice on using the trace. I was not able to pick out the system calls, and before I make another attempt I think I will wait and see if the weekend's changes fixed the problem. If the problem is solved I will post in case someone else runs into the same problem in the future.

Thanks,
Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Ken Wallis
Sent: Sunday, February 08, 2004 7:52 PM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery

This tool seems to be showing you the system calls that uvsh is making and the values returned from them (the bit after the "="). The section you have shown is simply the program trying to tidy up and exit after detecting something it didn't like. You'll need to look higher up in the output for a system call which seems to return an error code. Unfortunately, you need to know what sort of system calls should return 0 all the time and which ones regularly return other values.

I think I'd be looking at calls to sem...() or shm...() functions that return non-zero and then using errmsg (if DG/UX has that, or vi-ing /usr/include/sys/errno.h if it doesn't) to see what the error numbers returned mean, and man to interpret from that where the problem lies.

I can't remember the exact numbers you quoted earlier, but certainly with your user counts I'd be very suspicious of the reductions you made to the semaphore kernel parameters. Just as a matter of interest, why were these reductions made?

HTH,
Ken
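Ken's suggestion to look for sem...() or shm...() calls that report an error could be sketched as a small filter over the trace file. This is an illustration only: it assumes the dg_strace output puts one call per line in the same name(args) = result form as the excerpt, and that a failed call shows a result of -1. The function name failed_ipc_calls and the sample lines are made up and do not come from the real trace.

```shell
#!/bin/sh
# failed_ipc_calls: read strace-style lines on stdin and print the calls whose
# name starts with sem or shm and whose reported result is -1.
# Assumption: each call sits on one line in the form  name(args) = result ...
failed_ipc_calls() {
    grep -E '^(sem|shm)[a-z_0-9]*\(.*\)[[:space:]]*=[[:space:]]*-1'
}

# Made-up sample lines for illustration; they are not from the real trace.
printf '%s\n' \
    'close(3)= 0' \
    'semget(1234, 1, 0) = -1 ENOSPC (No space left on device)' \
    'shmat(5, 0, 0) = 1073741824' \
    'semop(7, {...}, 1) = -1 EINVAL (Invalid argument)' |
    failed_ipc_calls
```

Filtering on -1 rather than on any non-zero value avoids false hits from calls such as shmat or semget, which normally return an address or an identifier. Any error name or number that turns up can then be looked up in /usr/include/sys/errno.h, as suggested above.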
RE: UV command failing mystery
>From: Vance Dailey
>It was suggested that I try to run dg_strace. I ran it on one of the failing
>uv processes. It generated a 1 MB file. I can see where it executes "uvsh"
>and it fails just after the 7th occurrence of "RUN APP.PROGS PACKAGE.INS".
>(The 7th run is just after the string "SPECIAL.EDITOR.SELECT.DATA\OLONG".) I
>have no idea how to read this file but I thought it might help identify
>where the error occurred. I have included the very end of the output of
>dg_strace below:
> ...
>close(3)= 0
>sigaction_svr3(SIGQUIT, {...}, {...}) = 1253
>sigaction_svr3(SIGNULL, {...}, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}) = 2130681856
...

This tool seems to be showing you the system calls that uvsh is making and the values returned from them (the bit after the "="). The section you have shown is simply the program trying to tidy up and exit after detecting something it didn't like. You'll need to look higher up in the output for a system call which seems to return an error code. Unfortunately, you need to know what sort of system calls should return 0 all the time and which ones regularly return other values.

I think I'd be looking at calls to sem...() or shm...() functions that return non-zero and then using errmsg (if DG/UX has that, or vi-ing /usr/include/sys/errno.h if it doesn't) to see what the error numbers returned mean, and man to interpret from that where the problem lies.

I can't remember the exact numbers you quoted earlier, but certainly with your user counts I'd be very suspicious of the reductions you made to the semaphore kernel parameters. Just as a matter of interest, why were these reductions made?

HTH,
Ken
RE: UV command failing mystery
It was suggested that I try to run dg_strace. I ran it on one of the failing uv processes. It generated a 1 MB file. I can see where it executes "uvsh", and it fails just after the 7th occurrence of "RUN APP.PROGS PACKAGE.INS". (The 7th run is just after the string "SPECIAL.EDITOR.SELECT.DATA\OLONG".) I have no idea how to read this file but I thought it might help identify where the error occurred. I have included the very end of the output of dg_strace below:

    , {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 134688924}], 8192) = 8192
    close(3)= 0
    sigaction_svr3(SIGQUIT, {...}, {...}) = 1253
    sigaction_svr3(SIGNULL, {...}, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}) = 2130681856
    sigaction_svr3(SIGINT, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}, {0x74706f2f, [HUP INT QUIT ILL ABRT KILL SEGV PIPE ALRM TERM CLD PWR URG POLL STOP CONT TTIN TTOU VTALRM XCPU ??? XCPU XFSZ PROF ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? DGTIMER3 DGTIMER4], SA_RESTART|SA_SIGINFO|SA_RESETHAND|SA_ONSTACK|SA_NOCLDSTOP|SA_NOCLDWAIT|0x72707520}) = 0
    wait([WIFSIGNALED(s) && WTERMSIG(s) == SIG???]) = 20325
    wait4(134508240,
    --- SIGCLD ---
    <... wait4 resumed> [WIFSIGNALED(s) && WTERMSIG(s) == SIG???], WEXITED|WTRAPPED, NULL) = 20325
    _exit(1)= ?
    +++ Exited with status 1 +++
    Process 20284 detached
    root imm #
RE: UV command failing mystery
I don't think it's a license issue. analyze.shm -c reports the correct number of licenses (1065) while we only have 700 users at peak times. Also, when the uv command fails it returns to the shell almost instantly with no message. echo $? returns a 1 (one). Still, that is a difference in the code executed by a terminal user and a phantom. Perhaps the command fails in the process of checking the licenses?

Thanks,
Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Timothy Snyder
Sent: Friday, February 06, 2004 11:55 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

> It appears that Phantoms can always get into Universe but Terminal
> sessions sometimes can not.

Is there more information besides the fact that they "can't get in"? It could mean that you're exhausting all of the available licenses. Terminal sessions will consume a license while phantoms will not.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
RE: UV command failing mystery
It looks like the command is "dg_strace" for DG/UX. My problem is that I don't understand the output yet. It looks like it may be a good tool to know how to use.

Thanks,
Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Gerry Maddock
Sent: Friday, February 06, 2004 11:42 AM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery

If you're running Redhat, Mandrake, or Fedora, the command is strace
RE: UV command failing mystery
I don't think it's a license issue. analyze.shm -c reports the correct number of licenses (1065) while we only have 700 users at peak times. Also, when the uv command fails it returns to the shell almost instantly with no message. echo $? returns a 1 (one). Still, that is a difference in the code executed by a terminal user and a phantom. Perhaps the command fails in the process of checking the licenses?

Thanks,
Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Hennessey, Mark F.
Sent: Friday, February 06, 2004 11:51 AM
To: U2 Users Discussion List
Subject: RE: UV command failing mystery

> It appears that Phantoms can always get into Universe but Terminal
> sessions sometimes can not.

For what it's worth, phantoms do not consume licenses, while terminals do...
Re: UV command failing mystery
> It appears that Phantoms can always get into Universe but Terminal
> sessions sometimes can not.

Is there more information besides the fact that they "can't get in"? It could mean that you're exhausting all of the available licenses. Terminal sessions will consume a license while phantoms will not.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
RE: UV command failing mystery
> It appears that Phantoms can always get into Universe but Terminal
> sessions sometimes can not.

For what it's worth, phantoms do not consume licenses, while terminals do...
UV command failing mystery
(If this is a duplicate post I apologize - my email generated an error when I sent it the first time.)

We have a new observation. We wrote two processes. One is a Universe process which starts up a new phantom every 60 seconds. The other is a ProComm script which logs in a terminal session every 60 seconds. Based on the results last night, it appears that Phantoms can always get into Universe but Terminal sessions sometimes can not.

Also, the ProComm script disproved that everyone is affected by the problem when it occurs. There are times when many (if not most) users are affected and other times when few (if any) users are affected. The script also made it clear that the problem is happening more often than we realized.

Does this new observation suggest something we should be looking at?

Thanks once again,
Vance Dailey
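A periodic probe of the kind described above might look roughly like the following in shell. This is not the ProComm script or the phantom launcher from the thread, only a sketch with made-up function names; the probed command is passed in as an argument, and the demo uses false as a stand-in for a failing uv login.

```shell
#!/bin/sh
# probe_once LOGFILE CMD [ARGS...]: run CMD once and append a timestamped
# line with its exit status to LOGFILE.
probe_once() {
    log=$1
    shift
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') status=$status cmd=$*" >> "$log"
}

# probe_loop LOGFILE INTERVAL COUNT CMD [ARGS...]: call probe_once COUNT
# times, sleeping INTERVAL seconds between attempts.
probe_loop() {
    log=$1 interval=$2 count=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_once "$log" "$@"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Demo: two probes one second apart with a stand-in command that always fails.
demo_log=${TMPDIR:-/tmp}/uv_probe.$$
probe_loop "$demo_log" 1 2 false
cat "$demo_log"
rm -f "$demo_log"
```

Logging the exit status next to a timestamp makes it possible to see afterwards how often and when the failures cluster. Since the thread reports that phantoms were not affected, a probe like this would presumably need to run from a real terminal login to see the problem.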
RE: UV command failing mystery
If you're running Redhat, Mandrake, or Fedora, the command is strace

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vance Dailey
Sent: Friday, February 06, 2004 11:36 AM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery

I checked out the link you provided. It appears that the command dg_strace may be what you are suggesting. I have not had time to try it but it looks very interesting. Not having used a command like this before, I may have a bit of a learning curve. Once I do try it on a failure I will let you know what I find. Thanks for the tip.

Vance Dailey
RE: UV command failing mystery
I checked out the link you provided. It appears that the command dg_strace may be what you are suggesting. I have not had time to try it but it looks very interesting. Not having used a command like this before, I may have a bit of a learning curve. Once I do try it on a failure I will let you know what I find. Thanks for the tip.

Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Dave Kimmel
Sent: Wednesday, February 04, 2004 11:08 PM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

On Feb 4, 2004, at 4:38 PM, Vance Dailey wrote:

> Well, it did not take long to get a chance to test trying 'uvsh'. The
> problem reared its ugly head again today and we now know that uvsh
> also does not work. The command returns to the shell so quickly that
> whatever the problem is it must occur near the start of the program.
> If you happen to know what the uvsh does first it may help me figure
> out where the problem may lie. I would have thought that an error of
> this sort would have been recorded in a log file but I have not
> stumbled across one yet.

Does your system have a command to trace system calls? You can use this to see what UniVerse is doing (at a very low level) - it may help you find the cause of this problem.

As for finding the command, the various unix flavors all seem to call it something slightly different, but the Rosetta Stone may be able to help you:

http://bhami.com/rosetta.html

Look under the "tracing utility" item.

--
Dave Kimmel
[EMAIL PROTECTED]
RE: UV command failing mystery
That's a good point. To the best of my knowledge we don't have another (non Universe) application running on the system which uses semaphores or shared memory. We are going to make the changes over the weekend. Given how often the problem is happening now I should know if the problem is fixed by Monday evening.

Thanks again,
Vance Dailey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Timothy Snyder
Sent: Thursday, February 05, 2004 6:43 PM
To: U2 Users Discussion List
Subject: RE: UV command failing mystery

It's quite possible that the other processes you're launching aren't using semaphores or shared memory, and therefore wouldn't be impacted if those resources were the cause of the problem.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
RE: UV command failing mystery
I was mistaken. echo $? is returning a 1 (one). Running echo $? from a script seems to report the status of the script, not of the command that ran before the script. Sorry for any confusion.

Vance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Lee Leitner
Sent: Wednesday, February 04, 2004 9:16 PM
To: [EMAIL PROTECTED]
Subject: RE: UV command failing mystery

Vance: Is echo $? returning anything meaningful after uvsh fails?

Lee
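The effect described here is ordinary shell behaviour: $? has to be read in the same shell that ran the command, before anything else runs. A minimal sketch, in which the helper name run_and_report is made up and false stands in for the failing uv command:

```shell
#!/bin/sh
# run_and_report CMD [ARGS...]: run a command and print its exit status.
# The status is saved in the same shell, as part of the same command line,
# so nothing else can overwrite $? first.
run_and_report() {
    status=0
    "$@" || status=$?
    echo "exit status of '$*': $status"
}

# 'false' is a stand-in for the failing uv command; it always exits with 1.
run_and_report false

# A separate script runs in a new shell whose own $? starts at 0, whatever
# the previous command in the calling shell returned.
false || sh -c 'echo "status seen inside a new shell: $?"'
```

The first line of output reports 1 and the second reports 0, which matches the earlier reading of 0 from the diagnostic script and the corrected value of 1 when echo $? is typed directly at the prompt.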
RE: UV command failing mystery
It's quite possible that the other processes you're launching aren't using semaphores or shared memory, and therefore wouldn't be impacted if those resources were the cause of the problem.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]

[EMAIL PROTECTED] (Vance Dailey)
Sent by: [EMAIL PROTECTED]
02/05/2004 06:36 PM
Please respond to U2 Users Discussion List

To: "'U2 Users Discussion List'" <[EMAIL PROTECTED]>
cc:
Subject: RE: UV command failing mystery

I am going to change back a few parameters I changed in December. Still, the problem seems to be very selective as it only affects users going into Universe. I am going to change SEMMNI from 1024 to 2048 and SEMMSL from 256 to 1024 and SHMMNI from 1024 to 2048.

Thanks for the suggestion. I'll post with the results.

Vance
RE: UV command failing mystery
I am going to change back a few parameters I changed in December. Still, the problem seems to be very selective as it only affects users going into Universe. I am going to change SEMMNI from 1024 to 2048 and SEMMSL from 256 to 1024 and SHMMNI from 1024 to 2048.

Thanks for the suggestion. I'll post with the results.

Vance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Glenn Herbert
Sent: Thursday, February 05, 2004 10:21 AM
To: U2 Users Discussion List
Subject: RE: UV command failing mystery

Ok. So it's not an exec/fork problem. My next suspicion would be having to do with either SHMxxx or SEM kernel parameters not being sufficient. In the cases where these resources are inadequate it is possible the OS may boot you out.
RE: UV command failing mystery
Hello Lee,

echo $? has been returning 0 (zero). We have been having the operators run the following script:

    #!/bin/ksh
    echo $? >> /usr/opt/uv/uvproblem/uv.problem
    date >> /usr/opt/uv/uvproblem/uv.problem
    who am i >> /usr/opt/uv/uvproblem/uv.problem
    ps >> /usr/opt/uv/uvproblem/uv.problem
    env >> /usr/opt/uv/uvproblem/uv.problem
    echo "DONE"

A sample of the output generated is:

    0
    Wed Feb 4 22:41:26 EST 2004
    sysops tty1050 Feb 4 22:35
    PID TTY TIME CMD
    8337 1050 0:00 -sh
    8605 1050 0:00 ksh
    8608 1050 0:00 ksh
    8609 1050 0:00 nps
    _=/usr/bin/env
    TMPDIR=/IMM29/tmp
    LANG=C
    NLSPATH=/usr/lib/nls/msg/%L/%N:/etc/nls/msg/%L/%N
    HZ=100
    PATH=/usr/bin:/usr/opt/uv/bin:/usr/local/bin
    LOGNAME=sysops
    MAIL=/var/mail/sysops
    SHELL=/sbin/sh
    HOME=/home/sysops
    SEMSTATS=TRUE
    TERM=wy60
    PWD=/home/sysops
    TZ=:US/Eastern

The /usr/pt/uv/errlog does not show anything interesting either, although the uv command returns to the shell so quickly that I don't think the command gets very far. In addition to uv and uvsh failing, other Universe commands that can be run from Unix (such as fixtool) also fail. If you have any suggestions as to what data we should be capturing I would be appreciative.

Thanks,
Vance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Lee Leitner
Sent: Wednesday, February 04, 2004 9:16 PM
To: [EMAIL PROTECTED]
Subject: RE: UV command failing mystery

Vance: Is echo $? returning anything meaningful after uvsh fails?

Lee

--
Lee J. Leitner, Ph.D.
[EMAIL PROTECTED]
http://www.leitner.org/~leitnerl

The world can only be grasped by action, not by contemplation. The hand is the cutting edge of the mind. -- Jacob Bronowski
V.13.0
RE: UV command failing mystery
Ok. So it's not an exec/fork problem. My next suspicion would be that either the SHMxxx or SEMxxx kernel parameters are not sufficient. In cases where these resources are inadequate, it is possible the OS may boot you out.

At 06:38 PM 02/04/2004, you wrote:
>Well, it did not take long to get a chance to test trying 'uvsh'. The
>problem reared its ugly head again today and we now know that uvsh also
>does not work. The command returns to the shell so quickly that whatever
>the problem is it must occur near the start of the program. If you happen
>to know what the uvsh does first it may help me figure out where the
>problem may lie. I would have thought that an error of this sort would
>have been recorded in a log file but I have not stumbled across one yet.
>
>Thanks again,
>Vance Dailey
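One way to test this suspicion is to compare live IPC usage with the configured limits while the failure is happening. The sketch below counts entries in ipcs output. It assumes the SysV-style layout in which each entry line begins with a type letter (s for a semaphore set, m for a shared memory segment); the layout on any particular system should be checked first, and the function name count_ipc is hypothetical.

```shell
#!/bin/sh
# count_ipc TYPE: read ipcs output on stdin and count entries of TYPE,
# where TYPE is the first-column letter (s = semaphore set, m = shared memory).
# Assumes SysV-style ipcs output; other layouts need a different match.
count_ipc() {
    awk -v t="$1" '$1 == t { n++ } END { print n + 0 }'
}

# Example use during a failure window:
#   ipcs -s | count_ipc s    (compare the count with SEMMNI)
#   ipcs -m | count_ipc m    (compare the count with SHMMNI)
```

If either count sits at or near its limit when uv starts failing, that would support the resource-exhaustion theory; if both are far below, the cause is probably elsewhere.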
Re: UV command failing mystery
On Feb 4, 2004, at 4:38 PM, Vance Dailey wrote:
>Well, it did not take long to get a chance to test trying 'uvsh'. The
>problem reared its ugly head again today and we now know that uvsh also
>does not work. The command returns to the shell so quickly that whatever
>the problem is it must occur near the start of the program. If you happen
>to know what the uvsh does first it may help me figure out where the
>problem may lie. I would have thought that an error of this sort would
>have been recorded in a log file but I have not stumbled across one yet.

Does your system have a command to trace system calls? You can use this to see what UniVerse is doing (at a very low level) - it may help you find the cause of this problem.

As for finding the command, the various unix flavors all seem to call it something slightly different, but the Rosetta Stone may be able to help you:

http://bhami.com/rosetta.html

Look under the "tracing utility" item.

--
Dave Kimmel
[EMAIL PROTECTED]
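As a rough illustration of the suggestion above, the sketch below looks for a system call tracer on PATH and shows how it might be invoked. The candidate names truss and strace are assumptions (they are the usual names on Solaris and Linux respectively); which tracer, if any, exists on DG/UX is not confirmed here. The helper name first_available and the trace file path are hypothetical.

```shell
#!/bin/sh
# first_available NAME...: print the first NAME found on PATH, or fail.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Candidate tracer names are assumptions; adjust for the local system.
if tracer=$(first_available truss strace); then
    echo "found tracer: $tracer"
    # Example invocation, to be run in a failing session:
    #   $tracer -f -o /tmp/uvsh.trace uvsh
else
    echo "no known tracer on PATH"
fi
```

The last few system calls in the trace file before the process exits are usually the interesting ones, for example a failed semget, shmat, or open.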
RE: UV command failing mystery
Vance: Is echo $? returning anything meaningful after uvsh fails?

Lee

--
Lee J. Leitner, Ph.D.
[EMAIL PROTECTED]
http://www.leitner.org/~leitnerl

The world can only be grasped by action, not by contemplation. The hand is the cutting edge of the mind. -- Jacob Bronowski

V.13.0
RE: UV command failing mystery
Well, it did not take long to get a chance to test trying 'uvsh'. The problem reared its ugly head again today and we now know that uvsh also does not work. The command returns to the shell so quickly that, whatever the problem is, it must occur near the start of the program. If you happen to know what uvsh does first, it may help me figure out where the problem may lie. I would have thought that an error of this sort would have been recorded in a log file, but I have not stumbled across one yet.

Thanks again,
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Glenn Herbert
Sent: Wednesday, February 04, 2004 10:13 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

The 'uv' command is basically a small front end to the 'uvsh' executable, which is really the guts of universe. The uv command does little more (for universe) than check ulimit settings, increasing them when necessary, and then issue an execve() call to uvsh. If you bypass 'uv' and just use 'uvsh', does the problem still occur?

Your universe configurables seem reasonable, as well as your kernel params, but I do recall an issue whereby the OS would disallow exec's (and/or forks) due to system resource exhaustion, although it has been a while and I cannot recall the details.
RE: UV command failing mystery
I wasn't saying that the parameter changes were necessarily the cause of the problem - it was just a casual observation. However, we now see that you've upgraded your system and in the process decreased the number of processors and the amount of memory by half. Again, this may not be the cause of the problem, since it was apparently occurring before the modifications. But the changes may be exacerbating a previously existing problem. It may have something to do with the fact that there are now more users on the system, but if the problem also occurs with only 10 users, that doesn't seem likely.

Until there are more footprints left by the problem as it's occurring, it's difficult to say what's causing it. Making the parameter changes seems to be the best choice at this point. If it doesn't resolve the problem, at least you know one more thing that isn't the cause. Find enough things that aren't the problem, and hopefully it will be easier to figure out which of the remaining possibilities is the gremlin. (Calculatus Eliminatus, for students of Seuss.)

Good luck and thanks for keeping us informed as this progresses.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]

[EMAIL PROTECTED] (Vance Dailey)
Sent by: [EMAIL PROTECTED]
02/04/2004 12:36 PM
Please respond to U2 Users Discussion List

To: "'U2 Users Discussion List'" <[EMAIL PROTECTED]>
cc:
Subject: RE: UV command failing mystery

We upgraded our system from a 16 processor AV2 to an 8 processor AV25000. Because each block of 4 CPUs can only support 4 GB of memory, our upgrade was going to cut our memory in half. So, the main purpose of the changes was to cut the memory requirements of the universe lock tables. And in fact we successfully cut the per user memory usage in half. The other changes were made as part of the overall review prior to the upgrade.

I intend to set a few of the Unix settings back to their original values this weekend, but if I understand the parameters it's hard to see how the current values could be causing the problem.

I should point out that the uv problem occurred a few times prior to the system upgrade, so I have discounted the possibility that the upgrade could be the problem. We are running more users now and the frequency of the problem has increased over the last couple of weeks. However, the problem happens at random times, sometimes with 10 users logged in, and at other times with 700 users logged in.

I'll let everyone know if changing the parameters back helps the problem, but I am not real optimistic since we only experienced the problem a few times in the month following the changes but we are now getting the problem an average of once a day.

Thanks for responding. All suggestions are welcome.

Vance Dailey
RE: UV command failing mystery
We upgraded our system from a 16 processor AV2 to an 8 processor AV25000. Because each block of 4 CPUs can only support 4 GB of memory, our upgrade was going to cut our memory in half. So, the main purpose of the changes was to cut the memory requirements of the universe lock tables. And in fact we successfully cut the per user memory usage in half. The other changes were made as part of the overall review prior to the upgrade.

I intend to set a few of the Unix settings back to their original values this weekend, but if I understand the parameters it's hard to see how the current values could be causing the problem.

I should point out that the uv problem occurred a few times prior to the system upgrade, so I have discounted the possibility that the upgrade could be the problem. We are running more users now and the frequency of the problem has increased over the last couple of weeks. However, the problem happens at random times, sometimes with 10 users logged in, and at other times with 700 users logged in.

I'll let everyone know if changing the parameters back helps the problem, but I am not real optimistic since we only experienced the problem a few times in the month following the changes but we are now getting the problem an average of once a day.

Thanks for responding. All suggestions are welcome.

Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Timothy Snyder
Sent: Wednesday, February 04, 2004 11:44 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

Just curious as to why so many parameters were reduced on a running system. Were you exhausting some sort of resource? I wouldn't imagine that most of these parameters would have caused problems at their original values, unless you're working with an incredibly small system. I generally don't like to change ANYTHING downward - especially on a functional production system - unless there's a compelling reason.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
Re: UV command failing mystery
>The only thing that may be suspicious is some changes we made to
>some kernel and UV config settings a few weeks prior to the first reported
>problem. The following changes were made:
>
>(KERNEL)
>SDELSIM 2048 TO 256
>SEMOPM 100 TO 64
>SEMUME 1024 TO 64
>SHMMNI 4096 TO 2048
>SEMMNI 4096 TO 2048
>
>(UV CONFIG)
>MFILES 56 TO 200
>T30FILE 8000 TO 200 (we have no dynamic files)
>FSEMNUM 101 TO 50
>GSEMNUM 211 TO 97
>GLTABSZ 150 TO 75
>RLTABSZ 150 TO 75
>MAXRLOCK 100 TO 74

Just curious as to why so many parameters were reduced on a running system. Were you exhausting some sort of resource? I wouldn't imagine that most of these parameters would have caused problems at their original values, unless you're working with an incredibly small system. I generally don't like to change ANYTHING downward - especially on a functional production system - unless there's a compelling reason.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
RE: UV command failing mystery
Thanks for the suggestion. We will try uvsh the next time we have the failure.

Last night we had the problem occur for 50 minutes. Because it continued for so long we were able to have users in our various locations make multiple attempts to log in from multiple PCs. Surprisingly, some locations claimed they could not log in at all, while others claimed that some PCs could be logged in while others could not. Either our assumption that the problem affects all users is false, or the problem occurred many times but each time for only a short period.

Once again users already in Universe noticed no problems, and no unusual locks were noted when we used the command "list.readu every". "analyze.shm -s" does not show any unusually high numbers of Collisions.

Are there any log files we should be checking?

We are more puzzled than ever.

Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Glenn Herbert
Sent: Wednesday, February 04, 2004 10:13 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

The 'uv' command is basically a small front end to the 'uvsh' executable, which is really the guts of universe. The uv command does little more (for universe) than check ulimit settings, increasing them when necessary, and then issue an execve() call to uvsh. If you bypass 'uv' and just use 'uvsh', does the problem still occur?

Your universe configurables seem reasonable, as well as your kernel params, but I do recall an issue whereby the OS would disallow exec's (and/or forks) due to system resource exhaustion, although it has been a while and I cannot recall the details.
Re: UV command failing mystery
The 'uv' command is basically a small front end to the 'uvsh' executable, which is really the guts of universe. The uv command does little more (for universe) than check ulimit settings, increasing them when necessary, and then issue an execve() call to uvsh. If you bypass 'uv' and just use 'uvsh', does the problem still occur?

Your universe configurables seem reasonable, as well as your kernel params, but I do recall an issue whereby the OS would disallow exec's (and/or forks) due to system resource exhaustion, although it has been a while and I cannot recall the details.

At 05:23 PM 02/03/2004, you wrote:
>We are having a very strange intermittent problem with the UV command not
>working from Unix.
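Since uv starts by checking ulimit settings, one quick thing to compare in a failing login shell is the descriptor limit against what UniVerse needs. The resolution post in this thread says the soft descriptor limit was set to exceed MFILES plus the 8 internal UV files. The sketch below encodes that comparison; the function name enough_fds is hypothetical, and this is not a description of what uv itself does internally.

```shell
#!/bin/sh
# enough_fds LIMIT MFILES: succeed if LIMIT exceeds MFILES + 8.
# The "+ 8" comes from the resolution post (8 internal UV files).
enough_fds() {
    limit=$1
    need=$(( $2 + 8 ))
    # A limit reported as "unlimited" always suffices.
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -gt "$need" ]
}

# Example: compare this shell's soft descriptor limit with MFILES=200.
if enough_fds "$(ulimit -n)" 200; then
    echo "descriptor limit looks sufficient"
else
    echo "descriptor limit may be too low"
fi
```

Running this in a session where uv fails and in one where it works would show whether the two shells differ in their descriptor limits.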
UV command failing mystery
We are having a very strange intermittent problem with the UV command not working from Unix. Occasionally, after a user logs into Unix (without noticing anything unusual), typing "UV" simply returns the user to the UNIX shell almost instantly. When the problem occurs it seems to affect everyone who logs in and attempts to go into Universe for a period of time, and then the problem seems to resolve itself. Any users who logged into Unix during that period of time still cannot go into Universe, but new logins work fine.

The problem seems to be with Universe. Unix commands work fine, and when we have tried executing other Universe commands which normally can be run from Unix they fail also. The Unix login script seems to run fine. When the problem occurs users already in Universe notice no problems. No unusual locks or performance problems have been noticed. The problem does not seem to be load related since it happens at apparently random times, including times when very few users are logged in.

We have been running 9.6.2.2 on DG/UX for several years and have never had the problem until the last couple of months. The only thing that may be suspicious is some changes we made to some kernel and UV config settings a few weeks prior to the first reported problem. The following changes were made:

(KERNEL)
SDELSIM 2048 TO 256
SEMOPM 100 TO 64
SEMUME 1024 TO 64
SHMMNI 4096 TO 2048
SEMMNI 4096 TO 2048

(UV CONFIG)
MFILES 56 TO 200
T30FILE 8000 TO 200 (we have no dynamic files)
FSEMNUM 101 TO 50
GSEMNUM 211 TO 97
GLTABSZ 150 TO 75
RLTABSZ 150 TO 75
MAXRLOCK 100 TO 74

Any help solving this puzzle would be greatly appreciated.

Thanks,
Vance Dailey
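For reference when reading the list above: the resolution post in this thread cites the UV documentation's rule that SEMMSL should be at least FSEMNUM + GSEMNUM + 5. A small sketch of that arithmetic for the two UV configurations listed follows; the function name min_semmsl is hypothetical.

```shell
#!/bin/sh
# min_semmsl FSEMNUM GSEMNUM: print the minimum SEMMSL implied by the rule
# SEMMSL >= FSEMNUM + GSEMNUM + 5 quoted in the resolution post.
min_semmsl() {
    echo $(( $1 + $2 + 5 ))
}

min_semmsl 101 211   # original UV config values; prints 317
min_semmsl 50 97     # reduced UV config values; prints 152
```

The second figure matches the total of 152 reported in the resolution post.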