UV command failing mystery - RESOLUTION
I had the following kernel parameters changed over the weekend:

SDELSIM 256 to 1024
SEMOPM 64 to 100
SEMUME 64 to 1024
SHMMNI 2048 to 4096
SEMMNI 2048 to 4096
SEMMSL 256 to 1024

And the problem has not occurred since!

I have reviewed the documentation on each of the above parameters in hopes of determining which parameter was set incorrectly.

SDELSIM - The default number of file descriptors per process. (Referred to as NOFILES in the UV documentation.) I had set SDELSIM to exceed MFILES plus the 8 internal UV files. HDESLIM (the maximum number of file descriptors a user is allowed to have open) was left at the default value of 1024.

SEMOPM - The maximum number of semaphore operations per semop(2) call. Although it is not a parameter mentioned in the UV documentation, I had concluded from some Unix documentation that the value of 100 was unnecessarily large and had reduced it to 64. The default value is 10.

SEMUME - The maximum number of undo entries per undo structure. This is also not a parameter mentioned in the UV documentation; a reading (or misreading) of the Unix documentation led me to conclude that the value was unnecessarily large. Since I have now learned that the default value is 1024, this change is suspect.

SHMMNI - The maximum number of shared memory segments that may exist in the system at one time. The UV documentation recommends setting this to a number greater than the maximum number of concurrent Universe users plus 2. Our UV user count has not exceeded 700 users in the last few months.

SEMMNI - The maximum number of semaphore sets. The UV documentation states that Universe requires 2 semaphore sets. The ipcs -s command reports that our system uses 7.

SEMMSL - The maximum number of semaphores per set. The UV documentation states that this should be at least equal to FSEMNUM + GSEMNUM + 5. On our system that total is 152.

I am still not sure which parameter(s) were the source of the problem, and I intend to leave well enough alone, so I may never know for sure. In any case, I hope that if someone else encounters the same problem they can benefit from my experience.

I want to thank everyone who offered help and took the time to make suggestions.

Vance Dailey

-----Original Message-----
From: Vance Dailey [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 03, 2004 5:23 PM
To: '[EMAIL PROTECTED]'
Subject: UV command failing mystery

--
u2-users mailing list
[EMAIL PROTECTED]
http://www.oliver.com/mailman/listinfo/u2-users
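The three sizing rules quoted from the UV documentation in the resolution message can be checked mechanically. What follows is only a minimal Python sketch under stated assumptions: the helper name check_uv_kernel_params and the dictionary layout are made up for illustration and are not part of UniVerse or DG/UX, and the numbers are the ones reported in this thread (FSEMNUM 50 and GSEMNUM 97 from uv.config, a peak of 700 users).

```python
def check_uv_kernel_params(kernel, uvconfig, peak_users):
    """Return (parameter, actual, required_minimum, ok) tuples.

    Hypothetical helper: the rules are the ones quoted in this thread
    from the UV documentation, not an official validation tool.
    """
    rules = [
        # SEMMSL should be at least FSEMNUM + GSEMNUM + 5.
        ("SEMMSL", uvconfig["FSEMNUM"] + uvconfig["GSEMNUM"] + 5),
        # SHMMNI should be greater than (concurrent users + 2),
        # so the smallest acceptable integer is users + 3.
        ("SHMMNI", peak_users + 3),
        # UniVerse itself is documented as needing 2 semaphore sets.
        ("SEMMNI", 2),
    ]
    return [(name, kernel[name], minimum, kernel[name] >= minimum)
            for name, minimum in rules]


# Values taken from the thread.
uvconfig = {"FSEMNUM": 50, "GSEMNUM": 97}
reduced = {"SEMMSL": 256, "SHMMNI": 2048, "SEMMNI": 2048}
restored = {"SEMMSL": 1024, "SHMMNI": 4096, "SEMMNI": 4096}

for label, kernel in (("reduced", reduced), ("restored", restored)):
    for name, actual, minimum, ok in check_uv_kernel_params(kernel, uvconfig, 700):
        print(label, name, actual, ">=", minimum, "OK" if ok else "TOO SMALL")
```

With the figures from this thread, both the reduced and the restored values pass these three checks. That fits the fact that the documentation alone did not identify the culprit: SEMOPM and SEMUME, the two parameters with no UV rule to check against, are not covered here.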
RE: UV command failing mystery
We upgraded our system from a 16 processor AV2 to an 8 processor AV25000. Because each block of 4 CPUs can only support 4 GB of memory, our upgrade was going to cut our memory in half. So the main purpose of the changes was to cut the memory requirements of the Universe lock tables, and in fact we successfully cut the per-user memory usage in half. The other changes were made as part of the overall review of both uv.config and the kernel parameters prior to the upgrade.

Because several people suggested that the semaphore and/or shared memory changes could be the cause of the problem, I had them changed back to their original values yesterday. So far we have not had a failure, but it's a little too soon to celebrate.

Thanks for the advice on using the trace. I was not able to pick out the system calls, and before I make another attempt I think I will wait and see if the weekend's changes fixed the problem. If the problem is solved I will post, in case someone else runs into the same problem in the future.

Thanks,
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Ken Wallis
Sent: Sunday, February 08, 2004 7:52 PM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery

From: Vance Dailey

It was suggested that I try to run dg_strace. I ran it on one of the failing uv processes. It generated a 1 MB file. I can see where it executes uvsh, and it fails just after the 7th occurrence of RUN APP.PROGS PACKAGE.INS. (The 7th run is just after the string SPECIAL.EDITOR.SELECT.DATA\OLONG.) I have no idea how to read this file but I thought it might help identify where the error occurred. I have included the very end of the output of dg_strace below:

...
close(3)= 0
sigaction_svr3(SIGQUIT, {...}, {...}) = 1253
sigaction_svr3(SIGNULL, {...}, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}) = 2130681856
...

This tool seems to be showing you the system calls that uvsh is making and the values returned from them (the bit after the =). The section you have shown is simply the program trying to tidy up and exit after detecting something it didn't like. You'll need to look higher up in the output for a system call which seems to return an error code. Unfortunately, you need to know which system calls should return 0 all the time and which ones regularly return other values.

I think I'd be looking at calls to sem...() or shm...() functions that return non-zero, then using errmsg (if DG/UX has that, or vi-ing /usr/include/sys/errno.h if it doesn't) to see what the returned error numbers mean, and man to interpret from that where the problem lies.

I can't remember the exact numbers you quoted earlier, but certainly with your user counts I'd be very suspicious of the reductions you made to the semaphore kernel parameters. Just as a matter of interest, why were these reductions made?

HTH,
Ken
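Ken's suggestion (scan the trace for sem...() and shm...() calls that report an error, then look up the error number) can be partly automated. The Python below is a rough sketch, not a dg_strace companion tool. It assumes that every call sits on one line in the shape `name(args) = value`, as in the excerpt quoted in this message, and that a failed call shows a negative return value; how dg_strace actually reports failures was not shown in the thread. The function names are invented for illustration.

```python
import errno
import os
import re

# Assumed line shape, based on the excerpt in this thread:
#   name(arguments) = return_value
CALL_RE = re.compile(r"^(?P<name>\w+)\(.*\)\s*=\s*(?P<ret>-?\d+|\?)")


def suspicious_ipc_calls(lines, prefixes=("sem", "shm")):
    """Yield (line_number, call_name, return_value) for semaphore or
    shared-memory calls whose return value is negative."""
    for number, line in enumerate(lines, start=1):
        match = CALL_RE.match(line.strip())
        if not match or match.group("ret") == "?":
            continue  # not a completed call line
        name = match.group("name")
        ret = int(match.group("ret"))
        if name.startswith(prefixes) and ret < 0:
            yield number, name, ret


def describe_errno(code):
    """Return (symbolic_name, message) for an errno value as defined on
    the machine running this script, which may differ from DG/UX."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
```

A typical use would be `list(suspicious_ipc_calls(open("trace.out")))`, where trace.out is a placeholder file name. Because errno numbering is platform specific, describe_errno is only trustworthy when run on the same system that produced the trace; otherwise /usr/include/sys/errno.h on that system remains the reference, as Ken says.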
RE: UV command failing mystery
I checked out the link you provided. It appears that the command dg_strace may be what you are suggesting. I have not had time to try it, but it looks very interesting. Not having used a command like this before, I may have a bit of a learning curve. Once I do try it on a failure I will let you know what I find.

Thanks for the tip.
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Dave Kimmel
Sent: Wednesday, February 04, 2004 11:08 PM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

On Feb 4, 2004, at 4:38 PM, Vance Dailey wrote:

Well, it did not take long to get a chance to test trying 'uvsh'. The problem reared its ugly head again today and we now know that uvsh also does not work. The command returns to the shell so quickly that whatever the problem is, it must occur near the start of the program. If you happen to know what uvsh does first, it may help me figure out where the problem may lie. I would have thought that an error of this sort would have been recorded in a log file, but I have not stumbled across one yet.

Does your system have a command to trace system calls? You can use this to see what UniVerse is doing (at a very low level) - it may help you find the cause of this problem.

As for finding the command, the various Unix flavors all seem to call it something slightly different, but the Rosetta Stone may be able to help you:

http://bhami.com/rosetta.html

Look under the tracing utility item.

--
Dave Kimmel
[EMAIL PROTECTED]
RE: UV command failing mystery
I don't think it's a license issue. analyze.shm -c reports the correct number of licenses (1065), while we only have 700 users at peak times. Also, when the uv command fails it returns to the shell almost instantly with no message. echo $? returns a 1 (one).

Still, that is a difference in the code executed by a terminal user and a phantom. Perhaps the command fails in the process of checking the licenses?

Thanks,
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Hennessey, Mark F.
Sent: Friday, February 06, 2004 11:51 AM
To: U2 Users Discussion List
Subject: RE: UV command failing mystery

<snip>
It appears that Phantoms can always get into Universe but Terminal sessions sometimes can not.
</snip>

For what it's worth, phantoms do not consume licenses, while terminals do...
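The exit-status check described above (echo $? returning 1 after a failed uv) could also be scripted, so that intermittent failures are recorded together with whatever the command printed. This is a small Python sketch, not anything supplied with UniVerse. The helper name run_and_report is made up, and it is only an assumption that running uv or uvsh with no terminal input reproduces the failure seen by interactive logins.

```python
import subprocess


def run_and_report(argv, timeout=30):
    """Run a command without terminal input and return
    (exit_status, stdout_text, stderr_text).

    On POSIX a negative exit_status of -N means the child was
    terminated by signal N. A command that is still running after
    `timeout` seconds raises subprocess.TimeoutExpired.
    """
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # never wait on keyboard input
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout, completed.stderr
```

For example, `status, out, err = run_and_report(["uvsh"])` would capture the same status that echo $? shows, plus any message written to standard error that scrolls past unnoticed at a terminal.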
RE: UV command failing mystery
It looks like the command is dg_strace for DG/UX. My problem is that I don't understand the output yet. It looks like it may be a good tool to know how to use.

Thanks,
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Gerry Maddock
Sent: Friday, February 06, 2004 11:42 AM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery

If you're running Red Hat, Mandrake, or Fedora, the command is strace.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Vance Dailey
Sent: Friday, February 06, 2004 11:36 AM
To: 'U2 Users Discussion List'
Subject: RE: UV command failing mystery
RE: UV command failing mystery
I don't think it's a license issue. analyze.shm -c reports the correct number of licenses (1065), while we only have 700 users at peak times. Also, when the uv command fails it returns to the shell almost instantly with no message. echo $? returns a 1 (one).

Still, that is a difference in the code executed by a terminal user and a phantom. Perhaps the command fails in the process of checking the licenses?

Thanks,
Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Timothy Snyder
Sent: Friday, February 06, 2004 11:55 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery

It appears that Phantoms can always get into Universe but Terminal sessions sometimes can not.

Is there more information besides the fact that they can't get in? It could mean that you're exhausting all of the available licenses. Terminal sessions will consume a license while phantoms will not.

Tim Snyder
IBM Data Management Solutions
Consulting I/T Specialist, U2 Professional Services
[EMAIL PROTECTED]
RE: UV command failing mystery
It was suggested that I try to run dg_strace. I ran it on one of the failing uv processes. It generated a 1 MB file. I can see where it executes uvsh, and it fails just after the 7th occurrence of RUN APP.PROGS PACKAGE.INS. (The 7th run is just after the string SPECIAL.EDITOR.SELECT.DATA\OLONG.) I have no idea how to read this file but I thought it might help identify where the error occurred. I have included the very end of the output of dg_strace below:

, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 0}, {NULL, 134688924}], 8192) = 8192
close(3)= 0
sigaction_svr3(SIGQUIT, {...}, {...}) = 1253
sigaction_svr3(SIGNULL, {...}, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}) = 2130681856
sigaction_svr3(SIGINT, {0xc0a0d, [XCPU XFSZ], SA_RESTART|SA_SIGINFO}, {0x74706f2f, [HUP INT QUIT ILL ABRT KILL SEGV PIPE ALRM TERM CLD PWR URG POLL STOP CONT TTIN TTOU VTALRM XCPU ??? XCPU XFSZ PROF ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? ??? DGTIMER3 DGTIMER4], SA_RESTART|SA_SIGINFO|SA_RESETHAND|SA_ONSTACK|SA_NOCLDSTOP|SA_NOCLDWAIT|0x72707520}) = 0
wait([WIFSIGNALED(s) WTERMSIG(s) == SIG???]) = 20325
wait4(134508240, <unfinished ...>
--- SIGCLD ---
<... wait4 resumed> [WIFSIGNALED(s) WTERMSIG(s) == SIG???], WEXITED|WTRAPPED, NULL) = 20325
_exit(1)= ?
+++ Exited with status 1 +++
Process 20284 detached
root imm #
Re: UV command failing mystery
The 'uv' command is basically a small front end to the 'uvsh' executable, which is really the guts of Universe. The uv command does little more (for Universe) than check the ulimit settings, increase them when necessary, and then issue an execve() call to uvsh. If you bypass 'uv' and just use 'uvsh', does the problem still occur?

Your Universe configurables seem reasonable, as do your kernel params, but I do recall an issue whereby the OS would disallow execs (and/or forks) due to system resource exhaustion, although it has been a while and I cannot recall the details.

At 05:23 PM 02/03/2004, you wrote:

We are having a very strange intermittent problem with the UV command not working from Unix.
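The pattern described above (check a resource limit, raise it if needed, then exec the real program) looks roughly like the following in Python. This is an illustrative sketch only and not the actual source of uv. The function names are invented, and only the open-file limit is handled, on the assumption that it is the limit relevant to the NOFILES discussion elsewhere in this thread.

```python
import os
import resource


def raised_limits(soft, hard, wanted):
    """Return the (soft, hard) pair to request so that the soft limit
    reaches `wanted`, capped at the hard limit. Pure function, so it
    can be checked without touching the real process limits."""
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft, hard          # already high enough
    return wanted, hard


def launch(path, argv, wanted_nofile):
    """Raise the soft open-file limit if necessary, then replace the
    current process with the target program (like execve)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_limits = raised_limits(soft, hard, wanted_nofile)
    if new_limits != (soft, hard):
        resource.setrlimit(resource.RLIMIT_NOFILE, new_limits)
    os.execv(path, argv)           # does not return on success
```

As a worked case with numbers from this thread: MFILES 200 plus the 8 internal UV files gives 208 descriptors, so `raised_limits(64, 1024, 208)` asks for a soft limit of 208, while a soft limit of 256 is left alone. If the exec itself fails, os.execv raises OSError, which is the kind of failure Glenn recalls happening under resource exhaustion.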
RE: UV command failing mystery
Thanks for the suggestion. We will try uvsh the next time we have the failure.

Last night we had the problem occur for 50 minutes. Because it continued for so long, we were able to have users in our various locations make multiple attempts to log in from multiple PCs. Surprisingly, some locations claimed they could not log in at all, while others claimed that some PCs could log in and others could not. Either our assumption that the problem affects all users is false, or the problem occurred many times but each time for only a short period.

Once again, users already in Universe noticed no problems, and no unusual locks were noted when we used the command list.readu every. analyze.shm -s does not show any unusually high numbers of collisions. Are there any log files we should be checking?

We are more puzzled than ever.

Vance Dailey

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Glenn Herbert
Sent: Wednesday, February 04, 2004 10:13 AM
To: U2 Users Discussion List
Subject: Re: UV command failing mystery
RE: UV command failing mystery
Vance:

Is echo $? returning anything meaningful after uvsh fails?

Lee

--
Lee J. Leitner, Ph.D.
[EMAIL PROTECTED]
http://www.leitner.org/~leitnerl

The world can only be grasped by action, not by contemplation. The hand is the cutting edge of the mind. -- Jacob Bronowski
UV command failing mystery
We are having a very strange intermittent problem with the UV command not working from Unix. Occasionally, after a user logs into Unix (without noticing anything unusual), typing UV simply returns the user to the Unix shell almost instantly. When the problem occurs it seems to affect everyone who logs in and attempts to go into Universe for a period of time, and then the problem seems to resolve itself. Any users who logged into Unix during that period still cannot go into Universe, but new logins work fine.

The problem seems to be with Universe. Unix commands work fine, and when we have tried executing other Universe commands which can normally be run from Unix, they fail also. The Unix login script seems to run fine. When the problem occurs, users already in Universe notice no problems. No unusual locks or performance problems have been noticed. The problem does not seem to be load related, since it happens at apparently random times, including times when very few users are logged in.

We have been running 9.6.2.2 on DG/UX for several years and never had the problem until the last couple of months. The only thing that may be suspicious is some changes we made to kernel and UV config settings a few weeks prior to the first reported problem. The following changes were made:

(KERNEL)
SDELSIM 2048 TO 256
SEMOPM 100 TO 64
SEMUME 1024 TO 64
SHMMNI 4096 TO 2048
SEMMNI 4096 TO 2048

(UV CONFIG)
MFILES 56 TO 200
T30FILE 8000 TO 200 (we have no dynamic files)
FSEMNUM 101 TO 50
GSEMNUM 211 TO 97
GLTABSZ 150 TO 75
RLTABSZ 150 TO 75
MAXRLOCK 100 TO 74

Any help solving this puzzle would be greatly appreciated.

Thanks,
Vance Dailey