On Sat, Nov 1, 2008 at 2:08 PM, Quayle, Bill <[EMAIL PROTECTED]> wrote:
> I found a segmentation violation in /var/dt/Xerrors that precedes each
> termination by what looks to be 1.5 - 2 hours (but it may also just not
> be time-stamped). I'm not sure what process is dying.

The Xerrors messages come from Sun Ray X servers that have crashed
because of a segmentation violation. That is, they tried to access a
memory address that is outside the process's address space, and that
earned them a SIGSEGV (signal 11).

If you have a support contract then you should open a call. The
support folks should be able to tell you whether those dumps match a
known bug and, if so, what the status of the fix is.

> We'll look at two users on two different machines, user 'foo' is on
> 'sr03c', user 'bar' is on 'sr01a'.
>
> From /var/opt/SUNWut/messages-
>
> ==user 'foo'==
> Oct 31 16:33:13 sr03c utauthd: [ID 794400 user.info] SessionManager0 NOTICE: EMPTY: ACTIVE session
> Oct 31 16:33:13 sr03c gconfd (foo-589): [ID 702911 user.info] Received signal 15, shutting down cleanly
> Oct 31 16:33:13 sr03c gconfd (foo-589): [ID 702911 user.info] Exiting
>
> ==user 'bar'==
> ...

These look like normal logout messages to me.

> Fri Oct 31 14:59:37 2008
> info (pid 801): Rescanning both config and servers files

This is normal. It's just 'dtlogin' noting that it's been asked to
rescan its configuration files, probably because SRSS has added (or
removed) a session and wants 'dtlogin' to start (or stop) managing
that session's X server.

> Signal 11 received! (pid 29924)
> pc = 0x3425C
> npc = 0x34260
> mem_catch at 0xFE18E200
> Machine context:
> FE901003 0003425C 00034260 00000000 000DE000 00033E5C 00033C00 C0000000
> 0000034E 00000000 FF212A00 2097FFFE 00401400 00000004 00000004 00818EA0
> 00000000 FFBFE670 00034260 00000000 00E1EE80 0086F498 00E1EE80 0086F498
> 0143C320 088A4BB0 04B2DEC0 0179E450 0086F498 00B48718 00E1E4D0 00E90F08
> 088A4BB0 044E2890 0179E450 0143C320 0251C7B8 04B2DEC0 41978000 00000000
> 40979797 97979798 3FB548B6 AB580104 00000000 00000000 40240000 00000000
> 40340000 00000000 40700000 00000000 00000000 00000820 00080100 00000000
> 78727300 FFBFE4F8 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000

This isn't normal; it's the X server crashing. The data here is enough
to give someone a chance of figuring out what happened. You're correct
that there's no time stamp, so there's no knowing how much time passed
since the previous message.

I think (I'm not certain) the handler that writes this data then goes
on to try to leave a core file, but because the X server is a setgid
process, by default the system won't collect the core. You can override
that by using the 'coreadm' command, and that'll let you collect a
snapshot of the failed process, which should make it easier to figure
out what happened than just having the hex dump above. If the dump
doesn't match a known bug then I expect the service guys will ask you
to try 'coreadm'.

OttoM.
__
ottomeister

Disclaimer: These are my opinions. I do not speak for my employer.
_______________________________________________
SunRay-Users mailing list
[email protected]
http://www.filibeto.org/mailman/listinfo/sunray-users
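
Since the crash records in /var/dt/Xerrors carry no time stamp, it can help to at least pull out which X server pids crashed and where. What follows is only a minimal Python sketch, not a supported tool. It assumes every crash record looks like the one quoted in this thread (a 'Signal 11 received! (pid N)' header followed by 'pc = ' and 'npc = ' lines); that layout has not been checked against other SRSS releases. The function name find_crashes and the SAMPLE text are made up for illustration.

```python
import re

# Crash header as it appears in the excerpt quoted in this thread.
# Assumption: the "(pid N)" part may sit on the same line or the next one.
HEADER = re.compile(r"Signal (\d+) received!\s*\(pid (\d+)\)")

# "pc = 0x..." and "npc = 0x..." lines, one register per line.
REGISTER = re.compile(r"^\s*(pc|npc)\s*=\s*(0x[0-9A-Fa-f]+)\s*$", re.MULTILINE)


def find_crashes(text):
    """Return one dict per crash record found in Xerrors-style text."""
    crashes = []
    headers = list(HEADER.finditer(text))
    for i, match in enumerate(headers):
        # A record runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[match.end():end]
        record = {"signal": int(match.group(1)), "pid": int(match.group(2))}
        for name, value in REGISTER.findall(body):
            # Keep the first pc/npc seen in this record.
            record.setdefault(name, int(value, 16))
        crashes.append(record)
    return crashes


# Hypothetical sample built from the lines quoted in this thread.
SAMPLE = """\
Fri Oct 31 14:59:37 2008
info (pid 801): Rescanning both config and servers files
Signal 11 received! (pid 29924)
pc = 0x3425C
npc = 0x34260
mem_catch at 0xFE18E200
"""

if __name__ == "__main__":
    for crash in find_crashes(SAMPLE):
        print(crash)
```

To run it against a real log, pass the file contents instead of SAMPLE, for example find_crashes(open("/var/dt/Xerrors").read()). The pid list can then be compared with the session pids in /var/opt/SUNWut/messages to work out whose X server died.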
