On Sat, Nov 1, 2008 at 2:08 PM, Quayle, Bill <[EMAIL PROTECTED]> wrote:
> I found a segmentation violation in /var/dt/Xerrors that precedes each
> termination by what looks to be 1.5 - 2 hours (but it may also just not
> be time-stamped).  I'm not sure what process is dying.

The Xerrors messages come from Sun Ray X servers that have crashed
because of a segmentation violation.  That is, they've tried to access a
memory address that is outside the process's address space, and that's
earned them a SIGSEGV (signal 11).  If you have a support contract then
you should open a call.  The support folks should be able to tell you
whether those dumps match a known bug and if so what the status of
the fix is.
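
To make "signal 11" concrete, here is a minimal Python sketch of how a
parent process sees a child that dies from SIGSEGV.  It's only an
illustration: the child sends the signal to itself instead of making a
real bad memory access, and 'run_and_report' and 'CRASHER' are names I
made up for the example.  They have nothing to do with SRSS or the X server.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run a command and describe how it ended (POSIX only)."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX a negative return code means "terminated by that signal".
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {proc.returncode}"


# Hypothetical stand-in for a crashing program: the child simply sends
# SIGSEGV to itself, so it dies the same way a real segfault would.
CRASHER = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

if __name__ == "__main__":
    print(run_and_report(CRASHER))
```

The X server's own handler is what writes the "Signal 11 received!" text
into Xerrors; the sketch only shows the signal from the outside.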

> We'll look at two users on two different machines, user 'foo' is on
> 'sr03c', user 'bar' is on 'sr01a'.
>
> From /var/opt/SUNWut/messages-
>
> ==user 'foo'==
> Oct 31 16:33:13 sr03c utauthd: [ID 794400 user.info] SessionManager0 NOTICE: EMPTY: ACTIVE session
> Oct 31 16:33:13 sr03c gconfd (foo-589): [ID 702911 user.info] Received signal 15, shutting down cleanly
> Oct 31 16:33:13 sr03c gconfd (foo-589): [ID 702911 user.info] Exiting
>
> ==user 'bar'==
> ...

These look like normal logout messages to me.

> Fri Oct 31 14:59:37 2008
> info (pid 801): Rescanning both config and servers files

This is normal.  It's just 'dtlogin' noting that it's been asked to rescan its
configuration files, probably because SRSS has added (or removed) a
session and wants 'dtlogin' to start (or stop) managing that session's
X server.

> Signal 11 received! (pid 29924)
> pc = 0x3425C
> npc = 0x34260
> mem_catch at 0xFE18E200
> Machine context:
> FE901003 0003425C 00034260 00000000 000DE000 00033E5C 00033C00 C0000000
> 0000034E 00000000 FF212A00 2097FFFE 00401400 00000004 00000004 00818EA0
> 00000000 FFBFE670 00034260 00000000 00E1EE80 0086F498 00E1EE80 0086F498
> 0143C320 088A4BB0 04B2DEC0 0179E450 0086F498 00B48718 00E1E4D0 00E90F08
> 088A4BB0 044E2890 0179E450 0143C320 0251C7B8 04B2DEC0 41978000 00000000
> 40979797 97979798 3FB548B6 AB580104 00000000 00000000 40240000 00000000
> 40340000 00000000 40700000 00000000 00000000 00000820 00080100 00000000
> 78727300 FFBFE4F8 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000

This isn't normal; it's the X server crashing.  The data here is enough
to give someone a chance of figuring out what happened.  You're
correct that there's no time stamp, so there's no knowing how much
time passed since the previous message.
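
If you want to pull the interesting fields out of Xerrors before opening
a call, something like the following Python sketch might help.  It
assumes every crash record starts with the three lines shown in your
excerpt ("Signal N received! (pid P)", "pc = ...", "npc = ...") and that
the file itself doesn't carry the "> " quote prefixes.  'find_crashes'
and 'SAMPLE' are hypothetical names, and I haven't checked this against
other Xerrors formats.

```python
import re

# Sample taken from the excerpt above, without the email quote markers.
SAMPLE = """\
Signal 11 received! (pid 29924)
pc = 0x3425C
npc = 0x34260
mem_catch at 0xFE18E200
"""

# Assumed layout: signal/pid line, then pc, then npc on consecutive lines.
CRASH_RE = re.compile(
    r"Signal (?P<signal>\d+) received! \(pid (?P<pid>\d+)\)[ \t]*\n"
    r"pc = (?P<pc>0x[0-9A-Fa-f]+)[ \t]*\n"
    r"npc = (?P<npc>0x[0-9A-Fa-f]+)"
)


def find_crashes(text):
    """Return one dict per crash record found in the text."""
    crashes = []
    for m in CRASH_RE.finditer(text):
        crashes.append({
            "signal": int(m.group("signal")),
            "pid": int(m.group("pid")),
            "pc": int(m.group("pc"), 16),    # int() accepts the 0x prefix in base 16
            "npc": int(m.group("npc"), 16),
        })
    return crashes


if __name__ == "__main__":
    for crash in find_crashes(SAMPLE):
        print("pid %(pid)d got signal %(signal)d at pc=%(pc)#x" % crash)
```

The pid and pc values are the bits support is most likely to ask for
when comparing your crash against known bugs.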

I think (I'm not certain) the handler that writes this data then goes on to
try to leave a core file, but because the X server is a setgid process
the system won't write the core by default.  You can override that by
using the 'coreadm' command, and that'll let you collect a snapshot of
the failed process, which should make it easier to figure out what
happened than just having the hex dump above.  If the dump doesn't
match a known bug then I expect the service guys will ask you to try
'coreadm'.
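
As a rough guide to what that might involve, here's a Python sketch that
only prints the commands; it doesn't run anything.  I'm assuming the
Solaris 10 style 'coreadm' options ('-g' for the global core pattern,
'-e' to enable 'global', 'global-setid' and 'log'), and /var/core is
just a directory I picked for the example.  Check coreadm(1M) on your
release, and use whatever settings support asks for if they differ.

```python
import shlex


def coreadm_commands(core_dir="/var/core"):
    """Build (but do not run) commands to enable global setid core dumps.

    core_dir is a hypothetical location; pick one with enough free space.
    """
    # %f expands to the executable's file name and %p to the process ID.
    pattern = f"{core_dir}/core.%f.%p"
    return [
        ["mkdir", "-p", core_dir],
        ["coreadm", "-g", pattern],
        # Enable global cores, cores from setuid/setgid processes, and
        # a syslog message whenever a global core is attempted.
        ["coreadm", "-e", "global", "-e", "global-setid", "-e", "log"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and run by hand as root.
    for argv in coreadm_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Running 'coreadm' with no arguments afterwards shows the current
settings, so you can confirm the change took effect before waiting for
the next crash.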

OttoM.
__
ottomeister

Disclaimer: These are my opinions.  I do not speak for my employer.
_______________________________________________
SunRay-Users mailing list
http://www.filibeto.org/mailman/listinfo/sunray-users
