My advice would be to set up coreadm to capture whichever process is
doing this rather than guessing. [ This works on the assumption that the
process is being sent a SEGV, and hence will dump core if so enabled. ]
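For example (syntax from memory - check coreadm(1M) for the exact options; /var/cores is just a directory I'm assuming you create for this):

    # mkdir -p /var/cores
    # coreadm -g /var/cores/core.%f.%p -e global -e log

The %f/%p expand to the executable name and PID, and the log option makes syslog note each global core dump attempt - which is exactly the identification you're after.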
I would also advise against changing a global value for the sake of one
(misbehaving?) application. I would much rather identify the process,
then either log a bug (if this is errant behaviour), understand what
configuration is necessary to reduce the stack usage, or provide a
per-process change to the limit (e.g. a shell script wrapper to start
the process) rather than blindly setting a global variable.
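As a rough sketch of such a wrapper (the 20480 figure and the daemon path are made up - substitute whatever the application actually needs):

    #!/bin/sh
    # Raise the stack limit (in KB) for this one process only,
    # then replace the shell with the real program.
    ulimit -s 20480
    exec /path/to/the/daemon "$@"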
My suspicion is that there is a recursive function somewhere which is
consuming the stack segment, causing the limit to be reached and the
process to be terminated. Simply raising the limit will give it more
headroom, but won't actually move your problem any further forward - the
core dumps (if any) will simply be larger :-)
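If a core does land, running pstack over it should make any runaway recursion obvious - the same frame repeated over and over. Something like (the core file name here is purely hypothetical):

    # pstack /var/cores/core.Xorg.1234 | head -40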
Obviously this is speculation, but I would suggest enabling coreadm (with
global cores and core logging) at least temporarily to catch the
process. Searching the /var/svc/log files may also reveal a process
which died unexpectedly, perhaps with the service simply being restarted.
If you suspect Xorg, then start with the cde-login or gdm SMF services
and see whether anything was reported there.
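Something along these lines may turn it up (the grep pattern is only a guess at what the restarter writes, and the gdm log name is from memory - list /var/svc/log to find the exact file):

    # svcs -xv
    # grep -li core /var/svc/log/*.log
    # tail -50 /var/svc/log/application-graphical-login-gdm:default.log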
Regards,
Brian
Mike DeMarco wrote:
Thanks for your post Brian:
It is hard to tell which process is throwing this error, as it is only identified by its PID, and since the process dies and the svc watcher attempts to restart it, the PID changes each time.
From /var/adm/messages, around the time this error is generated Xorg is also trying to start, so my best guess is that it is Xorg that wants more stack size.
Is there a way to increase the global max-stack-size above its default of 10Meg?
/etc/project does not do this. I have found that /etc/project is not working
properly even under Solaris 10 u5 & u6; I am working this through Sun support now.
I'm guessing here, but: the limit probably comes from the shell-imposed
stack limit (default 10MB / 10240KB). This is normally put in place to
catch bad apps or recursive functions that don't stop recursing. It helps
stop one process quickly consuming vast amounts of RAM (OK, a simplistic
approach, I know).
The question is - what is the app, and why does it need more than 10MB
for stack?
If it really does need that much, then it may need to have the stack
limit increased (perhaps a shell script wrapper with the appropriate
"ulimit -s" command, or maybe there's a more clever way to do this now
:-) ).
One word of caution, especially for 32-bit processes: don't be too
liberal with the setting for the stack limit. Because of the location of
the stack and libraries within the process virtual address space, the
stack limit effectively removes that amount of memory from the process.
In a 32-bit process, only 4GB is available, and setting the stack limit
to 1GB would actually leave only approx 3GB for the process to use
(despite the stack only being a few KB in size).
I would suggest looking at what the process is, and seeing whether there
is a software fault there first, before fiddling with the stack limit.
coreadm should help you catch what the process is (as out of stack
generates SIGSEGV, I believe) if you don't already know.
Regards,
Brian
--
Brian Ruthven Sun Microsystems UK
Solaris Revenue Product Engineering Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org