[osol-code] Re: Signal questions

Andrew Tue, 03 Jan 2006 11:14:41 -0800

Michael Shapiro wrote:
> You can use the /proc PCSTOP and PCRUN directives for this.
Thanks; that appears to be what I need.
But the man pages and source code are confusing me. While reading them I feel 
like a rat in a maze. Every time I read about one thing to learn how it works, 
my preconceived notions about the other things it depends on turn out to be 
wrong, and I have to go read about those other things as well.
Repeat ad infinitum.
For example I tried to decipher how SIGSTOP, SIGCONT, PCSTOP, and PCRUN interact 
when they are used together, including when and how they set various 
statuses, and I failed utterly. I get the impression that it's not possible to 
comprehend the answer without also learning how the entire kernel works.


It appears that SIGSTOP sets lwpstatus.pr_why to PR_SIGNALLED, whereas PCSTOP 
and PCDSTOP set it to PR_REQUESTED, and that these are mutually exclusive. But 
the man page says "If PCSTOP or PCDSTOP is applied to a thread that is stopped, 
but not because of an event of interest, the stop directive takes effect when 
the thread is restarted by the competing mechanism; at that time the thread 
enters a PR_REQUESTED stop before executing any user-level code." and I don't 
see where that pending stop directive is recorded. For example, if the 
scheduler is the competing mechanism and has set lwpstatus.pr_why to 
PR_SIGNALLED, and I then apply PCDSTOP, and the scheduler then tries to set the 
lwp runnable, it has to know somehow that I previously applied PCDSTOP, so that 
it can set lwpstatus.pr_why to PR_REQUESTED instead of setting the lwp runnable.
While trying to figure out how, I noticed that the flags available for 
lwpstatus.pr_flags contain various flavors of stopped, but no runnable. So 
there's another wrong preconceived notion: that the same structure which 
contains a flag to mark an lwp as stopped would also contain a flag to mark it 
as runnable.
Then I try to read prcontrol.c/pr_stop(), and after a few minutes I just give 
up, even though it's only 64 lines long. This isn't a comment on the quality of 
the Solaris source code, just a comment that it seems futile to try to 
understand any part of the code without understanding all of it.
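
For anyone following along, here is a rough sketch of what driving PCSTOP and 
PCRUN through the /proc control file might look like. It is written in Python 
only to keep it short. The numeric values of the PC* codes are assumptions 
taken from memory of <sys/procfs.h> and should be checked against the local 
header; ctl_message() and stop_then_run() are hypothetical helper names, and 
stop_then_run() can only work on a system that has the Solaris-style 
/proc/<pid>/ctl file.

```python
import os
import struct

# ASSUMED values of the control codes; verify against <sys/procfs.h>.
PCSTOP = 1   # direct the process to stop and wait for it to stop
PCDSTOP = 2  # direct the process to stop, without waiting
PCRUN = 5    # make the process runnable again; takes a long flags operand


def ctl_message(code, operand=None):
    """Pack one control message: a long code, optionally followed by a long operand."""
    if operand is None:
        return struct.pack("l", code)
    return struct.pack("ll", code, operand)


def stop_then_run(pid):
    """Stop a process with PCSTOP, then restart it with PCRUN (Solaris /proc only)."""
    fd = os.open("/proc/%d/ctl" % pid, os.O_WRONLY)
    try:
        os.write(fd, ctl_message(PCSTOP))    # returns once the target has stopped
        # ... the stopped process could be inspected here ...
        os.write(fd, ctl_message(PCRUN, 0))  # 0 = no special run flags
    finally:
        os.close(fd)
```

The sketch does not answer the question of where a pending stop directive is 
recorded; it only shows the shape of the messages written to the ctl file.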


> > And if there is such a mechanism, then is there a way to tell the kernel's
> > scheduler to use it instead of sigstop and sigcont for a particular process,
> > so that the process thinks that it runs without ever being preempted by the
> > scheduler?
> 
> Not sure what you're trying to do,
The sigcont signal sent by the scheduler is a side channel via which some 
information about system timing can leak to a process which is not authorized 
to have such information.
The system can deny the process access to system clocks and timers and the 
network so that the process can't time its own execution, but the process could 
still gain some (limited) timing information by using the arrival of sigcont 
signals as a crude timer. So in order to deny the process access to any timing 
information, the system must refrain from sending it such signals when it's 
scheduled to run.
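
To make the concern concrete, here is a minimal sketch of a process that has no 
clock but still learns something from SIGCONT. It assumes that something sends 
a SIGCONT on each resume; in this sketch the parent plays that role explicitly 
with SIGSTOP and SIGCONT, so it says nothing about what the scheduler itself 
does. count_resumes() is a hypothetical name.

```python
import os
import signal
import time


def count_resumes(rounds=3):
    """Stop and continue a child `rounds` times; return how many SIGCONTs it reported."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the "untrusted" process. Its only timing source is SIGCONT.
        try:
            os.close(read_end)
            signal.signal(signal.SIGCONT,
                          lambda signum, frame: os.write(write_end, b"c"))
            os.write(write_end, b"r")   # tell the parent the handler is installed
            while True:
                time.sleep(0.01)        # idle; handlers run between sleeps
        finally:
            os._exit(0)

    # Parent: plays the part of whatever stops and resumes the child.
    os.close(write_end)
    os.read(read_end, 1)                # wait for the child's "ready" byte
    seen = 0
    try:
        for _ in range(rounds):
            os.kill(pid, signal.SIGSTOP)
            os.waitpid(pid, os.WUNTRACED)   # block until the child has stopped
            os.kill(pid, signal.SIGCONT)
            if os.read(read_end, 1) == b"c":
                seen += 1                   # the child noticed this resume
    finally:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        os.close(read_end)
    return seen
```

Each byte the child writes corresponds to one resume it was able to observe, 
which is the "crude timer" in question.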


> but I could imagine some inventive use of DTrace that would achieve this:
> DTrace's stop() action is equivalent to a PCSTOP, so you could write a DTrace
> script to hit a probe either on descheduling a process with particular
> attributes or every so often, hit it with a stop, and then have another
> control process later wake it up.
I don't understand how this would achieve my goal of allowing a process to run 
as usual and allowing it to be periodically preempted as usual so that other 
processes can run concurrently, but making the preemption be completely 
undetectable so that the process thinks that it runs continuously in one 
contiguous timeslice, so that it can't correlate individual timeslices with its 
own execution progress.


> you need to make sure you've got a shell which supports it or use /bin/kill.
> Many shells have kill built-ins which may not support the exact same syntax.
That was my problem; both /bin/sh and /bin/bash have a built-in kill. Not 
only is their syntax different from /bin/kill, they're even different from each 
other.
Somehow it seems wrong to have redundant functionality with the same name in 
the same place (the shell's command namespace) with subtly different syntax. 
This is a feature which I would be proud to include in a user interface if the 
goal were intentional obfuscation. Naturally, the shells' built-in kill 
commands are documented in the man pages, so technically I would not have been 
confused if I'd just bothered to RTFM.
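
One way to sidestep the differing kill syntaxes, sketched here in Python, is to 
resolve the signal name yourself and call kill(2) directly instead of going 
through any shell. resolve_signal() and send_signal() are hypothetical helper 
names, and the accepted spellings are my own choice, not those of any 
particular kill implementation.

```python
import os
import signal


def resolve_signal(name_or_number):
    """Map 'STOP', 'SIGSTOP', 'sigstop', 9 or '9' to a signal number."""
    text = str(name_or_number).strip().upper()
    if text.isdigit():
        return int(text)
    if not text.startswith("SIG"):
        text = "SIG" + text
    return int(signal.Signals[text])   # raises KeyError for unknown names


def send_signal(pid, name_or_number):
    """Send a signal via kill(2), with no shell built-in involved."""
    os.kill(pid, resolve_signal(name_or_number))
```

For example, send_signal(pid, "STOP") followed later by send_signal(pid, 
"CONT") behaves the same no matter which shell launched the script.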

> This is effectively what gcore() does
[snip]
Thanks again; that's what I need.
