Re: Someone help me understand this...?

2003-08-30 Thread Jilles Tjoelker
On Thu, Aug 28, 2003 at 11:34:09AM -0400, Robert Watson wrote:
   Clearly, unbreaking applications like Diablo by default is desirable.  At
   least OpenBSD has similar protections to these turned on by default, and
   possibly other systems as well.  As 5.x sees more broad use, we may well
   bump into other cases where applications have similar behavior: they rely
   on no special protections once they've given up privilege.  I wonder if
   Diablo can run unmodified on OpenBSD; it could be they don't include
   SIGALRM on the list of protected-against signals, or it could be that they
   modify Diablo for their environment to use an alternative signaling
   mechanism.  Another alternative to this patch would simply be to add
   SIGALRM to the list of acceptable signals to deliver in the
   privilege-change case.

OpenBSD does not consider a process 'tainted' if it changes credentials
while running. From the issetugid(2) manpage:

The status of issetugid() is only affected by execve().

 In most cases, fail-stop is a reasonable behavior for unexpected security
 behavior from the system, but ignore is likely to shoot you later. :-)  I
 tend to wrap even kill() calls as uid 0 in an assertion check, just to be
 on the safe side.  If nothing else, it helps detect the case where the
 other process has died, and you're using a stale pid.  It's particularly
 useful if the other process has died, the pid has been reused, and it's
 now owned by another user, which is a real-world case where kill() as a
 non-0 uid can fail even when you're sure it can't :-). 

This can be avoided by careful programming: do not use SA_NOCLDWAIT and
don't pass pids to kill() when they have been returned by wait() or
similar functions. If the process has terminated in between, it's a
zombie. In that case, FreeBSD probably returns ESRCH but SUSv3 mandates
returning success (but performing no action).
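
For illustration, a minimal sketch of the kind of checked kill() wrapper
discussed above.  The helper name checked_kill() is hypothetical (it is
not from Diablo or FreeBSD); it only reports the errno that kill(2) sets,
so that a stale or reused pid is noticed rather than silently ignored.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send signum to pid and report why delivery failed.
 * Returns 0 on success, otherwise the errno value set by kill(2).
 */
static int
checked_kill(pid_t pid, int signum)
{
	int error;

	if (kill(pid, signum) == 0)
		return (0);

	error = errno;		/* save before stdio can overwrite it */
	switch (error) {
	case ESRCH:
		/* No such process: the pid is stale (already reaped). */
		fprintf(stderr, "kill(%ld, %d): stale pid\n",
		    (long)pid, signum);
		break;
	case EPERM:
		/* Pid reused by another user, or a policy restriction. */
		fprintf(stderr, "kill(%ld, %d): not permitted\n",
		    (long)pid, signum);
		break;
	default:
		fprintf(stderr, "kill(%ld, %d): %s\n",
		    (long)pid, signum, strerror(error));
		break;
	}
	return (error);
}
```

A caller that prefers fail-stop behavior could wrap the call in
assert(checked_kill(pid, SIGTERM) == 0); this narrows the window in which a
stale pid goes unnoticed but does not remove the pid-reuse race itself.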

Jilles Tjoelker
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Someone help me understand this...?

2003-08-30 Thread Robert Watson

On Sat, 30 Aug 2003, Jilles Tjoelker wrote:

 On Thu, Aug 28, 2003 at 11:34:09AM -0400, Robert Watson wrote:
Clearly, unbreaking applications like Diablo by default is desirable.  At
least OpenBSD has similar protections to these turned on by default, and
possibly other systems as well.  As 5.x sees more broad use, we may well
bump into other cases where applications have similar behavior: they rely
on no special protections once they've given up privilege.  I wonder if
Diablo can run unmodified on OpenBSD; it could be they don't include
SIGALRM on the list of protected-against signals, or it could be that they
modify Diablo for their environment to use an alternative signaling
mechanism.  Another alternative to this patch would simply be to add
SIGALRM to the list of acceptable signals to deliver in the
privilege-change case.
 
 OpenBSD does not consider a process 'tainted' if it changes credentials
 while running. From the issetugid(2) manpage: 
 
 The status of issetugid() is only affected by execve().

In OpenBSD, two flags are used to represent the credential change notion: 
P_SUGIDEXEC, and P_SUGID.  issetugid() checks the first of these, but
signal delivery checks P_SUGID.  P_SUGIDEXEC is set during execve().  In
FreeBSD, we have a combined notion used by both, since the same
protections generally apply.  You can find a comment comparing our use of
P_SUGID to the OpenBSD approach in our issetugid() implementation:

/*
 * Note: OpenBSD sets a P_SUGIDEXEC flag set at execve() time,
 * we use P_SUGID because we consider changing the owners as
 * tainting as well.   
 * This is significant for procs that start as root and become
 * a user without an exec - programs cannot know *everything*
 * that libc *might* have put in their data segment.
 */

Regarding specific signals: inspection of the OpenBSD implementation
reveals that the following signals are permitted in the P_SUGID case,
assuming a reasonable credential match:

case 0:
case SIGKILL:
case SIGINT:
case SIGTERM:
case SIGALRM:
case SIGSTOP:
case SIGTTIN:
case SIGTTOU:
case SIGTSTP:
case SIGHUP:
case SIGUSR1:
case SIGUSR2:

In FreeBSD, we permit:

case 0:
case SIGKILL:
case SIGINT:
case SIGTERM:
case SIGSTOP:
case SIGTTIN:
case SIGTTOU:
case SIGTSTP:
case SIGHUP:
case SIGUSR1:
case SIGUSR2:

So they permit SIGALRM in addition to the signals we support.  In light of
this thread, I think it would be reasonable to add SIGALRM to our list as
well.
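
The two lists above can be summarised as a small userland model.  This is
only a sketch: sugid_signal_permitted() and its target_sugid parameter are
hypothetical names standing in for the kernel's test of P_SUGID on the
target process, and the ordinary credential match is assumed to be checked
separately.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the P_SUGID signal filter described above.
 * target_sugid stands in for the kernel's (p_flag & P_SUGID) test.
 * Returns true if the signal passes this filter; the usual credential
 * match is a separate check and is not modelled here.
 */
static bool
sugid_signal_permitted(bool target_sugid, int signum)
{
	if (!target_sugid)
		return (true);	/* no extra restriction applies */

	switch (signum) {
	case 0:
	case SIGKILL:
	case SIGINT:
	case SIGTERM:
	case SIGALRM:		/* on the OpenBSD list; proposed for FreeBSD */
	case SIGSTOP:
	case SIGTTIN:
	case SIGTTOU:
	case SIGTSTP:
	case SIGHUP:
	case SIGUSR1:
	case SIGUSR2:
		return (true);
	default:
		return (false);
	}
}
```

With SIGALRM in the list, a signal such as SIGCHLD is still filtered for a
target that has changed credentials, while a target that has not changed
credentials is unaffected by the filter.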

  In most cases, fail-stop is a reasonable behavior for unexpected security
  behavior from the system, but ignore is likely to shoot you later. :-)  I
  tend to wrap even kill() calls as uid 0 in an assertion check, just to be
  on the safe side.  If nothing else, it helps detect the case where the
  other process has died, and you're using a stale pid.  It's particularly
  useful if the other process has died, the pid has been reused, and it's
  now owned by another user, which is a real-world case where kill() as a
  non-0 uid can fail even when you're sure it can't :-). 
 
 This can be avoided by careful programming: do not use SA_NOCLDWAIT and
 don't pass pids to kill() when they have been returned by wait() or
 similar functions. If the process has terminated in between, it's a
 zombie. In that case, FreeBSD probably returns ESRCH but SUSv3 mandates
 returning success (but performing no action). 

There's still a race possible here, it just becomes more narrow with
conservative programming.  And in the classic use of pids for signalling
(/var/run/foo.pid, or kill -9 pid), these approaches won't help.  The only
way to close this sort of race is to have a notion of a unique process
identifier that lasts beyond the lifetime of the process itself -- i.e.,
the ability to return EMYSINCERESTREGRESTS if you try to signal a process
after it has died, and have a guarantee that the handle won't be reused. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories



Re: Someone help me understand this...?

2003-08-30 Thread Garrett Wollman
On Sat, 30 Aug 2003 12:23:35 -0400 (EDT), Robert Watson [EMAIL PROTECTED] said:

 The only way to close this sort of race is to have a notion of a
 unique process identifier that lasts beyond the lifetime of the
 process itself -- i.e., the ability to return EMYSINCERESTREGRESTS
 if you try to signal a process after it has died, and have a
 guarantee that the handle won't be reused.

This is traditionally done by holding an advisory lock on the pid
file; if the file is no longer locked, then the process holding the
lock must have exited.
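
A minimal sketch of the pid-file locking idea, assuming flock(2) is
available.  The function names pidfile_hold() and pidfile_is_held() are
hypothetical and not taken from any existing library.

```c
#include <sys/file.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical daemon side: create the pid file, take an exclusive
 * advisory lock, and record our pid.  The returned descriptor must stay
 * open for the life of the daemon; the lock vanishes when it is closed
 * or the process exits.  Returns -1 on failure.
 */
static int
pidfile_hold(const char *path)
{
	char buf[32];
	int fd, len;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (-1);
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		close(fd);	/* someone else already holds it */
		return (-1);
	}
	len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
	if (ftruncate(fd, 0) == -1 || write(fd, buf, len) != len) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/*
 * Hypothetical client side: returns 1 if some process still holds the
 * lock, 0 if the file is unlocked (holder has exited), -1 on error.
 */
static int
pidfile_is_held(const char *path)
{
	int fd, held;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
		flock(fd, LOCK_UN);
		held = 0;
	} else {
		held = (errno == EWOULDBLOCK) ? 1 : -1;
	}
	close(fd);
	return (held);
}
```

A sender would call pidfile_is_held() before reading the pid and calling
kill().  A small window remains between the check and the kill(), so this
narrows the race rather than closing it.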

You could also do it with UUIDs and a more heavyweight signal API.

-GAWollman



Re: Someone help me understand this...?

2003-08-30 Thread Robert Watson

On Thu, 28 Aug 2003, Joe Greco wrote:

Could you confirm this happens with 4.8?  The access control checks there
are substantially different, and I wouldn't expect the behavior you're
seeing on 4.8...
   
   Rather difficult.  I'll see if the client will let me trash a production
   system, but usually people don't like $40K servers handing out a few
   hundred megabits of traffic going out of service.  We were trying to fix
   it on the scratch box (which happens to have 5.1R on it) and then were
   going to see how it fared on the production systems. 
  
  I think it's safe to assume that if you're seeing a similar failure,
  there's a different source given my reading of the code, but I'm willing
  to be proven wrong.  It's probably not worth the investment if you're
  talking about large quantities of money, though.
 
 It's more like large quantities of annoyance and work.  Can you
 describe the case you're envisioning?  If I can easily poke at it, I can
 at least get some clues. 

I guess all I'm looking for is confirmation that your original statement
(happens in 4.8 and 5.1) is completely correct: the 5.1 behavior is
expected, but I'm surprised it happens with 4.8. 

 Correct.  The USR signals control debug levels.  If it was a signal that
 was only used internally, it could be changed, of course, but changing a
 signal used by humans (and one used in the same manner as other
 programs)  is probably a bad idea. 

Try the patch attached, which introduces both the conservative_signals
sysctl, and adds SIGALRM to the list of acceptable signals for P_SUGID
processes.

 Yeah, if anything, we probably don't want to do that, because the
 resources set up as root are usually more attractive.  I don't have a
 problem with coding in some FreeBSD-isms, but I don't see it as buying
 us anything, does it?

I'm not sure there are explicit benefits in this specific situation,
except that you can run Diablo with the resource limits of the user you
configure, and potentially those might be similar to (but perhaps not
identical to) those given to root.  I.e., instead of hard-coding "use the
resource limits of root", you're saying "use the resource limits of the
user Diablo is run as, and set those to what you want".  Given that
heavy-weight news servers are likely to be dedicated machines, it's a
subtle but perhaps useful semantic difference.

Updated patch below.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories

Index: kern_prot.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_prot.c,v
retrieving revision 1.175
diff -u -r1.175 kern_prot.c
--- kern_prot.c	13 Jul 2003 01:22:20 -0000	1.175
+++ kern_prot.c	30 Aug 2003 19:45:50 -0000
@@ -1367,6 +1367,20 @@
 	return (cr_cansee(td->td_ucred, p->p_ucred));
 }
 
+/*
+ * 'conservative_signals' prevents the delivery of a broad class of
+ * signals by unprivileged processes to processes that have changed their
+ * credentials since the last invocation of execve().  This can prevent
+ * the leakage of cached information or retained privileges as a result
+ * of a common class of signal-related vulnerabilities.  However, this
+ * may interfere with some applications that expect to be able to
+ * deliver these signals to peer processes after having given up
+ * privilege.
+ */
+static int conservative_signals = 1;
+SYSCTL_INT(_security_bsd, OID_AUTO, conservative_signals, CTLFLAG_RW,
+    &conservative_signals, 0, "Unprivileged processes prevented from "
+    "sending certain signals to processes whose credentials have changed");
 /*-
  * Determine whether cred may deliver the specified signal to proc.
  * Returns: 0 for permitted, an errno value otherwise.
@@ -1399,12 +1413,13 @@
 	 * bit on the target process.  If the bit is set, then additional
 	 * restrictions are placed on the set of available signals.
 	 */
-	if (proc->p_flag & P_SUGID) {
+	if (conservative_signals && (proc->p_flag & P_SUGID)) {
 		switch (signum) {
 		case 0:
 		case SIGKILL:
 		case SIGINT:
 		case SIGTERM:
+		case SIGALRM:
 		case SIGSTOP:
 		case SIGTTIN:
 		case SIGTTOU:



Re: Someone help me understand this...?

2003-08-28 Thread Bruce Evans
On Wed, 27 Aug 2003, Joe Greco wrote:

 I've got a weirdness with kill(2).

 This code is out of Diablo, the news package, and has been working fine for
 some years.  It apparently works fine on other OS's.

 In the Diablo model, the parent process may choose to tell its children to
 update status via a signal.  The loop basically consists of going through
 and issuing a SIGALRM.

 This stopped working a while ago, don't know precisely when.  I was in the
 process of debugging it today and ran into this.

 The specific OS below is 5.1-RELEASE but apparently this happens on 4.8 as
 well.

Perhaps the children are setuid, the parent doesn't have appropriate
privilege and you are mistaken about this happening under 4.8 as well.
In 5.x since at least rev.1.80 of kern_prot.c, only certain signals
not including SIGALRM can be sent from unprivileged processes to setuid
processes.

This is very UN-unixlike although it is permitted as an
implementation-defined restriction in at least POSIX.1-2001.  It
breaks^Wexposes bugs in some old POSIX test programs and I don't have
many security concerns so I just disable it locally:

%%%
Index: kern_prot.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_prot.c,v
retrieving revision 1.175
diff -u -2 -r1.175 kern_prot.c
--- kern_prot.c	13 Jul 2003 01:22:20 -0000	1.175
+++ kern_prot.c	17 Aug 2003 04:26:00 -0000
@@ -1395,4 +1387,5 @@
 		return (error);
 
+#if 0
 	/*
 	 * UNIX signal semantics depend on the status of the P_SUGID
@@ -1425,4 +1418,5 @@
 		}
 	}
+#endif
 
 	/*
%%%

 Wot?  Why can't I send it a signal?

 I've read kill(2) rather carefully and cannot find the reason.  It says,

  For a process to have permission to send a signal to a process designated
  by pid, the real or effective user ID of the receiving process must match
  that of the sending process or the user must have appropriate privileges
  (such as given by a set-user-ID program or the user is the super-user).

The implementation-defined restrictions are not documented, of course ;-).

 Well, the sending and receiving processes both clearly have equal uid/euid.

 We're not running in a jail, so I don't expect any issues there.

 The parent process did actually start as root and then shed privilege with

 struct passwd *pw = getpwnam("news");
 struct group *gr = getgrnam("news");
 gid_t gid;

 if (pw == NULL) {
     perror("getpwnam('news')");
     exit(1);
 }
 if (gr == NULL) {
     perror("getgrnam('news')");
     exit(1);
 }
 gid = gr->gr_gid;
 setgroups(1, &gid);
 setgid(gr->gr_gid);
 setuid(pw->pw_uid);

 so that looks all well and fine...  so why can't it kill its own children,
 and why can't I kill one of its children from a shell with equivalent
 uid/euid?

Changing the ids is one way to make the process setuid (setuid-on-exec is
another but that doesn't seem to be the problem here).  The relevant setuid
bit (P_SUGID) is normally cleared on exec, but perhaps it isn't here,
either because the children don't exec or the effective ids don't match
the real ids at the time of the exec.

 I know there's been some paranoia about signal delivery and all that, but
 my searching hasn't turned up anything that would explain this.  Certainly
 the manual page ought to be updated if this is a new expected behaviour or
 something...  at least some clue as to why it might fail would be helpful.

Certainly.  It is incomplete even not counting complications for jails
or other implementation-defined restrictions related to appropriate
privilege.

Bruce


Re: Someone help me understand this...?

2003-08-28 Thread Joe Greco
 On Wed, 27 Aug 2003, Joe Greco wrote:
  I've got a weirdness with kill(2).
 
  This code is out of Diablo, the news package, and has been working fine for
  some years.  It apparently works fine on other OS's.
 
  In the Diablo model, the parent process may choose to tell its children to
  update status via a signal.  The loop basically consists of going through
  and issuing a SIGALRM.
 
  This stopped working a while ago, don't know precisely when.  I was in the
  process of debugging it today and ran into this.
 
  The specific OS below is 5.1-RELEASE but apparently this happens on 4.8 as
  well.
 
 Perhaps the children are setuid, the parent doesn't have appropriate
 privilege and you are mistaken about this happening under 4.8 as well.

Well, the parent process does the code I listed below early on in the
initialization process - it pretty much does a little initialization,
opens its socket (119), sheds privilege, and begins accepting conns.

It then forks off processes for each connection.

The program itself is not a suid executable, but rather relies on being
launched by a root user.

I'm not sure what "appropriate privilege" would be.  If both the uid/euid
of the parent match both the uid/euid of the child, I would expect to be
able to kill the process...

Client complains about the resulting problems also happening under 4.8 
servers.  Dunno.  Could possibly be a separate issue.

 In 5.x since at least rev.1.80 of kern_prot.c, only certain signals
 not including SIGALRM can be sent from unprivileged processes to setuid
 processes.
 
 This is very UN-unixlike although it is permitted as an
 implementation-defined restriction in at least POSIX.1-2001.  It
 breaks^Wexposes bugs in some old POSIX test programs and I don't have
 many security concerns so I just disable it locally:
 
 %%%
 Index: kern_prot.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_prot.c,v
 retrieving revision 1.175
 diff -u -2 -r1.175 kern_prot.c
 --- kern_prot.c	13 Jul 2003 01:22:20 -0000	1.175
 +++ kern_prot.c	17 Aug 2003 04:26:00 -0000
 @@ -1395,4 +1387,5 @@
  		return (error);
 
 +#if 0
  	/*
  	 * UNIX signal semantics depend on the status of the P_SUGID
 @@ -1425,4 +1418,5 @@
  		}
  	}
 +#endif
 
  	/*
 %%%
 
  Wot?  Why can't I send it a signal?
 
  I've read kill(2) rather carefully and cannot find the reason.  It says,
 
   For a process to have permission to send a signal to a process designated
   by pid, the real or effective user ID of the receiving process must match
   that of the sending process or the user must have appropriate privileges
   (such as given by a set-user-ID program or the user is the super-user).
 
 The implementation-defined restrictions are not documented, of course ;-).
 
  Well, the sending and receiving processes both clearly have equal uid/euid.
 
  We're not running in a jail, so I don't expect any issues there.
 
  The parent process did actually start as root and then shed privilege with
 
  struct passwd *pw = getpwnam("news");
  struct group *gr = getgrnam("news");
  gid_t gid;
 
  if (pw == NULL) {
      perror("getpwnam('news')");
      exit(1);
  }
  if (gr == NULL) {
      perror("getgrnam('news')");
      exit(1);
  }
  gid = gr->gr_gid;
  setgroups(1, &gid);
  setgid(gr->gr_gid);
  setuid(pw->pw_uid);
 
  so that looks all well and fine...  so why can't it kill its own children,
  and why can't I kill one of its children from a shell with equivalent
  uid/euid?
 
 Changing the ids is one way to make the process setuid (setuid-on-exec is
 another but that doesn't seem to be the problem here).  The relevant setuid
 bit (P_SUGID) is normally cleared on exec, but perhaps it isn't here,
 either because the children don't exec or the effective ids don't match
 the real ids at the time of the exec.

The children aren't spawned via exec, but clearly they have equal 
uid/euid's.

So what you're saying, I guess, is it's not supposed to work.

I guess I'm a bit confused by the logic of this.  I've seen numerous
forking daemons over the years that do this sort of thing (not to mention
I've written a number).  I've always viewed shedding root privs as being a
good thing...  Was it really intended to break things in this manner?

  I know there's been some paranoia about signal delivery and all that, but
  my searching hasn't turned up anything that would explain this.  Certainly
  the manual page ought to be updated if this is a new expected behaviour or
  something...  at least some clue as to why it might fail would be helpful.
 
 Certainly.  It is incomplete even not counting complications for jails
 or other implementation-defined restrictions related to appropriate
 privilege.

Sigh.

Thanks for the note,

... JG
-- 
Joe Greco - sol.net Network 

Re: Someone help me understand this...?

2003-08-28 Thread Robert Watson

On Wed, 27 Aug 2003, Joe Greco wrote:

 The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
 as well. 

Could you confirm this happens with 4.8?  The access control checks there
are substantially different, and I wouldn't expect the behavior you're
seeing on 4.8...

...
 Well, the sending and receiving processes both clearly have equal uid/euid.
 
 We're not running in a jail, so I don't expect any issues there.
...
 
 The parent process did actually start as root and then shed privilege with
 
 struct passwd *pw = getpwnam("news");
 struct group *gr = getgrnam("news");
 gid_t gid;
 
 if (pw == NULL) {
     perror("getpwnam('news')");
     exit(1);
 }
 if (gr == NULL) {
     perror("getgrnam('news')");
     exit(1);
 }
 gid = gr->gr_gid;
 setgroups(1, &gid);
 setgid(gr->gr_gid);
 setuid(pw->pw_uid);
 
 so that looks all well and fine...  so why can't it kill its own children,
 and why can't I kill one of its children from a shell with equivalent 
 uid/euid?
 
 I know there's been some paranoia about signal delivery and all that, but
 my searching hasn't turned up anything that would explain this.  Certainly
 the manual page ought to be updated if this is a new expected behaviour or
 something...  at least some clue as to why it might fail would be helpful.

The man page definitely needs to be updated, but I think it's worth having
a conversation about whether the current behavior is too conservative
first...

These changes come in response to a class of application vulnerabilities
relating to the delivery of unexpected signals.  The reason the process
in question is being treated as special from an access control perspective
is that it has undergone a credential change, resulting in the setting of
the process P_SUGID bit.  This bit remains set even if the remaining
credentials of the process appear normal -- i.e., even if ruid==euid,
rgid==egid, and can only be reset by calling execve() on a normal 
binary, which is considered sufficient to flush the state of the process. 

These processes are given special protection properties because they
almost always have cached access to memory or resources acquired using the
original credential.  For example, the process accesses the password file
while holding root privilege, which means that the process may well have
password hashes in memory from its reading the shadow password file -- in
fact, it likely even has a file descriptor to the shadow password file
still open.  The same P_SUGID flag is used to protect against unprivileged
debugging of applications that have changed credentials and now appear
normal.  P_SUGID is also used to determine the results of the
issetugid() system call, which is used by many libraries to see if they
are running with (or have run with)  privilege and need to behave in a
more conservative manner. 

I don't remember the details, but there have been at least a couple of
demonstrated exploits of vulnerable applications using signals in which
setuid applications rely on certain signals (such as SIGALRM, SIGIO,
SIGURG) only being delivered as a result of system calls that set up
timers, IO, etc. I seem to recall it might have involved a setuid
application such as sendmail on OpenBSD, but I'll have to do some googling
and get back to you.  These protections probably fall into the same class
of conservative behavior as our preventing setuid programs from being
started with closed stdin/stdout/stderr descriptors.

Giving up privilege without performing an exec() is very difficult in
UNIX, unfortunately, since the trappings of privilege may be maintained by
libraries, etc, without the knowledge of application writers.  Right now,
signal delivery in 5.x is pretty conservative if a process has changed
credentials, to protect against tampering with a class of applications
that has, historically, been vulnerable to a broad variety of exploits. 
I've attached an (untested) patch that makes this behavior run-time
configurable using a sysctl -- when the sysctl is disabled, special-case
handling for P_SUGID processes is disabled.  I believe that this will
cause the problem you're experiencing in 5.x to go away -- please let me
know.

Clearly, unbreaking applications like Diablo by default is desirable.  At
least OpenBSD has similar protections to these turned on by default, and
possibly other systems as well.  As 5.x sees more broad use, we may well
bump into other cases where applications have similar behavior: they rely
on no special protections once they've given up privilege.  I wonder if
Diablo can run unmodified on OpenBSD; it could be they don't include
SIGALRM on the list of protected-against signals, or it could be that they
modify Diablo for their environment to use an alternative signaling
mechanism.  Another alternative to this patch would simply be to add
SIGALRM to the list of acceptable signals to deliver 

Re: Someone help me understand this...?

2003-08-28 Thread Joe Greco
 On Wed, 27 Aug 2003, Joe Greco wrote:
  The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
  as well. 
 
 Could you confirm this happens with 4.8?  The access control checks there
 are substantially different, and I wouldn't expect the behavior you're
 seeing on 4.8...

Rather difficult.  I'll see if the client will let me trash a production
system, but usually people don't like $40K servers handing out a few
hundred megabits of traffic going out of service.  We were trying to fix
it on the scratch box (which happens to have 5.1R on it) and then were
going to see how it fared on the production systems.

 The man page definitely needs to be updated, but I think it's worth having
 a conversation about whether the current behavior is too conservative
 first...
 
 These changes come in response to a class of application vulnerabilities
 relating to the delivery of unexpected signals.  The reason the process
 in question is being treated as special from an access control perspective
 is that it has undergone a credential change, resulting in the setting of
 the process P_SUGID bit.  This bit remains set even if the remaining
 credentials of the process appear normal -- i.e., even if ruid==euid,
 rgid==egid, and can only be reset by calling execve() on a normal 
 binary, which is considered sufficient to flush the state of the process. 
 
 These processes are given special protection properties because they
 almost always have cached access to memory or resources acquired using the
 original credential.  For example, the process accesses the password file
 while holding root privilege, which means that the process may well have
 password hashes in memory from its reading the shadow password file -- in
 fact, it likely even has a file descriptor to the shadow password file
 still open.  The same P_SUGID flag is used to protect against unprivileged
 debugging of applications that have changed credentials and now appear
 normal.  P_SUGID is also used to determine the results of the
 issetugid() system call, which is used by many libraries to see if they
 are running with (or have run with)  privilege and need to behave in a
 more conservative manner. 

Okay, well, that makes good sense.

 I don't remember the details, but there have been at least a couple of
 demonstrated exploits of vulnerable applications using signals in which
 setuid applications rely on certain signals (such as SIGALRM, SIGIO,
 SIGURG) only being delivered as a result of system calls that set up
 timers, IO, etc. I seem to recall it might have involved a setuid
 application such as sendmail on OpenBSD, but I'll have to do some googling
 and get back to you.  These protections probably fall into the same class
 of conservative behavior as our preventing setuid programs from being
 started with closed stdin/stdout/stderr descriptors.
 
 Giving up privilege without performing an exec() is very difficult in
 UNIX, unfortunately, since the trappings of privilege may be maintained by
 libraries, etc, without the knowledge of application writers.  Right now,
 signal delivery in 5.x is pretty conservative if a process has changed
 credentials, to protect against tampering with a class of applications
 that has, historically, been vulnerable to a broad variety of exploits. 
 I've attached an (untested) patch that makes this behavior run-time
 configurable using a sysctl -- when the sysctl is disabled, special-case
 handling for P_SUGID processes is disabled.  I believe that this will
 cause the problem you're experiencing in 5.x to go away -- please let me
 know.

Well, I'm hoping more for a general fix for Diablo, rather than a special
patch for the OS.

 Clearly, unbreaking applications like Diablo by default is desirable.  At
 least OpenBSD has similar protections to these turned on by default, and
 possibly other systems as well.  As 5.x sees more broad use, we may well
 bump into other cases where applications have similar behavior: they rely
 on no special protections once they've given up privilege.  I wonder if
 Diablo can run unmodified on OpenBSD; it could be they don't include
 SIGALRM on the list of protected-against signals, or it could be that they
 modify Diablo for their environment to use an alternative signaling
 mechanism.  Another alternative to this patch would simply be to add
 SIGALRM to the list of acceptable signals to deliver in the
 privilege-change case.

I wonder if it would be reasonable to have some sort of interface that
allowed a program to tell FreeBSD not to set this flag...  if not, at least
if there was a sysctl, code could be added so that the daemon checked the
flag when starting and errored out if it wasn't set.

 BTW, it's worth noting that the mechanism Diablo is using to give up
 privilege actually does retain some privileges -- it doesn't, for
 example, synchronize its resource limits with those of the user it is
 switching to, so it retains the starting resource limits (likely those of
 

Re: Someone help me understand this...?

2003-08-28 Thread Robert Watson

On Thu, 28 Aug 2003, Joe Greco wrote:

  On Wed, 27 Aug 2003, Joe Greco wrote:
   The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
   as well. 
  
  Could you confirm this happens with 4.8?  The access control checks there
  are substantially different, and I wouldn't expect the behavior you're
  seeing on 4.8...
 
 Rather difficult.  I'll see if the client will let me trash a production
 system, but usually people don't like $40K servers handing out a few
 hundred megabits of traffic going out of service.  We were trying to fix
 it on the scratch box (which happens to have 5.1R on it) and then were
 going to see how it fared on the production systems. 

I think it's safe to assume that if you're seeing a similar failure,
there's a different source given my reading of the code, but I'm willing
to be proven wrong.  It's probably not worth the investment if you're
talking about large quantities of money, though.

  Clearly, unbreaking applications like Diablo by default is desirable.  At
  least OpenBSD has similar protections to these turned on by default, and
  possibly other systems as well.  As 5.x sees more broad use, we may well
  bump into other cases where applications have similar behavior: they rely
  on no special protections once they've given up privilege.  I wonder if
  Diablo can run unmodified on OpenBSD; it could be they don't include
  SIGALRM on the list of protected-against signals, or it could be that they
  modify Diablo for their environment to use an alternative signaling
  mechanism.  Another alternative to this patch would simply be to add
  SIGALRM to the list of acceptable signals to deliver in the
  privilege-change case.
 
 I wonder if it would be reasonable to have some sort of interface that
 allowed a program to tell FreeBSD not to set this flag...  if not, at
 least if there was a sysctl, code could be added so that the daemon
 checked the flag when starting and errored out if it wasn't set. 

We actually have such an interface, but it's only enabled for the purposes
of regression testing.  If you compile "options REGRESSION" into the
kernel configuration, a new system call, __setsugid(), is exposed to
applications.  It's used by src/tools/regression/security/proc_to_proc to
make it easier to set up process pairs for regression testing of
inter-process access control.  When I added it, there was some interest in
just making it setsugid() and exposing it to all processes.  Maybe we
should just go this route for 5.2-RELEASE.  Invoking it with a (0)
argument would mean the application writer accepted the inherent risks.

However, this would open the application to the risks of debugging
attachment, which are probably greater than the signal risks in most
cases.  It's not clear what the best way to express "I want to accept
these risks but not those risks" would be...  So far, it sounds like
we have three work-arounds in the pot; perhaps we can think of something
better:

(1) Remove SIGALRM from the list of prohibited signals in the P_SUGID
case.  Not clear what the risks are here based on common application
use, but this is an easy change to make.

(2) Add setsugid() to allow applications to give up implicit protections
associated with credential changes.  This comes with greater risks, I
suspect, since it opens up applications to more explicit
vulnerabilities:  signal attacks require more sophistication and luck,
but debugging attacks are easy.

(3) Allow administrators to selectively disable the more restrictive
signal checks at a system scope using a sysctl.  This is easy, and
comes with no risks as long as the setting is unchanged (the default
in the patch I sent out earlier). 

I'm tempted to commit (1) immediately to allow a workaround if we get
nothing else figured out, and to think some more about (2) and (3).
Another possibility would be to encourage application writers to avoid
overloading signals that already have meanings, and rely on the USR
signals.  I assume the reason Diablo uses ALRM is that the USR signals
already have assigned semantics?
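
For readers following along, options (1) and (3) can be pictured with a
small userland model.  This is only a sketch under stated assumptions, not
the kernel's code: the function name, the flag standing in for the sysctl,
and the exact set of permitted signals are invented for illustration, and
the real check also involves uid comparisons and privilege, which are left
out here.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the sysctl in option (3):
 * true means the restrictive checks are enabled.
 */
static bool conservative_signals = true;

/*
 * Model of "may an unprivileged sender deliver signum to this target?".
 * target_is_sugid models the P_SUGID flag on the receiving process.
 * The signal list below is an assumption made for this sketch.
 */
static bool
sugid_signal_allowed(int signum, bool target_is_sugid)
{
    assert(signum >= 0);

    /* Option (3): checks disabled system-wide, or target never changed credentials. */
    if (!conservative_signals || !target_is_sugid)
        return true;

    switch (signum) {
    case 0:         /* existence probe, delivers nothing */
    case SIGKILL:
    case SIGINT:
    case SIGTERM:
    case SIGSTOP:
    case SIGTSTP:
    case SIGHUP:
    case SIGUSR1:
    case SIGUSR2:
        return true;
    case SIGALRM:   /* option (1): stop treating SIGALRM as prohibited */
        return true;
    default:
        return false;
    }
}
```

In this model, option (1) is the single SIGALRM case, while option (3) is
the early return that skips the list entirely when the flag is cleared.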

  BTW, it's worth noting that the mechanism Diablo is using to give up
  privilege actually does retain some privileges -- it doesn't, for
  example, synchronize its resource limits with those of the user it is
  switching to, so it retains the starting resource limits (likely those of
  the root account). 
 
 That's actually preferred in most cases.  News servers almost always eat
 far more resources than whatever limits you might set by default, which
 just turns into telling people to remove the limits or use root's
 limits.  Generally if a news package bumps limits bad things happen. 

Right now, most applications in the base system make use of the
setusercontext() call to modify their protections as part of a switch of
users.  They often pass in the flag LOGIN_SETALL and then remove the bits
they don't need, such as LOGIN_SETRESOURCES.  This also has the side
effect 

Re: Someone help me understand this...?

2003-08-28 Thread Joe Greco
 On Thu, 28 Aug 2003, Joe Greco wrote:
   On Wed, 27 Aug 2003, Joe Greco wrote:
    The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
    as well. 
   
   Could you confirm this happens with 4.8?  The access control checks there
   are substantially different, and I wouldn't expect the behavior you're
   seeing on 4.8...
  
  Rather difficult.  I'll see if the client will let me trash a production
  system, but usually people don't like $40K servers handing out a few
  hundred megabits of traffic going out of service.  We were trying to fix
  it on the scratch box (which happens to have 5.1R on it) and then were
  going to see how it fared on the production systems. 
 
 I think it's safe to assume that if you're seeing a similar failure,
 there's a different source given my reading of the code, but I'm willing
 to be proven wrong.  It's probably not worth the investment if you're
 talking about large quantities of money, though.

It's more like large quantities of annoyance and work.  Can you describe
the case you're envisioning?  If I can easily poke at it, I can at least
get some clues.

   Clearly, unbreaking applications like Diablo by default is desirable.  At
   least OpenBSD has similar protections to these turned on by default, and
   possibly other systems as well.  As 5.x sees more broad use, we may well
   bump into other cases where applications have similar behavior: they rely
   on no special protections once they've given up privilege.  I wonder if
   Diablo can run unmodified on OpenBSD; it could be they don't include
   SIGALRM on the list of signals to protect against, or it could be that they
   modify Diablo for their environment to use an alternative signaling
   mechanism.  Another alternative to this patch would simply be to add
   SIGALRM to the list of acceptable signals to deliver in the
   privilege-change case.
  
  I wonder if it would be reasonable to have some sort of interface that
  allowed a program to tell FreeBSD not to set this flag...  if not, at
  least if there was a sysctl, code could be added so that the daemon
  checked the flag when starting and errored out if it wasn't set. 
 
 We actually have such an interface, but it's only enabled for the purposes
 of regression testing.  If you compile "options REGRESSION" into the
 kernel configuration, a new system call, __setsugid(), is exposed to
 applications.  It's used by src/tools/regression/security/proc_to_proc to
 make it easier to set up process pairs for regression testing of
 inter-process access control.  When I added it, there was some interest in
 just making it setsugid() and exposing it to all processes.  Maybe we
 should just go this route for 5.2-RELEASE.  Invoking it with a (0)
 argument would mean the application writer accepted the inherent risks.
 
 However, this would open the application to the risks of debugging
 attachment, which are probably greater than the signal risks in most
 cases.  It's not clear what the best way to express "I want to accept
 these risks but not those risks" would be...  So far, it sounds like
 we have three work-arounds in the pot; perhaps we can think of something
 better:
 
 (1) Remove SIGALRM from the list of prohibited signals in the P_SUGID
 case.  Not clear what the risks are here based on common application
 use, but this is an easy change to make.
 
 (2) Add setsugid() to allow applications to give up implicit protections
 associated with credential changes.  This comes with greater risks, I
 suspect, since it opens up applications to more explicit
 vulnerabilities:  signal attacks require more sophistication and luck,
 but debugging attacks are easy.
 
 (3) Allow administrators to selectively disable the more restrictive
 signal checks at a system scope using a sysctl.  This is easy, and
 comes with no risks as long as the setting is unchanged (the default
 in the patch I sent out earlier). 
 
 I'm tempted to commit (1) immediately to allow a workaround if we get
 nothing else figured out, and to think some more about (2) and (3).
 Another possibility would be to encourage application writers to avoid
 overloading signals that already have meanings, and rely on the USR
 signals.  I assume the reason Diablo uses ALRM is that the USR signals
 already have assigned semantics?

Correct.  The USR signals control debug levels.  If it was a signal that
was only used internally, it could be changed, of course, but changing a
signal used by humans (and one used in the same manner as other programs)
is probably a bad idea.

   BTW, it's worth noting that the mechanism Diablo is using to give up
   privilege actually does retain some privileges -- it doesn't, for
   example, synchronize its resource limits with those of the user it is
   switching to, so it retains the starting resource limits (likely those of
   the root account). 
  
  That's actually preferred in most cases.  News servers almost always eat
  far more 

Someone help me understand this...?

2003-08-27 Thread Joe Greco
I've got a weirdness with kill(2).

This code is out of Diablo, the news package, and has been working fine for
some years.  It apparently works fine on other OS's.

In the Diablo model, the parent process may choose to tell its children to
update status via a signal.  The loop basically consists of going through
and issuing a SIGALRM.
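
A minimal sketch of such a loop, with every kill() result checked so that
a failure like the one shown further down is reported instead of passing
silently.  The function name and the plain pid array are hypothetical and
are not Diablo's actual data structures.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Send SIGALRM to each pid in the list and log every failure.
 * Returns the number of children that could not be signalled.
 */
static int
signal_children(const pid_t *pids, size_t n)
{
    int failures = 0;

    assert(pids != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (kill(pids[i], SIGALRM) == -1) {
            /* EPERM: not permitted; ESRCH: no such process. */
            fprintf(stderr, "kill(%ld, SIGALRM): %s\n",
                (long)pids[i], strerror(errno));
            failures++;
        }
    }
    return failures;
}
```

With a check like this in place, the parent would log the EPERM for each
child instead of leaving the status update to fail without a trace.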

This stopped working a while ago, don't know precisely when.  I was in the
process of debugging it today and ran into this.

The specific OS below is 5.1-RELEASE but apparently this happens on 4.8 as
well.

%echo $$
29047
%ps -O ruid,uid | egrep '28949|29045|29047'
28949 8 8  p0  I  0:00.00 diablo: ihav=0chk=0rec=0 ent=0
29045 8 8  p0  I  0:00.00 sleep 99
29047 8 8  p0  D  0:00.01 -su (csh)
%kill -ALRM 28949
28949: Operation not permitted
%kill -ALRM 29045
%ps -O ruid,uid | egrep '28949|29045'
28949 8 8  p0  I  0:00.00 diablo: ihav=0chk=0rec=0 ent=0
%

Wot?  Why can't I send it a signal?

I've read kill(2) rather carefully and cannot find the reason.  It says,

 For a process to have permission to send a signal to a process designated
 by pid, the real or effective user ID of the receiving process must match
 that of the sending process or the user must have appropriate privileges
 (such as given by a set-user-ID program or the user is the super-user).

Well, the sending and receiving processes both clearly have equal uid/euid.

We're not running in a jail, so I don't expect any issues there.

The parent process did actually start as root and then shed privilege with

struct passwd *pw = getpwnam("news");
struct group *gr = getgrnam("news");
gid_t gid;

if (pw == NULL) {
    perror("getpwnam('news')");
    exit(1);
}
if (gr == NULL) {
    perror("getgrnam('news')");
    exit(1);
}
gid = gr->gr_gid;
setgroups(1, &gid);
setgid(gr->gr_gid);
setuid(pw->pw_uid);

so that looks all well and fine...  so why can't it kill its own children,
and why can't I kill one of its children from a shell with equivalent 
uid/euid?
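
As an aside, the sequence above ignores the return values of setgroups(),
setgid() and setuid().  A hedged sketch of the same steps with each call
checked follows; the helper name drop_privileges() and its error handling
are assumptions made for illustration and are not taken from Diablo.

```c
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch to the given user and group, failing on the first error.
 * Returns 0 on success and -1 on any failure.
 */
static int
drop_privileges(const char *user, const char *group)
{
    struct passwd *pw;
    struct group *gr;
    gid_t gid;

    assert(user != NULL && group != NULL);

    pw = getpwnam(user);
    if (pw == NULL) {
        fprintf(stderr, "unknown user '%s'\n", user);
        return -1;
    }
    gr = getgrnam(group);
    if (gr == NULL) {
        fprintf(stderr, "unknown group '%s'\n", group);
        return -1;
    }
    gid = gr->gr_gid;

    /* Order matters: groups first, uid last, while still privileged. */
    if (setgroups(1, &gid) == -1) {
        perror("setgroups");
        return -1;
    }
    if (setgid(gid) == -1) {
        perror("setgid");
        return -1;
    }
    if (setuid(pw->pw_uid) == -1) {
        perror("setuid");
        return -1;
    }
    return 0;
}
```

A caller would typically treat a -1 return as fatal and exit before doing
any further work.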

I know there's been some paranoia about signal delivery and all that, but
my searching hasn't turned up anything that would explain this.  Certainly
the manual page ought to be updated if this is a new expected behaviour or
something...  at least some clue as to why it might fail would be helpful.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.