--On Wednesday, January 30, 2008 06:14:02 PM +1100 Mike Battersby <[EMAIL PROTECTED]> wrote:

> 1. SSYS process exiting considered harmful
>
>   The first problem is that setting process flag SSYS on a process that
>   exits, as the afs_osi_Invisible routine on Solaris 10 does, causes the
>   system not to clean up the contract state of the process.  This leaves
>   a dangling kernel-memory pointer in the contract table which used to
>   point to the process struct.
>
>   Any user can corrupt kernel memory and cause a panic with the 'ctstat'
>   command, and the system cannot shut down without either panicking or
>   going into an infinite loop as svc.startd repeatedly tries to kill the
>   non-existent process.

> I really don't know why the code would set SSYS on a userland process
> that's about to exit in the first place.  Can anyone shed any light?

Threads that call afs_osi_Invisible are not about to exit; they're about to become long-lived AFS kernel threads. Setting SSYS is correct; we just need to figure out how to clean it up when the process exits. The right thing to do here is probably to introduce a new osi-layer function to be called just before such a daemon exits, which on Solaris could reasonably turn SSYS back off.
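
A minimal userland model of the pairing proposed above, for illustration only. It assumes a new osi-layer hook called just before an AFS daemon exits; `struct model_proc`, `MODEL_SSYS`, `model_osi_Invisible`, and `model_osi_DaemonExiting` are hypothetical stand-ins, not OpenAFS or Solaris identifiers. The real Solaris routine sets `SSYS` in the process's `p_flag`; this sketch only shows the set-then-clear shape.

```c
#include <assert.h>

/* Userland model only: stands in for the Solaris process flag word.
 * MODEL_SSYS is a hypothetical value; the real SSYS comes from the
 * Solaris kernel headers. */
#define MODEL_SSYS 0x00000001u

struct model_proc {
    unsigned int p_flag;
};

/* Models what afs_osi_Invisible does: mark the calling process as a
 * system process as it becomes a long-lived AFS kernel daemon. */
static void model_osi_Invisible(struct model_proc *p)
{
    p->p_flag |= MODEL_SSYS;
}

/* Hypothetical new osi-layer hook: called just before the daemon
 * returns and its process exits, so the normal exit path (including
 * contract cleanup) sees an ordinary process again. */
static void model_osi_DaemonExiting(struct model_proc *p)
{
    p->p_flag &= ~MODEL_SSYS;
}

/* Shape of a daemon entry point: become invisible, do the long-lived
 * work, then undo the flag before exiting. Returns the flags observed
 * while the daemon was "running" so the pairing can be checked. */
static unsigned int model_daemon_lifecycle(struct model_proc *p)
{
    unsigned int flags_while_running;

    model_osi_Invisible(p);
    flags_while_running = p->p_flag;   /* daemon work would happen here */
    model_osi_DaemonExiting(p);
    return flags_while_running;
}
```

The point of keeping the hook in the osi layer is that other platforms could implement it as a no-op, while Solaris clears the flag.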

There's another issue here, which is that AFS's kernel threads probably should not be considered part of the contract under which afsd is started. Leaving them in that contract is certain to cause all sorts of havoc as SMF tries to kill off the contract if afsd should die prematurely. I'll leave it somewhat up in the air whether the right place to fix this is in afsd or in the kernel code.

> I'm not sure of the placement of the cleanup code for case #2, as no
> spot seems particularly better than another in afs_shutdown().

On the contrary, the shutdown process is carefully orchestrated to ensure that each subsystem is shut down only when nothing is depending on it still being up. The required order is similar to the reverse of startup order, but not exactly the same.

In this case, shutting down the interface poll task fairly late is probably the right thing. You probably should do it before setting afs_termState to AFSOP_STOP_COMPLETE, though. More importantly, you destroy the task queue and the lock it uses without making sure the task isn't currently running! Simply returning at the start of the task if afs_shuttingdown is true isn't good enough; in fact, that does almost nothing -- if the task is _not_ running when you shut down, then destroying the queue should prevent it from being started again. If it _is_ running, then it's almost certainly past that check, and is eventually going to end up touching the lock and/or the task queue you've already destroyed.
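
A sketch of the shutdown ordering described above, as a userland pthreads model. It assumes a single worker thread standing in for the task-queue worker; `struct poll_task`, `poll_task_start`, `poll_task_run_once`, and `poll_task_shutdown` are hypothetical names, not OpenAFS code. The key step is waiting until the task is provably no longer running (here `pthread_join`) before destroying the lock and condition variable it uses; the kernel code would need an equivalent wait-for-completion step for its own task queue, presumably before afs_termState is set to AFSOP_STOP_COMPLETE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the interface-poll task; not OpenAFS code. */
struct poll_task {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    pthread_t       worker;
    bool            shutting_down;
    int             pending;   /* poll requests not yet started */
    int             runs;      /* polls completed */
};

static void *poll_worker(void *arg)
{
    struct poll_task *t = arg;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (!t->shutting_down && t->pending == 0)
            pthread_cond_wait(&t->cv, &t->lock);
        if (t->shutting_down)
            break;
        t->pending--;
        pthread_mutex_unlock(&t->lock);

        /* ... poll network interfaces here, without holding the lock.
         * A run already past the shutdown check will still relock below,
         * which is why shutdown must wait for it to finish. */

        pthread_mutex_lock(&t->lock);
        t->runs++;
        pthread_cond_broadcast(&t->cv);   /* wake anyone waiting on runs */
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

static int poll_task_start(struct poll_task *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cv, NULL);
    t->shutting_down = false;
    t->pending = 0;
    t->runs = 0;
    return pthread_create(&t->worker, NULL, poll_worker, t);
}

/* Queue one poll and block until it has completed (single caller assumed). */
static void poll_task_run_once(struct poll_task *t)
{
    pthread_mutex_lock(&t->lock);
    int target = t->runs + t->pending + 1;
    t->pending++;
    pthread_cond_broadcast(&t->cv);
    while (t->runs < target)
        pthread_cond_wait(&t->cv, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Stop new work, wait until the worker has exited, and only then
 * destroy the synchronization objects it uses. */
static void poll_task_shutdown(struct poll_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->shutting_down = true;
    pthread_cond_broadcast(&t->cv);
    pthread_mutex_unlock(&t->lock);

    pthread_join(t->worker, NULL);   /* the task can no longer be running */

    pthread_cond_destroy(&t->cv);
    pthread_mutex_destroy(&t->lock);
}
```

Setting the flag alone does not make destruction safe; it is the join (or its kernel equivalent) that guarantees nothing will touch the lock afterwards.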

> Since it is fairly small I've included it here.  I apologise if that's
> against list etiquette.

Including it here is fine, but a better approach would have been to send it to openafs-bugs and then mention the ticket number here; that way it makes its way into the bug-tracking system.

-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]>
  Carnegie Mellon University - Pittsburgh, PA

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
