> > Do you know what the race is?
>
> Apparently it's a race between deleting a process and accessing its
> /proc/pid entries. It came out in pidof while it was accessing
> /proc/pid/stat (fs/proc/array.c:do_task_stat crashed on first
> instruction - it was an inline function accessing task->state,
> get_task_state IIRC). oops (with vserver history data - I'm using a
> patch mentioned below) is attached.
>
> > How does one reproduce it?
>
> I managed to reproduce it (although not reliably) during high CPU load
> and I/O (parallel kernel compiles) on SMP systems with the vserver
> patch (http://linux-vserver.org, the exact patch is
> http://vserver.13thfloor.at/Experimental/patch-2.6.14.2-vs2.1.0-rc8.diff),
> but the vserver maintainer pointed out that it probably is a mainline
> issue. We're not using 2.6 systems too much except for the vserver
> test beds so I cannot tell if it happens on vanilla kernels.
>
> > > The following micro-patch seems to fix it.
> >
> > It might be right, or it might be a workaround..
>
> I'm not a kernel guru so it's just my proposal. Can it break anything?
> An alternative _might_ be somewhat coarser task_struct locking
> (do_task_stat grabs a spinlock but then it's already too late).
> However, if no "right" solution appears, I'll keep using my two-liner
> because it seems to help, at least in my setup.
>
Oh well, I got another oops in the very same place with the patch
applied. So now I surrounded the check with
read_[un]lock(&tasklist_lock) and added a check to do_task_stat (both
now have a printk). If it builds, boots and doesn't crash, I'll post
the patch.

Best regards,
 Grzegorz Nosek
