On Fri, Jan 08, 2016 at 07:08:19AM +0000, David Holland wrote: > On Thu, Jan 07, 2016 at 07:34:33AM +0800, Paul Goyette wrote: > > Based on internal implementation of filemon(4), there is an ordering > > requirement imposed on the sequence of events that occur when a process > > using /dev/filemon exits. In particular, the file descriptor on which the > > device is open needs to be closed prior to closing the descriptor to which > > avent/activity records are being logged. (Otherwise, because of an extra > > reference on the log file, fd_close() will hang indefinitely.) > > Looking at the filemon code... it is completely wrong. It should not > be using file handles like that, especially not in a device driver. > Rather than adding new hacks to the system to work around it being > silly, it should be redone properly. > > For an example of the right way to do this kind of thing, look in > kern_acct.c. >
I would like to add my $0,03 seeing that there is time invested into what I consider a lost cause. I have to agree that filemon is misdesigned and of low quality in general. While I don't know all the details, it is clear that the purpose would be much better served by ktrace and I would argue efforts should be spent there. bmake takes filemon output and copies it into some file. >From usability perspective what seems to be needed is few hacks to ktrace to only log the stuff make is interested in and parsing code for bmake to translate to what looks like current filemon format for aformentioned copying. I only had a brief look at ktrace implementation. It uses global locks, so that would have to be worked on, but it should be trivial to at least make tracees using different output files not compete with each other. So, what's so fitting about ktrace design: tracing is tied to the target process, can react to things like changing credentials and reparenting (fork + parent exit), but most importantly it does not make assumptions about things which can change without its control (see below). Filemon monitors stuff at syscall level, stores the fd and remembers pids. Let's see how this leads to trouble. Some of these problems were also present in FreeBSD implementation (and some still are). A bonus interesting fact is that implementations have diverged, although are still fundamentally broken. http://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon.c#filemon_ioctl > case FILEMON_SET_PID: > /* Set the monitored process ID - if allowed. */ > mutex_enter(proc_lock); > tp = proc_find(*((pid_t *) data)); There are no steps taken to keep tp around nor in a certain state (see below). proc_find only finds the pointer. > mutex_exit(proc_lock); > if (tp == NULL || > tp->p_emul != &emul_netbsd) { tp could have exited and be freed by now. Now that the comparison was done it could have p_emul changed. > error = ESRCH; > break; > } > > error = kauth_authorize_process(curproc->p_cred, > KAUTH_PROCESS_CANSEE, tp, > KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL); Now that the check is pased the process could have executed something setuid. I'm somewhat surprised this code is here. Should not kauth_authorize_process assert that tp has stable credentials in some way, in effect just crashing filemon consumers? > if (!error) { > filemon->fm_pid = tp->p_pid; > } Pid is stored, but tp is not held. What if the 'traced' process exits? grep only reveals the following bit in filemon_wrapper_sys_exit: > if (filemon->fm_pid == curproc->p_pid) > filemon_printf(filemon, "# Bye bye\n"); So filemon will continue tracing anything which happened to reuse the pid. "Tracing points" are at syscall layer. With only pid remembered this adds up to serious breakage induced by the need to look the filemon structure up each time, and based on unreliable data. >static struct filemon * >filemon_pid_check(struct proc * p) >{ > struct filemon *filemon; > struct proc * lp; > > if (!TAILQ_EMPTY(&filemons_inuse)) { > while (p) { > /* > * make sure p cannot exit > * until we have moved on to p_pptr > */ > rw_enter(&p->p_reflock, RW_READER); > TAILQ_FOREACH(filemon, &filemons_inuse, fm_link) { > if (p->p_pid == filemon->fm_pid) { > rw_exit(&p->p_reflock); > return (filemon); > } > } But p_pid should be constant for 'struct proc' lifetime, so locking this early is wasteful. > lp = p; > p = p->p_pptr; > rw_exit(&lp->p_reflock); I guess the lock is needed to read the pointer to the parent. Nothing was done to keep the parent around, which means it could have exited and be freed by now. Which in turn makes the second loop iteration a use-after-free. The first iteration is safe if the argument is curproc (which it is), as it clearly cannot disappear. Turns out this traversal is even more wrong since p_pptr does not have to be the process you are looking for - ptrace is reparenting tracees. > } > } > return (NULL); >} -- Mateusz Guzik <mjguzik gmail.com>