Date: Sun, 11 Mar 2018 11:06:33 +0100 From: Martin Husemann <mar...@duskware.de> Message-ID: <20180311100633.ge23...@mail.duskware.de>
| I don't get this part - how would we end up with the new process using | the same pid? >From what I can see from glancing at the code, the issue is an attempt to monitor an unrelated process - one that is neither a child, nor being ptrace'd. That process can exit, its zombie be cleaned up, and then a new process created which happens to have the same pid as the previous one had (most of this is intended to happen quite quickly, but there's no guarantee of that - the process doing the monitoring could be suspended and wake up days after the process it was looking for vanished). All this is inherantly unreliable, and nothing that is done, beyond adding a whole new mechanism to hold a process that some other process has an interest it, will ever fix it. Kamil: What I don't understand is how you were ever getting the process returned twice? You're using the sysctl to look for a specific pid right.? When that pid is found, the sysctl code should simply copy out the relevant datea, and return. There's no point searching further in the lists, one pid can only exist once at a time - once found it is found.... If that is not the way the sysctl lookup code is working then we should probably fix it. There cannot be 2 processes with pid N at any one instant, so looking for a specific pid should only ever be able to return 1 (or "not found" of course). It is possible to not find it, depending on what kind of locking the finding code is doing (for this, just being an "observation" interface, I'd assume the minimum possible) even though it exists, if the lists are changing underneath the search - but given the nature of what happens to a process, a search of zombproc, allproc, zombproc (stopping when found) will either find the process or the process does not exist - and possibly just allproc followed by zombproc searches would work as well. kre