Hi Renzo! Sorry for the long delay in getting back to you about these issues.
Please take a look at the current GIT trees or patches for context. Your comments referred to details of the old-style utrace interface, and we have the new one in place now. I just did some further tweaks to syscall tracing today.

Re: 1- TIF_SYSCALL_EMU is useless.

In the new implementation, TIF_SYSCALL_EMU is left as it is for the old x86 ptrace. With CONFIG_UTRACE_PTRACE=y it's not used at all. So this point is moot. (All old ptrace code is unperturbed unless CONFIG_UTRACE_PTRACE=y.)

Re: 2- "skip syscall" management.

There are two layers to this: the arch<->generic interface, and utrace.

Firstly, just about the arch layer. I came to your way of thinking for having it abort/skip the syscall without losing the information of the syscall number from its original location (orig_ax on x86). So, now tracehook_report_syscall_entry() returns nonzero to arch code to tell it that it must skip the syscall and go directly to syscall-exit tracing. The tracehook return value replaces the syscall_abort() call in asm/syscall.h, which is now gone.

Instead, I've added a function syscall_rollback() to the asm/syscall.h spec. This can be used from syscall-exit tracing to restore the user registers to what they were at syscall entry after tracehook_report_syscall_entry() aborted the call. (This isn't used for PTRACE_SYSEMU, since its traditional behavior is to leave -ENOSYS in ax and the original ax in orig_ax.) But in general for writers of modules wanting to do syscall emulation sorts of things, using syscall_rollback() glosses over the internal arch-specific idiosyncrasies of syscall tracing.

The details of how to implement the treatment of a nonzero return from tracehook_report_syscall_entry() optimally are entirely up to the arch and just depend on the details of its assembly paths. I implemented the new behavior on x86 and powerpc. The way I did it there doesn't have deep meaning; it's just what makes things simple and streamlined in the assembly code on each machine.
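For readers who want the arch-level protocol above in concrete form, here is a rough userspace model of it. This is only a sketch and not kernel code: struct toy_regs, toy_report_syscall_entry(), toy_syscall_path() and toy_syscall_rollback() are hypothetical stand-ins invented for illustration, the "skip syscall 39" policy is arbitrary, and the only things taken from the description are the ideas that a nonzero return means "skip the syscall", that orig_ax keeps the syscall number, and that rollback restores the entry-time register from it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the two x86 register slots discussed above. */
struct toy_regs {
    long ax;       /* return-value slot; holds the syscall number at entry */
    long orig_ax;  /* copy of the syscall number saved at entry */
};

/*
 * Hypothetical tracer policy standing in for the entry tracehook:
 * a nonzero return tells the "arch" path to skip the syscall.
 * The choice of syscall 39 is arbitrary.
 */
int toy_report_syscall_entry(const struct toy_regs *regs)
{
    return regs->orig_ax == 39;
}

/* Model of rollback: restore ax to what it was at syscall entry. */
void toy_syscall_rollback(struct toy_regs *regs)
{
    regs->ax = regs->orig_ax;
}

/*
 * Model of the arch syscall path.
 * Returns 1 if the syscall body "ran", 0 if it was skipped.
 */
int toy_syscall_path(struct toy_regs *regs, long nr)
{
    assert(regs != NULL);

    regs->orig_ax = nr;    /* preserve the syscall number */
    regs->ax = -ENOSYS;    /* default result before dispatch */

    if (toy_report_syscall_entry(regs))
        return 0;          /* skip the call; orig_ax is left untouched */

    regs->ax = 0;          /* pretend the syscall ran and succeeded */
    return 1;
}
```

In this model, calling toy_syscall_path() with syscall number 39 leaves -ENOSYS in ax and 39 in orig_ax, and a later toy_syscall_rollback() from the exit-tracing side puts 39 back in ax. That only works because the skip path never clobbers orig_ax.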
The important thing is that the arch code skips the syscall without clobbering the pt_regs field that holds the syscall number, so that syscall_rollback() can find that information later.

For utrace, the report_syscall_entry callback now has more meaningful bits in its @action argument and its return value. These are shown in enum utrace_syscall_action. Similar to the report_signal callback, those bits in the @action argument are the choice made by the previous engine to get this callback, but the bits in your return value override the last engine's choice (and are overridden by the next engine). So if you ignore the @action argument in your callback, you will execute the system call that another engine wanted aborted/emulated.

Under CONFIG_UTRACE_PTRACE=y, we now implement PTRACE_SYSEMU using UTRACE_SYSCALL_ABORT and do not use TIF_SYSCALL_EMU at all.

A caveat is that when your callback's return value uses UTRACE_STOP, your UTRACE_SYSCALL_* choice in that same return value is fixed when that callback pass finishes. When utrace_control() later resumes the thread, it will wake up and either run or skip the syscall depending on the choice made in the report_syscall_entry callback's return value before it stopped.

This means you can't implement something like PTRACE_SYSVM. You can have a tracing engine that does whatever fancy things it can do synchronously in the report_syscall_entry callback (without blocking) to decide whether to run or skip the syscall. But you can't stop, e.g. to be woken up by an asynchronous wakeup call from a user-level debugger/controller, before you make your choice.

What I am contemplating is this. As now, when you return UTRACE_STOP the UTRACE_SYSCALL_* choice you return also holds sway by default, that is, when you are woken up with UTRACE_RESUME. But waking from stop at syscall-entry with UTRACE_REPORT would have a special meaning.
Then a second round of report_syscall_entry callbacks is made to interested engines, but the @action argument starts with a special value, UTRACE_SYSCALL_REPORT. This tells the engine that this is not a fresh syscall entry event, but just restarting after UTRACE_STOP was used at the same syscall entry last reported. It now has a chance to return its new choice of RUN or ABORT, taking into account whatever bookkeeping might have been tweaked while we were stopped.

I think that's a bit dubiously hairy. I haven't done it; it's just a thought. But if this caveat is a real pain point for you, then we should iron something out.

Re: 3- utrace module nesting (again)

I take your point here. I think it probably does indeed make sense to reverse the order of engines for syscall-entry vs all other events. However, I still haven't done it in the current version.

The trouble with this is some hairy implementation details. We use the engine list with RCU to support asynchronous attach, and that makes it impossible to safely do reverse walks. I may want to get rid of the RCU lists for implementation reasons anyway. That would mean plain list.h lists, where either order of iteration is easy. But bear with me for the moment. As the utrace implementation now stands, we can't change the order for one event.

At the utrace level, we can continue to work out kinks in the future. What I hope right now is that the tracehook_report_syscall_entry() return value protocol and syscall_rollback() are sufficient arch interfaces for whatever we might build. I think it's a reasonable balance of keeping the arch work fairly minimal (especially the assembly tweaking) and giving a higher level of functionality than most arches had before, in a clean way.

If this arch interface is workable from the perspective of utrace and above, then we can stabilize the tracehooks spec and get arch folks on the job sooner.

Thanks,
Roland
