Re: [PATCHv4 1/1] Documentation: describe how to add a system call

2015-08-13 Thread Jonathan Corbet
On Mon, 10 Aug 2015 09:00:44 +0100
David Drysdale  wrote:

> Add a document describing the process of adding a new system call,
> including the need for a flags argument for future compatibility, and
> covering 32-bit/64-bit concerns (albeit in an x86-centric way).

This seems about ready, so I've gone ahead and applied it to the docs
tree.  Many thanks!

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHv4 1/1] Documentation: describe how to add a system call

2015-08-10 Thread David Drysdale
Add a document describing the process of adding a new system call,
including the need for a flags argument for future compatibility, and
covering 32-bit/64-bit concerns (albeit in an x86-centric way).

Signed-off-by: David Drysdale 
Reviewed-by: Michael Kerrisk 
Reviewed-by: Eric B Munson 
Reviewed-by: Kees Cook 
Reviewed-by: Randy Dunlap 
Reviewed-by: Josh Triplett 
---
 Documentation/adding-syscalls.txt | 527 ++
 1 file changed, 527 insertions(+)
 create mode 100644 Documentation/adding-syscalls.txt

diff --git a/Documentation/adding-syscalls.txt 
b/Documentation/adding-syscalls.txt
new file mode 100644
index ..cc2d4ac4f404
--- /dev/null
+++ b/Documentation/adding-syscalls.txt
@@ -0,0 +1,527 @@
+Adding a New System Call
+
+
+This document describes what's involved in adding a new system call to the
+Linux kernel, over and above the normal submission advice in
+Documentation/SubmittingPatches.
+
+
+System Call Alternatives
+
+
+The first thing to consider when adding a new system call is whether one of
+the alternatives might be suitable instead.  Although system calls are the
+most traditional and most obvious interaction points between userspace and the
+kernel, there are other possibilities -- choose what fits best for your
+interface.
+
+ - If the operations involved can be made to look like a filesystem-like
+   object, it may make more sense to create a new filesystem or device.  This
+   also makes it easier to encapsulate the new functionality in a kernel module
+   rather than requiring it to be built into the main kernel.
+ - If the new functionality involves operations where the kernel notifies
+   userspace that something has happened, then returning a new file
+   descriptor for the relevant object allows userspace to use
+   poll/select/epoll to receive that notification.
+ - However, operations that don't map to read(2)/write(2)-like operations
+   have to be implemented as ioctl(2) requests, which can lead to a
+   somewhat opaque API.
+ - If you're just exposing runtime system information, a new node in sysfs
+   (see Documentation/filesystems/sysfs.txt) or the /proc filesystem may be
+   more appropriate.  However, access to these mechanisms requires that the
+   relevant filesystem is mounted, which might not always be the case (e.g.
+   in a namespaced/sandboxed/chrooted environment).  Avoid adding any API to
+   debugfs, as this is not considered a 'production' interface to userspace.
+ - If the operation is specific to a particular file or file descriptor, then
+   an additional fcntl(2) command option may be more appropriate.  However,
+   fcntl(2) is a multiplexing system call that hides a lot of complexity, so
+   this option is best for when the new function is closely analogous to
+   existing fcntl(2) functionality, or the new functionality is very simple
+   (for example, getting/setting a simple flag related to a file descriptor).
+ - If the operation is specific to a particular task or process, then an
+   additional prctl(2) command option may be more appropriate.  As with
+   fcntl(2), this system call is a complicated multiplexor so is best reserved
+   for near-analogs of existing prctl() commands or getting/setting a simple
+   flag related to a process.
+
+
+Designing the API: Planning for Extension
+-
+
+A new system call forms part of the API of the kernel, and has to be supported
+indefinitely.  As such, it's a very good idea to explicitly discuss the
+interface on the kernel mailing list, and it's important to plan for future
+extensions of the interface.
+
+(The syscall table is littered with historical examples where this wasn't done,
+together with the corresponding follow-up system calls -- eventfd/eventfd2,
+dup2/dup3, inotify_init/inotify_init1,  pipe/pipe2, renameat/renameat2 -- so
+learn from the history of the kernel and plan for extensions from the start.)
+
+For simpler system calls that only take a couple of arguments, the preferred
+way to allow for future extensibility is to include a flags argument to the
+system call.  To make sure that userspace programs can safely use flags
+between kernel versions, check whether the flags value holds any unknown
+flags, and reject the system call (with EINVAL) if it does:
+
+if (flags & ~(THING_FLAG1 | THING_FLAG2 | THING_FLAG3))
+return -EINVAL;
+
+(If no flags values are used yet, check that the flags argument is zero.)
+
+For more sophisticated system calls that involve a larger number of arguments,
+it's preferred to encapsulate the majority of the arguments into a structure
+that is passed in by pointer.  Such a structure can cope with future extension
+by including a size argument in the structure:
+
+struct xyzzy_params {
+u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
+u32 param_1;
+