Re: [PATCH v4] fs/proc: Expose RSEQ configuration

2021-03-10 Thread Piotr Figiel
On Tue, Feb 02, 2021 at 06:37:09PM +0100, Piotr Figiel wrote:
> Userspace checkpoint and restore (C/R) needs a way to obtain process
> state that includes the RSEQ configuration.

[...]

> To achieve the above goals, expose the RSEQ ABI address and the
> signature value in a new procfs file, "/proc/<pid>/rseq".

For the record: this idea was dropped in favor of a ptrace approach, as
discussed in a separate mail thread with Mathieu Desnoyers and Peter
Zijlstra.  The current version of the equivalent ptrace patch is here:

https://lore.kernel.org/lkml/20210226135156.1081606-1-fig...@google.com/

Best regards,
Piotr.


[PATCH v4] fs/proc: Expose RSEQ configuration

2021-02-02 Thread Piotr Figiel
Userspace checkpoint and restore (C/R) needs a way to obtain process
state that includes the RSEQ configuration.

There are two ways this information is going to be used:
 - to re-enable RSEQ for threads which had it enabled before C/R
 - to detect if a thread was in a critical section during C/R

Since C/R preserves TLS memory and addresses, the RSEQ ABI will be
restored using the address registered before C/R.

Detecting whether the thread is in a critical section during C/R is
needed to enforce RSEQ abort behavior during C/R. Attaching with
ptrace() before the registers are dumped does not by itself cause an
RSEQ abort. Restoring the instruction pointer within the critical
section is problematic because rseq_cs may get cleared before control
is passed to the migrated application code, leading to RSEQ invariants
not being preserved.
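
As an illustration of the detection step, here is a minimal sketch of the
range check a C/R tool could apply once it has read the rseq_cs descriptor
of a stopped thread. The struct below is a local mirror of the rseq_cs
layout from the rseq UAPI header, and the names rseq_cs_desc and
ip_in_rseq_cs are hypothetical. How the tool reads the descriptor from the
target's memory, and what it does when the check succeeds, is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local mirror of the critical-section descriptor from the rseq UAPI
 * (struct rseq_cs); redefined here only to keep the sketch
 * self-contained. The struct name is hypothetical.
 */
struct rseq_cs_desc {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/*
 * Hypothetical helper: returns true if the dumped instruction pointer
 * lies in [start_ip, start_ip + post_commit_offset). The unsigned
 * subtraction wraps around for ip < start_ip, so a single comparison
 * covers both bounds.
 */
static bool ip_in_rseq_cs(uint64_t ip, const struct rseq_cs_desc *cs)
{
	if (!cs)	/* no descriptor: thread is not in a critical section */
		return false;
	return ip - cs->start_ip < cs->post_commit_offset;
}
```

A tool that finds the instruction pointer inside the range could then, for
example, choose to resume the thread at abort_ip instead of the dumped
address; that policy is an assumption and not part of this patch.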

To achieve the above goals, expose the RSEQ ABI address and the
signature value in a new procfs file, "/proc/<pid>/rseq".
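
For illustration, a userspace reader for this file might look roughly like
the sketch below. It assumes a kernel carrying this patch with CONFIG_RSEQ
enabled and a caller permitted to read the file. The names rseq_config,
parse_rseq_line and read_rseq_config are hypothetical. The parser expects
two hexadecimal fields separated by a space, matching the "%px %08x" output
format used in the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the two fields exposed by the proc file. */
struct rseq_config {
	uint64_t abi_addr;	/* address of the registered RSEQ ABI structure */
	uint32_t signature;	/* signature passed at registration */
};

/* Parse one "<hex address> <hex signature>" line; 0 on success, -1 on error. */
static int parse_rseq_line(const char *line, struct rseq_config *out)
{
	if (sscanf(line, "%" SCNx64 " %" SCNx32,
		   &out->abi_addr, &out->signature) != 2)
		return -1;
	return 0;
}

/* Read and parse /proc/<pid>/rseq; 0 on success, -1 on any failure. */
static int read_rseq_config(int pid, struct rseq_config *out)
{
	char path[64], line[64];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/rseq", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. file absent or access denied */
	if (fgets(line, sizeof(line), f))
		ret = parse_rseq_line(line, out);
	fclose(f);
	return ret;
}
```

The parsing step is kept separate from the file access so that it can be
exercised on a sample line without a patched kernel.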

Signed-off-by: Piotr Figiel 

---

v4:
 - added documentation and extended comment before task_lock()
v3:
 - added locking so that the proc file always shows a consistent pair of
   RSEQ ABI address and signature
 - changed string formatting to use %px for the RSEQ ABI address
v2:
 - fixed string formatting for 32-bit architectures
v1:
 - https://lkml.kernel.org/r/20210113174127.2500051-1-fig...@google.com

---
 Documentation/filesystems/proc.rst | 16 ++++++++++++++++
 fs/exec.c                          |  2 ++
 fs/proc/base.c                     | 22 ++++++++++++++++++++++
 include/linux/sched/task.h         |  3 ++-
 kernel/rseq.c                      |  4 ++++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2fa69f710e2a..d887666dc849 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1  Stefani Seibold    June 9 2009
   3.10  /proc/<pid>/timerslack_ns - Task timerslack value
   3.11 /proc/<pid>/patch_state - Livepatch patch operation state
   3.12 /proc/<pid>/arch_status - Task architecture specific information
+  3.13 /proc/<pid>/rseq - RSEQ configuration state
 
   4    Configuring procfs
   4.1  Mount options
@@ -2131,6 +2132,21 @@ AVX512_elapsed_ms
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
+3.13   /proc/<pid>/rseq - RSEQ configuration state
+--------------------------------------------------
+This file provides RSEQ configuration of a thread. Available fields correspond
+to the rseq() syscall parameters and are:
+
+ - RSEQ ABI structure address shared between the kernel and user-space
+ - signature value expected before the abort handler code
+
+Both values are in hexadecimal format, for example::
+
+   # cat /proc/12345/rseq
+   abcdef12340 aabb0011
+
+This file is only present if CONFIG_RSEQ is enabled.
+
 Chapter 4: Configuring procfs
 =============================
 
diff --git a/fs/exec.c b/fs/exec.c
index 5d4d52039105..5d84f98847f1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1830,7 +1830,9 @@ static int bprm_execve(struct linux_binprm *bprm,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
+	task_lock(current);
 	rseq_execve(current);
+	task_unlock(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
 	return retval;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b3422cda2a91..89232329d966 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -662,6 +662,22 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 
 	return 0;
 }
+
+#ifdef CONFIG_RSEQ
+static int proc_pid_rseq(struct seq_file *m, struct pid_namespace *ns,
+			 struct pid *pid, struct task_struct *task)
+{
+	int res = lock_trace(task);
+
+	if (res)
+		return res;
+	task_lock(task);
+	seq_printf(m, "%px %08x\n", task->rseq, task->rseq_sig);
+	task_unlock(task);
+	unlock_trace(task);
+	return 0;
+}
+#endif /* CONFIG_RSEQ */
 #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
 
 //
@@ -3182,6 +3198,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("comm",  S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	ONE("syscall",S_IRUSR, proc_pid_syscall),
+#ifdef CONFIG_RSEQ
+	ONE("rseq",   S_IRUSR, proc_pid_rseq),
+#endif
 #endif
 	REG("cmdline",S_IRUGO, proc_pid_cmdline_ops),
 	ONE("stat",   S_IRUGO, proc_tgid_stat),
@@ -3522,6 +3541,9 @@ static const struct pid_entry tid_base_stuff[] = {
 			 &proc_pid_set_comm_operations, {}),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	ONE("syscall",   S_IRUSR, proc_pid_syscall),
+#ifdef CONFIG_RSEQ
+