[PATCH] uprobe: add support for overlayfs

2018-04-10 Thread Howard McLauchlan
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.

To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):

$ mount -t overlay overlay -o 
lowerdir=lower,upperdir=upper,workdir=work merge
$ cd /sys/kernel/debug/tracing
$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
$ echo 1 > events/uprobes/true_entry/enable

This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'

This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.

Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.

Reviewed-by: Josef Bacik <jba...@fb.com>
Signed-off-by: Howard McLauchlan <hmclauch...@fb.com>
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2014f4351ae0..17c65fa4136d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -446,7 +446,7 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
+   inode = igrab(d_real_inode(path.dentry));
path_put();
 
if (!inode || !S_ISREG(inode->i_mode)) {
-- 
2.17.0



[PATCH] uprobe: add support for overlayfs

2018-04-10 Thread Howard McLauchlan
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.

To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):

$ mount -t overlay overlay -o 
lowerdir=lower,upperdir=upper,workdir=work merge
$ cd /sys/kernel/debug/tracing
$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
$ echo 1 > events/uprobes/true_entry/enable

This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'

This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.

Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.

Reviewed-by: Josef Bacik 
Signed-off-by: Howard McLauchlan 
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2014f4351ae0..17c65fa4136d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -446,7 +446,7 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
+   inode = igrab(d_real_inode(path.dentry));
path_put();
 
if (!inode || !S_ISREG(inode->i_mode)) {
-- 
2.17.0



[PATCH v2] bpf: whitelist all syscalls for error injection

2018-03-21 Thread Howard McLauchlan
Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.

The following script, for example, fails calls to sys_open() from a
given pid:

from bcc import BPF
from sys import argv

pid = argv[1]

prog = r"""

int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
{
u32 pid = bpf_get_current_pid_tgid();
if (pid == %s)
bpf_override_return(ctx, -ENOMEM);
return 0;
}
""" % pid

b = BPF(text=prog)
while 1:
b.perf_buffer_poll()

This patch whitelists all syscalls defined with SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE for error injection. These changes are not
intended to be considered stable, and would normally be configured off.

Signed-off-by: Howard McLauchlan <hmclauch...@fb.com>

---

Dominik,

I've updated the patch to support compat syscalls. Please let me know if there
are additional changes to be made.

Cheers,

V1->V2: added similar mechanism in compat.h, clarify commit
Based on 4.16-rc6

 include/linux/compat.h   | 3 +++
 include/linux/syscalls.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 16c3027074a2..857ddb1cfedf 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -33,6 +33,8 @@
 #endif
 
 #define COMPAT_SYSCALL_DEFINE0(name) \
+   asmlinkage long compat_sys_##name(void); \
+   ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \
asmlinkage long compat_sys_##name(void)
 
 #define COMPAT_SYSCALL_DEFINE1(name, ...) \
@@ -51,6 +53,7 @@
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)   \
asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
__attribute__((alias(__stringify(compat_SyS##name;  \
+   ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO); \
static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));\
asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 
 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
asmlinkage long sys_##sname(void)
 
 #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 #define __SYSCALL_DEFINEx(x, name, ...)
\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))   \
__attribute__((alias(__stringify(SyS##name; \
+   ALLOW_ERROR_INJECTION(sys##name, ERRNO);\
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))   \
-- 
2.14.1



[PATCH v2] bpf: whitelist all syscalls for error injection

2018-03-21 Thread Howard McLauchlan
Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.

The following script, for example, fails calls to sys_open() from a
given pid:

from bcc import BPF
from sys import argv

pid = argv[1]

prog = r"""

int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
{
u32 pid = bpf_get_current_pid_tgid();
if (pid == %s)
bpf_override_return(ctx, -ENOMEM);
return 0;
}
""" % pid

b = BPF(text=prog)
while 1:
b.perf_buffer_poll()

This patch whitelists all syscalls defined with SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE for error injection. These changes are not
intended to be considered stable, and would normally be configured off.

Signed-off-by: Howard McLauchlan 

---

Dominik,

I've updated the patch to support compat syscalls. Please let me know if there
are additional changes to be made.

Cheers,

V1->V2: added similar mechanism in compat.h, clarify commit
Based on 4.16-rc6

 include/linux/compat.h   | 3 +++
 include/linux/syscalls.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 16c3027074a2..857ddb1cfedf 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -33,6 +33,8 @@
 #endif
 
 #define COMPAT_SYSCALL_DEFINE0(name) \
+   asmlinkage long compat_sys_##name(void); \
+   ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \
asmlinkage long compat_sys_##name(void)
 
 #define COMPAT_SYSCALL_DEFINE1(name, ...) \
@@ -51,6 +53,7 @@
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)   \
asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
__attribute__((alias(__stringify(compat_SyS##name;  \
+   ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO); \
static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));\
asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 
 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
asmlinkage long sys_##sname(void)
 
 #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 #define __SYSCALL_DEFINEx(x, name, ...)
\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))   \
__attribute__((alias(__stringify(SyS##name; \
+   ALLOW_ERROR_INJECTION(sys##name, ERRNO);\
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))   \
-- 
2.14.1



Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-19 Thread Howard McLauchlan

On 03/18/2018 07:13 PM, Andy Lutomirski wrote:

On Sun, Mar 18, 2018 at 6:47 AM, Dominik Brodowski
<li...@dominikbrodowski.net> wrote:

On Fri, Mar 16, 2018 at 03:55:04PM -0700, Howard McLauchlan wrote:

On 03/13/2018 04:56 PM, Andy Lutomirski wrote:

On Tue, Mar 13, 2018 at 11:16 PM, Howard McLauchlan <hmclauch...@fb.com> wrote:

Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.


Temporary NAK IMO.  Specifically:


diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)

 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \


sys_xyz() is not just the syscall itself; it's also a helper that's
used for entirely silly reasons by various bits of kernel code for
quite a few syscalls.  Fortunately, Dominik has patches to fix that,
and Linus is even considering pulling them for 4.16.  This patch will
most likely conflict with the final result of Dominik's series.

Can you and Dominik coordinate a bit to get this patch or its
equivalent landed on top of Dominik's work?  It might make sense for
Dominik to just add this patch to his series so it can land with the
rest of it.  Dominik, Ingo, what do you think?

--Andy



Dominik,

This patch applies cleanly on top of your patch series. Is there anything you'd 
need from me to get this in on top of your work?


Howard,

would this form part of the kernel<->userspace interface and therefore needs
to be kept stable? If so, this patch should wait until the arch-specific
syscall calling convention is agreed upon.

Moreover, the patches I sent out already do not cover all syscalls yet.
Until all in-kernel users of sys_*() are gone (or at least outside arch/),
I'd prefer to postpone this patch.



I was assuming that this ALLOW_ERROR_INJECTION thing is *not*
considered stable ABI.  We should be free to change the way that the
syscall entry code calls syscalls whenever we like.

If you want a stable syscall error injection mechanism, make it work
like seccomp instead, please.



This is not supposed to be considered stable. It's for debug purposes only and 
would normally be configured off.


Howard


Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-19 Thread Howard McLauchlan

On 03/18/2018 07:13 PM, Andy Lutomirski wrote:

On Sun, Mar 18, 2018 at 6:47 AM, Dominik Brodowski
 wrote:

On Fri, Mar 16, 2018 at 03:55:04PM -0700, Howard McLauchlan wrote:

On 03/13/2018 04:56 PM, Andy Lutomirski wrote:

On Tue, Mar 13, 2018 at 11:16 PM, Howard McLauchlan  wrote:

Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.


Temporary NAK IMO.  Specifically:


diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)

 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \


sys_xyz() is not just the syscall itself; it's also a helper that's
used for entirely silly reasons by various bits of kernel code for
quite a few syscalls.  Fortunately, Dominik has patches to fix that,
and Linus is even considering pulling them for 4.16.  This patch will
most likely conflict with the final result of Dominik's series.

Can you and Dominik coordinate a bit to get this patch or its
equivalent landed on top of Dominik's work?  It might make sense for
Dominik to just add this patch to his series so it can land with the
rest of it.  Dominik, Ingo, what do you think?

--Andy



Dominik,

This patch applies cleanly on top of your patch series. Is there anything you'd 
need from me to get this in on top of your work?


Howard,

would this form part of the kernel<->userspace interface and therefore needs
to be kept stable? If so, this patch should wait until the arch-specific
syscall calling convention is agreed upon.

Moreover, the patches I sent out already do not cover all syscalls yet.
Until all in-kernel users of sys_*() are gone (or at least outside arch/),
I'd prefer to postpone this patch.



I was assuming that this ALLOW_ERROR_INJECTION thing is *not*
considered stable ABI.  We should be free to change the way that the
syscall entry code calls syscalls whenever we like.

If you want a stable syscall error injection mechanism, make it work
like seccomp instead, please.



This is not supposed to be considered stable. It's for debug purposes only and 
would normally be configured off.


Howard


Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-16 Thread Howard McLauchlan
On 03/13/2018 04:56 PM, Andy Lutomirski wrote:
> On Tue, Mar 13, 2018 at 11:16 PM, Howard McLauchlan <hmclauch...@fb.com> 
> wrote:
>> Error injection is a useful mechanism to fail arbitrary kernel
>> functions. However, it is often hard to guarantee an error propagates
>> appropriately to user space programs. By injecting into syscalls, we can
>> return arbitrary values to user space directly; this increases
>> flexibility and robustness in testing, allowing us to test user space
>> error paths effectively.
> 
> Temporary NAK IMO.  Specifically:
> 
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index a78186d826d7..e8c6d63ace78 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
>> trace_event_call *tp_event)
>>
>>  #define SYSCALL_DEFINE0(sname) \
>> SYSCALL_METADATA(_##sname, 0);  \
>> +   asmlinkage long sys_##sname(void);  \
>> +   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
> 
> sys_xyz() is not just the syscall itself; it's also a helper that's
> used for entirely silly reasons by various bits of kernel code for
> quite a few syscalls.  Fortunately, Dominik has patches to fix that,
> and Linus is even considering pulling them for 4.16.  This patch will
> most likely conflict with the final result of Dominik's series.
> 
> Can you and Dominik coordinate a bit to get this patch or its
> equivalent landed on top of Dominik's work?  It might make sense for
> Dominik to just add this patch to his series so it can land with the
> rest of it.  Dominik, Ingo, what do you think?
> 
> --Andy
> 

Dominik, 

This patch applies cleanly on top of your patch series. Is there anything you'd 
need from me to get this in on top of your work? 

Howard 


Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-16 Thread Howard McLauchlan
On 03/13/2018 04:56 PM, Andy Lutomirski wrote:
> On Tue, Mar 13, 2018 at 11:16 PM, Howard McLauchlan  
> wrote:
>> Error injection is a useful mechanism to fail arbitrary kernel
>> functions. However, it is often hard to guarantee an error propagates
>> appropriately to user space programs. By injecting into syscalls, we can
>> return arbitrary values to user space directly; this increases
>> flexibility and robustness in testing, allowing us to test user space
>> error paths effectively.
> 
> Temporary NAK IMO.  Specifically:
> 
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index a78186d826d7..e8c6d63ace78 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
>> trace_event_call *tp_event)
>>
>>  #define SYSCALL_DEFINE0(sname) \
>> SYSCALL_METADATA(_##sname, 0);  \
>> +   asmlinkage long sys_##sname(void);  \
>> +   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
> 
> sys_xyz() is not just the syscall itself; it's also a helper that's
> used for entirely silly reasons by various bits of kernel code for
> quite a few syscalls.  Fortunately, Dominik has patches to fix that,
> and Linus is even considering pulling them for 4.16.  This patch will
> most likely conflict with the final result of Dominik's series.
> 
> Can you and Dominik coordinate a bit to get this patch or its
> equivalent landed on top of Dominik's work?  It might make sense for
> Dominik to just add this patch to his series so it can land with the
> rest of it.  Dominik, Ingo, what do you think?
> 
> --Andy
> 

Dominik, 

This patch applies cleanly on top of your patch series. Is there anything you'd 
need from me to get this in on top of your work? 

Howard 


Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-13 Thread Howard McLauchlan
On 03/13/2018 04:49 PM, Yonghong Song wrote:
> 
> 
> On 3/13/18 4:45 PM, Omar Sandoval wrote:
>> On Tue, Mar 13, 2018 at 04:16:27PM -0700, Howard McLauchlan wrote:
>>> Error injection is a useful mechanism to fail arbitrary kernel
>>> functions. However, it is often hard to guarantee an error propagates
>>> appropriately to user space programs. By injecting into syscalls, we can
>>> return arbitrary values to user space directly; this increases
>>> flexibility and robustness in testing, allowing us to test user space
>>> error paths effectively.
>>>
>>> The following script, for example, fails calls to sys_open() from a
>>> given pid:
>>>
>>> from bcc import BPF
>>> from sys import argv
>>>
>>> pid = argv[1]
>>>
>>> prog = r"""
>>>
>>> int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
>>> {
>>>  u32 pid = bpf_get_current_pid_tgid();
>>>  if (pid == %s)
>>>  bpf_override_return(ctx, -ENOENT);
>>>  return 0;
>>> }
>>> """ % pid
>>>
>>> b = BPF(text = prog)
>>> while 1:
>>>  b.perf_buffer_poll()
>>>
>>> This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
>>> injection.
>>>
>>> Signed-off-by: Howard McLauchlan <hmclauch...@fb.com>
>>> ---
>>> based on 4.16-rc5
>>>   include/linux/syscalls.h | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index a78186d826d7..e8c6d63ace78 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
>>> trace_event_call *tp_event)
>>>     #define SYSCALL_DEFINE0(sname)    \
>>>   SYSCALL_METADATA(_##sname, 0);    \
>>> +    asmlinkage long sys_##sname(void);    \
>>> +    ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);    \
>>>   asmlinkage long sys_##sname(void)
> 
> duplication of asmlinkage in the above?
> 
The pre-declaration is necessary to ensure ALLOW_ERROR_INJECTION works 
appropriately. There can be syscalls
that are not pre-declared elsewhere which will fail compilation if not declared 
in this block.
>>>     #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, 
>>> __VA_ARGS__)
>>> @@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
>>> trace_event_call *tp_event)
>>>   #define __SYSCALL_DEFINEx(x, name, ...)    \
>>>   asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))    \
>>>   __attribute__((alias(__stringify(SyS##name;    \
>>> +    ALLOW_ERROR_INJECTION(sys##name, ERRNO);    \
>>>   static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));    \
>>>   asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));    \
>>>   asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))    \
>>> -- 
>>> 2.14.1
>>>
>>
>> Adding a few more people to Cc
>>


Re: [PATCH] bpf: whitelist syscalls for error injection

2018-03-13 Thread Howard McLauchlan
On 03/13/2018 04:49 PM, Yonghong Song wrote:
> 
> 
> On 3/13/18 4:45 PM, Omar Sandoval wrote:
>> On Tue, Mar 13, 2018 at 04:16:27PM -0700, Howard McLauchlan wrote:
>>> Error injection is a useful mechanism to fail arbitrary kernel
>>> functions. However, it is often hard to guarantee an error propagates
>>> appropriately to user space programs. By injecting into syscalls, we can
>>> return arbitrary values to user space directly; this increases
>>> flexibility and robustness in testing, allowing us to test user space
>>> error paths effectively.
>>>
>>> The following script, for example, fails calls to sys_open() from a
>>> given pid:
>>>
>>> from bcc import BPF
>>> from sys import argv
>>>
>>> pid = argv[1]
>>>
>>> prog = r"""
>>>
>>> int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
>>> {
>>>  u32 pid = bpf_get_current_pid_tgid();
>>>  if (pid == %s)
>>>  bpf_override_return(ctx, -ENOENT);
>>>  return 0;
>>> }
>>> """ % pid
>>>
>>> b = BPF(text = prog)
>>> while 1:
>>>  b.perf_buffer_poll()
>>>
>>> This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
>>> injection.
>>>
>>> Signed-off-by: Howard McLauchlan 
>>> ---
>>> based on 4.16-rc5
>>>   include/linux/syscalls.h | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index a78186d826d7..e8c6d63ace78 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
>>> trace_event_call *tp_event)
>>>     #define SYSCALL_DEFINE0(sname)    \
>>>   SYSCALL_METADATA(_##sname, 0);    \
>>> +    asmlinkage long sys_##sname(void);    \
>>> +    ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);    \
>>>   asmlinkage long sys_##sname(void)
> 
> duplication of asmlinkage in the above?
> 
The pre-declaration is necessary to ensure ALLOW_ERROR_INJECTION works 
appropriately. There can be syscalls
that are not pre-declared elsewhere which will fail compilation if not declared 
in this block.
>>>     #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, 
>>> __VA_ARGS__)
>>> @@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
>>> trace_event_call *tp_event)
>>>   #define __SYSCALL_DEFINEx(x, name, ...)    \
>>>   asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))    \
>>>   __attribute__((alias(__stringify(SyS##name;    \
>>> +    ALLOW_ERROR_INJECTION(sys##name, ERRNO);    \
>>>   static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));    \
>>>   asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));    \
>>>   asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))    \
>>> -- 
>>> 2.14.1
>>>
>>
>> Adding a few more people to Cc
>>


[PATCH] bpf: whitelist syscalls for error injection

2018-03-13 Thread Howard McLauchlan
Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.

The following script, for example, fails calls to sys_open() from a
given pid:

from bcc import BPF
from sys import argv

pid = argv[1]

prog = r"""

int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
{
u32 pid = bpf_get_current_pid_tgid();
if (pid == %s)
bpf_override_return(ctx, -ENOENT);
return 0;
}
""" % pid

b = BPF(text = prog)
while 1:
b.perf_buffer_poll()

This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
injection.

Signed-off-by: Howard McLauchlan <hmclauch...@fb.com>
---
based on 4.16-rc5
 include/linux/syscalls.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 
 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
asmlinkage long sys_##sname(void)
 
 #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 #define __SYSCALL_DEFINEx(x, name, ...)
\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))   \
__attribute__((alias(__stringify(SyS##name; \
+   ALLOW_ERROR_INJECTION(sys##name, ERRNO);\
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))   \
-- 
2.14.1



[PATCH] bpf: whitelist syscalls for error injection

2018-03-13 Thread Howard McLauchlan
Error injection is a useful mechanism to fail arbitrary kernel
functions. However, it is often hard to guarantee an error propagates
appropriately to user space programs. By injecting into syscalls, we can
return arbitrary values to user space directly; this increases
flexibility and robustness in testing, allowing us to test user space
error paths effectively.

The following script, for example, fails calls to sys_open() from a
given pid:

from bcc import BPF
from sys import argv

pid = argv[1]

prog = r"""

int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
{
u32 pid = bpf_get_current_pid_tgid();
if (pid == %s)
bpf_override_return(ctx, -ENOENT);
return 0;
}
""" % pid

b = BPF(text = prog)
while 1:
b.perf_buffer_poll()

This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
injection.

Signed-off-by: Howard McLauchlan 
---
based on 4.16-rc5
 include/linux/syscalls.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a78186d826d7..e8c6d63ace78 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 
 #define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0);  \
+   asmlinkage long sys_##sname(void);  \
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);  \
asmlinkage long sys_##sname(void)
 
 #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct 
trace_event_call *tp_event)
 #define __SYSCALL_DEFINEx(x, name, ...)
\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))   \
__attribute__((alias(__stringify(SyS##name; \
+   ALLOW_ERROR_INJECTION(sys##name, ERRNO);\
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));  \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))   \
-- 
2.14.1



[PATCH] uprobe: add support for overlayfs

2018-02-27 Thread Howard McLauchlan
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.

To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):

$ mount -t overlay overlay -o 
lowerdir=lower,upperdir=upper,workdir=work merge
$ cd /sys/kernel/debug/tracing
$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
$ echo 1 > events/uprobes/true_entry/enable

This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'

This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.

Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.

Signed-off-by: Howard McLauchlan <hmclauch...@fb.com>
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 268029ae1be6..8b86d76c55ee 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -446,7 +446,7 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
+   inode = igrab(d_real_inode(path.dentry));
path_put();
 
if (!inode || !S_ISREG(inode->i_mode)) {
-- 
2.14.1



[PATCH] uprobe: add support for overlayfs

2018-02-27 Thread Howard McLauchlan
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.

To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):

$ mount -t overlay overlay -o 
lowerdir=lower,upperdir=upper,workdir=work merge
$ cd /sys/kernel/debug/tracing
$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
$ echo 1 > events/uprobes/true_entry/enable

This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'

This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.

Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.

Signed-off-by: Howard McLauchlan 
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 268029ae1be6..8b86d76c55ee 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -446,7 +446,7 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
+   inode = igrab(d_real_inode(path.dentry));
path_put();
 
if (!inode || !S_ISREG(inode->i_mode)) {
-- 
2.14.1