Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)
Hello Remy,

On 12/20/07, Remy Bohmer [EMAIL PROTECTED] wrote:
> So, is this a serious requirement? Should this be possible?

I have noticed this problem:

[EMAIL PROTECTED]:~# cat /proc/loadavgrt
1.00 1.00 1.00 0/52 1158
[EMAIL PROTECTED]:~# cat /proc/loadavg
0.00 0.00 0.02 1/52 1159
[EMAIL PROTECTED]:~#

So I am curious whether a user could switch softirq threads or IRQ threads from RT tasks to non-RT tasks for slow or less important hardware. This would improve RT behaviour.

By default, all softirq threads and IRQ threads would stay RT tasks, but it would be good to have an option that lets the user switch them to non-RT tasks.

Thank you,
Jaswinder Singh.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH v5 7/8] debugfs: allow access to signed values
debugfs: allow access to signed values

Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to remove
a cast in libfs, change the simple_attr_open prototype and thus fix the users
as well.

Cc: Johannes Berg [EMAIL PROTECTED]
Cc: Mattias Nissler [EMAIL PROTECTED]
To: Greg Kroah-Hartman [EMAIL PROTECTED]
To: Arnd Bergmann [EMAIL PROTECTED]
To: Akinobu Mita [EMAIL PROTECTED]
Signed-off-by: Stefano Brivio [EMAIL PROTECTED]
---
This version addresses last Johannes' concerns. Please just discard v4.
---
 arch/powerpc/platforms/cell/spufs/file.c |   48
 fs/debugfs/file.c                        |  186 +--
 fs/libfs.c                               |   21 +--
 include/linux/debugfs.h                  |   38 ++
 include/linux/fs.h                       |   25 ++--
 lib/fault-inject.c                       |   11 +
 6 files changed, 267 insertions(+), 62 deletions(-)

Index: wireless-2.6/fs/debugfs/file.c
===
--- wireless-2.6.orig/fs/debugfs/file.c
+++ wireless-2.6/fs/debugfs/file.c
@@ -56,11 +56,11 @@ const struct inode_operations debugfs_li
 	.follow_link	= debugfs_follow_link,
 };
 
-static void debugfs_u8_set(void *data, u64 val)
+static void debugfs_u8_set(void *data, unsigned long long val)
 {
 	*(u8 *)data = val;
 }
-static u64 debugfs_u8_get(void *data)
+static unsigned long long debugfs_u8_get(void *data)
 {
 	return *(u8 *)data;
 }
@@ -97,11 +97,11 @@ struct dentry *debugfs_create_u8(const c
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u8);
 
-static void debugfs_u16_set(void *data, u64 val)
+static void debugfs_u16_set(void *data, unsigned long long val)
 {
 	*(u16 *)data = val;
 }
-static u64 debugfs_u16_get(void *data)
+static unsigned long long debugfs_u16_get(void *data)
 {
 	return *(u16 *)data;
 }
@@ -138,11 +138,11 @@ struct dentry *debugfs_create_u16(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u16);
 
-static void debugfs_u32_set(void *data, u64 val)
+static void debugfs_u32_set(void *data, unsigned long long val)
 {
 	*(u32 *)data = val;
 }
-static u64 debugfs_u32_get(void *data)
+static unsigned long long debugfs_u32_get(void *data)
 {
 	return *(u32 *)data;
 }
@@ -179,12 +179,12 @@ struct dentry *debugfs_create_u32(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u32);
 
-static void debugfs_u64_set(void *data, u64 val)
+static void debugfs_u64_set(void *data, unsigned long long val)
 {
 	*(u64 *)data = val;
 }
 
-static u64 debugfs_u64_get(void *data)
+static unsigned long long debugfs_u64_get(void *data)
 {
 	return *(u64 *)data;
 }
@@ -221,6 +221,172 @@ struct dentry *debugfs_create_u64(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u64);
 
+
+static void debugfs_s8_set(void *data, unsigned long long val)
+{
+	*(s8 *)data = val;
+}
+static unsigned long long debugfs_s8_get(void *data)
+{
+	return *(s8 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_s8, debugfs_s8_get, debugfs_s8_set, "%lld\n");
+
+/**
+ * debugfs_create_s8 - create a debugfs file that is used to read and write a signed 8-bit value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ *         from.
+ *
+ * This function creates a file in debugfs with the given name that
+ * contains the value of the variable @value.  If the @mode variable is so
+ * set, it can be read from, and written to.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, %NULL will be returned.
+ *
+ * If debugfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.  It is not wise to check for this value, but rather, check for
+ * %NULL or !%NULL instead as to eliminate the need for #ifdef in the calling
+ * code.
+ */
+struct dentry *debugfs_create_s8(const char *name, mode_t mode,
+				 struct dentry *parent, s8 *value)
+{
+	return debugfs_create_file(name, mode, parent, value, &fops_s8);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_s8);
+
+static void debugfs_s16_set(void *data, unsigned long long val)
+{
+	*(s16 *)data = val;
+}
+static unsigned long long debugfs_s16_get(void *data)
+{
+	return *(s16 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_s16, debugfs_s16_get, debugfs_s16_set, "%lld\n");
+
+/**
+ * debugfs_create_s16 - create a debugfs file that is used to read and write a signed 16-bit value
+ * @name: a pointer to a string
Re: [PATCH v4 7/8] debugfs: allow access to signed values
On Thursday 20 December 2007, Stefano Brivio wrote:
> debugfs: allow access to signed values
>
> Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to remove
> a cast in libfs, change the simple_attr_open prototype and thus fix the users
> as well.
>
> Cc: Johannes Berg [EMAIL PROTECTED]
> Cc: Mattias Nissler [EMAIL PROTECTED]
> To: Greg Kroah-Hartman [EMAIL PROTECTED]
> To: Arnd Bergmann [EMAIL PROTECTED]
> To: Akinobu Mita [EMAIL PROTECTED]
> Signed-off-by: Stefano Brivio [EMAIL PROTECTED]

Have you checked that spufs still builds? I would guess that you need to do the same interface changes there.

Also, Christoph has recently posted a suggestion for how to improve the interface to allow the 'get' operation to return an error:

http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=14962

I'd suggest consolidating the two changes in order to avoid merge conflicts.

	Arnd
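For readers following the signed-value discussion, here is a minimal userspace sketch (not the kernel code) of why a signed 8-bit value can travel through an `unsigned long long` getter and still print correctly with `%lld`. The names `s8_get` and `format_attr` are hypothetical and only stand in for the debugfs callbacks and the attribute formatting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a debugfs-style getter: the stored value is
 * signed, but the generic callback returns an unsigned 64-bit quantity. */
static unsigned long long s8_get(const void *data)
{
	/* int8_t -> unsigned long long is defined modulo 2^64, so -5 becomes
	 * 0xfffffffffffffffb, exactly as if it had been sign-extended. */
	return (unsigned long long)*(const int8_t *)data;
}

/* Format the raw value the way a "%lld\n" attribute would. The cast back
 * to long long assumes a two's-complement platform, which is what common
 * compilers provide. */
static int format_attr(char *buf, size_t len, unsigned long long raw)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lld\n", (long long)raw);
}
```

Formatting the same raw value with `%llu` instead would show 18446744073709551611, which is why the signed variants need their own format string.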
Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)
> By default, all softirq-threads or IRQs will be RT task but if there is some
> option user can switch it to NON-RT task then it will be good.

Okay, understood. I will think about how to add that too. But we can probably add it separately from this patchset.

Kind Regards,

Remy
Re: [PATCH 6/6] udf: fix sparse warnings (shadowing mismatch between declaration and definition)
On Wed 19-12-07 19:35:14, Marcin Slusarz wrote:
> On Mon, Dec 17, 2007 at 05:50:17PM +0100, Jan Kara wrote:
> > > fix warnings:
> > > fs/udf/super.c:1320:24: warning: symbol 'bh' shadows an earlier one
> > > fs/udf/super.c:1240:21: originally declared here
> > > fs/udf/super.c:1583:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1418:6: originally declared here
> > > fs/udf/super.c:1585:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1418:6: originally declared here
> > > fs/udf/super.c:1658:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1648:6: originally declared here
> > > fs/udf/super.c:1660:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1648:6: originally declared here
> > > fs/udf/super.c:450:6: warning: symbol 'udf_write_super' was not declared. Should it be static?
> > >
> > > Signed-off-by: Marcin Slusarz [EMAIL PROTECTED]
> > > CC: Ben Fennema [EMAIL PROTECTED]
> >
> > Thanks for the patch. The 'bh' change is fine. The problems with 'i'
> > should be solved differently I think. Those functions UDF_SB_FREE,
> > UDF_SB_ALLOC_PARTMAPS should be functions and not macros. Please convert
> > those to either inline functions if they are small or to regular functions
> > if they are larger. It won't be completely trivial because of the hackery
> > e.g. in UDF_SB_ALLOC_BITMAP. It gets an argument meaning on which struct
> > member something should be performed. But for example in the
> > UDF_SB_ALLOC_BITMAP case you can simply make the function return the
> > pointer to allocated and initialized space and the caller would assign it
> > to a proper element of the superblock.
>
> Ok, I'll try to do it.

Thanks.

> > This would help the overall code quality of UDF (which is sadly quite
> > poor).
>
> If you have other suggestions how to clean up this code, let me know.
> I'll see what I can do with them ;)

Hmm, it's hard to formulate precisely. It's nothing particular but all in all the code is simply hard to read - all those strange names of variables, unusual side effects of functions, ... But one specific problem I'm aware of is the lack of error handling, so in case of an IO error or filesystem corruption the results are quite spectacular (kernel crash, etc.).

								Honza
--
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
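The macro-to-function conversion suggested above can be sketched generically in userspace C. Everything here is hypothetical (`struct bitmap`, `ZERO_SLOTS_MACRO`, `alloc_bitmap`) and uses `calloc` rather than kernel allocators; it only illustrates the pattern of returning the allocated, initialized object and letting the caller assign it, not the actual UDF code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bitmap descriptor, loosely modelled on the idea in the thread. */
struct bitmap {
	size_t nr_groups;
	void *blocks[];		/* one slot per group, filled in later */
};

/* Macro style, kept only for comparison: it declares its own 'i', which
 * shadows any 'i' already in scope at the call site. That is the kind of
 * construct sparse warns about. */
#define ZERO_SLOTS_MACRO(bm)					\
	do {							\
		size_t i;					\
		for (i = 0; i < (bm)->nr_groups; i++)		\
			(bm)->blocks[i] = NULL;			\
	} while (0)

/* Function style: return the allocated and initialized object and leave
 * the assignment to the caller, so no struct member name is passed in. */
static struct bitmap *alloc_bitmap(size_t nr_groups)
{
	struct bitmap *bm;

	bm = calloc(1, sizeof(*bm) + nr_groups * sizeof(bm->blocks[0]));
	if (!bm)
		return NULL;
	bm->nr_groups = nr_groups;
	return bm;
}
```

A caller would then write something like `sb_info->some_bitmap = alloc_bitmap(n);` and check the result for NULL, which also gives a natural place for the error handling mentioned above.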
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall [EMAIL PROTECTED]
Date: Mon, 17 Dec 2007 08:55:54 -0600

> On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> Actually, you may only need these two:
>
> maps4-add-proc-kpagecount-interface.patch
> maps4-add-proc-kpageflags-interface.patch

Yes, these two were enough, along with exporting fs/proc/base.c's mem_lseek().

As hard as I try, I can't reproduce this at all. I tried both on my workstation and my niagara boxes. It must be some other needle in the 30MB+ -mm haystack. :-(
Re: OOPS: 2.6.24-rc5-mm1 -- EIP is at r_show+0x2a/0x70 -- (triggered by cat /proc/iomem)
On Dec 20, 2007 6:37 AM, David Howells [EMAIL PROTECTED] wrote:
> Andrew Morton [EMAIL PROTECTED] wrote:
>
> > I would be suspecting iget-stop-procfs-from-using-iget-and-read_inode.patch.
>
> I think your suspicions are very unlikely. The patch only affects
> proc_get_inode() - and looking at the patch backtrace, it looks like the
> system is successfully past that already (it's unlikely that
> proc_reg_read+0x60/0x74 would have been reached otherwise).
>
> If my patch to procfs is wrong, it would affect all proc files and ought to
> be immediately detectable.

I tested the patch Andrew sent. I ran cat /proc/iomem before trying a suspend-to-disk. It worked fine. Then I suspended and resumed. This time, cat /proc/iomem caused the stack trace to be generated. So you are right, your patch is not the problem.

Thanks,
Miles
Re: [PATCH -mm 00/43] user_regset framework -- arch maintainers take note!
* Roland McGrath [EMAIL PROTECTED] wrote:

> This is a large series of patches, but there are only a couple that you
> need to read in detail to know how to get started on cleaning up your
> arch code (1, 4, 6).
>
> user_regset is a new kernel-internal interface into the arch code for
> accessing the user-space view of machine-specific state (registers et
> al--everything machine-specific that is visible via ptrace and the like,
> or should be). The idea is that arch code will have just one place it
> has to support fetching and changing the user-visible machine state of a
> user thread. This same interface can be used for writing core dumps, to
> underlie the implementation of PTRACE_GETREGS, PTRACE_SETREGS, and the
> like, and by any new set of debugging facilities that might come along.
> [...]
> Patches 26 through 43 affect only arch/x86 code. I have not CC'd these
> ones to linux-arch. They include a bunch of cleanup that is specific to
> the idiosyncrasies of the x86 code and isn't interesting as an example
> for what another arch would do.

thanks Roland - this is a really impressive set of cleanups and generalizations!

Testing feedback: i've put the x86 and core bits into x86.git and your regset series has so far successfully passed a couple of hundred iterations of random-qa on 32-bit and 64-bit x86 as well. (with a few ptrace tests added to the mix as well)

So it's all green as far as arch/x86 and core goes :-)

	Ingo
[rfc][patch] mm: madvise(WILLNEED) for anonymous memory
Hi,

Lennart asked for madvise(WILLNEED) to work on anonymous pages; he plans to use this to pre-fault pages. He currently uses mlock/munlock for this purpose.

[ compile tested only ]

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---
diff --git a/mm/madvise.c b/mm/madvise.c
index 93ee375..eff60ce 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -100,6 +100,24 @@ out:
 	return error;
 }
 
+static long madvice_willneed_anon(struct vm_area_struct *vma,
+				  struct vm_area_struct **prev,
+				  unsigned long start, unsigned long end)
+{
+	int ret, len;
+
+	*prev = vma;
+	if (end > vma->vm_end)
+		end = vma->vm_end;
+
+	len = end - start;
+	ret = get_user_pages(current, current->mm, start, len,
+			0, 0, NULL, NULL);
+	if (ret < 0)
+		return ret;
+	return ret == len ? 0 : -1;
+}
+
 /*
  * Schedule all required I/O operations.  Do not wait for completion.
  */
@@ -110,7 +128,7 @@ static long madvise_willneed(struct vm_area_struct * vma,
 	struct file *file = vma->vm_file;
 
 	if (!file)
-		return -EBADF;
+		return madvice_willneed_anon(vma, prev, start, end);
 
 	if (file->f_mapping->a_ops->get_xip_page) {
 		/* no bad return value, but ignore advice */
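From the userspace side, the call this RFC is about could look roughly like the sketch below. The helper names `map_anon` and `prefault_hint` are hypothetical; only `mmap`, `madvise` and `MADV_WILLNEED` are real interfaces. Whether the hint succeeds on an anonymous mapping depends on the running kernel (without a change like the one above, the quoted code path returns -EBADF), so the helper reports the error rather than assuming success.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map 'len' bytes of private anonymous memory, or return NULL on failure. */
static void *map_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Ask the kernel to bring [addr, addr + len) in ahead of use.
 * Returns 0 on success or a negative errno value. */
static int prefault_hint(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	if (madvise(addr, len, MADV_WILLNEED) == 0)
		return 0;
	return -errno;
}
```

A caller that gets a negative result can fall back to the mlock/munlock approach mentioned in the mail, or simply touch each page once.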
[PATCH 0/4] add task handling notifier
With more and more sub-systems/sub-components leaving their footprint in task handling functions, it seems reasonable to add notifiers that these components can use instead of having them all patch themselves directly into core files.

Patch 1 introduces the base definitions and hooks for task creation and deletion.
Patch 2 switches delayacct to make use of the notifier.
Patch 3 makes the procevents/connector use the infrastructure and adds additional notifiers needed there.
Patch 4 makes the security keys handling use this, too.

Signed-off-by: Jan Beulich [EMAIL PROTECTED]
[PATCH 1/4] add task handling notifier: base definitions
This is the base patch, adding notification for task creation and deletion.

Signed-off-by: Jan Beulich [EMAIL PROTECTED]

---
 include/linux/sched.h |    8 +++-
 kernel/fork.c         |   11 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

--- 2.6.24-rc5-notify-task.orig/include/linux/sched.h
+++ 2.6.24-rc5-notify-task/include/linux/sched.h
@@ -80,7 +80,7 @@ struct sched_param {
 #include <linux/rcupdate.h>
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
-
+#include <linux/notifier.h>
 #include <linux/time.h>
 #include <linux/param.h>
 #include <linux/resource.h>
@@ -1700,6 +1700,12 @@ extern int do_execve(char *, char __user
 extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
 
+#define TASK_NEW 1
+#define TASK_DELETE 2
+
+extern struct blocking_notifier_head task_notifier_list;
+extern struct atomic_notifier_head atomic_task_notifier_list;
+
 extern void set_task_comm(struct task_struct *tsk, char *from);
 extern void get_task_comm(char *to, struct task_struct *tsk);
 
--- 2.6.24-rc5-notify-task.orig/kernel/fork.c
+++ 2.6.24-rc5-notify-task/kernel/fork.c
@@ -46,6 +46,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/freezer.h>
+#include <linux/notifier.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
@@ -71,6 +72,11 @@ DEFINE_PER_CPU(unsigned long, process_co
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
 
+BLOCKING_NOTIFIER_HEAD(task_notifier_list);
+EXPORT_SYMBOL_GPL(task_notifier_list);
+ATOMIC_NOTIFIER_HEAD(atomic_task_notifier_list);
+EXPORT_SYMBOL_GPL(atomic_task_notifier_list);
+
 int nr_processes(void)
 {
 	int cpu;
@@ -121,6 +127,9 @@ void __put_task_struct(struct task_struc
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	atomic_notifier_call_chain(&atomic_task_notifier_list,
+				   TASK_DELETE, tsk);
+
 	security_task_free(tsk);
 	free_uid(tsk->user);
 	put_group_info(tsk->group_info);
@@ -1450,6 +1459,8 @@ long do_fork(unsigned long clone_flags,
 		set_tsk_thread_flag(p, TIF_SIGPENDING);
 	}
 
+	blocking_notifier_call_chain(&task_notifier_list, TASK_NEW, p);
+
 	if (!(clone_flags & CLONE_STOPPED))
 		wake_up_new_task(p, clone_flags);
 	else
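The notifier idea behind this series can be illustrated with a small, single-threaded userspace sketch. It is not the kernel's notifier implementation (which adds locking, priorities and return-code handling); `struct notifier`, `chain_register`, `chain_call` and `count_tasks` are hypothetical names, and only the `TASK_NEW`/`TASK_DELETE` values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TASK_NEW    1
#define TASK_DELETE 2

/* Simplified stand-in for a notifier block: a callback plus a link. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long action, void *data);
	struct notifier *next;
};

static struct notifier *task_chain;	/* head of the chain */

/* Add a subscriber at the front of the chain. */
static void chain_register(struct notifier **head, struct notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = *head;
	*head = nb;
}

/* Invoke every registered callback; returns how many were called. */
static int chain_call(struct notifier *head, unsigned long action, void *data)
{
	int n = 0;

	for (; head; head = head->next, n++)
		head->call(head, action, data);
	return n;
}

/* Example subscriber: keeps a count of live "tasks". */
static int live_tasks;

static int count_tasks(struct notifier *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)data;
	if (action == TASK_NEW)
		live_tasks++;
	else if (action == TASK_DELETE)
		live_tasks--;
	return 0;
}
```

The point of the design is visible even in this toy form: the code that creates or destroys a task only calls `chain_call`, and each interested component registers its own callback instead of being patched into the core path.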
[PATCH 3/4] add task handling notifier: connector based proc events
This has the additional benefit of allowing the code to now be built as a module (which made it necessary to add MODULE_xxx declarations).

Signed-off-by: Jan Beulich [EMAIL PROTECTED]
Cc: Matt Helsley [EMAIL PROTECTED]

---
 drivers/connector/Kconfig   |    5 +--
 drivers/connector/cn_proc.c |   57
 fs/exec.c                   |    5 ++-
 include/linux/cn_proc.h     |   21
 include/linux/sched.h       |    4 +++
 kernel/exit.c               |    5 ++-
 kernel/fork.c               |    2 -
 kernel/sys.c                |   27 +---
 8 files changed, 83 insertions(+), 43 deletions(-)

--- 2.6.24-rc5-notify-task.orig/drivers/connector/Kconfig
+++ 2.6.24-rc5-notify-task/drivers/connector/Kconfig
@@ -12,9 +12,8 @@ menuconfig CONNECTOR
 if CONNECTOR
 
 config PROC_EVENTS
-	boolean "Report process events to userspace"
-	depends on CONNECTOR=y
-	default y
+	tristate "Report process events to userspace"
+	default CONNECTOR
 	---help---
 	  Provide a connector that reports process events to userspace. Send
 	  events such as fork, exec, id change (uid, gid, suid, etc), and exit.
--- 2.6.24-rc5-notify-task.orig/drivers/connector/cn_proc.c
+++ 2.6.24-rc5-notify-task/drivers/connector/cn_proc.c
@@ -26,12 +26,17 @@
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/init.h>
+#include <linux/notifier.h>
 #include <linux/connector.h>
 #include <asm/atomic.h>
 #include <asm/unaligned.h>
 
 #include <linux/cn_proc.h>
 
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Matt Helsley [EMAIL PROTECTED]");
+MODULE_DESCRIPTION("Process events - userspace connector");
+
 #define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event))
 
 static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
@@ -47,7 +52,7 @@ static inline void get_seq(__u32 *ts, in
 	put_cpu_var(proc_event_counts);
 }
 
-void proc_fork_connector(struct task_struct *task)
+static void proc_fork_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -75,7 +80,7 @@ void proc_fork_connector(struct task_str
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_exec_connector(struct task_struct *task)
+static void proc_exec_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -100,7 +105,7 @@ void proc_exec_connector(struct task_str
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_id_connector(struct task_struct *task, int which_id)
+static void proc_id_connector(struct task_struct *task, int which_id)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -133,7 +138,7 @@ void proc_id_connector(struct task_struc
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_exit_connector(struct task_struct *task)
+static void proc_exit_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -220,6 +225,35 @@ static void cn_proc_mcast_ctl(void *data
 	cn_proc_ack(err, msg->seq, msg->ack);
 }
 
+static int connector_task_notifier(struct notifier_block *nb,
+				   unsigned long action,
+				   void *task)
+{
+	switch (action) {
+	case TASK_NEW:
+		proc_fork_connector(task);
+		break;
+	case TASK_EXEC:
+		proc_exec_connector(task);
+		break;
+	case TASK_EXIT:
+		proc_exit_connector(task);
+		break;
+	case TASK_UID_CHANGE:
+		proc_id_connector(task, PROC_EVENT_UID);
+		break;
+	case TASK_GID_CHANGE:
+		proc_id_connector(task, PROC_EVENT_GID);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block connector_task_notifier_block = {
+	.notifier_call = connector_task_notifier
+};
+
 /*
  * cn_proc_init - initialization entry point
  *
@@ -234,7 +268,22 @@ static int __init cn_proc_init(void)
 		printk(KERN_WARNING "cn_proc failed to register\n");
 		return err;
 	}
+	blocking_notifier_chain_register(&task_notifier_list,
+					 &connector_task_notifier_block);
 	return 0;
 }
 
+/*
+ * cn_proc_exit - unload entry point
+ *
+ * Removes the connector callback from the connector driver.
+ */
+static void __exit cn_proc_exit(void)
+{
+	blocking_notifier_chain_unregister(&task_notifier_list,
+					   &connector_task_notifier_block);
+	cn_del_callback(&cn_proc_event_id);
+}
+
 module_init(cn_proc_init);
+module_exit(cn_proc_exit);
--- 2.6.24-rc5-notify-task.orig/fs/exec.c
+++ 2.6.24-rc5-notify-task/fs/exec.c
@@ -49,7 +49,7 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
+#include <linux/notifier.h>
 #include <linux/audit.h>
 #include
[PATCH 4/4] add task handling notifier: security keys
Signed-off-by: Jan Beulich [EMAIL PROTECTED]
Cc: David Howells [EMAIL PROTECTED]

---
 arch/mips/kernel/kspd.c      |    7 +++--
 include/linux/key.h          |    4 ---
 kernel/sys.c                 |    8 --
 security/keys/process_keys.c |   55 ++-
 4 files changed, 39 insertions(+), 35 deletions(-)

--- 2.6.24-rc5-notify-task.orig/arch/mips/kernel/kspd.c
+++ 2.6.24-rc5-notify-task/arch/mips/kernel/kspd.c
@@ -25,6 +25,7 @@
 #include <linux/workqueue.h>
 #include <linux/errno.h>
 #include <linux/list.h>
+#include <linux/notifier.h>
 
 #include <asm/vpe.h>
 #include <asm/rtlx.h>
@@ -177,8 +178,10 @@ static void sp_setfsuidgid( uid_t uid, g
 	current->fsuid = uid;
 	current->fsgid = gid;
 
-	key_fsuid_changed(current);
-	key_fsgid_changed(current);
+	blocking_notifier_call_chain(&task_notifier_list,
+				     TASK_UID_CHANGE, current);
+	blocking_notifier_call_chain(&task_notifier_list,
+				     TASK_GID_CHANGE, current);
 }
 
 /*
--- 2.6.24-rc5-notify-task.orig/include/linux/key.h
+++ 2.6.24-rc5-notify-task/include/linux/key.h
@@ -272,8 +272,6 @@ extern void exit_keys(struct task_struct
 extern void exit_thread_group_keys(struct signal_struct *tg);
 extern int suid_keys(struct task_struct *tsk);
 extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
 extern void key_init(void);
 
 #define __install_session_keyring(tsk, keyring)\
@@ -302,8 +300,6 @@ extern void key_init(void);
 #define exit_thread_group_keys(tg) do { } while(0)
 #define suid_keys(t) do { } while(0)
 #define exec_keys(t) do { } while(0)
-#define key_fsuid_changed(t) do { } while(0)
-#define key_fsgid_changed(t) do { } while(0)
 #define key_init() do { } while(0)
 
 /* Initial keyrings */
--- 2.6.24-rc5-notify-task.orig/kernel/sys.c
+++ 2.6.24-rc5-notify-task/kernel/sys.c
@@ -517,7 +517,6 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);

@@ -554,7 +553,6 @@ asmlinkage long sys_setgid(gid_t gid)
 	else
 		return -EPERM;

-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);

@@ -644,7 +642,6 @@ asmlinkage long sys_setreuid(uid_t ruid,
 		current->suid = current->euid;
 	current->fsuid = current->euid;

-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);

@@ -692,7 +689,6 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;

-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);

@@ -741,7 +737,6 @@ asmlinkage long sys_setresuid(uid_t ruid
 	if (suid != (uid_t) -1)
 		current->suid = suid;

-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);

@@ -794,7 +789,6 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;

-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);
 	return 0;
@@ -836,7 +830,6 @@ asmlinkage long sys_setfsuid(uid_t uid)
 		current->fsuid = uid;
 	}

-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);

@@ -864,7 +857,6 @@ asmlinkage long sys_setfsgid(gid_t gid)
 			smp_wmb();
 		}
 		current->fsgid = gid;
-		key_fsgid_changed(current);
 		blocking_notifier_call_chain(&task_notifier_list,
 					     TASK_GID_CHANGE, current);
 	}
--- 2.6.24-rc5-notify-task.orig/security/keys/process_keys.c
+++ 2.6.24-rc5-notify-task/security/keys/process_keys.c
@@ -16,6 +16,7 @@
 #include <linux/keyctl.h>
 #include <linux/fs.h>
 #include <linux/err.h>
+#include <linux/notifier.h>
 #include <linux/mutex.h>
 #include <asm/uaccess.h>
 #include "internal.h"
@@ -354,35 +355,47 @@ int suid_keys(struct task_struct *tsk)

 } /* end suid_keys() */

-/*/
-/*
- * the filesystem user ID changed
- */
[patch 1/5] x86, ptrace: rlimit BTS buffer allocation
Check the rlimit of the tracing task for total and locked memory when allocating the BTS buffer.

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c	2007-12-20 13:51:21.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c	2007-12-20 13:51:45.%N +0100
@@ -620,12 +620,80 @@
 	return i;
 }
 
+static int ptrace_bts_realloc(struct task_struct *child,
+			      int size, int reduce_size)
+{
+	unsigned long rlim, vm;
+	int ret, old_size;
+
+	if (size < 0)
+		return -EINVAL;
+
+	old_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
+	if (old_size < 0)
+		return old_size;
+
+	ret = ds_free((void **)&child->thread.ds_area_msr);
+	if (ret < 0)
+		goto out;
+
+	size >>= PAGE_SHIFT;
+	old_size >>= PAGE_SHIFT;
+
+	current->mm->total_vm -= old_size;
+	current->mm->locked_vm -= old_size;
+
+	if (size == 0)
+		goto out;
+
+	rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
+	vm = current->mm->total_vm + size;
+	if (rlim < vm) {
+		ret = -ENOMEM;
+
+		if (!reduce_size)
+			goto out;
+
+		size = rlim - current->mm->total_vm;
+		if (size <= 0)
+			goto out;
+	}
+
+	rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
+	vm = current->mm->locked_vm + size;
+	if (rlim < vm) {
+		ret = -ENOMEM;
+
+		if (!reduce_size)
+			goto out;
+
+		size = rlim - current->mm->locked_vm;
+		if (size <= 0)
+			goto out;
+	}
+
+	ret = ds_allocate((void **)&child->thread.ds_area_msr,
+			  size << PAGE_SHIFT);
+	if (ret < 0)
+		goto out;
+
+	current->mm->total_vm += size;
+	current->mm->locked_vm += size;
+
+out:
+	if (child->thread.ds_area_msr)
+		set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+	else
+		clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+
+	return ret;
+}
+
 static int ptrace_bts_config(struct task_struct *child,
 			     const struct ptrace_bts_config __user *ucfg)
 {
 	struct ptrace_bts_config cfg;
-	unsigned long debugctl_mask;
-	int bts_size, ret;
+	int bts_size, ret = 0;
 	void *ds;
 
 	if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
@@ -638,59 +706,46 @@
 		if (bts_size < 0)
 			return bts_size;
 	}
+	cfg.size = PAGE_ALIGN(cfg.size);

 	if (bts_size != cfg.size) {
-		ret = ds_free((void **)&child->thread.ds_area_msr);
+		ret = ptrace_bts_realloc(child, cfg.size,
+					 cfg.flags & PTRACE_BTS_O_CUT_SIZE);
 		if (ret < 0)
-			return ret;
+			goto errout;

-		if (cfg.size > 0)
-			ret = ds_allocate((void **)&child->thread.ds_area_msr,
-					  cfg.size);
 		ds = (void *)child->thread.ds_area_msr;
-		if (ds)
-			set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-		else
-			clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-
-		if (ret < 0)
-			return ret;
-
-		bts_size = ds_get_bts_size(ds);
-		if (bts_size <= 0)
-			return bts_size;
 	}

-	if (ds) {
-		if (cfg.flags & PTRACE_BTS_O_SIGNAL) {
-			ret = ds_set_overflow(ds, DS_O_SIGNAL);
-		} else {
-			ret = ds_set_overflow(ds, DS_O_WRAP);
-		}
-		if (ret < 0)
-			return ret;
-	}
-
-	debugctl_mask = ds_debugctl_mask();
-	if (ds && (cfg.flags & PTRACE_BTS_O_TRACE)) {
-		child->thread.debugctlmsr |= debugctl_mask;
-		set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-	} else {
-		/* there is no way for us to check whether we 'own'
-		 * the respective bits in the DEBUGCTL MSR, we're
-		 * about to clear */
-		child->thread.debugctlmsr &= ~debugctl_mask;
+	if (cfg.flags & PTRACE_BTS_O_SIGNAL)
+		ret = ds_set_overflow(ds, DS_O_SIGNAL);
+	else
+		ret = ds_set_overflow(ds, DS_O_WRAP);
+	if (ret < 0)
+		goto errout;

-		if (!child->thread.debugctlmsr)
-			clear_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-	}
+	if (cfg.flags & PTRACE_BTS_O_TRACE)
+		child->thread.debugctlmsr |= ds_debugctl_mask();
+	else
+
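The limit handling in ptrace_bts_realloc() above follows a check-then-optionally-shrink pattern. A minimal standalone sketch of that decision is shown below; `grant_pages` and its parameters are hypothetical and the function is not the kernel code, it only mirrors the idea that a request either fits, is cut down to the remaining room when shrinking is allowed, or fails with -ENOMEM. All quantities are in pages.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Decide how many pages may be granted given a limit and current usage.
 * Returns the number of pages granted, or -ENOMEM if nothing can be
 * granted. 'reduce_size' plays the role of a "cut the size" flag. */
static long grant_pages(unsigned long limit, unsigned long in_use,
			unsigned long want, int reduce_size)
{
	assert(want <= (unsigned long)LONG_MAX);

	if (in_use <= limit && want <= limit - in_use)
		return (long)want;		/* fits as requested */
	if (!reduce_size || in_use >= limit)
		return -ENOMEM;			/* may not shrink, or no room */
	return (long)(limit - in_use);		/* shrink to what is left */
}
```

In the patch the same check is applied twice, once against RLIMIT_AS and once against RLIMIT_MEMLOCK, with the possibly reduced size from the first check feeding into the second.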
[patch 2/5] x86, ptrace: support 32bit-cross-64bit BTS recording
Support BTS recording of 32bit and 64bit tasks from 32bit or 64bit tasks.

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c	2007-12-20 13:51:20.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c	2007-12-20 13:52:01.%N +0100
@@ -111,53 +111,53 @@
  * Accessor functions for some DS and BTS fields using the above
  * global ptrace_bts_cfg.
  */
-static inline void *get_bts_buffer_base(char *base)
+static inline unsigned long get_bts_buffer_base(char *base)
 {
-	return *(void **)(base + ds_cfg.bts_buffer_base.offset);
+	return *(unsigned long *)(base + ds_cfg.bts_buffer_base.offset);
 }
-static inline void set_bts_buffer_base(char *base, void *value)
+static inline void set_bts_buffer_base(char *base, unsigned long value)
 {
-	(*(void **)(base + ds_cfg.bts_buffer_base.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.bts_buffer_base.offset)) = value;
 }
-static inline void *get_bts_index(char *base)
+static inline unsigned long get_bts_index(char *base)
 {
-	return *(void **)(base + ds_cfg.bts_index.offset);
+	return *(unsigned long *)(base + ds_cfg.bts_index.offset);
 }
-static inline void set_bts_index(char *base, void *value)
+static inline void set_bts_index(char *base, unsigned long value)
 {
-	(*(void **)(base + ds_cfg.bts_index.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.bts_index.offset)) = value;
 }
-static inline void *get_bts_absolute_maximum(char *base)
+static inline unsigned long get_bts_absolute_maximum(char *base)
 {
-	return *(void **)(base + ds_cfg.bts_absolute_maximum.offset);
+	return *(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset);
 }
-static inline void set_bts_absolute_maximum(char *base, void *value)
+static inline void set_bts_absolute_maximum(char *base, unsigned long value)
 {
-	(*(void **)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
 }
-static inline void *get_bts_interrupt_threshold(char *base)
+static inline unsigned long get_bts_interrupt_threshold(char *base)
 {
-	return *(void **)(base + ds_cfg.bts_interrupt_threshold.offset);
+	return *(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset);
 }
-static inline void set_bts_interrupt_threshold(char *base, void *value)
+static inline void set_bts_interrupt_threshold(char *base, unsigned long value)
 {
-	(*(void **)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
 }
-static inline long get_from_ip(char *base)
+static inline unsigned long get_from_ip(char *base)
 {
-	return *(long *)(base + ds_cfg.from_ip.offset);
+	return *(unsigned long *)(base + ds_cfg.from_ip.offset);
 }
-static inline void set_from_ip(char *base, long value)
+static inline void set_from_ip(char *base, unsigned long value)
 {
-	(*(long *)(base + ds_cfg.from_ip.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.from_ip.offset)) = value;
 }
-static inline long get_to_ip(char *base)
+static inline unsigned long get_to_ip(char *base)
 {
-	return *(long *)(base + ds_cfg.to_ip.offset);
+	return *(unsigned long *)(base + ds_cfg.to_ip.offset);
 }
-static inline void set_to_ip(char *base, long value)
+static inline void set_to_ip(char *base, unsigned long value)
 {
-	(*(long *)(base + ds_cfg.to_ip.offset)) = value;
+	(*(unsigned long *)(base + ds_cfg.to_ip.offset)) = value;
 }
 static inline unsigned char get_info_type(char *base)
 {
@@ -180,7 +180,7 @@
 int ds_allocate(void **dsp, size_t bts_size_in_bytes)
 {
 	size_t bts_size_in_records;
-	void *bts;
+	unsigned long bts;
 	void *ds;

 	if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
@@ -197,7 +197,7 @@
 	if (bts_size_in_bytes <= 0)
 		return -EINVAL;

-	bts = kzalloc(bts_size_in_bytes, GFP_KERNEL);
+	bts = (unsigned long)kzalloc(bts_size_in_bytes, GFP_KERNEL);
 	if (!bts)
 		return -ENOMEM;

@@ -205,7 +205,7 @@

 	ds = kzalloc(ds_cfg.sizeof_ds, GFP_KERNEL);
 	if (!ds) {
-		kfree(bts);
+		kfree((void *)bts);
 		return -ENOMEM;
 	}

@@ -221,7 +221,7 @@
 int ds_free(void **dsp)
 {
 	if (*dsp)
-		kfree(get_bts_buffer_base(*dsp));
+		kfree((void *)get_bts_buffer_base(*dsp));
 	kfree(*dsp);
 	*dsp = 0;

@@ -230,7 +230,7 @@

 int ds_get_bts_size(void *ds)
 {
-	size_t size_in_bytes;
+	int size_in_bytes;

 	if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
 		return -EOPNOTSUPP;
@@ -246,7 +246,7 @@

 int ds_get_bts_end(void *ds)
 {
-	size_t size_in_bytes = ds_get_bts_size(ds);
+	int size_in_bytes =
Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)
On Thursday 20 December 2007 13:45, Jaswinder Singh wrote:
> On 12/20/07, Remy Bohmer [EMAIL PROTECTED] wrote:
> > So, Is this a serious requirement? Should this be possible?
>
> I have noticed this problem:
>
> [EMAIL PROTECTED]:~# cat /proc/loadavgrt
> 1.00 1.00 1.00 0/52 1158
> [EMAIL PROTECTED]:~# cat /proc/loadavg
> 0.00 0.00 0.02 1/52 1159
> [EMAIL PROTECTED]:~#
>
> So I am curious, if possible, user can switch softirq-threads or IRQs RT
> tasks to non-RT tasks for slow hardware or least important hardware for
> NON-RT tasks. So this will improve RT behaviour.

^^ Why?

IMHO: Simply decrease the RT priority of less important IRQs or increase all other more important IRQs. IRQs are always more important than other processes in a system, also in non RT systems.

Juergen

-- 
Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
Pengutronix - Linux Solutions for Science and Industry
Handelsregister: Amtsgericht Hildesheim, HRA 2686
Vertretung Sued/Muenchen, Germany
Phone: +49-8766-939 228 | Fax: +49-5121-206917-9
[patch 3/5] x86, ptrace: add buffer size checks
Pass the buffer size for (most) ptrace commands that pass user-allocated buffers and check that size before accessing the buffer.

Unfortunately, PTRACE_BTS_GET already uses all 4 parameters.

Commands that access user buffers return the number of bytes or records read or written.

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c	2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c	2007-12-20 13:52:09.%N +0100
@@ -591,6 +591,7 @@
 }
 
 static int ptrace_bts_drain(struct task_struct *child,
+			    long size,
 			    struct bts_struct __user *out)
 {
 	int end, i;
@@ -603,6 +604,9 @@
 	if (end <= 0)
 		return end;
 
+	if (size < (end * sizeof(struct bts_struct)))
+		return -EIO;
+
 	for (i = 0; i < end; i++, out++) {
 		struct bts_struct ret;
 		int retval;
@@ -617,7 +621,7 @@
 
 	ds_clear(ds);
 
-	return i;
+	return end;
 }
 
 static int ptrace_bts_realloc(struct task_struct *child,
@@ -690,15 +694,22 @@
 }
 
 static int ptrace_bts_config(struct task_struct *child,
+			     long cfg_size,
 			     const struct ptrace_bts_config __user *ucfg)
 {
 	struct ptrace_bts_config cfg;
 	int bts_size, ret = 0;
 	void *ds;
 
+	if (cfg_size < sizeof(cfg))
+		return -EIO;
+
 	if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
 		return -EFAULT;
 
+	if ((int)cfg.size < 0)
+		return -EINVAL;
+
 	bts_size = 0;
 	ds = (void *)child->thread.ds_area_msr;
 	if (ds) {
@@ -734,6 +745,8 @@
 	else
 		clear_tsk_thread_flag(child, TIF_BTS_TRACE_TS);
 
+	ret = sizeof(cfg);
+
 out:
 	if (child->thread.debugctlmsr)
 		set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
@@ -749,11 +762,15 @@
 }
 
 static int ptrace_bts_status(struct task_struct *child,
+			     long cfg_size,
 			     struct ptrace_bts_config __user *ucfg)
 {
 	void *ds = (void *)child->thread.ds_area_msr;
 	struct ptrace_bts_config cfg;
 
+	if (cfg_size < sizeof(cfg))
+		return -EIO;
+
 	memset(&cfg, 0, sizeof(cfg));
 
 	if (ds) {
@@ -935,12 +952,12 @@
 
 	case PTRACE_BTS_CONFIG:
 		ret = ptrace_bts_config
-			(child, (struct ptrace_bts_config __user *)addr);
+			(child, data, (struct ptrace_bts_config __user *)addr);
 		break;
 
 	case PTRACE_BTS_STATUS:
 		ret = ptrace_bts_status
-			(child, (struct ptrace_bts_config __user *)addr);
+			(child, data, (struct ptrace_bts_config __user *)addr);
 		break;
 
 	case PTRACE_BTS_SIZE:
@@ -958,7 +975,7 @@
 
 	case PTRACE_BTS_DRAIN:
 		ret = ptrace_bts_drain
-			(child, (struct bts_struct __user *) addr);
+			(child, data, (struct bts_struct __user *) addr);
 		break;
 
 	default:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h	2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h	2007-12-20 13:52:09.%N +0100
@@ -99,13 +99,15 @@
 
 #define PTRACE_BTS_CONFIG	40
 /* Configure branch trace recording.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config.
+   ADDR points to a struct ptrace_bts_config.
+   DATA gives the size of that buffer.
    A new buffer is allocated, iff the size changes.
+   Returns the number of bytes read.
 */
 #define PTRACE_BTS_STATUS	41
-/* Return the current configuration.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config
-   that will contain the result.
+/* Return the current configuration in a struct ptrace_bts_config
+   pointed to by ADDR; DATA gives the size of that buffer.
+   Returns the number of bytes written.
 */
 #define PTRACE_BTS_SIZE	42
 /* Return the number of available BTS records.
@@ -123,8 +125,8 @@
 */
 #define PTRACE_BTS_DRAIN	45
 /* Read all available BTS records and clear the buffer.
-   DATA is ignored. ADDR points to an array of struct bts_struct of
-   suitable size.
+   ADDR points to an array of struct bts_struct.
+   DATA gives the size of that buffer.
    BTS records are read from oldest to newest.
    Returns number of BTS records drained.
 */
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
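For readers following the size check in ptrace_bts_drain(), here is a minimal user-space sketch of the same idea: refuse to copy when the caller's buffer cannot hold all available records, otherwise copy and return the record count. The names struct record and drain_records() are hypothetical and not part of the patch, and -EIO is used only because the patch description uses it. Unlike the patch, the comparison is done in units of records, which is my own choice to keep the multiplication from overflowing.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct bts_struct; the real layout differs. */
struct record {
	unsigned long from_ip;
	unsigned long to_ip;
};

/*
 * Copy 'count' records into a caller-supplied buffer of 'out_size' bytes.
 * Returns the number of records copied, or -EIO if the buffer is too small.
 */
static long drain_records(const struct record *src, size_t count,
			  struct record *out, size_t out_size)
{
	/* Compare in units of records so count * sizeof cannot overflow. */
	if (count > out_size / sizeof(*out))
		return -EIO;

	memcpy(out, src, count * sizeof(*out));
	return (long)count;
}
```

A caller would pass sizeof() of its destination array as out_size, much as the ptrace caller passes the buffer size in DATA.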
Re: [PATCH] Move page_assign_page_cgroup to VM_BUG_ON in free_hot_cold_page
On Wed, 19 Dec 2007, Dave Hansen wrote:
> > --- linux-2.6.24-rc5/mm/page_alloc.c~memory-controller-move-to-bug-on-in-free_hot_cold_page	2007-12-19 11:31:46.0 +0530
> > +++ linux-2.6.24-rc5-balbir/mm/page_alloc.c	2007-12-19 11:33:45.0 +0530
> > @@ -995,7 +995,7 @@ static void fastcall free_hot_cold_page(
> >  
> >  	if (!PageHighMem(page))
> >  		debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
> > -	page_assign_page_cgroup(page, NULL);
> > +	VM_BUG_ON(page_get_page_cgroup(page));
> >  	arch_free_page(page, 0);
> >  	kernel_map_pages(page, 1, 0);
> >  
>
> Hi Balbir,
>
> You generally want to do these like:
>
> 	foo = page_assign_page_cgroup(page, NULL);
> 	VM_BUG_ON(foo);
>
> Some embedded people have been known to optimize kernel size like this:
>
> 	#define VM_BUG_ON(x) do{}while(0)

Balbir's patch looks fine to me: I don't get your point there, Dave.

Hugh
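As an illustration of the general concern Dave raises (and of why it would not apply to a pure getter such as page_get_page_cgroup(), which seems to be Hugh's point), here is a small user-space sketch. CHECK_BUG_ON, DEBUG_CHECKS, clear_owner() and the two free_*() helpers are all hypothetical names, not kernel code. With the check compiled out, an expression with side effects placed inside the macro is never evaluated.

```c
#include <stdlib.h>

/* Hypothetical assertion macro that compiles to nothing unless
 * DEBUG_CHECKS is defined, in the style of the do{}while(0) variant
 * quoted above. */
#ifdef DEBUG_CHECKS
#define CHECK_BUG_ON(cond) do { if (cond) abort(); } while (0)
#else
#define CHECK_BUG_ON(cond) do { } while (0)
#endif

static int calls;	/* counts how often clear_owner() really ran */

/* Has a side effect: clears *owner and returns the old value. */
static int clear_owner(int *owner)
{
	int old = *owner;

	*owner = 0;
	calls++;
	return old;
}

/* Fragile: the side effect vanishes when the check is compiled out. */
static void free_fragile(int *owner)
{
	CHECK_BUG_ON(clear_owner(owner));
}

/* Robust: do the work first, then assert on the saved result. */
static void free_robust(int *owner)
{
	int old = clear_owner(owner);

	CHECK_BUG_ON(old);
	(void)old;	/* silence unused warnings when checks are off */
}
```

With DEBUG_CHECKS undefined, free_fragile() leaves the owner untouched while free_robust() still clears it. A check that only reads state loses nothing when compiled out.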
Re: [PATCH] Fix crash with FLAT_MEMORY and ARCH_PFN_OFFSET != 0
On (20/12/07 13:43), Thomas Bogendoerfer didst pronounce:
> On Thu, Dec 20, 2007 at 11:44:06AM +, Mel Gorman wrote:
> > --- a/include/asm-mips/page.h
> > +++ b/include/asm-mips/page.h
> > @@ -37,13 +37,6 @@
> >  #include <linux/pfn.h>
> >  #include <asm/io.h>
> >  
> > -/*
> > - * It's normally defined only for FLATMEM config but it's
> > - * used in our early mem init code for all memory models.
> > - * So always define it.
> > - */
> > -#define ARCH_PFN_OFFSET	PFN_UP(PHYS_OFFSET)
> > -
>
> hmm, doesn't this break what I've fixed ? Without this #define
> ARCH_PFN_OFFSET gets defined to 0 and the bug is back. Or did I miss
> anything ?

ARCH_PFN_OFFSET goes to 0, so page_to_pfn() is no longer adjusting by PFN_UP(PHYS_OFFSET) like it was when your problem occurred. I am guessing that the nature of the crash was that page_to_pfn() was returning bogus values early in boot and trying to initialise memmap that didn't exist.

-- 
Mel Gorman
Part-time Phd Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab
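To make the role of ARCH_PFN_OFFSET concrete, here is a user-space sketch modelled on how I understand the generic FLATMEM conversion to work: the pfn is the index into the mem_map array plus the offset. The function names, the explicit mem_map and offset parameters, and the placeholder struct page are illustrative assumptions. In the kernel these are macros over globals.

```c
/* Placeholder; the real struct page has many more fields. */
struct page {
	unsigned long flags;
};

/* pfn = index of the page in mem_map, shifted by the arch offset. */
static unsigned long flat_page_to_pfn(const struct page *page,
				      const struct page *mem_map,
				      unsigned long arch_pfn_offset)
{
	return (unsigned long)(page - mem_map) + arch_pfn_offset;
}

/* Inverse mapping: subtract the offset before indexing mem_map. */
static const struct page *flat_pfn_to_page(const struct page *mem_map,
					   unsigned long pfn,
					   unsigned long arch_pfn_offset)
{
	return mem_map + (pfn - arch_pfn_offset);
}
```

With an offset of 0 the pfn is simply the array index. With a non-zero offset every pfn is shifted by that amount, so any early-boot code that assumes one convention while the macros use the other ends up touching the wrong memmap entries.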
Re: [PATCH 5/6] udf: fix signedness issue
On Wed 19-12-07 20:27:20, Marcin Slusarz wrote:
> On Mon, Dec 17, 2007 at 05:32:17PM +0100, Jan Kara wrote:
> > > sparse generated:
> > > fs/udf/namei.c:896:15: originally declared here
> > > fs/udf/namei.c:1147:41: warning: incorrect type in argument 3 (different signedness)
> > > fs/udf/namei.c:1147:41:    expected int *offset
> > > fs/udf/namei.c:1147:41:    got unsigned int *noident
> > > fs/udf/namei.c:1152:78: warning: incorrect type in argument 3 (different signedness)
> > > fs/udf/namei.c:1152:78:    expected int *offset
> > > fs/udf/namei.c:1152:78:    got unsigned int *noident
> > >
> > > Signed-off-by: Marcin Slusarz [EMAIL PROTECTED]
> >
> > I don't think this is right. udf_get_fileident() should take unsigned int *
> > as an offset, not just int. This means changing struct udf_fileident_bh to
> > use unsigned int too but that is better anyway. And BTW the type shouldn't
> > be uint32_t but really unsigned int in udf_rename (int needn't have 32 bits
> > on all archs (although I think it has currently)).
>
> That would be hard. Look what is happening with soffset and eoffset eg in
> udf_fileident_read() - these fields are used as signed ints.

Doh, you're right. soffset can go below 0. OK, then your patch is fine. You can add there
Acked-by: Jan Kara [EMAIL PROTECTED]

BTW: If you haven't got an email from Andrew about accepting the patches we agreed upon, then please resend the patches to him (with additional Acked-by: Jan Kara [EMAIL PROTECTED] and CC me) so that they don't get lost. Thanks.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
[PATCH 0/16] lguest: introduce vcpu structure
This patch series makes room for the vcpu structure in lguest, already used in this very same way in lguest64. It's the first part of our plan to have lguest and lguest64 unified too.

When two dogs hang out, you don't have new puppies right the next day. Some time has to elapse. They have to grow first.

In this same spirit, having these patches does _not_ mean smp guests can be launched (yet). Much more work is to come, but this is the basic infrastructure.

Enjoy
[PATCH 01/16] introduce vcpu struct
this patch introduces a vcpu struct for lguest. In upcoming patches, more and more fields will be moved from the lguest struct to the vcpu

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lg.h |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8692489..9723732 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -38,6 +38,13 @@ struct lguest_pages
 #define CHANGED_GDT_TLS		4 /* Actually a subset of CHANGED_GDT */
 #define CHANGED_ALL		3
 
+struct lguest;
+
+struct lguest_vcpu {
+	int vcpu_id;
+	struct lguest *lg;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -47,6 +54,9 @@ struct lguest
 	struct lguest_data __user *lguest_data;
 	struct task_struct *tsk;
 	struct mm_struct *mm;	/* == tsk->mm, but that becomes NULL on exit */
+	struct lguest_vcpu vcpus[NR_CPUS];
+	unsigned int nr_vcpus;
+
 	u32 pfn_limit;
 	/* This provides the offset to the base of guest-physical
 	 * memory in the Launcher. */
@@ -92,6 +102,11 @@ struct lguest
 	DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
 };
 
+static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
+{
+	return container_of((vcpu - vcpu->vcpu_id), struct lguest, vcpus[0]);
+}
+
 extern struct mutex lguest_lock;
 
 /* core.c: */
-- 
1.5.0.6
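The pointer arithmetic in lg_of_vcpu() may be easier to follow in isolation. Below is a user-space sketch of the same trick: step back from the given array element to element 0 using the stored index, then subtract the array's offset within the parent struct. The struct names, MAX_VCPUS and the simplified container_of macro are assumptions made for this example. The kernel's container_of also type-checks its argument.

```c
#include <stddef.h>

#define MAX_VCPUS 4	/* hypothetical; the patch sizes the array with NR_CPUS */

struct guest;

struct vcpu {
	int vcpu_id;
	struct guest *lg;
};

struct guest {
	unsigned int nr_vcpus;
	struct vcpu vcpus[MAX_VCPUS];
};

/* Simplified container_of without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the owning guest from a pointer to any of its vcpus. */
static struct guest *guest_of_vcpu(struct vcpu *vcpu)
{
	/* vcpu - vcpu_id points at vcpus[0], the start of the array. */
	struct vcpu *first = vcpu - vcpu->vcpu_id;

	return container_of(first, struct guest, vcpus);
}
```

This only works if vcpu_id always equals the element's index in the array, which is what makes the back-pointer recoverable without storing it.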
[PATCH 03/16] initialize vcpu
this patch initializes the first vcpu in the initialize() routine, which is responsible for starting the process of putting the guest up. Right now, as most of the fields are still not per-vcpu, it does not do much.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lguest_user.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 3b92a61..d1b1c26 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,6 +88,17 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
 	return run_guest(lg, (unsigned long __user *)user);
 }
 
+static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
+		      unsigned long start_ip)
+{
+	vcpu->vcpu_id = vcpu_id;
+
+	vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
+	vcpu->lg->nr_vcpus++;
+
+	return 0;
+}
+
 /*L:020 The initialization write supplies 4 pointer sized (32 or 64 bit)
  * values (in addition to the LHREQ_INITIALIZE value). These are:
  *
@@ -134,6 +145,12 @@ static int initialize(struct file *file, const unsigned long __user *input)
 	lg->mem_base = (void __user *)(long)args[0];
 	lg->pfn_limit = args[1];
 
+	/* This is the first cpu */
+	lg->nr_vcpus = 0;
+	err = vcpu_start(&lg->vcpus[0], 0, args[3]);
+	if (err)
+		goto release_guest;
+
 	/* We need a complete page for the Guest registers: they are accessible
 	 * to the Guest and we can only grant it access to whole pages. */
 	lg->regs_page = get_zeroed_page(GFP_KERNEL);
-- 
1.5.0.6
[PATCH 02/16] adapt lguest launcher to per-cpuness
This patch makes use of pread() and pwrite() in the lguest launcher to communicate the vcpu id to the lguest driver. The id is kept in a thread variable, which means that in the future we'll spawn vcpus as threads. But right now, only the infrastructure is out there.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c |   24 +---
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 9b0e322..c406ba9 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -79,6 +79,9 @@ static void *guest_base;
 /* The maximum guest physical address allowed, and maximum possible. */
 static unsigned long guest_limit, guest_max;
 
+/* a per-cpu variable indicating whose vcpu is currently running */
+static unsigned int __thread vcpu_id;
+
 /* This is our list of devices. */
 struct device_list
 {
@@ -554,7 +557,7 @@ static void wake_parent(int pipefd, int lguest_fd)
 			else
 				FD_CLR(-fd - 1, &devices.infds);
 		} else /* Send LHREQ_BREAK command. */
-			write(lguest_fd, args, sizeof(args));
+			pwrite(lguest_fd, args, sizeof(args), 0);
 	}
 }
 
@@ -1511,7 +1514,8 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
 		int readval;
 
 		/* We read from the /dev/lguest device to run the Guest. */
-		readval = read(lguest_fd, &notify_addr, sizeof(notify_addr));
+		readval = pread(lguest_fd, &notify_addr,
+				sizeof(notify_addr), vcpu_id);
 
 		/* One unsigned long means the Guest did HCALL_NOTIFY */
 		if (readval == sizeof(notify_addr)) {
@@ -1521,17 +1525,22 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
 		/* ENOENT means the Guest died. Reading tells us why. */
 		} else if (errno == ENOENT) {
 			char reason[1024] = { 0 };
-			read(lguest_fd, reason, sizeof(reason)-1);
+			pread(lguest_fd, reason, sizeof(reason)-1, vcpu_id);
 			errx(1, "%s", reason);
 		/* EAGAIN means the Waker wanted us to look at some input.
 		 * Anything else means a bug or incompatible change. */
 		} else if (errno != EAGAIN)
 			err(1, "Running guest failed");
 
-		/* Service input, then unset the BREAK to release the Waker. */
-		handle_input(lguest_fd);
-		if (write(lguest_fd, args, sizeof(args)) < 0)
-			err(1, "Resetting break");
+		if (!vcpu_id) {
+			/*
+			 * Service input, then unset the BREAK to
+			 * release the Waker.
+			 */
+			handle_input(lguest_fd);
+			if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
+				err(1, "Resetting break");
+		}
 	}
 }
 /*
@@ -1582,6 +1591,7 @@ int main(int argc, char *argv[])
 	devices.lastdev = devices.dev;
 	devices.next_irq = 1;
 
+	vcpu_id = 0;
 	/* We need to know how much memory so we can set up the device
 	 * descriptor and memory pages for the devices as we parse the command
 	 * line. So we quickly look through the arguments to find the amount
-- 
1.5.0.6
[PATCH 04/16] per-cpu run guest
This patch makes the run_guest() routine use the vcpu struct. This is required since in a smp guest environment, there's no more the notion of running the guest, but rather, it is running the vcpu Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] --- drivers/lguest/core.c|6 -- drivers/lguest/lg.h |4 ++-- drivers/lguest/lguest_user.c |6 +- drivers/lguest/x86/core.c| 16 +++- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c index cb4c670..70fc65e 100644 --- a/drivers/lguest/core.c +++ b/drivers/lguest/core.c @@ -174,8 +174,10 @@ void __lgwrite(struct lguest *lg, unsigned long addr, const void *b, /*H:030 Let's jump straight to the the main loop which runs the Guest. * Remember, this is called by the Launcher reading /dev/lguest, and we keep * going around and around until something interesting happens. */ -int run_guest(struct lguest *lg, unsigned long __user *user) +int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user) { + struct lguest *lg = vcpu-lg; + /* We stop running once the Guest is dead. */ while (!lg-dead) { /* First we run any hypercalls the Guest wants done. */ @@ -226,7 +228,7 @@ int run_guest(struct lguest *lg, unsigned long __user *user) local_irq_disable(); /* Actually run the Guest until something happens. 
*/ - lguest_arch_run_guest(lg); + lguest_arch_run_guest(vcpu); /* Now we're ready to be interrupted or moved to other CPUs */ local_irq_enable(); diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h index 9723732..c4a0a97 100644 --- a/drivers/lguest/lg.h +++ b/drivers/lguest/lg.h @@ -131,7 +131,7 @@ void __lgwrite(struct lguest *, unsigned long, const void *, unsigned); } while(0) /* (end of memory access helper routines) :*/ -int run_guest(struct lguest *lg, unsigned long __user *user); +int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user); /* Helper macros to obtain the first 12 or the last 20 bits, this is only the * first step in the migration to the kernel types. pte_pfn is already defined @@ -182,7 +182,7 @@ void page_table_guest_data_init(struct lguest *lg); /* arch/core.c: */ void lguest_arch_host_init(void); void lguest_arch_host_fini(void); -void lguest_arch_run_guest(struct lguest *lg); +void lguest_arch_run_guest(struct lguest_vcpu *vcpu); void lguest_arch_handle_trap(struct lguest *lg); int lguest_arch_init_hypercalls(struct lguest *lg); int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args); diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c index d1b1c26..894d530 100644 --- a/drivers/lguest/lguest_user.c +++ b/drivers/lguest/lguest_user.c @@ -55,11 +55,15 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input) static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o) { struct lguest *lg = file-private_data; + struct lguest_vcpu *vcpu = NULL; + unsigned int vcpu_id = *o; /* You must write LHREQ_INITIALIZE first! */ if (!lg) return -EINVAL; + vcpu = lg-vcpus[vcpu_id]; + /* If you're not the task which owns the Guest, go away. */ if (current != lg-tsk) return -EPERM; @@ -85,7 +89,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o) lg-pending_notify = 0; /* Run the Guest until something interesting happens. 
*/ - return run_guest(lg, (unsigned long __user *)user); + return run_guest(vcpu, (unsigned long __user *)user); } static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id, diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c index 482aec2..0530ef3 100644 --- a/drivers/lguest/x86/core.c +++ b/drivers/lguest/x86/core.c @@ -73,8 +73,10 @@ static DEFINE_PER_CPU(struct lguest *, last_guest); * since it last ran. We saw this set in interrupts_and_traps.c and * segments.c. */ -static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages) +static void copy_in_guest_info(struct lguest_vcpu *vcpu, + struct lguest_pages *pages) { + struct lguest *lg = vcpu-lg; /* Copying all this data can be quite expensive. We usually run the * same Guest we ran last time (and that Guest hasn't run anywhere else * meanwhile). If that's not the case, we pretend everything in the @@ -113,14 +115,16 @@ static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages) } /* Finally: the code to actually call into the Switcher to run the Guest. */ -static void run_guest_once(struct lguest *lg, struct lguest_pages *pages) +static void run_guest_once(struct lguest_vcpu *vcpu, + struct
[PATCH 05/16] make write() operation smp aware
This patch makes the write() file operation smp aware. This means receiving the vcpu_id value through the offset parameter and knowing which vcpu we're talking to.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lguest_user.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 894d530..ae5bf4c 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -223,14 +223,21 @@ static ssize_t write(struct file *file, const char __user *in,
 	struct lguest *lg = file->private_data;
 	const unsigned long __user *input = (const unsigned long __user *)in;
 	unsigned long req;
+	struct lguest_vcpu *vcpu = NULL;
+	int vcpu_id = *off;
 
 	if (get_user(req, input) != 0)
 		return -EFAULT;
 	input++;
 
 	/* If you haven't initialized, you must do that first. */
-	if (req != LHREQ_INITIALIZE && !lg)
-		return -EINVAL;
+	if (req != LHREQ_INITIALIZE) {
+		if (!lg)
+			return -EINVAL;
+		vcpu = &lg->vcpus[vcpu_id];
+		if (!vcpu)
+			return -EINVAL;
+	}
 
 	/* Once the Guest is dead, all you can do is read() why it died. */
 	if (lg && lg->dead)
-- 
1.5.0.6
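Since the vcpu id arrives from user space as a file offset, one possible way to turn it into a vcpu pointer is to range-check it against the number of started vcpus before indexing. The sketch below is only an illustration of that idea in user-space C. vcpu_from_offset(), MAX_VCPUS and the trimmed-down structs are hypothetical and are not taken from the patch, which indexes the array directly.

```c
#include <stddef.h>

#define MAX_VCPUS 8	/* hypothetical; the series sizes the array with NR_CPUS */

struct vcpu {
	int vcpu_id;
};

struct guest {
	unsigned int nr_vcpus;		/* how many vcpus were started */
	struct vcpu vcpus[MAX_VCPUS];
};

/*
 * Map a file offset supplied by the launcher to a vcpu.
 * Returns NULL for negative offsets or ids that were never started.
 */
static struct vcpu *vcpu_from_offset(struct guest *g, long long off)
{
	if (off < 0 || (unsigned long long)off >= g->nr_vcpus)
		return NULL;

	return &g->vcpus[off];
}
```

A handler built this way could return -EINVAL whenever the lookup yields NULL, so that an out-of-range offset never reaches the array.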
[PATCH 06/16] make hypercalls use the vcpu struct
this patch changes do_hcall() and do_async_hcall() interfaces (and obviously their callers) to get a vcpu struct. Again, a vcpu services the hypercall, not the whole guest Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] --- drivers/lguest/core.c |6 +++--- drivers/lguest/hypercalls.c | 42 +++--- drivers/lguest/lg.h | 16 drivers/lguest/x86/core.c | 16 ++-- 4 files changed, 44 insertions(+), 36 deletions(-) diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c index 70fc65e..ef35e02 100644 --- a/drivers/lguest/core.c +++ b/drivers/lguest/core.c @@ -181,8 +181,8 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user) /* We stop running once the Guest is dead. */ while (!lg-dead) { /* First we run any hypercalls the Guest wants done. */ - if (lg-hcall) - do_hypercalls(lg); + if (vcpu-hcall) + do_hypercalls(vcpu); /* It's possible the Guest did a NOTIFY hypercall to the * Launcher, in which case we return from the read() now. */ @@ -234,7 +234,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user) local_irq_enable(); /* Now we deal with whatever happened to the Guest. */ - lguest_arch_handle_trap(lg); + lguest_arch_handle_trap(vcpu); } /* The Guest is dead = No such file or directory */ diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c index b478aff..62da355 100644 --- a/drivers/lguest/hypercalls.c +++ b/drivers/lguest/hypercalls.c @@ -29,8 +29,10 @@ /*H:120 This is the core hypercall routine: where the Guest gets what it wants. * Or gets killed. Or, in the case of LHCALL_CRASH, both. 
*/ -static void do_hcall(struct lguest *lg, struct hcall_args *args) +static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args) { + struct lguest *lg = vcpu-lg; + switch (args-arg0) { case LHCALL_FLUSH_ASYNC: /* This call does nothing, except by breaking out of the Guest @@ -91,7 +93,7 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args) break; default: /* It should be an architecture-specific hypercall. */ - if (lguest_arch_do_hcall(lg, args)) + if (lguest_arch_do_hcall(vcpu, args)) kill_guest(lg, Bad hypercall %li\n, args-arg0); } } @@ -104,10 +106,11 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args) * Guest put them in the ring, but we also promise the Guest that they will * happen before any normal hypercall (which is why we check this before * checking for a normal hcall). */ -static void do_async_hcalls(struct lguest *lg) +static void do_async_hcalls(struct lguest_vcpu *vcpu) { unsigned int i; u8 st[LHCALL_RING_SIZE]; + struct lguest *lg = vcpu-lg; /* For simplicity, we copy the entire call status array in at once. */ if (copy_from_user(st, lg-lguest_data-hcall_status, sizeof(st))) @@ -119,7 +122,7 @@ static void do_async_hcalls(struct lguest *lg) /* We remember where we were up to from last time. This makes * sure that the hypercalls are done in the order the Guest * places them in the ring. */ - unsigned int n = lg-next_hcall; + unsigned int n = vcpu-next_hcall; /* 0xFF means there's no call here (yet). */ if (st[n] == 0xFF) @@ -127,8 +130,8 @@ static void do_async_hcalls(struct lguest *lg) /* OK, we have hypercall. Increment the next_hcall cursor, * and wrap back to 0 if we reach the end. */ - if (++lg-next_hcall == LHCALL_RING_SIZE) - lg-next_hcall = 0; + if (++vcpu-next_hcall == LHCALL_RING_SIZE) + vcpu-next_hcall = 0; /* Copy the hypercall arguments into a local copy of * the hcall_args struct. */ @@ -139,7 +142,7 @@ static void do_async_hcalls(struct lguest *lg) } /* Do the hypercall, same as a normal one. 
*/ - do_hcall(lg, args); + do_hcall(vcpu, args); /* Mark the hypercall done. */ if (put_user(0xFF, lg-lguest_data-hcall_status[n])) { @@ -156,16 +159,17 @@ static void do_async_hcalls(struct lguest *lg) /* Last of all, we look at what happens first of all. The very first time the * Guest makes a hypercall, we end up here to set things up: */ -static void initialize(struct lguest *lg) +static void initialize(struct lguest_vcpu *vcpu) { + struct lguest *lg = vcpu-lg; /* You can't do anything until you're initialized. The Guest
[PATCH 07/16] per-vcpu lguest timers
Here, I introduce per-vcpu timers. With this, we can have
local expiries, needed for accounting time in SMP guests.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c           |    2 +-
 drivers/lguest/interrupts_and_traps.c |   20 ++--
 drivers/lguest/lg.h                   |   10 +-
 drivers/lguest/lguest_user.c          |   12 +++-
 4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 62da355..4364bc2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -78,7 +78,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		guest_set_pmd(lg, args->arg1, args->arg2);
 		break;
 	case LHCALL_SET_CLOCKEVENT:
-		guest_set_clockevent(lg, args->arg1);
+		guest_set_clockevent(vcpu, args->arg1);
 		break;
 	case LHCALL_TS:
 		/* This sets the TS flag, as we saw used in run_guest(). */
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 2b66f79..189d66e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -470,13 +470,13 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
  * infrastructure to set a callback at that time.
  *
  * 0 means turn off the clock. */
-void guest_set_clockevent(struct lguest *lg, unsigned long delta)
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta)
 {
 	ktime_t expires;

 	if (unlikely(delta == 0)) {
 		/* Clock event device is shutting down. */
-		hrtimer_cancel(&lg->hrt);
+		hrtimer_cancel(&vcpu->hrt);
 		return;
 	}

@@ -484,25 +484,25 @@ void guest_set_clockevent(struct lguest *lg, unsigned long delta)
 	 * all the time between now and the timer interrupt it asked for. This
 	 * is almost always the right thing to do. */
 	expires = ktime_add_ns(ktime_get_real(), delta);
-	hrtimer_start(&lg->hrt, expires, HRTIMER_MODE_ABS);
+	hrtimer_start(&vcpu->hrt, expires, HRTIMER_MODE_ABS);
 }

 /* This is the function called when the Guest's timer expires. */
 static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
 {
-	struct lguest *lg = container_of(timer, struct lguest, hrt);
+	struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);

 	/* Remember the first interrupt is the timer interrupt. */
-	set_bit(0, lg->irqs_pending);
+	set_bit(0, vcpu->lg->irqs_pending);
 	/* If the Guest is actually stopped, we need to wake it up. */
-	if (lg->halted)
-		wake_up_process(lg->tsk);
+	if (vcpu->lg->halted)
+		wake_up_process(vcpu->lg->tsk);
 	return HRTIMER_NORESTART;
 }

 /* This sets up the timer for this Guest. */
-void init_clockdev(struct lguest *lg)
+void init_clockdev(struct lguest_vcpu *vcpu)
 {
-	hrtimer_init(&lg->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-	lg->hrt.function = clockdev_fn;
+	hrtimer_init(&vcpu->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	vcpu->hrt.function = clockdev_fn;
 }
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 696cdf1..0205409 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -47,6 +47,9 @@ struct lguest_vcpu {
 	/* If a hypercall was asked for, this points to the arguments. */
 	struct hcall_args *hcall;
 	u32 next_hcall;
+
+	/* Virtual clock device */
+	struct hrtimer hrt;
 };

 /* The private info the thread maintains about the guest. */
@@ -95,9 +98,6 @@ struct lguest

 	struct lguest_arch arch;

-	/* Virtual clock device */
-	struct hrtimer hrt;
-
 	/* Pending virtual interrupts */
 	DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
 };
@@ -150,8 +150,8 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
 				const unsigned long *def);
 void copy_traps(const struct lguest *lg, struct desc_struct *idt,
 		const unsigned long *def);
-void guest_set_clockevent(struct lguest *lg, unsigned long delta);
-void init_clockdev(struct lguest *lg);
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta);
+void init_clockdev(struct lguest_vcpu *vcpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
 void free_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index ae5bf4c..7481e82 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -97,6 +97,9 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
 {
 	vcpu->vcpu_id = vcpu_id;
+	/* The timer for lguest's clock needs initialization. */
+	init_clockdev(vcpu);
+
 	vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
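A side note on the two container_of() uses in the patch above, since the whole per-vcpu timer idea hangs on them. What follows is only a user-space sketch under stated assumptions, not lguest code: struct fake_timer, struct vcpu, struct guest, timer_fired() and vcpu_init() are hypothetical stand-ins, and the macro is a simplified container_of() without the kernel's type checking. It shows how a callback that is handed only the embedded timer can get back to its own vcpu, and how a vcpu can find its guest from its slot in the vcpus[] array.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space container_of(): no type checking (assumption). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_VCPUS 4

/* Hypothetical stand-in for struct hrtimer. */
struct fake_timer {
	int armed;
};

struct guest;

/* Hypothetical stand-in for struct lguest_vcpu. */
struct vcpu {
	int vcpu_id;
	struct guest *lg;
	struct fake_timer hrt;		/* embedded: one timer per vcpu */
	int timer_irq_pending;
};

/* Hypothetical stand-in for struct lguest. */
struct guest {
	struct vcpu vcpus[MAX_VCPUS];
};

/* The callback only receives the timer, as an hrtimer callback does. */
static void timer_fired(struct fake_timer *timer)
{
	struct vcpu *vcpu = container_of(timer, struct vcpu, hrt);

	vcpu->timer_irq_pending = 1;
}

/* Same trick as vcpu_start(): step back to vcpus[0], then up to the guest. */
static void vcpu_init(struct vcpu *vcpu, int vcpu_id)
{
	assert(vcpu_id >= 0 && vcpu_id < MAX_VCPUS);
	vcpu->vcpu_id = vcpu_id;
	vcpu->timer_irq_pending = 0;
	vcpu->hrt.armed = 0;
	vcpu->lg = container_of(vcpu - vcpu_id, struct guest, vcpus[0]);
}
```

The point of embedding the timer in the vcpu rather than in the guest is that the expiry lands on exactly one vcpu, which is what local expiries need.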
[PATCH 08/16] per-vcpu interrupt processing.
This patch adapts interrupt processing for using the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c                 |    2 +-
 drivers/lguest/interrupts_and_traps.c |   25 ++---
 drivers/lguest/lg.h                   |   10 +-
 drivers/lguest/lguest_user.c          |    7 ---
 drivers/lguest/x86/core.c             |    2 +-
 5 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index ef35e02..4d0102d 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -203,7 +203,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
 		/* Check if there are any interrupts which can be delivered
 		 * now: if so, this sets up the hander to be executed when we
 		 * next run the Guest. */
-		maybe_do_interrupt(lg);
+		maybe_do_interrupt(vcpu);

 		/* All long-lived kernel loops need to check with this horrible
 		 * thing called the freezer. If the Host is trying to suspend,
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 189d66e..db440cb 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -60,11 +60,13 @@ static void push_guest_stack(struct lguest *lg, unsigned long *gstack, u32 val)
  * We set up the stack just like the CPU does for a real interrupt, so it's
  * identical for the Guest (and the standard iret instruction will undo
  * it). */
-static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
+static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
+				int has_err)
 {
 	unsigned long gstack, origstack;
 	u32 eflags, ss, irq_enable;
 	unsigned long virtstack;
+	struct lguest *lg = vcpu->lg;

 	/* There are two cases for interrupts: one where the Guest is already
 	 * in the kernel, and a more complex one where the Guest is in
@@ -129,9 +131,10 @@ static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
  *
  * maybe_do_interrupt() gets called before every entry to the Guest, to see if
  * we should divert the Guest to running an interrupt handler. */
-void maybe_do_interrupt(struct lguest *lg)
+void maybe_do_interrupt(struct lguest_vcpu *vcpu)
 {
 	unsigned int irq;
+	struct lguest *lg = vcpu->lg;
 	DECLARE_BITMAP(blk, LGUEST_IRQS);
 	struct desc_struct *idt;

@@ -145,7 +148,7 @@ void maybe_do_interrupt(struct lguest *lg)
 			   sizeof(blk)))
 		return;

-	bitmap_andnot(blk, lg->irqs_pending, blk, LGUEST_IRQS);
+	bitmap_andnot(blk, vcpu->irqs_pending, blk, LGUEST_IRQS);

 	/* Find the first interrupt. */
 	irq = find_first_bit(blk, LGUEST_IRQS);
@@ -180,11 +183,11 @@ void maybe_do_interrupt(struct lguest *lg)
 	/* If they don't have a handler (yet?), we just ignore it */
 	if (idt_present(idt->a, idt->b)) {
 		/* OK, mark it no longer pending and deliver it. */
-		clear_bit(irq, lg->irqs_pending);
+		clear_bit(irq, vcpu->irqs_pending);
 		/* set_guest_interrupt() takes the interrupt descriptor and a
 		 * flag to say whether this interrupt pushes an error code onto
 		 * the stack as well: virtual interrupts never do. */
-		set_guest_interrupt(lg, idt->a, idt->b, 0);
+		set_guest_interrupt(vcpu, idt->a, idt->b, 0);
 	}

 	/* Every time we deliver an interrupt, we update the timestamp in the
@@ -245,19 +248,19 @@ static int has_err(unsigned int trap)
 }

 /* deliver_trap() returns true if it could deliver the trap. */
-int deliver_trap(struct lguest *lg, unsigned int num)
+int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
 {
 	/* Trap numbers are always 8 bit, but we set an impossible trap number
 	 * for traps inside the Switcher, so check that here. */
-	if (num >= ARRAY_SIZE(lg->arch.idt))
+	if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
 		return 0;

 	/* Early on the Guest hasn't set the IDT entries (or maybe it put a
 	 * bogus one in): if we fail here, the Guest will be killed. */
-	if (!idt_present(lg->arch.idt[num].a, lg->arch.idt[num].b))
+	if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
 		return 0;
-	set_guest_interrupt(lg, lg->arch.idt[num].a, lg->arch.idt[num].b,
-			    has_err(num));
+	set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
+			    vcpu->lg->arch.idt[num].b, has_err(num));
 	return 1;
 }

@@ -493,7 +496,7 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
 	struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);
[PATCH 09/16] map_switcher_in_guest() per-vcpu
The switcher needs to be mapped per-vcpu, because different vcpus
will potentially have different page tables (they don't have to,
because threads will share the same).

So our first step is to make the function receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lg.h          |    3 ++-
 drivers/lguest/page_tables.c |    4 +++-
 drivers/lguest/x86/core.c    |    2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index db2edd6..f6e9020 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -173,7 +173,8 @@ void guest_pagetable_clear_all(struct lguest *lg);
 void guest_pagetable_flush_user(struct lguest *lg);
 void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
 		   unsigned long vaddr, pte_t val);
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages);
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+			   struct lguest_pages *pages);
 int demand_page(struct lguest *info, unsigned long cr2, int errcode);
 void pin_page(struct lguest *lg, unsigned long vaddr);
 unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index fffabb3..7fb8627 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -634,8 +634,10 @@ void free_guest_pagetable(struct lguest *lg)
  * Guest (and not the pages for other CPUs). We have the appropriate PTE pages
  * for each CPU already set up, we just need to hook them in now we know which
  * Guest is about to run on this CPU. */
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages)
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+			   struct lguest_pages *pages)
 {
+	struct lguest *lg = vcpu->lg;
 	pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
 	pgd_t switcher_pgd;
 	pte_t regs_pte;
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 3d21c6d..9bf2213 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -92,7 +92,7 @@ static void copy_in_guest_info(struct lguest_vcpu *vcpu,
 	pages->state.host_cr3 = __pa(current->mm->pgd);
 	/* Set up the Guest's page tables to see this CPU's pages (and no
 	 * other CPU's pages). */
-	map_switcher_in_guest(lg, pages);
+	map_switcher_in_guest(vcpu, pages);
 	/* Set up the two TSS members which tell the CPU what stack to use
 	 * for traps which do directly into the Guest (ie. traps at privilege
 	 * level 1). */
-- 
1.5.0.6
[PATCH 10/16] make emulate_insn receive a vcpu struct.
emulate_insn() needs to know about current eip, which will be,
in the future, a per-vcpu thing. So in this patch, the function
prototype is modified to receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/x86/core.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9bf2213..2fb9cd3 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -220,8 +220,9 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
  * When the Guest uses one of these instructions, we get a trap (General
  * Protection Fault) and come here. We see if it's one of those troublesome
  * instructions and skip over it. We return true if we did. */
-static int emulate_insn(struct lguest *lg)
+static int emulate_insn(struct lguest_vcpu *vcpu)
 {
+	struct lguest *lg = vcpu->lg;
 	u8 insn;
 	unsigned int insnlen = 0, in = 0, shift = 0;
 	/* The eip contains the *virtual* address of the Guest's instruction:
@@ -294,7 +295,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
 		 * instructions which we need to emulate. If so, we just go
 		 * back into the Guest after we've done it. */
 		if (lg->regs->errcode == 0) {
-			if (emulate_insn(lg))
+			if (emulate_insn(vcpu))
 				return;
 		}
 		break;
-- 
1.5.0.6
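For readers following the series: patches 09 and 10 convert one function at a time by changing only the prototype and deriving lg from the vcpu on the first line, so the function body stays untouched until the fields themselves move. Below is a minimal user-space sketch of that shim, with made-up names (struct guest, struct vcpu, struct guest_regs and skip_insn() are assumptions for illustration, not lguest symbols).

```c
#include <assert.h>

/* Hypothetical stand-ins; the register block is still guest-wide here. */
struct guest_regs {
	unsigned long eip;
	unsigned long errcode;
};

struct guest {
	struct guest_regs *regs;
};

struct vcpu {
	int vcpu_id;
	struct guest *lg;	/* back-pointer set when the vcpu starts */
};

/* Before the conversion this would have taken "struct guest *lg". */
static int skip_insn(struct vcpu *vcpu, unsigned int insnlen)
{
	struct guest *lg = vcpu->lg;	/* shim: the body below is unchanged */

	assert(lg && lg->regs);
	if (insnlen == 0)
		return 0;		/* nothing recognised, nothing skipped */

	lg->regs->eip += insnlen;	/* step over the emulated instruction */
	return 1;
}
```

Once a later patch moves regs into the vcpu, only the lines that touch lg->regs need to change; callers already pass the vcpu.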
[PATCH 12/16] replace lguest_arch with lguest_vcpu_arch.
The fields found in lguest_arch are not really per-guest,
but per-cpu (gdt, idt, etc). So this patch turns lguest_arch into
lguest_vcpu_arch.

It makes sense to have a per-guest per-arch struct, but this can
be addressed later, when the need arises.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/interrupts_and_traps.c |   29 +++--
 drivers/lguest/lg.h                   |   19 +++---
 drivers/lguest/segments.c             |   43 +---
 drivers/lguest/x86/core.c             |   24 --
 include/asm-x86/lguest.h              |    2 +-
 5 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 1ceff5f..b3d444a 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -180,7 +180,7 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
 	/* Look at the IDT entry the Guest gave us for this interrupt. The
 	 * first 32 (FIRST_EXTERNAL_VECTOR) entries are for traps, so we skip
 	 * over them. */
-	idt = &lg->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
+	idt = &vcpu->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
 	/* If they don't have a handler (yet?), we just ignore it */
 	if (idt_present(idt->a, idt->b)) {
 		/* OK, mark it no longer pending and deliver it. */
@@ -253,15 +253,15 @@ int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
 {
 	/* Trap numbers are always 8 bit, but we set an impossible trap number
 	 * for traps inside the Switcher, so check that here. */
-	if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
+	if (num >= ARRAY_SIZE(vcpu->arch.idt))
 		return 0;

 	/* Early on the Guest hasn't set the IDT entries (or maybe it put a
 	 * bogus one in): if we fail here, the Guest will be killed. */
-	if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
+	if (!idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b))
 		return 0;
-	set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
-			    vcpu->lg->arch.idt[num].b, has_err(num));
+	set_guest_interrupt(vcpu, vcpu->arch.idt[num].a,
+			    vcpu->arch.idt[num].b, has_err(num));
 	return 1;
 }

@@ -387,7 +387,8 @@ static void set_trap(struct lguest *lg, struct desc_struct *trap,
  *
  * We saw the Guest setting Interrupt Descriptor Table (IDT) entries with the
  * LHCALL_LOAD_IDT_ENTRY hypercall before: that comes here. */
-void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
+void load_guest_idt_entry(struct lguest_vcpu *vcpu,
+			  unsigned int num, u32 lo, u32 hi)
 {
 	/* Guest never handles: NMI, doublefault, spurious interrupt or
 	 * hypercall. We ignore when it tries to set them. */
@@ -396,13 +397,13 @@ void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)

 	/* Mark the IDT as changed: next time the Guest runs we'll know we have
 	 * to copy this again. */
-	lg->changed |= CHANGED_IDT;
+	vcpu->lg->changed |= CHANGED_IDT;

 	/* Check that the Guest doesn't try to step outside the bounds. */
-	if (num >= ARRAY_SIZE(lg->arch.idt))
-		kill_guest(lg, "Setting idt entry %u", num);
+	if (num >= ARRAY_SIZE(vcpu->arch.idt))
+		kill_guest(vcpu->lg, "Setting idt entry %u", num);
 	else
-		set_trap(lg, &lg->arch.idt[num], num, lo, hi);
+		set_trap(vcpu->lg, &vcpu->arch.idt[num], num, lo, hi);
 }

 /* The default entry for each interrupt points into the Switcher routines which
@@ -438,14 +439,14 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
 /*H:240 We don't use the IDT entries in the struct lguest directly, instead
  * we copy them into the IDT which we've set up for Guests on this CPU, just
  * before we run the Guest. This routine does that copy. */
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lguest_vcpu *vcpu, struct desc_struct *idt,
 		const unsigned long *def)
 {
 	unsigned int i;

 	/* We can simply copy the direct traps, otherwise we use the default
 	 * ones in the Switcher: they will return to the Host. */
-	for (i = 0; i < ARRAY_SIZE(lg->arch.idt); i++) {
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.idt); i++) {
 		/* If no Guest can ever override this trap, leave it alone. */
 		if (!direct_trap(i))
 			continue;
@@ -454,8 +455,8 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
 		 * Interrupt gates (type 14) disable interrupts as they are
 		 * entered, which we never let the Guest do. Not present
 		 * entries (type 0x0) also can't go direct, of course. */
-
neigh: timer & !nud_in_timer
Hello,

I noticed the following message in my kernel log.

kernel: neigh: timer & !nud_in_timer

(Might be due to a race condition.)

I'm running a UP Linux version 2.6.22.1-rt9
( http://rt.wiki.kernel.org/index.php )

The following /proc entries might be relevant.

/proc/sys/net/ipv4/conf/all/arp_accept 0
/proc/sys/net/ipv4/conf/all/arp_announce 2
/proc/sys/net/ipv4/conf/all/arp_filter 0
/proc/sys/net/ipv4/conf/all/arp_ignore 1

I also lowered the priority of softirq-timer/0 to 10, which means it
can be interrupted by other IRQ handlers.

Regards.
[PATCH 11/16] make registers per-vcpu
This is the most obvious per-vcpu field: registers.

So this patch moves it from struct lguest to struct vcpu,
and patches the places in which it is used accordingly.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/interrupts_and_traps.c |   29 ---
 drivers/lguest/lg.h                   |    9 ---
 drivers/lguest/lguest_user.c          |   36 +++---
 drivers/lguest/page_tables.c          |    4 ++-
 drivers/lguest/x86/core.c             |   39 +
 5 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index db440cb..1ceff5f 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -71,7 +71,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 	/* There are two cases for interrupts: one where the Guest is already
 	 * in the kernel, and a more complex one where the Guest is in
 	 * userspace. We check the privilege level to find out. */
-	if ((lg->regs->ss&0x3) != GUEST_PL) {
+	if ((vcpu->regs->ss&0x3) != GUEST_PL) {
 		/* The Guest told us their kernel stack with the SET_STACK
 		 * hypercall: both the virtual address and the segment */
 		virtstack = lg->esp1;
@@ -82,12 +82,12 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 		 * stack: when the Guest does an iret back from the interrupt
 		 * handler the CPU will notice they're dropping privilege
 		 * levels and expect these here. */
-		push_guest_stack(lg, &gstack, lg->regs->ss);
-		push_guest_stack(lg, &gstack, lg->regs->esp);
+		push_guest_stack(lg, &gstack, vcpu->regs->ss);
+		push_guest_stack(lg, &gstack, vcpu->regs->esp);
 	} else {
 		/* We're staying on the same Guest (kernel) stack. */
-		virtstack = lg->regs->esp;
-		ss = lg->regs->ss;
+		virtstack = vcpu->regs->esp;
+		ss = vcpu->regs->ss;

 		origstack = gstack = guest_pa(lg, virtstack);
 	}
@@ -96,7 +96,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 	 * the Interrupt Flag bit is always set. We copy that bit from the
 	 * Guest's irq_enabled field into the eflags word: we saw the Guest
 	 * copy it back in lguest_iret. */
-	eflags = lg->regs->eflags;
+	eflags = vcpu->regs->eflags;
 	if (get_user(irq_enable, &lg->lguest_data->irq_enabled) == 0
 	    && !(irq_enable & X86_EFLAGS_IF))
 		eflags &= ~X86_EFLAGS_IF;
@@ -105,19 +105,19 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 	 * eflags word, the old code segment, and the old instruction
 	 * pointer. */
 	push_guest_stack(lg, &gstack, eflags);
-	push_guest_stack(lg, &gstack, lg->regs->cs);
-	push_guest_stack(lg, &gstack, lg->regs->eip);
+	push_guest_stack(lg, &gstack, vcpu->regs->cs);
+	push_guest_stack(lg, &gstack, vcpu->regs->eip);

 	/* For the six traps which supply an error code, we push that, too. */
 	if (has_err)
-		push_guest_stack(lg, &gstack, lg->regs->errcode);
+		push_guest_stack(lg, &gstack, vcpu->regs->errcode);

 	/* Now we've pushed all the old state, we change the stack, the code
 	 * segment and the address to execute. */
-	lg->regs->ss = ss;
-	lg->regs->esp = virtstack + (gstack - origstack);
-	lg->regs->cs = (__KERNEL_CS|GUEST_PL);
-	lg->regs->eip = idt_address(lo, hi);
+	vcpu->regs->ss = ss;
+	vcpu->regs->esp = virtstack + (gstack - origstack);
+	vcpu->regs->cs = (__KERNEL_CS|GUEST_PL);
+	vcpu->regs->eip = idt_address(lo, hi);

 	/* There are two kinds of interrupt handlers: 0xE is an interrupt
 	 * gate which expects interrupts to be disabled on entry. */
@@ -158,7 +158,8 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)

 	/* They may be in the middle of an iret, where they asked us never to
 	 * deliver interrupts. */
-	if (lg->regs->eip >= lg->noirq_start && lg->regs->eip < lg->noirq_end)
+	if ((vcpu->regs->eip >= lg->noirq_start) &&
+	    (vcpu->regs->eip < lg->noirq_end))
 		return;

 	/* If they're halted, interrupts restart them. */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f6e9020..d05fe38 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -44,6 +44,10 @@ struct lguest_vcpu {
 	int vcpu_id;
 	struct lguest *lg;

+	/* At end of a page shared mapped over lguest_pages in guest. */
+	unsigned long regs_page;
+	struct lguest_regs *regs;
+
 	/* If a hypercall was asked for, this points to the arguments. */
 	struct hcall_args *hcall;
 	u32 next_hcall;
@@ -58,9
[PATCH 14/16] makes special fields be per-vcpu
The lguest struct has room for some fields, namely cr2, ts, esp1
and ss1, that are not really guest-wide but rather vcpu-wide.

This patch puts them in the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c           |   10 +-
 drivers/lguest/interrupts_and_traps.c |   24 +---
 drivers/lguest/lg.h                   |   18 ++
 drivers/lguest/page_tables.c          |   11 ++-
 drivers/lguest/x86/core.c             |   10 --
 5 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 41ea2e2..c6b87ef 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -58,7 +58,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		/* FLUSH_TLB comes in two flavors, depending on the
 		 * argument: */
 		if (args->arg1)
-			guest_pagetable_clear_all(lg);
+			guest_pagetable_clear_all(vcpu);
 		else
 			guest_pagetable_flush_user(lg);
 		break;
@@ -66,10 +66,10 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 	/* All these calls simply pass the arguments through to the right
 	 * routines. */
 	case LHCALL_NEW_PGTABLE:
-		guest_new_pagetable(lg, args->arg1);
+		guest_new_pagetable(vcpu, args->arg1);
 		break;
 	case LHCALL_SET_STACK:
-		guest_set_stack(lg, args->arg1, args->arg2, args->arg3);
+		guest_set_stack(vcpu, args->arg1, args->arg2, args->arg3);
 		break;
 	case LHCALL_SET_PTE:
 		guest_set_pte(lg, args->arg1, args->arg2, __pte(args->arg3));
@@ -82,7 +82,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		break;
 	case LHCALL_TS:
 		/* This sets the TS flag, as we saw used in run_guest(). */
-		lg->ts = args->arg1;
+		vcpu->ts = args->arg1;
 		break;
 	case LHCALL_HALT:
 		/* Similarly, this sets the halted flag for run_guest(). */
@@ -189,7 +189,7 @@ static void initialize(struct lguest_vcpu *vcpu)
 	 * first write to a Guest page. This may have caused a copy-on-write
 	 * fault, but the old page might be (read-only) in the Guest
 	 * pagetable. */
-	guest_pagetable_clear_all(lg);
+	guest_pagetable_clear_all(vcpu);
 }

 /*H:100
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 10c9aea..78f6210 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -74,8 +74,8 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 	if ((vcpu->regs->ss&0x3) != GUEST_PL) {
 		/* The Guest told us their kernel stack with the SET_STACK
 		 * hypercall: both the virtual address and the segment */
-		virtstack = lg->esp1;
-		ss = lg->ss1;
+		virtstack = vcpu->esp1;
+		ss = vcpu->ss1;

 		origstack = gstack = guest_pa(lg, virtstack);
 		/* We push the old stack segment and pointer onto the new
@@ -313,10 +313,11 @@ static int direct_trap(unsigned int num)
  * the Guest.
  *
  * Which is deeply unfair, because (literally!) it wasn't the Guests' fault. */
-void pin_stack_pages(struct lguest *lg)
+void pin_stack_pages(struct lguest_vcpu *vcpu)
 {
 	unsigned int i;

+	struct lguest *lg = vcpu->lg;
 	/* Depending on the CONFIG_4KSTACKS option, the Guest can have one or
 	 * two pages of stack space. */
 	for (i = 0; i < lg->stack_pages; i++)
@@ -324,7 +325,7 @@ void pin_stack_pages(struct lguest *lg)
 		 * start of the page after the kernel stack. Subtract one to
 		 * get back onto the first stack page, and keep subtracting to
 		 * get to the rest of the stack pages. */
-		pin_page(lg, lg->esp1 - 1 - i * PAGE_SIZE);
+		pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
 }

 /* Direct traps also mean that we need to know whenever the Guest wants to use
@@ -335,21 +336,22 @@ void pin_stack_pages(struct lguest *lg)
  *
  * In Linux each process has its own kernel stack, so this happens a lot: we
  * change stacks on each context switch. */
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages)
+void guest_set_stack(struct lguest_vcpu *vcpu, u32 seg, u32 esp,
+		     unsigned int pages)
 {
 	/* You are not allowed have a stack segment with privilege level 0: bad
 	 * Guest! */
 	if ((seg & 0x3) != GUEST_PL)
-		kill_guest(lg, "bad stack segment %i", seg);
+		kill_guest(vcpu->lg, "bad stack segment %i", seg);
 	/* We only expect one or two
[PATCH 13/16] per-vcpu lguest task management
lguest uses tasks to control its running behaviour (like sending
breaks, controlling halted state, etc). In a per-vcpu environment,
each vcpu will have its own underlying task. So this patch
makes the infrastructure for that possible.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c                 |    4 +-
 drivers/lguest/hypercalls.c           |    2 +-
 drivers/lguest/interrupts_and_traps.c |    8 ++--
 drivers/lguest/lg.h                   |   14 
 drivers/lguest/lguest_user.c          |   56 ++--
 5 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 4d0102d..285a465 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -197,7 +197,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
 			return -ERESTARTSYS;

 		/* If Waker set break_out, return to Launcher. */
-		if (lg->break_out)
+		if (vcpu->break_out)
 			return -EAGAIN;

 		/* Check if there are any interrupts which can be delivered
@@ -217,7 +217,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)

 		/* If the Guest asked to be stopped, we sleep. The Guest's
 		 * clock timer or LHCALL_BREAK from the Waker will wake us. */
-		if (lg->halted) {
+		if (vcpu->halted) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			schedule();
 			continue;
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 4364bc2..41ea2e2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -86,7 +86,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		break;
 	case LHCALL_HALT:
 		/* Similarly, this sets the halted flag for run_guest(). */
-		lg->halted = 1;
+		vcpu->halted = 1;
 		break;
 	case LHCALL_NOTIFY:
 		lg->pending_notify = args->arg1;
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index b3d444a..10c9aea 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -163,11 +163,11 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
 		return;

 	/* If they're halted, interrupts restart them. */
-	if (lg->halted) {
+	if (vcpu->halted) {
 		/* Re-enable interrupts. */
 		if (put_user(X86_EFLAGS_IF, &lg->lguest_data->irq_enabled))
 			kill_guest(lg, "Re-enabling interrupts");
-		lg->halted = 0;
+		vcpu->halted = 0;
 	} else {
 		/* Otherwise we check if they have interrupts disabled. */
 		u32 irq_enabled;
@@ -500,8 +500,8 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
 	/* Remember the first interrupt is the timer interrupt. */
 	set_bit(0, vcpu->irqs_pending);
 	/* If the Guest is actually stopped, we need to wake it up. */
-	if (vcpu->lg->halted)
-		wake_up_process(vcpu->lg->tsk);
+	if (vcpu->halted)
+		wake_up_process(vcpu->tsk);
 	return HRTIMER_NORESTART;
 }

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f9429ff..b23694e 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,8 @@ struct lguest;
 struct lguest_vcpu {
 	int vcpu_id;
 	struct lguest *lg;
+	struct task_struct *tsk;
+	struct mm_struct *mm;	/* == tsk->mm, but that becomes NULL on exit */

 	/* At end of a page shared mapped over lguest_pages in guest. */
 	unsigned long regs_page;
@@ -55,6 +57,11 @@ struct lguest_vcpu {
 	/* Virtual clock device */
 	struct hrtimer hrt;

+	/* Do we need to stop what we're doing and return to userspace? */
+	int break_out;
+	wait_queue_head_t break_wq;
+	int halted;
+
 	/* Pending virtual interrupts */
 	DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);

@@ -65,8 +72,6 @@ struct lguest_vcpu {
 struct lguest
 {
 	struct lguest_data __user *lguest_data;
-	struct task_struct *tsk;
-	struct mm_struct *mm;	/* == tsk->mm, but that becomes NULL on exit */
 	struct lguest_vcpu vcpus[NR_CPUS];
 	unsigned int nr_vcpus;

@@ -76,15 +81,10 @@ struct lguest
 	void __user *mem_base;
 	unsigned long kernel_address;
 	u32 cr2;
-	int halted;
 	int ts;
 	u32 esp1;
 	u8 ss1;

-	/* Do we need to stop what we're doing and return to userspace? */
-	int break_out;
-	wait_queue_head_t break_wq;
-
 	/* Bitmap of what has changed: see CHANGED_* above. */
 	int changed;
 	struct lguest_pages *last_pages;
diff --git a/drivers/lguest/lguest_user.c
[PATCH 15/16] make pending notifications per-vcpu
This patch makes the pending_notify field, used to control
pending notifications, per-vcpu instead of per-guest.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c        |    6 +++---
 drivers/lguest/hypercalls.c  |    6 +++---
 drivers/lguest/lg.h          |    3 ++-
 drivers/lguest/lguest_user.c |    4 ++--
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 285a465..d628515 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -186,10 +186,10 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)

 		/* It's possible the Guest did a NOTIFY hypercall to the
 		 * Launcher, in which case we return from the read() now. */
-		if (lg->pending_notify) {
-			if (put_user(lg->pending_notify, user))
+		if (vcpu->pending_notify) {
+			if (put_user(vcpu->pending_notify, user))
 				return -EFAULT;
-			return sizeof(lg->pending_notify);
+			return sizeof(vcpu->pending_notify);
 		}

 		/* Check for signals */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index c6b87ef..95e1062 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -89,7 +89,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		vcpu->halted = 1;
 		break;
 	case LHCALL_NOTIFY:
-		lg->pending_notify = args->arg1;
+		vcpu->pending_notify = args->arg1;
 		break;
 	default:
 		/* It should be an architecture-specific hypercall. */
@@ -152,7 +152,7 @@ static void do_async_hcalls(struct lguest_vcpu *vcpu)

 		/* Stop doing hypercalls if they want to notify the Launcher:
 		 * it needs to service this first. */
-		if (lg->pending_notify)
+		if (vcpu->pending_notify)
 			break;
 	}
 }
@@ -217,7 +217,7 @@ void do_hypercalls(struct lguest_vcpu *vcpu)
 	/* If we stopped reading the hypercall ring because the Guest did a
 	 * NOTIFY to the Launcher, we want to return now. Otherwise we do
 	 * the hypercall. */
-	if (!vcpu->lg->pending_notify) {
+	if (!vcpu->pending_notify) {
 		do_hcall(vcpu, vcpu->hcall);
 		/* Tricky point: we reset the hcall pointer to mark the
 		 * hypercall as done. We use the hcall pointer rather than
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index dbf70c6..6faf90d 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -51,6 +51,8 @@ struct lguest_vcpu {
 	u32 esp1;
 	u8 ss1;

+	unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
+
 	/* At end of a page shared mapped over lguest_pages in guest. */
 	unsigned long regs_page;
 	struct lguest_regs *regs;
@@ -95,7 +97,6 @@ struct lguest
 	struct pgdir pgdirs[4];

 	unsigned long noirq_start, noirq_end;
-	unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */

 	unsigned int stack_pages;
 	u32 tsc_khz;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d081db4..349d69d 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,8 +88,8 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)

 	/* If we returned from read() last time because the Guest notified,
 	 * clear the flag. */
-	if (lg->pending_notify)
-		lg->pending_notify = 0;
+	if (vcpu->pending_notify)
+		vcpu->pending_notify = 0;

 	/* Run the Guest until something interesting happens. */
 	return run_guest(vcpu, (unsigned long __user *)user);
-- 
1.5.0.6
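To make the effect of the patch above concrete, here is a small user-space model of the notify handshake, offered as a sketch only. struct vcpu, run_hcall_ring() and vcpu_read() are hypothetical names, and a nonzero ring entry is assumed to stand for an LHCALL_NOTIFY carrying that value. With the flag kept per vcpu, a notification raised by one vcpu stops only that vcpu's hypercall processing.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-vcpu state touched by the patch. */
struct vcpu {
	unsigned long pending_notify;	/* value from the modelled NOTIFY */
	int hcalls_done;		/* how many ring entries were handled */
};

/* Models do_async_hcalls(): stop as soon as this vcpu wants to notify. */
static void run_hcall_ring(struct vcpu *vcpu, const unsigned long *ring, int n)
{
	assert(vcpu && ring);
	for (int i = 0; i < n; i++) {
		if (ring[i] != 0)	/* nonzero entry models LHCALL_NOTIFY */
			vcpu->pending_notify = ring[i];
		vcpu->hcalls_done++;

		/* The Launcher has to service the notification first. */
		if (vcpu->pending_notify)
			break;
	}
}

/* Models read(): clear the flag left from last time, then run. */
static unsigned long vcpu_read(struct vcpu *vcpu, const unsigned long *ring,
			       int n)
{
	vcpu->pending_notify = 0;
	run_hcall_ring(vcpu, ring, n);
	return vcpu->pending_notify;
}
```

Had the flag stayed guest-wide, the second vcpu in such a setup would see the first one's notification and break out of its own ring early.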
[PATCH 16/16] per-vcpu lguest pgdir management
This patch makes the pgdir management per-vcpu. The pgdirs pool
is still guest-wide (although it'll probably need to grow when we
are really executing more vcpus), but the pgdidx index is gone,
since it makes no sense anymore. Instead, we use a per-vcpu index.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c           |    2 +-
 drivers/lguest/interrupts_and_traps.c |    6 ++--
 drivers/lguest/lg.h                   |   12 +++---
 drivers/lguest/page_tables.c          |   60 +
 drivers/lguest/x86/core.c             |    6 ++--
 5 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 95e1062..f379475 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -60,7 +60,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 		if (args->arg1)
 			guest_pagetable_clear_all(vcpu);
 		else
-			guest_pagetable_flush_user(lg);
+			guest_pagetable_flush_user(vcpu);
 		break;
 	/* All these calls simply pass the arguments through to the right
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 78f6210..a0ac77e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -77,7 +77,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 		virtstack = vcpu->esp1;
 		ss = vcpu->ss1;
-		origstack = gstack = guest_pa(lg, virtstack);
+		origstack = gstack = guest_pa(vcpu, virtstack);
 		/* We push the old stack segment and pointer onto the new
 		 * stack: when the Guest does an iret back from the interrupt
 		 * handler the CPU will notice they're dropping privilege
@@ -89,7 +89,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 		virtstack = vcpu->regs->esp;
 		ss = vcpu->regs->ss;
-		origstack = gstack = guest_pa(lg, virtstack);
+		origstack = gstack = guest_pa(vcpu, virtstack);
 	}
 	/* Remember that we never let the Guest actually disable interrupts, so
@@ -325,7 +325,7 @@ void pin_stack_pages(struct lguest_vcpu *vcpu)
 		 * start of the page after the kernel stack. Subtract one to
 		 * get back onto the first stack page, and keep subtracting to
 		 * get to the rest of the stack pages. */
-		pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
+		pin_page(vcpu, vcpu->esp1 - 1 - i * PAGE_SIZE);
 }
 /* Direct traps also mean that we need to know whenever the Guest wants to use
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 6faf90d..e700408 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lguest_vcpu {
 	unsigned long regs_page;
 	struct lguest_regs *regs;
+	int vcpu_pgd; /* which pgd this vcpu is currently using */
+
 	/* If a hypercall was asked for, this points to the arguments. */
 	struct hcall_args *hcall;
 	u32 next_hcall;
@@ -92,8 +94,6 @@ struct lguest
 	int changed;
 	struct lguest_pages *last_pages;
-	/* We keep a small number of these. */
-	u32 pgdidx;
 	struct pgdir pgdirs[4];
 	unsigned long noirq_start, noirq_end;
@@ -175,14 +175,14 @@ void free_guest_pagetable(struct lguest *lg);
 void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable);
 void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
 void guest_pagetable_clear_all(struct lguest_vcpu *vcpu);
-void guest_pagetable_flush_user(struct lguest *lg);
+void guest_pagetable_flush_user(struct lguest_vcpu *vcpu);
 void guest_set_pte(struct lguest *lg, unsigned long gpgdir, unsigned long vaddr, pte_t val);
 void map_switcher_in_guest(struct lguest_vcpu *vcpu, struct lguest_pages *pages);
-int demand_page(struct lguest *info, unsigned long cr2, int errcode);
-void pin_page(struct lguest *lg, unsigned long vaddr);
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
+int demand_page(struct lguest_vcpu *vcpu, unsigned long cr2, int errcode);
+void pin_page(struct lguest_vcpu *vcpu, unsigned long vaddr);
+unsigned long guest_pa(struct lguest_vcpu *vcpu, unsigned long vaddr);
 void page_table_guest_data_init(struct lguest *lg);
 /* arch/core.c: */
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index f0f271d..84c22d7 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -94,10 +94,10 @@ static pte_t *spte_addr(struct lguest *lg, pgd_t spgd, unsigned long vaddr)
 /* These two functions just like the above two, except they access the Guest
  * page tables.
Re: OOPS: 2.6.24-rc5-mm1 -- EIP is at r_show+0x2a/0x70 -- (triggered by cat /proc/iomem AFTER suspend-to-disk/resume)
On further investigation, cat /proc/iomem does not trigger the stack trace until after a suspend-to-disk/resume cycle has occurred. I am removing Ingo and Russell from the TO list (as they are apparently the wrong people) and adding the suspend folks, as suspend is implicated.

My .config file can be found here:
http://marc.info/?l=linux-kernel&m=119812903001296&w=2

Miles Lane wrote:
> .config attached in order to not trip spam filters.
>
> Miles Lane wrote:
>> [ 252.868386] BUG: unable to handle kernel NULL pointer dereference at virtual address 0018
>> [ 252.868393] printing ip: c012d527 *pde =
>> [ 252.868399] Oops: [#1] SMP
>> [ 252.868403] last sysfs file: /sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda3/stat
>> [ 252.868407] Modules linked in: aes_i586 aes_generic i915 drm rfcomm l2cap bluetooth acpi_cpufreq cpufreq_stats cpufreq_conservative sbs sbshc dm_crypt sbp2 parport_pc lp parport arc4 ecb crypto_blkcipher cryptomgr crypto_algapi snd_hda_intel snd_pcm_oss snd_mixer_oss pcmcia snd_pcm iTCO_wdt iTCO_vendor_support snd_seq_dummy watchdog_core watchdog_dev snd_seq_oss snd_seq_midi tifm_7xx1 snd_rawmidi iwl3945 snd_seq_midi_event rng_core tifm_core mac80211 snd_seq snd_timer snd_seq_device cfg80211 sky2 battery yenta_socket rsrc_nonstatic pcmcia_core ac snd soundcore snd_page_alloc button shpchp pci_hotplug sr_mod cdrom pata_acpi piix ide_core firewire_ohci firewire_core crc_itu_t thermal processor fan
>> [ 252.868469]
>> [ 252.868472] Pid: 7088, comm: head Not tainted (2.6.24-rc5-mm1 #9)
>> [ 252.868476] EIP: 0060:[c012d527] EFLAGS: 00010297 CPU: 0
>> [ 252.868481] EIP is at r_show+0x2a/0x70
>> [ 252.868483] EAX: EBX: 0001 ECX: c07e3224 EDX: c04bb034
>> [ 252.868486] ESI: 0008 EDI: ed1f52c0 EBP: f5320f10 ESP: f5320f04
>> [ 252.868489] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> [ 252.868493] Process head (pid: 7088, ti=f532 task=f532e000 task.ti=f532)
>> [ 252.868495] Stack: c03a6cac ed1f52c0 c07e3224 f5320f50 c0199a7e 2000 bf930807 e1007800
>> [ 252.868504]ed1f52e0 01d3 000e 000d
>> [ 252.868512]fffb f7d39370 c01998e4 f5320f74 c01af4f5 f5320f9c 2000 bf930807
>> [ 252.868521] Call Trace:
>> [ 252.868523] [c0107d55] show_trace_log_lvl+0x12/0x25
>> [ 252.868529] [c0107df2] show_stack_log_lvl+0x8a/0x95
>> [ 252.868534] [c0107e89] show_registers+0x8c/0x154
>> [ 252.868538] [c010805f] die+0x10e/0x1d2
>> [ 252.868542] [c039c8c9] do_page_fault+0x52b/0x600
>> [ 252.868547] [c039af9a] error_code+0x72/0x78
>> [ 252.868552] [c0199a7e] seq_read+0x19a/0x26c
>> [ 252.868557] [c01af4f5] proc_reg_read+0x60/0x74
>> [ 252.868562] [c018390d] vfs_read+0xa2/0x11e
>> [ 252.868567] [c0183d02] sys_read+0x3b/0x60
>> [ 252.868571] [c0106bae] sysenter_past_esp+0x6b/0xc1
>> [ 252.868575] ===
>> [ 252.868577] Code: c3 55 89 d1 89 e5 57 89 c7 56 53 8b 50 64 83 7a 0c 00 77 0e 81 7a 08 ff ff 00 00 be 04 00 00 00 76 05 be 08 00 00 00 89 c8 31 db 8b 40 18 39 d0 74 06 43 83 fb 05 75 f3 8b 41 10 ba 2f 1b 45 c0
>> [ 252.868623] EIP: [c012d527] r_show+0x2a/0x70 SS:ESP 0068:f5320f04
Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)
Hello Juergen,

On 12/20/07, Juergen Beisert [EMAIL PROTECTED] wrote:
> On Thursday 20 December 2007 13:45, Jaswinder Singh wrote:
> > So I am curious, if possible, user can switch softirq-threads or IRQs RT tasks to non-RT tasks for slow hardware or least important hardware for NON-RT tasks. So this will improve RT behaviour.
> > ^^
> Why?
>
> IMHO: Simply decrease the RT priority of less important IRQs or increase all other more important IRQs. IRQs are always more important than other processes in a system, also in non RT systems.

In the RT kernel we have RT tasks and non-RT tasks. So we could also support RT softirq-threads/IRQs and non-RT softirq-threads/IRQs: RT softirq-threads/IRQs for RT tasks, and non-RT softirq-threads/IRQs for non-RT tasks.

Or else remove non-RT tasks from the RT kernel as well, and simply decrease the RT priority of less important tasks, as per your suggestion.

Thank you,
Jaswinder Singh.

> Juergen
> --
> Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
> Pengutronix - Linux Solutions for Science and Industry
> Handelsregister: Amtsgericht Hildesheim, HRA 2686
> Vertretung Sued/Muenchen, Germany
> Phone: +49-8766-939 228 | Fax: +49-5121-206917-9
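For reference, both options discussed above (lowering the RT priority of a less important IRQ thread, or moving it out of the RT class altogether) map onto the same scheduler call from user space. The following is a minimal sketch, assuming the IRQ or softirq handler in question runs as a kernel thread whose PID is known. The demo_* helper names are made up for illustration, and changing another task's policy normally requires appropriate privileges.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: set scheduling policy and priority of a task.
 * pid 0 means the caller. Returns 0 on success, -1 with errno set. */
static int demo_set_sched(pid_t pid, int policy, int prio)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(pid, policy, &sp);
}

/* Move a task out of the RT class: SCHED_OTHER always uses priority 0. */
static int demo_make_non_rt(pid_t pid)
{
	return demo_set_sched(pid, SCHED_OTHER, 0);
}

/* Keep a task in the RT class, but at the lowest SCHED_FIFO priority. */
static int demo_lowest_rt(pid_t pid)
{
	return demo_set_sched(pid, SCHED_FIFO,
			      sched_get_priority_min(SCHED_FIFO));
}
```

Roughly, demo_make_non_rt(pid) corresponds to the "non-RT IRQ thread" idea, while demo_lowest_rt(pid) corresponds to Juergen's suggestion of simply decreasing the RT priority.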
[patch 4/5] x86, ptrace: overflow signal API
Establish the user API for sending a user-defined signal to the traced task on a BTS buffer overflow. This should complete the user API for the BTS ptrace extension.

The patches so far implement wrap-around overflow handling as is needed for debugging. The remaining open item is another overflow handling mechanism that sends a signal to the traced task on a buffer overflow. This will take some more time from my side. Since, from a user perspective, this occurs behind the scenes, the patch set should already be useful.

More features may/will be added on top of it (overflow signal, pageable back-up buffers, kernel tracing, core file support, profiling, ...).

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:09.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:14.%N +0100
@@ -88,11 +88,13 @@
 	unsigned int size;
 	/* bitmask of below flags */
 	unsigned int flags;
+	/* buffer overflow signal */
+	unsigned int signal;
 };
 #define PTRACE_BTS_O_TRACE 0x1 /* branch trace */
 #define PTRACE_BTS_O_SCHED 0x2 /* scheduling events w/ jiffies */
-#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG? on buffer overflow
+#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG<signal> on buffer overflow
 				instead of wrapping around */
 #define PTRACE_BTS_O_CUT_SIZE 0x8 /* cut requested size to max available
 				instead of failing */
RE: [PATCH] msi: set 'En' bit of MSI Mapping Capability
The quirk is for our Intel platform; we don't want HT MSI mapping enabled in any of our devices.

BRs
Peer Chen

-----Original Message-----
From: Eric W. Biederman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 19, 2007 5:59 AM
To: peerchen
Cc: linux-kernel; akpm; Andy Currid; Peer Chen
Subject: Re: [PATCH] msi: set 'En' bit of MSI Mapping Capability

peerchen [EMAIL PROTECTED] writes:

> According to the HyperTransport spec, 'En' indicate if the MSI Mapping is active. Set the 'En' bit when setup pci and add the quirk for some nvidia devices.
>
> The patch base on kernel 2.6.24-rc5

Ok. This is starting to look good.

> Signed-off-by: Andy Currid [EMAIL PROTECTED]
> Signed-off-by: Peer Chen [EMAIL PROTECTED]
> ---
> diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff linux-2.6.24-rc5-vanilla/drivers/pci/probe.c linux-2.6.24-rc5/drivers/pci/probe.c
> --- linux-2.6.24-rc5-vanilla/drivers/pci/probe.c 2007-12-18 14:35:46.0 -0500
> +++ linux-2.6.24-rc5/drivers/pci/probe.c 2007-12-18 16:28:29.0 -0500
> @@ -721,6 +721,9 @@ static int pci_setup_device(struct pci_d
>  	/* Unknown power state */
>  	dev->current_state = PCI_UNKNOWN;
> +
> +	/* Enable HT MSI mapping */
> +	ht_enable_msi_mapping(dev);
>  	/* Early fixups, before probing the BARs */
>  	pci_fixup_device(pci_fixup_early, dev);
> diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff linux-2.6.24-rc5-vanilla/drivers/pci/quirks.c linux-2.6.24-rc5/drivers/pci/quirks.c
> --- linux-2.6.24-rc5-vanilla/drivers/pci/quirks.c 2007-12-18 14:35:46.0 -0500
> +++ linux-2.6.24-rc5/drivers/pci/quirks.c 2007-12-18 16:28:41.0 -0500
> @@ -1705,6 +1705,45 @@ static void __devinit quirk_nvidia_ck804
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_CK804_PCIE,
>  			quirk_nvidia_ck804_msi_ht_cap);
> +static void __devinit quirk_msi_ht_cap_disable(struct pci_dev *dev) {
> +	struct pci_dev *host_bridge;
> +	int pos, ttl = 48;
> +
> +	/* HT MSI mapping should be disabled on devices that are below
> +	 * a non-Hypertransport host bridge. Locate the host bridge...
> +	 */
> +
> +	if ((host_bridge = pci_get_bus_and_slot(0, PCI_DEVFN(0,0))) == NULL) {
> +		printk(KERN_WARNING
> +			"PCI: quirk_msi_ht_cap_disable didn't locate host bridge\n");
> +		return;
> +	}
> +
> +	if ((pos = pci_find_ht_capability(host_bridge, HT_CAPTYPE_SLAVE)) != 0) {
> +		/* Host bridge is to HT */
> +		return;
> +	}
> +
> +	/* Host bridge is not to HT, disable HT MSI mapping on this device */
> +
> +	pos = pci_find_ht_capability(dev, HT_CAPTYPE_MSI_MAPPING);
> +	while (pos && ttl--) {
> +		u8 flags;
> +
> +		if (pci_read_config_byte(dev, pos + HT_MSI_FLAGS, &flags) == 0) {
> +			printk(KERN_INFO "PCI: Quirk disabling HT MSI mapping on %s\n",
> +				pci_name(dev));
> +
> +			pci_write_config_byte(dev, pos + HT_MSI_FLAGS,
> +					flags & ~HT_MSI_FLAGS_ENABLE);
> +		}
> +		pos = pci_find_next_ht_capability(dev, pos,
> +					HT_CAPTYPE_MSI_MAPPING);
> +	}
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> +			quirk_msi_ht_cap_disable);

Could you explain the need for this quirk?

My expectation would be that if we turned an MSI interrupt into a hypertransport interrupt then if we hit a non-hypertransport bridge upstream it would just turn the hypertransport interrupts into whatever makes sense for the upstream bridge.

I can see it being excess work and adding some latency but I don't see it being a correctness problem.

Or are my expectations off and the NVIDIA chipsets do not handle hypertransport interrupts the way I would expect. Dropping them instead of converting when going to a non-hypertransport host bridge.

>  static void __devinit quirk_msi_intx_disable_bug(struct pci_dev *dev)
>  {
>  	dev->dev_flags |= PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG;
> diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff linux-2.6.24-rc5-vanilla/include/asm-generic/pci.h linux-2.6.24-rc5/include/asm-generic/pci.h
> --- linux-2.6.24-rc5-vanilla/include/asm-generic/pci.h 2007-12-18 14:35:52.0 -0500
> +++ linux-2.6.24-rc5/include/asm-generic/pci.h 2007-12-18 16:29:12.0 -0500
> @@ -45,6 +45,10 @@ pcibios_select_root(struct pci_dev *pdev
>  #define pcibios_scan_all_fns(a, b) 0
> +#ifndef HAVE_ARCH_HT_ENABLE_MSI_MAPPING
> +#define ht_enable_msi_mapping(a) 0
> +#endif /* HAVE_ARCH_HT_ENABLE_MSI_MAPPING */
> +
>  #ifndef HAVE_ARCH_PCI_GET_LEGACY_IDE_IRQ
>  static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
>  {
> diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff linux-2.6.24-rc5-vanilla/include/asm-x86/pci.h linux-2.6.24-rc5/include/asm-x86/pci.h
> --- linux-2.6.24-rc5-vanilla/include/asm-x86/pci.h 2007-12-18 14:35:51.0 -0500
> +++ linux-2.6.24-rc5/include/asm-x86/pci.h 2007-12-18
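The capability walk in the quoted quirk can be modelled in user space. The sketch below is only an illustration, not the kernel code: config space is a plain 256-byte array, the demo_* names are hypothetical, and the register constants are written down from memory of the PCI/HyperTransport layout, so they should be checked against pci_regs.h before being relied on. It also shows why the loop carries a ttl: a corrupt capability list that points back at itself must not hang the walk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout for this sketch (verify against pci_regs.h). */
#define DEMO_CAP_PTR        0x34 /* offset of the first capability pointer */
#define DEMO_CAP_ID_HT      0x08 /* HyperTransport capability ID */
#define DEMO_HT_TYPE_MASK   0xF8 /* capability type bits in byte 3 */
#define DEMO_HT_MSI_MAPPING 0xA8 /* MSI mapping capability type */
#define DEMO_HT_MSI_FLAGS   0x02 /* flags byte inside the capability */
#define DEMO_HT_MSI_ENABLE  0x01 /* the 'En' bit */

/* Follow the capability list from 'pos' and return the offset of the next
 * HT MSI mapping capability, or 0 if there is none. '*ttl' bounds the walk. */
static int demo_find_ht_msi(const uint8_t *cfg, int pos, int *ttl)
{
	pos &= ~3;	/* capability pointers are dword aligned */
	while (pos >= 0x40 && (*ttl)-- > 0) {
		if (cfg[pos] == DEMO_CAP_ID_HT &&
		    (cfg[pos + 3] & DEMO_HT_TYPE_MASK) == DEMO_HT_MSI_MAPPING)
			return pos;
		pos = cfg[pos + 1] & ~3;
	}
	return 0;
}

/* Clear 'En' on every HT MSI mapping capability found in 'cfg'.
 * Returns the number of capabilities that were touched. */
static int demo_disable_ht_msi(uint8_t *cfg)
{
	int ttl = 48, count = 0;
	int pos = demo_find_ht_msi(cfg, cfg[DEMO_CAP_PTR], &ttl);

	while (pos && ttl-- > 0) {
		cfg[pos + DEMO_HT_MSI_FLAGS] &= (uint8_t)~DEMO_HT_MSI_ENABLE;
		count++;
		pos = demo_find_ht_msi(cfg, cfg[pos + 1], &ttl);
	}
	return count;
}
```

In this model the list is walked at most 48 steps in total, mirroring the ttl = 48 in the quoted patch; whether that bound is appropriate for real hardware is not something the sketch can show.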
[patch 5/5] x86, ptrace, man: man pages for ptrace BTS extensions
Document changes for this patch set.

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---
Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2 2007-12-14 17:45:33.%N +0100
+++ man/man2/ptrace.2 2007-12-20 13:20:07.%N +0100
@@ -40,6 +40,9 @@
 .\" PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\" (Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger [EMAIL PROTECTED]
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,131 @@
 detached in this way regardless of which method was used to initiate tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination address are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+	PTRACE_BTS_INVALID = 0,
+	PTRACE_BTS_BRANCH,
+	PTRACE_BTS_TASK_ARRIVES,
+	PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+	u64 qualifier;
+	union {
+		/* PTRACE_BTS_BRANCH */
+		struct {
+			u64 from_ip;
+			u64 to_ip;
+		} lbr;
+		/* PTRACE_BTS_TASK_ARRIVES or
+		   PTRACE_BTS_TASK_DEPARTS */
+		u64 timestamp;
+	} variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+	unsigned int size;
+	unsigned int flags;
+	unsigned int signal;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BTS records into the buffer pointed to by
+\fIaddr\fP and clears the buffer.
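As a rough illustration of the configuration structure documented above, the sketch below fills a ptrace_bts_config for either circular-buffer or signal-on-overflow operation. The structure and flag values are copied from this patch series and redefined locally, since they are not assumed to be present in installed headers. demo_bts_config() is a hypothetical helper, and the actual ptrace(PTRACE_BTS_CONFIG, ...) call is deliberately left out.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Local copies of the definitions proposed in this patch series. */
struct ptrace_bts_config {
	unsigned int size;	/* requested or actual buffer size in bytes */
	unsigned int flags;	/* bitmask of PTRACE_BTS_O_* */
	unsigned int signal;	/* signal to send on buffer overflow */
};

#define PTRACE_BTS_O_TRACE    0x1 /* branch trace */
#define PTRACE_BTS_O_SCHED    0x2 /* scheduling events w/ jiffies */
#define PTRACE_BTS_O_SIGNAL   0x4 /* send signal on buffer overflow */
#define PTRACE_BTS_O_CUT_SIZE 0x8 /* cut requested size to max available */

/* Hypothetical helper: request branch tracing with a buffer of 'size'
 * bytes. With overflow_sig > 0 the traced task gets that signal when the
 * buffer is about to overflow; otherwise the buffer wraps around. */
static struct ptrace_bts_config demo_bts_config(unsigned int size,
						int overflow_sig)
{
	struct ptrace_bts_config cfg;

	memset(&cfg, 0, sizeof(cfg));
	cfg.size = size;
	cfg.flags = PTRACE_BTS_O_TRACE | PTRACE_BTS_O_CUT_SIZE;
	if (overflow_sig > 0) {
		cfg.flags |= PTRACE_BTS_O_SIGNAL;
		cfg.signal = (unsigned int)overflow_sig;
	}
	return cfg;
}
```

A debugger that only wants the tail of the trace would presumably use the wrap-around form, while a profiler draining the buffer would use the signal form, as the man page text above describes.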
Re: 2.6.22.14 oops msg with commvault galaxy ?
On Friday 14 December 2007 at 09:28 -0800, Greg KH wrote:
> On Fri, Dec 14, 2007 at 10:37:39PM +0530, Dhaval Giani wrote:
> On Fri, Dec 14, 2007 at 08:26:42AM -0800, Greg KH wrote:
> On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:
> * Kay Sievers [EMAIL PROTECTED] wrote:
>
> This one also fails to apply properly at the exact same place has Ingo's previously posted patch. Would need to backport his one.
>
> It depends on a completely reworked sysfs logic, I don't think it makes any sense to backport that.
>
> well, if it fixes a live bug in a still supported stable kernel release... Vincent, could you try to just get rid of all actual uses of se->attr.owner, within fs/sysfs/*.c? Something like the patch below. (totally untested - might be fatally broken as well)
>
> How can you think that this is not needed? You can not remove it with sysfs you are patching. Hope this explains it:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced
>
> yeah - as i said it might be fatally broken (in fact it is). Do we understand why Vincent got the crashes with vanilla 2.6.22.14 ?
>
> No, and I can't seem to duplicate them here at all. Does anyone have a test case for this that I can work on trying to duplicate?
>
> If you apply CFS without my fix, and try to constantly check cpu_shares for a user who is logging and logging out, you should hit it. (That's what I was doing).
>
> Hm, how about a vanilla 2.6.22.14 kernel _without_ any patches. That's what I am most worried about :)

Since I was getting the problem with both vanilla and CFS-patched kernels and since, sadly, I don't have the time to do a git bisect at the moment, I decided to go ahead and prepare a full migration to 2.6.23 (I was hoping to skip directly to 2.6.24 but...).

I can confirm at the moment that 2.6.23 works properly with Galaxy (just as 2.6.20 and 2.6.21 used to...).

Thanks very much, everyone, for the help, but sadly this bug will have to remain unresolved.

thanks,
- vin
Re: [PATCH] Move page_assign_page_cgroup to VM_BUG_ON in free_hot_cold_page
On Thu, 2007-12-20 at 13:14 +0000, Hugh Dickins wrote:
> On Wed, 19 Dec 2007, Dave Hansen wrote:
> > > --- linux-2.6.24-rc5/mm/page_alloc.c~memory-controller-move-to-bug-on-in-free_hot_cold_page 2007-12-19 11:31:46.0 +0530
> > > +++ linux-2.6.24-rc5-balbir/mm/page_alloc.c 2007-12-19 11:33:45.0 +0530
> > > @@ -995,7 +995,7 @@ static void fastcall free_hot_cold_page(
> > >  	if (!PageHighMem(page))
> > >  		debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
> > > -	page_assign_page_cgroup(page, NULL);
> > > +	VM_BUG_ON(page_get_page_cgroup(page));
> > >  	arch_free_page(page, 0);
> > >  	kernel_map_pages(page, 1, 0);
> >
> > Hi Balbir,
> >
> > You generally want to do these like:
> >
> > 	foo = page_assign_page_cgroup(page, NULL);
> > 	VM_BUG_ON(foo);
> >
> > Some embedded people have been known to optimize kernel size like this:
> >
> > 	#define VM_BUG_ON(x) do{}while(0)
>
> Balbir's patch looks fine to me: I don't get your point there, Dave.

There was a lengthy discussion here:

  http://lkml.org/lkml/2007/12/14/131

on the merit of debug statements with side effects. But looking at our definition:

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(condition) do { } while(0)
#endif

disabling CONFIG_DEBUG_VM breaks the code as proposed by Balbir in that it will no longer acquire the reference.
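The general concern in this sub-thread (debug statements with side effects) can be reproduced in user space. The sketch below is only an analogy: DEMO_VM_BUG_ON mimics the two VM_BUG_ON definitions quoted above, and demo_assign_owner() is a made-up helper that has a side effect, standing in for an "assign" style call. It is not meant to settle whether the getter in Balbir's patch has side effects.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for VM_BUG_ON: checks only when DEMO_DEBUG_VM is defined,
 * otherwise the argument is never evaluated at all. */
#ifdef DEMO_DEBUG_VM
#define DEMO_VM_BUG_ON(cond) do { if (cond) abort(); } while (0)
#else
#define DEMO_VM_BUG_ON(cond) do { } while (0)
#endif

static int demo_calls;	/* counts how often the helper really ran */

/* Hypothetical helper with a side effect: stores a new owner and
 * returns the previous one. */
static void *demo_assign_owner(void **slot, void *new_owner)
{
	void *old = *slot;

	*slot = new_owner;
	demo_calls++;
	return old;
}

/* Fragile: the call sits inside the macro argument, so it disappears
 * together with the check when debugging is compiled out. */
static void demo_free_fragile(void **slot)
{
	DEMO_VM_BUG_ON(demo_assign_owner(slot, NULL) != NULL);
}

/* Robust: the side effect always runs; only the check is optional. */
static void demo_free_robust(void **slot)
{
	void *old = demo_assign_owner(slot, NULL);

	DEMO_VM_BUG_ON(old != NULL);
	(void)old;	/* silence "unused" warnings in non-debug builds */
}
```

Built without DEMO_DEBUG_VM, demo_free_fragile() leaves the slot untouched, while demo_free_robust() still clears it. This is the pattern Dave's "foo = ...; VM_BUG_ON(foo);" suggestion protects against.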
Re: [PATCH 0/5] sg_ring for scsi
On Thu, Dec 20 2007 at 9:58 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
> On Thu, Dec 20 2007, Rusty Russell wrote:
> On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
> On Thu, 20 Dec 2007 16:45:18 +1100 Rusty Russell [EMAIL PROTECTED] wrote:
>
> OK, some fixes since last time, as I wade through more SCSI drivers. Some drivers use use_sg as a flag to know whether the request_buffer is a scatterlist: I don't need the counter, but I still need the flag, so I fixed that in a more intuitive way (an explicit ->sg pointer in the cmd).
>
> use_sg and the request_buffer will be removed shortly.
> http://marc.info/?l=linux-scsi&m=119754650614813&w=2
>
> Thanks! Is there a git tree somewhere with these changes?
>
> I think that we tried the similar idea before, scsi_sgtable, but we seem to settle in the current simple approach.
>
> Yes, a scsi-specific solution is a bad idea: other people use sg. Manipulating the magic chains is horrible; it looks simple to the places which simply want to iterate through it, but it's awful for code which wants to create them.
>
> The current code looks like that to minimize impact on 2.6.24, see this branch:
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=sg
> for how it folds into lib/sg.c and the magic disappears from SCSI. Rusty, nobody claimed the sg code in 2.6.24 is perfect. I like to get things incrementally there.

Dear Jens,

Is this code scheduled for 2.6.25? I cannot find it in the -mm tree. I ask because the above code conflicts with the scsi_data_buffer patch, which is in the -mm tree and which I hope will get accepted into 2.6.25. The concepts do not conflict at all; it is only that both patchsets touch the same places. (By the way, it does not apply on scsi-misc either, and it did not compile in your tree: a bit was missing in ide-scsi.c.)

I have rebased and fixed your code on top of the scsi_data_buffer patchset. Please review. (Patchset posted as a reply to this mail.) The patches are totally untested and based on the -mm tree. We should decide the order of these patches and rebase accordingly.

AND ... please send patches that are to be included in the next kernel to Morton. This way we can account for them. Also, I do not see an Acked-by: from the SCSI maintainer on the SCSI bits of your patches. Is it not the custom to get an ack on bits that belong to other maintainers, even for maintainers?

Boaz
[PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers
Manually doing chained sg lists is not trivial, so add some helpers to make sure that drivers get it right.

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 include/linux/scatterlist.h |  125 ---
 lib/Makefile                |    2 +-
 lib/scatterlist.c           |  281 +++
 3 files changed, 307 insertions(+), 101 deletions(-)
 create mode 100644 lib/scatterlist.c

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 416e000..c3ca848 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -7,6 +7,12 @@
 #include <linux/string.h>
 #include <asm/io.h>
+struct sg_table {
+	struct scatterlist *sgl;	/* the list */
+	unsigned int nents;		/* number of mapped entries */
+	unsigned int orig_nents;	/* original size of list */
+};
+
 /*
  * Notes on SG table design.
  *
@@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
 	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
 }
-/**
- * sg_next - return the next scatterlist entry in a list
- * @sg:		The current sg entry
- *
- * Description:
- *   Usually the next entry will be @sg@ + 1, but if this sg element is part
- *   of a chained scatterlist, it could jump to the start of a new
- *   scatterlist array.
- *
- **/
-static inline struct scatterlist *sg_next(struct scatterlist *sg)
-{
-#ifdef CONFIG_DEBUG_SG
-	BUG_ON(sg->sg_magic != SG_MAGIC);
-#endif
-	if (sg_is_last(sg))
-		return NULL;
-
-	sg++;
-	if (unlikely(sg_is_chain(sg)))
-		sg = sg_chain_ptr(sg);
-
-	return sg;
-}
-
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */
@@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
 	for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
 /**
- * sg_last - return the last scatterlist entry in a list
- * @sgl:	First entry in the scatterlist
- * @nents:	Number of entries in the scatterlist
- *
- * Description:
- *   Should only be used casually, it (currently) scan the entire list
- *   to get the last entry.
- *
- *   Note that the @sgl@ pointer passed in need not be the first one,
- *   the important bit is that @nents@ denotes the number of entries that
- *   exist from @sgl@.
- *
- **/
-static inline struct scatterlist *sg_last(struct scatterlist *sgl,
-					  unsigned int nents)
-{
-#ifndef ARCH_HAS_SG_CHAIN
-	struct scatterlist *ret = &sgl[nents - 1];
-#else
-	struct scatterlist *sg, *ret = NULL;
-	unsigned int i;
-
-	for_each_sg(sgl, sg, nents, i)
-		ret = sg;
-
-#endif
-#ifdef CONFIG_DEBUG_SG
-	BUG_ON(sgl[0].sg_magic != SG_MAGIC);
-	BUG_ON(!sg_is_last(ret));
-#endif
-	return ret;
-}
-
-/**
  * sg_chain - Chain two sglists together
  * @prv:	First scatterlist
  * @prv_nents:	Number of entries in prv
@@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg)
 }
 /**
- * sg_init_table - Initialize SG table
- * @sgl:	The SG table
- * @nents:	Number of entries in table
- *
- * Notes:
- *   If this is part of a chained sg table, sg_mark_end() should be
- *   used only on the last table part.
- *
- **/
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
-	memset(sgl, 0, sizeof(*sgl) * nents);
-#ifdef CONFIG_DEBUG_SG
-	{
-		unsigned int i;
-		for (i = 0; i < nents; i++)
-			sgl[i].sg_magic = SG_MAGIC;
-	}
-#endif
-	sg_mark_end(&sgl[nents - 1]);
-}
-
-/**
- * sg_init_one - Initialize a single entry sg list
- * @sg:		SG entry
- * @buf:	Virtual address for IO
- * @buflen:	IO length
- *
- * Notes:
- *   This should not be used on a single entry that is part of a larger
- *   table. Use sg_init_table() for that.
- *
- **/
-static inline void sg_init_one(struct scatterlist *sg, const void *buf,
-			       unsigned int buflen)
-{
-	sg_init_table(sg, 1);
-	sg_set_buf(sg, buf, buflen);
-}
-
-/**
  * sg_phys - Return physical address of an sg entry
  * @sg:	SG entry
  *
@@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg)
 	return page_address(sg_page(sg)) + sg->offset;
 }
+struct scatterlist *sg_next(struct scatterlist *);
+struct scatterlist *sg_last(struct scatterlist *s, unsigned int);
+void sg_init_table(struct scatterlist *, unsigned int);
+void sg_init_one(struct scatterlist *, const void *, unsigned int);
+
+typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
+typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
+
+void __sg_free_table(struct sg_table *, sg_free_fn *);
+void sg_free_table(struct sg_table *);
+int __sg_alloc_table(struct sg_table *,
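To make the chaining idea behind sg_next() and sg_last() easier to follow, here is a simplified user-space model. It is a sketch under stated assumptions, not the kernel implementation: struct demo_sg and the demo_* functions are invented for illustration, and the "chain" and "last" markers are ordinary fields here rather than being encoded in the entry itself as the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct scatterlist. */
struct demo_sg {
	const char *buf;	/* payload of a normal entry */
	struct demo_sg *chain;	/* non-NULL: this entry links to the next array */
	int is_last;		/* non-zero: final entry of the whole list */
};

/* Same shape as the sg_next() shown in the patch: step to the next slot,
 * and if that slot is a chain link, jump to the array it points at. */
static struct demo_sg *demo_sg_next(struct demo_sg *sg)
{
	if (sg->is_last)
		return NULL;

	sg++;
	if (sg->chain)
		sg = sg->chain;

	return sg;
}

/* Count the payload entries by walking the whole (possibly chained) list. */
static unsigned int demo_sg_count(struct demo_sg *sgl)
{
	unsigned int n = 0;
	struct demo_sg *sg;

	for (sg = sgl; sg; sg = demo_sg_next(sg))
		n++;
	return n;
}

/* Like sg_last() on a chained list: scan everything, keep the final entry. */
static struct demo_sg *demo_sg_last(struct demo_sg *sgl)
{
	struct demo_sg *sg, *ret = NULL;

	for (sg = sgl; sg; sg = demo_sg_next(sg))
		ret = sg;
	return ret;
}
```

The point of the model is that callers which only iterate never see the link entries, while code that builds the list has to place the link and the end marker correctly, which is the part the new allocation helpers are meant to take over.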
[PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining
From: Jens Axboe [EMAIL PROTECTED]

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 drivers/scsi/libsrp.c        |    2 +-
 drivers/scsi/scsi_error.c    |    4 +-
 drivers/scsi/scsi_lib.c      |  150 +-
 drivers/usb/storage/isd200.c |    4 +-
 include/scsi/scsi_cmnd.h     |    9 +--
 5 files changed, 24 insertions(+), 145 deletions(-)

diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c
index 8a8562a..b81350d 100644
--- a/drivers/scsi/libsrp.c
+++ b/drivers/scsi/libsrp.c
@@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info,
 	sc->SCp.ptr = info;
 	memcpy(sc->cmnd, cmd->cdb, MAX_COMMAND_SIZE);
 	sc->sdb.length = len;
-	sc->sdb.sglist = (void *) (unsigned long) addr;
+	sc->sdb.sgt.sgl = (void *) (unsigned long) addr;
 	sc->tag = tag;
 	err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)&cmd->lun, cmd->tag);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5c8ba6a..1fd2a8c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 			sizeof(scmd->sense_buffer), sense_bytes);
 	sg_init_one(&ses->sense_sgl, scmd->sense_buffer, scmd->sdb.length);
-	scmd->sdb.sglist = &ses->sense_sgl;
+	scmd->sdb.sgt.sgl = &ses->sense_sgl;
 	scmd->sc_data_direction = DMA_FROM_DEVICE;
-	scmd->sdb.sg_count = 1;
+	scmd->sdb.sgt.nents = 1;
 	memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
 	scmd->cmnd[0] = REQUEST_SENSE;
 	scmd->cmnd[4] = scmd->sdb.length;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a6aae56..c7107f1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents)
 	return index;
 }
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb,
-			      unsigned short sg_count, gfp_t gfp_mask)
+static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-	struct scsi_host_sg_pool *sgp;
-	struct scatterlist *sgl, *prev, *ret;
-	unsigned int index;
-	int this, left;
-
-	left = sg_count;
-	ret = prev = NULL;
-	do {
-		this = left;
-		if (this > SCSI_MAX_SG_SEGMENTS) {
-			this = SCSI_MAX_SG_SEGMENTS - 1;
-			index = SG_MEMPOOL_NR - 1;
-		} else
-			index = scsi_sgtable_index(this);
-
-		left -= this;
-
-		sgp = scsi_sg_pools + index;
-
-		sgl = mempool_alloc(sgp->pool, gfp_mask);
-		if (unlikely(!sgl))
-			goto enomem;
-
-		sg_init_table(sgl, sgp->size);
-
-		/*
-		 * first loop through, set initial index and return value
-		 */
-		if (!ret)
-			ret = sgl;
-
-		/*
-		 * chain previous sglist, if any. we know the previous
-		 * sglist must be the biggest one, or we would not have
-		 * ended up doing another loop.
-		 */
-		if (prev)
-			sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
-
-		/*
-		 * if we have nothing left, mark the last segment as
-		 * end-of-list
-		 */
-		if (!left)
-			sg_mark_end(&sgl[this - 1]);
-
-		/*
-		 * don't allow subsequent mempool allocs to sleep, it would
-		 * violate the mempool principle.
-		 */
-		gfp_mask &= ~__GFP_WAIT;
-		gfp_mask |= __GFP_HIGH;
-		prev = sgl;
-	} while (left);
-
-	/*
-	 * ->use_sg may get modified after dma mapping has potentially
-	 * shrunk the number of segments, so keep a copy of it for free.
-	 */
-	sdb->alloc_sg_count = sdb->sg_count = sg_count;
-	sdb->sglist = ret;
-	return 0;
-enomem:
-	if (ret) {
-		/*
-		 * Free entries chained off ret. Since we were trying to
-		 * allocate another sglist, we know that all entries are of
-		 * the max size.
-		 */
-		sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
-		prev = ret;
-		ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
-
-		while ((sgl = sg_chain_ptr(ret)) != NULL) {
-			ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
-			mempool_free(sgl, sgp->pool);
-		}
-
-		mempool_free(prev, sgp->pool);
-
[PATCH 3/3] SG: Update ide/ to use sg_table
From: Jens Axboe [EMAIL PROTECTED] Signed-off-by: Jens Axboe [EMAIL PROTECTED] --- drivers/ide/arm/icside.c |6 +++--- drivers/ide/cris/ide-cris.c |2 +- drivers/ide/ide-dma.c |8 drivers/ide/ide-io.c |2 +- drivers/ide/ide-probe.c |6 +- drivers/ide/ide-taskfile.c|2 +- drivers/ide/ide.c |2 +- drivers/ide/mips/au1xxx-ide.c |8 drivers/ide/pci/sgiioc4.c |4 ++-- drivers/ide/ppc/pmac.c|6 +++--- drivers/scsi/ide-scsi.c |2 +- include/linux/ide.h |2 +- 12 files changed, 23 insertions(+), 27 deletions(-) diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c index 93f71fc..a48a6bd 100644 --- a/drivers/ide/arm/icside.c +++ b/drivers/ide/arm/icside.c @@ -210,7 +210,7 @@ static void icside_build_sglist(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = drive-hwif; struct icside_state *state = hwif-hwif_data; - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; ide_map_sg(drive, rq); @@ -319,7 +319,7 @@ static int icside_dma_end(ide_drive_t *drive) disable_dma(ECARD_DEV(state-dev)-dma); /* Teardown mappings after DMA has completed. */ - dma_unmap_sg(state-dev, hwif-sg_table, hwif-sg_nents, + dma_unmap_sg(state-dev, hwif-sg_table.sgl, hwif-sg_nents, hwif-sg_dma_direction); return get_dma_residue(ECARD_DEV(state-dev)-dma) != 0; @@ -373,7 +373,7 @@ static int icside_dma_setup(ide_drive_t *drive) * Tell the DMA engine about the SG table and * data direction. 
*/ - set_dma_sg(ECARD_DEV(state-dev)-dma, hwif-sg_table, hwif-sg_nents); + set_dma_sg(ECARD_DEV(state-dev)-dma, hwif-sg_table.sgl, hwif-sg_nents); set_dma_mode(ECARD_DEV(state-dev)-dma, dma_mode); drive-waiting_for_dma = 1; diff --git a/drivers/ide/cris/ide-cris.c b/drivers/ide/cris/ide-cris.c index 476e0d6..3701aca 100644 --- a/drivers/ide/cris/ide-cris.c +++ b/drivers/ide/cris/ide-cris.c @@ -919,7 +919,7 @@ static int cris_ide_build_dmatable (ide_drive_t *drive) unsigned int count = 0; int i = 0; - sg = hwif-sg_table; + sg = hwif-sg_table.sgl; ata_tot_size = 0; diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c index 4703837..33a2f56 100644 --- a/drivers/ide/ide-dma.c +++ b/drivers/ide/ide-dma.c @@ -190,7 +190,7 @@ static int ide_dma_good_drive(ide_drive_t *drive) int ide_build_sglist(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = HWIF(drive); - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; BUG_ON((rq-cmd_type == REQ_TYPE_ATA_TASKFILE) rq-nr_sectors 256); @@ -234,7 +234,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq) if (!i) return 0; - sg = hwif-sg_table; + sg = hwif-sg_table.sgl; while (i) { u32 cur_addr; u32 cur_len; @@ -293,7 +293,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq) printk(KERN_ERR %s: empty DMA table?\n, drive-name); use_pio_instead: pci_unmap_sg(hwif-pci_dev, -hwif-sg_table, +hwif-sg_table.sgl, hwif-sg_nents, hwif-sg_dma_direction); return 0; /* revert to PIO for this request */ @@ -315,7 +315,7 @@ EXPORT_SYMBOL_GPL(ide_build_dmatable); void ide_destroy_dmatable (ide_drive_t *drive) { struct pci_dev *dev = HWIF(drive)-pci_dev; - struct scatterlist *sg = HWIF(drive)-sg_table; + struct scatterlist *sg = HWIF(drive)-sg_table.sgl; int nents = HWIF(drive)-sg_nents; pci_unmap_sg(dev, sg, nents, HWIF(drive)-sg_dma_direction); diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c index bef781f..638e2db 100644 --- a/drivers/ide/ide-io.c +++ 
b/drivers/ide/ide-io.c @@ -819,7 +819,7 @@ static ide_startstop_t do_special (ide_drive_t *drive) void ide_map_sg(ide_drive_t *drive, struct request *rq) { ide_hwif_t *hwif = drive-hwif; - struct scatterlist *sg = hwif-sg_table; + struct scatterlist *sg = hwif-sg_table.sgl; if (hwif-sg_mapped)/* needed by ide-scsi */ return; diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c index 2994523..770f8cf 100644 --- a/drivers/ide/ide-probe.c +++ b/drivers/ide/ide-probe.c @@ -1312,15 +1312,11 @@ static int hwif_init(ide_hwif_t *hwif) if (!hwif-sg_max_nents) hwif-sg_max_nents = PRD_ENTRIES; - hwif-sg_table = kmalloc(sizeof(struct scatterlist)*hwif-sg_max_nents, -GFP_KERNEL); - if (!hwif-sg_table) { + if (sg_alloc_table(hwif-sg_table, hwif-sg_max_nents, GFP_KERNEL)) { printk(KERN_ERR %s: unable to allocate SG
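The conversion above replaces a bare scatterlist pointer with a table object whose list is reached through its sgl member, and replaces the open-coded kmalloc() with an allocation call that reports failure through its return value. A rough userspace analogue of that pattern is sketched below. The struct and function names carry a _sketch suffix because they are stand-ins invented here, not the kernel's sg_table API; only the sgl and nents member names come from the patches.

```c
#include <stdlib.h>

/* Stand-in for the kernel's scatterlist entry; fields are illustrative. */
struct scatterlist_sketch {
	unsigned long addr;
	unsigned int length;
};

/* Stand-in table: the list plus its entry count travel together. */
struct sg_table_sketch {
	struct scatterlist_sketch *sgl;
	unsigned int nents;
};

/* Returns 0 on success, nonzero on failure, like the call in hwif_init(). */
static int sg_table_sketch_alloc(struct sg_table_sketch *table,
				 unsigned int nents)
{
	table->sgl = NULL;
	table->nents = 0;

	if (nents == 0)
		return -1;

	/* calloc() zeroes the entries, loosely mirroring sg_init_table(). */
	table->sgl = calloc(nents, sizeof(*table->sgl));
	if (!table->sgl)
		return -1;

	table->nents = nents;
	return 0;
}

static void sg_table_sketch_free(struct sg_table_sketch *table)
{
	free(table->sgl);
	table->sgl = NULL;
	table->nents = 0;
}
```

Callers then write table.sgl wherever they previously used the raw pointer, which is the mechanical change most hunks in this patch make.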
Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.
Quoting Pavel Emelyanov ([EMAIL PROTECTED]):
Oren Laadan wrote:
Serge E. Hallyn wrote:
Quoting Pavel Emelyanov ([EMAIL PROTECTED]):
Oren Laadan wrote:
Serge E. Hallyn wrote:
Quoting Oren Laadan ([EMAIL PROTECTED]):

I hate to bring this up again, but what if the admin in the container mounts an external file system (e.g. nfs, usb, loop mount from a file, or via fuse), and that file system already has a device that we would like to ban inside that container?

Miklos' user mount patches enforced that if !capable(CAP_MKNOD), then mnt->mnt_flags |= MNT_NODEV. So that's no problem.

Yes, that works to disallow all device files from a mounted file system. But it's a black and white thing: either they are all banned or allowed; you can't have some devices allowed and others not, depending on type. A scenario where this may be useful is, for instance, if we want some apps in the container to execute within a pre-made chroot (sub)tree within that container.

But that's been pulled out of -mm! ? Crap.

Since anyway we will have to keep a white- (or black-) list of devices that are permitted in a container, and that list may even change per container -- why not enforce the access control at the VFS layer? It's safer in the long run.

By that you mean more along the lines of Pavel's patch than my whitelist LSM, or you actually mean Tetsuo's filesystem (I assume you don't mean that by 'vfs layer' :), or something different entirely? :)

By 'vfs' I mean at open() time, and not at mount() or mknod() time. Either yours or Pavel's; I tend to prefer not to use LSM as it may collide with future security modules.

Oren, AFAIS you've seen my patches for device access controller, right?

If you mean this one: http://openvz.org/pipermail/devel/2007-September/007647.html then ack :)

Great! Thanks. Maybe we can revisit the issue then and try to come to agreement on what kind of model and implementation we all want?

That would be great, Pavel. I do prefer your solution over my LSM, so if we can get an elegant block device control right in the vfs code that would be my preference.

I concur. So it seems to me that we are all in favor of the model where open() of a device will consult a black/white-list. Also, we are all in favor of a non-LSM implementation, Pavel's code being a good example.

Thank you, Oren and Serge! I will revisit this issue then, but I have a vacation next week and, after this, we have the New Year and Christmas holidays in Russia. So I will be able to go on with it only after the 7th of January :( Hope this is OK for you. Besides, Andrew said that he would pay little attention to new features till the 2.6.24 release, so I'm afraid we won't have this even in -mm in the coming months :(

Thanks,
Pavel

Cool, let me know any way I can help when you get started.

thanks,
-serge
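The model the thread converges on, consulting a per-container list when a device node is opened rather than when it is mounted or created, can be sketched in a few lines of userspace C. Everything below is hypothetical: the struct layout, the names and the exact-match policy are assumptions made for illustration and are not taken from Pavel's patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical whitelist entry: device type plus major/minor numbers. */
enum dev_type { DEV_CHAR, DEV_BLOCK };

struct dev_rule {
	enum dev_type type;
	unsigned int dev_major;
	unsigned int dev_minor;
};

/* Hypothetical per-container state: just the list of permitted devices. */
struct container {
	const struct dev_rule *allowed;
	size_t nr_allowed;
};

/*
 * Decide at open() time whether this container may use the device.
 * How the device node came to exist (mknod, NFS mount, loop mount)
 * plays no part in the decision.
 */
static bool dev_open_allowed(const struct container *c, enum dev_type type,
			     unsigned int dev_major, unsigned int dev_minor)
{
	for (size_t i = 0; i < c->nr_allowed; i++) {
		const struct dev_rule *r = &c->allowed[i];

		if (r->type == type && r->dev_major == dev_major &&
		    r->dev_minor == dev_minor)
			return true;
	}

	return false;
}
```

Because the check keys on the device identity and not on the file system that carries the node, it also covers the externally mounted file system case raised at the start of the thread.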
[AppArmor 00/47] AppArmor security module overview
This submission of the AppArmor security module is based against 2.6.24-rc4-mm. Any comments and feedback to improve the implementation are appreciated.

Changes since previous submission
- added apparmor security goal document. Documentation/lsm/AppArmor-Security-Goal.txt
- removed DAC style permissions in favor of a simpler file owner permissions specification
- include the fgetattr and fsetattr patches by Miklos Szeredi [EMAIL PROTECTED], and update them to use ATTR_FILE to enable LSMs to distinguish file descriptor operations
- fix error where a NULL sock passed to socket_getsocket_getpeersec_dgram() was not correctly handled.
- fix error in link permission subset test

Outstanding Issues
- use of d_namespace_path and buffer allocation to obtain a pathname for mediation.
- conditional passing of the vfsmnt. This can be addressed by rebasing on the lookup intent patches but that has not been done for this submission.
- ipc and signal mediation are a wip and not included.
- fine grained network mediation
- system confinement from boot is a wip and not included.
- documentation needs to be updated to include newest features

The patch series consists of five areas:

(1) Pass struct vfsmount through to LSM hooks.

(2) Fixes and improvements to __d_path():
    (a) make it unambiguous and exclude unreachable paths from /proc/mounts,
    (b) make its result consistent in the face of remounts,
    (c) introduce d_namespace_path(), a variant of d_path that goes up to the namespace root instead of the chroot.
    (d) the behavior of d_path() and getcwd() remains unchanged, and there is no hiding of unreachable paths in /proc/mounts. The patches addressing these have been separated from the AppArmor submission and will be introduced at a later date.

    Part (a) has been in the -mm tree for a while; this series includes an updated copy of the -mm patch. Parts (b) and (c) shouldn't be too controversial.

(3) Be able to distinguish file descriptor access from access by name in LSM hooks. Applications expect different behavior from file descriptor accesses and accesses by name in some cases. We need to pass this information down the LSM hooks to allow AppArmor to tell which is which.

(4) Convert the selinux sysctl pathname computation code into a standalone function.

(5) The AppArmor LSM itself. (See below.)

A tarball of the kernel patches, base user-space utilities, example profiles, and technical documentation (including a walk-through) is available at: http://forgeftp.novell.com/apparmor/LKML_Submission-Dec-07/

Only the most recent features are covered in brief here; for a more complete explanation please refer to the technical documentation.

File ownership permissions

The DAC style permissions mask, which allowed permissions to be specified for each of user, group, and other, has been removed after further feedback and discussion, in favor of a simpler permission set that allows specifying permissions for file ownership as determined by fsuid.

Traditional AppArmor rules map to specifying permissions for all files; to reduce the permissions granted, the owner keyword can be added to a rule.

    /foo rw,          # allow access to file /foo
    owner /foo rw,    # allow access to file /foo only if its uid == fsuid
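The effect of the owner keyword described above can be illustrated with a small userspace sketch: a rule grants its permissions to everyone unless it is marked owner-only, in which case it grants them only when the file's uid equals the caller's fsuid. The struct, the permission bits and the function name are hypothetical and do not reflect AppArmor's internal data structures.

```c
#include <stdbool.h>

/* Illustrative permission bits, not AppArmor's actual encoding. */
#define PERM_READ  0x1u
#define PERM_WRITE 0x2u

/* Hypothetical in-memory form of one profile rule for a path. */
struct file_rule {
	unsigned int perms;	/* permissions the rule grants */
	bool owner_only;	/* true if the rule carries the owner keyword */
};

/*
 * Return the permissions a rule contributes for a task with the given
 * fsuid accessing a file owned by file_uid.
 */
static unsigned int rule_perms(const struct file_rule *rule,
			       unsigned int file_uid, unsigned int fsuid)
{
	if (rule->owner_only && file_uid != fsuid)
		return 0;

	return rule->perms;
}
```

Under this reading, "/foo rw," corresponds to owner_only being false and "owner /foo rw," to owner_only being true.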
[AppArmor 01/47] Pass struct vfsmount to the inode_create LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |    9 ++---
 security/dummy.c         |    2 +-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    3 ++-
 5 files changed, 13 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1586,7 +1586,7 @@ int vfs_create(struct inode *dir, struct
 		return -EACCES;	/* shouldn't it be ENOSYS? */
 	mode &= S_IALLUGO;
 	mode |= S_IFREG;
-	error = security_inode_create(dir, dentry, mode);
+	error = security_inode_create(dir, dentry, nd ? nd->path.mnt : NULL, mode);
 	if (error)
 		return error;
 	DQUOT_INIT(dir);
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -314,6 +314,7 @@ struct request_sock;
  *	Check permission to create a regular file.
  *	@dir contains inode structure of the parent of the new file.
  *	@dentry contains the dentry structure for the file to be created.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	@mode contains the file mode of the file to be created.
  *	Return 0 if permission is granted.
  * @inode_link:
@@ -1272,8 +1273,8 @@ struct security_operations {
 	void (*inode_free_security) (struct inode *inode);
 	int (*inode_init_security) (struct inode *inode, struct inode *dir,
 				    char **name, void **value, size_t *len);
-	int (*inode_create) (struct inode *dir,
-			     struct dentry *dentry, int mode);
+	int (*inode_create) (struct inode *dir, struct dentry *dentry,
+			     struct vfsmount *mnt, int mode);
 	int (*inode_link) (struct dentry *old_dentry,
 			   struct inode *dir, struct dentry *new_dentry);
 	int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
@@ -1536,7 +1537,8 @@ int security_inode_alloc(struct inode *i
 void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 				  char **name, void **value, size_t *len);
-int security_inode_create(struct inode *dir, struct dentry *dentry, int mode);
+int security_inode_create(struct inode *dir, struct dentry *dentry,
+			  struct vfsmount *mnt, int mode);
 int security_inode_link(struct dentry *old_dentry, struct inode *dir,
 			 struct dentry *new_dentry);
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
@@ -1847,6 +1849,7 @@ static inline int security_inode_init_se
 static inline int security_inode_create (struct inode *dir,
 					 struct dentry *dentry,
+					 struct vfsmount *mnt,
 					 int mode)
 {
 	return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -290,7 +290,7 @@ static int dummy_inode_init_security (st
 }
 static int dummy_inode_create (struct inode *inode, struct dentry *dentry,
-			       int mask)
+			       struct vfsmount *mnt, int mask)
 {
 	return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -348,11 +348,12 @@ int security_inode_init_security(struct
 }
 EXPORT_SYMBOL(security_inode_init_security);
-int security_inode_create(struct inode *dir, struct dentry *dentry, int mode)
+int security_inode_create(struct inode *dir, struct dentry *dentry,
+			  struct vfsmount *mnt, int mode)
 {
 	if (unlikely(IS_PRIVATE(dir)))
 		return 0;
-	return security_ops->inode_create(dir, dentry, mode);
+	return security_ops->inode_create(dir, dentry, mnt, mode);
 }
 int security_inode_link(struct dentry *old_dentry, struct inode *dir,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2411,7 +2411,8 @@ static int selinux_inode_init_security(s
 	return 0;
 }
-static int selinux_inode_create(struct inode *dir, struct dentry *dentry, int mask)
+static int selinux_inode_create(struct inode *dir, struct dentry *dentry,
+				struct vfsmount *mnt, int mask)
 {
 	return may_create(dir, dentry, SECCLASS_FILE);
 }
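To see why a path-based module wants the extra argument, the following userspace sketch mimics the dispatch in security_inode_create() with a hook that builds a pathname from the mount and the dentry name. The struct definitions are drastically simplified stand-ins for the kernel types, and the policy (deny creation under /etc/, allow when mnt is NULL) is an assumption chosen for illustration only; it is not AppArmor's behavior.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct inode { int is_private; };
struct dentry { const char *name; };
struct vfsmount { const char *mountpoint; };

struct security_operations {
	int (*inode_create)(struct inode *dir, struct dentry *dentry,
			    struct vfsmount *mnt, int mode);
};

/* Hypothetical path-based hook: it needs mnt to form a full pathname. */
static int sketch_inode_create(struct inode *dir, struct dentry *dentry,
			       struct vfsmount *mnt, int mode)
{
	char path[256];

	(void)dir;
	(void)mode;

	/* mnt may be NULL; this sketch simply allows such requests. */
	if (!mnt)
		return 0;

	snprintf(path, sizeof(path), "%s/%s", mnt->mountpoint, dentry->name);

	/* Assumed example policy: nothing may be created below /etc/. */
	return strncmp(path, "/etc/", 5) == 0 ? -EACCES : 0;
}

static struct security_operations sketch_ops = {
	.inode_create = sketch_inode_create,
};
static struct security_operations *security_ops = &sketch_ops;

/* Same shape as the patched wrapper: forward mnt to the active module. */
int security_inode_create(struct inode *dir, struct dentry *dentry,
			  struct vfsmount *mnt, int mode)
{
	if (dir->is_private)
		return 0;

	return security_ops->inode_create(dir, dentry, mnt, mode);
}
```

Without the vfsmount the hook would only see the dentry name and could not tell a file named passwd under /etc from one under /tmp.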
[AppArmor 02/47] Pass struct path down to remove_suid and children
Required by a later patch that adds a struct vfsmount parameter to notify_change(). Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ntfs/file.c |2 +- fs/reiser4/plugin/file/cryptcompress.c |2 +- fs/reiser4/plugin/file/file.c |2 +- fs/splice.c|4 ++-- fs/xfs/linux-2.6/xfs_lrw.c |2 +- include/linux/fs.h |4 ++-- mm/filemap.c | 16 mm/filemap_xip.c |2 +- 8 files changed, 17 insertions(+), 17 deletions(-) --- a/fs/ntfs/file.c +++ b/fs/ntfs/file.c @@ -2118,7 +2118,7 @@ static ssize_t ntfs_file_aio_write_noloc goto out; if (!count) goto out; - err = remove_suid(file-f_path.dentry); + err = remove_suid(file-f_path); if (err) goto out; file_update_time(file); --- a/fs/reiser4/plugin/file/cryptcompress.c +++ b/fs/reiser4/plugin/file/cryptcompress.c @@ -2828,7 +2828,7 @@ ssize_t write_cryptcompress(struct file goto out; if (unlikely(count == 0)) goto out; - result = remove_suid(file-f_dentry); + result = remove_suid(file-f_path); if (unlikely(result != 0)) goto out; /* remove_suid might create a transaction */ --- a/fs/reiser4/plugin/file/file.c +++ b/fs/reiser4/plugin/file/file.c @@ -2124,7 +2124,7 @@ ssize_t write_unix_file(struct file *fil return result; } - result = remove_suid(file-f_dentry); + result = remove_suid(file-f_path); if (result) { mutex_unlock(inode-i_mutex); context_set_commit_async(ctx); --- a/fs/splice.c +++ b/fs/splice.c @@ -781,7 +781,7 @@ generic_file_splice_write_nolock(struct ssize_t ret; int err; - err = remove_suid(out-f_path.dentry); + err = remove_suid(out-f_path); if (unlikely(err)) return err; @@ -841,7 +841,7 @@ generic_file_splice_write(struct pipe_in if (killpriv) err = security_inode_killpriv(out-f_path.dentry); if (!err killsuid) - err = __remove_suid(out-f_path.dentry, killsuid); + err = __remove_suid(out-f_path, killsuid); mutex_unlock(inode-i_mutex); if (err) return err; --- a/fs/xfs/linux-2.6/xfs_lrw.c +++ b/fs/xfs/linux-2.6/xfs_lrw.c @@ 
-723,7 +723,7 @@ start: !capable(CAP_FSETID)) { error = xfs_write_clear_setuid(xip); if (likely(!error)) - error = -remove_suid(file-f_path.dentry); + error = -remove_suid(file-f_path); if (unlikely(error)) { goto out_unlock_internal; } --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1757,9 +1757,9 @@ extern void iget_failed(struct inode *); extern void clear_inode(struct inode *); extern void destroy_inode(struct inode *); extern struct inode *new_inode(struct super_block *); -extern int __remove_suid(struct dentry *, int); +extern int __remove_suid(struct path *, int); extern int should_remove_suid(struct dentry *); -extern int remove_suid(struct dentry *); +extern int remove_suid(struct path *); extern void __insert_inode_hash(struct inode *, unsigned long hashval); extern void remove_inode_hash(struct inode *); --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1638,26 +1638,26 @@ int should_remove_suid(struct dentry *de } EXPORT_SYMBOL(should_remove_suid); -int __remove_suid(struct dentry *dentry, int kill) +int __remove_suid(struct path *path, int kill) { struct iattr newattrs; newattrs.ia_valid = ATTR_FORCE | kill; - return notify_change(dentry, newattrs); + return notify_change(path-dentry, newattrs); } -int remove_suid(struct dentry *dentry) +int remove_suid(struct path *path) { - int killsuid = should_remove_suid(dentry); - int killpriv = security_inode_need_killpriv(dentry); + int killsuid = should_remove_suid(path-dentry); + int killpriv = security_inode_need_killpriv(path-dentry); int error = 0; if (killpriv 0) return killpriv; if (killpriv) - error = security_inode_killpriv(dentry); + error = security_inode_killpriv(path-dentry); if (!error killsuid) - error = __remove_suid(dentry, killsuid); + error = __remove_suid(path, killsuid); return error; } @@ -2370,7 +2370,7 @@ __generic_file_aio_write_nolock(struct k if (count == 0) goto out; - err = remove_suid(file-f_path.dentry); + err = remove_suid(file-f_path);
[AppArmor 03/47] Add a vfsmount parameter to notify_change()
The vfsmount parameter must be set appropriately for files visibile outside the kernel. Files that are only used in a filesystem (e.g., reiserfs xattr files) will have a NULL vfsmount. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/attr.c |3 ++- fs/ecryptfs/inode.c |4 +++- fs/exec.c |3 ++- fs/fat/file.c |2 +- fs/hpfs/namei.c |2 +- fs/namei.c |2 +- fs/nfsd/vfs.c |8 fs/open.c | 29 - fs/reiserfs/xattr.c |6 +++--- fs/sysfs/file.c |2 +- fs/unionfs/copyup.c | 22 +++--- fs/unionfs/inode.c | 23 --- fs/unionfs/rename.c |3 ++- fs/unionfs/subr.c |3 ++- fs/unionfs/union.h |3 ++- fs/utimes.c | 19 +-- include/linux/fs.h |6 +++--- mm/filemap.c|2 +- mm/tiny-shmem.c |2 +- 19 files changed, 85 insertions(+), 59 deletions(-) --- a/fs/attr.c +++ b/fs/attr.c @@ -100,7 +100,8 @@ int inode_setattr(struct inode * inode, } EXPORT_SYMBOL(inode_setattr); -int notify_change(struct dentry * dentry, struct iattr * attr) +int notify_change(struct dentry *dentry, struct vfsmount *mnt, + struct iattr *attr) { struct inode *inode = dentry-d_inode; mode_t mode = inode-i_mode; --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -860,6 +860,7 @@ static int ecryptfs_setattr(struct dentr { int rc = 0; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct inode *inode; struct inode *lower_inode; struct ecryptfs_crypt_stat *crypt_stat; @@ -870,6 +871,7 @@ static int ecryptfs_setattr(struct dentr inode = dentry-d_inode; lower_inode = ecryptfs_inode_to_lower(inode); lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); mutex_lock(crypt_stat-cs_mutex); if (S_ISDIR(dentry-d_inode-i_mode)) crypt_stat-flags = ~(ECRYPTFS_ENCRYPTED); @@ -920,7 +922,7 @@ static int ecryptfs_setattr(struct dentr if (ia-ia_valid (ATTR_KILL_SUID | ATTR_KILL_SGID)) ia-ia_valid = ~ATTR_MODE; - rc = notify_change(lower_dentry, ia); + rc = notify_change(lower_dentry, 
lower_mnt, ia); out: fsstack_copy_attr_all(inode, lower_inode); return rc; --- a/fs/exec.c +++ b/fs/exec.c @@ -1786,7 +1786,8 @@ int do_coredump(long signr, int exit_cod goto close_fail; if (!file-f_op-write) goto close_fail; - if (!ispipe do_truncate(file-f_path.dentry, 0, 0, file) != 0) + if (!ispipe + do_truncate(file-f_path.dentry, file-f_path.mnt, 0, 0, file) != 0) goto close_fail; retval = binfmt-core_dump(signr, regs, file, core_limit); --- a/fs/fat/file.c +++ b/fs/fat/file.c @@ -92,7 +92,7 @@ int fat_generic_ioctl(struct inode *inod } /* This MUST be done before doing anything irreversible... */ - err = notify_change(filp-f_path.dentry, ia); + err = notify_change(filp-f_path.dentry, filp-f_path.mnt, ia); if (err) goto up; --- a/fs/hpfs/namei.c +++ b/fs/hpfs/namei.c @@ -426,7 +426,7 @@ again: /*printk(HPFS: truncating file before delete.\n);*/ newattrs.ia_size = 0; newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME; - err = notify_change(dentry, newattrs); + err = notify_change(dentry, NULL, newattrs); put_write_access(inode); if (!err) goto again; --- a/fs/namei.c +++ b/fs/namei.c @@ -1663,7 +1663,7 @@ int may_open(struct nameidata *nd, int a if (!error) { DQUOT_INIT(inode); - error = do_truncate(dentry, 0, + error = do_truncate(dentry, nd-path.mnt, 0, ATTR_MTIME|ATTR_CTIME|ATTR_OPEN, NULL); } --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -388,7 +388,7 @@ nfsd_setattr(struct svc_rqst *rqstp, str err = nfserr_notsync; if (!check_guard || guardtime == inode-i_ctime.tv_sec) { fh_lock(fhp); - host_err = notify_change(dentry, iap); + host_err = notify_change(dentry, fhp-fh_export-ex_path.mnt, iap); err = nfserrno(host_err); fh_unlock(fhp); } @@ -944,13 +944,13 @@ out: return err; } -static void kill_suid(struct dentry *dentry) +static void kill_suid(struct dentry *dentry, struct vfsmount *mnt) { struct iattria;
[AppArmor 05/47] Add struct vfsmount parameter to vfs_mkdir()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |5 - fs/namei.c|5 +++-- fs/nfsd/nfs4recover.c |3 ++- fs/nfsd/vfs.c |9 ++--- fs/unionfs/copyup.c |5 - fs/unionfs/inode.c|3 ++- fs/unionfs/sioq.c |2 +- fs/unionfs/sioq.h |1 + include/linux/fs.h|2 +- 9 files changed, 24 insertions(+), 11 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -499,11 +499,14 @@ static int ecryptfs_mkdir(struct inode * { int rc; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); lower_dir_dentry = lock_parent(lower_dentry); - rc = vfs_mkdir(lower_dir_dentry-d_inode, lower_dentry, mode); + rc = vfs_mkdir(lower_dir_dentry-d_inode, lower_dentry, lower_mnt, + mode); if (rc || !lower_dentry-d_inode) goto out; rc = ecryptfs_interpose(lower_dentry, dentry, dir-i_sb, 0); --- a/fs/namei.c +++ b/fs/namei.c @@ -2107,7 +2107,8 @@ asmlinkage long sys_mknod(const char __u return sys_mknodat(AT_FDCWD, filename, mode, dev); } -int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +int vfs_mkdir(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, + int mode) { int error = may_create(dir, dentry, NULL); @@ -2154,7 +2155,7 @@ asmlinkage long sys_mkdirat(int dfd, con error = mnt_want_write(nd.path.mnt); if (error) goto out_dput; - error = vfs_mkdir(nd.path.dentry-d_inode, dentry, mode); + error = vfs_mkdir(nd.path.dentry-d_inode, dentry, nd.path.mnt, mode); mnt_drop_write(nd.path.mnt); out_dput: dput(dentry); --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -158,7 +158,8 @@ nfsd4_create_clid_dir(struct nfs4_client status = mnt_want_write(rec_dir.path.mnt); if (status) goto out_put; - status = 
vfs_mkdir(rec_dir.path.dentry-d_inode, dentry, S_IRWXU); + status = vfs_mkdir(rec_dir.path.dentry-d_inode, dentry, + rec_dir.path.mnt, S_IRWXU); mnt_drop_write(rec_dir.path.mnt); out_put: dput(dentry); --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1166,6 +1166,7 @@ nfsd_create(struct svc_rqst *rqstp, stru int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild = NULL; + struct svc_export *exp; struct inode*dirp; __be32 err; int host_err; @@ -1182,6 +1183,7 @@ nfsd_create(struct svc_rqst *rqstp, stru goto out; dentry = fhp-fh_dentry; + exp = fhp-fh_export; dirp = dentry-d_inode; err = nfserr_notdir; @@ -1198,7 +1200,7 @@ nfsd_create(struct svc_rqst *rqstp, stru host_err = PTR_ERR(dchild); if (IS_ERR(dchild)) goto out_nfserr; - err = fh_compose(resfhp, fhp-fh_export, dchild, fhp); + err = fh_compose(resfhp, exp, dchild, fhp); if (err) goto out; } else { @@ -1237,7 +1239,8 @@ nfsd_create(struct svc_rqst *rqstp, stru host_err = vfs_create(dirp, dchild, iap-ia_mode, NULL); break; case S_IFDIR: - host_err = vfs_mkdir(dirp, dchild, iap-ia_mode); + host_err = vfs_mkdir(dirp, dchild, exp-ex_path.mnt, +iap-ia_mode); break; case S_IFCHR: case S_IFBLK: @@ -1256,7 +1259,7 @@ nfsd_create(struct svc_rqst *rqstp, stru if (host_err 0) goto out_nfserr; - if (EX_ISSYNC(fhp-fh_export)) { + if (EX_ISSYNC(exp)) { err = nfserrno(nfsd_sync_dir(dentry)); write_inode_now(dchild-d_inode, 1); } --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -165,6 +165,7 @@ out: */ static int __copyup_ndentry(struct dentry *old_lower_dentry, struct dentry *new_lower_dentry, + struct vfsmount *new_lower_mnt, struct dentry *new_lower_parent_dentry, char *symbuf) { @@ -175,6 +176,7 @@ static int __copyup_ndentry(struct dentr if (S_ISDIR(old_mode)) { args.mkdir.parent = new_lower_parent_dentry-d_inode; args.mkdir.dentry = new_lower_dentry; + args.mkdir.mnt = new_lower_mnt; args.mkdir.mode = old_mode; run_sioq(__unionfs_mkdir, args);
[AppArmor 06/47] Pass struct vfsmount to the inode_mkdir LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |    8 ++--
 security/dummy.c         |    2 +-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    3 ++-
 5 files changed, 13 insertions(+), 7 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2119,7 +2119,7 @@ int vfs_mkdir(struct inode *dir, struct
 		return -EPERM;
 	mode &= (S_IRWXUGO|S_ISVTX);
-	error = security_inode_mkdir(dir, dentry, mode);
+	error = security_inode_mkdir(dir, dentry, mnt, mode);
 	if (error)
 		return error;
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -339,6 +339,7 @@ struct request_sock;
  *	associated with inode strcture @dir.
  *	@dir containst the inode structure of parent of the directory to be created.
  *	@dentry contains the dentry structure of new directory.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	@mode contains the mode of new directory.
  *	Return 0 if permission is granted.
  * @inode_rmdir:
@@ -1281,7 +1282,8 @@ struct security_operations {
 	int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
 	int (*inode_symlink) (struct inode *dir,
 			      struct dentry *dentry, const char *old_name);
-	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, int mode);
+	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
+			    struct vfsmount *mnt, int mode);
 	int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
 	int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
 			    int mode, dev_t dev);
@@ -1546,7 +1548,8 @@ int security_inode_link(struct dentry *o
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
 			    const char *old_name);
-int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode);
+int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
+			 struct vfsmount *mnt, int mode);
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
 int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -1880,6 +1883,7 @@ static inline int security_inode_symlink
 static inline int security_inode_mkdir (struct inode *dir,
 					struct dentry *dentry,
+					struct vfsmount *mnt,
 					int mode)
 {
 	return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -313,7 +313,7 @@ static int dummy_inode_symlink (struct i
 }
 static int dummy_inode_mkdir (struct inode *inode, struct dentry *dentry,
-			      int mask)
+			      struct vfsmount *mnt, int mask)
 {
 	return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -379,11 +379,12 @@ int security_inode_symlink(struct inode
 	return security_ops->inode_symlink(dir, dentry, old_name);
 }
-int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
+			 struct vfsmount *mnt, int mode)
 {
 	if (unlikely(IS_PRIVATE(dir)))
 		return 0;
-	return security_ops->inode_mkdir(dir, dentry, mode);
+	return security_ops->inode_mkdir(dir, dentry, mnt, mode);
 }
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2442,7 +2442,8 @@ static int selinux_inode_symlink(struct
 	return may_create(dir, dentry, SECCLASS_LNK_FILE);
 }
-static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry, int mask)
+static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry,
+			       struct vfsmount *mnt, int mask)
 {
 	return may_create(dir, dentry, SECCLASS_DIR);
 }
[AppArmor 07/47] Add a struct vfsmount parameter to vfs_mknod()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |5 - fs/namei.c | 10 ++ fs/nfsd/vfs.c |3 ++- fs/unionfs/copyup.c |1 + fs/unionfs/inode.c |3 ++- fs/unionfs/sioq.c |2 +- fs/unionfs/sioq.h |1 + include/linux/fs.h |2 +- net/unix/af_unix.c |3 ++- 9 files changed, 20 insertions(+), 10 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -550,11 +550,14 @@ ecryptfs_mknod(struct inode *dir, struct { int rc; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); lower_dir_dentry = lock_parent(lower_dentry); - rc = vfs_mknod(lower_dir_dentry->d_inode, lower_dentry, mode, dev); + rc = vfs_mknod(lower_dir_dentry->d_inode, lower_dentry, lower_mnt, mode, + dev); if (rc || !lower_dentry->d_inode) goto out; rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0); --- a/fs/namei.c +++ b/fs/namei.c @@ -2005,7 +2005,8 @@ fail: } EXPORT_SYMBOL_GPL(lookup_create); -int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) +int vfs_mknod(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, + int mode, dev_t dev) { int error = may_create(dir, dentry, NULL); @@ -2082,12 +2083,13 @@ asmlinkage long sys_mknodat(int dfd, con mode, &nd); break; case S_IFCHR: case S_IFBLK: - error = vfs_mknod(nd.path.dentry->d_inode, dentry, mode, - new_decode_dev(dev)); + error = vfs_mknod(nd.path.dentry->d_inode, dentry, + nd.path.mnt, mode, + new_decode_dev(dev)); break; case S_IFIFO: case S_IFSOCK: error = vfs_mknod(nd.path.dentry->d_inode, dentry, - mode, 0); + nd.path.mnt, mode, 0); break; } mnt_drop_write(nd.path.mnt); --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1249,7 +1249,8 @@ nfsd_create(struct svc_rqst *rqstp, stru 
host_err = mnt_want_write(fhp->fh_export->ex_path.mnt); if (host_err) break; - host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev); + host_err = vfs_mknod(dirp, dchild, exp->ex_path.mnt, + iap->ia_mode, rdev); mnt_drop_write(fhp->fh_export->ex_path.mnt); break; default: --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -193,6 +193,7 @@ static int __copyup_ndentry(struct dentr S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) { args.mknod.parent = new_lower_parent_dentry->d_inode; args.mknod.dentry = new_lower_dentry; + args.mknod.mnt = new_lower_mnt; args.mknod.mode = old_mode; args.mknod.dev = old_lower_dentry->d_inode->i_rdev; --- a/fs/unionfs/inode.c +++ b/fs/unionfs/inode.c @@ -760,6 +760,7 @@ static int unionfs_mknod(struct inode *d continue; lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); + lower_mnt = unionfs_lower_mnt_idx(dentry, bindex); if (!lower_dentry) { lower_dentry = create_parents(dir, dentry, dentry->d_name.name, @@ -779,7 +780,7 @@ static int unionfs_mknod(struct inode *d } err = vfs_mknod(lower_parent_dentry->d_inode, - lower_dentry, mode, dev); + lower_dentry, lower_mnt, mode, dev); if (err) { unlock_dir(lower_parent_dentry); --- a/fs/unionfs/sioq.c +++ b/fs/unionfs/sioq.c @@ -78,7 +78,7 @@ void __unionfs_mknod(struct work_struct struct sioq_args *args = container_of(work, struct sioq_args, work); struct mknod_args *m = &args->mknod; - args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev); + args->err = vfs_mknod(m->parent, m->dentry, m->mnt, m->mode, m->dev); complete(&args->comp); } --- a/fs/unionfs/sioq.h +++ b/fs/unionfs/sioq.h @@ -42,6 +42,7 @@ struct mkdir_args { struct mknod_args { struct inode *parent; struct dentry *dentry; + struct vfsmount *mnt; umode_t mode;
[AppArmor 08/47] Pass struct vfsmount to the inode_mknod LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |2 +- include/linux/security.h |7 +-- security/dummy.c |2 +- security/security.c |5 +++-- security/selinux/hooks.c |5 +++-- 5 files changed, 13 insertions(+), 8 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2019,7 +2019,7 @@ int vfs_mknod(struct inode *dir, struct if (!dir->i_op || !dir->i_op->mknod) return -EPERM; - error = security_inode_mknod(dir, dentry, mode, dev); + error = security_inode_mknod(dir, dentry, mnt, mode, dev); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -354,6 +354,7 @@ struct request_sock; * and not this hook. * @dir contains the inode structure of parent of the new file. * @dentry contains the dentry structure of the new file. + * @mnt is the vfsmount corresponding to @dentry (may be NULL). * @mode contains the mode of the new file. * @dev contains the device number. * Return 0 if permission is granted. 
@@ -1286,7 +1287,7 @@ struct security_operations { struct vfsmount *mnt, int mode); int (*inode_rmdir) (struct inode *dir, struct dentry *dentry); int (*inode_mknod) (struct inode *dir, struct dentry *dentry, - int mode, dev_t dev); + struct vfsmount *mnt, int mode, dev_t dev); int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry); int (*inode_readlink) (struct dentry *dentry); @@ -1551,7 +1552,8 @@ int security_inode_symlink(struct inode int security_inode_mkdir(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); int security_inode_rmdir(struct inode *dir, struct dentry *dentry); -int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev); +int security_inode_mknod(struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, int mode, dev_t dev); int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry); int security_inode_readlink(struct dentry *dentry); @@ -1897,6 +1899,7 @@ static inline int security_inode_rmdir ( static inline int security_inode_mknod (struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, int mode, dev_t dev) { return 0; --- a/security/dummy.c +++ b/security/dummy.c @@ -324,7 +324,7 @@ static int dummy_inode_rmdir (struct ino } static int dummy_inode_mknod (struct inode *inode, struct dentry *dentry, - int mode, dev_t dev) + struct vfsmount *mnt, int mode, dev_t dev) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -394,11 +394,12 @@ int security_inode_rmdir(struct inode *d return security_ops->inode_rmdir(dir, dentry); } -int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) +int security_inode_mknod(struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, int mode, dev_t dev) { if (unlikely(IS_PRIVATE(dir))) return 0; - return security_ops->inode_mknod(dir, dentry, mode, dev); + 
return security_ops->inode_mknod(dir, dentry, mnt, mode, dev); } int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2453,11 +2453,12 @@ static int selinux_inode_rmdir(struct in return may_link(dir, dentry, MAY_RMDIR); } -static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) +static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, int mode, dev_t dev) { int rc; - rc = secondary_ops->inode_mknod(dir, dentry, mode, dev); + rc = secondary_ops->inode_mknod(dir, dentry, mnt, mode, dev); if (rc) return rc;
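To illustrate why a hook needs the vfsmount as well as the dentry to compute a pathname, here is a minimal user-space sketch. It is not kernel code: the `struct dentry` and `struct vfsmount` below are simplified stand-ins with only the fields the walk needs, and `model_path()` is a hypothetical helper invented for this illustration. With a mount, the walk can cross from a filesystem's root to the mountpoint in the parent mount; with a NULL mount it can only produce a path relative to the filesystem root.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration only, not the kernel's structures. */
struct dentry {
	const char *name;
	struct dentry *parent;		/* points to itself at a filesystem root */
};

struct vfsmount {
	struct dentry *root;		/* root dentry of the mounted filesystem */
	struct dentry *mountpoint;	/* dentry in the parent mount it sits on */
	struct vfsmount *parent;	/* NULL for the top-level mount */
};

/*
 * Hypothetical helper: build the path of (d, mnt) backwards into buf.
 * Returns a pointer into buf, or NULL if buf is too small.
 * If mnt is NULL the result is relative to the filesystem root only.
 */
static char *model_path(const struct dentry *d, const struct vfsmount *mnt,
			char *buf, size_t len)
{
	char *p = buf + len;

	if (len < 2)
		return NULL;
	*--p = '\0';

	for (;;) {
		size_t n;

		if (mnt && d == mnt->root) {
			if (!mnt->parent)
				break;		/* reached the top-level root */
			d = mnt->mountpoint;	/* cross into the parent mount */
			mnt = mnt->parent;
			continue;
		}
		if (d->parent == d)
			break;	/* filesystem root, no mount left to cross */

		n = strlen(d->name);
		if ((size_t)(p - buf) < n + 1)
			return NULL;	/* not enough room for "/" + name */
		p -= n;
		memcpy(p, d->name, n);
		*--p = '/';
		d = d->parent;
	}
	if (*p == '\0')
		*--p = '/';	/* the object itself is the root */
	return p;
}
```

In this model, a node `null` on a filesystem mounted at `/srv` resolves to `/srv/null` when the mount is supplied and only to `/null` when it is not, which is the ambiguity the extra parameter is meant to remove.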
[AppArmor 09/47] Add a struct vfsmount parameter to vfs_symlink()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |4 +++- fs/namei.c |6 -- fs/nfsd/vfs.c | 13 + fs/unionfs/copyup.c |1 + fs/unionfs/inode.c |4 +++- fs/unionfs/sioq.c |3 ++- fs/unionfs/sioq.h |1 + include/linux/fs.h |2 +- 8 files changed, 24 insertions(+), 10 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -460,6 +460,7 @@ static int ecryptfs_symlink(struct inode { int rc; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; umode_t mode; char *encoded_symname; @@ -468,6 +469,7 @@ static int ecryptfs_symlink(struct inode lower_dentry = ecryptfs_dentry_to_lower(dentry); dget(lower_dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); lower_dir_dentry = lock_parent(lower_dentry); mode = S_IALLUGO; encoded_symlen = ecryptfs_encode_filename(crypt_stat, symname, @@ -477,7 +479,7 @@ static int ecryptfs_symlink(struct inode rc = encoded_symlen; goto out_lock; } - rc = vfs_symlink(lower_dir_dentry->d_inode, lower_dentry, + rc = vfs_symlink(lower_dir_dentry->d_inode, lower_dentry, lower_mnt, encoded_symname, mode); kfree(encoded_symname); if (rc || !lower_dentry->d_inode) --- a/fs/namei.c +++ b/fs/namei.c @@ -2391,7 +2391,8 @@ asmlinkage long sys_unlink(const char __ return do_unlinkat(AT_FDCWD, pathname); } -int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode) +int vfs_symlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, + const char *oldname, int mode) { int error = may_create(dir, dentry, NULL); @@ -2440,7 +2441,8 @@ asmlinkage long sys_symlinkat(const char error = mnt_want_write(nd.path.mnt); if (error) goto out_dput; - error = vfs_symlink(nd.path.dentry->d_inode, dentry, from, S_IALLUGO); + error = vfs_symlink(nd.path.dentry->d_inode, dentry, nd.path.mnt, 
from, + S_IALLUGO); mnt_drop_write(nd.path.mnt); out_dput: dput(dentry); --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1495,6 +1495,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str struct iattr *iap) { struct dentry *dentry, *dnew; + struct svc_export *exp; __be32 err, cerr; int host_err; umode_t mode; @@ -1521,6 +1522,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str if (iap && (iap->ia_valid & ATTR_MODE)) mode = iap->ia_mode & S_IALLUGO; + exp = fhp->fh_export; if (unlikely(path[plen] != 0)) { char *path_alloced = kmalloc(plen+1, GFP_KERNEL); if (path_alloced == NULL) @@ -1528,20 +1530,23 @@ nfsd_symlink(struct svc_rqst *rqstp, str else { strncpy(path_alloced, path, plen); path_alloced[plen] = 0; - host_err = vfs_symlink(dentry->d_inode, dnew, path_alloced, mode); + host_err = vfs_symlink(dentry->d_inode, dnew, + exp->ex_path.mnt, path_alloced, + mode); kfree(path_alloced); } } else - host_err = vfs_symlink(dentry->d_inode, dnew, path, mode); + host_err = vfs_symlink(dentry->d_inode, dnew, exp->ex_path.mnt, + path, mode); if (!host_err) { - if (EX_ISSYNC(fhp->fh_export)) + if (EX_ISSYNC(exp)) host_err = nfsd_sync_dir(dentry); } err = nfserrno(host_err); fh_unlock(fhp); - cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); + cerr = fh_compose(resfhp, exp, dnew, fhp); dput(dnew); if (err==0) err = cerr; out: --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -184,6 +184,7 @@ static int __copyup_ndentry(struct dentr } else if (S_ISLNK(old_mode)) { args.symlink.parent = new_lower_parent_dentry->d_inode; args.symlink.dentry = new_lower_dentry; + args.symlink.mnt = new_lower_mnt; args.symlink.symbuf = symbuf; args.symlink.mode = old_mode; --- a/fs/unionfs/inode.c +++ b/fs/unionfs/inode.c @@ -452,6 +452,7 @@ static int unionfs_symlink(struct inode */ for (bindex = bstart; bindex >= 0; bindex--) { lower_dentry = unionfs_lower_dentry_idx(dentry, bindex); +
[AppArmor 10/47] Pass struct vfsmount to the inode_symlink LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |2 +- include/linux/security.h |8 +--- security/dummy.c |2 +- security/security.c |4 ++-- security/selinux/hooks.c |3 ++- 5 files changed, 11 insertions(+), 8 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2402,7 +2402,7 @@ int vfs_symlink(struct inode *dir, struc if (!dir->i_op || !dir->i_op->symlink) return -EPERM; - error = security_inode_symlink(dir, dentry, oldname); + error = security_inode_symlink(dir, dentry, mnt, oldname); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -332,6 +332,7 @@ struct request_sock; * Check the permission to create a symbolic link to a file. * @dir contains the inode structure of parent directory of the symbolic link. * @dentry contains the dentry structure of the symbolic link. + * @mnt is the vfsmount corresponding to @dentry (may be NULL). * @old_name contains the pathname of file. * Return 0 if permission is granted. 
* @inode_mkdir: @@ -1281,8 +1282,8 @@ struct security_operations { int (*inode_link) (struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry); int (*inode_unlink) (struct inode *dir, struct dentry *dentry); - int (*inode_symlink) (struct inode *dir, - struct dentry *dentry, const char *old_name); + int (*inode_symlink) (struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, const char *old_name); int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); int (*inode_rmdir) (struct inode *dir, struct dentry *dentry); @@ -1548,7 +1549,7 @@ int security_inode_link(struct dentry *o struct dentry *new_dentry); int security_inode_unlink(struct inode *dir, struct dentry *dentry); int security_inode_symlink(struct inode *dir, struct dentry *dentry, - const char *old_name); + struct vfsmount *mnt, const char *old_name); int security_inode_mkdir(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); int security_inode_rmdir(struct inode *dir, struct dentry *dentry); @@ -1878,6 +1879,7 @@ static inline int security_inode_unlink static inline int security_inode_symlink (struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, const char *old_name) { return 0; --- a/security/dummy.c +++ b/security/dummy.c @@ -307,7 +307,7 @@ static int dummy_inode_unlink (struct in } static int dummy_inode_symlink (struct inode *inode, struct dentry *dentry, - const char *name) + struct vfsmount *mnt, const char *name) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -372,11 +372,11 @@ int security_inode_unlink(struct inode * } int security_inode_symlink(struct inode *dir, struct dentry *dentry, - const char *old_name) + struct vfsmount *mnt, const char *old_name) { if (unlikely(IS_PRIVATE(dir))) return 0; - return security_ops->inode_symlink(dir, dentry, old_name); + return security_ops->inode_symlink(dir, dentry, mnt, old_name); } int security_inode_mkdir(struct inode *dir, struct 
dentry *dentry, --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2437,7 +2437,8 @@ static int selinux_inode_unlink(struct i return may_link(dir, dentry, MAY_UNLINK); } -static int selinux_inode_symlink(struct inode *dir, struct dentry *dentry, const char *name) +static int selinux_inode_symlink(struct inode *dir, struct dentry *dentry, + struct vfsmount *mnt, const char *name) { return may_create(dir, dentry, SECCLASS_LNK_FILE); }
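The dispatch pattern these hook patches extend (a table of function pointers, a no-op default, and a wrapper that short-circuits private inodes) can be modelled in user space roughly as follows. This is a sketch under assumptions, not the kernel's implementation: the three structs are reduced to single illustrative fields, `is_private` stands in for the `IS_PRIVATE()` check, and `demo_inode_symlink()` with its "locked" device name is a hypothetical policy made up for the example. It also shows a hook that tolerates the documented "may be NULL" case for `mnt`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for illustration; the real structures are far larger. */
struct inode { int is_private; };
struct dentry { const char *name; };
struct vfsmount { const char *devname; };

struct security_operations {
	int (*inode_symlink)(struct inode *dir, struct dentry *dentry,
			     struct vfsmount *mnt, const char *old_name);
};

/* Default hook: grants everything, like the dummy module in the patch. */
static int dummy_inode_symlink(struct inode *dir, struct dentry *dentry,
			       struct vfsmount *mnt, const char *old_name)
{
	(void)dir; (void)dentry; (void)mnt; (void)old_name;
	return 0;
}

/* Hypothetical policy: refuse symlinks on a mount named "locked". */
static int demo_inode_symlink(struct inode *dir, struct dentry *dentry,
			      struct vfsmount *mnt, const char *old_name)
{
	(void)dir; (void)dentry; (void)old_name;
	if (!mnt)
		return 0;	/* mnt may be NULL; nothing to decide on */
	return strcmp(mnt->devname, "locked") == 0 ? -EACCES : 0;
}

static struct security_operations dummy_ops = {
	.inode_symlink = dummy_inode_symlink,
};
static struct security_operations demo_ops = {
	.inode_symlink = demo_inode_symlink,
};
static struct security_operations *security_ops = &dummy_ops;

/* Wrapper in the shape of security/security.c: skip private inodes. */
static int security_inode_symlink(struct inode *dir, struct dentry *dentry,
				  struct vfsmount *mnt, const char *old_name)
{
	if (dir->is_private)
		return 0;
	return security_ops->inode_symlink(dir, dentry, mnt, old_name);
}
```

Because every implementation shares one function-pointer type, adding the `mnt` parameter forces the table, the wrapper, the default and each module to change together, which is why each hook patch in the series touches the same five files.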
[AppArmor 11/47] Pass struct vfsmount to the inode_readlink LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/stat.c|3 ++- include/linux/security.h |8 +--- security/dummy.c |2 +- security/security.c |4 ++-- security/selinux/hooks.c |2 +- 5 files changed, 11 insertions(+), 8 deletions(-) --- a/fs/stat.c +++ b/fs/stat.c @@ -306,7 +306,8 @@ asmlinkage long sys_readlinkat(int dfd, error = -EINVAL; if (inode->i_op && inode->i_op->readlink) { - error = security_inode_readlink(nd.path.dentry); + error = security_inode_readlink(nd.path.dentry, + nd.path.mnt); if (!error) { touch_atime(nd.path.mnt, nd.path.dentry); error = inode->i_op->readlink(nd.path.dentry, --- a/include/linux/security.h +++ b/include/linux/security.h @@ -369,6 +369,7 @@ struct request_sock; * @inode_readlink: * Check the permission to read the symbolic link. * @dentry contains the dentry structure for the file link. + * @mnt is the vfsmount corresponding to @dentry (may be NULL). * Return 0 if permission is granted. * @inode_follow_link: * Check permission to follow a symbolic link when looking up a pathname. 
@@ -1291,7 +1292,7 @@ struct security_operations { struct vfsmount *mnt, int mode, dev_t dev); int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry); - int (*inode_readlink) (struct dentry *dentry); + int (*inode_readlink) (struct dentry *dentry, struct vfsmount *mnt); int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd); int (*inode_permission) (struct inode *inode, int mask, struct nameidata *nd); int (*inode_setattr) (struct dentry *dentry, struct vfsmount *mnt, @@ -1557,7 +1558,7 @@ int security_inode_mknod(struct inode *d struct vfsmount *mnt, int mode, dev_t dev); int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry); -int security_inode_readlink(struct dentry *dentry); +int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt); int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd); int security_inode_permission(struct inode *inode, int mask, struct nameidata *nd); int security_inode_setattr(struct dentry *dentry, struct vfsmount *mnt, @@ -1915,7 +1916,8 @@ static inline int security_inode_rename return 0; } -static inline int security_inode_readlink (struct dentry *dentry) +static inline int security_inode_readlink(struct dentry *dentry, + struct vfsmount *mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -337,7 +337,7 @@ static int dummy_inode_rename (struct in return 0; } -static int dummy_inode_readlink (struct dentry *dentry) +static int dummy_inode_readlink (struct dentry *dentry, struct vfsmount *mnt) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -412,11 +412,11 @@ int security_inode_rename(struct inode * new_dir, new_dentry); } -int security_inode_readlink(struct dentry *dentry) +int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt) { if (unlikely(IS_PRIVATE(dentry->d_inode))) return 0; - return 
security_ops->inode_readlink(dentry); + return security_ops->inode_readlink(dentry, mnt); } int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd) --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2472,7 +2472,7 @@ static int selinux_inode_rename(struct i return may_rename(old_inode, old_dentry, new_inode, new_dentry); } -static int selinux_inode_readlink(struct dentry *dentry) +static int selinux_inode_readlink(struct dentry *dentry, struct vfsmount *mnt) { return dentry_has_perm(current, NULL, dentry, FILE__READ); }
[AppArmor 12/47] Add struct vfsmount parameters to vfs_link()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |9 +++-- fs/namei.c |5 +++-- fs/nfsd/vfs.c |3 ++- fs/unionfs/inode.c | 14 +++--- include/linux/fs.h |2 +- 5 files changed, 24 insertions(+), 9 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -401,19 +401,24 @@ static int ecryptfs_link(struct dentry * struct dentry *new_dentry) { struct dentry *lower_old_dentry; + struct vfsmount *lower_old_mnt; struct dentry *lower_new_dentry; + struct vfsmount *lower_new_mnt; struct dentry *lower_dir_dentry; u64 file_size_save; int rc; file_size_save = i_size_read(old_dentry->d_inode); lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry); + lower_old_mnt = ecryptfs_dentry_to_lower_mnt(old_dentry); lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry); + lower_new_mnt = ecryptfs_dentry_to_lower_mnt(new_dentry); dget(lower_old_dentry); dget(lower_new_dentry); lower_dir_dentry = lock_parent(lower_new_dentry); - rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode, - lower_new_dentry); + rc = vfs_link(lower_old_dentry, lower_old_mnt, + lower_dir_dentry->d_inode, lower_new_dentry, + lower_new_mnt); if (rc || !lower_new_dentry->d_inode) goto out_lock; rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb, 0); --- a/fs/namei.c +++ b/fs/namei.c @@ -2461,7 +2461,7 @@ asmlinkage long sys_symlink(const char _ return sys_symlinkat(oldname, AT_FDCWD, newname); } -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry) +int vfs_link(struct dentry *old_dentry, struct vfsmount *old_mnt, struct inode *dir, struct dentry *new_dentry, struct vfsmount *new_mnt) { struct inode *inode = old_dentry->d_inode; int error; @@ -2542,7 +2542,8 @@ asmlinkage long sys_linkat(int olddfd, c error = mnt_want_write(nd.path.mnt); if (error) goto out_dput; 
- error = vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode, new_dentry); + error = vfs_link(old_nd.path.dentry, old_nd.path.mnt, + nd.path.dentry->d_inode, new_dentry, nd.path.mnt); mnt_drop_write(nd.path.mnt); out_dput: dput(new_dentry); --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1596,7 +1596,8 @@ nfsd_link(struct svc_rqst *rqstp, struct dold = tfhp->fh_dentry; dest = dold->d_inode; - host_err = vfs_link(dold, dirp, dnew); + host_err = vfs_link(dold, tfhp->fh_export->ex_path.mnt, dirp, + dnew, ffhp->fh_export->ex_path.mnt); if (!host_err) { if (EX_ISSYNC(ffhp->fh_export)) { err = nfserrno(nfsd_sync_dir(ddir)); --- a/fs/unionfs/inode.c +++ b/fs/unionfs/inode.c @@ -222,6 +222,7 @@ static int unionfs_link(struct dentry *o { int err = 0; struct dentry *lower_old_dentry = NULL; + struct vfsmount *lower_old_mnt = NULL; struct dentry *lower_new_dentry = NULL; struct vfsmount *lower_new_mnt = NULL; struct dentry *lower_dir_dentry = NULL; @@ -293,14 +294,17 @@ static int unionfs_link(struct dentry *o goto out; } lower_new_dentry = unionfs_lower_dentry(new_dentry); + lower_new_mnt = unionfs_lower_mnt(new_dentry); lower_old_dentry = unionfs_lower_dentry(old_dentry); + lower_old_mnt = unionfs_lower_mnt(old_dentry); BUG_ON(dbstart(old_dentry) != dbstart(new_dentry)); lower_dir_dentry = lock_parent(lower_new_dentry); err = is_robranch(old_dentry); if (!err) - err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode, - lower_new_dentry); + err = vfs_link(lower_old_dentry, lower_old_mnt, + lower_dir_dentry->d_inode, + lower_new_dentry, lower_new_mnt); unlock_dir(lower_dir_dentry); docopyup: @@ -321,12 +325,16 @@ docopyup: bindex, lower_new_mnt); lower_old_dentry = unionfs_lower_dentry(old_dentry); + lower_old_mnt = + unionfs_lower_mnt(old_dentry); lower_dir_dentry = lock_parent(lower_new_dentry); /* do vfs_link */ err = vfs_link(lower_old_dentry, +
[AppArmor 13/47] Pass the struct vfsmounts to the inode_link LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/namei.c |3 ++- include/linux/security.h | 16 +++- security/dummy.c |6 -- security/security.c |8 +--- security/selinux/hooks.c |9 +++-- 5 files changed, 29 insertions(+), 13 deletions(-) --- a/fs/namei.c +++ b/fs/namei.c @@ -2486,7 +2486,8 @@ int vfs_link(struct dentry *old_dentry, if (S_ISDIR(old_dentry->d_inode->i_mode)) return -EPERM; - error = security_inode_link(old_dentry, dir, new_dentry); + error = security_inode_link(old_dentry, old_mnt, dir, new_dentry, + new_mnt); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -320,8 +320,10 @@ struct request_sock; * @inode_link: * Check permission before creating a new hard link to a file. * @old_dentry contains the dentry structure for an existing link to the file. + * @old_mnt is the vfsmount corresponding to @old_dentry (may be NULL). * @dir contains the inode structure of the parent directory of the new link. * @new_dentry contains the dentry structure for the new link. + * @new_mnt is the vfsmount corresponding to @new_dentry (may be NULL). * Return 0 if permission is granted. * @inode_unlink: * Check the permission to remove a hard link to a file. 
@@ -1280,8 +1282,9 @@ struct security_operations { char **name, void **value, size_t *len); int (*inode_create) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); - int (*inode_link) (struct dentry *old_dentry, - struct inode *dir, struct dentry *new_dentry); + int (*inode_link) (struct dentry *old_dentry, struct vfsmount *old_mnt, + struct inode *dir, struct dentry *new_dentry, + struct vfsmount *new_mnt); int (*inode_unlink) (struct inode *dir, struct dentry *dentry); int (*inode_symlink) (struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, const char *old_name); @@ -1546,8 +1549,9 @@ int security_inode_init_security(struct char **name, void **value, size_t *len); int security_inode_create(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, int mode); -int security_inode_link(struct dentry *old_dentry, struct inode *dir, -struct dentry *new_dentry); +int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt, + struct inode *dir, struct dentry *new_dentry, + struct vfsmount *new_mnt); int security_inode_unlink(struct inode *dir, struct dentry *dentry); int security_inode_symlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt, const char *old_name); @@ -1866,8 +1870,10 @@ static inline int security_inode_create } static inline int security_inode_link (struct dentry *old_dentry, + struct vfsmount *old_mnt, struct inode *dir, - struct dentry *new_dentry) + struct dentry *new_dentry, + struct vfsmount *new_mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -295,8 +295,10 @@ static int dummy_inode_create (struct in return 0; } -static int dummy_inode_link (struct dentry *old_dentry, struct inode *inode, -struct dentry *new_dentry) +static int dummy_inode_link (struct dentry *old_dentry, +struct vfsmount *old_mnt, struct inode *inode, +struct dentry *new_dentry, +struct vfsmount *new_mnt) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -356,12 +356,14 
@@ int security_inode_create(struct inode * return security_ops->inode_create(dir, dentry, mnt, mode); } -int security_inode_link(struct dentry *old_dentry, struct inode *dir, - struct dentry *new_dentry) +int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt, + struct inode *dir, struct dentry *new_dentry, + struct vfsmount *new_mnt) { if (unlikely(IS_PRIVATE(old_dentry->d_inode))) return 0; - return security_ops->inode_link(old_dentry, dir, new_dentry); + return security_ops->inode_link(old_dentry, old_mnt, dir, +
Re: [rfc][patch] mm: madvise(WILLNEED) for anonymous memory
On Thu, 20 Dec 2007, Peter Zijlstra wrote:
> Lennart asked for madvise(WILLNEED) to work on anonymous pages, he plans
> to use this to pre-fault pages. He currently uses: mlock/munlock for
> this purpose.

I certainly agree with this in principle: it just seems an unnecessary
and surprising restriction to refuse on anonymous vmas; I guess the only
reason for not adding this was not having anyone asking for it until now.
Though, does Lennart realize he could use MAP_POPULATE in the mmap?

> [ compile tested only ]

I haven't tried it either, but generally it looks plausible.

> Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
> ---
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 93ee375..eff60ce 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -100,6 +100,24 @@ out:
>  	return error;
>  }
>
> +static long madvice_willneed_anon(struct vm_area_struct *vma,
> +		struct vm_area_struct **prev,
> +		unsigned long start, unsigned long end)

madvise.c uses madvise_ rather than madvice_ throughout, so please go
with the flow.

> +{
> +	int ret, len;
> +
> +	*prev = vma;
> +	if (end > vma->vm_end)
> +		end = vma->vm_end;

Please check, but I think the upper level ensures end is within range.

> +
> +	len = end - start;
> +	ret = get_user_pages(current, current->mm, start, len,
> +			0, 0, NULL, NULL);
> +	if (ret < 0)
> +		return ret;
> +	return ret == len ? 0 : -1;

It's not good to return -1 as an alternative to a real errno: it'll look
like -EPERM. If you copied that from somewhere, better send a patch to
fix the somewhere! Ah, yes, make_pages_present: it happens that nobody
is interested in its return value, so we could make it a void; but that'd
just be a cleanup.

What to do here if non-negative ret less than len? Oh, just return 0,
that's good enough in this case (the file case always returns 0).

Hmm, might it be better to use make_pages_present itself, fixing its
retval, rather than using get_user_pages directly? (I'd hope the caching
makes its repeat of find_vma not an overhead.)

Interesting divergence: make_pages_present faults in writable pages in a
writable vma, whereas the file case's force_page_cache_readahead doesn't
even insert the pages into the mm.

> +}
> +
>  /*
>   * Schedule all required I/O operations. Do not wait for completion.
>   */
> @@ -110,7 +128,7 @@ static long madvise_willneed(struct vm_area_struct * vma,
>  	struct file *file = vma->vm_file;
>
>  	if (!file)
> -		return -EBADF;
> +		return madvice_willneed_anon(vma, prev, start, end);
>
>  	if (file->f_mapping->a_ops->get_xip_page) {
>  		/* no bad return value, but ignore advice */

And there's a correctly invisible hunk to the patch too: this extension
of MADV_WILLNEED also does not require down_write of mmap_sem, so
madvise_need_mmap_write can remain unchanged.

Hugh
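For reference, the two user-space options mentioned in this thread can be sketched as below. This is an illustration, not code from the thread: `map_prefaulted()` and `advise_willneed()` are hypothetical helper names, and the sketch assumes a Linux system where `MAP_POPULATE` is available. The second helper returns a negative errno rather than a bare -1, in the spirit of the review comment, so a caller can tell apart a kernel that still refuses anonymous mappings with `EBADF`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and ask the
 * kernel to populate the pages at mmap() time.
 * Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: issue MADV_WILLNEED on an existing mapping.
 * Returns 0 on success or a negative errno value, never a bare -1.
 */
static int advise_willneed(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_WILLNEED) == 0)
		return 0;
	return -errno;
}
```

A caller that wants to work on kernels both with and without the proposed change would presumably treat `-EBADF` from `advise_willneed()` as "advice not supported for this mapping" and carry on.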
[AppArmor 14/47] Add a struct vfsmount parameter to vfs_rmdir()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |4 +++- fs/namei.c|4 ++-- fs/nfsd/nfs4recover.c |2 +- fs/nfsd/vfs.c |8 +--- fs/reiserfs/xattr.c |2 +- fs/unionfs/unlink.c |5 - include/linux/fs.h|2 +- 7 files changed, 17 insertions(+), 10 deletions(-) --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -532,14 +532,16 @@ out: static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry) { struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry; int rc; lower_dentry = ecryptfs_dentry_to_lower(dentry); + lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry); dget(dentry); lower_dir_dentry = lock_parent(lower_dentry); dget(lower_dentry); - rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry); + rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry, lower_mnt); dput(lower_dentry); if (!rc) d_delete(lower_dentry); --- a/fs/namei.c +++ b/fs/namei.c @@ -2202,7 +2202,7 @@ void dentry_unhash(struct dentry *dentry spin_unlock(&dcache_lock); } -int vfs_rmdir(struct inode *dir, struct dentry *dentry) +int vfs_rmdir(struct inode *dir, struct dentry *dentry,struct vfsmount *mnt) { int error = may_delete(dir, dentry, 1); @@ -2269,7 +2269,7 @@ static long do_rmdir(int dfd, const char error = mnt_want_write(nd.path.mnt); if (error) goto exit3; - error = vfs_rmdir(nd.path.dentry->d_inode, dentry); + error = vfs_rmdir(nd.path.dentry->d_inode, dentry, nd.path.mnt); mnt_drop_write(nd.path.mnt); exit3: dput(dentry); --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -279,7 +279,7 @@ nfsd4_clear_clid_dir(struct dentry *dir, * a kernel from the future */ nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file); mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT); - status = vfs_rmdir(dir->d_inode, dentry); + status = vfs_rmdir(dir->d_inode, dentry, 
rec_dir.path.mnt); mutex_unlock(&dir->d_inode->i_mutex); return status; } --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1734,6 +1734,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru char *fname, int flen) { struct dentry *dentry, *rdentry; + struct svc_export *exp; struct inode *dirp; __be32 err; int host_err; @@ -1748,6 +1749,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru fh_lock_nested(fhp, I_MUTEX_PARENT); dentry = fhp->fh_dentry; dirp = dentry->d_inode; + exp = fhp->fh_export; rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); @@ -1765,21 +1767,21 @@ nfsd_unlink(struct svc_rqst *rqstp, stru if (type != S_IFDIR) { /* It's UNLINK */ #ifdef MSNFS - if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) && + if ((exp->ex_flags & NFSEXP_MSNFS) && (atomic_read(&rdentry->d_count) > 1)) { host_err = -EPERM; } else #endif host_err = vfs_unlink(dirp, rdentry); } else { /* It's RMDIR */ - host_err = vfs_rmdir(dirp, rdentry); + host_err = vfs_rmdir(dirp, rdentry, exp->ex_path.mnt); } dput(rdentry); if (host_err) goto out_nfserr; - if (EX_ISSYNC(fhp->fh_export)) + if (EX_ISSYNC(exp)) host_err = nfsd_sync_dir(dentry); out_nfserr: --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -747,7 +747,7 @@ int reiserfs_delete_xattrs(struct inode if (dir->d_inode->i_nlink <= 2) { root = get_xa_root(inode->i_sb, XATTR_REPLACE); reiserfs_write_lock_xattrs(inode->i_sb); - err = vfs_rmdir(root->d_inode, dir); + err = vfs_rmdir(root->d_inode, dir, NULL); reiserfs_write_unlock_xattrs(inode->i_sb); dput(root); } else { --- a/fs/unionfs/unlink.c +++ b/fs/unionfs/unlink.c @@ -125,6 +125,7 @@ static int unionfs_rmdir_first(struct in { int err; struct dentry *lower_dentry; + struct vfsmount *lower_mnt; struct dentry *lower_dir_dentry = NULL; /* Here we need to remove whiteout entries. 
*/ @@ -133,6 +134,7 @@ static int unionfs_rmdir_first(struct in goto out; lower_dentry = unionfs_lower_dentry(dentry); + lower_mnt = unionfs_lower_mnt(dentry); lower_dir_dentry = lock_parent(lower_dentry); @@ -140,7 +142,8 @@ static int unionfs_rmdir_first(struct in dget(lower_dentry); err =
[AppArmor 15/47] Pass struct vfsmount to the inode_rmdir LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |   10 +++---
 security/dummy.c         |    3 ++-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    3 ++-
 5 files changed, 15 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2219,7 +2219,7 @@ int vfs_rmdir(struct inode *dir, struct
 	if (d_mountpoint(dentry))
 		error = -EBUSY;
 	else {
-		error = security_inode_rmdir(dir, dentry);
+		error = security_inode_rmdir(dir, dentry, mnt);
 		if (!error) {
 			error = dir->i_op->rmdir(dir, dentry);
 			if (!error)
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -349,6 +349,7 @@ struct request_sock;
  *	Check the permission to remove a directory.
  *	@dir contains the inode structure of parent of the directory to be removed.
  *	@dentry contains the dentry structure of directory to be removed.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	Return 0 if permission is granted.
  * @inode_mknod:
  *	Check permissions when creating a special file (or a socket or a fifo
@@ -1290,7 +1291,8 @@ struct security_operations {
 			      struct vfsmount *mnt, const char *old_name);
 	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
 			    struct vfsmount *mnt, int mode);
-	int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
+	int (*inode_rmdir) (struct inode *dir, struct dentry *dentry,
+			    struct vfsmount *mnt);
 	int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
 			    struct vfsmount *mnt, int mode, dev_t dev);
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
@@ -1557,7 +1559,8 @@ int security_inode_symlink(struct inode
 			   struct vfsmount *mnt, const char *old_name);
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
 			 struct vfsmount *mnt, int mode);
-int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
+int security_inode_rmdir(struct inode *dir, struct dentry *dentry,
+			 struct vfsmount *mnt);
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
 			 struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -1901,7 +1904,8 @@ static inline int security_inode_mkdir (
 }
 
 static inline int security_inode_rmdir (struct inode *dir,
-					struct dentry *dentry)
+					struct dentry *dentry,
+					struct vfsmount *mnt)
 {
 	return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -320,7 +320,8 @@ static int dummy_inode_mkdir (struct ino
 	return 0;
 }
 
-static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry)
+static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry,
+			      struct vfsmount *mnt)
 {
 	return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -389,11 +389,12 @@ int security_inode_mkdir(struct inode *d
 	return security_ops->inode_mkdir(dir, dentry, mnt, mode);
 }
 
-int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
+int security_inode_rmdir(struct inode *dir, struct dentry *dentry,
+			 struct vfsmount *mnt)
 {
 	if (unlikely(IS_PRIVATE(dentry->d_inode)))
 		return 0;
-	return security_ops->inode_rmdir(dir, dentry);
+	return security_ops->inode_rmdir(dir, dentry, mnt);
 }
 
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2454,7 +2454,8 @@ static int selinux_inode_mkdir(struct in
 	return may_create(dir, dentry, SECCLASS_DIR);
 }
 
-static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry)
+static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry,
+			       struct vfsmount *mnt)
 {
 	return may_link(dir, dentry, MAY_RMDIR);
 }
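To illustrate why a path-based hook wants the vfsmount in addition to the dentry, here is a small userspace sketch. It is not kernel code: struct toy_dentry, struct toy_vfsmount, toy_path() and toy_inode_rmdir() are hypothetical stand-ins invented for this example, and the real pathname computation in the kernel is considerably more involved. The point it tries to show is only that the dentry chain yields a filesystem-relative name, while the mount says where that filesystem is attached.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DEPTH 32

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_dentry {
	const char *name;
	const struct toy_dentry *parent;	/* NULL at the filesystem root */
};

struct toy_vfsmount {
	const char *mountpoint;	/* where the filesystem is attached */
};

/*
 * Write the path of @dentry into @buf. With @mnt == NULL only the
 * filesystem-relative part can be produced.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int toy_path(const struct toy_dentry *dentry,
		    const struct toy_vfsmount *mnt, char *buf, size_t len)
{
	const char *parts[TOY_MAX_DEPTH];
	size_t used = 0;
	int depth = 0;
	int n;

	assert(buf != NULL);
	if (len < 2)
		return -1;
	buf[0] = '\0';

	/* Walk up to the filesystem root, remembering each component. */
	for (; dentry && dentry->parent; dentry = dentry->parent) {
		if (depth == TOY_MAX_DEPTH)
			return -1;
		parts[depth++] = dentry->name;
	}

	/* Prefix with the mount point when the caller knows it. */
	if (mnt && strcmp(mnt->mountpoint, "/") != 0) {
		n = snprintf(buf, len, "%s", mnt->mountpoint);
		if (n < 0 || (size_t)n >= len)
			return -1;
		used = (size_t)n;
	}

	/* The root itself, with no mount prefix, is just "/". */
	if (depth == 0 && used == 0) {
		strcpy(buf, "/");
		return 0;
	}

	/* Emit the components from the root downwards. */
	while (depth-- > 0) {
		n = snprintf(buf + used, len - used, "/%s", parts[depth]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}

/* Hypothetical path-based rmdir hook: deny removals below a prefix. */
static int toy_inode_rmdir(const struct toy_dentry *dentry,
			   const struct toy_vfsmount *mnt,
			   const char *protected_prefix)
{
	char path[256];

	if (toy_path(dentry, mnt, path, sizeof(path)))
		return -ENAMETOOLONG;
	if (strncmp(path, protected_prefix, strlen(protected_prefix)) == 0)
		return -EACCES;
	return 0;
}
```

With a chain root -> "a" -> "b" on a filesystem attached at /mnt/data, toy_path() gives /mnt/data/a/b when the mount is passed and only /a/b when it is NULL, so a rule written against /mnt/data/ matches in the first case and is silently missed in the second.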
[AppArmor 16/47] Call lsm hook before unhashing dentry in vfs_rmdir()
If we unhash the dentry before calling the security_inode_rmdir hook,
we cannot compute the file's pathname in the hook anymore. AppArmor
needs to know the filename in order to decide whether a file may be
deleted, though.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2212,6 +2212,10 @@ int vfs_rmdir(struct inode *dir, struct
 	if (!dir->i_op || !dir->i_op->rmdir)
 		return -EPERM;
 
+	error = security_inode_rmdir(dir, dentry, mnt);
+	if (error)
+		return error;
+
 	DQUOT_INIT(dir);
 
 	mutex_lock(&dentry->d_inode->i_mutex);
@@ -2219,12 +2223,9 @@ int vfs_rmdir(struct inode *dir, struct
 	if (d_mountpoint(dentry))
 		error = -EBUSY;
 	else {
-		error = security_inode_rmdir(dir, dentry, mnt);
-		if (!error) {
-			error = dir->i_op->rmdir(dir, dentry);
-			if (!error)
-				dentry->d_inode->i_flags |= S_DEAD;
-		}
+		error = dir->i_op->rmdir(dir, dentry);
+		if (!error)
+			dentry->d_inode->i_flags |= S_DEAD;
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
 	if (!error) {
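The ordering problem described in this patch can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: struct toy_dentry, its hashed flag, toy_name(), toy_hook() and the two toy_rmdir_*() variants are all made up for illustration, and the model treats an unhashed entry as having no usable name at all, which is cruder than what the dcache really does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy dentry: only a name and a flag saying whether it is still hashed. */
struct toy_dentry {
	const char *name;
	int hashed;	/* 1 while the entry can still be resolved by name */
};

/* Toy name lookup: only works while the dentry is hashed. */
static const char *toy_name(const struct toy_dentry *d)
{
	assert(d != NULL);
	return d->hashed ? d->name : NULL;
}

/* Hypothetical name-based policy: refuse to remove "keep". */
static int toy_hook(const struct toy_dentry *d)
{
	const char *name = toy_name(d);

	if (!name)
		return -ENOENT;	/* no name left, so no informed decision */
	return strcmp(name, "keep") == 0 ? -EACCES : 0;
}

/* Old order: unhash first, then ask the hook. */
static int toy_rmdir_old(struct toy_dentry *d)
{
	d->hashed = 0;
	return toy_hook(d);
}

/* New order, as in this patch: ask the hook first, then unhash. */
static int toy_rmdir_new(struct toy_dentry *d)
{
	int error = toy_hook(d);

	if (error)
		return error;
	d->hashed = 0;
	return 0;
}
```

In the old order the hook never sees the name and can only fail blindly; in the new order it can return a policy decision, and a denied request leaves the entry hashed.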
[AppArmor 17/47] Add a struct vfsmount parameter to vfs_unlink()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c     |    3 ++-
 fs/namei.c              |    5 +++--
 fs/nfsd/nfs4recover.c   |    2 +-
 fs/nfsd/vfs.c           |    2 +-
 fs/unionfs/commonfops.c |    4 +++-
 fs/unionfs/copyup.c     |    3 ++-
 fs/unionfs/dirhelper.c  |    5 -
 fs/unionfs/inode.c      |   22 --
 fs/unionfs/rename.c     |    8 ++--
 fs/unionfs/sioq.c       |    2 +-
 fs/unionfs/sioq.h       |    1 +
 fs/unionfs/unlink.c     |    5 -
 include/linux/fs.h      |    2 +-
 ipc/mqueue.c            |    2 +-
 14 files changed, 46 insertions(+), 20 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -443,10 +443,11 @@ static int ecryptfs_unlink(struct inode
 {
 	int rc = 0;
 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+	struct vfsmount *lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
 	struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir);
 
 	lock_parent(lower_dentry);
-	rc = vfs_unlink(lower_dir_inode, lower_dentry);
+	rc = vfs_unlink(lower_dir_inode, lower_dentry, lower_mnt);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
 		goto out_unlock;
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2288,7 +2288,7 @@ asmlinkage long sys_rmdir(const char __u
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+int vfs_unlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt)
 {
 	int error = may_delete(dir, dentry, 0);
 
@@ -2356,7 +2356,8 @@ static long do_unlinkat(int dfd, const c
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, dentry,
+				   nd.path.mnt);
 		mnt_drop_write(nd.path.mnt);
 	exit2:
 		dput(dentry);
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -264,7 +264,7 @@ nfsd4_remove_clid_file(struct dentry *di
 		return -EINVAL;
 	}
 	mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-	status = vfs_unlink(dir->d_inode, dentry);
+	status = vfs_unlink(dir->d_inode, dentry, rec_dir.path.mnt);
 	mutex_unlock(&dir->d_inode->i_mutex);
 	return status;
 }
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1772,7 +1772,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 			host_err = -EPERM;
 		} else
 #endif
-		host_err = vfs_unlink(dirp, rdentry);
+		host_err = vfs_unlink(dirp, rdentry, exp->ex_path.mnt);
 	} else { /* It's RMDIR */
 		host_err = vfs_rmdir(dirp, rdentry, exp->ex_path.mnt);
 	}
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -34,6 +34,7 @@ static int copyup_deleted_file(struct fi
 	int err;
 	struct dentry *tmp_dentry = NULL;
 	struct dentry *lower_dentry;
+	struct vfsmount *lower_mnt;
 	struct dentry *lower_dir_dentry = NULL;
 
 	lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
@@ -82,13 +83,14 @@ retry:
 
 	/* bring it to the same state as an unlinked file */
 	lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
+	lower_mnt = unionfs_lower_mnt_idx(dentry, dbstart(dentry));
 	if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
 		atomic_inc(&lower_dentry->d_inode->i_count);
 		unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
 					    lower_dentry->d_inode);
 	}
 	lower_dir_dentry = lock_parent(lower_dentry);
-	err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
+	err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry, lower_mnt);
 	unlock_dir(lower_dir_dentry);
 
 out:
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -491,7 +491,8 @@ out_unlink:
 	 * quota, or something else happened so let's unlink; we don't
 	 * really care about the return value of vfs_unlink
 	 */
-	vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
+	vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry,
+		   new_lower_mnt);
 
 	if (copyup_file) {
 		/* need to close the file */
--- a/fs/unionfs/dirhelper.c
+++ b/fs/unionfs/dirhelper.c
@@ -29,6 +29,7 @@ int do_delete_whiteouts(struct dentry *d
 	int err = 0;
 	struct dentry *lower_dir_dentry = NULL;
 	struct dentry *lower_dentry;
+	struct vfsmount *lower_mnt;
 	char *name = NULL, *p;
 	struct inode *lower_dir;
 	int i;
@@
[AppArmor 18/47] Pass struct vfsmount to the inode_unlink LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    2 +-
 include/linux/security.h |   10 +++---
 security/dummy.c         |    3 ++-
 security/security.c      |    5 +++--
 security/selinux/hooks.c |    5 +++--
 5 files changed, 16 insertions(+), 9 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2304,7 +2304,7 @@ int vfs_unlink(struct inode *dir, struct
 	if (d_mountpoint(dentry))
 		error = -EBUSY;
 	else {
-		error = security_inode_unlink(dir, dentry);
+		error = security_inode_unlink(dir, dentry, mnt);
 		if (!error)
 			error = dir->i_op->unlink(dir, dentry);
 	}
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -329,6 +329,7 @@ struct request_sock;
  *	Check the permission to remove a hard link to a file.
  *	@dir contains the inode structure of parent directory of the file.
  *	@dentry contains the dentry structure for file to be unlinked.
+ *	@mnt is the vfsmount corresponding to @dentry (may be NULL).
  *	Return 0 if permission is granted.
  * @inode_symlink:
  *	Check the permission to create a symbolic link to a file.
@@ -1286,7 +1287,8 @@ struct security_operations {
 	int (*inode_link) (struct dentry *old_dentry, struct vfsmount *old_mnt,
 			   struct inode *dir, struct dentry *new_dentry,
 			   struct vfsmount *new_mnt);
-	int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
+	int (*inode_unlink) (struct inode *dir, struct dentry *dentry,
+			     struct vfsmount *mnt);
 	int (*inode_symlink) (struct inode *dir, struct dentry *dentry,
 			      struct vfsmount *mnt, const char *old_name);
 	int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
@@ -1554,7 +1556,8 @@ int security_inode_create(struct inode *
 int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt,
 			struct inode *dir, struct dentry *new_dentry,
 			struct vfsmount *new_mnt);
-int security_inode_unlink(struct inode *dir, struct dentry *dentry);
+int security_inode_unlink(struct inode *dir, struct dentry *dentry,
+			  struct vfsmount *mnt);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
 			   struct vfsmount *mnt, const char *old_name);
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
@@ -1882,7 +1885,8 @@ static inline int security_inode_link (s
 }
 
 static inline int security_inode_unlink (struct inode *dir,
-					 struct dentry *dentry)
+					 struct dentry *dentry,
+					 struct vfsmount *mnt)
 {
 	return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -303,7 +303,8 @@ static int dummy_inode_link (struct dent
 	return 0;
 }
 
-static int dummy_inode_unlink (struct inode *inode, struct dentry *dentry)
+static int dummy_inode_unlink (struct inode *inode, struct dentry *dentry,
+			       struct vfsmount *mnt)
 {
 	return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -366,11 +366,12 @@ int security_inode_link(struct dentry *o
 					 new_dentry, new_mnt);
 }
 
-int security_inode_unlink(struct inode *dir, struct dentry *dentry)
+int security_inode_unlink(struct inode *dir, struct dentry *dentry,
+			  struct vfsmount *mnt)
 {
 	if (unlikely(IS_PRIVATE(dentry->d_inode)))
 		return 0;
-	return security_ops->inode_unlink(dir, dentry);
+	return security_ops->inode_unlink(dir, dentry, mnt);
 }
 
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2432,11 +2432,12 @@ static int selinux_inode_link(struct den
 	return may_link(dir, old_dentry, MAY_LINK);
 }
 
-static int selinux_inode_unlink(struct inode *dir, struct dentry *dentry)
+static int selinux_inode_unlink(struct inode *dir, struct dentry *dentry,
+				struct vfsmount *mnt)
 {
 	int rc;
 
-	rc = secondary_ops->inode_unlink(dir, dentry);
+	rc = secondary_ops->inode_unlink(dir, dentry, mnt);
 	if (rc)
 		return rc;
 	return may_link(dir, dentry, MAY_UNLINK);
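The shape of this change (an ops table entry, a wrapper that short-circuits private inodes, and hooks that must tolerate a NULL mount) can be sketched in userspace as follows. Everything prefixed toy_ is hypothetical and exists only for this example; in particular the read_only field and the -EROFS policy are invented to give the hook something to look at, and are not what any in-tree module does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; the real kernel structures look nothing like this. */
struct toy_inode { int unused; };
struct toy_dentry { int is_private; };
struct toy_vfsmount { int read_only; };

/* Ops table with the new three-argument hook signature. */
struct toy_security_ops {
	int (*inode_unlink)(struct toy_inode *dir, struct toy_dentry *dentry,
			    struct toy_vfsmount *mnt);
};

/* Default hook: permit everything, ignoring all arguments. */
static int toy_dummy_unlink(struct toy_inode *dir, struct toy_dentry *dentry,
			    struct toy_vfsmount *mnt)
{
	(void)dir;
	(void)dentry;
	(void)mnt;
	return 0;
}

/* Hypothetical module hook: consults mnt, but copes with mnt == NULL. */
static int toy_module_unlink(struct toy_inode *dir, struct toy_dentry *dentry,
			     struct toy_vfsmount *mnt)
{
	(void)dir;
	(void)dentry;
	if (mnt && mnt->read_only)
		return -EROFS;
	return 0;
}

static struct toy_security_ops toy_dummy_ops = {
	.inode_unlink = toy_dummy_unlink,
};

static struct toy_security_ops toy_module_ops = {
	.inode_unlink = toy_module_unlink,
};

/* Currently registered ops; starts out as the permissive default. */
static struct toy_security_ops *toy_ops = &toy_dummy_ops;

/* Wrapper with the same outline as security_inode_unlink() above. */
static int toy_security_inode_unlink(struct toy_inode *dir,
				     struct toy_dentry *dentry,
				     struct toy_vfsmount *mnt)
{
	assert(dentry != NULL);
	if (dentry->is_private)
		return 0;
	return toy_ops->inode_unlink(dir, dentry, mnt);
}
```

Callers that have no mount at hand pass NULL, as the new kerneldoc line allows, so any hook written against the new signature has to check for it before dereferencing.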
[AppArmor 19/47] Add struct vfsmount parameters to vfs_rename()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |    7 ++-
 fs/namei.c          |   19 ---
 fs/nfsd/vfs.c       |    3 ++-
 fs/unionfs/rename.c |    6 +-
 include/linux/fs.h  |    2 +-
 5 files changed, 26 insertions(+), 11 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -588,19 +588,24 @@ ecryptfs_rename(struct inode *old_dir, s
 {
 	int rc;
 	struct dentry *lower_old_dentry;
+	struct vfsmount *lower_old_mnt;
 	struct dentry *lower_new_dentry;
+	struct vfsmount *lower_new_mnt;
 	struct dentry *lower_old_dir_dentry;
 	struct dentry *lower_new_dir_dentry;
 
 	lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
+	lower_old_mnt = ecryptfs_dentry_to_lower_mnt(old_dentry);
 	lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
+	lower_new_mnt = ecryptfs_dentry_to_lower_mnt(new_dentry);
 	dget(lower_old_dentry);
 	dget(lower_new_dentry);
 	lower_old_dir_dentry = dget_parent(lower_old_dentry);
 	lower_new_dir_dentry = dget_parent(lower_new_dentry);
 	lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
 	rc = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
-			lower_new_dir_dentry->d_inode, lower_new_dentry);
+			lower_old_mnt, lower_new_dir_dentry->d_inode,
+			lower_new_dentry, lower_new_mnt);
 	if (rc)
 		goto out_lock;
 	fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2600,7 +2600,8 @@ asmlinkage long sys_link(const char __us
  *	   locking].
  */
 static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
-			  struct inode *new_dir, struct dentry *new_dentry)
+			  struct vfsmount *old_mnt, struct inode *new_dir,
+			  struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
 	int error = 0;
 	struct inode *target;
@@ -2643,7 +2644,8 @@ static int vfs_rename_dir(struct inode *
 }
 
 static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
-			    struct inode *new_dir, struct dentry *new_dentry)
+			    struct vfsmount *old_mnt, struct inode *new_dir,
+			    struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
 	struct inode *target;
 	int error;
@@ -2671,7 +2673,8 @@ static int vfs_rename_other(struct inode
 }
 
 int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry)
+	       struct vfsmount *old_mnt, struct inode *new_dir,
+	       struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
 	int error;
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
@@ -2700,9 +2703,11 @@ int vfs_rename(struct inode *old_dir, st
 	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
 
 	if (is_dir)
-		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
+		error = vfs_rename_dir(old_dir, old_dentry, old_mnt,
+				       new_dir, new_dentry, new_mnt);
 	else
-		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
+		error = vfs_rename_other(old_dir, old_dentry, old_mnt,
+					 new_dir, new_dentry, new_mnt);
 	if (!error) {
 		const char *new_name = old_dentry->d_name.name;
 		fsnotify_move(old_dir, new_dir, old_name, new_name, is_dir,
@@ -2777,8 +2782,8 @@ static int do_rename(int olddfd, const c
 	error = mnt_want_write(oldnd.path.mnt);
 	if (error)
 		goto exit5;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	error = vfs_rename(old_dir->d_inode, old_dentry, oldnd.path.mnt,
+			   new_dir->d_inode, new_dentry, newnd.path.mnt);
 	mnt_drop_write(oldnd.path.mnt);
 exit5:
 	dput(new_dentry);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1697,7 +1697,8 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 	if (host_err)
 		goto out_dput_new;
 
-	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
+	host_err = vfs_rename(fdir, odentry, ffhp->fh_export->ex_path.mnt,
+			      tdir, ndentry, tfhp->fh_export->ex_path.mnt);
 	if (!host_err && EX_ISSYNC(tfhp->fh_export)) {
 		host_err = nfsd_sync_dir(tdentry);
 		if (!host_err)
--- a/fs/unionfs/rename.c
+++ b/fs/unionfs/rename.c
@@ -24,6 +24,7 @@ static int __unionfs_rename(struct inode
 {
 	int err =
[AppArmor 20/47] Pass struct vfsmount to the inode_rename LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c               |    6 --
 include/linux/security.h |   13 ++---
 security/dummy.c         |    4 +++-
 security/security.c      |    7 ---
 security/selinux/hooks.c |    8 ++--
 5 files changed, 27 insertions(+), 11 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2616,7 +2616,8 @@ static int vfs_rename_dir(struct inode *
 			return error;
 	}
 
-	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
+	error = security_inode_rename(old_dir, old_dentry, old_mnt,
+				      new_dir, new_dentry, new_mnt);
 	if (error)
 		return error;
 
@@ -2650,7 +2651,8 @@ static int vfs_rename_other(struct inode
 	struct inode *target;
 	int error;
 
-	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
+	error = security_inode_rename(old_dir, old_dentry, old_mnt,
+				      new_dir, new_dentry, new_mnt);
 	if (error)
 		return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -367,8 +367,10 @@ struct request_sock;
  *	Check for permission to rename a file or directory.
  *	@old_dir contains the inode structure for parent of the old link.
  *	@old_dentry contains the dentry structure of the old link.
+ *	@old_mnt is the vfsmount corresponding to @old_dentry (may be NULL).
  *	@new_dir contains the inode structure for parent of the new link.
  *	@new_dentry contains the dentry structure of the new link.
+ *	@new_mnt is the vfsmount corresponding to @new_dentry (may be NULL).
  *	Return 0 if permission is granted.
  * @inode_readlink:
  *	Check the permission to read the symbolic link.
@@ -1298,7 +1300,9 @@ struct security_operations {
 	int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
 			    struct vfsmount *mnt, int mode, dev_t dev);
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
-			     struct inode *new_dir, struct dentry *new_dentry);
+			     struct vfsmount *old_mnt,
+			     struct inode *new_dir, struct dentry *new_dentry,
+			     struct vfsmount *new_mnt);
 	int (*inode_readlink) (struct dentry *dentry, struct vfsmount *mnt);
 	int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
 	int (*inode_permission) (struct inode *inode, int mask, struct nameidata *nd);
@@ -1567,7 +1571,8 @@ int security_inode_rmdir(struct inode *d
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
 			 struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
-			  struct inode *new_dir, struct dentry *new_dentry);
+			  struct vfsmount *old_mnt, struct inode *new_dir,
+			  struct dentry *new_dentry, struct vfsmount *new_mnt);
 int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt);
 int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
 int security_inode_permission(struct inode *inode, int mask, struct nameidata *nd);
@@ -1924,8 +1929,10 @@ static inline int security_inode_mknod (
 
 static inline int security_inode_rename (struct inode *old_dir,
 					 struct dentry *old_dentry,
+					 struct vfsmount *old_mnt,
 					 struct inode *new_dir,
-					 struct dentry *new_dentry)
+					 struct dentry *new_dentry,
+					 struct vfsmount *new_mnt)
 {
 	return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -335,8 +335,10 @@ static int dummy_inode_mknod (struct ino
 
 static int dummy_inode_rename (struct inode *old_inode,
 			       struct dentry *old_dentry,
+			       struct vfsmount *old_mnt,
 			       struct inode *new_inode,
-			       struct dentry *new_dentry)
+			       struct dentry *new_dentry,
+			       struct vfsmount *new_mnt)
 {
 	return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -407,13 +407,14 @@ int security_inode_mknod(struct inode *d
 }
 
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
-			   struct inode *new_dir, struct dentry *new_dentry)
+			   struct vfsmount *old_mnt, struct inode *new_dir,
+
[AppArmor 21/47] Add a struct vfsmount parameter to vfs_setxattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/vfs.c         |   16 +++-
 fs/unionfs/copyup.c   |   18 --
 fs/unionfs/xattr.c    |    7 +--
 fs/xattr.c            |   16 
 include/linux/xattr.h |    3 ++-
 5 files changed, 38 insertions(+), 22 deletions(-)

--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -426,7 +426,8 @@ static ssize_t nfsd_getxattr(struct dent
 
 #if defined(CONFIG_NFSD_V4)
 static int
-set_nfsv4_acl_one(struct dentry *dentry, struct posix_acl *pacl, char *key)
+set_nfsv4_acl_one(struct dentry *dentry, struct vfsmount *mnt,
+		  struct posix_acl *pacl, char *key)
 {
 	int len;
 	size_t buflen;
@@ -445,7 +446,7 @@ set_nfsv4_acl_one(struct dentry *dentry,
 		goto out;
 	}
 
-	error = vfs_setxattr(dentry, key, buf, len, 0);
+	error = vfs_setxattr(dentry, mnt, key, buf, len, 0);
 out:
 	kfree(buf);
 	return error;
@@ -458,6 +459,7 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 	__be32 error;
 	int host_error;
 	struct dentry *dentry;
+	struct vfsmount *mnt;
 	struct inode *inode;
 	struct posix_acl *pacl = NULL, *dpacl = NULL;
 	unsigned int flags = 0;
@@ -468,6 +470,7 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 		return error;
 
 	dentry = fhp->fh_dentry;
+	mnt = fhp->fh_export->ex_path.mnt;
 	inode = dentry->d_inode;
 	if (S_ISDIR(inode->i_mode))
 		flags = NFS4_ACL_DIR;
@@ -478,12 +481,14 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
 	} else if (host_error < 0)
 		goto out_nfserr;
 
-	host_error = set_nfsv4_acl_one(dentry, pacl, POSIX_ACL_XATTR_ACCESS);
+	host_error = set_nfsv4_acl_one(dentry, mnt, pacl,
+				       POSIX_ACL_XATTR_ACCESS);
 	if (host_error < 0)
 		goto out_release;
 
 	if (S_ISDIR(inode->i_mode))
-		host_error = set_nfsv4_acl_one(dentry, dpacl, POSIX_ACL_XATTR_DEFAULT);
+		host_error = set_nfsv4_acl_one(dentry, mnt, dpacl,
+					       POSIX_ACL_XATTR_DEFAULT);
 
 out_release:
 	posix_acl_release(pacl);
@@ -2057,7 +2062,8 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
 		size = 0;
 
 	if (size)
-		error = vfs_setxattr(fhp->fh_dentry, name, value, size, 0);
+		error = vfs_setxattr(fhp->fh_dentry, fhp->fh_export->ex_path.mnt,
+				     name, value, size,0);
 	else {
 		if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
 			error = 0;
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -26,7 +26,9 @@
 #ifdef CONFIG_UNION_FS_XATTR
 /* copyup all extended attrs for a given dentry */
 static int copyup_xattrs(struct dentry *old_lower_dentry,
-			 struct dentry *new_lower_dentry)
+			 struct vfsmount *old_lower_mnt,
+			 struct dentry *new_lower_dentry,
+			 struct vfsmount *new_lower_mnt)
 {
 	int err = 0;
 	ssize_t list_size = -1;
@@ -82,8 +84,8 @@ static int copyup_xattrs(struct dentry *
 			goto out;
 		}
 		/* Don't lock here since vfs_setxattr does it for us. */
-		err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
-				   size, 0);
+		err = vfs_setxattr(new_lower_dentry, new_lower_mnt, name_list,
+				   attr_value, size, 0);
 		/*
 		 * Selinux depends on security.* xattrs, so to maintain
 		 * the security of copied-up files, if Selinux is active,
@@ -93,8 +95,8 @@ static int copyup_xattrs(struct dentry *
 		 */
 		if (err == -EPERM && !capable(CAP_FOWNER)) {
 			cap_raise(current->cap_effective, CAP_FOWNER);
-			err = vfs_setxattr(new_lower_dentry, name_list,
-					   attr_value, size, 0);
+			err = vfs_setxattr(new_lower_dentry, new_lower_mnt,
+					   name_list, attr_value, size, 0);
 			cap_lower(current->cap_effective, CAP_FOWNER);
 		}
 		if (err < 0)
@@ -380,6 +382,7 @@ int copyup_dentry(struct inode *dir, str
 	struct dentry *new_lower_dentry;
 	struct vfsmount *new_lower_mnt;
 	struct dentry *old_lower_dentry = NULL;
+	struct vfsmount *old_lower_mnt;
 	struct super_block *sb;
 	int err = 0;
 	int old_bindex;
@@ -413,6 +416,8 @@ int copyup_dentry(struct inode *dir, str
 	}
 	old_lower_dentry =
[AppArmor 22/47] Pass struct vfsmount to the inode_setxattr LSM hook
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c               |    4 ++--
 include/linux/security.h |   35 +--
 security/commoncap.c     |    4 ++--
 security/dummy.c         |    9 ++---
 security/security.c      |   14 --
 security/selinux/hooks.c |    8 ++--
 6 files changed, 45 insertions(+), 29 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -78,7 +78,7 @@ vfs_setxattr(struct dentry *dentry, stru
 		return error;
 
 	mutex_lock(&inode->i_mutex);
-	error = security_inode_setxattr(dentry, name, value, size, flags);
+	error = security_inode_setxattr(dentry, mnt, name, value, size, flags);
 	if (error)
 		goto out;
 	error = -EOPNOTSUPP;
@@ -86,7 +86,7 @@ vfs_setxattr(struct dentry *dentry, stru
 		error = inode->i_op->setxattr(dentry, name, value, size, flags);
 		if (!error) {
 			fsnotify_xattr(dentry);
-			security_inode_post_setxattr(dentry, name, value,
+			security_inode_post_setxattr(dentry, mnt, name, value,
 						     size, flags);
 		}
 	} else if (!strncmp(name, XATTR_SECURITY_PREFIX,
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -57,7 +57,7 @@ extern void cap_capset_set (struct task_
 extern int cap_bprm_set_security (struct linux_binprm *bprm);
 extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe);
 extern int cap_bprm_secureexec(struct linux_binprm *bprm);
-extern int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags);
+extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, void *value, size_t size, int flags);
 extern int cap_inode_removexattr(struct dentry *dentry, char *name);
 extern int cap_inode_need_killpriv(struct dentry *dentry);
 extern int cap_inode_killpriv(struct dentry *dentry);
@@ -415,11 +415,11 @@ struct request_sock;
  *	inode.
  * @inode_setxattr:
  * 	Check permission before setting the extended attributes
- * 	@value identified by @name for @dentry.
+ * 	@value identified by @name for @dentry and @mnt.
  * 	Return 0 if permission is granted.
  * @inode_post_setxattr:
  * 	Update inode security field after successful setxattr operation.
- * 	@value identified by @name for @dentry.
+ * 	@value identified by @name for @dentry and @mnt.
  * @inode_getxattr:
  * 	Check permission before obtaining the extended attributes
  * 	identified by @name for @dentry.
@@ -1310,9 +1310,11 @@ struct security_operations {
 			      struct iattr *attr);
 	int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
 	void (*inode_delete) (struct inode *inode);
-	int (*inode_setxattr) (struct dentry *dentry, char *name, void *value,
-			       size_t size, int flags);
-	void (*inode_post_setxattr) (struct dentry *dentry, char *name, void *value,
+	int (*inode_setxattr) (struct dentry *dentry, struct vfsmount *mnt,
+			       char *name, void *value, size_t size, int flags);
+	void (*inode_post_setxattr) (struct dentry *dentry,
+				     struct vfsmount *mnt,
+				     char *name, void *value,
 				     size_t size, int flags);
 	int (*inode_getxattr) (struct dentry *dentry, char *name);
 	int (*inode_listxattr) (struct dentry *dentry);
@@ -1580,10 +1582,11 @@ int security_inode_setattr(struct dentry
 			   struct iattr *attr);
 int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
 void security_inode_delete(struct inode *inode);
-int security_inode_setxattr(struct dentry *dentry, char *name,
-			    void *value, size_t size, int flags);
-void security_inode_post_setxattr(struct dentry *dentry, char *name,
-				  void *value, size_t size, int flags);
+int security_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt,
+			    char *name, void *value, size_t size, int flags);
+void security_inode_post_setxattr(struct dentry *dentry, struct vfsmount *mnt,
+				  char *name, void *value, size_t size,
+				  int flags);
 int security_inode_getxattr(struct dentry *dentry, char *name);
 int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct dentry *dentry, char *name);
@@ -1971,14 +1974,18 @@ static inline int security_inode_getattr
 static inline void security_inode_delete (struct inode *inode)
 { }
 
-static inline int
[AppArmor 23/47] Add a struct vfsmount parameter to vfs_getxattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/nfs4xdr.c         |    2 +-
 fs/nfsd/vfs.c             |   21 -
 fs/unionfs/copyup.c       |    2 +-
 fs/unionfs/xattr.c        |    4 +++-
 fs/xattr.c                |   14 --
 include/linux/nfsd/nfsd.h |    3 ++-
 include/linux/xattr.h     |    2 +-
 7 files changed, 28 insertions(+), 20 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1501,7 +1501,7 @@ nfsd4_encode_fattr(struct svc_fh *fhp, s
 	}
 	if (bmval0 & (FATTR4_WORD0_ACL | FATTR4_WORD0_ACLSUPPORT
 			| FATTR4_WORD0_SUPPORTED_ATTRS)) {
-		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
+		err = nfsd4_get_nfs4_acl(rqstp, dentry, exp->ex_path.mnt, &acl);
 		aclsupport = (err == 0);
 		if (bmval0 & FATTR4_WORD0_ACL) {
 			if (err == -EOPNOTSUPP)
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -408,11 +408,12 @@ out_nfserr:
 #if defined(CONFIG_NFSD_V2_ACL) || \
     defined(CONFIG_NFSD_V3_ACL) || \
     defined(CONFIG_NFSD_V4)
-static ssize_t nfsd_getxattr(struct dentry *dentry, char *key, void **buf)
+static ssize_t nfsd_getxattr(struct dentry *dentry, struct vfsmount *mnt,
+			     char *key, void **buf)
 {
 	ssize_t buflen;
 
-	buflen = vfs_getxattr(dentry, key, NULL, 0);
+	buflen = vfs_getxattr(dentry, mnt, key, NULL, 0);
 	if (buflen <= 0)
 		return buflen;
 
@@ -420,7 +421,7 @@ static ssize_t nfsd_getxattr(struct dent
 	if (!*buf)
 		return -ENOMEM;
 
-	return vfs_getxattr(dentry, key, *buf, buflen);
+	return vfs_getxattr(dentry, mnt, key, *buf, buflen);
 }
 #endif
 
@@ -501,13 +502,13 @@ out_nfserr:
 }
 
 static struct posix_acl *
-_get_posix_acl(struct dentry *dentry, char *key)
+_get_posix_acl(struct dentry *dentry, struct vfsmount *mnt, char *key)
 {
 	void *buf = NULL;
 	struct posix_acl *pacl = NULL;
 	int buflen;
 
-	buflen = nfsd_getxattr(dentry, key, &buf);
+	buflen = nfsd_getxattr(dentry, mnt, key, &buf);
 	if (!buflen)
 		buflen = -ENODATA;
 	if (buflen <= 0)
@@ -519,14 +520,15 @@ _get_posix_acl(struct dentry *dentry, ch
 }
 
 int
-nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct nfs4_acl **acl)
+nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
+		   struct vfsmount *mnt, struct nfs4_acl **acl)
 {
 	struct inode *inode = dentry->d_inode;
 	int error = 0;
 	struct posix_acl *pacl = NULL, *dpacl = NULL;
 	unsigned int flags = 0;
 
-	pacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_ACCESS);
+	pacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_ACCESS);
 	if (IS_ERR(pacl) && PTR_ERR(pacl) == -ENODATA)
 		pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
 	if (IS_ERR(pacl)) {
@@ -536,7 +538,7 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqst
 	}
 
 	if (S_ISDIR(inode->i_mode)) {
-		dpacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_DEFAULT);
+		dpacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_DEFAULT);
 		if (IS_ERR(dpacl) && PTR_ERR(dpacl) == -ENODATA)
 			dpacl = NULL;
 		else if (IS_ERR(dpacl)) {
@@ -2017,7 +2019,8 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i
 		return ERR_PTR(-EOPNOTSUPP);
 	}
 
-	size = nfsd_getxattr(fhp->fh_dentry, name, &value);
+	size = nfsd_getxattr(fhp->fh_dentry, fhp->fh_export->ex_path.mnt, name,
+			     &value);
 	if (size < 0)
 		return ERR_PTR(size);
 
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -72,7 +72,7 @@ static int copyup_xattrs(struct dentry *
 
 		/* Lock here since vfs_getxattr doesn't lock for us */
 		mutex_lock(&old_lower_dentry->d_inode->i_mutex);
-		size = vfs_getxattr(old_lower_dentry, name_list,
+		size = vfs_getxattr(old_lower_dentry, old_lower_mnt, name_list,
 				    attr_value, XATTR_SIZE_MAX);
 		mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
 		if (size < 0) {
--- a/fs/unionfs/xattr.c
+++ b/fs/unionfs/xattr.c
@@ -43,6 +43,7 @@ ssize_t unionfs_getxattr(struct dentry *
 			 size_t size)
 {
 	struct dentry *lower_dentry = NULL;
+	struct vfsmount *lower_mnt;
 	int err = -EOPNOTSUPP;
 
 	unionfs_read_lock(dentry->d_sb);
@@ -54,8 +55,9 @@ ssize_t unionfs_getxattr(struct dentry *
 	}
 
 	lower_dentry = unionfs_lower_dentry(dentry);
+	lower_mnt = unionfs_lower_mnt(dentry);
 
-	err = vfs_getxattr(lower_dentry, (char *) name, value, size);
+	err =
[AppArmor 24/47] Pass struct vfsmount to the inode_getxattr LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c |2 +- include/linux/security.h | 11 +++ security/dummy.c |3 ++- security/security.c |5 +++-- security/selinux/hooks.c |3 ++- 5 files changed, 15 insertions(+), 9 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -141,7 +141,7 @@ vfs_getxattr(struct dentry *dentry, stru if (error) return error; - error = security_inode_getxattr(dentry, name); + error = security_inode_getxattr(dentry, mnt, name); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -422,7 +422,7 @@ struct request_sock; * @value identified by @name for @dentry and @mnt. * @inode_getxattr: * Check permission before obtaining the extended attributes - * identified by @name for @dentry. + * identified by @name for @dentry and @mnt. * Return 0 if permission is granted. * @inode_listxattr: * Check permission before obtaining the list of extended attribute @@ -1316,7 +1316,8 @@ struct security_operations { struct vfsmount *mnt, char *name, void *value, size_t size, int flags); - int (*inode_getxattr) (struct dentry *dentry, char *name); + int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt, + char *name); int (*inode_listxattr) (struct dentry *dentry); int (*inode_removexattr) (struct dentry *dentry, char *name); int (*inode_need_killpriv) (struct dentry *dentry); @@ -1587,7 +1588,8 @@ int security_inode_setxattr(struct dentr void security_inode_post_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, void *value, size_t size, int flags); -int security_inode_getxattr(struct dentry *dentry, char *name); +int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt, + char *name); int security_inode_listxattr(struct dentry *dentry); int security_inode_removexattr(struct dentry *dentry, char *name); int 
security_inode_need_killpriv(struct dentry *dentry); @@ -1988,7 +1990,8 @@ static inline void security_inode_post_s int flags) { } -static inline int security_inode_getxattr (struct dentry *dentry, char *name) +static inline int security_inode_getxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -393,7 +393,8 @@ static void dummy_inode_post_setxattr (s { } -static int dummy_inode_getxattr (struct dentry *dentry, char *name) +static int dummy_inode_getxattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -478,11 +478,12 @@ void security_inode_post_setxattr(struct security_ops-inode_post_setxattr(dentry, mnt, name, value, size, flags); } -int security_inode_getxattr(struct dentry *dentry, char *name) +int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt, + char *name) { if (unlikely(IS_PRIVATE(dentry-d_inode))) return 0; - return security_ops-inode_getxattr(dentry, name); + return security_ops-inode_getxattr(dentry, mnt, name); } int security_inode_listxattr(struct dentry *dentry) --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2636,7 +2636,8 @@ static void selinux_inode_post_setxattr( return; } -static int selinux_inode_getxattr (struct dentry *dentry, char *name) +static int selinux_inode_getxattr (struct dentry *dentry, struct vfsmount *mnt, + char *name) { return dentry_has_perm(current, NULL, dentry, FILE__GETATTR); } -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[AppArmor 25/47] Add a struct vfsmount parameter to vfs_listxattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/unionfs/copyup.c |5 +++-- fs/unionfs/xattr.c|4 +++- fs/xattr.c| 25 ++--- include/linux/xattr.h |2 +- 4 files changed, 21 insertions(+), 15 deletions(-) --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -37,7 +37,7 @@ static int copyup_xattrs(struct dentry * char *name_list_buf = NULL; /* query the actual size of the xattr list */ - list_size = vfs_listxattr(old_lower_dentry, NULL, 0); + list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, NULL, 0); if (list_size = 0) { err = list_size; goto out; @@ -53,7 +53,8 @@ static int copyup_xattrs(struct dentry * name_list_buf = name_list; /* save for kfree at end */ /* now get the actual xattr list of the source file */ - list_size = vfs_listxattr(old_lower_dentry, name_list, list_size); + list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, name_list, + list_size); if (list_size = 0) { err = list_size; goto out; --- a/fs/unionfs/xattr.c +++ b/fs/unionfs/xattr.c @@ -134,6 +134,7 @@ out: ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size) { struct dentry *lower_dentry = NULL; + struct vfsmount *lower_mnt; int err = -EOPNOTSUPP; char *encoded_list = NULL; @@ -146,9 +147,10 @@ ssize_t unionfs_listxattr(struct dentry } lower_dentry = unionfs_lower_dentry(dentry); + lower_mnt = unionfs_lower_mnt(dentry); encoded_list = list; - err = vfs_listxattr(lower_dentry, encoded_list, size); + err = vfs_listxattr(lower_dentry, lower_mnt, encoded_list, size); out: unionfs_check_dentry(dentry); --- a/fs/xattr.c +++ b/fs/xattr.c @@ -168,18 +168,20 @@ nolsm: EXPORT_SYMBOL_GPL(vfs_getxattr); ssize_t -vfs_listxattr(struct dentry *d, char *list, size_t size) +vfs_listxattr(struct dentry *dentry, struct vfsmount *mnt, char *list, + size_t size) { + struct inode *inode = 
dentry-d_inode; ssize_t error; - error = security_inode_listxattr(d); + error = security_inode_listxattr(dentry); if (error) return error; error = -EOPNOTSUPP; - if (d-d_inode-i_op d-d_inode-i_op-listxattr) { - error = d-d_inode-i_op-listxattr(d, list, size); - } else { - error = security_inode_listsecurity(d-d_inode, list, size); + if (inode-i_op inode-i_op-listxattr) + error = inode-i_op-listxattr(dentry, list, size); + else { + error = security_inode_listsecurity(inode, list, size); if (size error size) error = -ERANGE; } @@ -400,7 +402,8 @@ sys_fgetxattr(int fd, char __user *name, * Extended attribute LIST operations */ static ssize_t -listxattr(struct dentry *d, char __user *list, size_t size) +listxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *list, + size_t size) { ssize_t error; char *klist = NULL; @@ -413,7 +416,7 @@ listxattr(struct dentry *d, char __user return -ENOMEM; } - error = vfs_listxattr(d, klist, size); + error = vfs_listxattr(dentry, mnt, klist, size); if (error 0) { if (size copy_to_user(list, klist, error)) error = -EFAULT; @@ -435,7 +438,7 @@ sys_listxattr(char __user *path, char __ error = user_path_walk(path, nd); if (error) return error; - error = listxattr(nd.path.dentry, list, size); + error = listxattr(nd.path.dentry, nd.path.mnt, list, size); path_put(nd.path); return error; } @@ -449,7 +452,7 @@ sys_llistxattr(char __user *path, char _ error = user_path_walk_link(path, nd); if (error) return error; - error = listxattr(nd.path.dentry, list, size); + error = listxattr(nd.path.dentry, nd.path.mnt, list, size); path_put(nd.path); return error; } @@ -464,7 +467,7 @@ sys_flistxattr(int fd, char __user *list if (!f) return error; audit_inode(NULL, f-f_path.dentry); - error = listxattr(f-f_path.dentry, list, size); + error = listxattr(f-f_path.dentry, f-f_path.mnt, list, size); fput(f); return error; } --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -48,7 +48,7 @@ struct xattr_handler { ssize_t 
xattr_getsecurity(struct inode *, const char *, void *, size_t); ssize_t vfs_getxattr(struct dentry *, struct vfsmount *, char *, void *, size_t); -ssize_t vfs_listxattr(struct
[AppArmor 26/47] Pass struct vfsmount to the inode_listxattr LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c |2 +- include/linux/security.h |9 + security/dummy.c |2 +- security/security.c |4 ++-- security/selinux/hooks.c |2 +- 5 files changed, 10 insertions(+), 9 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -174,7 +174,7 @@ vfs_listxattr(struct dentry *dentry, str struct inode *inode = dentry-d_inode; ssize_t error; - error = security_inode_listxattr(dentry); + error = security_inode_listxattr(dentry, mnt); if (error) return error; error = -EOPNOTSUPP; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -426,7 +426,7 @@ struct request_sock; * Return 0 if permission is granted. * @inode_listxattr: * Check permission before obtaining the list of extended attribute - * names for @dentry. + * names for @dentry and @mnt. * Return 0 if permission is granted. * @inode_removexattr: * Check permission before removing the extended attribute @@ -1318,7 +1318,7 @@ struct security_operations { size_t size, int flags); int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt, char *name); - int (*inode_listxattr) (struct dentry *dentry); + int (*inode_listxattr) (struct dentry *dentry, struct vfsmount *mnt); int (*inode_removexattr) (struct dentry *dentry, char *name); int (*inode_need_killpriv) (struct dentry *dentry); int (*inode_killpriv) (struct dentry *dentry); @@ -1590,7 +1590,7 @@ void security_inode_post_setxattr(struct int flags); int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt, char *name); -int security_inode_listxattr(struct dentry *dentry); +int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt); int security_inode_removexattr(struct dentry *dentry, char *name); int security_inode_need_killpriv(struct dentry *dentry); int security_inode_killpriv(struct dentry *dentry); @@ -1996,7 +1996,8 
@@ static inline int security_inode_getxatt return 0; } -static inline int security_inode_listxattr (struct dentry *dentry) +static inline int security_inode_listxattr (struct dentry *dentry, + struct vfsmount *mnt) { return 0; } --- a/security/dummy.c +++ b/security/dummy.c @@ -399,7 +399,7 @@ static int dummy_inode_getxattr (struct return 0; } -static int dummy_inode_listxattr (struct dentry *dentry) +static int dummy_inode_listxattr (struct dentry *dentry, struct vfsmount *mnt) { return 0; } --- a/security/security.c +++ b/security/security.c @@ -486,11 +486,11 @@ int security_inode_getxattr(struct dentr return security_ops-inode_getxattr(dentry, mnt, name); } -int security_inode_listxattr(struct dentry *dentry) +int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt) { if (unlikely(IS_PRIVATE(dentry-d_inode))) return 0; - return security_ops-inode_listxattr(dentry); + return security_ops-inode_listxattr(dentry, mnt); } int security_inode_removexattr(struct dentry *dentry, char *name) --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2642,7 +2642,7 @@ static int selinux_inode_getxattr (struc return dentry_has_perm(current, NULL, dentry, FILE__GETATTR); } -static int selinux_inode_listxattr (struct dentry *dentry) +static int selinux_inode_listxattr (struct dentry *dentry, struct vfsmount *mnt) { return dentry_has_perm(current, NULL, dentry, FILE__GETATTR); } -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 9182] Critical memory leak (dirty pages)
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
> On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> > I'll confirm this tomorrow but it seems that even switching to
> > data=ordered (AFAIK default on ext3) is indeed enough to cure this
> > problem.
>
> Ok, do we actually have any ext3 expert following this? I have no idea
> about what the journalling code does, but I have painful memories of
> ext3 doing really odd buffer-head-based IO and totally bypassing all
> the normal page dirty logic.
>
> Judging by the symptoms (sorry for not following this well, it came up
> while I was mostly away travelling), something probably *does* clear
> the dirty bit on the pages, but the dirty *accounting* is not done
> properly, so the kernel keeps thinking it has dirty pages.
>
> Now, a simple grep shows that ext3 does not actually do any
> ClearPageDirty() or similar on its own, although maybe I missed some
> other subtle way this can happen. And the *normal* VFS routines that
> do ClearPageDirty should all be doing the proper accounting.
>
> So I see a couple of possible cases:
>
>  - actually clearing the PG_dirty bit somehow, without doing the
>    accounting.
>
>    This looks very unlikely. PG_dirty is always cleared by some variant
>    of *ClearPageDirty(), and that bit definition isn't used for
>    anything else in the whole kernel judging by grep (the page
>    allocator tests the bit, that's it).

OK, so I looked for PG_dirty anyway.

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers bail out if the page is dirty.

Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed truncate_complete_page, because it called cancel_dirty_page (and thus cleared PG_dirty) after try_to_free_buffers was called via do_invalidatepage.

Now, if I'm not mistaken, we can end up as follows.

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
    ext3_invalidatepage()
      journal_invalidatepage()
        journal_unmap_buffer()
          __dispose_buffer()
            __journal_unfile_buffer()
              __journal_temp_unlink_buffer()
                mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

If journal_unmap_buffer then returns 0, try_to_free_buffers is not called and neither is cancel_dirty_page, so the dirty pages accounting is not decreased again.

As try_to_free_buffers got its ext3 hack back in ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe 3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for the accounting fix in cancel_dirty_page, of course).

On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task io accounting for cancelled writes always happened if the page was dirty, regardless of page->mapping. This was also already true for the old test_clear_page_dirty code, and the commit log for 8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic change either, so maybe the if (account_size) block should be moved out of the if (mapping ...) block?

Björn - not sending patches because he needs sleep and wouldn't have a damn clue about what to write as a commit message anyway
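The ordering problem suspected above can be illustrated with a small userspace toy model. This is only a sketch: toy_page, nr_dirty and the toy_* helpers are made-up stand-ins, not kernel code, and the model assumes the redirty on the invalidate path always happens. It shows why the counter stays one too high when nothing cancels the dirty state a second time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a page flag and the global dirty counter. */
struct toy_page {
	bool dirty;
};

static long nr_dirty;

/* Set the flag and account for it, as a well-behaved dirtying path would. */
static void toy_set_dirty(struct toy_page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		nr_dirty++;
	}
}

/* Clear the flag and undo the accounting. */
static void toy_cancel_dirty(struct toy_page *p)
{
	if (p->dirty) {
		p->dirty = false;
		nr_dirty--;
		assert(nr_dirty >= 0);
	}
}

/*
 * Model of the suspected sequence: the dirty state is cancelled first,
 * the invalidate path then redirties the page, and only if the buffers
 * are freed afterwards is the dirty state cancelled again.
 * Returns the value of the dirty counter after the truncate.
 */
long toy_truncate(struct toy_page *p, bool buffers_freed)
{
	toy_cancel_dirty(p);		/* flag cleared, counter decremented */
	toy_set_dirty(p);		/* redirtied during invalidation */
	if (buffers_freed)
		toy_cancel_dirty(p);	/* the only step that undoes the redirty */
	return nr_dirty;
}
```

With one dirty page accounted for, toy_truncate() leaves the counter at 1 when buffers_freed is false and at 0 when it is true. The first case corresponds to the leak described in the mail.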
[AppArmor 27/47] Add a struct vfsmount parameter to vfs_removexattr()
The vfsmount will be passed down to the LSM hook so that LSMs can compute pathnames. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/nfsd/vfs.c |7 --- fs/unionfs/xattr.c|4 +++- fs/xattr.c| 12 ++-- include/linux/xattr.h |2 +- 4 files changed, 14 insertions(+), 11 deletions(-) --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -2032,6 +2032,7 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i int nfsd_set_posix_acl(struct svc_fh *fhp, int type, struct posix_acl *acl) { + struct vfsmount *mnt; struct inode *inode = fhp-fh_dentry-d_inode; char *name; void *value = NULL; @@ -2064,14 +2065,14 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i } else size = 0; + mnt = fhp-fh_export-ex_path.mnt; if (size) - error = vfs_setxattr(fhp-fh_dentry, fhp-fh_export-ex_path.mnt, -name, value, size,0); + error = vfs_setxattr(fhp-fh_dentry, mnt, name, value, size, 0); else { if (!S_ISDIR(inode-i_mode) type == ACL_TYPE_DEFAULT) error = 0; else { - error = vfs_removexattr(fhp-fh_dentry, name); + error = vfs_removexattr(fhp-fh_dentry, mnt, name); if (error == -ENODATA) error = 0; } --- a/fs/unionfs/xattr.c +++ b/fs/unionfs/xattr.c @@ -106,6 +106,7 @@ out: int unionfs_removexattr(struct dentry *dentry, const char *name) { struct dentry *lower_dentry = NULL; + struct vfsmount *lower_mnt; int err = -EOPNOTSUPP; unionfs_read_lock(dentry-d_sb); @@ -117,8 +118,9 @@ int unionfs_removexattr(struct dentry *d } lower_dentry = unionfs_lower_dentry(dentry); + lower_mnt = unionfs_lower_mnt(dentry); - err = vfs_removexattr(lower_dentry, (char *) name); + err = vfs_removexattr(lower_dentry, lower_mnt, (char *) name); out: unionfs_check_dentry(dentry); --- a/fs/xattr.c +++ b/fs/xattr.c @@ -190,7 +190,7 @@ vfs_listxattr(struct dentry *dentry, str EXPORT_SYMBOL_GPL(vfs_listxattr); int -vfs_removexattr(struct dentry *dentry, char *name) +vfs_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name) { struct inode 
*inode = dentry-d_inode; int error; @@ -476,7 +476,7 @@ sys_flistxattr(int fd, char __user *list * Extended attribute REMOVE operations */ static long -removexattr(struct dentry *d, char __user *name) +removexattr(struct dentry *dentry, struct vfsmount *mnt, char __user *name) { int error; char kname[XATTR_NAME_MAX + 1]; @@ -487,7 +487,7 @@ removexattr(struct dentry *d, char __use if (error 0) return error; - return vfs_removexattr(d, kname); + return vfs_removexattr(dentry, mnt, kname); } asmlinkage long @@ -499,7 +499,7 @@ sys_removexattr(char __user *path, char error = user_path_walk(path, nd); if (error) return error; - error = removexattr(nd.path.dentry, name); + error = removexattr(nd.path.dentry, nd.path.mnt, name); path_put(nd.path); return error; } @@ -513,7 +513,7 @@ sys_lremovexattr(char __user *path, char error = user_path_walk_link(path, nd); if (error) return error; - error = removexattr(nd.path.dentry, name); + error = removexattr(nd.path.dentry, nd.path.mnt, name); path_put(nd.path); return error; } @@ -530,7 +530,7 @@ sys_fremovexattr(int fd, char __user *na return error; dentry = f-f_path.dentry; audit_inode(NULL, dentry); - error = removexattr(dentry, name); + error = removexattr(dentry, f-f_path.mnt, name); fput(f); return error; } --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -51,7 +51,7 @@ ssize_t vfs_getxattr(struct dentry *, st ssize_t vfs_listxattr(struct dentry *d, struct vfsmount *, char *list, size_t size); int vfs_setxattr(struct dentry *, struct vfsmount *, char *, void *, size_t, int); -int vfs_removexattr(struct dentry *, char *); +int vfs_removexattr(struct dentry *, struct vfsmount *, char *); ssize_t generic_getxattr(struct dentry *dentry, const char *name, void *buffer, size_t size); ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size); -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at 
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[AppArmor 28/47] Pass struct vfsmount to the inode_removexattr LSM hook
This is needed for computing pathnames in the AppArmor LSM. Signed-off-by: Tony Jones [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/xattr.c |2 +- include/linux/security.h | 13 - security/commoncap.c |3 ++- security/dummy.c |3 ++- security/security.c |5 +++-- security/selinux/hooks.c |3 ++- 6 files changed, 18 insertions(+), 11 deletions(-) --- a/fs/xattr.c +++ b/fs/xattr.c @@ -202,7 +202,7 @@ vfs_removexattr(struct dentry *dentry, s if (error) return error; - error = security_inode_removexattr(dentry, name); + error = security_inode_removexattr(dentry, mnt, name); if (error) return error; --- a/include/linux/security.h +++ b/include/linux/security.h @@ -58,7 +58,7 @@ extern int cap_bprm_set_security (struct extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe); extern int cap_bprm_secureexec(struct linux_binprm *bprm); extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, char *name, void *value, size_t size, int flags); -extern int cap_inode_removexattr(struct dentry *dentry, char *name); +extern int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name); extern int cap_inode_need_killpriv(struct dentry *dentry); extern int cap_inode_killpriv(struct dentry *dentry); extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags); @@ -1319,7 +1319,8 @@ struct security_operations { int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt, char *name); int (*inode_listxattr) (struct dentry *dentry, struct vfsmount *mnt); - int (*inode_removexattr) (struct dentry *dentry, char *name); + int (*inode_removexattr) (struct dentry *dentry, struct vfsmount *mnt, + char *name); int (*inode_need_killpriv) (struct dentry *dentry); int (*inode_killpriv) (struct dentry *dentry); int (*inode_getsecurity)(const struct inode *inode, const char *name, void **buffer, bool alloc); @@ -1591,7 +1592,8 @@ void 
security_inode_post_setxattr(struct int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt, char *name); int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt); -int security_inode_removexattr(struct dentry *dentry, char *name); +int security_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt, + char *name); int security_inode_need_killpriv(struct dentry *dentry); int security_inode_killpriv(struct dentry *dentry); int security_inode_getsecurity(const struct inode *inode, const char *name, void **buffer, bool alloc); @@ -2002,9 +2004,10 @@ static inline int security_inode_listxat return 0; } -static inline int security_inode_removexattr (struct dentry *dentry, char *name) +static inline int security_inode_removexattr (struct dentry *dentry, + struct vfsmount *mnt, char *name) { - return cap_inode_removexattr(dentry, name); + return cap_inode_removexattr(dentry, mnt, name); } static inline int security_inode_need_killpriv(struct dentry *dentry) --- a/security/commoncap.c +++ b/security/commoncap.c @@ -393,7 +393,8 @@ int cap_inode_setxattr(struct dentry *de return 0; } -int cap_inode_removexattr(struct dentry *dentry, char *name) +int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt, + char *name) { if (!strcmp(name, XATTR_NAME_CAPS)) { if (!capable(CAP_SETFCAP)) --- a/security/dummy.c +++ b/security/dummy.c @@ -404,7 +404,8 @@ static int dummy_inode_listxattr (struct return 0; } -static int dummy_inode_removexattr (struct dentry *dentry, char *name) +static int dummy_inode_removexattr (struct dentry *dentry, struct vfsmount *mnt, + char *name) { if (!strncmp(name, XATTR_SECURITY_PREFIX, sizeof(XATTR_SECURITY_PREFIX) - 1) --- a/security/security.c +++ b/security/security.c @@ -493,11 +493,12 @@ int security_inode_listxattr(struct dent return security_ops-inode_listxattr(dentry, mnt); } -int security_inode_removexattr(struct dentry *dentry, char *name) +int security_inode_removexattr(struct dentry 
*dentry, struct vfsmount *mnt, + char *name) { if (unlikely(IS_PRIVATE(dentry-d_inode))) return 0; - return security_ops-inode_removexattr(dentry, name); + return security_ops-inode_removexattr(dentry, mnt, name); } int security_inode_need_killpriv(struct dentry *dentry) --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2647,7 +2647,8 @@ static int
[AppArmor 29/47] Fix __d_path() for lazy unmounts and make it unambiguous
First, when __d_path() hits a lazily unmounted mount point, it tries to prepend the name of the lazily unmounted dentry to the path name. It gets this wrong, and also overwrites the slash that separates the name from the following pathname component. This patch fixes that; if a process was in directory /foo/bar and /foo got lazily unmounted, the old result was ``foobar'' (note the missing slash), while the new result with this patch is ``foo/bar''. Second, it isn't always possible to tell from the __d_path() result whether the specified root and rootmnt (i.e., the chroot) was reached. We need an unambiguous result for AppArmor at least though, so we make sure that paths will only start with a slash if the path leads all the way up to the root. We also add a @fail_deleted argument, which allows to get rid of some of the mess in sys_getcwd(). This patch leaves getcwd() and d_path() as they were before for everything except for bind-mounted directories; for them, it reports ``/foo/bar'' instead of ``foobar'' in the example described above. Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] Acked-by: Alan Cox [EMAIL PROTECTED] --- fs/dcache.c | 166 ++-- 1 file changed, 96 insertions(+), 70 deletions(-) --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1748,51 +1748,49 @@ shouldnt_be_hashed: } /** - * d_path - return the path of a dentry - * @dentry: dentry to report - * @vfsmnt: vfsmnt to which the dentry belongs - * @root: root dentry - * @rootmnt: vfsmnt to which the root dentry belongs + * __d_path - return the path of a dentry + * @path: path to report + * @root: root path * @buffer: buffer to return value in * @buflen: buffer length + * @fail_deleted: what to return for deleted files * - * Convert a dentry into an ASCII path name. If the entry has been deleted + * If @path is not connected to @root, the path returned will be relative + * (i.e., it will not start with a slash). * the string (deleted) is appended. 
Note that this is ambiguous. * * Returns the buffer or an error code if the path was too long. * - * buflen should be positive. Caller holds the dcache_lock. + * Returns the buffer or an error code. */ -static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt, - struct path *root, char *buffer, int buflen) +static char *__d_path(struct path *path, struct path *root, + char *buffer, int buflen, int fail_deleted) { - char * end = buffer+buflen; - char * retval; - int namelen; + int namelen, is_slash; + struct dentry *dentry = path-dentry; + struct vfsmount *vfsmnt = path-mnt; + + if (buflen 2) + return ERR_PTR(-ENAMETOOLONG); + buffer += --buflen; + *buffer = '\0'; - *--end = '\0'; - buflen--; + spin_lock(dcache_lock); if (!IS_ROOT(dentry) d_unhashed(dentry)) { - buflen -= 10; - end -= 10; - if (buflen 0) + if (fail_deleted) { + buffer = ERR_PTR(-ENOENT); + goto out; + } + if (buflen 10) goto Elong; - memcpy(end, (deleted), 10); + buflen -= 10; + buffer -= 10; + memcpy(buffer, (deleted), 10); } - - if (buflen 1) - goto Elong; - /* Get '/' right */ - retval = end-1; - *retval = '/'; - - for (;;) { + while (dentry != root-dentry || vfsmnt != root-mnt) { struct dentry * parent; - if (dentry == root-dentry vfsmnt == root-mnt) - break; if (dentry == vfsmnt-mnt_root || IS_ROOT(dentry)) { - /* Global root? */ spin_lock(vfsmount_lock); if (vfsmnt-mnt_parent == vfsmnt) { spin_unlock(vfsmount_lock); @@ -1806,28 +1804,67 @@ static char *__d_path(struct dentry *den parent = dentry-d_parent; prefetch(parent); namelen = dentry-d_name.len; - buflen -= namelen + 1; - if (buflen 0) + if (buflen namelen + 1) goto Elong; - end -= namelen; - memcpy(end, dentry-d_name.name, namelen); - *--end = '/'; - retval = end; + buflen -= namelen + 1; + buffer -= namelen; + memcpy(buffer, dentry-d_name.name, namelen); + *--buffer = '/'; dentry = parent; } + /* Get '/' right. 
*/ + if (*buffer != '/') + *--buffer = '/'; - return retval; +out: + spin_unlock(dcache_lock); + return buffer; global_root: + /* +* We went
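The right-to-left buffer handling described in this patch can be sketched in userspace. This is a simplified model, not the kernel implementation: toy_dentry and toy_path() are hypothetical names, there are no mounts or locks, and a dentry whose parent points to itself stands in for the top of a (possibly disconnected) tree. The sketch only aims to show the property the description asks for: the result starts with a slash only if the walk reached the requested root.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name plus a parent link. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;	/* points to itself at the top of a tree */
};

/*
 * Build the path of @d into the end of @buf, walking towards @root.
 * Returns a pointer into @buf, or NULL if the buffer is too small.
 * If the walk runs out of parents before reaching @root, the result is
 * relative (no leading slash).
 */
char *toy_path(struct toy_dentry *d, struct toy_dentry *root,
	       char *buf, size_t buflen)
{
	char *p;
	size_t len;

	assert(d && root && buf);
	if (buflen < 2)
		return NULL;
	p = buf + buflen - 1;
	*p = '\0';
	buflen--;			/* bytes still free in front of p */

	while (d != root) {
		if (d->parent == d) {
			/* Disconnected: drop the leading slash, if any. */
			size_t need_sep;

			if (*p == '/') {
				p++;
				buflen++;
			}
			if (strcmp(d->name, "/") == 0)
				return p;
			len = strlen(d->name);
			need_sep = (*p != '\0');
			if (buflen < len + need_sep)
				return NULL;
			if (need_sep)
				*--p = '/';
			p -= len;
			memcpy(p, d->name, len);
			return p;
		}
		len = strlen(d->name);
		if (buflen < len + 1)
			return NULL;
		buflen -= len + 1;
		p -= len;
		memcpy(p, d->name, len);
		*--p = '/';
		d = d->parent;
	}
	/* Reached the root: make sure the path starts with a slash. */
	if (*p != '/') {
		if (buflen < 1)
			return NULL;
		*--p = '/';
	}
	return p;
}
```

In this model, a dentry "bar" under "foo" under the root yields "/foo/bar", while the same "bar" under a detached "foo" yields "foo/bar", which matches the relative form the patch description aims for.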
[AppArmor 30/47] Make d_path() consistent across mount operations
The path that __d_path() computes can become slightly inconsistent when it races with mount operations: it grabs the vfsmount_lock when traversing mount points but immediately drops it again, only to re-grab it when it reaches the next mount point. The result is that the filename computed is not always consistent, and the file may never have had that name. (This is unlikely, but still possible.)

Fix this by grabbing the vfsmount_lock when the first mount point is reached, and holding onto it until the d_cache lookup is completed.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/dcache.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1766,7 +1766,7 @@ shouldnt_be_hashed:
 static char *__d_path(struct path *path, struct path *root,
 		      char *buffer, int buflen, int fail_deleted)
 {
-	int namelen, is_slash;
+	int namelen, is_slash, vfsmount_locked = 0;
 	struct dentry *dentry = path->dentry;
 	struct vfsmount *vfsmnt = path->mnt;
 
@@ -1791,14 +1791,14 @@ static char *__d_path(struct path *path,
 		struct dentry * parent;
 
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
-			spin_lock(&vfsmount_lock);
-			if (vfsmnt->mnt_parent == vfsmnt) {
-				spin_unlock(&vfsmount_lock);
-				goto global_root;
+			if (!vfsmount_locked) {
+				spin_lock(&vfsmount_lock);
+				vfsmount_locked = 1;
 			}
+			if (vfsmnt->mnt_parent == vfsmnt)
+				goto global_root;
 			dentry = vfsmnt->mnt_mountpoint;
 			vfsmnt = vfsmnt->mnt_parent;
-			spin_unlock(&vfsmount_lock);
 			continue;
 		}
 		parent = dentry->d_parent;
@@ -1817,6 +1817,8 @@ static char *__d_path(struct path *path,
 		*--buffer = '/';
 
 out:
+	if (vfsmount_locked)
+		spin_unlock(&vfsmount_lock);
 	spin_unlock(&dcache_lock);
 	return buffer;
 
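The locking change in this patch follows a common pattern: take the lock lazily the first time it is needed, remember that in a flag, and release it once on the way out. The following userspace sketch shows only that pattern. The toy_lock type and its helpers are hypothetical and merely count acquisitions. They are not spinlocks, and nothing here models the dcache.

```c
#include <assert.h>

/* Hypothetical lock that records how often it was taken. */
struct toy_lock {
	int held;
	int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Walk across @crossings mount points. The lock is taken at the first
 * crossing and kept until the walk is finished, so every crossing sees
 * the same state. Returns the number of times the lock was acquired.
 */
int walk_holding_lock(struct toy_lock *l, int crossings)
{
	int locked = 0;
	int i;

	for (i = 0; i < crossings; i++) {
		if (!locked) {
			toy_lock_acquire(l);
			locked = 1;
		}
		/* Stepping to the parent mount would happen here. */
	}
	if (locked)
		toy_lock_release(l);
	return l->acquisitions;
}
```

A walk over three crossings acquires the lock once instead of three times, and a walk with no crossings never touches it.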
[AppArmor 31/47] Add d_namespace_path() to compute namespace relative pathnames
In AppArmor, we are interested in pathnames relative to the namespace root. This is the same as d_path() except for the root where the search ends. Add a function for computing the namespace-relative path.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/dcache.c            |    4 ++--
 fs/namespace.c         |   26 ++++++++++++++++++++++++++
 include/linux/dcache.h |    1 +
 include/linux/mount.h  |    3 +++
 4 files changed, 32 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1763,8 +1763,8 @@ shouldnt_be_hashed:
  *
  * Returns the buffer or an error code.
  */
-static char *__d_path(struct path *path, struct path *root,
-		      char *buffer, int buflen, int fail_deleted)
+char *__d_path(struct path *path, struct path *root, char *buffer, int buflen,
+	       int fail_deleted)
 {
 	int namelen, is_slash, vfsmount_locked = 0;
 	struct dentry *dentry = path->dentry;
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -26,6 +26,7 @@
 #include <linux/security.h>
 #include <linux/mount.h>
 #include <linux/ramfs.h>
+#include <linux/path.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -2167,3 +2168,28 @@ void __put_mnt_ns(struct mnt_namespace *
 	release_mounts(&umount_list);
 	kfree(ns);
 }
+
+char *d_namespace_path(struct path *path, char *buf, int buflen)
+{
+	struct path root = { NULL, NULL };
+	struct vfsmount *rootmnt;
+	char *res;
+
+	read_lock(&current->fs->lock);
+	rootmnt = mntget(current->fs->root.mnt);
+	read_unlock(&current->fs->lock);
+	spin_lock(&vfsmount_lock);
+	if (rootmnt->mnt_ns)
+		root.mnt = mntget(rootmnt->mnt_ns->root);
+	spin_unlock(&vfsmount_lock);
+	mntput(rootmnt);
+	if (root.mnt)
+		root.dentry = dget(root.mnt->mnt_root);
+	res = __d_path(path, &root, buf, buflen, 1);
+	path_put(&root);
+	/* Prevent empty path for lazily unmounted filesystems. */
+	if (!IS_ERR(res) && *res == '\0')
+		*--res = '.';
+	return res;
+}
+EXPORT_SYMBOL(d_namespace_path);
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -302,6 +302,7 @@ extern int d_validate(struct dentry *, s
 extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...);
 
 extern char *d_path(struct path *, char *, int);
+extern char *__d_path(struct path *, struct path *, char *, int, int);
 
 /* Allocation counts.. */
 
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -16,6 +16,7 @@
 #include <linux/list.h>
 #include <linux/nodemask.h>
 #include <linux/spinlock.h>
+#include <linux/path.h>
 #include <asm/atomic.h>
 
 struct super_block;
@@ -114,5 +115,7 @@ extern void shrink_submounts(struct vfsm
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
 
+extern char *d_namespace_path(struct path *, char *, int);
+
 #endif
 #endif /* _LINUX_MOUNT_H */
[AppArmor 32/47] From: Miklos Szeredi [EMAIL PROTECTED]
Add a new file operation: f_op->fgetattr(), that is invoked by fstat().
Fall back to i_op->getattr() if it is not defined.

We need this because fstat() semantics can in some cases be better
implemented if the filesystem has the open file available.

Let's take the following example: we have a network filesystem, with the
server implemented as an unprivileged userspace process running on a UNIX
system (this is basically what sshfs does).

We want the filesystem to follow the familiar UNIX file semantics as
closely as possible. If for example we have this sequence of events, we
still would like fstat to work correctly:

 1) file X is opened on client
 2) file X is renamed to Y on server
 3) fstat() is performed on open file descriptor on client

This is only possible if the filesystem server actually uses fstat() on a
file descriptor obtained when the file was opened. Which means, the
filesystem client needs a way to get this information from the VFS.

Even if we assume that the remote filesystem never changes, it is
difficult to implement open-unlink-fstat semantics correctly in the
client without having this information.
Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/fuse/file.c     |   13 +
 fs/stat.c          |   29 -
 include/linux/fs.h |    1 +
 3 files changed, 42 insertions(+), 1 deletion(-)

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -871,6 +871,17 @@ static int fuse_file_flock(struct file *
 	return err;
 }
 
+static int fuse_file_fgetattr(struct file *file, struct kstat *stat)
+{
+	struct inode *inode = file->f_dentry->d_inode;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (!fuse_allow_task(fc, current))
+		return -EACCES;
+
+	return fuse_update_attributes(inode, stat, file, NULL);
+}
+
 static sector_t fuse_bmap(struct address_space *mapping, sector_t block)
 {
 	struct inode *inode = mapping->host;
@@ -920,6 +931,7 @@ static const struct file_operations fuse
 	.fsync		= fuse_fsync,
 	.lock		= fuse_file_lock,
 	.flock		= fuse_file_flock,
+	.fgetattr	= fuse_file_fgetattr,
 	.splice_read	= generic_file_splice_read,
 };
 
@@ -933,6 +945,7 @@ static const struct file_operations fuse
 	.fsync		= fuse_fsync,
 	.lock		= fuse_file_lock,
 	.flock		= fuse_file_flock,
+	.fgetattr	= fuse_file_fgetattr,
 	/* no mmap and splice_read */
 };
 
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -55,6 +55,33 @@ int vfs_getattr(struct vfsmount *mnt, st
 
 EXPORT_SYMBOL(vfs_getattr);
 
+/*
+ * Perform getattr on an open file
+ *
+ * Fall back to i_op->getattr (or generic_fillattr) if the filesystem
+ * doesn't define an f_op->fgetattr operation.
+ */
+static int vfs_fgetattr(struct file *file, struct kstat *stat)
+{
+	struct vfsmount *mnt = file->f_path.mnt;
+	struct dentry *dentry = file->f_path.dentry;
+	struct inode *inode = dentry->d_inode;
+	int retval;
+
+	retval = security_inode_getattr(mnt, dentry);
+	if (retval)
+		return retval;
+
+	if (file->f_op && file->f_op->fgetattr) {
+		return file->f_op->fgetattr(file, stat);
+	} else if (inode->i_op->getattr) {
+		return inode->i_op->getattr(mnt, dentry, stat);
+	} else {
+		generic_fillattr(inode, stat);
+		return 0;
+	}
+}
+
 int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat)
 {
 	struct nameidata nd;
@@ -101,7 +128,7 @@ int vfs_fstat(unsigned int fd, struct ks
 	int error = -EBADF;
 
 	if (f) {
-		error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+		error = vfs_fgetattr(f, stat);
 		fput(f);
 	}
 	return error;
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1190,6 +1190,7 @@ struct file_operations {
 	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
 	ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
 	int (*setlease)(struct file *, long, struct file_lock **);
+	int (*fgetattr)(struct file *, struct kstat *);
 };
 
 struct inode_operations {
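The open-unlink-fstat semantics that the fgetattr() patch description refers to can be observed from userspace on a local filesystem. The following is only a minimal sketch, not part of the patch: the function name and the temporary file name are made up for illustration, and it assumes a POSIX system with a writable /tmp.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, unlink it while it is still open, then
 * fstat() the descriptor. Returns 0 and fills *st on success, -1 on
 * error. (Hypothetical helper, for illustration only.)
 */
int fstat_after_unlink(struct stat *st)
{
	char name[] = "/tmp/fgetattr-demo-XXXXXX";	/* made-up template */
	int fd = mkstemp(name);
	int ret;

	if (fd < 0)
		return -1;
	if (write(fd, "hello", 5) != 5 || unlink(name) != 0) {
		close(fd);
		return -1;
	}
	/* The name is gone, but the open file is still reachable via fd. */
	ret = fstat(fd, st);
	close(fd);
	return ret;
}
```

A path-based stat() on the old name would fail at this point, which is why a filesystem that only ever sees the dentry cannot easily give the same answer as one that is handed the open file.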
[AppArmor 33/47] VFS: new fsetattr() file operation
From: Miklos Szeredi [EMAIL PROTECTED] Add a new file operation: f_op-fsetattr(), that is invoked by ftruncate, fchmod, fchown and utimensat. Fall back to i_op-setattr() if it is not defined. For the reasons why we need this, see patch adding fgetattr(). ftruncate() already passed the open file to the filesystem via the ia_file member of struct iattr. However it is cleaner to have a separate file operation for this, so remove ia_file and convert existing users: fuse, AFS, and reiserfs4. Use ATTR_FILE to allow LSMs to distinguish file descriptor based operations. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/afs/dir.c |1 + fs/afs/file.c|1 + fs/afs/inode.c | 19 +++ fs/afs/internal.h|1 + fs/attr.c| 18 ++ fs/fuse/dir.c| 20 +--- fs/fuse/file.c |7 +++ fs/fuse/fuse_i.h |4 fs/open.c| 25 + fs/reiser4/plugin/file/cryptcompress.c | 12 +++- fs/reiser4/plugin/file/file.c| 27 --- fs/reiser4/plugin/file/file.h|3 +++ fs/reiser4/plugin/file/file_conversion.c |7 +++ fs/reiser4/plugin/object.c |5 - fs/reiser4/plugin/plugin.h |1 + fs/utimes.c |2 +- include/linux/fs.h |9 ++--- 17 files changed, 114 insertions(+), 48 deletions(-) --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -45,6 +45,7 @@ const struct file_operations afs_dir_fil .release= afs_release, .readdir= afs_readdir, .lock = afs_lock, + .fsetattr = afs_fsetattr, }; const struct inode_operations afs_dir_inode_operations = { --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -36,6 +36,7 @@ const struct file_operations afs_file_op .fsync = afs_fsync, .lock = afs_lock, .flock = afs_flock, + .fsetattr = afs_fsetattr, }; const struct inode_operations afs_file_inode_operations = { --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -358,7 +358,8 @@ void afs_clear_inode(struct inode *inode /* * set the attributes of an inode */ -int afs_setattr(struct dentry *dentry, struct iattr *attr) +static int afs_do_setattr(struct dentry *dentry, struct iattr *attr, + struct file *file) { struct afs_vnode *vnode 
= AFS_FS_I(dentry-d_inode); struct key *key; @@ -380,8 +381,8 @@ int afs_setattr(struct dentry *dentry, s afs_writeback_all(vnode); } - if (attr-ia_valid ATTR_FILE) { - key = attr-ia_file-private_data; + if (file) { + key = file-private_data; } else { key = afs_request_key(vnode-volume-cell); if (IS_ERR(key)) { @@ -391,10 +392,20 @@ int afs_setattr(struct dentry *dentry, s } ret = afs_vnode_setattr(vnode, key, attr); - if (!(attr-ia_valid ATTR_FILE)) + if (!file) key_put(key); error: _leave( = %d, ret); return ret; } + +int afs_setattr(struct dentry *dentry, struct iattr *attr) +{ + return afs_do_setattr(dentry, attr, NULL); +} + +int afs_fsetattr(struct file *file, struct iattr *attr) +{ + return afs_do_setattr(file-f_path.dentry, attr, file); +} --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -550,6 +550,7 @@ extern void afs_zap_data(struct afs_vnod extern int afs_validate(struct afs_vnode *, struct key *); extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *); extern int afs_setattr(struct dentry *, struct iattr *); +extern int afs_fsetattr(struct file *, struct iattr *); extern void afs_clear_inode(struct inode *); /* --- a/fs/attr.c +++ b/fs/attr.c @@ -100,8 +100,8 @@ int inode_setattr(struct inode * inode, } EXPORT_SYMBOL(inode_setattr); -int notify_change(struct dentry *dentry, struct vfsmount *mnt, - struct iattr *attr) +int fnotify_change(struct dentry *dentry, struct vfsmount *mnt, + struct iattr *attr, struct file *file) { struct inode *inode = dentry-d_inode; mode_t mode = inode-i_mode; @@ -160,8 +160,12 @@ int notify_change(struct dentry *dentry, if (inode-i_op inode-i_op-setattr) { error = security_inode_setattr(dentry, mnt, attr); - if (!error) - error = inode-i_op-setattr(dentry, attr); + if (!error) { + if (file file-f_op file-f_op-fsetattr) + error = file-f_op-fsetattr(file, attr); + else +
[AppArmor 34/47] Pass struct file down the inode_*xattr security LSM hooks
This allows LSMs to also distinguish between file descriptor and path access for the xattr operations. (The other relevant operations are covered by the setattr hook.) Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] Signed-off-by: John Johansen [EMAIL PROTECTED] --- fs/nfsd/vfs.c| 12 + fs/unionfs/copyup.c | 12 + fs/unionfs/xattr.c |9 +++ fs/xattr.c | 60 +-- include/linux/security.h | 40 ++- include/linux/xattr.h| 10 --- security/commoncap.c |4 +-- security/dummy.c | 10 --- security/security.c | 21 +--- security/selinux/hooks.c | 10 --- 10 files changed, 107 insertions(+), 81 deletions(-) --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -413,7 +413,7 @@ static ssize_t nfsd_getxattr(struct dent { ssize_t buflen; - buflen = vfs_getxattr(dentry, mnt, key, NULL, 0); + buflen = vfs_getxattr(dentry, mnt, key, NULL, 0, NULL); if (buflen = 0) return buflen; @@ -421,7 +421,7 @@ static ssize_t nfsd_getxattr(struct dent if (!*buf) return -ENOMEM; - return vfs_getxattr(dentry, mnt, key, *buf, buflen); + return vfs_getxattr(dentry, mnt, key, *buf, buflen, NULL); } #endif @@ -447,7 +447,7 @@ set_nfsv4_acl_one(struct dentry *dentry, goto out; } - error = vfs_setxattr(dentry, mnt, key, buf, len, 0); + error = vfs_setxattr(dentry, mnt, key, buf, len, 0, NULL); out: kfree(buf); return error; @@ -2067,12 +2067,14 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i mnt = fhp-fh_export-ex_path.mnt; if (size) - error = vfs_setxattr(fhp-fh_dentry, mnt, name, value, size, 0); + error = vfs_setxattr(fhp-fh_dentry, mnt, name, value, size, 0, +NULL); else { if (!S_ISDIR(inode-i_mode) type == ACL_TYPE_DEFAULT) error = 0; else { - error = vfs_removexattr(fhp-fh_dentry, mnt, name); + error = vfs_removexattr(fhp-fh_dentry, mnt, name, + NULL); if (error == -ENODATA) error = 0; } --- a/fs/unionfs/copyup.c +++ b/fs/unionfs/copyup.c @@ -37,7 +37,8 @@ static int copyup_xattrs(struct dentry * char *name_list_buf = NULL; /* query the actual size of the xattr list */ - list_size = vfs_listxattr(old_lower_dentry, 
old_lower_mnt, NULL, 0); + list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, NULL, 0, + NULL); if (list_size = 0) { err = list_size; goto out; @@ -54,7 +55,7 @@ static int copyup_xattrs(struct dentry * /* now get the actual xattr list of the source file */ list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, name_list, - list_size); + list_size, NULL); if (list_size = 0) { err = list_size; goto out; @@ -74,7 +75,7 @@ static int copyup_xattrs(struct dentry * /* Lock here since vfs_getxattr doesn't lock for us */ mutex_lock(old_lower_dentry-d_inode-i_mutex); size = vfs_getxattr(old_lower_dentry, old_lower_mnt, name_list, - attr_value, XATTR_SIZE_MAX); + attr_value, XATTR_SIZE_MAX, NULL); mutex_unlock(old_lower_dentry-d_inode-i_mutex); if (size 0) { err = size; @@ -86,7 +87,7 @@ static int copyup_xattrs(struct dentry * } /* Don't lock here since vfs_setxattr does it for us. */ err = vfs_setxattr(new_lower_dentry, new_lower_mnt, name_list, - attr_value, size, 0); + attr_value, size, 0, NULL); /* * Selinux depends on security.* xattrs, so to maintain * the security of copied-up files, if Selinux is active, @@ -97,7 +98,8 @@ static int copyup_xattrs(struct dentry * if (err == -EPERM !capable(CAP_FOWNER)) { cap_raise(current-cap_effective, CAP_FOWNER); err = vfs_setxattr(new_lower_dentry, new_lower_mnt, - name_list, attr_value, size, 0); + name_list, attr_value, size, 0, + NULL); cap_lower(current-cap_effective, CAP_FOWNER); } if (err 0) --- a/fs/unionfs/xattr.c +++ b/fs/unionfs/xattr.c @@
[AppArmor 35/47] Factor out sysctl pathname code
Convert the selinux sysctl pathname computation code into a standalone
function.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]
Reviewed-by: James Morris [EMAIL PROTECTED]

---
 include/linux/sysctl.h   |    2 ++
 kernel/sysctl.c          |   27 +++
 security/selinux/hooks.c |   35 +--
 3 files changed, 34 insertions(+), 30 deletions(-)

--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -977,6 +977,8 @@ extern int proc_doulongvec_minmax(struct
 extern int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int,
 				      struct file *, void __user *, size_t *, loff_t *);
 
+extern char *sysctl_pathname(ctl_table *, char *, int);
+
 extern int do_sysctl (int __user *name, int nlen,
 		      void __user *oldval, size_t __user *oldlenp,
 		      void __user *newval, size_t newlen);
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1443,6 +1443,33 @@ void register_sysctl_root(struct ctl_tab
 	spin_unlock(&sysctl_lock);
 }
 
+char *sysctl_pathname(ctl_table *table, char *buffer, int buflen)
+{
+	if (buflen < 1)
+		return NULL;
+	buffer += --buflen;
+	*buffer = '\0';
+
+	while (table) {
+		int namelen = strlen(table->procname);
+
+		if (buflen < namelen + 1)
+			return NULL;
+		buflen -= namelen + 1;
+		buffer -= namelen;
+		memcpy(buffer, table->procname, namelen);
+		*--buffer = '/';
+		table = table->parent;
+	}
+	if (buflen < 4)
+		return NULL;
+	buffer -= 4;
+	memcpy(buffer, "/sys", 4);
+
+	return buffer;
+}
+EXPORT_SYMBOL(sysctl_pathname);
+
 #ifdef CONFIG_SYSCTL_SYSCALL
 int do_sysctl(int __user *name, int nlen, void __user *oldval,
 	       size_t __user *oldlenp, void __user *newval, size_t newlen)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1659,40 +1659,15 @@ static int selinux_capable(struct task_s
 
 static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
 {
-	int buflen, rc;
-	char *buffer, *path, *end;
+	char *buffer, *path;
+	int rc = -ENOMEM;
 
-	rc = -ENOMEM;
 	buffer = (char*)__get_free_page(GFP_KERNEL);
 	if (!buffer)
 		goto out;
-
-	buflen = PAGE_SIZE;
-	end = buffer+buflen;
-	*--end = '\0';
-	buflen--;
-	path = end-1;
-	*path = '/';
-	while (table) {
-		const char *name = table->procname;
-		size_t namelen = strlen(name);
-		buflen -= namelen + 1;
-		if (buflen < 0)
-			goto out_free;
-		end -= namelen;
-		memcpy(end, name, namelen);
-		*--end = '/';
-		path = end;
-		table = table->parent;
-	}
-	buflen -= 4;
-	if (buflen < 0)
-		goto out_free;
-	end -= 4;
-	memcpy(end, "/sys", 4);
-	path = end;
-	rc = security_genfs_sid("proc", path, tclass, sid);
-out_free:
+	path = sysctl_pathname(table, buffer, PAGE_SIZE);
+	if (path)
+		rc = security_genfs_sid("proc", path, tclass, sid);
 	free_page((unsigned long)buffer);
 out:
 	return rc;
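For readers who want to try the sysctl pathname logic outside the kernel, the same leaf-to-root technique can be sketched in userspace: the buffer is filled from its end towards its start while walking parent links. This is a rough illustration only. The struct demo_node type and the demo_pathname() name are hypothetical stand-ins for ctl_table and sysctl_pathname(), and the prefix is passed in rather than hard-coded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a ctl_table entry: a name and a parent link. */
struct demo_node {
	const char *name;
	const struct demo_node *parent;
};

/*
 * Build "<prefix>/a/b/c" for a leaf node c by filling the buffer from
 * its end towards its start. Returns a pointer into buffer, or NULL if
 * the path (including the terminating NUL) does not fit.
 */
char *demo_pathname(const struct demo_node *node, const char *prefix,
		    char *buffer, int buflen)
{
	int prefixlen = (int)strlen(prefix);

	if (buflen < 1)
		return NULL;
	buffer += --buflen;		/* last byte holds the terminator */
	*buffer = '\0';

	while (node) {
		int namelen = (int)strlen(node->name);

		if (buflen < namelen + 1)	/* name plus leading '/' */
			return NULL;
		buflen -= namelen + 1;
		buffer -= namelen;
		memcpy(buffer, node->name, namelen);
		*--buffer = '/';
		node = node->parent;
	}
	if (buflen < prefixlen)
		return NULL;
	buffer -= prefixlen;
	memcpy(buffer, prefix, prefixlen);

	return buffer;
}
```

Building backwards means the total length never has to be computed up front, and the returned pointer is simply wherever the walk stopped inside the caller's buffer.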
[AppArmor 36/47] Allow permission functions to tell between parent and leaf checks
Set the LOOKUP_CONTINUE flag when checking parent permissions. This allows
permission functions to tell between parent and leaf checks.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1511,6 +1511,8 @@ static inline int may_create(struct inod
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
+	if (nd)
+		nd->flags |= LOOKUP_CONTINUE;
 	return permission(dir,MAY_WRITE | MAY_EXEC, nd);
 }
 
[AppArmor 38/47] Switch to vfs_permission() in sys_fchdir()
Switch from file_permission() to vfs_permission() in sys_fchdir(): this
avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/open.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/fs/open.c
+++ b/fs/open.c
@@ -512,8 +512,8 @@ out:
 
 asmlinkage long sys_fchdir(unsigned int fd)
 {
+	struct nameidata nd;
 	struct file *file;
-	struct inode *inode;
 	int error;
 
 	error = -EBADF;
@@ -521,15 +521,16 @@ asmlinkage long sys_fchdir(unsigned int
 	if (!file)
 		goto out;
 
-	inode = file->f_path.dentry->d_inode;
+	nd.path = file->f_path;
+	nd.flags = 0;
 
 	error = -ENOTDIR;
-	if (!S_ISDIR(inode->i_mode))
+	if (!S_ISDIR(nd.path.dentry->d_inode->i_mode))
 		goto out_putf;
 
-	error = file_permission(file, MAY_EXEC);
+	error = vfs_permission(&nd, MAY_EXEC);
 	if (!error)
-		set_fs_pwd(current->fs, &file->f_path);
+		set_fs_pwd(current->fs, &nd.path);
 out_putf:
 	fput(file);
 out:
[AppArmor 39/47] Fix file_permission()
We cannot easily switch from file_permission() to vfs_permission()
everywhere, so fix file_permission() to not use a NULL nameidata for the
remaining users.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |    8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -313,7 +313,13 @@ int vfs_permission(struct nameidata *nd,
  */
 int file_permission(struct file *file, int mask)
 {
-	return permission(file->f_path.dentry->d_inode, mask, NULL);
+	struct nameidata nd;
+
+	nd.path.dentry = file->f_path.dentry;
+	nd.path.mnt = file->f_path.mnt;
+	nd.flags = LOOKUP_ACCESS;
+
+	return permission(nd.path.dentry->d_inode, mask, &nd);
 }
 
 /*
[AppArmor 40/47] Export audit subsystem for use by modules
Update kernel audit range comments to show AppArmor's registered range of
1500-1599. This range used to be reserved for LSPP but LSPP uses the
SE Linux range and the range was given to AppArmor.

Adds necessary export symbols for audit subsystem routines.

Changes audit_log_vformat to be externally visible (analogous to vprintf).

Patch is not in mainline -- pending AppArmor code submission to lkml

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 include/linux/audit.h |   12 +++-
 kernel/audit.c        |    6 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -33,7 +33,7 @@
  * 1200 - 1299 messages internal to the audit daemon
  * 1300 - 1399 audit event messages
  * 1400 - 1499 SE Linux use
- * 1500 - 1599 kernel LSPP events
+ * 1500 - 1599 AppArmor use
  * 1600 - 1699 kernel crypto events
  * 1700 - 1799 kernel anomaly records
  * 1800 - 1999 future kernel use (maybe integrity labels and related events)
@@ -118,6 +118,13 @@
 #define AUDIT_MAC_UNLBL_STCADD	1416	/* NetLabel: add a static label */
 #define AUDIT_MAC_UNLBL_STCDEL	1417	/* NetLabel: del a static label */
 
+#define AUDIT_APPARMOR_AUDIT	1501	/* AppArmor audited grants */
+#define AUDIT_APPARMOR_ALLOWED	1502	/* Allowed Access for learning */
+#define AUDIT_APPARMOR_DENIED	1503
+#define AUDIT_APPARMOR_HINT	1504	/* Process Tracking information */
+#define AUDIT_APPARMOR_STATUS	1505	/* Changes in config */
+#define AUDIT_APPARMOR_ERROR	1506	/* Internal AppArmor Errors */
+
 #define AUDIT_FIRST_KERN_ANOM_MSG   1700
 #define AUDIT_LAST_KERN_ANOM_MSG    1799
 #define AUDIT_ANOM_PROMISCUOUS      1700 /* Device changed promiscuous mode */
@@ -515,6 +522,9 @@ extern void		    audit_log(struct audit_
 				      __attribute__((format(printf,4,5)));
 
 extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+extern void		    audit_log_vformat(struct audit_buffer *ab,
+					      const char *fmt, va_list args)
+					      __attribute__((format(printf,2,0)));
 extern void		    audit_log_format(struct audit_buffer *ab,
 					     const char *fmt, ...)
 					     __attribute__((format(printf,2,3)));
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1215,8 +1215,7 @@ static inline int audit_expand(struct au
  * will be called a second time. Currently, we assume that a printk
  * can't format message larger than 1024 bytes, so we don't either.
  */
-static void audit_log_vformat(struct audit_buffer *ab, const char *fmt,
-			      va_list args)
+void audit_log_vformat(struct audit_buffer *ab, const char *fmt, va_list args)
 {
 	int len, avail;
 	struct sk_buff *skb;
@@ -1472,3 +1471,6 @@ EXPORT_SYMBOL(audit_log_start);
 EXPORT_SYMBOL(audit_log_end);
 EXPORT_SYMBOL(audit_log_format);
 EXPORT_SYMBOL(audit_log);
+EXPORT_SYMBOL_GPL(audit_log_vformat);
+EXPORT_SYMBOL_GPL(audit_log_untrustedstring);
+EXPORT_SYMBOL_GPL(audit_log_d_path);
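The point of making a va_list variant such as audit_log_vformat visible is the same as with vprintf(): a caller can write its own variadic wrapper and forward the arguments. A userspace sketch of that pattern follows. All names here (struct demo_buffer, demo_log_vformat(), demo_log_format(), demo_aa_log()) are hypothetical and only loosely modelled on the audit interface; vsnprintf() stands in for the kernel formatting code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log buffer standing in for struct audit_buffer. */
struct demo_buffer {
	char data[128];
	size_t len;
};

/* The v-variant takes a va_list, like vprintf() does. */
void demo_log_vformat(struct demo_buffer *ab, const char *fmt, va_list args)
{
	size_t avail = sizeof(ab->data) - ab->len;
	int n = vsnprintf(ab->data + ab->len, avail, fmt, args);

	if (n < 0)
		return;
	/* vsnprintf() reports the untruncated length; clamp to what fit. */
	ab->len += ((size_t)n < avail) ? (size_t)n : avail - 1;
}

/* Plain variadic front end, built on top of the v-variant. */
void demo_log_format(struct demo_buffer *ab, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	demo_log_vformat(ab, fmt, args);
	va_end(args);
}

/*
 * A module-defined wrapper: emit a fixed prefix, then forward the
 * caller's format and arguments. This only works because the
 * v-variant is callable from outside.
 */
void demo_aa_log(struct demo_buffer *ab, int type, const char *fmt, ...)
{
	va_list args;

	demo_log_format(ab, "type=%d ", type);
	va_start(args, fmt);
	demo_log_vformat(ab, fmt, args);
	va_end(args);
}
```

Without the v-variant, demo_aa_log() would have to format into a temporary string first and then pass that string on, which costs an extra buffer and an extra copy.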
[AppArmor 41/47] AppArmor: Main Part
The underlying functions by which the AppArmor LSM hooks are implemented. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] --- security/apparmor/main.c | 1357 +++ 1 file changed, 1357 insertions(+) --- /dev/null +++ b/security/apparmor/main.c @@ -0,0 +1,1357 @@ +/* + * Copyright (C) 2002-2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * AppArmor Core + */ + +#include linux/security.h +#include linux/namei.h +#include linux/audit.h +#include linux/mount.h +#include linux/ptrace.h + +#include apparmor.h + +#include inline.h + +/* + * Table of capability names: we generate it from capabilities.h. + */ +static const char *capability_names[] = { +#include capability_names.h +}; + +struct aa_namespace *default_namespace; + +static int aa_inode_mode(struct inode *inode) +{ + /* if the inode doesn't exist the user is creating it */ + if (!inode || current-fsuid == inode-i_uid) + return AA_USER_SHIFT; + return AA_OTHER_SHIFT; +} + +/** + * aa_file_denied - check for @mask access on a file + * @profile: profile to check against + * @name: pathname of file + * @mask: permission mask requested for file + * + * Return %0 on success, or else the permissions in @mask that the + * profile denies. + */ +static int aa_file_denied(struct aa_profile *profile, const char *name, + int mask) +{ + return (mask ~aa_match(profile-file_rules, name)); +} + +/** + * aa_link_denied - check for permission to link a file + * @profile: profile to check against + * @link: pathname of link being created + * @target: pathname of target to be linked to + * @target_mode: UGO shift for target inode + * @request_mask: the permissions subset valid only if link succeeds + * Return %0 on success, or else the permissions that the profile denies. 
+ */ +static int aa_link_denied(struct aa_profile *profile, const char *link, + const char *target, int target_mode, + int *request_mask) +{ + unsigned int state; + int l_mode, t_mode, denied_mask = 0; + int link_mask = AA_MAY_LINK target_mode; + + *request_mask = link_mask; + + l_mode = aa_match_state(profile-file_rules, DFA_START, link, state); + if (l_mode link_mask) { + int mode; + /* test to see if target can be paired with link */ + state = aa_dfa_null_transition(profile-file_rules, state); + mode = aa_match_state(profile-file_rules, state, target, + NULL); + + if (!(mode link_mask)) + denied_mask |= link_mask; + if (!(mode (AA_LINK_SUBSET_TEST target_mode))) + return denied_mask; + } + + /* do link perm subset test */ + t_mode = aa_match(profile-file_rules, target); + + /* Ignore valid-profile-transition flags. */ + l_mode = ~AA_SHARED_PERMS; + t_mode = ~AA_SHARED_PERMS; + + *request_mask = l_mode | link_mask; + + /* Link always requires 'l' on the link for both parts of the pair. +* If a subset test is required a permission subset test of the +* perms for the link are done against the user:group:other of the +* target's 'r', 'w', 'x', 'a', 'z', and 'm' permissions. +* +* If the link has 'x', an exact match of all the execute flags +* ('i', 'u', 'p'). 
safe exec is treated as a subset of unsafe exec +*/ +#define SUBSET_PERMS (AA_FILE_PERMS ~AA_LINK_BITS) + denied_mask |= ~l_mode link_mask; + if (l_mode SUBSET_PERMS) { + denied_mask |= (l_mode SUBSET_PERMS) ~t_mode; + if (denied_mask AA_EXEC_BITS) + denied_mask |= l_mode AA_ALL_EXEC_MODS; + else if (l_mode AA_EXEC_BITS) { + if (!(l_mode AA_USER_EXEC_UNSAFE)) + l_mode |= t_mode AA_USER_EXEC_UNSAFE; + if (l_mode AA_USER_EXEC + (l_mode AA_USER_EXEC_MODS) != + (t_mode AA_USER_EXEC_MODS)) + denied_mask |= AA_USER_EXEC | + (l_mode AA_USER_EXEC_MODS); + if (!(l_mode AA_OTHER_EXEC_UNSAFE)) + l_mode |= t_mode AA_OTHER_EXEC_UNSAFE; + if (l_mode AA_OTHER_EXEC + (l_mode AA_OTHER_EXEC_MODS) != + (t_mode AA_OTHER_EXEC_MODS)) + denied_mask |= AA_OTHER_EXEC | + (l_mode
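The convention documented for aa_file_denied() above (return 0 on success, otherwise the requested permissions that the profile denies) comes down to masking the request with the complement of what is allowed. A tiny standalone sketch, with made-up permission bits and a made-up function name that are not the AppArmor ones:

```c
/* Hypothetical permission bits, for illustration only. */
#define DEMO_MAY_EXEC	0x1
#define DEMO_MAY_WRITE	0x2
#define DEMO_MAY_READ	0x4

/*
 * Return 0 if every requested bit is allowed, otherwise the subset of
 * the request that is not allowed.
 */
int demo_denied(int requested, int allowed)
{
	return requested & ~allowed;
}
```

Returning the denied bits rather than a plain error code lets the caller report exactly which part of the request failed.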
[AppArmor 42/47] AppArmor: Module and LSM hooks
Module parameters, LSM hooks, initialization and teardown. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] --- security/apparmor/lsm.c | 815 1 file changed, 815 insertions(+) --- /dev/null +++ b/security/apparmor/lsm.c @@ -0,0 +1,815 @@ +/* + * Copyright (C) 1998-2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * AppArmor LSM interface + */ + +#include linux/security.h +#include linux/module.h +#include linux/mm.h +#include linux/mman.h +#include linux/mount.h +#include linux/namei.h +#include linux/ctype.h +#include linux/sysctl.h +#include linux/audit.h + +#include apparmor.h +#include inline.h + +/* Boot time disable flag */ +int apparmor_enabled = CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE; + +static int __init apparmor_enabled_setup(char *str) +{ + apparmor_enabled = simple_strtol(str, NULL, 0); + return 1; +} +__setup(apparmor=, apparmor_enabled_setup); + + +static int param_set_aabool(const char *val, struct kernel_param *kp); +static int param_get_aabool(char *buffer, struct kernel_param *kp); +#define param_check_aabool(name, p) __param_check(name, p, int) + +static int param_set_aauint(const char *val, struct kernel_param *kp); +static int param_get_aauint(char *buffer, struct kernel_param *kp); +#define param_check_aauint(name, p) __param_check(name, p, int) + +/* Flag values, also controllable via /sys/module/apparmor/parameters + * We define special types as we want to do additional mediation. + * + * Complain mode -- in complain mode access failures result in auditing only + * and task is allowed access. audit events are processed by userspace to + * generate policy. Default is 'enforce' (0). + * Value is also togglable per profile and referenced when global value is + * enforce. 
+ */ +int apparmor_complain = 0; +module_param_named(complain, apparmor_complain, aabool, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(apparmor_complain, Toggle AppArmor complain mode); + +/* Debug mode */ +int apparmor_debug = 0; +module_param_named(debug, apparmor_debug, aabool, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(apparmor_debug, Toggle AppArmor debug mode); + +/* Audit mode */ +int apparmor_audit = 0; +module_param_named(audit, apparmor_audit, aabool, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(apparmor_audit, Toggle AppArmor audit mode); + +/* Syscall logging mode */ +int apparmor_logsyscall = 0; +module_param_named(logsyscall, apparmor_logsyscall, aabool, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(apparmor_logsyscall, Toggle AppArmor logsyscall mode); + +/* Maximum pathname length before accesses will start getting rejected */ +unsigned int apparmor_path_max = 2 * PATH_MAX; +module_param_named(path_max, apparmor_path_max, aauint, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(apparmor_path_max, Maximum pathname length allowed); + +static int param_set_aabool(const char *val, struct kernel_param *kp) +{ + if (aa_task_context(current)) + return -EPERM; + return param_set_bool(val, kp); +} + +static int param_get_aabool(char *buffer, struct kernel_param *kp) +{ + if (aa_task_context(current)) + return -EPERM; + return param_get_bool(buffer, kp); +} + +static int param_set_aauint(const char *val, struct kernel_param *kp) +{ + if (aa_task_context(current)) + return -EPERM; + return param_set_uint(val, kp); +} + +static int param_get_aauint(char *buffer, struct kernel_param *kp) +{ + if (aa_task_context(current)) + return -EPERM; + return param_get_uint(buffer, kp); +} + +static int aa_reject_syscall(struct task_struct *task, gfp_t flags, +const char *name) +{ + struct aa_profile *profile = aa_get_profile(task); + int error = 0; + + if (profile) { + error = aa_audit_syscallreject(profile, flags, name); + aa_put_profile(profile); + } + + return error; +} + +static int apparmor_ptrace(struct 
task_struct *parent, + struct task_struct *child) +{ + struct aa_task_context *cxt; + int error = 0; + + /* +* parent can ptrace child when +* - parent is unconfined +* - parent child are in the same namespace +* - parent is in complain mode +* - parent and child are confined by the same profile +* - parent profile has CAP_SYS_PTRACE +*/ + + rcu_read_lock(); + cxt = aa_task_context(parent); + if (cxt) { + if (parent-nsproxy != child-nsproxy) { + struct aa_audit sa; + memset(sa, 0, sizeof(sa)); + sa.operation = ptrace; +
[PATCH] IB/ehca: Forward event client-reregister-required to registered clients
This patch allows ehca to forward the event client-reregister-required to
registered clients. Such an event is generated by the switch, e.g. after
its reboot.

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_irq.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 3f617b2..4c734ec 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -62,6 +62,7 @@
 #define NEQE_PORT_NUMBER       EHCA_BMASK_IBM( 8, 15)
 #define NEQE_PORT_AVAILABILITY EHCA_BMASK_IBM(16, 16)
 #define NEQE_DISRUPTIVE        EHCA_BMASK_IBM(16, 16)
+#define NEQE_SPECIFIC_EVENT    EHCA_BMASK_IBM(16, 23)
 
 #define ERROR_DATA_LENGTH      EHCA_BMASK_IBM(52, 63)
 #define ERROR_DATA_TYPE        EHCA_BMASK_IBM( 0, 7)
@@ -354,6 +355,7 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 {
 	u8 ec   = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe);
 	u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe);
+	u8 spec_event;
 
 	switch (ec) {
 	case 0x30: /* port availability change */
@@ -394,6 +396,16 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 	case 0x33:  /* trace stopped */
 		ehca_err(&shca->ib_device, "Traced stopped.");
 		break;
+	case 0x34: /* util async event */
+		spec_event = EHCA_BMASK_GET(NEQE_SPECIFIC_EVENT, eqe);
+		if (spec_event == 0x80) /* client reregister required */
+			dispatch_port_event(shca, port,
+					    IB_EVENT_CLIENT_REREGISTER,
+					    "client reregister req.");
+		else
+			ehca_warn(&shca->ib_device, "Unknown util async "
+				  "event %x on port %x", spec_event, port);
+		break;
 	default:
 		ehca_err(&shca->ib_device, "Unknown event code: %x on %s.",
 			 ec, shca->ib_device.name);
-- 
1.5.2
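As a reading aid for the NEQE_* field definitions in the ehca patch: the sketch below assumes, as the _IBM suffix suggests, that the (from, to) pairs use IBM bit numbering, where bit 0 is the most significant bit of the 64-bit event queue entry. That numbering is an assumption here, not something the patch states, and demo_ibm_bits() is a hypothetical helper, not the driver's EHCA_BMASK_GET macro.

```c
#include <stdint.h>

/*
 * Extract bits from..to of a 64-bit word, counting from the most
 * significant bit (bit 0), i.e. IBM bit numbering. Requires
 * from <= to <= 63.
 */
uint64_t demo_ibm_bits(uint64_t val, unsigned from, unsigned to)
{
	unsigned width = to - from + 1;
	unsigned shift = 63 - to;
	uint64_t mask = (width == 64) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << width) - 1);

	return (val >> shift) & mask;
}
```

Under that assumption, the port number (bits 8 to 15) and the specific event (bits 16 to 23) sit in the second and third bytes from the top of the entry, so a specific event value of 0x80 is the top bit of the third byte.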
[AppArmor 43/47] AppArmor: Profile loading and manipulation, pathname matching
Pathname matching, transition table loading, profile loading and manipulation. Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED] --- security/apparmor/match.c| 299 + security/apparmor/match.h| 85 +++ security/apparmor/module_interface.c | 805 +++ 3 files changed, 1189 insertions(+) --- /dev/null +++ b/security/apparmor/match.c @@ -0,0 +1,299 @@ +/* + * Copyright (C) 2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * Regular expression transition table matching + */ + +#include linux/kernel.h +#include linux/slab.h +#include linux/errno.h +#include apparmor.h +#include match.h + +static struct table_header *unpack_table(void *blob, size_t bsize) +{ + struct table_header *table = NULL; + struct table_header th; + size_t tsize; + + if (bsize sizeof(struct table_header)) + goto out; + + th.td_id = be16_to_cpu(*(u16 *) (blob)); + th.td_flags = be16_to_cpu(*(u16 *) (blob + 2)); + th.td_lolen = be32_to_cpu(*(u32 *) (blob + 8)); + blob += sizeof(struct table_header); + + if (!(th.td_flags == YYTD_DATA16 || th.td_flags == YYTD_DATA32 || + th.td_flags == YYTD_DATA8)) + goto out; + + tsize = table_size(th.td_lolen, th.td_flags); + if (bsize tsize) + goto out; + + table = kmalloc(tsize, GFP_KERNEL); + if (table) { + *table = th; + if (th.td_flags == YYTD_DATA8) + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u8, byte_to_byte); + else if (th.td_flags == YYTD_DATA16) + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u16, be16_to_cpu); + else + UNPACK_ARRAY(table-td_data, blob, th.td_lolen, +u32, be32_to_cpu); + } + +out: + return table; +} + +int unpack_dfa(struct aa_dfa *dfa, void *blob, size_t size) +{ + int hsize, i; + int error = -ENOMEM; + + /* get dfa table set header */ + if (size sizeof(struct table_set_header)) + goto fail; + + if 
(ntohl(*(u32 *)blob) != YYTH_MAGIC) + goto fail; + + hsize = ntohl(*(u32 *)(blob + 4)); + if (size hsize) + goto fail; + + blob += hsize; + size -= hsize; + + error = -EPROTO; + while (size 0) { + struct table_header *table; + table = unpack_table(blob, size); + if (!table) + goto fail; + + switch(table-td_id) { + case YYTD_ID_ACCEPT: + case YYTD_ID_BASE: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA32) + goto fail; + break; + case YYTD_ID_DEF: + case YYTD_ID_NXT: + case YYTD_ID_CHK: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA16) + goto fail; + break; + case YYTD_ID_EC: + dfa-tables[table-td_id - 1] = table; + if (table-td_flags != YYTD_DATA8) + goto fail; + break; + default: + kfree(table); + goto fail; + } + + blob += table_size(table-td_lolen, table-td_flags); + size -= table_size(table-td_lolen, table-td_flags); + } + + return 0; + +fail: + for (i = 0; i ARRAY_SIZE(dfa-tables); i++) { + if (dfa-tables[i]) { + kfree(dfa-tables[i]); + dfa-tables[i] = NULL; + } + } + return error; +} + +/** + * verify_dfa - verify that all the transitions and states in the dfa tables + * are in bounds. + * @dfa: dfa to test + * + * assumes dfa has gone through the verification done by unpacking + */ +int verify_dfa(struct aa_dfa *dfa) +{ + size_t i, state_count, trans_count; + int error = -EPROTO; + + /* check that required tables exist */ + if (!(dfa-tables[YYTD_ID_ACCEPT -1 ] + dfa-tables[YYTD_ID_DEF - 1] + dfa-tables[YYTD_ID_BASE - 1] + dfa-tables[YYTD_ID_NXT - 1] + dfa-tables[YYTD_ID_CHK - 1])) + goto out; + + /* accept.size == default.size == base.size */ + state_count =
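The header parsing and bounds check that unpack_table() performs above can be sketched in user space roughly as follows. This is only an illustration, not the patch code: all demo_* names are hypothetical, the flag values 1/2/4 and the 12-byte header size are assumptions (match.h is not shown in this excerpt), and any alignment padding that table_size() may apply is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for YYTD_DATA8/16/32; here the flag doubles as the entry width in bytes. */
enum { DEMO_DATA8 = 1, DEMO_DATA16 = 2, DEMO_DATA32 = 4 };

/* Assumed on-disk header size: id(2) + flags(2) + hilen(4) + lolen(4). */
#define DEMO_HEADER_SIZE 12u

struct demo_header {
	uint16_t id;
	uint16_t flags;
	uint32_t lolen;
};

/* Read big-endian values byte by byte, so host endianness does not matter. */
static uint16_t demo_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t demo_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the blob is too short or the header is malformed. */
int demo_parse_header(const unsigned char *blob, size_t bsize,
		      struct demo_header *out)
{
	assert(blob != NULL && out != NULL);

	/* The header itself must fit before any field is read. */
	if (bsize < DEMO_HEADER_SIZE)
		return -1;

	out->id = demo_be16(blob);
	out->flags = demo_be16(blob + 2);
	out->lolen = demo_be32(blob + 8);

	if (out->flags != DEMO_DATA8 && out->flags != DEMO_DATA16 &&
	    out->flags != DEMO_DATA32)
		return -1;

	/* Dividing instead of multiplying keeps the length check overflow-safe. */
	if (out->lolen > (bsize - DEMO_HEADER_SIZE) / out->flags)
		return -1;

	return 0;
}
```

A caller would pass the remaining blob and its size, and advance past the header plus lolen entries only when the function returns 0.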
[AppArmor 44/47] AppArmor: all the rest
All the things that didn't nicely fit in a category on their own: kbuild
code, declarations and inline functions, /sys/kernel/security/apparmor
filesystem for controlling apparmor from user space, profile list
functions, locking documentation, /proc/$pid/task/$tid/attr/current
access.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/Kconfig      |    9 +
 security/apparmor/Makefile     |   13 +
 security/apparmor/apparmor.h   |  323 +
 security/apparmor/apparmorfs.c |  252 +++
 security/apparmor/inline.h     |  240 ++
 security/apparmor/list.c       |  156 +++
 security/apparmor/locking.txt  |   68
 security/apparmor/procattr.c   |  194
 8 files changed, 1255 insertions(+)

--- /dev/null
+++ b/security/apparmor/Kconfig
@@ -0,0 +1,9 @@
+config SECURITY_APPARMOR
+	tristate "AppArmor support"
+	depends on SECURITY!=n
+	help
+	  This enables the AppArmor security module.
+	  Required userspace tools (if they are not included in your
+	  distribution) and further information may be found at
+	  http://forge.novell.com/modules/xfmod/project/?apparmor
+	  If you are unsure how to answer this question, answer N.
--- /dev/null +++ b/security/apparmor/Makefile @@ -0,0 +1,13 @@ +# Makefile for AppArmor Linux Security Module +# +obj-$(CONFIG_SECURITY_APPARMOR) += apparmor.o + +apparmor-y := main.o list.o procattr.o lsm.o apparmorfs.o \ + module_interface.o match.o + +quiet_cmd_make-caps = GEN $@ +cmd_make-caps = sed -n -e /CAP_FS_MASK/d -e s/^\#define[ \\t]\\+CAP_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\$$/[\\2] = \\\1\,/p $ | tr A-Z a-z $@ + +$(obj)/main.o : $(obj)/capability_names.h +$(obj)/capability_names.h : $(srctree)/include/linux/capability.h + $(call cmd,make-caps) --- /dev/null +++ b/security/apparmor/apparmor.h @@ -0,0 +1,323 @@ +/* + * Copyright (C) 1998-2007 Novell/SUSE + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + * AppArmor internal prototypes + */ + +#ifndef __APPARMOR_H +#define __APPARMOR_H + +#include linux/sched.h +#include linux/fs.h +#include linux/binfmts.h +#include linux/rcupdate.h + +/* + * We use MAY_READ, MAY_WRITE, MAY_EXEC, MAY_APPEND and the following flags + * for profile permissions + */ +#define AA_MAY_LINK0x0010 +#define AA_MAY_LOCK0x0020 +#define AA_EXEC_MMAP 0x0040 +#define AA_EXEC_UNSAFE 0x0080 +#define AA_EXEC_MOD_0 0x0100 +#define AA_EXEC_MOD_1 0x0200 +#define AA_BASE_PERMS (MAY_READ | MAY_WRITE | MAY_EXEC | \ +MAY_APPEND | AA_MAY_LINK | \ +AA_MAY_LOCK | AA_EXEC_MMAP | \ +AA_EXEC_UNSAFE | AA_EXEC_MOD_0 | \ +AA_EXEC_MOD_1) +#define AA_LINK_SUBSET_TEST0x0020 + +#define AA_EXEC_UNCONFINED 0 +#define AA_EXEC_INHERITAA_EXEC_MOD_0 +#define AA_EXEC_PROFILEAA_EXEC_MOD_1 +#define AA_EXEC_PIX(AA_EXEC_MOD_0 | AA_EXEC_MOD_1) + +#define AA_EXEC_MODIFIERS (AA_EXEC_MOD_0 | AA_EXEC_MOD_1) + +#define AA_USER_SHIFT 0 +#define AA_OTHER_SHIFT 10 + +#define AA_USER_PERMS (AA_BASE_PERMS AA_USER_SHIFT) +#define AA_OTHER_PERMS (AA_BASE_PERMS AA_OTHER_SHIFT) + +#define AA_FILE_PERMS 
(AA_USER_PERMS | AA_OTHER_PERMS) + +#define AA_LINK_BITS ((AA_MAY_LINK AA_USER_SHIFT) | \ +(AA_MAY_LINK AA_OTHER_SHIFT)) + +#define AA_USER_EXEC (MAY_EXEC AA_USER_SHIFT) +#define AA_OTHER_EXEC (MAY_EXEC AA_OTHER_SHIFT) + +#define AA_USER_EXEC_MODS (AA_EXEC_MODIFIERS AA_USER_SHIFT) +#define AA_OTHER_EXEC_MODS (AA_EXEC_MODIFIERS AA_OTHER_SHIFT) + +#define AA_USER_EXEC_UNSAFE(AA_EXEC_UNSAFE AA_USER_SHIFT) +#define AA_OTHER_EXEC_UNSAFE (AA_EXEC_UNSAFE AA_OTHER_SHIFT) + +#define AA_EXEC_BITS (AA_USER_EXEC | AA_OTHER_EXEC) + +#define AA_ALL_EXEC_MODS (AA_USER_EXEC_MODS | \ +AA_OTHER_EXEC_MODS) + +/* shared permissions that are not duplicated in user:group:other */ +#define AA_CHANGE_PROFILE 0x4000 + +#define AA_SHARED_PERMS(AA_CHANGE_PROFILE) + +#define AA_VALID_PERM_MASK (AA_FILE_PERMS | AA_SHARED_PERMS) +
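To make the user/other layout of the permission mask above easier to follow, here is a small user-space sketch. It assumes the operator missing between each macro and its shift constant in the text above is `<<`, and it redefines the MAY_* values locally. The demo_* names are hypothetical and not part of the patch.

```c
#include <assert.h>

/* MAY_* values as found in linux/fs.h, redefined so the sketch builds in user space. */
#define DEMO_MAY_EXEC    0x0001u
#define DEMO_MAY_WRITE   0x0002u
#define DEMO_MAY_READ    0x0004u
#define DEMO_MAY_APPEND  0x0008u

/* The ten low bits covered by AA_BASE_PERMS (MAY_* plus 0x0010 through 0x0200). */
#define DEMO_BASE_PERMS  0x03ffu

/* Same shift values as AA_USER_SHIFT and AA_OTHER_SHIFT in the header. */
#define DEMO_USER_SHIFT  0
#define DEMO_OTHER_SHIFT 10

/* Pack one ten-bit permission set for the owner and one for everybody else. */
unsigned int demo_pack_perms(unsigned int user, unsigned int other)
{
	assert((user & ~DEMO_BASE_PERMS) == 0);
	assert((other & ~DEMO_BASE_PERMS) == 0);
	return (user << DEMO_USER_SHIFT) | (other << DEMO_OTHER_SHIFT);
}

/* Extract the permission set that applies to the caller. */
unsigned int demo_perms_for(unsigned int mask, int is_owner)
{
	int shift = is_owner ? DEMO_USER_SHIFT : DEMO_OTHER_SHIFT;

	return (mask >> shift) & DEMO_BASE_PERMS;
}
```

With this layout a single integer carries both permission sets, so a check only needs to pick the shift that matches the caller before testing bits.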
[AppArmor 45/47] Add AppArmor LSM to security/Makefile
Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/Kconfig          |    1 +
 security/Makefile         |    1 +
 security/apparmor/Kconfig |   22 ++++++++++++++++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -105,6 +105,7 @@ config SECURITY_ROOTPLUG

 source security/selinux/Kconfig
 source security/smack/Kconfig
+source security/apparmor/Kconfig

 endmenu

--- a/security/Makefile
+++ b/security/Makefile
@@ -16,5 +16,6 @@ obj-$(CONFIG_SECURITY)			+= security.o d
 # Must precede capability.o in order to stack properly.
 obj-$(CONFIG_SECURITY_SELINUX)		+= selinux/built-in.o
 obj-$(CONFIG_SECURITY_SMACK)		+= commoncap.o smack/built-in.o
+obj-$(CONFIG_SECURITY_APPARMOR)		+= commoncap.o apparmor/
 obj-$(CONFIG_SECURITY_CAPABILITIES)	+= commoncap.o capability.o
 obj-$(CONFIG_SECURITY_ROOTPLUG)		+= commoncap.o root_plug.o
--- a/security/apparmor/Kconfig
+++ b/security/apparmor/Kconfig
@@ -1,9 +1,27 @@
 config SECURITY_APPARMOR
-	tristate "AppArmor support"
-	depends on SECURITY!=n
+	bool "AppArmor support"
+	depends on SECURITY
+	select AUDIT
 	help
 	  This enables the AppArmor security module.
 	  Required userspace tools (if they are not included in your
 	  distribution) and further information may be found at
 	  http://forge.novell.com/modules/xfmod/project/?apparmor
+
 	  If you are unsure how to answer this question, answer N.
+
+config SECURITY_APPARMOR_BOOTPARAM_VALUE
+	int "AppArmor boot parameter default value"
+	depends on SECURITY_APPARMOR
+	range 0 1
+	default 1
+	help
+	  This option sets the default value for the kernel parameter
+	  'apparmor', which allows AppArmor to be enabled or disabled
+	  at boot. If this option is set to 0 (zero), the AppArmor
+	  kernel parameter will default to 0, disabling AppArmor at
+	  bootup. If this option is set to 1 (one), the AppArmor
+	  kernel parameter will default to 1, enabling AppArmor at
+	  bootup.
+
+	  If you are unsure how to answer this question, answer 1.
[AppArmor 46/47] add simple network toggles to apparmor
Signed-off-by: John Johansen [EMAIL PROTECTED] Signed-off-by: Jesse Michael [EMAIL PROTECTED] --- security/apparmor/Makefile |7 + security/apparmor/apparmor.h |7 + security/apparmor/lsm.c | 129 ++- security/apparmor/main.c | 96 ++ security/apparmor/module_interface.c | 20 + 5 files changed, 255 insertions(+), 4 deletions(-) --- a/security/apparmor/Makefile +++ b/security/apparmor/Makefile @@ -8,6 +8,11 @@ apparmor-y := main.o list.o procattr.o l quiet_cmd_make-caps = GEN $@ cmd_make-caps = sed -n -e /CAP_FS_MASK/d -e s/^\#define[ \\t]\\+CAP_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\$$/[\\2] = \\\1\,/p $ | tr A-Z a-z $@ -$(obj)/main.o : $(obj)/capability_names.h +quiet_cmd_make-af = GEN $@ +cmd_make-af = sed -n -e /AF_MAX/d -e /AF_LOCAL/d -e s/^\#define[ \\t]\\+AF_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\\(.*\\)\$$/[\\2] = \\\1\,/p $ | tr A-Z a-z $@ + +$(obj)/main.o : $(obj)/capability_names.h $(obj)/af_names.h $(obj)/capability_names.h : $(srctree)/include/linux/capability.h $(call cmd,make-caps) +$(obj)/af_names.h : $(srctree)/include/linux/socket.h + $(call cmd,make-af) --- a/security/apparmor/apparmor.h +++ b/security/apparmor/apparmor.h @@ -16,6 +16,8 @@ #include linux/fs.h #include linux/binfmts.h #include linux/rcupdate.h +#include linux/socket.h +#include net/sock.h /* * We use MAY_READ, MAY_WRITE, MAY_EXEC, MAY_APPEND and the following flags @@ -169,6 +171,7 @@ struct aa_profile { struct list_head task_contexts; spinlock_t lock; unsigned long int_flags; + u16 network_families[AF_MAX]; }; extern struct list_head profile_ns_list; @@ -215,6 +218,7 @@ struct aa_audit { int request_mask, denied_mask; struct iattr *iattr; pid_t task, parent; + int family, type, protocol; int error_code; }; @@ -276,6 +280,9 @@ extern void aa_change_task_context(struc struct aa_profile *previous_profile); extern int aa_may_ptrace(struct aa_task_context *cxt, struct aa_profile *tracee); +extern int aa_net_perm(struct aa_profile *profile, char *operation, + int family, int type, int 
protocol); +extern int aa_revalidate_sk(struct sock *sk, char *operation); /* list.c */ extern struct aa_namespace *__aa_find_namespace(const char *name, --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -18,6 +18,7 @@ #include linux/ctype.h #include linux/sysctl.h #include linux/audit.h +#include net/sock.h #include apparmor.h #include inline.h @@ -616,6 +617,117 @@ static void apparmor_task_free_security( aa_release(task); } +static int apparmor_socket_create(int family, int type, int protocol, int kern) +{ + struct aa_profile *profile; + int error = 0; + + if (kern) + return 0; + + profile = aa_get_profile(current); + if (profile) + error = aa_net_perm(profile, socket_create, family, + type, protocol); + aa_put_profile(profile); + + return error; +} + +static int apparmor_socket_post_create(struct socket *sock, int family, + int type, int protocol, int kern) +{ + struct sock *sk = sock-sk; + + if (kern) + return 0; + + return aa_revalidate_sk(sk, socket_post_create); +} + +static int apparmor_socket_bind(struct socket *sock, + struct sockaddr *address, int addrlen) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_bind); +} + +static int apparmor_socket_connect(struct socket *sock, + struct sockaddr *address, int addrlen) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_connect); +} + +static int apparmor_socket_listen(struct socket *sock, int backlog) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_listen); +} + +static int apparmor_socket_accept(struct socket *sock, struct socket *newsock) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_accept); +} + +static int apparmor_socket_sendmsg(struct socket *sock, + struct msghdr *msg, int size) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_sendmsg); +} + +static int apparmor_socket_recvmsg(struct socket *sock, + struct msghdr *msg, int size, int flags) +{ + struct sock *sk = sock-sk; + + 
return aa_revalidate_sk(sk, socket_recvmsg); +} + +static int apparmor_socket_getsockname(struct socket *sock) +{ + struct sock *sk = sock-sk; + + return aa_revalidate_sk(sk, socket_getsockname); +} + +static int
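The body of aa_net_perm() is not part of this excerpt. As a rough illustration of how a per-family u16 such as network_families[AF_MAX] could act as a set of simple toggles, here is a user-space sketch that assumes one bit per socket type. The bit layout, the bounds and every demo_* name are assumptions made for illustration, not something taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_AF_MAX   64	/* hypothetical bound; the kernel's AF_MAX differs */
#define DEMO_TYPE_MAX 16	/* a 16-bit mask has room for 16 socket types */

struct demo_profile {
	uint16_t network_families[DEMO_AF_MAX];
};

/* Turn on one (family, type) pair in the profile. */
void demo_allow(struct demo_profile *p, int family, int type)
{
	assert(p != 0);
	assert(family >= 0 && family < DEMO_AF_MAX);
	assert(type >= 0 && type < DEMO_TYPE_MAX);
	p->network_families[family] |= (uint16_t)(1u << type);
}

/* Returns 0 if allowed, -EACCES if denied, -EINVAL for out-of-range input. */
int demo_net_perm(const struct demo_profile *p, int family, int type)
{
	if (family < 0 || family >= DEMO_AF_MAX)
		return -EINVAL;
	if (type < 0 || type >= DEMO_TYPE_MAX)
		return -EINVAL;

	return (p->network_families[family] & (1u << type)) ? 0 : -EACCES;
}
```

In the hooks above, a check of this kind would be skipped for kernel-internal sockets, which is what the early return on kern in apparmor_socket_create() does.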
[AppArmor 37/47] Switch to vfs_permission() in do_path_lookup()
Switch from file_permission() to vfs_permission() in do_path_lookup():
this avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1150,24 +1150,21 @@ static int fastcall do_path_lookup(int d
 		path_get(&fs->pwd);
 		read_unlock(&fs->lock);
 	} else {
-		struct dentry *dentry;
-
 		file = fget_light(dfd, &fput_needed);
 		retval = -EBADF;
 		if (!file)
 			goto out_fail;

-		dentry = file->f_path.dentry;
+		nd->path = file->f_path;

 		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
+		if (!S_ISDIR(nd->path.dentry->d_inode->i_mode))
 			goto fput_fail;

-		retval = file_permission(file, MAY_EXEC);
+		retval = vfs_permission(nd, MAY_EXEC);
 		if (retval)
 			goto fput_fail;

-		nd->path = file->f_path;
 		path_get(&file->f_path);

 		fput_light(file, fput_needed);
[AppArmor 47/47] --- Documentation/lsm/AppArmor-Security-Goal.txt | 134 +++++++++++++++++++++++++++ 1 file changed, 134 insertions(+)
--- /dev/null
+++ b/Documentation/lsm/AppArmor-Security-Goal.txt
@@ -0,0 +1,134 @@
+AppArmor Security Goal
+Crispin Cowan, PhD
+MercenaryLinux.com
+
+This document specifies the security goal that AppArmor is intended to
+achieve, so that users can evaluate whether AppArmor will meet their
+needs, and kernel developers can evaluate whether AppArmor is living up
+to its claims. This document is *not* a general purpose explanation of
+how AppArmor works, nor is it an explanation for why one might want to
+use AppArmor rather than some other system.
+
+AppArmor is intended to limit system damage from attackers exploiting
+vulnerabilities in applications that the system hosts. The threat is
+that an attacker can cause a vulnerable application to do something
+unexpected and undesirable. AppArmor addresses this threat by confining
+the application to access only the resources it needs to access to
+execute properly, effectively imposing least privilege execution on
+the application.
+
+Applications interact with the rest of the system via resources
+including files, interprocess communication, networking, capabilities,
+and execution of other applications. The purpose of least privilege is
+to bound the damage that a malicious user or code can do by removing
+access to resources that the application does not need for its intended
+function. This is true for all access control systems, including AppArmor.
+
+The attacker is someone trying to gain the privileges of a process for
+themselves. For instance, a policy for a web server might grant read
+only access to most web documents, preventing an attacker who can
+corrupt the web server from defacing the web pages. A web server has
+access to the web server's local file system, and a network attacker
+trying to hack the web server does not have such file access. An e-mail
+attacker attempting to infect the recipient of the e-mail does not have
+access to the files that the victim user's mail client does. By limiting
+the scope of access for an application, AppArmor can limit the damage an
+attacker can do by exploiting vulnerabilities in applications.
+
+An application is one or more related processes performing a function,
+e.g. the gang of processes that constitute an Apache web server, or a
+Postfix mail server. AppArmor *only* confines processes that the
+AppArmor policy says it should confine, and other processes are
+permitted to do anything that DAC permits. This is sometimes known as a
+targeted security policy.
+
+AppArmor does not provide a default policy that applies to all
+processes. So to defend an entire host, you have to piece-wise confine
+each process that is exposed to potential attack. For instance, to
+defend a system against network attack, place AppArmor profiles around
+every application that accesses the network. This limits the damage a
+network attacker can do to the file system to only those files granted
+by the profiles for the network-available applications. Similarly, to
+defend a system against attack from the console, place AppArmor profiles
+around every application that accessed the keyboard and mouse. The
+system is defended in that the worst the attacker can do to corrupt
+the system is limited to the transitive closure of what the confined
+processes are allowed to access.
+
+AppArmor currently mediates access to files, ability to use POSIX.1e
+Capabilities, and coarse-grained control on network access. This is
+sufficient to prevent a confined process from *directly* corrupting the
+file system. It is not sufficient to prevent a confined process from
+*indirectly* corrupting the system by influencing some other process to
+do the dirty deed. But to do so requires a complicit process that can be
+manipulated through another channel such as IPC. A complicit process
+is either a malicious process the attacker somehow got control of, or is
+a process that is actively listening to IPC of some kind and can be
+corrupted via IPC.
+
+The only IPC that AppArmor mediates is access to named sockets, FIFOs,
+etc. that appear in the file system name space, a side effect of
+AppArmor's file access mediation. Future versions of AppArmor will
+mediate more resources, including finer grained network access controls,
+and controls on various forms of IPC.
+
+AppArmor specifies the programs to be confined and the resources they
+can access in a syntax similar to how users are accustomed to accessing
+those resources. So file access controls are specified using absolute
+paths with respect to the name space the process is in. POSIX.1e
+capabilities are specified by name. Network access controls currently
+are specified by simply naming the protocol that can be used e.g. tcp,
+udp, and in the future will be more general, resembling firewall rules.
+
+Thus the AppArmor security goal should be considered piecewise from the
+point of view of a single confined process: that process should only be
+able to access the resources specified in its