Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)

2007-12-20 Thread Jaswinder Singh
Hello Remy,

On 12/20/07, Remy Bohmer [EMAIL PROTECTED] wrote:
> So, Is this a serious requirement? Should this be possible?

I have noticed this problem:
[EMAIL PROTECTED]:~# cat /proc/loadavgrt
1.00 1.00 1.00 0/52 1158
[EMAIL PROTECTED]:~# cat /proc/loadavg
0.00 0.00 0.02 1/52 1159
[EMAIL PROTECTED]:~#

So I am curious whether the user could switch softirq threads or IRQ threads
from RT tasks to non-RT tasks for slow or less important hardware. This would
improve RT behaviour.

By default, all softirq threads and IRQ threads would remain RT tasks, but an
option that lets the user switch them to non-RT tasks would be good.
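
As a rough illustration of what such a switch could look like from user space
today, the sketch below demotes a thread to the default non-RT policy with
sched_setscheduler(2). The helper name demote_to_non_rt is made up for this
example, the PID of an IRQ thread would have to be looked up separately, and
whether demoting a given IRQ thread is safe depends on the hardware it serves.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: move the thread identified by pid to the default
 * non-RT policy (SCHED_OTHER). A pid of 0 means the calling thread.
 * Returns 0 on success or a positive errno value on failure.
 */
int demote_to_non_rt(pid_t pid)
{
	/* SCHED_OTHER only accepts a static priority of 0. */
	struct sched_param param = { .sched_priority = 0 };

	assert(pid >= 0);
	if (sched_setscheduler(pid, SCHED_OTHER, &param) == -1)
		return errno;
	return 0;
}
```

Changing the policy of another user's thread, or of a kernel thread, normally
needs root (CAP_SYS_NICE), so the call can fail with EPERM.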

Thank you,

Jaswinder Singh.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 7/8] debugfs: allow access to signed values

2007-12-20 Thread Stefano Brivio
debugfs: allow access to signed values

Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to remove
a cast in libfs, change the simple_attr_open prototype and thus fix the users as
well.

Cc: Johannes Berg [EMAIL PROTECTED]
Cc: Mattias Nissler [EMAIL PROTECTED]
To: Greg Kroah-Hartman [EMAIL PROTECTED]
To: Arnd Bergmann [EMAIL PROTECTED]
To: Akinobu Mita [EMAIL PROTECTED]
Signed-off-by: Stefano Brivio [EMAIL PROTECTED]
---

This version addresses Johannes' latest concerns. Please just discard v4.

---
 arch/powerpc/platforms/cell/spufs/file.c |   48 
 fs/debugfs/file.c                        |  186 +--
 fs/libfs.c                               |   21 +--
 include/linux/debugfs.h                  |   38 ++
 include/linux/fs.h                       |   25 ++--
 lib/fault-inject.c                       |   11 +
 6 files changed, 267 insertions(+), 62 deletions(-)

Index: wireless-2.6/fs/debugfs/file.c
===
--- wireless-2.6.orig/fs/debugfs/file.c
+++ wireless-2.6/fs/debugfs/file.c
@@ -56,11 +56,11 @@ const struct inode_operations debugfs_li
 	.follow_link	= debugfs_follow_link,
 };
 
-static void debugfs_u8_set(void *data, u64 val)
+static void debugfs_u8_set(void *data, unsigned long long val)
 {
 	*(u8 *)data = val;
 }
-static u64 debugfs_u8_get(void *data)
+static unsigned long long debugfs_u8_get(void *data)
 {
 	return *(u8 *)data;
 }
@@ -97,11 +97,11 @@ struct dentry *debugfs_create_u8(const c
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u8);
 
-static void debugfs_u16_set(void *data, u64 val)
+static void debugfs_u16_set(void *data, unsigned long long val)
 {
 	*(u16 *)data = val;
 }
-static u64 debugfs_u16_get(void *data)
+static unsigned long long debugfs_u16_get(void *data)
 {
 	return *(u16 *)data;
 }
@@ -138,11 +138,11 @@ struct dentry *debugfs_create_u16(const 
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u16);
 
-static void debugfs_u32_set(void *data, u64 val)
+static void debugfs_u32_set(void *data, unsigned long long val)
 {
 	*(u32 *)data = val;
 }
-static u64 debugfs_u32_get(void *data)
+static unsigned long long debugfs_u32_get(void *data)
 {
 	return *(u32 *)data;
 }
@@ -179,12 +179,12 @@ struct dentry *debugfs_create_u32(const 
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u32);
 
-static void debugfs_u64_set(void *data, u64 val)
+static void debugfs_u64_set(void *data, unsigned long long val)
 {
 	*(u64 *)data = val;
 }
 
-static u64 debugfs_u64_get(void *data)
+static unsigned long long debugfs_u64_get(void *data)
 {
 	return *(u64 *)data;
 }
@@ -221,6 +221,172 @@ struct dentry *debugfs_create_u64(const 
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u64);
 
+
+static void debugfs_s8_set(void *data, unsigned long long val)
+{
+   *(s8 *)data = val;
+}
+static unsigned long long debugfs_s8_get(void *data)
+{
+   return *(s8 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_s8, debugfs_s8_get, debugfs_s8_set, "%lld\n");
+
+/**
+ * debugfs_create_s8 - create a debugfs file that is used to read and write a signed 8-bit value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *  directory dentry if set.  If this parameter is %NULL, then the
+ *  file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ * from.
+ *
+ * This function creates a file in debugfs with the given name that
+ * contains the value of the variable @value.  If the @mode variable is so
+ * set, it can be read from, and written to.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, %NULL will be returned.
+ *
+ * If debugfs is not enabled in the kernel, the value -%ENODEV will be
+ * returned.  It is not wise to check for this value, but rather, check for
+ * %NULL or !%NULL instead as to eliminate the need for #ifdef in the calling
+ * code.
+ */
+struct dentry *debugfs_create_s8(const char *name, mode_t mode,
+				 struct dentry *parent, s8 *value)
+{
+	return debugfs_create_file(name, mode, parent, value, &fops_s8);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_s8);
+
+static void debugfs_s16_set(void *data, unsigned long long val)
+{
+   *(s16 *)data = val;
+}
+static unsigned long long debugfs_s16_get(void *data)
+{
+   return *(s16 *)data;
+}
+DEFINE_SIMPLE_ATTRIBUTE(fops_s16, debugfs_s16_get, debugfs_s16_set, "%lld\n");
+
+/**
+ * debugfs_create_s16 - create a debugfs file that is used to read and write a signed 16-bit value
+ * @name: a pointer to a string 

Re: [PATCH v4 7/8] debugfs: allow access to signed values

2007-12-20 Thread Arnd Bergmann
On Thursday 20 December 2007, Stefano Brivio wrote:
> debugfs: allow access to signed values
>
> Add debugfs_create_s{8,16,32,64}. For these to work properly, we need to remove
> a cast in libfs, change the simple_attr_open prototype and thus fix the users as
> well.
>
> Cc: Johannes Berg [EMAIL PROTECTED]
> Cc: Mattias Nissler [EMAIL PROTECTED]
> To: Greg Kroah-Hartman [EMAIL PROTECTED]
> To: Arnd Bergmann [EMAIL PROTECTED]
> To: Akinobu Mita [EMAIL PROTECTED]
> Signed-off-by: Stefano Brivio [EMAIL PROTECTED]

Have you checked that spufs still builds? I would guess that you need
to do the same interface changes there.

Also, Christoph has recently posted a suggestion for how to improve
the interface to allow the 'get' operation to return an error:
http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=14962

I'd suggest consolidating the two changes in order to avoid merge
conflicts.
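
To make the combination of the two ideas concrete, here is a small user-space
sketch of a 'get' callback that can fail and whose value is printed with a
signed format. It is not the libfs simple_attr code; the names attr_get_fn,
s8_get and attr_format are made up for illustration, and only the "%lld\n"
format is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical getter type: returns 0 or a negative errno, value via *val. */
typedef int (*attr_get_fn)(void *data, uint64_t *val);

/* Getter for a signed 8-bit variable; the value is sign-extended to 64 bits. */
static int s8_get(void *data, uint64_t *val)
{
	if (!data)
		return -EINVAL;
	*val = (uint64_t)(int64_t)*(int8_t *)data;
	return 0;
}

/* Format the attribute into buf; returns the length or a negative errno. */
static int attr_format(attr_get_fn get, void *data, char *buf, size_t len)
{
	uint64_t val;
	int ret;

	assert(get && buf);
	ret = get(data, &val);
	if (ret < 0)
		return ret;	/* the error reaches the reader, not a bogus value */
	return snprintf(buf, len, "%lld\n", (long long)val);
}
```

With an unsigned format the same -5 would show up as a very large positive
number, which is why the format string has to change together with the
accessors.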

Arnd 


Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)

2007-12-20 Thread Remy Bohmer
> By default, all softirq threads and IRQ threads would remain RT tasks, but an
> option that lets the user switch them to non-RT tasks would be good.

Okay, understood.
I will think about how to add that too. But we can probably add it
separately from this patchset.

Kind Regards,

Remy


2007/12/20, Jaswinder Singh [EMAIL PROTECTED]:
> Hello Remy,
>
> On 12/20/07, Remy Bohmer [EMAIL PROTECTED] wrote:
> > So, Is this a serious requirement? Should this be possible?
>
> I have noticed this problem:
> [EMAIL PROTECTED]:~# cat /proc/loadavgrt
> 1.00 1.00 1.00 0/52 1158
> [EMAIL PROTECTED]:~# cat /proc/loadavg
> 0.00 0.00 0.02 1/52 1159
> [EMAIL PROTECTED]:~#
>
> So I am curious whether the user could switch softirq threads or IRQ threads
> from RT tasks to non-RT tasks for slow or less important hardware. This would
> improve RT behaviour.
>
> By default, all softirq threads and IRQ threads would remain RT tasks, but an
> option that lets the user switch them to non-RT tasks would be good.
>
> Thank you,
>
> Jaswinder Singh.



Re: [PATCH 6/6] udf: fix sparse warnings (shadowing mismatch between declaration and definition)

2007-12-20 Thread Jan Kara
On Wed 19-12-07 19:35:14, Marcin Slusarz wrote:
> On Mon, Dec 17, 2007 at 05:50:17PM +0100, Jan Kara wrote:
> > > fix warnings:
> > > fs/udf/super.c:1320:24: warning: symbol 'bh' shadows an earlier one
> > > fs/udf/super.c:1240:21: originally declared here
> > > fs/udf/super.c:1583:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1418:6: originally declared here
> > > fs/udf/super.c:1585:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1418:6: originally declared here
> > > fs/udf/super.c:1658:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1648:6: originally declared here
> > > fs/udf/super.c:1660:4: warning: symbol 'i' shadows an earlier one
> > > fs/udf/super.c:1648:6: originally declared here
> > > fs/udf/super.c:450:6: warning: symbol 'udf_write_super' was not declared. Should it be static?
> > >
> > > Signed-off-by: Marcin Slusarz [EMAIL PROTECTED]
> > > CC: Ben Fennema [EMAIL PROTECTED]
> >   Thanks for the patch.  The 'bh' change is fine. The problems with 'i'
> > should be solved differently I think. Those functions UDF_SB_FREE,
> > UDF_SB_ALLOC_PARTMAPS should be functions and not macros. Please convert
> > those to either inline functions if they are small or to regular
> > functions if they are larger.  It won't be completely trivial because of
> > the hackery e.g. in UDF_SB_ALLOC_BITMAP. It gets an argument meaning on
> > which struct member something should be performed. But for example in
> > the UDF_SB_ALLOC_BITMAP case you can simply make the function return the
> > pointer to allocated and initialized space and the caller would assign
> > it to a proper element of the superblock.
> Ok, I'll try to do it.
  Thanks.

> > This would help the overall
> > code quality of UDF (which is sadly quite poor).
> If you have other suggestions how to clean up this code, let me know.
> I'll see what I can do with them ;)
  Hmm, it's hard to formulate precisely. It's nothing particular but all in
all the code is simply hard to read - all those strange names of variables,
unusual side effects of functions, ... But one specific problem I'm aware of
is the lack of error handling, so in case of an IO error or filesystem
corruption the results are quite spectacular (kernel crash, etc.).
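
  To make the suggested conversion a bit more concrete, here is a user-space
sketch of an allocation helper that returns the allocated and zeroed space
instead of taking a "which member" argument. The struct layout and the name
bitmap_alloc are invented for the example; this is not the actual UDF code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-partition bitmap; not the real UDF layout. */
struct bitmap {
	unsigned int nr_groups;
	void *blocks[];		/* one slot per block group */
};

/*
 * Allocate and zero a bitmap with room for nr_groups slots.
 * Returns NULL on failure. The caller assigns the result to whichever
 * superblock member it wants, so no "which member" argument is needed.
 */
static struct bitmap *bitmap_alloc(unsigned int nr_groups)
{
	struct bitmap *bm;

	assert(nr_groups > 0);
	bm = calloc(1, sizeof(*bm) + nr_groups * sizeof(bm->blocks[0]));
	if (!bm)
		return NULL;
	bm->nr_groups = nr_groups;
	return bm;
}
```

  Any local variable such as a loop counter then lives inside the function,
where it cannot shadow a variable of the caller, and the caller gets a chance
to handle the allocation failure.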

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread David Miller
From: Matt Mackall [EMAIL PROTECTED]
Date: Mon, 17 Dec 2007 08:55:54 -0600

> On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> Actually, you may only need these two:
>
>  maps4-add-proc-kpagecount-interface.patch
>  maps4-add-proc-kpageflags-interface.patch

Yes these two were enough, and exporting fs/proc/base.c's
mem_lseek().

As hard as I try, I can't reproduce this at all.  I tried
both on my workstation and my niagara boxes.

It must be another needle in the 30MB+ -mm haystack. :-(



Re: OOPS: 2.6.24-rc5-mm1 -- EIP is at r_show+0x2a/0x70 -- (triggered by cat /proc/iomem)

2007-12-20 Thread Miles Lane
On Dec 20, 2007 6:37 AM, David Howells [EMAIL PROTECTED] wrote:
> Andrew Morton [EMAIL PROTECTED] wrote:
>
> > I would be suspecting iget-stop-procfs-from-using-iget-and-read_inode.patch.
>
> I think your suspicions are very unlikely.  The patch only affects
> proc_get_inode() - and looking at the patch backtrace, it looks like the
> system is successfully past that already (it's unlikely that
> proc_reg_read+0x60/0x74 would have been reached otherwise).
>
> If my patch to procfs is wrong, it would affect all proc files and ought be
> immediately detectable.

I tested the patch Andrew sent.  I ran cat /proc/iomem before trying
a suspend-to-disk.  It worked fine.  Then I suspended and resumed.
This time, cat /proc/iomem caused the stack trace to be generated.
So, you are right, your patch is not the problem.

Thanks,
  Miles


Re: [PATCH -mm 00/43] user_regset framework -- arch maintainers take note!

2007-12-20 Thread Ingo Molnar

* Roland McGrath [EMAIL PROTECTED] wrote:

> This is a large series of patches, but there are only a couple that
> you need to read in detail to know how to get started on cleaning up
> your arch code (1, 4, 6).
>
> user_regset is a new kernel-internal interface into the arch code for
> accessing the user-space view of machine-specific state (registers et
> al--everything machine-specific that is visible via ptrace and the
> like, or should be).  The idea is that arch code will have just one
> place it has to support fetching and changing the user-visible machine
> state of a user thread.  This same interface can be used for writing
> core dumps, to underlie the implementation of PTRACE_GETREGS,
> PTRACE_SETREGS, and the like, and by any new set of debugging
> facilities that might come along.
[...]
> Patches 26 through 43 affect only arch/x86 code.  I have not CC'd
> these ones to linux-arch.  They include a bunch of cleanup that is
> specific to the idiosyncracies of the x86 code and isn't interesting
> as an example for what another arch would do.

thanks Roland - this is a really impressive set of cleanups and
generalizations!

Testing feedback: i've put the x86 and core bits into x86.git and your 
regset series has so far successfully passed a couple of hundred 
iterations of random-qa on 32-bit and 64-bit x86 as well. (with a few 
ptrace tests added to the mix as well) So it's all green as far as 
arch/x86 and core goes :-)

Ingo


[rfc][patch] mm: madvise(WILLNEED) for anonymous memory

2007-12-20 Thread Peter Zijlstra
Hi,

Lennart asked for madvise(WILLNEED) to work on anonymous pages; he plans
to use this to pre-fault pages. He currently uses mlock/munlock for this
purpose.
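
For reference, the user-space side of this could look roughly like the sketch
below: map anonymous memory and pass MADV_WILLNEED as a hint. The helper name
map_anon_willneed is made up for the example. EBADF is tolerated because that
is what a kernel without this patch returns for a mapping that has no file
behind it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and hint that it will be needed soon.
 * Stores the mapping in *out and returns 0, or returns a positive errno.
 * EBADF from madvise() is ignored: the hint was refused, but the mapping
 * itself is still usable.
 */
static int map_anon_willneed(size_t len, void **out)
{
	void *p;

	assert(len > 0 && out);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return errno;

	if (madvise(p, len, MADV_WILLNEED) == -1 && errno != EBADF) {
		int err = errno;

		munmap(p, len);
		return err;
	}
	*out = p;
	return 0;
}
```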

[ compile tested only ]

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---
diff --git a/mm/madvise.c b/mm/madvise.c
index 93ee375..eff60ce 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -100,6 +100,24 @@ out:
 	return error;
 }
 
+static long madvice_willneed_anon(struct vm_area_struct *vma,
+				  struct vm_area_struct **prev,
+				  unsigned long start, unsigned long end)
+{
+	int ret, len;
+
+	*prev = vma;
+	if (end > vma->vm_end)
+		end = vma->vm_end;
+
+	len = end - start;
+	ret = get_user_pages(current, current->mm, start, len,
+			0, 0, NULL, NULL);
+	if (ret < 0)
+		return ret;
+	return ret == len ? 0 : -1;
+}
+
 /*
  * Schedule all required I/O operations.  Do not wait for completion.
  */
@@ -110,7 +128,7 @@ static long madvise_willneed(struct vm_area_struct * vma,
 	struct file *file = vma->vm_file;
 
 	if (!file)
-		return -EBADF;
+		return madvice_willneed_anon(vma, prev, start, end);
 
 	if (file->f_mapping->a_ops->get_xip_page) {
 		/* no bad return value, but ignore advice */




[PATCH 0/4] add task handling notifier

2007-12-20 Thread Jan Beulich
With more and more sub-systems/sub-components leaving their footprint
in task handling functions, it seems reasonable to add notifiers that
these components can use instead of having them all patch themselves
directly into core files.

Patch 1 introduces the base definitions and hooks for task creation
and deletion.
Patch 2 switches delayacct to make use of the notifier.
Patch 3 makes the procevents/connector use the infrastructure and adds
additional notifiers needed there.
Patch 4 makes the security keys handling use this, too.
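
As a simplified picture of the mechanism, the following user-space sketch
shows a chain of callbacks that core code invokes on task events. It is not
the kernel notifier API: struct notifier, chain_register and chain_call are
stand-ins invented for the example, and there is no locking. Only the event
values for TASK_NEW and TASK_DELETE are the ones patch 1 defines.

```c
#include <assert.h>
#include <stddef.h>

/* Event codes; TASK_NEW and TASK_DELETE use the values from patch 1. */
#define TASK_NEW	1
#define TASK_DELETE	2

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long action, void *task);
	struct notifier *next;
};

static struct notifier *task_chain;	/* head of the chain, no locking here */

static void chain_register(struct notifier *nb)
{
	assert(nb && nb->call);
	nb->next = task_chain;
	task_chain = nb;
}

/* Invoke every registered callback; returns how many were called. */
static int chain_call(unsigned long action, void *task)
{
	int n = 0;

	for (struct notifier *nb = task_chain; nb; nb = nb->next, n++)
		nb->call(nb, action, task);
	return n;
}

/* Example subscriber: counts task creations, ignores everything else. */
static unsigned long nr_created;

static int count_new_tasks(struct notifier *nb, unsigned long action, void *task)
{
	(void)nb;
	(void)task;
	if (action == TASK_NEW)
		nr_created++;
	return 0;
}
```

The point of the design is visible even in this toy version: the code that
raises the event does not need to know who is listening.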

Signed-off-by: Jan Beulich [EMAIL PROTECTED]




[PATCH 1/4] add task handling notifier: base definitions

2007-12-20 Thread Jan Beulich
This is the base patch, adding notification for task creation and
deletion.

Signed-off-by: Jan Beulich [EMAIL PROTECTED]
---
 include/linux/sched.h |    8 +++-
 kernel/fork.c         |   11 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

--- 2.6.24-rc5-notify-task.orig/include/linux/sched.h
+++ 2.6.24-rc5-notify-task/include/linux/sched.h
@@ -80,7 +80,7 @@ struct sched_param {
 #include <linux/rcupdate.h>
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
-
+#include <linux/notifier.h>
 #include <linux/time.h>
 #include <linux/param.h>
 #include <linux/resource.h>
@@ -1700,6 +1700,12 @@ extern int do_execve(char *, char __user
 extern long do_fork(unsigned long, unsigned long, struct pt_regs *, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
 
+#define TASK_NEW 1
+#define TASK_DELETE 2
+
+extern struct blocking_notifier_head task_notifier_list;
+extern struct atomic_notifier_head atomic_task_notifier_list;
+
 extern void set_task_comm(struct task_struct *tsk, char *from);
 extern void get_task_comm(char *to, struct task_struct *tsk);
 
--- 2.6.24-rc5-notify-task.orig/kernel/fork.c
+++ 2.6.24-rc5-notify-task/kernel/fork.c
@@ -46,6 +46,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/freezer.h>
+#include <linux/notifier.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
@@ -71,6 +72,11 @@ DEFINE_PER_CPU(unsigned long, process_co
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
 
+BLOCKING_NOTIFIER_HEAD(task_notifier_list);
+EXPORT_SYMBOL_GPL(task_notifier_list);
+ATOMIC_NOTIFIER_HEAD(atomic_task_notifier_list);
+EXPORT_SYMBOL_GPL(atomic_task_notifier_list);
+
 int nr_processes(void)
 {
 	int cpu;
@@ -121,6 +127,9 @@ void __put_task_struct(struct task_struc
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	atomic_notifier_call_chain(&atomic_task_notifier_list,
+				   TASK_DELETE, tsk);
+
 	security_task_free(tsk);
 	free_uid(tsk->user);
 	put_group_info(tsk->group_info);
@@ -1450,6 +1459,8 @@ long do_fork(unsigned long clone_flags,
 			set_tsk_thread_flag(p, TIF_SIGPENDING);
 		}
 
+		blocking_notifier_call_chain(&task_notifier_list, TASK_NEW, p);
+
 		if (!(clone_flags & CLONE_STOPPED))
 			wake_up_new_task(p, clone_flags);
 		else





[PATCH 3/4] add task handling notifier: connector based proc events

2007-12-20 Thread Jan Beulich
This has the additional benefit of allowing the code to now be built
as a module (which made it necessary to add MODULE_xxx declarations).

Signed-off-by: Jan Beulich [EMAIL PROTECTED]
Cc: Matt Helsley [EMAIL PROTECTED]
---
 drivers/connector/Kconfig   |    5 +--
 drivers/connector/cn_proc.c |   57 
 fs/exec.c                   |    5 ++-
 include/linux/cn_proc.h     |   21 
 include/linux/sched.h       |    4 +++
 kernel/exit.c               |    5 ++-
 kernel/fork.c               |    2 -
 kernel/sys.c                |   27 +---
 8 files changed, 83 insertions(+), 43 deletions(-)

--- 2.6.24-rc5-notify-task.orig/drivers/connector/Kconfig
+++ 2.6.24-rc5-notify-task/drivers/connector/Kconfig
@@ -12,9 +12,8 @@ menuconfig CONNECTOR
 if CONNECTOR
 
 config PROC_EVENTS
-	boolean "Report process events to userspace"
-	depends on CONNECTOR=y
-	default y
+	tristate "Report process events to userspace"
+	default CONNECTOR
 	---help---
 	  Provide a connector that reports process events to userspace. Send
 	  events such as fork, exec, id change (uid, gid, suid, etc), and exit.
--- 2.6.24-rc5-notify-task.orig/drivers/connector/cn_proc.c
+++ 2.6.24-rc5-notify-task/drivers/connector/cn_proc.c
@@ -26,12 +26,17 @@
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/init.h>
+#include <linux/notifier.h>
 #include <linux/connector.h>
 #include <asm/atomic.h>
 #include <asm/unaligned.h>
 
 #include <linux/cn_proc.h>
 
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Matt Helsley [EMAIL PROTECTED]");
+MODULE_DESCRIPTION("Process events - userspace connector");
+
 #define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event))
 
 static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
@@ -47,7 +52,7 @@ static inline void get_seq(__u32 *ts, in
 	put_cpu_var(proc_event_counts);
 }
 
-void proc_fork_connector(struct task_struct *task)
+static void proc_fork_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -75,7 +80,7 @@ void proc_fork_connector(struct task_str
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_exec_connector(struct task_struct *task)
+static void proc_exec_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -100,7 +105,7 @@ void proc_exec_connector(struct task_str
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_id_connector(struct task_struct *task, int which_id)
+static void proc_id_connector(struct task_struct *task, int which_id)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -133,7 +138,7 @@ void proc_id_connector(struct task_struc
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
-void proc_exit_connector(struct task_struct *task)
+static void proc_exit_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
@@ -220,6 +225,35 @@ static void cn_proc_mcast_ctl(void *data
 	cn_proc_ack(err, msg->seq, msg->ack);
 }
 
+static int connector_task_notifier(struct notifier_block *nb,
+  unsigned long action,
+  void *task)
+{
+   switch (action) {
+   case TASK_NEW:
+   proc_fork_connector(task);
+   break;
+   case TASK_EXEC:
+   proc_exec_connector(task);
+   break;
+   case TASK_EXIT:
+   proc_exit_connector(task);
+   break;
+   case TASK_UID_CHANGE:
+   proc_id_connector(task, PROC_EVENT_UID);
+   break;
+   case TASK_GID_CHANGE:
+   proc_id_connector(task, PROC_EVENT_GID);
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block connector_task_notifier_block = {
+   .notifier_call = connector_task_notifier
+};
+
 /*
  * cn_proc_init - initialization entry point
  *
@@ -234,7 +268,22 @@ static int __init cn_proc_init(void)
 		printk(KERN_WARNING "cn_proc failed to register\n");
 		return err;
 	}
+	blocking_notifier_chain_register(&task_notifier_list,
+					 &connector_task_notifier_block);
 	return 0;
 }
 
+/*
+ * cn_proc_exit - unload entry point
+ *
+ * Removes the connector callback from the connector driver.
+ */
+static void __exit cn_proc_exit(void)
+{
+	blocking_notifier_chain_unregister(&task_notifier_list,
+					   &connector_task_notifier_block);
+	cn_del_callback(&cn_proc_event_id);
+}
+
 module_init(cn_proc_init);
+module_exit(cn_proc_exit);
--- 2.6.24-rc5-notify-task.orig/fs/exec.c
+++ 2.6.24-rc5-notify-task/fs/exec.c
@@ -49,7 +49,7 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
+#include <linux/notifier.h>
 #include <linux/audit.h>
 
 #include 

[PATCH 4/4] add task handling notifier: security keys

2007-12-20 Thread Jan Beulich
Signed-off-by: Jan Beulich [EMAIL PROTECTED]
Cc: David Howells [EMAIL PROTECTED]
---
 arch/mips/kernel/kspd.c      |    7 +++--
 include/linux/key.h          |    4 ---
 kernel/sys.c                 |    8 --
 security/keys/process_keys.c |   55 ++-
 4 files changed, 39 insertions(+), 35 deletions(-)

--- 2.6.24-rc5-notify-task.orig/arch/mips/kernel/kspd.c
+++ 2.6.24-rc5-notify-task/arch/mips/kernel/kspd.c
@@ -25,6 +25,7 @@
 #include <linux/workqueue.h>
 #include <linux/errno.h>
 #include <linux/list.h>
+#include <linux/notifier.h>
 
 #include <asm/vpe.h>
 #include <asm/rtlx.h>
@@ -177,8 +178,10 @@ static void sp_setfsuidgid( uid_t uid, g
 	current->fsuid = uid;
 	current->fsgid = gid;
 
-	key_fsuid_changed(current);
-	key_fsgid_changed(current);
+	blocking_notifier_call_chain(&task_notifier_list,
+				     TASK_UID_CHANGE, current);
+	blocking_notifier_call_chain(&task_notifier_list,
+				     TASK_GID_CHANGE, current);
 }
 
 /*
--- 2.6.24-rc5-notify-task.orig/include/linux/key.h
+++ 2.6.24-rc5-notify-task/include/linux/key.h
@@ -272,8 +272,6 @@ extern void exit_keys(struct task_struct
 extern void exit_thread_group_keys(struct signal_struct *tg);
 extern int suid_keys(struct task_struct *tsk);
 extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
 extern void key_init(void);
 
 #define __install_session_keyring(tsk, keyring)\
@@ -302,8 +300,6 @@ extern void key_init(void);
 #define exit_thread_group_keys(tg) do { } while(0)
 #define suid_keys(t)   do { } while(0)
 #define exec_keys(t)   do { } while(0)
-#define key_fsuid_changed(t)   do { } while(0)
-#define key_fsgid_changed(t)   do { } while(0)
 #define key_init() do { } while(0)
 
 /* Initial keyrings */
--- 2.6.24-rc5-notify-task.orig/kernel/sys.c
+++ 2.6.24-rc5-notify-task/kernel/sys.c
@@ -517,7 +517,6 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);
 
@@ -554,7 +553,6 @@ asmlinkage long sys_setgid(gid_t gid)
 	else
 		return -EPERM;
 
-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);
 
@@ -644,7 +642,6 @@ asmlinkage long sys_setreuid(uid_t ruid,
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 
-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);
 
@@ -692,7 +689,6 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);
 
@@ -741,7 +737,6 @@ asmlinkage long sys_setresuid(uid_t ruid
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 
-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);
 
@@ -794,7 +789,6 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 
-	key_fsgid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_GID_CHANGE, current);
 	return 0;
@@ -836,7 +830,6 @@ asmlinkage long sys_setfsuid(uid_t uid)
 		current->fsuid = uid;
 	}
 
-	key_fsuid_changed(current);
 	blocking_notifier_call_chain(&task_notifier_list,
 				     TASK_UID_CHANGE, current);
 
@@ -864,7 +857,6 @@ asmlinkage long sys_setfsgid(gid_t gid)
 			smp_wmb();
 		}
 		current->fsgid = gid;
-		key_fsgid_changed(current);
 		blocking_notifier_call_chain(&task_notifier_list,
 					     TASK_GID_CHANGE, current);
 	}
--- 2.6.24-rc5-notify-task.orig/security/keys/process_keys.c
+++ 2.6.24-rc5-notify-task/security/keys/process_keys.c
@@ -16,6 +16,7 @@
 #include <linux/keyctl.h>
 #include <linux/fs.h>
 #include <linux/err.h>
+#include <linux/notifier.h>
 #include <linux/mutex.h>
 #include <asm/uaccess.h>
 #include "internal.h"
@@ -354,35 +355,47 @@ int suid_keys(struct task_struct *tsk)
 
 } /* end suid_keys() */
 
-/*/
-/*
- * the filesystem user ID changed
- */

[patch 1/5] x86, ptrace: rlimit BTS buffer allocation

2007-12-20 Thread Markus Metzger
Check the rlimit of the tracing task for total and locked memory when 
allocating the BTS buffer.
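
The check itself boils down to a small piece of arithmetic that the patch
applies twice, once against RLIMIT_AS with total_vm and once against
RLIMIT_MEMLOCK with locked_vm. The stand-alone sketch below shows that
arithmetic only; pages_allowed is a made-up helper, and it returns 0 where
the patch bails out with -ENOMEM.

```c
#include <assert.h>

/*
 * Decide how many pages may be allocated. limit_pages is the rlimit in
 * pages, used_pages what is already accounted, want_pages the request.
 * If the request does not fit and reduce is non-zero, the request is cut
 * down to the remaining headroom; otherwise 0 is returned (nothing fits).
 */
static unsigned long pages_allowed(unsigned long limit_pages,
				   unsigned long used_pages,
				   unsigned long want_pages, int reduce)
{
	unsigned long room;

	assert(want_pages > 0);
	room = limit_pages > used_pages ? limit_pages - used_pages : 0;
	if (want_pages <= room)
		return want_pages;
	return reduce ? room : 0;
}
```

The reduce argument plays the role of the PTRACE_BTS_O_CUT_SIZE flag: with it
set, the caller gets a smaller buffer instead of an error.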

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-20 13:51:21.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-20 13:51:45.%N +0100
@@ -620,12 +620,80 @@
 	return i;
 }
 
+static int ptrace_bts_realloc(struct task_struct *child,
+			      int size, int reduce_size)
+{
+	unsigned long rlim, vm;
+	int ret, old_size;
+
+	if (size < 0)
+		return -EINVAL;
+
+	old_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
+	if (old_size < 0)
+		return old_size;
+
+	ret = ds_free((void **)&child->thread.ds_area_msr);
+	if (ret < 0)
+		goto out;
+
+	size >>= PAGE_SHIFT;
+	old_size >>= PAGE_SHIFT;
+
+	current->mm->total_vm  -= old_size;
+	current->mm->locked_vm -= old_size;
+
+	if (size == 0)
+		goto out;
+
+	rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
+	vm = current->mm->total_vm  + size;
+	if (rlim < vm) {
+		ret = -ENOMEM;
+
+		if (!reduce_size)
+			goto out;
+
+		size = rlim - current->mm->total_vm;
+		if (size <= 0)
+			goto out;
+	}
+
+	rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
+	vm = current->mm->locked_vm  + size;
+	if (rlim < vm) {
+		ret = -ENOMEM;
+
+		if (!reduce_size)
+			goto out;
+
+		size = rlim - current->mm->locked_vm;
+		if (size <= 0)
+			goto out;
+	}
+
+	ret = ds_allocate((void **)&child->thread.ds_area_msr,
+			  size << PAGE_SHIFT);
+	if (ret < 0)
+		goto out;
+
+	current->mm->total_vm  += size;
+	current->mm->locked_vm += size;
+
+out:
+	if (child->thread.ds_area_msr)
+		set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+	else
+		clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+
+	return ret;
+}
+
 static int ptrace_bts_config(struct task_struct *child,
 			     const struct ptrace_bts_config __user *ucfg)
 {
 	struct ptrace_bts_config cfg;
-	unsigned long debugctl_mask;
-	int bts_size, ret;
+	int bts_size, ret = 0;
 	void *ds;
 
 	if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
@@ -638,59 +706,46 @@
 		if (bts_size < 0)
 			return bts_size;
 	}
+	cfg.size = PAGE_ALIGN(cfg.size);
 
 	if (bts_size != cfg.size) {
-		ret = ds_free((void **)&child->thread.ds_area_msr);
+		ret = ptrace_bts_realloc(child, cfg.size,
+					 cfg.flags & PTRACE_BTS_O_CUT_SIZE);
 		if (ret < 0)
-			return ret;
+			goto errout;
 
-		if (cfg.size > 0)
-			ret = ds_allocate((void **)&child->thread.ds_area_msr,
-					  cfg.size);
 		ds = (void *)child->thread.ds_area_msr;
-		if (ds)
-			set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-		else
-			clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-
-		if (ret < 0)
-			return ret;
-
-		bts_size = ds_get_bts_size(ds);
-		if (bts_size <= 0)
-			return bts_size;
 	}
 
-	if (ds) {
-		if (cfg.flags & PTRACE_BTS_O_SIGNAL) {
-			ret = ds_set_overflow(ds, DS_O_SIGNAL);
-		} else {
-			ret = ds_set_overflow(ds, DS_O_WRAP);
-		}
-		if (ret < 0)
-			return ret;
-	}
-
-	debugctl_mask = ds_debugctl_mask();
-	if (ds && (cfg.flags & PTRACE_BTS_O_TRACE)) {
-		child->thread.debugctlmsr |= debugctl_mask;
-		set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-	} else {
-		/* there is no way for us to check whether we 'own'
-		 * the respective bits in the DEBUGCTL MSR, we're
-		 * about to clear */
-		child->thread.debugctlmsr &= ~debugctl_mask;
+	if (cfg.flags & PTRACE_BTS_O_SIGNAL)
+		ret = ds_set_overflow(ds, DS_O_SIGNAL);
+	else
+		ret = ds_set_overflow(ds, DS_O_WRAP);
+	if (ret < 0)
+		goto errout;
 
-		if (!child->thread.debugctlmsr)
-			clear_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-	}
+	if (cfg.flags & PTRACE_BTS_O_TRACE)
+		child->thread.debugctlmsr |= ds_debugctl_mask();
+	else
+ 

[patch 2/5] x86, ptrace: support 32bit-cross-64bit BTS recording

2007-12-20 Thread Markus Metzger
Support BTS recording of 32bit and 64bit tasks from 32bit or 64bit tasks.


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-20 13:51:20.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-20 13:52:01.%N +0100
@@ -111,53 +111,53 @@
  * Accessor functions for some DS and BTS fields using the above
  * global ptrace_bts_cfg.
  */
-static inline void *get_bts_buffer_base(char *base)
+static inline unsigned long get_bts_buffer_base(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_buffer_base.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_buffer_base.offset);
 }
-static inline void set_bts_buffer_base(char *base, void *value)
+static inline void set_bts_buffer_base(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_buffer_base.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_buffer_base.offset)) = value;
 }
-static inline void *get_bts_index(char *base)
+static inline unsigned long get_bts_index(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_index.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_index.offset);
 }
-static inline void set_bts_index(char *base, void *value)
+static inline void set_bts_index(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_index.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_index.offset)) = value;
 }
-static inline void *get_bts_absolute_maximum(char *base)
+static inline unsigned long get_bts_absolute_maximum(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_absolute_maximum.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset);
 }
-static inline void set_bts_absolute_maximum(char *base, void *value)
+static inline void set_bts_absolute_maximum(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
 }
-static inline void *get_bts_interrupt_threshold(char *base)
+static inline unsigned long get_bts_interrupt_threshold(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_interrupt_threshold.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset);
 }
-static inline void set_bts_interrupt_threshold(char *base, void *value)
+static inline void set_bts_interrupt_threshold(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
 }
-static inline long get_from_ip(char *base)
+static inline unsigned long get_from_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.from_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.from_ip.offset);
 }
-static inline void set_from_ip(char *base, long value)
+static inline void set_from_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.from_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.from_ip.offset)) = value;
 }
-static inline long get_to_ip(char *base)
+static inline unsigned long get_to_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.to_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.to_ip.offset);
 }
-static inline void set_to_ip(char *base, long value)
+static inline void set_to_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.to_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.to_ip.offset)) = value;
 }
 static inline unsigned char get_info_type(char *base)
 {
@@ -180,7 +180,7 @@
 int ds_allocate(void **dsp, size_t bts_size_in_bytes)
 {
size_t bts_size_in_records;
-   void *bts;
+   unsigned long bts;
void *ds;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
@@ -197,7 +197,7 @@
if (bts_size_in_bytes <= 0)
return -EINVAL;
 
-   bts = kzalloc(bts_size_in_bytes, GFP_KERNEL);
+   bts = (unsigned long)kzalloc(bts_size_in_bytes, GFP_KERNEL);
 
if (!bts)
return -ENOMEM;
@@ -205,7 +205,7 @@
ds = kzalloc(ds_cfg.sizeof_ds, GFP_KERNEL);
 
if (!ds) {
-   kfree(bts);
+   kfree((void *)bts);
return -ENOMEM;
}
 
@@ -221,7 +221,7 @@
 int ds_free(void **dsp)
 {
if (*dsp)
-   kfree(get_bts_buffer_base(*dsp));
+   kfree((void *)get_bts_buffer_base(*dsp));
kfree(*dsp);
*dsp = 0;
 
@@ -230,7 +230,7 @@
 
 int ds_get_bts_size(void *ds)
 {
-   size_t size_in_bytes;
+   int size_in_bytes;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
return -EOPNOTSUPP;
@@ -246,7 +246,7 @@
 
 int ds_get_bts_end(void *ds)
 {
-   size_t size_in_bytes = ds_get_bts_size(ds);
+   int size_in_bytes = 

Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)

2007-12-20 Thread Juergen Beisert
On Thursday 20 December 2007 13:45, Jaswinder Singh wrote:
> On 12/20/07, Remy Bohmer [EMAIL PROTECTED] wrote:
> > So, Is this a serious requirement? Should this be possible?
>
> I have noticed this problem:
> [EMAIL PROTECTED]:~# cat /proc/loadavgrt
> 1.00 1.00 1.00 0/52 1158
> [EMAIL PROTECTED]:~# cat /proc/loadavg
> 0.00 0.00 0.02 1/52 1159
> [EMAIL PROTECTED]:~#
>
> So I am curious, if possible, user can switch softirq-threads or IRQs
> RT tasks to non-RT tasks for slow hardware or least important hardware
> for NON-RT tasks. So this will improve RT behaviour.
^^
Why?

IMHO: Simply decrease the RT priority of less important IRQs or increase all 
other more important IRQs. IRQs are always more important than other 
processes in a system, also in non RT systems.
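A minimal userspace sketch of the suggestion above, assuming the IRQ thread is visible as an ordinary task whose PID is already known. The helper names `clamp_fifo_priority` and `set_irq_thread_priority` are made up for illustration; only `sched_setscheduler()` and `sched_get_priority_min()/max()` are real interfaces. The call normally needs root (or CAP_SYS_NICE).

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Hypothetical helper: keep the thread in the RT class but move it to
 * another priority. Returns 0 on success, -1 with errno set otherwise
 * (for example EPERM without privileges, ESRCH for an unknown PID).
 */
int set_irq_thread_priority(pid_t irq_thread_pid, int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    return sched_setscheduler(irq_thread_pid, SCHED_FIFO, &sp);
}
```

This keeps the thread an RT task and only reorders it relative to the other IRQ threads, which is the alternative to demoting it to a non-RT class.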

Juergen
-- 
Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
    Handelsregister: Amtsgericht Hildesheim, HRA 2686
     Vertretung Sued/Muenchen, Germany
   Phone: +49-8766-939 228 |  Fax: +49-5121-206917-9


[patch 3/5] x86, ptrace: add buffer size checks

2007-12-20 Thread Markus Metzger
Pass the buffer size for (most) ptrace commands that pass user-allocated 
buffers and check that size before accessing the buffer. Unfortunately, 
PTRACE_BTS_GET already uses all 4 parameters.
Commands that access user buffers return the number of bytes or records read or 
written.
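The size check described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the patch itself: `struct bts_record_sketch` and `check_drain_buffer` are hypothetical names, and the explicit rejection of a negative size is an addition of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record layout, for illustration only. */
struct bts_record_sketch {
    unsigned long from_ip;
    unsigned long to_ip;
};

/*
 * Return nr_records if a caller-supplied buffer of `size` bytes can hold
 * that many records, or -EIO if it is too small. A non-positive record
 * count (nothing to drain, or an error code) is passed through unchanged.
 */
long check_drain_buffer(long size, int nr_records)
{
    if (nr_records <= 0)
        return nr_records;

    /* A negative size would wrap to a huge value once converted. */
    if (size < 0)
        return -EIO;

    if ((size_t)size < (size_t)nr_records * sizeof(struct bts_record_sketch))
        return -EIO;

    return nr_records;
}
```

Returning the record count on success mirrors the convention stated above that commands accessing user buffers report how much was read or written.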


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-20 13:52:09.%N +0100
@@ -591,6 +591,7 @@
 }
 
 static int ptrace_bts_drain(struct task_struct *child,
+   long size,
struct bts_struct __user *out)
 {
int end, i;
@@ -603,6 +604,9 @@
if (end <= 0)
return end;
 
+   if (size < (end * sizeof(struct bts_struct)))
+   return -EIO;
+
for (i = 0; i < end; i++, out++) {
struct bts_struct ret;
int retval;
@@ -617,7 +621,7 @@
 
ds_clear(ds);
 
-   return i;
+   return end;
 }
 
 static int ptrace_bts_realloc(struct task_struct *child,
@@ -690,15 +694,22 @@
 }
 
 static int ptrace_bts_config(struct task_struct *child,
+long cfg_size,
 const struct ptrace_bts_config __user *ucfg)
 {
struct ptrace_bts_config cfg;
int bts_size, ret = 0;
void *ds;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
return -EFAULT;
 
+   if ((int)cfg.size < 0)
+   return -EINVAL;
+
bts_size = 0;
ds = (void *)child->thread.ds_area_msr;
if (ds) {
@@ -734,6 +745,8 @@
else
clear_tsk_thread_flag(child, TIF_BTS_TRACE_TS);
 
+   ret = sizeof(cfg);
+
 out:
if (child->thread.debugctlmsr)
set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
@@ -749,11 +762,15 @@
 }
 
 static int ptrace_bts_status(struct task_struct *child,
+long cfg_size,
 struct ptrace_bts_config __user *ucfg)
 {
void *ds = (void *)child->thread.ds_area_msr;
struct ptrace_bts_config cfg;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
memset(&cfg, 0, sizeof(cfg));
 
if (ds) {
@@ -935,12 +952,12 @@
 
case PTRACE_BTS_CONFIG:
ret = ptrace_bts_config
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_STATUS:
ret = ptrace_bts_status
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_SIZE:
@@ -958,7 +975,7 @@
 
case PTRACE_BTS_DRAIN:
ret = ptrace_bts_drain
-   (child, (struct bts_struct __user *) addr);
+   (child, data, (struct bts_struct __user *) addr);
break;
 
default:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:09.%N +0100
@@ -99,13 +99,15 @@
 
 #define PTRACE_BTS_CONFIG  40
 /* Configure branch trace recording.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config.
+   ADDR points to a struct ptrace_bts_config.
+   DATA gives the size of that buffer.
A new buffer is allocated, iff the size changes.
+   Returns the number of bytes read.
 */
 #define PTRACE_BTS_STATUS  41
-/* Return the current configuration.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config
-   that will contain the result.
+/* Return the current configuration in a struct ptrace_bts_config
+   pointed to by ADDR; DATA gives the size of that buffer.
+   Returns the number of bytes written.
 */
 #define PTRACE_BTS_SIZE    42
 /* Return the number of available BTS records.
@@ -123,8 +125,8 @@
 */
 #define PTRACE_BTS_DRAIN   45
 /* Read all available BTS records and clear the buffer.
-   DATA is ignored. ADDR points to an array of struct bts_struct of
-   suitable size.
+   ADDR points to an array of struct bts_struct.
+   DATA gives the size of that buffer.
BTS records are read from oldest to newest.
Returns number of BTS records drained.
 */
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer

Re: [PATCH] Move page_assign_page_cgroup to VM_BUG_ON in free_hot_cold_page

2007-12-20 Thread Hugh Dickins
On Wed, 19 Dec 2007, Dave Hansen wrote:
> > --- linux-2.6.24-rc5/mm/page_alloc.c~memory-controller-move-to-bug-on-in-free_hot_cold_page 2007-12-19 11:31:46.0 +0530
> > +++ linux-2.6.24-rc5-balbir/mm/page_alloc.c 2007-12-19 11:33:45.0 +0530
> > @@ -995,7 +995,7 @@ static void fastcall free_hot_cold_page(
> >  
> >  if (!PageHighMem(page))
> >  debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
> > -   page_assign_page_cgroup(page, NULL);
> > +   VM_BUG_ON(page_get_page_cgroup(page));
> >  arch_free_page(page, 0);
> >  kernel_map_pages(page, 1, 0);
>
> Hi Balbir,
>
> You generally want to do these like:
>
>   foo = page_assign_page_cgroup(page, NULL);
>   VM_BUG_ON(foo);
>
> Some embedded people have been known to optimize kernel size like this:
>
>   #define VM_BUG_ON(x) do{}while(0)

Balbir's patch looks fine to me: I don't get your point there, Dave.
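For readers following along, here is a small userspace sketch of the concern under discussion: an expression placed inside an assertion macro disappears when the macro is compiled out, which matters only if that expression has side effects. All names (`VM_BUG_ON_SKETCH`, `page_owner_sketch` and the helpers) are stand-ins invented for this illustration, not kernel code.

```c
#include <assert.h>

/* Stand-in for an assertion that may be configured away entirely. */
#ifdef DEBUG_VM_SKETCH
#define VM_BUG_ON_SKETCH(cond) assert(!(cond))
#else
#define VM_BUG_ON_SKETCH(cond) do { } while (0)
#endif

/* Stand-in for the per-page cgroup pointer. */
int page_owner_sketch;

/* Pure getter: no side effects, safe to use inside the macro. */
int get_owner(void)
{
    return page_owner_sketch;
}

/* Setter with a side effect: clears the owner and returns the old one. */
int clear_owner(void)
{
    int old = page_owner_sketch;

    page_owner_sketch = 0;
    return old;
}

/* Side effect inside the macro: lost when the macro expands to nothing. */
void free_page_side_effect_in_assert(void)
{
    VM_BUG_ON_SKETCH(clear_owner());
}

/* Side effect hoisted out: always runs, only the check is optional. */
void free_page_side_effect_hoisted(void)
{
    int old = clear_owner();

    VM_BUG_ON_SKETCH(old);
    (void)old;
}

/* Getter inside the macro: nothing is lost when it is compiled out. */
void free_page_getter_in_assert(void)
{
    VM_BUG_ON_SKETCH(get_owner());
}
```

With the macro compiled out, the first variant leaves the owner set, the second clears it, and the third loses nothing because only a check was inside the macro; the quoted patch has the shape of the third variant.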

Hugh


Re: [PATCH] Fix crash with FLAT_MEMORY and ARCH_PFN_OFFSET != 0

2007-12-20 Thread Mel Gorman
On (20/12/07 13:43), Thomas Bogendoerfer didst pronounce:
> On Thu, Dec 20, 2007 at 11:44:06AM +, Mel Gorman wrote:
> > --- a/include/asm-mips/page.h
> > +++ b/include/asm-mips/page.h
> > @@ -37,13 +37,6 @@
> >  #include <linux/pfn.h>
> >  #include <asm/io.h>
> >  
> > -/*
> > - * It's normally defined only for FLATMEM config but it's
> > - * used in our early mem init code for all memory models.
> > - * So always define it.
> > - */
> > -#define ARCH_PFN_OFFSET        PFN_UP(PHYS_OFFSET)
> > -
>
> hmm, doesn't this break what I've fixed ? Without this #define
> ARCH_PFN_OFFSET gets defined to 0 and the bug is back. Or did
> I miss anything ?
>

ARCH_PFN_OFFSET goes to 0, so page_to_pfn() is no longer adjusting by
PFN_UP(PHYS_OFFSET) like it was when your problem occurred. I am guessing
that the nature of the crash was that page_to_pfn() was returning bogus
values early in boot and trying to initialise memmap that didn't exist.
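To make the role of the offset concrete, here is a userspace sketch of the flat-memory conversion, assuming the usual form "index into mem_map plus ARCH_PFN_OFFSET". The array size and the `_sketch` names are invented for illustration, and the offset is passed as a parameter so both cases can be compared.

```c
#include <assert.h>

struct page {
    int dummy;
};

#define NR_PAGES_SKETCH 16

/* Stand-in for the flat memory map: one struct page per page frame. */
static struct page mem_map[NR_PAGES_SKETCH];

/* Page frame number of a struct page, given the architecture offset. */
unsigned long page_to_pfn_sketch(const struct page *page,
                                 unsigned long arch_pfn_offset)
{
    return (unsigned long)(page - mem_map) + arch_pfn_offset;
}

/* Inverse conversion; only meaningful if the same offset is used. */
struct page *pfn_to_page_sketch(unsigned long pfn,
                                unsigned long arch_pfn_offset)
{
    return mem_map + (pfn - arch_pfn_offset);
}
```

With an offset of 0 the PFN is simply the index into `mem_map`; with a non-zero offset every result shifts by that amount, so code that assumes `mem_map[0]` describes PFN 0 would compute PFNs for memory that has no memmap entry.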

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [PATCH 5/6] udf: fix signedness issue

2007-12-20 Thread Jan Kara
On Wed 19-12-07 20:27:20, Marcin Slusarz wrote:
 On Mon, Dec 17, 2007 at 05:32:17PM +0100, Jan Kara wrote:
   sparse generated:
   fs/udf/namei.c:896:15: originally declared here
   fs/udf/namei.c:1147:41: warning: incorrect type in argument 3 (different 
   signedness)
   fs/udf/namei.c:1147:41:expected int *offset
   fs/udf/namei.c:1147:41:got unsigned int *noident
   fs/udf/namei.c:1152:78: warning: incorrect type in argument 3 (different 
   signedness)
   fs/udf/namei.c:1152:78:expected int *offset
   fs/udf/namei.c:1152:78:got unsigned int *noident
  
   Signed-off-by: Marcin Slusarz [EMAIL PROTECTED]
I don't think this is right. udf_get_fileident() should take unsigned
  int * as an offset, not just int. This means changing struct
  udf_fileident_bh to use unsigned int too but that is better anyway.
And BTW the type shouldn't be uint32_t but really unsigned int in
  udf_rename (int needn't have 32 bits on all archs (although I think it
  has currently)).
 That would be hard. Look what is happening with soffset and eoffset
 eg in udf_fileident_read() - these fields are used as signed ints.
  Doh, you're right. soffset can go below 0. OK, then your patch is fine.
You can add there
  Acked-by: Jan Kara [EMAIL PROTECTED]
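A tiny sketch of why a start offset that may drop below zero has to stay signed. The function names are hypothetical, and the reason the offset becomes negative (here assumed to be an entry that begins before the current block) is an assumption made only for this example.

```c
#include <assert.h>
#include <limits.h>

/* With a signed offset, "starts before this block" is a simple test. */
int entry_starts_before_block(int soffset)
{
    return soffset < 0;
}

/* The same value squeezed into an unsigned int wraps to a huge offset. */
unsigned int as_unsigned_offset(int soffset)
{
    return (unsigned int)soffset;
}
```

For example, an offset of -8 becomes `UINT_MAX - 7` once stored as unsigned, and a `< 0` test on it can never be true.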

  BTW: If you haven't got an email from Andrew about accepting the patches
we agreed upon, then please resend the patches to him (with additional
Acked-by: Jan Kara [EMAIL PROTECTED] and CC me) so that they don't get lost.
Thanks.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR


[PATCH 0/16] lguest: introduce vcpu structure

2007-12-20 Thread Glauber de Oliveira Costa
This patch set makes room for the vcpu structure in lguest, which lguest64
already uses in this very same way. It's the first part of our plan to
have lguest and lguest64 unified too.

When two dogs hang out, you don't get new puppies the very next day.
Some time has to elapse. They have to grow first. In the same spirit,
having these patches does _not_ mean smp guests can be launched (yet).
Much more work is to come, but this is the basic infrastructure.

Enjoy




[PATCH 01/16] introduce vcpu struct

2007-12-20 Thread Glauber de Oliveira Costa
This patch introduces a vcpu struct for lguest. In upcoming patches,
more and more fields will be moved from the lguest struct to the vcpu.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lg.h |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 8692489..9723732 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -38,6 +38,13 @@ struct lguest_pages
 #define CHANGED_GDT_TLS 4 /* Actually a subset of CHANGED_GDT */
 #define CHANGED_ALL     3
 
+struct lguest;
+
+struct lguest_vcpu {
+   int vcpu_id;
+   struct lguest *lg;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -47,6 +54,9 @@ struct lguest
struct lguest_data __user *lguest_data;
struct task_struct *tsk;
struct mm_struct *mm;   /* == tsk->mm, but that becomes NULL on exit */
+   struct lguest_vcpu vcpus[NR_CPUS];
+   unsigned int nr_vcpus;
+
u32 pfn_limit;
/* This provides the offset to the base of guest-physical
 * memory in the Launcher. */
@@ -92,6 +102,11 @@ struct lguest
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
 };
 
+static inline struct lguest *lg_of_vcpu(struct lguest_vcpu *vcpu)
+{
+   return container_of((vcpu - vcpu->vcpu_id), struct lguest, vcpus[0]);
+}
+
 extern struct mutex lguest_lock;
 
 /* core.c: */
-- 
1.5.0.6
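The pointer arithmetic behind lg_of_vcpu() can be tried out in userspace. The sketch below is only an illustration: the struct and macro names are simplified stand-ins, `MAX_VCPUS_SKETCH` is an arbitrary size, and the macro uses `offsetof()` on the array member itself (which has the same offset as its element 0) rather than the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS_SKETCH 4 /* arbitrary size for the example */

struct guest_sketch;

struct vcpu_sketch {
    int vcpu_id;
    struct guest_sketch *g;
};

struct guest_sketch {
    int dead;
    struct vcpu_sketch vcpus[MAX_VCPUS_SKETCH];
    unsigned int nr_vcpus;
};

/* Userspace stand-in for the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Step back vcpu_id elements to reach vcpus[0], then subtract the offset
 * of the array inside the guest struct to reach the guest itself.
 */
struct guest_sketch *guest_of_vcpu(struct vcpu_sketch *vcpu)
{
    struct vcpu_sketch *first;

    assert(vcpu->vcpu_id >= 0 && vcpu->vcpu_id < MAX_VCPUS_SKETCH);
    first = vcpu - vcpu->vcpu_id;
    return container_of_sketch(first, struct guest_sketch, vcpus);
}
```

This only works if every vcpu lives inside the array embedded in its guest and `vcpu_id` matches its array index; a vcpu allocated elsewhere would yield a bogus pointer.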



[PATCH 03/16] initialize vcpu

2007-12-20 Thread Glauber de Oliveira Costa
This patch initializes the first vcpu in the initialize() routine,
which is responsible for starting the process of bringing the guest up.
Right now, as most of the fields are still not per-vcpu, it does not
do much.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lguest_user.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 3b92a61..d1b1c26 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,6 +88,17 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
return run_guest(lg, (unsigned long __user *)user);
 }
 
+static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
+ unsigned long start_ip)
+{
+   vcpu->vcpu_id = vcpu_id;
+
+   vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);
+   vcpu->lg->nr_vcpus++;
+
+   return 0;
+}
+
 /*L:020 The initialization write supplies 4 pointer sized (32 or 64 bit)
  * values (in addition to the LHREQ_INITIALIZE value).  These are:
  *
@@ -134,6 +145,12 @@ static int initialize(struct file *file, const unsigned long __user *input)
lg->mem_base = (void __user *)(long)args[0];
lg->pfn_limit = args[1];
 
+   /* This is the first cpu */
+   lg->nr_vcpus = 0;
+   err = vcpu_start(&lg->vcpus[0], 0, args[3]);
+   if (err)
+   goto release_guest;
+
/* We need a complete page for the Guest registers: they are accessible
 * to the Guest and we can only grant it access to whole pages. */
lg->regs_page = get_zeroed_page(GFP_KERNEL);
-- 
1.5.0.6



[PATCH 02/16] adapt lguest launcher to per-cpuness

2007-12-20 Thread Glauber de Oliveira Costa
This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread variable, which means that in the future we'll spawn vcpus as
threads. But right now, only the infrastructure is there.
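As a rough userspace illustration of the mechanism described above: pread() carries an explicit offset on every call without touching the shared file position, so each thread can pass its own value taken from a thread-local variable. The sketch uses a temporary regular file instead of /dev/lguest, so the offset really selects a byte here; treating it as a vcpu id is the driver's convention and is not shown. All function names are hypothetical, and `__thread` assumes GCC or Clang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* One copy per thread, like vcpu_id in the launcher. */
static __thread unsigned int current_vcpu_id;

void set_current_vcpu(unsigned int id)
{
    current_vcpu_id = id;
}

/* The per-thread id travels as the offset argument of pread(). */
ssize_t read_for_current_vcpu(int fd, void *buf, size_t len)
{
    return pread(fd, buf, len, (off_t)current_vcpu_id);
}

/*
 * Demo on a temporary file containing "abcd": returns the byte found at
 * offset `id`, or -1 on any failure.
 */
int demo_read_byte_at(unsigned int id)
{
    FILE *f = tmpfile();
    char c = 0;
    int ok;

    if (!f)
        return -1;

    ok = fputs("abcd", f) >= 0 && fflush(f) == 0;
    set_current_vcpu(id);
    if (!ok || read_for_current_vcpu(fileno(f), &c, 1) != 1) {
        fclose(f);
        return -1;
    }

    fclose(f);
    return c;
}
```

Because the offset is an argument rather than shared state on the descriptor, several threads could issue such calls on one descriptor without racing on a file position.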

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c |   24 +---
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 9b0e322..c406ba9 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -79,6 +79,9 @@ static void *guest_base;
 /* The maximum guest physical address allowed, and maximum possible. */
 static unsigned long guest_limit, guest_max;
 
+/* a per-cpu variable indicating whose vcpu is currently running */
+static unsigned int __thread vcpu_id;
+
 /* This is our list of devices. */
 struct device_list
 {
@@ -554,7 +557,7 @@ static void wake_parent(int pipefd, int lguest_fd)
else
FD_CLR(-fd - 1, &devices.infds);
} else /* Send LHREQ_BREAK command. */
-   write(lguest_fd, args, sizeof(args));
+   pwrite(lguest_fd, args, sizeof(args), 0);
}
 }
 
@@ -1511,7 +1514,8 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
int readval;
 
/* We read from the /dev/lguest device to run the Guest. */
-   readval = read(lguest_fd, &notify_addr, sizeof(notify_addr));
+   readval = pread(lguest_fd, &notify_addr,
+   sizeof(notify_addr), vcpu_id);
 
/* One unsigned long means the Guest did HCALL_NOTIFY */
if (readval == sizeof(notify_addr)) {
@@ -1521,17 +1525,22 @@ static void __attribute__((noreturn)) run_guest(int lguest_fd)
/* ENOENT means the Guest died.  Reading tells us why. */
} else if (errno == ENOENT) {
char reason[1024] = { 0 };
-   read(lguest_fd, reason, sizeof(reason)-1);
+   pread(lguest_fd, reason, sizeof(reason)-1, vcpu_id);
errx(1, "%s", reason);
/* EAGAIN means the Waker wanted us to look at some input.
 * Anything else means a bug or incompatible change. */
} else if (errno != EAGAIN)
err(1, "Running guest failed");
 
-   /* Service input, then unset the BREAK to release the Waker. */
-   handle_input(lguest_fd);
-   if (write(lguest_fd, args, sizeof(args)) < 0)
-   err(1, "Resetting break");
+   if (!vcpu_id) {
+   /*
+* Service input, then unset the BREAK to
+* release the Waker.
+*/
+   handle_input(lguest_fd);
+   if (pwrite(lguest_fd, args, sizeof(args), 0) < 0)
+   err(1, "Resetting break");
+   }
+   }
}
 }
 /*
@@ -1582,6 +1591,7 @@ int main(int argc, char *argv[])
devices.lastdev = devices.dev;
devices.next_irq = 1;
 
+   vcpu_id = 0;
/* We need to know how much memory so we can set up the device
 * descriptor and memory pages for the devices as we parse the command
 * line.  So we quickly look through the arguments to find the amount
-- 
1.5.0.6



[PATCH 04/16] per-cpu run guest

2007-12-20 Thread Glauber de Oliveira Costa
This patch makes the run_guest() routine use the vcpu struct.
This is required since in an smp guest environment there is no longer
a notion of running the guest; rather, we run a vcpu.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c|6 --
 drivers/lguest/lg.h  |4 ++--
 drivers/lguest/lguest_user.c |6 +-
 drivers/lguest/x86/core.c|   16 +++-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index cb4c670..70fc65e 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -174,8 +174,10 @@ void __lgwrite(struct lguest *lg, unsigned long addr, const void *b,
 /*H:030 Let's jump straight to the the main loop which runs the Guest.
  * Remember, this is called by the Launcher reading /dev/lguest, and we keep
  * going around and around until something interesting happens. */
-int run_guest(struct lguest *lg, unsigned long __user *user)
+int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
 {
+   struct lguest *lg = vcpu->lg;
+
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
@@ -226,7 +228,7 @@ int run_guest(struct lguest *lg, unsigned long __user *user)
local_irq_disable();
 
/* Actually run the Guest until something happens. */
-   lguest_arch_run_guest(lg);
+   lguest_arch_run_guest(vcpu);
 
/* Now we're ready to be interrupted or moved to other CPUs */
local_irq_enable();
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 9723732..c4a0a97 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -131,7 +131,7 @@ void __lgwrite(struct lguest *, unsigned long, const void *, unsigned);
} while(0)
 /* (end of memory access helper routines) :*/
 
-int run_guest(struct lguest *lg, unsigned long __user *user);
+int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user);
 
 /* Helper macros to obtain the first 12 or the last 20 bits, this is only the
  * first step in the migration to the kernel types.  pte_pfn is already defined
@@ -182,7 +182,7 @@ void page_table_guest_data_init(struct lguest *lg);
 /* arch/core.c: */
 void lguest_arch_host_init(void);
 void lguest_arch_host_fini(void);
-void lguest_arch_run_guest(struct lguest *lg);
+void lguest_arch_run_guest(struct lguest_vcpu *vcpu);
 void lguest_arch_handle_trap(struct lguest *lg);
 int lguest_arch_init_hypercalls(struct lguest *lg);
 int lguest_arch_do_hcall(struct lguest *lg, struct hcall_args *args);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d1b1c26..894d530 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -55,11 +55,15 @@ static int user_send_irq(struct lguest *lg, const unsigned long __user *input)
 static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
 {
struct lguest *lg = file->private_data;
+   struct lguest_vcpu *vcpu = NULL;
+   unsigned int vcpu_id = *o;
 
/* You must write LHREQ_INITIALIZE first! */
if (!lg)
return -EINVAL;
 
+   vcpu = &lg->vcpus[vcpu_id];
+
/* If you're not the task which owns the Guest, go away. */
if (current != lg->tsk)
return -EPERM;
@@ -85,7 +89,7 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
lg->pending_notify = 0;
 
/* Run the Guest until something interesting happens. */
-   return run_guest(lg, (unsigned long __user *)user);
+   return run_guest(vcpu, (unsigned long __user *)user);
 }
 
 static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 482aec2..0530ef3 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -73,8 +73,10 @@ static DEFINE_PER_CPU(struct lguest *, last_guest);
  * since it last ran.  We saw this set in interrupts_and_traps.c and
  * segments.c.
  */
-static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
+static void copy_in_guest_info(struct lguest_vcpu *vcpu,
+  struct lguest_pages *pages)
 {
+   struct lguest *lg = vcpu->lg;
/* Copying all this data can be quite expensive.  We usually run the
 * same Guest we ran last time (and that Guest hasn't run anywhere else
 * meanwhile).  If that's not the case, we pretend everything in the
@@ -113,14 +115,16 @@ static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
 }
 
 /* Finally: the code to actually call into the Switcher to run the Guest. */
-static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
+static void run_guest_once(struct lguest_vcpu *vcpu,
+  struct 

[PATCH 05/16] make write() operation smp aware

2007-12-20 Thread Glauber de Oliveira Costa
This patch makes the write() file operation smp aware, which means receiving
the vcpu_id value through the offset parameter and knowing which
vcpu we're talking to.
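A hedged sketch of how a vcpu could be looked up from the offset value, written as standalone C with made-up types. The range check against `nr_vcpus` is an assumption of this sketch; the struct layout and the name `vcpu_from_offset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS_SKETCH 4 /* arbitrary size for the example */

struct vcpu_sketch {
    int vcpu_id;
};

struct guest_sketch {
    struct vcpu_sketch vcpus[MAX_VCPUS_SKETCH];
    unsigned int nr_vcpus;
};

/*
 * Map the file offset supplied by pread()/pwrite() to a vcpu.
 * Returns NULL for an uninitialized guest or an id that is out of range.
 */
struct vcpu_sketch *vcpu_from_offset(struct guest_sketch *g, long long off)
{
    if (!g)
        return NULL;
    if (off < 0 || (unsigned long long)off >= g->nr_vcpus)
        return NULL;
    return &g->vcpus[off];
}
```

Checking the id before indexing matters because the address of an array element is never NULL, so a NULL test after the indexing cannot catch a bad id.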

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lguest_user.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index 894d530..ae5bf4c 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -223,14 +223,21 @@ static ssize_t write(struct file *file, const char __user *in,
struct lguest *lg = file->private_data;
const unsigned long __user *input = (const unsigned long __user *)in;
unsigned long req;
+   struct lguest_vcpu *vcpu = NULL;
+   int vcpu_id = *off;
 
if (get_user(req, input) != 0)
return -EFAULT;
input++;
 
/* If you haven't initialized, you must do that first. */
-   if (req != LHREQ_INITIALIZE && !lg)
-   return -EINVAL;
+   if (req != LHREQ_INITIALIZE) {
+   if (!lg)
+   return -EINVAL;
+   vcpu = &lg->vcpus[vcpu_id];
+   if (!vcpu)
+   return -EINVAL;
+   }
 
/* Once the Guest is dead, all you can do is read() why it died. */
if (lg && lg->dead)
-- 
1.5.0.6



[PATCH 06/16] make hypercalls use the vcpu struct

2007-12-20 Thread Glauber de Oliveira Costa
This patch changes the do_hcall() and do_async_hcalls() interfaces (and obviously
their callers) to take a vcpu struct. Again, a vcpu services the hypercall, not
the whole guest.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c   |6 +++---
 drivers/lguest/hypercalls.c |   42 +++---
 drivers/lguest/lg.h |   16 
 drivers/lguest/x86/core.c   |   16 ++--
 4 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 70fc65e..ef35e02 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -181,8 +181,8 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
/* We stop running once the Guest is dead. */
while (!lg->dead) {
/* First we run any hypercalls the Guest wants done. */
-   if (lg->hcall)
-   do_hypercalls(lg);
+   if (vcpu->hcall)
+   do_hypercalls(vcpu);
 
/* It's possible the Guest did a NOTIFY hypercall to the
 * Launcher, in which case we return from the read() now. */
@@ -234,7 +234,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
local_irq_enable();
 
/* Now we deal with whatever happened to the Guest. */
-   lguest_arch_handle_trap(lg);
+   lguest_arch_handle_trap(vcpu);
}
 
/* The Guest is dead => No such file or directory */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index b478aff..62da355 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -29,8 +29,10 @@
 
 /*H:120 This is the core hypercall routine: where the Guest gets what it wants.
  * Or gets killed.  Or, in the case of LHCALL_CRASH, both. */
-static void do_hcall(struct lguest *lg, struct hcall_args *args)
+static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
 {
+   struct lguest *lg = vcpu->lg;
+
switch (args->arg0) {
case LHCALL_FLUSH_ASYNC:
/* This call does nothing, except by breaking out of the Guest
@@ -91,7 +93,7 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
break;
default:
/* It should be an architecture-specific hypercall. */
-   if (lguest_arch_do_hcall(lg, args))
+   if (lguest_arch_do_hcall(vcpu, args))
kill_guest(lg, "Bad hypercall %li\n", args->arg0);
}
 }
@@ -104,10 +106,11 @@ static void do_hcall(struct lguest *lg, struct hcall_args *args)
  * Guest put them in the ring, but we also promise the Guest that they will
  * happen before any normal hypercall (which is why we check this before
  * checking for a normal hcall). */
-static void do_async_hcalls(struct lguest *lg)
+static void do_async_hcalls(struct lguest_vcpu *vcpu)
 {
unsigned int i;
u8 st[LHCALL_RING_SIZE];
+   struct lguest *lg = vcpu->lg;
 
/* For simplicity, we copy the entire call status array in at once. */
if (copy_from_user(&st, &lg->lguest_data->hcall_status, sizeof(st)))
@@ -119,7 +122,7 @@ static void do_async_hcalls(struct lguest *lg)
/* We remember where we were up to from last time.  This makes
 * sure that the hypercalls are done in the order the Guest
 * places them in the ring. */
-   unsigned int n = lg-next_hcall;
+   unsigned int n = vcpu-next_hcall;
 
/* 0xFF means there's no call here (yet). */
if (st[n] == 0xFF)
@@ -127,8 +130,8 @@ static void do_async_hcalls(struct lguest *lg)
 
/* OK, we have hypercall.  Increment the next_hcall cursor,
 * and wrap back to 0 if we reach the end. */
-   if (++lg-next_hcall == LHCALL_RING_SIZE)
-   lg-next_hcall = 0;
+   if (++vcpu-next_hcall == LHCALL_RING_SIZE)
+   vcpu-next_hcall = 0;
 
/* Copy the hypercall arguments into a local copy of
 * the hcall_args struct. */
@@ -139,7 +142,7 @@ static void do_async_hcalls(struct lguest *lg)
}
 
/* Do the hypercall, same as a normal one. */
-   do_hcall(lg, &args);
+   do_hcall(vcpu, &args);
 
/* Mark the hypercall done. */
if (put_user(0xFF, &lg->lguest_data->hcall_status[n])) {
@@ -156,16 +159,17 @@ static void do_async_hcalls(struct lguest *lg)
 
 /* Last of all, we look at what happens first of all.  The very first time the
  * Guest makes a hypercall, we end up here to set things up: */
-static void initialize(struct lguest *lg)
+static void initialize(struct lguest_vcpu *vcpu)
 {
+   struct lguest *lg = vcpu->lg;
/* You can't do anything until you're initialized.  The Guest 

[PATCH 07/16] per-vcpu lguest timers

2007-12-20 Thread Glauber de Oliveira Costa
This patch introduces per-vcpu timers. Each vcpu gets its own local
expiry, which is needed to account time in SMP guests.
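
The core of the change is that the timer is embedded in the vcpu struct, so
the expiry callback can recover its vcpu with container_of(). Below is a
minimal user-space sketch of that pattern, not the kernel code itself:
struct fake_timer stands in for struct hrtimer, the irq0_pending field and
the callback's return value are assumptions made for illustration, and
container_of() is redefined locally.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct hrtimer: just a callback pointer. */
struct fake_timer {
	int (*function)(struct fake_timer *timer);
};

/* Simplified vcpu: the timer lives inside it, one per vcpu. */
struct vcpu {
	int vcpu_id;
	int irq0_pending;        /* assumption: a flag instead of a bitmap */
	struct fake_timer hrt;
};

/* Expiry callback: recover the owning vcpu from the embedded timer. */
static int clockdev_fn(struct fake_timer *timer)
{
	struct vcpu *vcpu = container_of(timer, struct vcpu, hrt);

	vcpu->irq0_pending = 1;  /* the timer is the first interrupt */
	return vcpu->vcpu_id;    /* returned only so callers can check it */
}

/* Per-vcpu initialisation, mirroring the shape of init_clockdev(vcpu). */
static void init_clockdev(struct vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->hrt.function = clockdev_fn;
}
```

Firing the timer of one vcpu touches only that vcpu's state, which is what
local expiry needs.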

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c   |2 +-
 drivers/lguest/interrupts_and_traps.c |   20 ++--
 drivers/lguest/lg.h   |   10 +-
 drivers/lguest/lguest_user.c  |   12 +++-
 4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 62da355..4364bc2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -78,7 +78,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
guest_set_pmd(lg, args->arg1, args->arg2);
break;
case LHCALL_SET_CLOCKEVENT:
-   guest_set_clockevent(lg, args->arg1);
+   guest_set_clockevent(vcpu, args->arg1);
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 2b66f79..189d66e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -470,13 +470,13 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
  * infrastructure to set a callback at that time.
  *
  * 0 means turn off the clock. */
-void guest_set_clockevent(struct lguest *lg, unsigned long delta)
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta)
 {
ktime_t expires;
 
if (unlikely(delta == 0)) {
/* Clock event device is shutting down. */
-   hrtimer_cancel(&lg->hrt);
+   hrtimer_cancel(&vcpu->hrt);
return;
}
 
@@ -484,25 +484,25 @@ void guest_set_clockevent(struct lguest *lg, unsigned long delta)
 * all the time between now and the timer interrupt it asked for.  This
 * is almost always the right thing to do. */
expires = ktime_add_ns(ktime_get_real(), delta);
-   hrtimer_start(&lg->hrt, expires, HRTIMER_MODE_ABS);
+   hrtimer_start(&vcpu->hrt, expires, HRTIMER_MODE_ABS);
 }
 
 /* This is the function called when the Guest's timer expires. */
 static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
 {
-   struct lguest *lg = container_of(timer, struct lguest, hrt);
+   struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);
 
/* Remember the first interrupt is the timer interrupt. */
-   set_bit(0, lg->irqs_pending);
+   set_bit(0, vcpu->lg->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
-   if (lg->halted)
-   wake_up_process(lg->tsk);
+   if (vcpu->lg->halted)
+   wake_up_process(vcpu->lg->tsk);
return HRTIMER_NORESTART;
 }
 
 /* This sets up the timer for this Guest. */
-void init_clockdev(struct lguest *lg)
+void init_clockdev(struct lguest_vcpu *vcpu)
 {
-   hrtimer_init(&lg->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-   lg->hrt.function = clockdev_fn;
+   hrtimer_init(&vcpu->hrt, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+   vcpu->hrt.function = clockdev_fn;
 }
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 696cdf1..0205409 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -47,6 +47,9 @@ struct lguest_vcpu {
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
+
+   /* Virtual clock device */
+   struct hrtimer hrt;
 };
 
 /* The private info the thread maintains about the guest. */
@@ -95,9 +98,6 @@ struct lguest
 
struct lguest_arch arch;
 
-   /* Virtual clock device */
-   struct hrtimer hrt;
-
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
 };
@@ -150,8 +150,8 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
   const unsigned long *def);
 void copy_traps(const struct lguest *lg, struct desc_struct *idt,
const unsigned long *def);
-void guest_set_clockevent(struct lguest *lg, unsigned long delta);
-void init_clockdev(struct lguest *lg);
+void guest_set_clockevent(struct lguest_vcpu *vcpu, unsigned long delta);
+void init_clockdev(struct lguest_vcpu *vcpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
 void free_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index ae5bf4c..7481e82 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -97,6 +97,9 @@ static int vcpu_start(struct lguest_vcpu *vcpu, int vcpu_id,
 {
vcpu->vcpu_id = vcpu_id;
 
+   /* The timer for lguest's clock needs initialization. */
+   init_clockdev(vcpu);
+
vcpu->lg = container_of((vcpu - vcpu_id), struct lguest, vcpus[0]);

[PATCH 08/16] per-vcpu interrupt processing.

2007-12-20 Thread Glauber de Oliveira Costa
This patch adapts interrupt processing to use the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c |2 +-
 drivers/lguest/interrupts_and_traps.c |   25 ++---
 drivers/lguest/lg.h   |   10 +-
 drivers/lguest/lguest_user.c  |7 ---
 drivers/lguest/x86/core.c |2 +-
 5 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index ef35e02..4d0102d 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -203,7 +203,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
/* Check if there are any interrupts which can be delivered
 * now: if so, this sets up the hander to be executed when we
 * next run the Guest. */
-   maybe_do_interrupt(lg);
+   maybe_do_interrupt(vcpu);
 
/* All long-lived kernel loops need to check with this horrible
 * thing called the freezer.  If the Host is trying to suspend,
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 189d66e..db440cb 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -60,11 +60,13 @@ static void push_guest_stack(struct lguest *lg, unsigned long *gstack, u32 val)
  * We set up the stack just like the CPU does for a real interrupt, so it's
  * identical for the Guest (and the standard iret instruction will undo
  * it). */
-static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
+static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
+   int has_err)
 {
unsigned long gstack, origstack;
u32 eflags, ss, irq_enable;
unsigned long virtstack;
+   struct lguest *lg = vcpu->lg;
 
/* There are two cases for interrupts: one where the Guest is already
 * in the kernel, and a more complex one where the Guest is in
@@ -129,9 +131,10 @@ static void set_guest_interrupt(struct lguest *lg, u32 lo, u32 hi, int has_err)
  *
  * maybe_do_interrupt() gets called before every entry to the Guest, to see if
  * we should divert the Guest to running an interrupt handler. */
-void maybe_do_interrupt(struct lguest *lg)
+void maybe_do_interrupt(struct lguest_vcpu *vcpu)
 {
unsigned int irq;
+   struct lguest *lg = vcpu->lg;
DECLARE_BITMAP(blk, LGUEST_IRQS);
struct desc_struct *idt;
 
@@ -145,7 +148,7 @@ void maybe_do_interrupt(struct lguest *lg)
   sizeof(blk)))
return;
 
-   bitmap_andnot(blk, lg->irqs_pending, blk, LGUEST_IRQS);
+   bitmap_andnot(blk, vcpu->irqs_pending, blk, LGUEST_IRQS);
 
/* Find the first interrupt. */
irq = find_first_bit(blk, LGUEST_IRQS);
@@ -180,11 +183,11 @@ void maybe_do_interrupt(struct lguest *lg)
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
-   clear_bit(irq, lg->irqs_pending);
+   clear_bit(irq, vcpu->irqs_pending);
/* set_guest_interrupt() takes the interrupt descriptor and a
 * flag to say whether this interrupt pushes an error code onto
 * the stack as well: virtual interrupts never do. */
-   set_guest_interrupt(lg, idt->a, idt->b, 0);
+   set_guest_interrupt(vcpu, idt->a, idt->b, 0);
}
 
/* Every time we deliver an interrupt, we update the timestamp in the
@@ -245,19 +248,19 @@ static int has_err(unsigned int trap)
 }
 
 /* deliver_trap() returns true if it could deliver the trap. */
-int deliver_trap(struct lguest *lg, unsigned int num)
+int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
 {
/* Trap numbers are always 8 bit, but we set an impossible trap number
 * for traps inside the Switcher, so check that here. */
-   if (num >= ARRAY_SIZE(lg->arch.idt))
+   if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
return 0;
 
/* Early on the Guest hasn't set the IDT entries (or maybe it put a
 * bogus one in): if we fail here, the Guest will be killed. */
-   if (!idt_present(lg->arch.idt[num].a, lg->arch.idt[num].b))
+   if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
return 0;
-   set_guest_interrupt(lg, lg->arch.idt[num].a, lg->arch.idt[num].b,
-   has_err(num));
+   set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
+   vcpu->lg->arch.idt[num].b, has_err(num));
return 1;
 }
 
@@ -493,7 +496,7 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
struct lguest_vcpu *vcpu = container_of(timer, struct lguest_vcpu, hrt);
 
   

[PATCH 09/16] map_switcher_in_guest() per-vcpu

2007-12-20 Thread Glauber de Oliveira Costa
The switcher needs to be mapped per-vcpu, because different vcpus
can have different page tables (they do not have to: threads
share the same ones).

So the first step is to make the function receive a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/lg.h  |3 ++-
 drivers/lguest/page_tables.c |4 +++-
 drivers/lguest/x86/core.c|2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index db2edd6..f6e9020 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -173,7 +173,8 @@ void guest_pagetable_clear_all(struct lguest *lg);
 void guest_pagetable_flush_user(struct lguest *lg);
 void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
   unsigned long vaddr, pte_t val);
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages);
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+  struct lguest_pages *pages);
 int demand_page(struct lguest *info, unsigned long cr2, int errcode);
 void pin_page(struct lguest *lg, unsigned long vaddr);
 unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index fffabb3..7fb8627 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -634,8 +634,10 @@ void free_guest_pagetable(struct lguest *lg)
  * Guest (and not the pages for other CPUs).  We have the appropriate PTE pages
  * for each CPU already set up, we just need to hook them in now we know which
  * Guest is about to run on this CPU. */
-void map_switcher_in_guest(struct lguest *lg, struct lguest_pages *pages)
+void map_switcher_in_guest(struct lguest_vcpu *vcpu,
+  struct lguest_pages *pages)
 {
+   struct lguest *lg = vcpu->lg;
pte_t *switcher_pte_page = __get_cpu_var(switcher_pte_pages);
pgd_t switcher_pgd;
pte_t regs_pte;
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 3d21c6d..9bf2213 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -92,7 +92,7 @@ static void copy_in_guest_info(struct lguest_vcpu *vcpu,
pages->state.host_cr3 = __pa(current->mm->pgd);
/* Set up the Guest's page tables to see this CPU's pages (and no
 * other CPU's pages). */
-   map_switcher_in_guest(lg, pages);
+   map_switcher_in_guest(vcpu, pages);
/* Set up the two TSS members which tell the CPU what stack to use
 * for traps which do directly into the Guest (ie. traps at privilege
 * level 1). */
-- 
1.5.0.6



[PATCH 10/16] make emulate_insn receive a vcpu struct.

2007-12-20 Thread Glauber de Oliveira Costa
emulate_insn() needs to know the current eip, which will become
a per-vcpu value. This patch therefore changes the function
prototype to take a vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/x86/core.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 9bf2213..2fb9cd3 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -220,8 +220,9 @@ void lguest_arch_run_guest(struct lguest_vcpu *vcpu)
  * When the Guest uses one of these instructions, we get a trap (General
  * Protection Fault) and come here.  We see if it's one of those troublesome
  * instructions and skip over it.  We return true if we did. */
-static int emulate_insn(struct lguest *lg)
+static int emulate_insn(struct lguest_vcpu *vcpu)
 {
+   struct lguest *lg = vcpu->lg;
u8 insn;
unsigned int insnlen = 0, in = 0, shift = 0;
/* The eip contains the *virtual* address of the Guest's instruction:
@@ -294,7 +295,7 @@ void lguest_arch_handle_trap(struct lguest_vcpu *vcpu)
 * instructions which we need to emulate.  If so, we just go
 * back into the Guest after we've done it. */
if (lg->regs->errcode == 0) {
-   if (emulate_insn(lg))
+   if (emulate_insn(vcpu))
return;
}
break;
-- 
1.5.0.6



[PATCH 12/16] replace lguest_arch with lguest_vcpu_arch.

2007-12-20 Thread Glauber de Oliveira Costa
The fields found in lguest_arch are not really per-guest,
but per-vcpu (gdt, idt, etc). So this patch turns lguest_arch
into lguest_vcpu_arch.

A per-guest, per-arch struct would still make sense, but that
can be addressed later, when the need arises.
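
As a rough illustration of the resulting layout, the sketch below keeps the
IDT inside a per-vcpu arch struct and bounds-checks trap numbers against that
array, in the style of deliver_trap(). It is a simplified user-space model:
the struct names, the 256-entry size and the reduced deliver_trap() body are
assumptions, and only the present-bit test (bit 15 of the high descriptor
word) follows the x86 descriptor format.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define IDT_ENTRIES 256          /* assumption: size chosen for the sketch */

/* Simplified descriptor: low and high 32-bit words. */
struct desc {
	unsigned int a, b;
};

/* Per-vcpu architecture state (hypothetical, reduced to the IDT). */
struct vcpu_arch {
	struct desc idt[IDT_ENTRIES];
};

struct vcpu {
	struct vcpu_arch arch;
};

/* The present bit of an x86 descriptor is bit 15 of the high word. */
static int idt_present(unsigned int lo, unsigned int hi)
{
	(void)lo;
	return (hi & 0x8000) != 0;
}

/* Returns 1 if the trap could be delivered, 0 otherwise. */
static int deliver_trap(struct vcpu *vcpu, unsigned int num)
{
	assert(vcpu != NULL);

	/* Reject trap numbers outside this vcpu's IDT. */
	if (num >= ARRAY_SIZE(vcpu->arch.idt))
		return 0;

	/* No handler installed yet: nothing to deliver. */
	if (!idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b))
		return 0;

	return 1;
}
```

Because the table hangs off the vcpu, two vcpus of the same guest can install
different handlers without affecting each other.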

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/interrupts_and_traps.c |   29 +++--
 drivers/lguest/lg.h   |   19 +++---
 drivers/lguest/segments.c |   43 +---
 drivers/lguest/x86/core.c |   24 --
 include/asm-x86/lguest.h  |2 +-
 5 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 1ceff5f..b3d444a 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -180,7 +180,7 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
/* Look at the IDT entry the Guest gave us for this interrupt.  The
 * first 32 (FIRST_EXTERNAL_VECTOR) entries are for traps, so we skip
 * over them. */
-   idt = &lg->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
+   idt = &vcpu->arch.idt[FIRST_EXTERNAL_VECTOR+irq];
/* If they don't have a handler (yet?), we just ignore it */
if (idt_present(idt->a, idt->b)) {
/* OK, mark it no longer pending and deliver it. */
@@ -253,15 +253,15 @@ int deliver_trap(struct lguest_vcpu *vcpu, unsigned int num)
 {
/* Trap numbers are always 8 bit, but we set an impossible trap number
 * for traps inside the Switcher, so check that here. */
-   if (num >= ARRAY_SIZE(vcpu->lg->arch.idt))
+   if (num >= ARRAY_SIZE(vcpu->arch.idt))
return 0;
 
/* Early on the Guest hasn't set the IDT entries (or maybe it put a
 * bogus one in): if we fail here, the Guest will be killed. */
-   if (!idt_present(vcpu->lg->arch.idt[num].a, vcpu->lg->arch.idt[num].b))
+   if (!idt_present(vcpu->arch.idt[num].a, vcpu->arch.idt[num].b))
return 0;
-   set_guest_interrupt(vcpu, vcpu->lg->arch.idt[num].a,
-   vcpu->lg->arch.idt[num].b, has_err(num));
+   set_guest_interrupt(vcpu, vcpu->arch.idt[num].a,
+   vcpu->arch.idt[num].b, has_err(num));
return 1;
 }
 
@@ -387,7 +387,8 @@ static void set_trap(struct lguest *lg, struct desc_struct *trap,
  *
  * We saw the Guest setting Interrupt Descriptor Table (IDT) entries with the
  * LHCALL_LOAD_IDT_ENTRY hypercall before: that comes here. */
-void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
+void load_guest_idt_entry(struct lguest_vcpu *vcpu,
+ unsigned int num, u32 lo, u32 hi)
 {
/* Guest never handles: NMI, doublefault, spurious interrupt or
 * hypercall.  We ignore when it tries to set them. */
@@ -396,13 +397,13 @@ void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
 
/* Mark the IDT as changed: next time the Guest runs we'll know we have
 * to copy this again. */
-   lg->changed |= CHANGED_IDT;
+   vcpu->lg->changed |= CHANGED_IDT;
 
/* Check that the Guest doesn't try to step outside the bounds. */
-   if (num >= ARRAY_SIZE(lg->arch.idt))
-   kill_guest(lg, "Setting idt entry %u", num);
+   if (num >= ARRAY_SIZE(vcpu->arch.idt))
+   kill_guest(vcpu->lg, "Setting idt entry %u", num);
else
-   set_trap(lg, &lg->arch.idt[num], num, lo, hi);
+   set_trap(vcpu->lg, &vcpu->arch.idt[num], num, lo, hi);
 }
 
 /* The default entry for each interrupt points into the Switcher routines which
@@ -438,14 +439,14 @@ void setup_default_idt_entries(struct lguest_ro_state *state,
 /*H:240 We don't use the IDT entries in the struct lguest directly, instead
  * we copy them into the IDT which we've set up for Guests on this CPU, just
  * before we run the Guest.  This routine does that copy. */
-void copy_traps(const struct lguest *lg, struct desc_struct *idt,
+void copy_traps(const struct lguest_vcpu *vcpu, struct desc_struct *idt,
const unsigned long *def)
 {
unsigned int i;
 
/* We can simply copy the direct traps, otherwise we use the default
 * ones in the Switcher: they will return to the Host. */
-   for (i = 0; i < ARRAY_SIZE(lg->arch.idt); i++) {
+   for (i = 0; i < ARRAY_SIZE(vcpu->arch.idt); i++) {
/* If no Guest can ever override this trap, leave it alone. */
if (!direct_trap(i))
continue;
@@ -454,8 +455,8 @@ void copy_traps(const struct lguest *lg, struct desc_struct *idt,
 * Interrupt gates (type 14) disable interrupts as they are
 * entered, which we never let the Guest do.  Not present
 * entries (type 0x0) also can't go direct, of course. */
-  

neigh: timer & !nud_in_timer

2007-12-20 Thread John Sigler

Hello,

I noticed the following message in my kernel log.
kernel: neigh: timer & !nud_in_timer
(Might be due to a race condition.)

I'm running a UP Linux version 2.6.22.1-rt9
( http://rt.wiki.kernel.org/index.php )

The following /proc entries might be relevant.

/proc/sys/net/ipv4/conf/all/arp_accept
0
/proc/sys/net/ipv4/conf/all/arp_announce
2
/proc/sys/net/ipv4/conf/all/arp_filter
0
/proc/sys/net/ipv4/conf/all/arp_ignore
1

I also lowered the priority of softirq-timer/0 to 10 which means
it can be interrupted by other IRQ handlers.

Regards.


[PATCH 11/16] make registers per-vcpu

2007-12-20 Thread Glauber de Oliveira Costa
This is the most obvious per-vcpu field: registers.

So this patch moves it from struct lguest to struct lguest_vcpu,
and patches the places where it is used accordingly.
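
To show why the registers have to follow the vcpu, here is a small
user-space sketch of the stack selection done at the top of
set_guest_interrupt(): the saved ss of the interrupted vcpu decides whether
to switch to the kernel stack or stay on the current one. The helper name
pick_stack() and the reduced structs are hypothetical, GUEST_PL is assumed
to be 1, and esp1/ss1 are shown in the vcpu for brevity although this patch
itself only moves regs.

```c
#include <assert.h>
#include <stddef.h>

#define GUEST_PL 1               /* assumption: Guest kernel privilege level */

/* Reduced register set: only what the stack choice needs. */
struct regs {
	unsigned long ss, esp;
};

/* Hypothetical, simplified vcpu. */
struct vcpu {
	struct regs *regs;       /* per-vcpu saved registers */
	unsigned long esp1;      /* kernel stack given via SET_STACK */
	unsigned char ss1;       /* kernel stack segment */
};

/* Returns the virtual stack address to push onto; stores the segment in *ss. */
static unsigned long pick_stack(const struct vcpu *vcpu, unsigned long *ss)
{
	assert(vcpu != NULL && vcpu->regs != NULL && ss != NULL);

	/* The low two bits of ss hold the privilege level. */
	if ((vcpu->regs->ss & 0x3) != GUEST_PL) {
		/* Interrupted in userspace: switch to the kernel stack. */
		*ss = vcpu->ss1;
		return vcpu->esp1;
	}

	/* Already in the Guest kernel: stay on the same stack. */
	*ss = vcpu->regs->ss;
	return vcpu->regs->esp;
}
```

With one register set per guest, a second vcpu interrupted in a different
mode would read the wrong ss here.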

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/interrupts_and_traps.c |   29 ---
 drivers/lguest/lg.h   |9 ---
 drivers/lguest/lguest_user.c  |   36 +++---
 drivers/lguest/page_tables.c  |4 ++-
 drivers/lguest/x86/core.c |   39 +
 5 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index db440cb..1ceff5f 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -71,7 +71,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
/* There are two cases for interrupts: one where the Guest is already
 * in the kernel, and a more complex one where the Guest is in
 * userspace.  We check the privilege level to find out. */
-   if ((lg->regs->ss&0x3) != GUEST_PL) {
+   if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
 * hypercall: both the virtual address and the segment */
virtstack = lg->esp1;
@@ -82,12 +82,12 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 * stack: when the Guest does an iret back from the interrupt
 * handler the CPU will notice they're dropping privilege
 * levels and expect these here. */
-   push_guest_stack(lg, &gstack, lg->regs->ss);
-   push_guest_stack(lg, &gstack, lg->regs->esp);
+   push_guest_stack(lg, &gstack, vcpu->regs->ss);
+   push_guest_stack(lg, &gstack, vcpu->regs->esp);
} else {
/* We're staying on the same Guest (kernel) stack. */
-   virtstack = lg->regs->esp;
-   ss = lg->regs->ss;
+   virtstack = vcpu->regs->esp;
+   ss = vcpu->regs->ss;
 
origstack = gstack = guest_pa(lg, virtstack);
}
@@ -96,7 +96,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 * the Interrupt Flag bit is always set.  We copy that bit from the
 * Guest's irq_enabled field into the eflags word: we saw the Guest
 * copy it back in lguest_iret. */
-   eflags = lg->regs->eflags;
+   eflags = vcpu->regs->eflags;
if (get_user(irq_enable, &lg->lguest_data->irq_enabled) == 0
 && !(irq_enable & X86_EFLAGS_IF))
eflags &= ~X86_EFLAGS_IF;
@@ -105,19 +105,19 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
 * eflags word, the old code segment, and the old instruction
 * pointer. */
push_guest_stack(lg, &gstack, eflags);
-   push_guest_stack(lg, &gstack, lg->regs->cs);
-   push_guest_stack(lg, &gstack, lg->regs->eip);
+   push_guest_stack(lg, &gstack, vcpu->regs->cs);
+   push_guest_stack(lg, &gstack, vcpu->regs->eip);
 
/* For the six traps which supply an error code, we push that, too. */
if (has_err)
-   push_guest_stack(lg, &gstack, lg->regs->errcode);
+   push_guest_stack(lg, &gstack, vcpu->regs->errcode);
 
/* Now we've pushed all the old state, we change the stack, the code
 * segment and the address to execute. */
-   lg->regs->ss = ss;
-   lg->regs->esp = virtstack + (gstack - origstack);
-   lg->regs->cs = (__KERNEL_CS|GUEST_PL);
-   lg->regs->eip = idt_address(lo, hi);
+   vcpu->regs->ss = ss;
+   vcpu->regs->esp = virtstack + (gstack - origstack);
+   vcpu->regs->cs = (__KERNEL_CS|GUEST_PL);
+   vcpu->regs->eip = idt_address(lo, hi);
 
/* There are two kinds of interrupt handlers: 0xE is an interrupt
 * gate which expects interrupts to be disabled on entry. */
@@ -158,7 +158,8 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
 
/* They may be in the middle of an iret, where they asked us never to
 * deliver interrupts. */
-   if (lg->regs->eip >= lg->noirq_start && lg->regs->eip < lg->noirq_end)
+   if ((vcpu->regs->eip >= lg->noirq_start) &&
+   (vcpu->regs->eip < lg->noirq_end))
return;
 
/* If they're halted, interrupts restart them. */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f6e9020..d05fe38 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -44,6 +44,10 @@ struct lguest_vcpu {
int vcpu_id;
struct lguest *lg;
 
+   /* At end of a page shared mapped over lguest_pages in guest.  */
+   unsigned long regs_page;
+   struct lguest_regs *regs;
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -58,9 

[PATCH 14/16] makes special fields be per-vcpu

2007-12-20 Thread Glauber de Oliveira Costa
The lguest struct holds some fields, namely cr2, ts, esp1
and ss1, that are not really guest-wide but vcpu-wide.

This patch moves them into the vcpu struct.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c   |   10 +-
 drivers/lguest/interrupts_and_traps.c |   24 +---
 drivers/lguest/lg.h   |   18 ++
 drivers/lguest/page_tables.c  |   11 ++-
 drivers/lguest/x86/core.c |   10 --
 5 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 41ea2e2..c6b87ef 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -58,7 +58,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
/* FLUSH_TLB comes in two flavors, depending on the
 * argument: */
if (args->arg1)
-   guest_pagetable_clear_all(lg);
+   guest_pagetable_clear_all(vcpu);
else
guest_pagetable_flush_user(lg);
break;
@@ -66,10 +66,10 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
/* All these calls simply pass the arguments through to the right
 * routines. */
case LHCALL_NEW_PGTABLE:
-   guest_new_pagetable(lg, args->arg1);
+   guest_new_pagetable(vcpu, args->arg1);
break;
case LHCALL_SET_STACK:
-   guest_set_stack(lg, args->arg1, args->arg2, args->arg3);
+   guest_set_stack(vcpu, args->arg1, args->arg2, args->arg3);
break;
case LHCALL_SET_PTE:
guest_set_pte(lg, args->arg1, args->arg2, __pte(args->arg3));
@@ -82,7 +82,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_TS:
/* This sets the TS flag, as we saw used in run_guest(). */
-   lg->ts = args->arg1;
+   vcpu->ts = args->arg1;
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
@@ -189,7 +189,7 @@ static void initialize(struct lguest_vcpu *vcpu)
 * first write to a Guest page.  This may have caused a copy-on-write
 * fault, but the old page might be (read-only) in the Guest
 * pagetable. */
-   guest_pagetable_clear_all(lg);
+   guest_pagetable_clear_all(vcpu);
 }
 
 /*H:100
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 10c9aea..78f6210 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -74,8 +74,8 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
if ((vcpu->regs->ss&0x3) != GUEST_PL) {
/* The Guest told us their kernel stack with the SET_STACK
 * hypercall: both the virtual address and the segment */
-   virtstack = lg->esp1;
-   ss = lg->ss1;
+   virtstack = vcpu->esp1;
+   ss = vcpu->ss1;
 
origstack = gstack = guest_pa(lg, virtstack);
/* We push the old stack segment and pointer onto the new
@@ -313,10 +313,11 @@ static int direct_trap(unsigned int num)
  * the Guest.
  *
  * Which is deeply unfair, because (literally!) it wasn't the Guests' fault. */
-void pin_stack_pages(struct lguest *lg)
+void pin_stack_pages(struct lguest_vcpu *vcpu)
 {
unsigned int i;
 
+   struct lguest *lg = vcpu->lg;
/* Depending on the CONFIG_4KSTACKS option, the Guest can have one or
 * two pages of stack space. */
for (i = 0; i < lg->stack_pages; i++)
@@ -324,7 +325,7 @@ void pin_stack_pages(struct lguest *lg)
 * start of the page after the kernel stack.  Subtract one to
 * get back onto the first stack page, and keep subtracting to
 * get to the rest of the stack pages. */
-   pin_page(lg, lg->esp1 - 1 - i * PAGE_SIZE);
+   pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
 }
 
 /* Direct traps also mean that we need to know whenever the Guest wants to use
@@ -335,21 +336,22 @@ void pin_stack_pages(struct lguest *lg)
  *
  * In Linux each process has its own kernel stack, so this happens a lot: we
  * change stacks on each context switch. */
-void guest_set_stack(struct lguest *lg, u32 seg, u32 esp, unsigned int pages)
+void guest_set_stack(struct lguest_vcpu *vcpu, u32 seg, u32 esp,
+unsigned int pages)
 {
/* You are not allowed have a stack segment with privilege level 0: bad
 * Guest! */
if ((seg & 0x3) != GUEST_PL)
-   kill_guest(lg, "bad stack segment %i", seg);
+   kill_guest(vcpu->lg, "bad stack segment %i", seg);
/* We only expect one or two 

[PATCH 13/16] per-vcpu lguest task management

2007-12-20 Thread Glauber de Oliveira Costa
lguest uses tasks to control its running behaviour (sending
breaks, controlling the halted state, etc). In a per-vcpu environment,
each vcpu will have its own underlying task. This patch adds the
infrastructure that makes this possible.
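
The effect on the halted state can be modelled in a few lines. This is a
user-space sketch only: the wakeups counter stands in for
wake_up_process(vcpu->tsk), and the three helper names are hypothetical
simplifications of the HALT hypercall, the timer callback and interrupt
delivery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified vcpu: only the fields this sketch needs. */
struct vcpu {
	int halted;      /* set by the HALT hypercall */
	int wakeups;     /* stand-in for wake_up_process(vcpu->tsk) */
};

/* HALT hypercall: stop this vcpu only. */
static void hcall_halt(struct vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->halted = 1;
}

/* Timer expiry: wake the underlying task only if this vcpu is halted. */
static void timer_expired(struct vcpu *vcpu)
{
	if (vcpu->halted)
		vcpu->wakeups++;
}

/* Delivering an interrupt restarts a halted vcpu. */
static void deliver_interrupt(struct vcpu *vcpu)
{
	vcpu->halted = 0;
}
```

Halting vcpu 0 leaves vcpu 1 running, and a timer on vcpu 1 does not wake
vcpu 0's task.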

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c |4 +-
 drivers/lguest/hypercalls.c   |2 +-
 drivers/lguest/interrupts_and_traps.c |8 ++--
 drivers/lguest/lg.h   |   14 
 drivers/lguest/lguest_user.c  |   56 ++--
 5 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 4d0102d..285a465 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -197,7 +197,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
return -ERESTARTSYS;
 
/* If Waker set break_out, return to Launcher. */
-   if (lg->break_out)
+   if (vcpu->break_out)
return -EAGAIN;
 
/* Check if there are any interrupts which can be delivered
@@ -217,7 +217,7 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
 
/* If the Guest asked to be stopped, we sleep.  The Guest's
 * clock timer or LHCALL_BREAK from the Waker will wake us. */
-   if (lg->halted) {
+   if (vcpu->halted) {
set_current_state(TASK_INTERRUPTIBLE);
schedule();
continue;
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 4364bc2..41ea2e2 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -86,7 +86,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
break;
case LHCALL_HALT:
/* Similarly, this sets the halted flag for run_guest(). */
-   lg->halted = 1;
+   vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
lg->pending_notify = args->arg1;
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index b3d444a..10c9aea 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -163,11 +163,11 @@ void maybe_do_interrupt(struct lguest_vcpu *vcpu)
return;
 
/* If they're halted, interrupts restart them. */
-   if (lg->halted) {
+   if (vcpu->halted) {
/* Re-enable interrupts. */
if (put_user(X86_EFLAGS_IF, &lg->lguest_data->irq_enabled))
kill_guest(lg, "Re-enabling interrupts");
-   lg->halted = 0;
+   vcpu->halted = 0;
} else {
/* Otherwise we check if they have interrupts disabled. */
u32 irq_enabled;
@@ -500,8 +500,8 @@ static enum hrtimer_restart clockdev_fn(struct hrtimer *timer)
/* Remember the first interrupt is the timer interrupt. */
set_bit(0, vcpu->irqs_pending);
/* If the Guest is actually stopped, we need to wake it up. */
-   if (vcpu->lg->halted)
-   wake_up_process(vcpu->lg->tsk);
+   if (vcpu->halted)
+   wake_up_process(vcpu->tsk);
return HRTIMER_NORESTART;
 }
 
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f9429ff..b23694e 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -43,6 +43,8 @@ struct lguest;
 struct lguest_vcpu {
int vcpu_id;
struct lguest *lg;
+   struct task_struct *tsk;
+   struct mm_struct *mm;   /* == tsk->mm, but that becomes NULL on exit */
 
/* At end of a page shared mapped over lguest_pages in guest.  */
unsigned long regs_page;
@@ -55,6 +57,11 @@ struct lguest_vcpu {
/* Virtual clock device */
struct hrtimer hrt;
 
+   /* Do we need to stop what we're doing and return to userspace? */
+   int break_out;
+   wait_queue_head_t break_wq;
+   int halted;
+
/* Pending virtual interrupts */
DECLARE_BITMAP(irqs_pending, LGUEST_IRQS);
 
@@ -65,8 +72,6 @@ struct lguest_vcpu {
 struct lguest
 {
struct lguest_data __user *lguest_data;
-   struct task_struct *tsk;
-   struct mm_struct *mm;   /* == tsk->mm, but that becomes NULL on exit */
struct lguest_vcpu vcpus[NR_CPUS];
unsigned int nr_vcpus;
 
@@ -76,15 +81,10 @@ struct lguest
void __user *mem_base;
unsigned long kernel_address;
u32 cr2;
-   int halted;
int ts;
u32 esp1;
u8 ss1;
 
-   /* Do we need to stop what we're doing and return to userspace? */
-   int break_out;
-   wait_queue_head_t break_wq;
-
/* Bitmap of what has changed: see CHANGED_* above. */
int changed;
struct lguest_pages *last_pages;
diff --git a/drivers/lguest/lguest_user.c 

[PATCH 15/16] make pending notifications per-vcpu

2007-12-20 Thread Glauber de Oliveira Costa
This patch makes the pending_notify field, used to control
pending notifications, per-vcpu instead of per-guest.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/core.c|6 +++---
 drivers/lguest/hypercalls.c  |6 +++---
 drivers/lguest/lg.h  |3 ++-
 drivers/lguest/lguest_user.c |4 ++--
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
index 285a465..d628515 100644
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -186,10 +186,10 @@ int run_guest(struct lguest_vcpu *vcpu, unsigned long __user *user)
 
/* It's possible the Guest did a NOTIFY hypercall to the
 * Launcher, in which case we return from the read() now. */
-   if (lg->pending_notify) {
-   if (put_user(lg->pending_notify, user))
+   if (vcpu->pending_notify) {
+   if (put_user(vcpu->pending_notify, user))
return -EFAULT;
-   return sizeof(lg->pending_notify);
+   return sizeof(vcpu->pending_notify);
}
 
/* Check for signals */
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index c6b87ef..95e1062 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -89,7 +89,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
vcpu->halted = 1;
break;
case LHCALL_NOTIFY:
-   lg->pending_notify = args->arg1;
+   vcpu->pending_notify = args->arg1;
break;
default:
/* It should be an architecture-specific hypercall. */
@@ -152,7 +152,7 @@ static void do_async_hcalls(struct lguest_vcpu *vcpu)
 
/* Stop doing hypercalls if they want to notify the Launcher:
 * it needs to service this first. */
-   if (lg->pending_notify)
+   if (vcpu->pending_notify)
break;
}
 }
@@ -217,7 +217,7 @@ void do_hypercalls(struct lguest_vcpu *vcpu)
/* If we stopped reading the hypercall ring because the Guest did a
 * NOTIFY to the Launcher, we want to return now.  Otherwise we do
 * the hypercall. */
-   if (!vcpu->lg->pending_notify) {
+   if (!vcpu->pending_notify) {
do_hcall(vcpu, vcpu->hcall);
/* Tricky point: we reset the hcall pointer to mark the
 * hypercall as done.  We use the hcall pointer rather than
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index dbf70c6..6faf90d 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -51,6 +51,8 @@ struct lguest_vcpu {
u32 esp1;
u8 ss1;
 
+   unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
+
/* At end of a page shared mapped over lguest_pages in guest.  */
unsigned long regs_page;
struct lguest_regs *regs;
@@ -95,7 +97,6 @@ struct lguest
struct pgdir pgdirs[4];
 
unsigned long noirq_start, noirq_end;
-   unsigned long pending_notify; /* pfn from LHCALL_NOTIFY */
 
unsigned int stack_pages;
u32 tsc_khz;
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index d081db4..349d69d 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -88,8 +88,8 @@ static ssize_t read(struct file *file, char __user *user, size_t size,loff_t*o)
 
/* If we returned from read() last time because the Guest notified,
 * clear the flag. */
-   if (lg->pending_notify)
-   lg->pending_notify = 0;
+   if (vcpu->pending_notify)
+   vcpu->pending_notify = 0;
 
/* Run the Guest until something interesting happens. */
return run_guest(vcpu, (unsigned long __user *)user);
-- 
1.5.0.6



[PATCH 16/16] per-vcpu lguest pgdir management

2007-12-20 Thread Glauber de Oliveira Costa
This patch makes the pgdir management per-vcpu. The pgdirs pool
is still guest-wide (although it will probably need to grow once we
are really running more vcpus), but the pgdidx index is gone,
since it no longer makes sense. Instead, we use a per-vcpu
index.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
---
 drivers/lguest/hypercalls.c   |2 +-
 drivers/lguest/interrupts_and_traps.c |6 ++--
 drivers/lguest/lg.h   |   12 +++---
 drivers/lguest/page_tables.c  |   60 +
 drivers/lguest/x86/core.c |6 ++--
 5 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 95e1062..f379475 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -60,7 +60,7 @@ static void do_hcall(struct lguest_vcpu *vcpu, struct hcall_args *args)
if (args->arg1)
guest_pagetable_clear_all(vcpu);
else
-   guest_pagetable_flush_user(lg);
+   guest_pagetable_flush_user(vcpu);
break;
 
/* All these calls simply pass the arguments through to the right
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 78f6210..a0ac77e 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -77,7 +77,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->esp1;
ss = vcpu->ss1;
 
-   origstack = gstack = guest_pa(lg, virtstack);
+   origstack = gstack = guest_pa(vcpu, virtstack);
/* We push the old stack segment and pointer onto the new
 * stack: when the Guest does an iret back from the interrupt
 * handler the CPU will notice they're dropping privilege
@@ -89,7 +89,7 @@ static void set_guest_interrupt(struct lguest_vcpu *vcpu, u32 lo, u32 hi,
virtstack = vcpu->regs->esp;
ss = vcpu->regs->ss;
 
-   origstack = gstack = guest_pa(lg, virtstack);
+   origstack = gstack = guest_pa(vcpu, virtstack);
}
 
/* Remember that we never let the Guest actually disable interrupts, so
@@ -325,7 +325,7 @@ void pin_stack_pages(struct lguest_vcpu *vcpu)
 * start of the page after the kernel stack.  Subtract one to
 * get back onto the first stack page, and keep subtracting to
 * get to the rest of the stack pages. */
-   pin_page(lg, vcpu->esp1 - 1 - i * PAGE_SIZE);
+   pin_page(vcpu, vcpu->esp1 - 1 - i * PAGE_SIZE);
 }
 
 /* Direct traps also mean that we need to know whenever the Guest wants to use
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 6faf90d..e700408 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -57,6 +57,8 @@ struct lguest_vcpu {
unsigned long regs_page;
struct lguest_regs *regs;
 
+   int vcpu_pgd; /* which pgd this vcpu is currently using */
+
/* If a hypercall was asked for, this points to the arguments. */
struct hcall_args *hcall;
u32 next_hcall;
@@ -92,8 +94,6 @@ struct lguest
int changed;
struct lguest_pages *last_pages;
 
-   /* We keep a small number of these. */
-   u32 pgdidx;
struct pgdir pgdirs[4];
 
unsigned long noirq_start, noirq_end;
@@ -175,14 +175,14 @@ void free_guest_pagetable(struct lguest *lg);
 void guest_new_pagetable(struct lguest_vcpu *vcpu, unsigned long pgtable);
 void guest_set_pmd(struct lguest *lg, unsigned long gpgdir, u32 i);
 void guest_pagetable_clear_all(struct lguest_vcpu *vcpu);
-void guest_pagetable_flush_user(struct lguest *lg);
+void guest_pagetable_flush_user(struct lguest_vcpu *vcpu);
 void guest_set_pte(struct lguest *lg, unsigned long gpgdir,
   unsigned long vaddr, pte_t val);
 void map_switcher_in_guest(struct lguest_vcpu *vcpu,
   struct lguest_pages *pages);
-int demand_page(struct lguest *info, unsigned long cr2, int errcode);
-void pin_page(struct lguest *lg, unsigned long vaddr);
-unsigned long guest_pa(struct lguest *lg, unsigned long vaddr);
+int demand_page(struct lguest_vcpu *vcpu, unsigned long cr2, int errcode);
+void pin_page(struct lguest_vcpu *vcpu, unsigned long vaddr);
+unsigned long guest_pa(struct lguest_vcpu *vcpu, unsigned long vaddr);
 void page_table_guest_data_init(struct lguest *lg);
 
 /* arch/core.c: */
diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index f0f271d..84c22d7 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -94,10 +94,10 @@ static pte_t *spte_addr(struct lguest *lg, pgd_t spgd, unsigned long vaddr)
 
 /* These two functions just like the above two, except they access the Guest
  * page tables.  

Re: OOPS: 2.6.24-rc5-mm1 -- EIP is at r_show+0x2a/0x70 -- (triggered by cat /proc/iomem AFTER suspend-to-disk/resume)

2007-12-20 Thread Miles Lane
On further investigation, cat /proc/iomem does not trigger the stack 
trace until after a suspend-to-disk/resume cycle has occurred.


I am removing Ingo and Russell from the TO list (as they are apparently 
the wrong people) and adding the suspend folks, as suspend is implicated.
My .config file can be found here:  
http://marc.info/?l=linux-kernel&m=119812903001296&w=2


Miles Lane wrote:

.config attached in order to not trip spam filters.

Miles Lane wrote:
[  252.868386] BUG: unable to handle kernel NULL pointer dereference 
at virtual address 0018

[  252.868393] printing ip: c012d527 *pde = 
[  252.868399] Oops:  [#1] SMP
[  252.868403] last sysfs file: 
/sys/devices/pci:00/:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda3/stat 

[  252.868407] Modules linked in: aes_i586 aes_generic i915 drm 
rfcomm l2cap bluetooth acpi_cpufreq cpufreq_stats 
cpufreq_conservative sbs sbshc dm_crypt sbp2 parport_pc lp parport 
arc4 ecb crypto_blkcipher cryptomgr crypto_algapi snd_hda_intel 
snd_pcm_oss snd_mixer_oss pcmcia snd_pcm iTCO_wdt iTCO_vendor_support 
snd_seq_dummy watchdog_core watchdog_dev snd_seq_oss snd_seq_midi 
tifm_7xx1 snd_rawmidi iwl3945 snd_seq_midi_event rng_core tifm_core 
mac80211 snd_seq snd_timer snd_seq_device cfg80211 sky2 battery 
yenta_socket rsrc_nonstatic pcmcia_core ac snd soundcore 
snd_page_alloc button shpchp pci_hotplug sr_mod cdrom pata_acpi piix 
ide_core firewire_ohci firewire_core crc_itu_t thermal processor fan

[  252.868469]
[  252.868472] Pid: 7088, comm: head Not tainted (2.6.24-rc5-mm1 #9)
[  252.868476] EIP: 0060:[c012d527] EFLAGS: 00010297 CPU: 0
[  252.868481] EIP is at r_show+0x2a/0x70
[  252.868483] EAX:  EBX: 0001 ECX: c07e3224 EDX: c04bb034
[  252.868486] ESI: 0008 EDI: ed1f52c0 EBP: f5320f10 ESP: f5320f04
[  252.868489]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  252.868493] Process head (pid: 7088, ti=f532 task=f532e000 
task.ti=f532)
[  252.868495] Stack: c03a6cac ed1f52c0 c07e3224 f5320f50 c0199a7e 
2000 bf930807 e1007800
[  252.868504]ed1f52e0   01d3 000e 
 000d 
[  252.868512]fffb f7d39370 c01998e4 f5320f74 c01af4f5 
f5320f9c 2000 bf930807

[  252.868521] Call Trace:
[  252.868523]  [c0107d55] show_trace_log_lvl+0x12/0x25
[  252.868529]  [c0107df2] show_stack_log_lvl+0x8a/0x95
[  252.868534]  [c0107e89] show_registers+0x8c/0x154
[  252.868538]  [c010805f] die+0x10e/0x1d2
[  252.868542]  [c039c8c9] do_page_fault+0x52b/0x600
[  252.868547]  [c039af9a] error_code+0x72/0x78
[  252.868552]  [c0199a7e] seq_read+0x19a/0x26c
[  252.868557]  [c01af4f5] proc_reg_read+0x60/0x74
[  252.868562]  [c018390d] vfs_read+0xa2/0x11e
[  252.868567]  [c0183d02] sys_read+0x3b/0x60
[  252.868571]  [c0106bae] sysenter_past_esp+0x6b/0xc1
[  252.868575]  ===
[  252.868577] Code: c3 55 89 d1 89 e5 57 89 c7 56 53 8b 50 64 83 7a 
0c 00 77 0e 81 7a 08 ff ff 00 00 be 04 00 00 00 76 05 be 08 00 00 00 
89 c8 31 db 8b 40 18 39 d0 74 06 43 83 fb 05 75 f3 8b 41 10 ba 2f 
1b 45 c0

[  252.868623] EIP: [c012d527] r_show+0x2a/0x70 SS:ESP 0068:f5320f04



Re: [patch 3/3] Enable setting of IRQ-thread priorities from kernel cmdline. (repost:CC to LKML)

2007-12-20 Thread Jaswinder Singh
Hello Juergen,

On 12/20/07, Juergen Beisert [EMAIL PROTECTED] wrote:
 On Thursday 20 December 2007 13:45, Jaswinder Singh wrote:
  So I am curious, if possible, user can switch softirq-threads or IRQs
  RT tasks to non-RT tasks for slow hardware or least important hardware
  for NON-RT tasks. So this will improve RT behaviour.
 ^^
 Why?

 IMHO: Simply decrease the RT priority of less important IRQs or increase all
 other more important IRQs. IRQs are always more important than other
 processes in a system, also in non RT systems.


In the RT kernel we have:
RT tasks and non-RT tasks.

So we could also support:
RT softirq-threads/IRQs and non-RT softirq-threads/IRQs

That is, RT softirq-threads/IRQs would serve RT tasks
and non-RT softirq-threads/IRQs would serve non-RT tasks.

Or else remove non-RT tasks from the RT kernel and simply decrease the RT
priority of less important tasks, as per your suggestion.

Thank you,

Jaswinder Singh.

 Juergen
 --
 Dipl.-Ing. Juergen Beisert | http://www.pengutronix.de
  Pengutronix - Linux Solutions for Science and Industry
 Handelsregister: Amtsgericht Hildesheim, HRA 2686
  Vertretung Sued/Muenchen, Germany
Phone: +49-8766-939 228 |  Fax: +49-5121-206917-9



[patch 4/5] x86, ptrace: overflow signal API

2007-12-20 Thread Markus Metzger
Establish the user API for sending a user-defined signal to the traced task on 
a BTS buffer overflow.

This should complete the user API for the BTS ptrace extension.
The patches so far implement wrap-around overflow handling as is needed for 
debugging.

The remaining open item is another overflow handling mechanism that sends a
signal to the traced task on a buffer overflow.
This will take some more time on my side.

Since, from a user perspective, this occurs behind the scenes, the patch set 
should already be useful. More features may/will be added on top of it 
(overflow signal, pageable back-up buffers, kernel tracing, core file support, 
profiling, ...).


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
 ---

Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:09.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:14.%N +0100
@@ -88,11 +88,13 @@
unsigned int size;
/* bitmask of below flags */
unsigned int flags;
+   /* buffer overflow signal */
+   unsigned int signal;
 };
 
 #define PTRACE_BTS_O_TRACE 0x1 /* branch trace */
 #define PTRACE_BTS_O_SCHED 0x2 /* scheduling events w/ jiffies */
-#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG? on buffer overflow
+#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG<signal> on buffer overflow
   instead of wrapping around */
 #define PTRACE_BTS_O_CUT_SIZE  0x8 /* cut requested size to max available
   instead of failing */


RE: [PATCH] msi: set 'En' bit of MSI Mapping Capability

2007-12-20 Thread Peer Chen
The quirk is for our Intel platform, we don't want HT MSI mapping
enabled in any of our devices.

BRs
Peer Chen

-Original Message-
From: Eric W. Biederman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 19, 2007 5:59 AM
To: peerchen
Cc: linux-kernel; akpm; Andy Currid; Peer Chen
Subject: Re: [PATCH] msi: set 'En' bit of MSI Mapping Capability

peerchen [EMAIL PROTECTED] writes:

 According to the HyperTransport spec, 'En' indicate if the MSI Mapping
is
 active.
 Set the 'En' bit when setup pci and add the quirk for some nvidia
devices. 

 The patch base on kernel 2.6.24-rc5

Ok.  This is starting to look good.

 Signed-off-by: Andy Currid [EMAIL PROTECTED]
 Signed-off-by: Peer Chen [EMAIL PROTECTED]

 ---
 diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff
 linux-2.6.24-rc5-vanilla/drivers/pci/probe.c
 linux-2.6.24-rc5/drivers/pci/probe.c
 --- linux-2.6.24-rc5-vanilla/drivers/pci/probe.c 2007-12-18
14:35:46.0
 -0500
 +++ linux-2.6.24-rc5/drivers/pci/probe.c 2007-12-18 16:28:29.0
-0500
 @@ -721,6 +721,9 @@ static int pci_setup_device(struct pci_d
  
   /* Unknown power state */
   dev->current_state = PCI_UNKNOWN;
 + 
 + /* Enable HT MSI mapping */
 + ht_enable_msi_mapping(dev);
  
   /* Early fixups, before probing the BARs */
   pci_fixup_device(pci_fixup_early, dev);
 diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff
 linux-2.6.24-rc5-vanilla/drivers/pci/quirks.c
 linux-2.6.24-rc5/drivers/pci/quirks.c
 --- linux-2.6.24-rc5-vanilla/drivers/pci/quirks.c 2007-12-18
14:35:46.0
 -0500
 +++ linux-2.6.24-rc5/drivers/pci/quirks.c 2007-12-18
16:28:41.0 -0500
 @@ -1705,6 +1705,45 @@ static void __devinit quirk_nvidia_ck804
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA,
PCI_DEVICE_ID_NVIDIA_CK804_PCIE,
   quirk_nvidia_ck804_msi_ht_cap);
  
 +static void __devinit quirk_msi_ht_cap_disable(struct pci_dev *dev) {
 + struct pci_dev *host_bridge;
 + int pos, ttl = 48;
 +
 + /* HT MSI mapping should be disabled on devices that are below
 +  * a non-Hypertransport host bridge. Locate the host bridge...
 +  */
 +
 + if ((host_bridge = pci_get_bus_and_slot(0, PCI_DEVFN(0,0))) == NULL) {
 + printk(KERN_WARNING
 + "PCI: quirk_msi_ht_cap_disable didn't locate host bridge\n");
 + return;
 + }
 +
 + if ((pos = pci_find_ht_capability(host_bridge, HT_CAPTYPE_SLAVE)) != 0) {
 + /* Host bridge is to HT */
 + return;
 + }
 +
 + /* Host bridge is not to HT, disable HT MSI mapping on this device */
 +
 + pos = pci_find_ht_capability(dev, HT_CAPTYPE_MSI_MAPPING);
 + while (pos && ttl--) {
 + u8 flags;
 +
 + if (pci_read_config_byte(dev, pos + HT_MSI_FLAGS, &flags) == 0) {
 + printk(KERN_INFO "PCI: Quirk disabling HT MSI mapping on %s\n",
 +pci_name(dev));
 +
 + pci_write_config_byte(dev, pos + HT_MSI_FLAGS,
 +   flags & ~HT_MSI_FLAGS_ENABLE);
 + }
 + pos = pci_find_next_ht_capability(dev, pos,
 + HT_CAPTYPE_MSI_MAPPING);
 + }
 +}
 +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 + quirk_msi_ht_cap_disable);

Could you explain the need for this quirk?

My expectation would be that if we turned an MSI interrupt into a
hypertransport interrupt, then on hitting a non-hypertransport bridge
upstream it would just be turned into whatever makes sense for the
upstream bridge.  I can see it being excess work and adding some
latency, but I don't see it being a correctness problem.

Or are my expectations off, and the NVIDIA chipsets do not handle
hypertransport interrupts the way I would expect, dropping them
instead of converting them when going to a non-hypertransport host bridge?


  static void __devinit quirk_msi_intx_disable_bug(struct pci_dev *dev)
  {
   dev->dev_flags |= PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG;
 diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff
 linux-2.6.24-rc5-vanilla/include/asm-generic/pci.h
 linux-2.6.24-rc5/include/asm-generic/pci.h
 --- linux-2.6.24-rc5-vanilla/include/asm-generic/pci.h 2007-12-18
 14:35:52.0 -0500
 +++ linux-2.6.24-rc5/include/asm-generic/pci.h 2007-12-18
16:29:12.0
 -0500
 @@ -45,6 +45,10 @@ pcibios_select_root(struct pci_dev *pdev
  
  #define pcibios_scan_all_fns(a, b)   0
  
 +#ifndef HAVE_ARCH_HT_ENABLE_MSI_MAPPING
 +#define ht_enable_msi_mapping(a) 0
 +#endif /* HAVE_ARCH_HT_ENABLE_MSI_MAPPING */
 +
  #ifndef HAVE_ARCH_PCI_GET_LEGACY_IDE_IRQ
  static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int
channel)
  {
 diff -uprN -X linux-2.6.24-rc5-vanilla/Documentation/dontdiff
 linux-2.6.24-rc5-vanilla/include/asm-x86/pci.h
 linux-2.6.24-rc5/include/asm-x86/pci.h
 --- linux-2.6.24-rc5-vanilla/include/asm-x86/pci.h 2007-12-18
14:35:51.0
 -0500
 +++ linux-2.6.24-rc5/include/asm-x86/pci.h 2007-12-18

[patch 5/5] x86, ptrace, man: man pages for ptrace BTS extensions

2007-12-20 Thread Markus Metzger
Document changes for this patch set.

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-14 17:45:33.%N +0100
+++ man/man2/ptrace.2   2007-12-20 13:20:07.%N +0100
@@ -40,6 +40,9 @@
 .\" PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\" (Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger [EMAIL PROTECTED]
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,131 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination address are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   u64 from_ip;
+   u64 to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned int size;
+   unsigned int flags;
+   unsigned int signal;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BTS records into the buffer pointed to by
+\fIaddr\fP and clears the buffer. 
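
As an illustration of the record layout described in the man page text above, here is a minimal user-space sketch in C that turns one BTS record into a line of text. The enum and struct are copied from the quoted definitions, with u64 spelled as uint64_t; the helper name bts_format_record and its output format are assumptions made for this sketch and are not part of the patch. It makes no ptrace calls, so it says nothing about how the requests themselves behave.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the definitions quoted in the man page text; u64 is
 * spelled uint64_t so the sketch builds in user space. */
enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	uint64_t qualifier;
	union {
		/* PTRACE_BTS_BRANCH */
		struct {
			uint64_t from_ip;
			uint64_t to_ip;
		} lbr;
		/* PTRACE_BTS_TASK_ARRIVES or PTRACE_BTS_TASK_DEPARTS */
		uint64_t timestamp;
	} variant;
};

/* Hypothetical helper (not in the patch): render one record as text.
 * Returns what snprintf() returns, or -1 for an invalid qualifier. */
static int bts_format_record(const struct ptrace_bts_record *rec,
			     char *buf, size_t len)
{
	switch (rec->qualifier) {
	case PTRACE_BTS_BRANCH:
		return snprintf(buf, len, "branch 0x%" PRIx64 " -> 0x%" PRIx64,
				rec->variant.lbr.from_ip,
				rec->variant.lbr.to_ip);
	case PTRACE_BTS_TASK_ARRIVES:
		return snprintf(buf, len, "arrives @%" PRIu64,
				rec->variant.timestamp);
	case PTRACE_BTS_TASK_DEPARTS:
		return snprintf(buf, len, "departs @%" PRIu64,
				rec->variant.timestamp);
	default:
		return -1;
	}
}
```

A caller would check the qualifier-dependent return value before using the buffer, since only the union member matching the qualifier is meaningful.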

Re: 2.6.22.14 oops msg with commvault galaxy ?

2007-12-20 Thread Vincent Fortier
Le vendredi 14 décembre 2007 à 09:28 -0800, Greg KH a écrit :
 On Fri, Dec 14, 2007 at 10:37:39PM +0530, Dhaval Giani wrote:
  On Fri, Dec 14, 2007 at 08:26:42AM -0800, Greg KH wrote:
   On Thu, Dec 13, 2007 at 09:21:26PM +0100, Ingo Molnar wrote:

* Kay Sievers [EMAIL PROTECTED] wrote:

This one also fails to apply properly at the exact same place 
has Ingo's previously posted patch.  Would need to backport his 
one.
   
   It depends on a completely reworked sysfs logic, I don't think it 
   makes any sense to backport that.
  
  well, if it fixes a live bug in a still supported stable kernel 
  release...
  
  Vincent, could you try to just get rid of all actual uses of 
  se-attr.owner, within fs/sysfs/*.c? Something like the patch 
  below. 
  (totally untested - might be fatally broken as well)
 
 How can you think that this is not needed? You can not remove it with 
 sysfs you are patching. Hope this explains it: 
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0ab66088c855eca68513bdd7442a426c4b374ced

yeah - as i said it might be fatally broken (in fact it is). Do we 
understand why Vincent got the crashes with vanilla 2.6.22.14 ?
   
   No, and I can't seem to duplicate them here at all.
   
   Does anyone have a test case for this that I can work on trying to
   duplicate?
   
  
  If you apply CFS without my fix, and try to constantly check cpu_shares
  for a user who is logging and logging out, you should hit it. (That's
  what I was doing).
 
 Hm, how about a vanilla 2.6.22.14 kernel _without_ any patches.
 That's what I am most worried about :)

Since I was getting the problem with both vanilla and CFS-patched kernels,
and since, sadly, I don't have the time to do a git bisect at the moment, I
decided to go ahead and prepare a full migration to 2.6.23 (I was hoping
to skip directly to 2.6.24 but...).

I can confirm at the moment that 2.6.23 works properly with Galaxy (just
as 2.6.20 and 2.6.21 used to).

Thanks very much everyone for the help, but sadly this bug will have to
remain unresolved.

 thanks,

- vin


Re: [PATCH] Move page_assign_page_cgroup to VM_BUG_ON in free_hot_cold_page

2007-12-20 Thread Peter Zijlstra

On Thu, 2007-12-20 at 13:14 +, Hugh Dickins wrote:
 On Wed, 19 Dec 2007, Dave Hansen wrote:
   --- 
   linux-2.6.24-rc5/mm/page_alloc.c~memory-controller-move-to-bug-on-in-free_hot_cold_page
2007-12-19 11:31:46.0 +0530
   +++ linux-2.6.24-rc5-balbir/mm/page_alloc.c 2007-12-19 
   11:33:45.0 +0530
   @@ -995,7 +995,7 @@ static void fastcall free_hot_cold_page(

   if (!PageHighMem(page))
   debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
   -   page_assign_page_cgroup(page, NULL);
   +   VM_BUG_ON(page_get_page_cgroup(page));
   arch_free_page(page, 0);
   kernel_map_pages(page, 1, 0); 
  
  Hi Balbir,
  
  You generally want to do these like:
  
  foo = page_assign_page_cgroup(page, NULL);
  VM_BUG_ON(foo);
  
  Some embedded people have been known to optimize kernel size like this:
  
  #define VM_BUG_ON(x) do{}while(0)
 
 Balbir's patch looks fine to me: I don't get your point there, Dave.

There was a lengthy discussion here:
  http://lkml.org/lkml/2007/12/14/131

on the merit of debug statements with side effects. But looking at our
definition:

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(condition) do { } while(0)
#endif

disabling CONFIG_DEBUG_VM breaks the code as proposed by Balbir in that
it will no longer acquire the reference.
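
The general hazard under discussion (an expression with side effects placed inside a debug macro that may compile to nothing) can be sketched in plain user-space C. Everything below is hypothetical and only mirrors the shape of the kernel code: DEMO_BUG_ON stands in for VM_BUG_ON, DEMO_DEBUG for CONFIG_DEBUG_VM, and demo_assign_cgroup for a helper that changes state and returns the old value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VM_BUG_ON: with DEMO_DEBUG unset, the
 * argument is dropped by the preprocessor and never evaluated. */
#ifdef DEMO_DEBUG
#define DEMO_BUG_ON(cond) assert(!(cond))
#else
#define DEMO_BUG_ON(cond) do { } while (0)
#endif

struct demo_page {
	void *cgroup;	/* illustrative field, not the kernel's layout */
};

/* Side-effecting helper: stores the new value and returns the old one. */
static void *demo_assign_cgroup(struct demo_page *page, void *new_cgroup)
{
	void *old = page->cgroup;

	page->cgroup = new_cgroup;
	return old;
}

/* Fragile: the assignment lives inside the macro argument, so it
 * disappears together with the check when DEMO_DEBUG is unset. */
static void demo_reset_fragile(struct demo_page *page)
{
	DEMO_BUG_ON(demo_assign_cgroup(page, page));
}

/* Robust: perform the side effect first, then check the saved result. */
static void demo_reset_robust(struct demo_page *page)
{
	void *old = demo_assign_cgroup(page, NULL);

	DEMO_BUG_ON(old != page);
	(void)old;	/* avoid an unused-variable warning without DEMO_DEBUG */
}
```

Built without DEMO_DEBUG, demo_reset_fragile() leaves the page untouched while demo_reset_robust() still clears the field, which is the difference the "assign, then check" pattern is meant to guarantee.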



Re: [PATCH 0/5] sg_ring for scsi

2007-12-20 Thread Boaz Harrosh
On Thu, Dec 20 2007 at 9:58 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
 On Thu, Dec 20 2007, Rusty Russell wrote:
 On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
 On Thu, 20 Dec 2007 16:45:18 +1100

 Rusty Russell [EMAIL PROTECTED] wrote:
 OK, some fixes since last time, as I wade through more SCSI drivers. 
 Some drivers use use_sg as a flag to know whether the request_buffer is
 a scatterlist: I don't need the counter, but I still need the flag, so I
 fixed that in a more intuitive way (an explicit -sg pointer in the cmd).
 use_sg and the request_buffer will be removed shortly.

 http://marc.info/?l=linux-scsi&m=119754650614813&w=2
 Thanks!  Is there a git tree somewhere with these changes?

 I think that we tried the similar idea before, scsi_sgtable, but we
 seem to settle in the current simple approach.
 Yes, a scsi-specific solution is a bad idea: other people use sg.  
 Manipulating the magic chains is horrible; it looks simple to the places 
 which simply want to iterate through it, but it's awful for code which wants 
 to create them.
 
 The current code looks like that to minimize impact on 2.6.24, see this
 branch:
 
 http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=sg
 
 for how it folds into lib/sg.c and the magic disappears from SCSI.
 Rusty, nobody claimed the sg code in 2.6.24 is perfect. I like to get
 things incrementally there.
 
Dear Jens,

Is this code scheduled for 2.6.25? I cannot find it in the mm tree.

I ask because the above code conflicts with the scsi_data_buffer patch,
which is in the mm tree and which I hope will get accepted into 2.6.25.
The concepts do not conflict at all; it is only that both patches
touch the same places.
(By the way, it does not apply on scsi-misc either,
and it did not compile in your tree: a bit is missing in
ide-scsi.c.)

I have rebased and fixed your code on top of the scsi_data_buffer patchset.
Please review. (Patchset posted as a reply to this mail.)

The patches are totally untested and based on the -mm tree.
We should decide the order of these patches and rebase
accordingly.

AND ...
Please send to-be-included-in-next-kernel patches to Morton.
This way we can account for them. Also, I do not see an Acked-by:
from the scsi maintainer on the scsi bits of your patches.
Is it not the custom to get an Ack on bits that belong to other maintainers,
even for maintainers?

Boaz


[PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers

2007-12-20 Thread Boaz Harrosh

Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 include/linux/scatterlist.h |  125 ---
 lib/Makefile|2 +-
 lib/scatterlist.c   |  281 +++
 3 files changed, 307 insertions(+), 101 deletions(-)
 create mode 100644 lib/scatterlist.c
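
For readers following along, the chained walk that sg_next() performs can be
modelled in plain userspace C. This is only a sketch: struct model_sg and the
model_* helpers are hypothetical stand-ins, not the kernel's struct scatterlist
(which encodes the chain and end markers differently), and the explicit
chain/is_last fields are an assumption made for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for struct scatterlist (hypothetical). */
struct model_sg {
	unsigned int length;      /* payload length of this entry */
	struct model_sg *chain;   /* non-NULL: this slot links to the next array */
	int is_last;              /* set on the final payload entry */
};

/*
 * Same walk as sg_next(): stop at the end marker, otherwise step to the
 * next slot and follow it if it is a chain link.
 */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
	if (sg->is_last)
		return NULL;

	sg++;
	if (sg->chain)
		sg = sg->chain;

	return sg;
}

/* Sum the lengths of all payload entries reachable from head. */
static unsigned int model_sg_total(struct model_sg *head)
{
	unsigned int total = 0;
	struct model_sg *sg;

	for (sg = head; sg; sg = model_sg_next(sg))
		total += sg->length;

	return total;
}
```

The point of the model is that a consumer only ever calls the "next" helper;
building the arrays and placing the chain slot correctly is the part the new
allocator helpers are meant to take over.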

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 416e000..c3ca848 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -7,6 +7,12 @@
 #include <linux/string.h>
 #include <asm/io.h>
 
+struct sg_table {
+   struct scatterlist *sgl;/* the list */
+   unsigned int nents; /* number of mapped entries */
+   unsigned int orig_nents;/* original size of list */
+};
+
 /*
  * Notes on SG table design.
  *
@@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
 }
 
-/**
- * sg_next - return the next scatterlist entry in a list
- * @sg:The current sg entry
- *
- * Description:
- *   Usually the next entry will be @sg@ + 1, but if this sg element is part
- *   of a chained scatterlist, it could jump to the start of a new
- *   scatterlist array.
- *
- **/
-static inline struct scatterlist *sg_next(struct scatterlist *sg)
-{
-#ifdef CONFIG_DEBUG_SG
-   BUG_ON(sg->sg_magic != SG_MAGIC);
-#endif
-   if (sg_is_last(sg))
-   return NULL;
-
-   sg++;
-   if (unlikely(sg_is_chain(sg)))
-   sg = sg_chain_ptr(sg);
-
-   return sg;
-}
-
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */
@@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
 
 /**
- * sg_last - return the last scatterlist entry in a list
- * @sgl:   First entry in the scatterlist
- * @nents: Number of entries in the scatterlist
- *
- * Description:
- *   Should only be used casually, it (currently) scan the entire list
- *   to get the last entry.
- *
- *   Note that the @sgl@ pointer passed in need not be the first one,
- *   the important bit is that @nents@ denotes the number of entries that
- *   exist from @sgl@.
- *
- **/
-static inline struct scatterlist *sg_last(struct scatterlist *sgl,
- unsigned int nents)
-{
-#ifndef ARCH_HAS_SG_CHAIN
-   struct scatterlist *ret = &sgl[nents - 1];
-#else
-   struct scatterlist *sg, *ret = NULL;
-   unsigned int i;
-
-   for_each_sg(sgl, sg, nents, i)
-   ret = sg;
-
-#endif
-#ifdef CONFIG_DEBUG_SG
-   BUG_ON(sgl[0].sg_magic != SG_MAGIC);
-   BUG_ON(!sg_is_last(ret));
-#endif
-   return ret;
-}
-
-/**
  * sg_chain - Chain two sglists together
  * @prv:   First scatterlist
  * @prv_nents: Number of entries in prv
@@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg)
 }
 
 /**
- * sg_init_table - Initialize SG table
- * @sgl:  The SG table
- * @nents:Number of entries in table
- *
- * Notes:
- *   If this is part of a chained sg table, sg_mark_end() should be
- *   used only on the last table part.
- *
- **/
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
-   memset(sgl, 0, sizeof(*sgl) * nents);
-#ifdef CONFIG_DEBUG_SG
-   {
-   unsigned int i;
-   for (i = 0; i < nents; i++)
-   sgl[i].sg_magic = SG_MAGIC;
-   }
-#endif
-   sg_mark_end(&sgl[nents - 1]);
-}
-
-/**
- * sg_init_one - Initialize a single entry sg list
- * @sg: SG entry
- * @buf:Virtual address for IO
- * @buflen: IO length
- *
- * Notes:
- *   This should not be used on a single entry that is part of a larger
- *   table. Use sg_init_table() for that.
- *
- **/
-static inline void sg_init_one(struct scatterlist *sg, const void *buf,
-  unsigned int buflen)
-{
-   sg_init_table(sg, 1);
-   sg_set_buf(sg, buf, buflen);
-}
-
-/**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
  *
@@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg)
return page_address(sg_page(sg)) + sg->offset;
 }
 
+struct scatterlist *sg_next(struct scatterlist *);
+struct scatterlist *sg_last(struct scatterlist *s, unsigned int);
+void sg_init_table(struct scatterlist *, unsigned int);
+void sg_init_one(struct scatterlist *, const void *, unsigned int);
+
+typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
+typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
+
+void __sg_free_table(struct sg_table *, sg_free_fn *);
+void sg_free_table(struct sg_table *);
+int __sg_alloc_table(struct sg_table *, 

[PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining

2007-12-20 Thread Boaz Harrosh
From: Jens Axboe [EMAIL PROTECTED]

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 drivers/scsi/libsrp.c|2 +-
 drivers/scsi/scsi_error.c|4 +-
 drivers/scsi/scsi_lib.c  |  150 +-
 drivers/usb/storage/isd200.c |4 +-
 include/scsi/scsi_cmnd.h |9 +--
 5 files changed, 24 insertions(+), 145 deletions(-)

diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c
index 8a8562a..b81350d 100644
--- a/drivers/scsi/libsrp.c
+++ b/drivers/scsi/libsrp.c
@@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info,
sc->SCp.ptr = info;
memcpy(sc->cmnd, cmd->cdb, MAX_COMMAND_SIZE);
sc->sdb.length = len;
-   sc->sdb.sglist = (void *) (unsigned long) addr;
+   sc->sdb.sgt.sgl = (void *) (unsigned long) addr;
sc->tag = tag;
err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)&cmd->lun,
 cmd->tag);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5c8ba6a..1fd2a8c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
   sizeof(scmd->sense_buffer), sense_bytes);
sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
  scmd->sdb.length);
-   scmd->sdb.sglist = &ses->sense_sgl;
+   scmd->sdb.sgt.sgl = &ses->sense_sgl;
scmd->sc_data_direction = DMA_FROM_DEVICE;
-   scmd->sdb.sg_count = 1;
+   scmd->sdb.sgt.nents = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->sdb.length;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a6aae56..c7107f1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents)
return index;
 }
 
-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb,
-   unsigned short sg_count, gfp_t gfp_mask)
+static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
 {
-   struct scsi_host_sg_pool *sgp;
-   struct scatterlist *sgl, *prev, *ret;
-   unsigned int index;
-   int this, left;
-
-   left = sg_count;
-   ret = prev = NULL;
-   do {
-   this = left;
-   if (this > SCSI_MAX_SG_SEGMENTS) {
-   this = SCSI_MAX_SG_SEGMENTS - 1;
-   index = SG_MEMPOOL_NR - 1;
-   } else
-   index = scsi_sgtable_index(this);
-
-   left -= this;
-
-   sgp = scsi_sg_pools + index;
-
-   sgl = mempool_alloc(sgp->pool, gfp_mask);
-   if (unlikely(!sgl))
-   goto enomem;
-
-   sg_init_table(sgl, sgp->size);
-
-   /*
-* first loop through, set initial index and return value
-*/
-   if (!ret)
-   ret = sgl;
-
-   /*
-* chain previous sglist, if any. we know the previous
-* sglist must be the biggest one, or we would not have
-* ended up doing another loop.
-*/
-   if (prev)
-   sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
-
-   /*
-* if we have nothing left, mark the last segment as
-* end-of-list
-*/
-   if (!left)
-   sg_mark_end(&sgl[this - 1]);
-
-   /*
-* don't allow subsequent mempool allocs to sleep, it would
-* violate the mempool principle.
-*/
-   gfp_mask &= ~__GFP_WAIT;
-   gfp_mask |= __GFP_HIGH;
-   prev = sgl;
-   } while (left);
-
-   /*
-* ->use_sg may get modified after dma mapping has potentially
-* shrunk the number of segments, so keep a copy of it for free.
-*/
-   sdb->alloc_sg_count = sdb->sg_count = sg_count;
-   sdb->sglist = ret;
-   return 0;
-enomem:
-   if (ret) {
-   /*
-* Free entries chained off ret. Since we were trying to
-* allocate another sglist, we know that all entries are of
-* the max size.
-*/
-   sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
-   prev = ret;
-   ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
-
-   while ((sgl = sg_chain_ptr(ret)) != NULL) {
-   ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
-   mempool_free(sgl, sgp->pool);
-   }
-
-   mempool_free(prev, sgp->pool);
-   

[PATCH 3/3] SG: Update ide/ to use sg_table

2007-12-20 Thread Boaz Harrosh
From: Jens Axboe [EMAIL PROTECTED]

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 drivers/ide/arm/icside.c  |6 +++---
 drivers/ide/cris/ide-cris.c   |2 +-
 drivers/ide/ide-dma.c |8 
 drivers/ide/ide-io.c  |2 +-
 drivers/ide/ide-probe.c   |6 +-
 drivers/ide/ide-taskfile.c|2 +-
 drivers/ide/ide.c |2 +-
 drivers/ide/mips/au1xxx-ide.c |8 
 drivers/ide/pci/sgiioc4.c |4 ++--
 drivers/ide/ppc/pmac.c|6 +++---
 drivers/scsi/ide-scsi.c   |2 +-
 include/linux/ide.h   |2 +-
 12 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c
index 93f71fc..a48a6bd 100644
--- a/drivers/ide/arm/icside.c
+++ b/drivers/ide/arm/icside.c
@@ -210,7 +210,7 @@ static void icside_build_sglist(ide_drive_t *drive, struct request *rq)
 {
ide_hwif_t *hwif = drive->hwif;
struct icside_state *state = hwif->hwif_data;
-   struct scatterlist *sg = hwif->sg_table;
+   struct scatterlist *sg = hwif->sg_table.sgl;
 
ide_map_sg(drive, rq);
 
@@ -319,7 +319,7 @@ static int icside_dma_end(ide_drive_t *drive)
disable_dma(ECARD_DEV(state->dev)->dma);
 
/* Teardown mappings after DMA has completed. */
-   dma_unmap_sg(state->dev, hwif->sg_table, hwif->sg_nents,
+   dma_unmap_sg(state->dev, hwif->sg_table.sgl, hwif->sg_nents,
 hwif->sg_dma_direction);
 
return get_dma_residue(ECARD_DEV(state->dev)->dma) != 0;
@@ -373,7 +373,7 @@ static int icside_dma_setup(ide_drive_t *drive)
 * Tell the DMA engine about the SG table and
 * data direction.
 */
-   set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table, hwif->sg_nents);
+   set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table.sgl, hwif->sg_nents);
set_dma_mode(ECARD_DEV(state->dev)->dma, dma_mode);
 
drive->waiting_for_dma = 1;
diff --git a/drivers/ide/cris/ide-cris.c b/drivers/ide/cris/ide-cris.c
index 476e0d6..3701aca 100644
--- a/drivers/ide/cris/ide-cris.c
+++ b/drivers/ide/cris/ide-cris.c
@@ -919,7 +919,7 @@ static int cris_ide_build_dmatable (ide_drive_t *drive)
unsigned int count = 0;
int i = 0;
 
-   sg = hwif->sg_table;
+   sg = hwif->sg_table.sgl;
 
ata_tot_size = 0;
 
diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
index 4703837..33a2f56 100644
--- a/drivers/ide/ide-dma.c
+++ b/drivers/ide/ide-dma.c
@@ -190,7 +190,7 @@ static int ide_dma_good_drive(ide_drive_t *drive)
 int ide_build_sglist(ide_drive_t *drive, struct request *rq)
 {
ide_hwif_t *hwif = HWIF(drive);
-   struct scatterlist *sg = hwif->sg_table;
+   struct scatterlist *sg = hwif->sg_table.sgl;
 
BUG_ON((rq->cmd_type == REQ_TYPE_ATA_TASKFILE) && rq->nr_sectors > 256);
 
@@ -234,7 +234,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq)
if (!i)
return 0;
 
-   sg = hwif->sg_table;
+   sg = hwif->sg_table.sgl;
while (i) {
u32 cur_addr;
u32 cur_len;
@@ -293,7 +293,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq)
printk(KERN_ERR "%s: empty DMA table?\n", drive->name);
 use_pio_instead:
pci_unmap_sg(hwif->pci_dev,
-hwif->sg_table,
+hwif->sg_table.sgl,
 hwif->sg_nents,
 hwif->sg_dma_direction);
return 0; /* revert to PIO for this request */
@@ -315,7 +315,7 @@ EXPORT_SYMBOL_GPL(ide_build_dmatable);
 void ide_destroy_dmatable (ide_drive_t *drive)
 {
struct pci_dev *dev = HWIF(drive)->pci_dev;
-   struct scatterlist *sg = HWIF(drive)->sg_table;
+   struct scatterlist *sg = HWIF(drive)->sg_table.sgl;
int nents = HWIF(drive)->sg_nents;
 
pci_unmap_sg(dev, sg, nents, HWIF(drive)->sg_dma_direction);
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index bef781f..638e2db 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -819,7 +819,7 @@ static ide_startstop_t do_special (ide_drive_t *drive)
 void ide_map_sg(ide_drive_t *drive, struct request *rq)
 {
ide_hwif_t *hwif = drive->hwif;
-   struct scatterlist *sg = hwif->sg_table;
+   struct scatterlist *sg = hwif->sg_table.sgl;
 
if (hwif->sg_mapped) /* needed by ide-scsi */
return;
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 2994523..770f8cf 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -1312,15 +1312,11 @@ static int hwif_init(ide_hwif_t *hwif)
if (!hwif->sg_max_nents)
hwif->sg_max_nents = PRD_ENTRIES;
 
-   hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
-GFP_KERNEL);
-   if (!hwif->sg_table) {
+   if (sg_alloc_table(&hwif->sg_table, hwif->sg_max_nents, GFP_KERNEL)) {
printk(KERN_ERR "%s: unable to allocate SG 

Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

2007-12-20 Thread Serge E. Hallyn
Quoting Pavel Emelyanov ([EMAIL PROTECTED]):
 Oren Laadan wrote:
  
  Serge E. Hallyn wrote:
  Quoting Pavel Emelyanov ([EMAIL PROTECTED]):
  Oren Laadan wrote:
  Serge E. Hallyn wrote:
  Quoting Oren Laadan ([EMAIL PROTECTED]):
  I hate to bring this up again, but what if the admin in the container
  mounts an external file system (eg. nfs, usb, loop mount from a file,
  or via fuse), and that file system already has a device that we would
  like to ban inside that container ?
  Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
  then mnt-mnt_flags |= MNT_NODEV.  So that's no problem.
  Yes, that works to disallow all device files from a mounted file system.
 
  But it's a black and white thing: either they are all banned or allowed;
  you can't have some devices allowed and others not, depending on type.
  A scenario where this may be useful is, for instance, if we want some apps in
  the container to execute within a pre-made chroot (sub)tree within that
  container.
 
  But that's been pulled out of -mm! ?  Crap.
 
  Since we will have to keep a white- (or black-) list of devices
  that are permitted in a container anyway, and that list may even change
  per container -- why not enforce the access control at the VFS layer ?
  It's safer in the long run.
  By that you mean more along the lines of Pavel's patch than my whitelist
  LSM, or you actually mean Tetsuo's filesystem (I assume you don't mean that
  by 'vfs layer' :), or something different entirely?
  :)
 
  By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
  Either yours or Pavel's; I tend to prefer not to use LSM as it may
  collide with future security modules.
  Oren, AFAIS you've seen my patches for device access controller, right?
  
  If you mean this one:
  http://openvz.org/pipermail/devel/2007-September/007647.html
  then ack :)
 
 Great! Thanks.
 
  Maybe we can revisit the issue then and try to come to agreement on what
  kind of model and implementation we all want?
  That would be great, Pavel.  I do prefer your solution over my LSM, so
  if we can get an elegant block device control right in the vfs code that
  would be my preference.
  
  I concur.
  
  So it seems to me that we are all in favor of the model where open()
  of a device will consult a black/white-list. Also, we are all in favor
  of a non-LSM implementation, Pavel's code being a good example.
 
 Thank you, Oren and Serge! I will revisit this issue then, but
 I have a vacation next week and, after that, we have the New
 Year and Christmas holidays in Russia. So I will be able to go
 on with it only after the 7th of January :( Hope this is OK for you.
 
 Besides, Andrew said that he would pay little attention to new
 features until the 2.6.24 release, so I'm afraid we won't have this
 even in -mm in the coming months :(
 
 Thanks,
 Pavel

Cool, let me know any way I can help when you get started.
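
To make the agreed model concrete, here is a rough userspace sketch of a
per-container whitelist consulted at open() time. Everything in it is
hypothetical (struct dev_rule, dev_open_allowed(), the -1 return value); it is
not taken from Pavel's patches or from my LSM, and a real implementation would
live in the device open path and return a proper errno such as -EPERM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical whitelist entry for one container. */
struct dev_rule {
	char type;            /* 'c' for character device, 'b' for block */
	unsigned int major;
	unsigned int minor;
};

/*
 * Return 0 if (type, major, minor) is on the whitelist, -1 otherwise.
 * The check is meant to run when a device node is opened, so it does not
 * matter how the node got into the container (mknod, mount, ...).
 */
static int dev_open_allowed(const struct dev_rule *rules, size_t nr,
			    char type, unsigned int major, unsigned int minor)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (rules[i].type == type &&
		    rules[i].major == major &&
		    rules[i].minor == minor)
			return 0;
	}

	return -1;
}
```

With a list holding only c 1:3 (/dev/null) and c 1:5 (/dev/zero), an open of
b 8:0 would be refused even if a mounted filesystem already contains that node,
which is the case Oren raised.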

thanks,
-serge


[AppArmor 00/47] AppArmor security module overview

2007-12-20 Thread John
-- 

This submission of the AppArmor security module is based against 2.6.24-rc4-mm.

Any comments and feedback to improve implementation are appreciated.

Changes since previous submission
- added apparmor security goal document.
  Documentation/lsm/AppArmor-Security-Goal.txt
- removed DAC style permissions in favor of a simpler file owner
  permissions specification
- include the fgetattr and fsetattr patches by Miklos Szeredi
  [EMAIL PROTECTED], and update them to use ATTR_FILE to enable LSMs to
  distinguish file descriptor operations
- fix error where a NULL sock passed to socket_getsocket_getpeersec_dgram()
  was not correctly handled.
- fix error in link permission subset test

Outstanding Issues
- use of d_namespace_path and buffer allocation to obtain a pathname for
  mediation.
- conditional passing of the vfsmnt.  This can be addressed by rebasing
  on the lookup intent patches but that has not been done for this
  submission.
- ipc and signal mediation are a wip and not included.
- fine grained network mediation
- system confinement from boot is a wip and not included.
- documentation needs to be updated to include newest features


The patch series consists of five areas:

 (1) Pass struct vfsmount through to LSM hooks.

 (2) Fixes and improvements to __d_path():

 (a) make it unambiguous and exclude unreachable paths from
 /proc/mounts,

 (b) make its result consistent in the face of remounts,

 (c) introduce d_namespace_path(), a variant of d_path that goes up
 to the namespace root instead of the chroot.

 (d) the behavior of d_path() and getcwd() remains unchanged, and
 there is no hiding of unreachable paths in /proc/mounts.  The
 patches addressing these have been separated from the AppArmor
 submission and will be introduced at a later date.
 
 Part (a) has been in the -mm tree for a while; this series includes
 an updated copy of the -mm patch. Parts (b) and (c) shouldn't be too
 controversial.

 (3) Be able to distinguish file descriptor access from access by name
 in LSM hooks.

 Applications expect different behavior from file descriptor
 accesses and accesses by name in some cases. We need to pass this
 information down the LSM hooks to allow AppArmor to tell which is
 which.

 (4) Convert the selinux sysctl pathname computation code into a standalone
 function.

 (5) The AppArmor LSM itself.

 (See below.)

A tarball of the kernel patches, base user-space utilities, example
profiles, and technical documentation (including a walk-through) are
available at:

  http://forgeftp.novell.com/apparmor/LKML_Submission-Dec-07/

Only the most recent features are covered briefly here; for a more
complete explanation, please refer to the technical documentation.


File ownership permissions
  The DAC-style permission mask, which allowed permissions to be specified
  for each of user, group, and other, has been removed after further
  feedback and discussion, in favor of a simpler permission set that allows
  specifying permissions for file ownership as determined by fsuid.

  Traditional AppArmor rules specify permissions for all files. To
  restrict a rule to files whose uid matches the task's fsuid, the owner
  keyword can be added to it.

  /foo rw,          # allow access to file /foo
  owner /foo rw,    # allow access to file /foo only if its uid == fsuid
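
  The owner condition amounts to one extra comparison when a rule is
  matched. The following is only an illustrative userspace sketch; the
  struct and function names are hypothetical and are not the identifiers
  used in the AppArmor module.

```c
#include <assert.h>

/* Hypothetical model of a matched profile rule. */
struct aa_rule_model {
	int owner_only;   /* non-zero if the rule carried the "owner" keyword */
};

/*
 * Return 1 if the rule may be used for this access, 0 otherwise.
 * A plain rule applies to every file; an owner rule applies only when
 * the file's uid equals the fsuid of the task doing the access.
 */
static int rule_applies(const struct aa_rule_model *rule,
			unsigned int file_uid, unsigned int fsuid)
{
	if (!rule->owner_only)
		return 1;

	return file_uid == fsuid;
}
```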




[AppArmor 01/47] Pass struct vfsmount to the inode_create LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |9 ++---
 security/dummy.c |2 +-
 security/security.c  |5 +++--
 security/selinux/hooks.c |3 ++-
 5 files changed, 13 insertions(+), 8 deletions(-)
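
As an illustration of why the hook needs the vfsmount, and of the "may be
NULL" case documented in the patch: a dentry alone does not say under which
mount it is being accessed, so a pathname-based LSM cannot build a full path
without it. The userspace sketch below uses hypothetical model_* types and a
deliberately naive path builder; it is not the AppArmor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct vfsmount and struct dentry. */
struct model_mnt { const char *mountpoint; };
struct model_dentry { const char *name; };

/*
 * Build "<mountpoint>/<name>" into buf. Returns 0 on success, -1 if no
 * mount was passed (the NULL case) or if buf is too small.
 */
static int model_pathname(const struct model_dentry *dentry,
			  const struct model_mnt *mnt,
			  char *buf, size_t len)
{
	int n;

	if (!mnt)
		return -1;

	n = snprintf(buf, len, "%s/%s", mnt->mountpoint, dentry->name);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

A hook written this way has to decide what a NULL mount means for policy; the
sketch simply reports failure and leaves that decision to the caller.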

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1586,7 +1586,7 @@ int vfs_create(struct inode *dir, struct
return -EACCES; /* shouldn't it be ENOSYS? */
mode = S_IALLUGO;
mode |= S_IFREG;
-   error = security_inode_create(dir, dentry, mode);
+   error = security_inode_create(dir, dentry, nd ? nd->path.mnt : NULL, mode);
if (error)
return error;
DQUOT_INIT(dir);
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -314,6 +314,7 @@ struct request_sock;
  * Check permission to create a regular file.
  * @dir contains inode structure of the parent of the new file.
  * @dentry contains the dentry structure for the file to be created.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * @mode contains the file mode of the file to be created.
  * Return 0 if permission is granted.
  * @inode_link:
@@ -1272,8 +1273,8 @@ struct security_operations {
void (*inode_free_security) (struct inode *inode);
int (*inode_init_security) (struct inode *inode, struct inode *dir,
char **name, void **value, size_t *len);
-   int (*inode_create) (struct inode *dir,
-struct dentry *dentry, int mode);
+   int (*inode_create) (struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mode);
int (*inode_link) (struct dentry *old_dentry,
   struct inode *dir, struct dentry *new_dentry);
int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
@@ -1536,7 +1537,8 @@ int security_inode_alloc(struct inode *i
 void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
  char **name, void **value, size_t *len);
-int security_inode_create(struct inode *dir, struct dentry *dentry, int mode);
+int security_inode_create(struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt, int mode);
 int security_inode_link(struct dentry *old_dentry, struct inode *dir,
 struct dentry *new_dentry);
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
@@ -1847,6 +1849,7 @@ static inline int security_inode_init_se

 static inline int security_inode_create (struct inode *dir,
 struct dentry *dentry,
+struct vfsmount *mnt,
 int mode)
 {
return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -290,7 +290,7 @@ static int dummy_inode_init_security (st
 }
 
 static int dummy_inode_create (struct inode *inode, struct dentry *dentry,
-  int mask)
+  struct vfsmount *mnt, int mask)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -348,11 +348,12 @@ int security_inode_init_security(struct 
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
-int security_inode_create(struct inode *dir, struct dentry *dentry, int mode)
+int security_inode_create(struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt, int mode)
 {
if (unlikely(IS_PRIVATE(dir)))
return 0;
-   return security_ops->inode_create(dir, dentry, mode);
+   return security_ops->inode_create(dir, dentry, mnt, mode);
 }
 
 int security_inode_link(struct dentry *old_dentry, struct inode *dir,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2411,7 +2411,8 @@ static int selinux_inode_init_security(s
return 0;
 }
 
-static int selinux_inode_create(struct inode *dir, struct dentry *dentry, int mask)
+static int selinux_inode_create(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mask)
 {
return may_create(dir, dentry, SECCLASS_FILE);
 }

-- 



[AppArmor 02/47] Pass struct path down to remove_suid and children

2007-12-20 Thread John
Required by a later patch that adds a struct vfsmount parameter to
notify_change().

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---

 fs/ntfs/file.c |2 +-
 fs/reiser4/plugin/file/cryptcompress.c |2 +-
 fs/reiser4/plugin/file/file.c  |2 +-
 fs/splice.c|4 ++--
 fs/xfs/linux-2.6/xfs_lrw.c |2 +-
 include/linux/fs.h |4 ++--
 mm/filemap.c   |   16 
 mm/filemap_xip.c   |2 +-
 8 files changed, 17 insertions(+), 17 deletions(-)

--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2118,7 +2118,7 @@ static ssize_t ntfs_file_aio_write_noloc
goto out;
if (!count)
goto out;
-   err = remove_suid(file->f_path.dentry);
+   err = remove_suid(&file->f_path);
if (err)
goto out;
file_update_time(file);
--- a/fs/reiser4/plugin/file/cryptcompress.c
+++ b/fs/reiser4/plugin/file/cryptcompress.c
@@ -2828,7 +2828,7 @@ ssize_t write_cryptcompress(struct file 
goto out;
if (unlikely(count == 0))
goto out;
-   result = remove_suid(file->f_dentry);
+   result = remove_suid(&file->f_path);
if (unlikely(result != 0))
goto out;
/* remove_suid might create a transaction */
--- a/fs/reiser4/plugin/file/file.c
+++ b/fs/reiser4/plugin/file/file.c
@@ -2124,7 +2124,7 @@ ssize_t write_unix_file(struct file *fil
return result;
}
 
-   result = remove_suid(file->f_dentry);
+   result = remove_suid(&file->f_path);
if (result) {
mutex_unlock(&inode->i_mutex);
context_set_commit_async(ctx);
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -781,7 +781,7 @@ generic_file_splice_write_nolock(struct 
ssize_t ret;
int err;
 
-   err = remove_suid(out->f_path.dentry);
+   err = remove_suid(&out->f_path);
if (unlikely(err))
return err;
 
@@ -841,7 +841,7 @@ generic_file_splice_write(struct pipe_in
if (killpriv)
err = security_inode_killpriv(out->f_path.dentry);
if (!err && killsuid)
-   err = __remove_suid(out->f_path.dentry, killsuid);
+   err = __remove_suid(&out->f_path, killsuid);
mutex_unlock(&inode->i_mutex);
if (err)
return err;
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -723,7 +723,7 @@ start:
 !capable(CAP_FSETID)) {
error = xfs_write_clear_setuid(xip);
if (likely(!error))
-   error = -remove_suid(file->f_path.dentry);
+   error = -remove_suid(&file->f_path);
if (unlikely(error)) {
goto out_unlock_internal;
}
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1757,9 +1757,9 @@ extern void iget_failed(struct inode *);
 extern void clear_inode(struct inode *);
 extern void destroy_inode(struct inode *);
 extern struct inode *new_inode(struct super_block *);
-extern int __remove_suid(struct dentry *, int);
+extern int __remove_suid(struct path *, int);
 extern int should_remove_suid(struct dentry *);
-extern int remove_suid(struct dentry *);
+extern int remove_suid(struct path *);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 extern void remove_inode_hash(struct inode *);
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1638,26 +1638,26 @@ int should_remove_suid(struct dentry *de
 }
 EXPORT_SYMBOL(should_remove_suid);
 
-int __remove_suid(struct dentry *dentry, int kill)
+int __remove_suid(struct path *path, int kill)
 {
struct iattr newattrs;
 
newattrs.ia_valid = ATTR_FORCE | kill;
-   return notify_change(dentry, &newattrs);
+   return notify_change(path->dentry, &newattrs);
 }
 
-int remove_suid(struct dentry *dentry)
+int remove_suid(struct path *path)
 {
-   int killsuid = should_remove_suid(dentry);
-   int killpriv = security_inode_need_killpriv(dentry);
+   int killsuid = should_remove_suid(path->dentry);
+   int killpriv = security_inode_need_killpriv(path->dentry);
int error = 0;
 
if (killpriv < 0)
return killpriv;
if (killpriv)
-   error = security_inode_killpriv(dentry);
+   error = security_inode_killpriv(path->dentry);
if (!error && killsuid)
-   error = __remove_suid(dentry, killsuid);
+   error = __remove_suid(path, killsuid);
 
return error;
 }
@@ -2370,7 +2370,7 @@ __generic_file_aio_write_nolock(struct k
if (count == 0)
goto out;
 
-   err = remove_suid(file->f_path.dentry);
+   err = remove_suid(&file->f_path);

[AppArmor 03/47] Add a vfsmount parameter to notify_change()

2007-12-20 Thread John
The vfsmount parameter must be set appropriately for files visible
outside the kernel. Files that are only used in a filesystem (e.g.,
reiserfs xattr files) will have a NULL vfsmount.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/attr.c   |3 ++-
 fs/ecryptfs/inode.c |4 +++-
 fs/exec.c   |3 ++-
 fs/fat/file.c   |2 +-
 fs/hpfs/namei.c |2 +-
 fs/namei.c  |2 +-
 fs/nfsd/vfs.c   |8 
 fs/open.c   |   29 -
 fs/reiserfs/xattr.c |6 +++---
 fs/sysfs/file.c |2 +-
 fs/unionfs/copyup.c |   22 +++---
 fs/unionfs/inode.c  |   23 ---
 fs/unionfs/rename.c |3 ++-
 fs/unionfs/subr.c   |3 ++-
 fs/unionfs/union.h  |3 ++-
 fs/utimes.c |   19 +--
 include/linux/fs.h  |6 +++---
 mm/filemap.c|2 +-
 mm/tiny-shmem.c |2 +-
 19 files changed, 85 insertions(+), 59 deletions(-)

--- a/fs/attr.c
+++ b/fs/attr.c
@@ -100,7 +100,8 @@ int inode_setattr(struct inode * inode, 
 }
 EXPORT_SYMBOL(inode_setattr);
 
-int notify_change(struct dentry * dentry, struct iattr * attr)
+int notify_change(struct dentry *dentry, struct vfsmount *mnt,
+ struct iattr *attr)
 {
struct inode *inode = dentry->d_inode;
mode_t mode = inode->i_mode;
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -860,6 +860,7 @@ static int ecryptfs_setattr(struct dentr
 {
int rc = 0;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct inode *inode;
struct inode *lower_inode;
struct ecryptfs_crypt_stat *crypt_stat;
@@ -870,6 +871,7 @@ static int ecryptfs_setattr(struct dentr
inode = dentry->d_inode;
lower_inode = ecryptfs_inode_to_lower(inode);
lower_dentry = ecryptfs_dentry_to_lower(dentry);
+   lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
mutex_lock(&crypt_stat->cs_mutex);
if (S_ISDIR(dentry->d_inode->i_mode))
crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
@@ -920,7 +922,7 @@ static int ecryptfs_setattr(struct dentr
if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
ia->ia_valid &= ~ATTR_MODE;
 
-   rc = notify_change(lower_dentry, ia);
+   rc = notify_change(lower_dentry, lower_mnt, ia);
 out:
fsstack_copy_attr_all(inode, lower_inode);
return rc;
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1786,7 +1786,8 @@ int do_coredump(long signr, int exit_cod
goto close_fail;
if (!file->f_op->write)
goto close_fail;
-   if (!ispipe && do_truncate(file->f_path.dentry, 0, 0, file) != 0)
+   if (!ispipe &&
+   do_truncate(file->f_path.dentry, file->f_path.mnt, 0, 0, file) != 0)
goto close_fail;
 
retval = binfmt->core_dump(signr, regs, file, core_limit);
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -92,7 +92,7 @@ int fat_generic_ioctl(struct inode *inod
}
 
/* This MUST be done before doing anything irreversible... */
-   err = notify_change(filp->f_path.dentry, &ia);
+   err = notify_change(filp->f_path.dentry, filp->f_path.mnt, &ia);
if (err)
goto up;
 
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -426,7 +426,7 @@ again:
/*printk("HPFS: truncating file before delete.\n");*/
newattrs.ia_size = 0;
newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
-   err = notify_change(dentry, &newattrs);
+   err = notify_change(dentry, NULL, &newattrs);
put_write_access(inode);
if (!err)
goto again;
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1663,7 +1663,7 @@ int may_open(struct nameidata *nd, int a
if (!error) {
DQUOT_INIT(inode);
 
-   error = do_truncate(dentry, 0,
+   error = do_truncate(dentry, nd->path.mnt, 0,
ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
NULL);
}
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -388,7 +388,7 @@ nfsd_setattr(struct svc_rqst *rqstp, str
err = nfserr_notsync;
if (!check_guard || guardtime == inode->i_ctime.tv_sec) {
fh_lock(fhp);
-   host_err = notify_change(dentry, iap);
+   host_err = notify_change(dentry, fhp->fh_export->ex_path.mnt, iap);
err = nfserrno(host_err);
fh_unlock(fhp);
}
@@ -944,13 +944,13 @@ out:
return err;
 }
 
-static void kill_suid(struct dentry *dentry)
+static void kill_suid(struct dentry *dentry, struct vfsmount *mnt)
 {
struct iattr    ia;

[AppArmor 05/47] Add struct vfsmount parameter to vfs_mkdir()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.
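
To illustrate why a dentry alone is not enough to compute a pathname, here is a rough userspace sketch. It is a toy model, not kernel code: struct toy_dentry, struct toy_mount and toy_path() are hypothetical names, and the mount is reduced to a plain mount-point string (no trailing slash, "" for the root mount). The same dentry chain yields a different pathname for each mount it is reached through, and with no mount only the filesystem-relative path can be produced.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_DEPTH 64

/* Toy stand-ins; these are NOT the kernel's struct dentry / struct vfsmount. */
struct toy_dentry {
	const char *name;                /* path component; "" for the fs root */
	const struct toy_dentry *parent; /* NULL for the fs root */
};

struct toy_mount {
	const char *mountpoint;          /* e.g. "/mnt/a"; "" for the root mount */
};

/*
 * Write the path of @dentry, as seen through @mnt, into @buf.
 * If @mnt is NULL only the filesystem-relative path is produced.
 * Returns 0 on success, -1 if the path does not fit or is too deep.
 */
static int toy_path(const struct toy_mount *mnt,
		    const struct toy_dentry *dentry,
		    char *buf, size_t len)
{
	const struct toy_dentry *chain[TOY_MAX_DEPTH];
	size_t depth = 0, used = 0;
	int n;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	/* Collect components from the leaf up to (not including) the fs root. */
	for (; dentry && dentry->parent; dentry = dentry->parent) {
		if (depth == TOY_MAX_DEPTH)
			return -1;
		chain[depth++] = dentry;
	}

	/* The mount supplies the prefix that the dentry chain cannot know. */
	if (mnt) {
		n = snprintf(buf, len, "%s", mnt->mountpoint);
		if (n < 0 || (size_t)n >= len)
			return -1;
		used = (size_t)n;
	}

	/* Append the components in root-to-leaf order. */
	while (depth > 0) {
		n = snprintf(buf + used, len - used, "/%s", chain[--depth]->name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	/* The fs root with no (or an empty) mount prefix is just "/". */
	if (used == 0) {
		if (len < 2)
			return -1;
		buf[0] = '/';
		buf[1] = '\0';
	}
	return 0;
}
```

With one filesystem attached at two hypothetical mount points, toy_path() returns two different names for the same leaf, which is the ambiguity the extra parameter is meant to resolve.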

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c   |5 -
 fs/namei.c|5 +++--
 fs/nfsd/nfs4recover.c |3 ++-
 fs/nfsd/vfs.c |9 ++---
 fs/unionfs/copyup.c   |5 -
 fs/unionfs/inode.c|3 ++-
 fs/unionfs/sioq.c |2 +-
 fs/unionfs/sioq.h |1 +
 include/linux/fs.h|2 +-
 9 files changed, 24 insertions(+), 11 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -499,11 +499,14 @@ static int ecryptfs_mkdir(struct inode *
 {
int rc;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry;
 
lower_dentry = ecryptfs_dentry_to_lower(dentry);
+   lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
lower_dir_dentry = lock_parent(lower_dentry);
-   rc = vfs_mkdir(lower_dir_dentry->d_inode, lower_dentry, mode);
+   rc = vfs_mkdir(lower_dir_dentry->d_inode, lower_dentry, lower_mnt,
+  mode);
if (rc || !lower_dentry->d_inode)
goto out;
rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2107,7 +2107,8 @@ asmlinkage long sys_mknod(const char __u
return sys_mknodat(AT_FDCWD, filename, mode, dev);
 }
 
-int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+int vfs_mkdir(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt,
+ int mode)
 {
int error = may_create(dir, dentry, NULL);
 
@@ -2154,7 +2155,7 @@ asmlinkage long sys_mkdirat(int dfd, con
error = mnt_want_write(nd.path.mnt);
if (error)
goto out_dput;
-   error = vfs_mkdir(nd.path.dentry->d_inode, dentry, mode);
+   error = vfs_mkdir(nd.path.dentry->d_inode, dentry, nd.path.mnt, mode);
mnt_drop_write(nd.path.mnt);
 out_dput:
dput(dentry);
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -158,7 +158,8 @@ nfsd4_create_clid_dir(struct nfs4_client
status = mnt_want_write(rec_dir.path.mnt);
if (status)
goto out_put;
-   status = vfs_mkdir(rec_dir.path.dentry->d_inode, dentry, S_IRWXU);
+   status = vfs_mkdir(rec_dir.path.dentry->d_inode, dentry,
+  rec_dir.path.mnt, S_IRWXU);
mnt_drop_write(rec_dir.path.mnt);
 out_put:
dput(dentry);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1166,6 +1166,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
int type, dev_t rdev, struct svc_fh *resfhp)
 {
struct dentry   *dentry, *dchild = NULL;
+   struct svc_export *exp;
struct inode*dirp;
__be32  err;
int host_err;
@@ -1182,6 +1183,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
goto out;
 
dentry = fhp->fh_dentry;
+   exp = fhp->fh_export;
dirp = dentry->d_inode;
 
err = nfserr_notdir;
@@ -1198,7 +1200,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
host_err = PTR_ERR(dchild);
if (IS_ERR(dchild))
goto out_nfserr;
-   err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+   err = fh_compose(resfhp, exp, dchild, fhp);
if (err)
goto out;
} else {
@@ -1237,7 +1239,8 @@ nfsd_create(struct svc_rqst *rqstp, stru
host_err = vfs_create(dirp, dchild, iap->ia_mode, NULL);
break;
case S_IFDIR:
-   host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
+   host_err = vfs_mkdir(dirp, dchild, exp->ex_path.mnt,
+iap->ia_mode);
break;
case S_IFCHR:
case S_IFBLK:
@@ -1256,7 +1259,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
if (host_err < 0)
goto out_nfserr;
 
-   if (EX_ISSYNC(fhp->fh_export)) {
+   if (EX_ISSYNC(exp)) {
err = nfserrno(nfsd_sync_dir(dentry));
write_inode_now(dchild->d_inode, 1);
}
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -165,6 +165,7 @@ out:
  */
 static int __copyup_ndentry(struct dentry *old_lower_dentry,
struct dentry *new_lower_dentry,
+   struct vfsmount *new_lower_mnt,
struct dentry *new_lower_parent_dentry,
char *symbuf)
 {
@@ -175,6 +176,7 @@ static int __copyup_ndentry(struct dentr
if (S_ISDIR(old_mode)) {
args.mkdir.parent = new_lower_parent_dentry->d_inode;
args.mkdir.dentry = new_lower_dentry;
+   args.mkdir.mnt = new_lower_mnt;
args.mkdir.mode = old_mode;
 
run_sioq(__unionfs_mkdir, &args);

[AppArmor 06/47] Pass struct vfsmount to the inode_mkdir LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.
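
The shape of the change can be sketched in userspace as follows. This is a toy under stated assumptions, not the kernel's security_operations: every toy_* name is hypothetical, the mount is reduced to a string, and the "/mnt/locked" rule is invented for the example. The point is only that the wrapper forwards the new argument to whichever hook is installed, that a hook with no use for it can ignore it, and that a path-based hook has to cope with a NULL mount (allowing in that case is a choice made for this sketch, not a recommendation).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in; NOT the kernel's struct vfsmount. */
struct toy_mount {
	const char *mountpoint;
};

/* Toy hook table with the extra mount parameter added to the hook. */
struct toy_security_ops {
	int (*inode_mkdir)(const char *name, const struct toy_mount *mnt,
			   int mode);
};

/* Default hook: has no use for the mount and simply ignores it. */
static int toy_dummy_inode_mkdir(const char *name,
				 const struct toy_mount *mnt, int mode)
{
	(void)name;
	(void)mnt;
	(void)mode;
	return 0;
}

/* Path-based hook: invented rule that denies mkdir under "/mnt/locked". */
static int toy_path_inode_mkdir(const char *name,
				const struct toy_mount *mnt, int mode)
{
	(void)name;
	(void)mode;
	if (!mnt)
		return 0;	/* no mount, no pathname: allow in this sketch */
	if (strcmp(mnt->mountpoint, "/mnt/locked") == 0)
		return -EACCES;
	return 0;
}

static const struct toy_security_ops toy_dummy_ops = { toy_dummy_inode_mkdir };
static const struct toy_security_ops toy_path_ops = { toy_path_inode_mkdir };

/* Currently installed hook table. */
static const struct toy_security_ops *toy_active_ops = &toy_dummy_ops;

/* Wrapper: passes the mount through to the installed hook unchanged. */
static int toy_security_inode_mkdir(const char *name,
				    const struct toy_mount *mnt, int mode)
{
	return toy_active_ops->inode_mkdir(name, mnt, mode);
}
```

Every caller, the wrapper and every hook implementation have to change together, which is why each patch of this kind touches the same set of files.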

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |8 ++--
 security/dummy.c |2 +-
 security/security.c  |5 +++--
 security/selinux/hooks.c |3 ++-
 5 files changed, 13 insertions(+), 7 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2119,7 +2119,7 @@ int vfs_mkdir(struct inode *dir, struct 
return -EPERM;
 
mode &= (S_IRWXUGO|S_ISVTX);
-   error = security_inode_mkdir(dir, dentry, mode);
+   error = security_inode_mkdir(dir, dentry, mnt, mode);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -339,6 +339,7 @@ struct request_sock;
  * associated with inode strcture @dir. 
  * @dir containst the inode structure of parent of the directory to be created.
  * @dentry contains the dentry structure of new directory.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * @mode contains the mode of new directory.
  * Return 0 if permission is granted.
  * @inode_rmdir:
@@ -1281,7 +1282,8 @@ struct security_operations {
int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
int (*inode_symlink) (struct inode *dir,
  struct dentry *dentry, const char *old_name);
-   int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, int mode);
+   int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
+   struct vfsmount *mnt, int mode);
int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
int mode, dev_t dev);
@@ -1546,7 +1548,8 @@ int security_inode_link(struct dentry *o
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
const char *old_name);
-int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode);
+int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mode);
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
 int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -1880,6 +1883,7 @@ static inline int security_inode_symlink
 
 static inline int security_inode_mkdir (struct inode *dir,
struct dentry *dentry,
+   struct vfsmount *mnt,
int mode)
 {
return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -313,7 +313,7 @@ static int dummy_inode_symlink (struct i
 }
 
 static int dummy_inode_mkdir (struct inode *inode, struct dentry *dentry,
- int mask)
+ struct vfsmount *mnt, int mask)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -379,11 +379,12 @@ int security_inode_symlink(struct inode 
return security_ops->inode_symlink(dir, dentry, old_name);
 }
 
-int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mode)
 {
if (unlikely(IS_PRIVATE(dir)))
return 0;
-   return security_ops->inode_mkdir(dir, dentry, mode);
+   return security_ops->inode_mkdir(dir, dentry, mnt, mode);
 }
 
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2442,7 +2442,8 @@ static int selinux_inode_symlink(struct 
return may_create(dir, dentry, SECCLASS_LNK_FILE);
 }
 
-static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry, int mask)
+static int selinux_inode_mkdir(struct inode *dir, struct dentry *dentry,
+  struct vfsmount *mnt, int mask)
 {
return may_create(dir, dentry, SECCLASS_DIR);
 }



[AppArmor 07/47] Add a struct vfsmount parameter to vfs_mknod()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |5 -
 fs/namei.c  |   10 ++
 fs/nfsd/vfs.c   |3 ++-
 fs/unionfs/copyup.c |1 +
 fs/unionfs/inode.c  |3 ++-
 fs/unionfs/sioq.c   |2 +-
 fs/unionfs/sioq.h   |1 +
 include/linux/fs.h  |2 +-
 net/unix/af_unix.c  |3 ++-
 9 files changed, 20 insertions(+), 10 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -550,11 +550,14 @@ ecryptfs_mknod(struct inode *dir, struct
 {
int rc;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry;
 
lower_dentry = ecryptfs_dentry_to_lower(dentry);
+   lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
lower_dir_dentry = lock_parent(lower_dentry);
-   rc = vfs_mknod(lower_dir_dentry->d_inode, lower_dentry, mode, dev);
+   rc = vfs_mknod(lower_dir_dentry->d_inode, lower_dentry, lower_mnt, mode,
+  dev);
if (rc || !lower_dentry->d_inode)
goto out;
rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb, 0);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2005,7 +2005,8 @@ fail:
 }
 EXPORT_SYMBOL_GPL(lookup_create);
 
-int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+int vfs_mknod(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt,
+ int mode, dev_t dev)
 {
int error = may_create(dir, dentry, NULL);
 
@@ -2082,12 +2083,13 @@ asmlinkage long sys_mknodat(int dfd, con
mode, &nd);
break;
case S_IFCHR: case S_IFBLK:
-   error = vfs_mknod(nd.path.dentry->d_inode, dentry, mode,
-   new_decode_dev(dev));
+   error = vfs_mknod(nd.path.dentry->d_inode, dentry,
+ nd.path.mnt, mode,
+ new_decode_dev(dev));
break;
case S_IFIFO: case S_IFSOCK:
error = vfs_mknod(nd.path.dentry->d_inode, dentry,
-   mode, 0);
+ nd.path.mnt, mode, 0);
break;
}
mnt_drop_write(nd.path.mnt);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1249,7 +1249,8 @@ nfsd_create(struct svc_rqst *rqstp, stru
host_err = mnt_want_write(fhp->fh_export->ex_path.mnt);
if (host_err)
break;
-   host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
+   host_err = vfs_mknod(dirp, dchild, exp->ex_path.mnt,
+iap->ia_mode, rdev);
mnt_drop_write(fhp->fh_export->ex_path.mnt);
break;
default:
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -193,6 +193,7 @@ static int __copyup_ndentry(struct dentr
   S_ISFIFO(old_mode) || S_ISSOCK(old_mode)) {
args.mknod.parent = new_lower_parent_dentry->d_inode;
args.mknod.dentry = new_lower_dentry;
+   args.mknod.mnt = new_lower_mnt;
args.mknod.mode = old_mode;
args.mknod.dev = old_lower_dentry->d_inode->i_rdev;
 
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -760,6 +760,7 @@ static int unionfs_mknod(struct inode *d
continue;
 
lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   lower_mnt = unionfs_lower_mnt_idx(dentry, bindex);
if (!lower_dentry) {
lower_dentry = create_parents(dir, dentry,
  dentry->d_name.name,
@@ -779,7 +780,7 @@ static int unionfs_mknod(struct inode *d
}
 
err = vfs_mknod(lower_parent_dentry->d_inode,
-   lower_dentry, mode, dev);
+   lower_dentry, lower_mnt, mode, dev);
 
if (err) {
unlock_dir(lower_parent_dentry);
--- a/fs/unionfs/sioq.c
+++ b/fs/unionfs/sioq.c
@@ -78,7 +78,7 @@ void __unionfs_mknod(struct work_struct 
struct sioq_args *args = container_of(work, struct sioq_args, work);
struct mknod_args *m = &args->mknod;
 
-   args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
+   args->err = vfs_mknod(m->parent, m->dentry, m->mnt, m->mode, m->dev);
complete(&args->comp);
 }
 
--- a/fs/unionfs/sioq.h
+++ b/fs/unionfs/sioq.h
@@ -42,6 +42,7 @@ struct mkdir_args {
 struct mknod_args {
struct inode *parent;
struct dentry *dentry;
+   struct vfsmount *mnt;
umode_t mode;

[AppArmor 08/47] Pass struct vfsmount to the inode_mknod LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |7 +--
 security/dummy.c |2 +-
 security/security.c  |5 +++--
 security/selinux/hooks.c |5 +++--
 5 files changed, 13 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2019,7 +2019,7 @@ int vfs_mknod(struct inode *dir, struct 
if (!dir->i_op || !dir->i_op->mknod)
return -EPERM;
 
-   error = security_inode_mknod(dir, dentry, mode, dev);
+   error = security_inode_mknod(dir, dentry, mnt, mode, dev);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -354,6 +354,7 @@ struct request_sock;
  * and not this hook.
  * @dir contains the inode structure of parent of the new file.
  * @dentry contains the dentry structure of the new file.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * @mode contains the mode of the new file.
  * @dev contains the device number.
  * Return 0 if permission is granted.
@@ -1286,7 +1287,7 @@ struct security_operations {
struct vfsmount *mnt, int mode);
int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
-   int mode, dev_t dev);
+   struct vfsmount *mnt, int mode, dev_t dev);
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 struct inode *new_dir, struct dentry *new_dentry);
int (*inode_readlink) (struct dentry *dentry);
@@ -1551,7 +1552,8 @@ int security_inode_symlink(struct inode 
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode);
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
-int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev);
+int security_inode_mknod(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
   struct inode *new_dir, struct dentry *new_dentry);
 int security_inode_readlink(struct dentry *dentry);
@@ -1897,6 +1899,7 @@ static inline int security_inode_rmdir (
 
 static inline int security_inode_mknod (struct inode *dir,
struct dentry *dentry,
+   struct vfsmount *mnt,
int mode, dev_t dev)
 {
return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -324,7 +324,7 @@ static int dummy_inode_rmdir (struct ino
 }
 
 static int dummy_inode_mknod (struct inode *inode, struct dentry *dentry,
- int mode, dev_t dev)
+ struct vfsmount *mnt, int mode, dev_t dev)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -394,11 +394,12 @@ int security_inode_rmdir(struct inode *d
return security_ops->inode_rmdir(dir, dentry);
 }
 
-int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+int security_inode_mknod(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, int mode, dev_t dev)
 {
if (unlikely(IS_PRIVATE(dir)))
return 0;
-   return security_ops->inode_mknod(dir, dentry, mode, dev);
+   return security_ops->inode_mknod(dir, dentry, mnt, mode, dev);
 }
 
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2453,11 +2453,12 @@ static int selinux_inode_rmdir(struct in
return may_link(dir, dentry, MAY_RMDIR);
 }
 
-static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
+static int selinux_inode_mknod(struct inode *dir, struct dentry *dentry,
+  struct vfsmount *mnt, int mode, dev_t dev)
 {
int rc;
 
-   rc = secondary_ops->inode_mknod(dir, dentry, mode, dev);
+   rc = secondary_ops->inode_mknod(dir, dentry, mnt, mode, dev);
if (rc)
return rc;
 



[AppArmor 09/47] Add a struct vfsmount parameter to vfs_symlink()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |4 +++-
 fs/namei.c  |6 --
 fs/nfsd/vfs.c   |   13 +
 fs/unionfs/copyup.c |1 +
 fs/unionfs/inode.c  |4 +++-
 fs/unionfs/sioq.c   |3 ++-
 fs/unionfs/sioq.h   |1 +
 include/linux/fs.h  |2 +-
 8 files changed, 24 insertions(+), 10 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -460,6 +460,7 @@ static int ecryptfs_symlink(struct inode
 {
int rc;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry;
umode_t mode;
char *encoded_symname;
@@ -468,6 +469,7 @@ static int ecryptfs_symlink(struct inode
 
lower_dentry = ecryptfs_dentry_to_lower(dentry);
dget(lower_dentry);
+   lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
lower_dir_dentry = lock_parent(lower_dentry);
mode = S_IALLUGO;
encoded_symlen = ecryptfs_encode_filename(crypt_stat, symname,
@@ -477,7 +479,7 @@ static int ecryptfs_symlink(struct inode
rc = encoded_symlen;
goto out_lock;
}
-   rc = vfs_symlink(lower_dir_dentry->d_inode, lower_dentry,
+   rc = vfs_symlink(lower_dir_dentry->d_inode, lower_dentry, lower_mnt,
 encoded_symname, mode);
kfree(encoded_symname);
if (rc || !lower_dentry->d_inode)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2391,7 +2391,8 @@ asmlinkage long sys_unlink(const char __
return do_unlinkat(AT_FDCWD, pathname);
 }
 
-int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode)
+int vfs_symlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt,
+   const char *oldname, int mode)
 {
int error = may_create(dir, dentry, NULL);
 
@@ -2440,7 +2441,8 @@ asmlinkage long sys_symlinkat(const char
error = mnt_want_write(nd.path.mnt);
if (error)
goto out_dput;
-   error = vfs_symlink(nd.path.dentry->d_inode, dentry, from, S_IALLUGO);
+   error = vfs_symlink(nd.path.dentry->d_inode, dentry, nd.path.mnt, from,
+   S_IALLUGO);
mnt_drop_write(nd.path.mnt);
 out_dput:
dput(dentry);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1495,6 +1495,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str
struct iattr *iap)
 {
struct dentry   *dentry, *dnew;
+   struct svc_export *exp;
__be32  err, cerr;
int host_err;
umode_t mode;
@@ -1521,6 +1522,7 @@ nfsd_symlink(struct svc_rqst *rqstp, str
if (iap && (iap->ia_valid & ATTR_MODE))
mode = iap->ia_mode & S_IALLUGO;
 
+   exp = fhp->fh_export;
if (unlikely(path[plen] != 0)) {
char *path_alloced = kmalloc(plen+1, GFP_KERNEL);
if (path_alloced == NULL)
@@ -1528,20 +1530,23 @@ nfsd_symlink(struct svc_rqst *rqstp, str
else {
strncpy(path_alloced, path, plen);
path_alloced[plen] = 0;
-   host_err = vfs_symlink(dentry->d_inode, dnew, path_alloced, mode);
+   host_err = vfs_symlink(dentry->d_inode, dnew,
+  exp->ex_path.mnt, path_alloced,
+  mode);
kfree(path_alloced);
}
} else
-   host_err = vfs_symlink(dentry->d_inode, dnew, path, mode);
+   host_err = vfs_symlink(dentry->d_inode, dnew, exp->ex_path.mnt,
+  path, mode);
 
if (!host_err) {
-   if (EX_ISSYNC(fhp->fh_export))
+   if (EX_ISSYNC(exp))
host_err = nfsd_sync_dir(dentry);
}
err = nfserrno(host_err);
fh_unlock(fhp);
 
-   cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
+   cerr = fh_compose(resfhp, exp, dnew, fhp);
dput(dnew);
if (err==0) err = cerr;
 out:
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -184,6 +184,7 @@ static int __copyup_ndentry(struct dentr
} else if (S_ISLNK(old_mode)) {
args.symlink.parent = new_lower_parent_dentry->d_inode;
args.symlink.dentry = new_lower_dentry;
+   args.symlink.mnt = new_lower_mnt;
args.symlink.symbuf = symbuf;
args.symlink.mode = old_mode;
 
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -452,6 +452,7 @@ static int unionfs_symlink(struct inode 
 */
for (bindex = bstart; bindex >= 0; bindex--) {
lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
+   

[AppArmor 10/47] Pass struct vfsmount to the inode_symlink LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |8 +---
 security/dummy.c |2 +-
 security/security.c  |4 ++--
 security/selinux/hooks.c |3 ++-
 5 files changed, 11 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2402,7 +2402,7 @@ int vfs_symlink(struct inode *dir, struc
if (!dir->i_op || !dir->i_op->symlink)
return -EPERM;
 
-   error = security_inode_symlink(dir, dentry, oldname);
+   error = security_inode_symlink(dir, dentry, mnt, oldname);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -332,6 +332,7 @@ struct request_sock;
  * Check the permission to create a symbolic link to a file.
  * @dir contains the inode structure of parent directory of the symbolic link.
  * @dentry contains the dentry structure of the symbolic link.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * @old_name contains the pathname of file.
  * Return 0 if permission is granted.
  * @inode_mkdir:
@@ -1281,8 +1282,8 @@ struct security_operations {
int (*inode_link) (struct dentry *old_dentry,
   struct inode *dir, struct dentry *new_dentry);
int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
-   int (*inode_symlink) (struct inode *dir,
- struct dentry *dentry, const char *old_name);
+   int (*inode_symlink) (struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt, const char *old_name);
int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
struct vfsmount *mnt, int mode);
int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
@@ -1548,7 +1549,7 @@ int security_inode_link(struct dentry *o
 struct dentry *new_dentry);
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
-   const char *old_name);
+  struct vfsmount *mnt, const char *old_name);
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode);
 int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
@@ -1878,6 +1879,7 @@ static inline int security_inode_unlink 
 
 static inline int security_inode_symlink (struct inode *dir,
  struct dentry *dentry,
+ struct vfsmount *mnt,
  const char *old_name)
 {
return 0;
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -307,7 +307,7 @@ static int dummy_inode_unlink (struct in
 }
 
 static int dummy_inode_symlink (struct inode *inode, struct dentry *dentry,
-   const char *name)
+   struct vfsmount *mnt, const char *name)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -372,11 +372,11 @@ int security_inode_unlink(struct inode *
 }
 
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
-   const char *old_name)
+  struct vfsmount *mnt, const char *old_name)
 {
if (unlikely(IS_PRIVATE(dir)))
return 0;
-   return security_ops->inode_symlink(dir, dentry, old_name);
+   return security_ops->inode_symlink(dir, dentry, mnt, old_name);
 }
 
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2437,7 +2437,8 @@ static int selinux_inode_unlink(struct i
return may_link(dir, dentry, MAY_UNLINK);
 }
 
-static int selinux_inode_symlink(struct inode *dir, struct dentry *dentry, const char *name)
+static int selinux_inode_symlink(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt, const char *name)
 {
return may_create(dir, dentry, SECCLASS_LNK_FILE);
 }



[AppArmor 11/47] Pass struct vfsmount to the inode_readlink LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/stat.c|3 ++-
 include/linux/security.h |8 +---
 security/dummy.c |2 +-
 security/security.c  |4 ++--
 security/selinux/hooks.c |2 +-
 5 files changed, 11 insertions(+), 8 deletions(-)

--- a/fs/stat.c
+++ b/fs/stat.c
@@ -306,7 +306,8 @@ asmlinkage long sys_readlinkat(int dfd, 
 
error = -EINVAL;
if (inode->i_op && inode->i_op->readlink) {
-   error = security_inode_readlink(nd.path.dentry);
+   error = security_inode_readlink(nd.path.dentry,
+   nd.path.mnt);
if (!error) {
touch_atime(nd.path.mnt, nd.path.dentry);
error = inode->i_op->readlink(nd.path.dentry,
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -369,6 +369,7 @@ struct request_sock;
  * @inode_readlink:
  * Check the permission to read the symbolic link.
  * @dentry contains the dentry structure for the file link.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * Return 0 if permission is granted.
  * @inode_follow_link:
  * Check permission to follow a symbolic link when looking up a pathname.
@@ -1291,7 +1292,7 @@ struct security_operations {
struct vfsmount *mnt, int mode, dev_t dev);
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 struct inode *new_dir, struct dentry *new_dentry);
-   int (*inode_readlink) (struct dentry *dentry);
+   int (*inode_readlink) (struct dentry *dentry, struct vfsmount *mnt);
int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
int (*inode_permission) (struct inode *inode, int mask, struct nameidata *nd);
int (*inode_setattr) (struct dentry *dentry, struct vfsmount *mnt,
@@ -1557,7 +1558,7 @@ int security_inode_mknod(struct inode *d
 struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
   struct inode *new_dir, struct dentry *new_dentry);
-int security_inode_readlink(struct dentry *dentry);
+int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt);
 int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
 int security_inode_permission(struct inode *inode, int mask, struct nameidata *nd);
 int security_inode_setattr(struct dentry *dentry, struct vfsmount *mnt,
@@ -1915,7 +1916,8 @@ static inline int security_inode_rename 
return 0;
 }
 
-static inline int security_inode_readlink (struct dentry *dentry)
+static inline int security_inode_readlink(struct dentry *dentry,
+  struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -337,7 +337,7 @@ static int dummy_inode_rename (struct in
return 0;
 }
 
-static int dummy_inode_readlink (struct dentry *dentry)
+static int dummy_inode_readlink (struct dentry *dentry, struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -412,11 +412,11 @@ int security_inode_rename(struct inode *
   new_dir, new_dentry);
 }
 
-int security_inode_readlink(struct dentry *dentry)
+int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_readlink(dentry);
+   return security_ops->inode_readlink(dentry, mnt);
 }
 
 int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2472,7 +2472,7 @@ static int selinux_inode_rename(struct i
return may_rename(old_inode, old_dentry, new_inode, new_dentry);
 }
 
-static int selinux_inode_readlink(struct dentry *dentry)
+static int selinux_inode_readlink(struct dentry *dentry, struct vfsmount *mnt)
 {
return dentry_has_perm(current, NULL, dentry, FILE__READ);
 }



[AppArmor 12/47] Add struct vfsmount parameters to vfs_link()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.
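
A hard link has two ends, so a path-based check needs a mount for each of them. The following userspace sketch shows the idea. It is a toy, not kernel code: toy_mount, toy_full_path() and toy_may_link() are hypothetical names, each dentry is reduced to a filesystem-relative string, and the rule "both ends must lie under an allowed prefix" is invented for the example. Refusing when either mount is missing is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in; NOT the kernel's struct vfsmount. */
struct toy_mount {
	const char *mountpoint;	/* e.g. "/home", no trailing slash */
};

/*
 * Join a mount point and a filesystem-relative path ("/u/a").
 * Returns 0 on success, -1 if there is no mount or the result is truncated.
 */
static int toy_full_path(const struct toy_mount *mnt, const char *rel,
			 char *buf, size_t len)
{
	int n;

	if (!mnt)
		return -1;
	n = snprintf(buf, len, "%s%s", mnt->mountpoint, rel);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Invented rule: allow the link (return 1) only if both the existing name
 * and the new name lie under @allowed; otherwise refuse (return 0).
 */
static int toy_may_link(const struct toy_mount *old_mnt, const char *old_rel,
			const struct toy_mount *new_mnt, const char *new_rel,
			const char *allowed)
{
	char old_path[256], new_path[256];
	size_t plen = strlen(allowed);

	/* Without both mounts the two pathnames cannot be named: refuse. */
	if (toy_full_path(old_mnt, old_rel, old_path, sizeof(old_path)) ||
	    toy_full_path(new_mnt, new_rel, new_path, sizeof(new_path)))
		return 0;

	return strncmp(old_path, allowed, plen) == 0 &&
	       strncmp(new_path, allowed, plen) == 0;
}
```

With only the new end's mount available, the check could not tell where the existing name lives, which is the reason for carrying two mount arguments rather than one.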

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |9 +++--
 fs/namei.c  |5 +++--
 fs/nfsd/vfs.c   |3 ++-
 fs/unionfs/inode.c  |   14 +++---
 include/linux/fs.h  |2 +-
 5 files changed, 24 insertions(+), 9 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -401,19 +401,24 @@ static int ecryptfs_link(struct dentry *
 struct dentry *new_dentry)
 {
struct dentry *lower_old_dentry;
+   struct vfsmount *lower_old_mnt;
struct dentry *lower_new_dentry;
+   struct vfsmount *lower_new_mnt;
struct dentry *lower_dir_dentry;
u64 file_size_save;
int rc;
 
file_size_save = i_size_read(old_dentry->d_inode);
lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
+   lower_old_mnt = ecryptfs_dentry_to_lower_mnt(old_dentry);
lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
+   lower_new_mnt = ecryptfs_dentry_to_lower_mnt(new_dentry);
dget(lower_old_dentry);
dget(lower_new_dentry);
lower_dir_dentry = lock_parent(lower_new_dentry);
-   rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
- lower_new_dentry);
+   rc = vfs_link(lower_old_dentry, lower_old_mnt,
+ lower_dir_dentry->d_inode, lower_new_dentry,
+ lower_new_mnt);
if (rc || !lower_new_dentry->d_inode)
goto out_lock;
rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb, 0);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2461,7 +2461,7 @@ asmlinkage long sys_symlink(const char _
return sys_symlinkat(oldname, AT_FDCWD, newname);
 }
 
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
+int vfs_link(struct dentry *old_dentry, struct vfsmount *old_mnt, struct inode *dir, struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
struct inode *inode = old_dentry->d_inode;
int error;
@@ -2542,7 +2542,8 @@ asmlinkage long sys_linkat(int olddfd, c
error = mnt_want_write(nd.path.mnt);
if (error)
goto out_dput;
-   error = vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode, new_dentry);
+   error = vfs_link(old_nd.path.dentry, old_nd.path.mnt,
+nd.path.dentry->d_inode, new_dentry, nd.path.mnt);
mnt_drop_write(nd.path.mnt);
 out_dput:
dput(new_dentry);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1596,7 +1596,8 @@ nfsd_link(struct svc_rqst *rqstp, struct
dold = tfhp->fh_dentry;
dest = dold->d_inode;
 
-   host_err = vfs_link(dold, dirp, dnew);
+   host_err = vfs_link(dold, tfhp->fh_export->ex_path.mnt, dirp,
+   dnew, ffhp->fh_export->ex_path.mnt);
if (!host_err) {
if (EX_ISSYNC(ffhp->fh_export)) {
err = nfserrno(nfsd_sync_dir(ddir));
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -222,6 +222,7 @@ static int unionfs_link(struct dentry *o
 {
int err = 0;
struct dentry *lower_old_dentry = NULL;
+   struct vfsmount *lower_old_mnt = NULL;
struct dentry *lower_new_dentry = NULL;
struct vfsmount *lower_new_mnt = NULL;
struct dentry *lower_dir_dentry = NULL;
@@ -293,14 +294,17 @@ static int unionfs_link(struct dentry *o
goto out;
}
lower_new_dentry = unionfs_lower_dentry(new_dentry);
+   lower_new_mnt = unionfs_lower_mnt(new_dentry);
lower_old_dentry = unionfs_lower_dentry(old_dentry);
+   lower_old_mnt = unionfs_lower_mnt(old_dentry);
 
BUG_ON(dbstart(old_dentry) != dbstart(new_dentry));
lower_dir_dentry = lock_parent(lower_new_dentry);
err = is_robranch(old_dentry);
if (!err)
-   err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
-  lower_new_dentry);
+   err = vfs_link(lower_old_dentry, lower_old_mnt,
+  lower_dir_dentry->d_inode,
+  lower_new_dentry, lower_new_mnt);
unlock_dir(lower_dir_dentry);
 
 docopyup:
@@ -321,12 +325,16 @@ docopyup:
   bindex, lower_new_mnt);
lower_old_dentry =
unionfs_lower_dentry(old_dentry);
+   lower_old_mnt =
+   unionfs_lower_mnt(old_dentry);
lower_dir_dentry =
lock_parent(lower_new_dentry);
/* do vfs_link */
err = vfs_link(lower_old_dentry,
+ 

[AppArmor 13/47] Pass the struct vfsmounts to the inode_link LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |3 ++-
 include/linux/security.h |   16 +++-
 security/dummy.c |6 --
 security/security.c  |8 +---
 security/selinux/hooks.c |9 +++--
 5 files changed, 29 insertions(+), 13 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2486,7 +2486,8 @@ int vfs_link(struct dentry *old_dentry, 
if (S_ISDIR(old_dentry->d_inode->i_mode))
return -EPERM;
 
-   error = security_inode_link(old_dentry, dir, new_dentry);
+   error = security_inode_link(old_dentry, old_mnt, dir, new_dentry,
+   new_mnt);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -320,8 +320,10 @@ struct request_sock;
  * @inode_link:
  * Check permission before creating a new hard link to a file.
  * @old_dentry contains the dentry structure for an existing link to the file.
+ * @old_mnt is the vfsmount corresponding to @old_dentry (may be NULL).
  * @dir contains the inode structure of the parent directory of the new link.
  * @new_dentry contains the dentry structure for the new link.
+ * @new_mnt is the vfsmount corresponding to @new_dentry (may be NULL).
  * Return 0 if permission is granted.
  * @inode_unlink:
  * Check the permission to remove a hard link to a file. 
@@ -1280,8 +1282,9 @@ struct security_operations {
char **name, void **value, size_t *len);
int (*inode_create) (struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode);
-   int (*inode_link) (struct dentry *old_dentry,
-  struct inode *dir, struct dentry *new_dentry);
+   int (*inode_link) (struct dentry *old_dentry, struct vfsmount *old_mnt,
+  struct inode *dir, struct dentry *new_dentry,
+  struct vfsmount *new_mnt);
int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
int (*inode_symlink) (struct inode *dir, struct dentry *dentry,
  struct vfsmount *mnt, const char *old_name);
@@ -1546,8 +1549,9 @@ int security_inode_init_security(struct 
  char **name, void **value, size_t *len);
 int security_inode_create(struct inode *dir, struct dentry *dentry,
  struct vfsmount *mnt, int mode);
-int security_inode_link(struct dentry *old_dentry, struct inode *dir,
-struct dentry *new_dentry);
+int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt,
+   struct inode *dir, struct dentry *new_dentry,
+   struct vfsmount *new_mnt);
 int security_inode_unlink(struct inode *dir, struct dentry *dentry);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
   struct vfsmount *mnt, const char *old_name);
@@ -1866,8 +1870,10 @@ static inline int security_inode_create 
 }
 
 static inline int security_inode_link (struct dentry *old_dentry,
+  struct vfsmount *old_mnt,
   struct inode *dir,
-  struct dentry *new_dentry)
+  struct dentry *new_dentry,
+  struct vfsmount *new_mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -295,8 +295,10 @@ static int dummy_inode_create (struct in
return 0;
 }
 
-static int dummy_inode_link (struct dentry *old_dentry, struct inode *inode,
-struct dentry *new_dentry)
+static int dummy_inode_link (struct dentry *old_dentry,
+struct vfsmount *old_mnt, struct inode *inode,
+struct dentry *new_dentry,
+struct vfsmount *new_mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -356,12 +356,14 @@ int security_inode_create(struct inode *
return security_ops->inode_create(dir, dentry, mnt, mode);
 }
 
-int security_inode_link(struct dentry *old_dentry, struct inode *dir,
-struct dentry *new_dentry)
+int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt,
+   struct inode *dir, struct dentry *new_dentry,
+   struct vfsmount *new_mnt)
 {
if (unlikely(IS_PRIVATE(old_dentry->d_inode)))
return 0;
-   return security_ops->inode_link(old_dentry, dir, new_dentry);
+   return security_ops->inode_link(old_dentry, old_mnt, dir,
+

Re: [rfc][patch] mm: madvise(WILLNEED) for anonymous memory

2007-12-20 Thread Hugh Dickins
On Thu, 20 Dec 2007, Peter Zijlstra wrote:
 
> Lennart asked for madvise(WILLNEED) to work on anonymous pages, he plans
> to use this to pre-fault pages. He currently uses: mlock/munlock for
> this purpose.

I certainly agree with this in principle: it just seems an unnecessary
and surprising restriction to refuse on anonymous vmas; I guess the only
reason for not adding this was not having anyone asking for it until now.
Though, does Lennart realize he could use MAP_POPULATE in the mmap?
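
For reference, the MAP_POPULATE alternative mentioned above might look roughly like this from user space. This is only a sketch, not code from the thread: map_prefaulted() is a hypothetical helper name, and it assumes a Linux libc that exposes MAP_ANONYMOUS and MAP_POPULATE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the thread): map `len` bytes of private
 * anonymous memory and ask the kernel to pre-fault the pages at mmap()
 * time via MAP_POPULATE. Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Note that this only helps when the mapping is created; a hint such as madvise(WILLNEED) would still be the natural tool for a range that already exists.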

 
> [ compile tested only ]

I haven't tried it either, but generally it looks plausible.

 
> Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
> ---
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 93ee375..eff60ce 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -100,6 +100,24 @@ out:
>   return error;
>  }
>
> +static long madvice_willneed_anon(struct vm_area_struct *vma,
> +   struct vm_area_struct **prev,
> +   unsigned long start, unsigned long end)

madvise.c uses madvise_ rather than madvice_ throughout,
so please go with the flow.

> +{
> + int ret, len;
> +
> + *prev = vma;
> + if (end > vma->vm_end)
> + end = vma->vm_end;

Please check, but I think the upper level ensures end is within range.

> +
> + len = end - start;
> + ret = get_user_pages(current, current->mm, start, len,
> + 0, 0, NULL, NULL);
> + if (ret < 0)
> + return ret;
> + return ret == len ? 0 : -1;

It's not good to return -1 as an alternative to a real errno:
it'll look like -EPERM.  If you copied that from somewhere, better
send a patch to fix the somewhere!  Ah, yes, make_pages_present: it
happens that nobody is interested in its return value, so we could
make it a void; but that'd just be a cleanup.  What to do here if
non-negative ret less than len?  Oh, just return 0, that's good
enough in this case (the file case always returns 0).
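
The return convention suggested here can be sketched in plain C. willneed_result() is a hypothetical name used only for illustration; it takes a get_user_pages()-style result (a negative errno, or the count of pages actually faulted in) and folds it into what the advice call would report.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper, for illustration only.
 * `ret` is a get_user_pages()-style result: negative errno on failure,
 * otherwise the number of pages handled (possibly fewer than requested).
 */
static long willneed_result(long ret, long requested)
{
    assert(requested >= 0);

    if (ret < 0)
        return ret;     /* propagate the real errno, e.g. -ENOMEM */

    /*
     * Partial progress (0 <= ret < requested) is acceptable for a hint,
     * so report success instead of inventing -1, which on Linux is
     * numerically the same as -EPERM.
     */
    return 0;
}
```

With this shape a caller never sees a bare -1, and a short count is treated the same way the file-backed case treats its advice: as success.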

Hmm, might it be better to use make_pages_present itself,
fixing its retval, rather than using get_user_pages directly?
(I'd hope the caching makes its repeat of find_vma not an overhead.)

Interesting divergence: make_pages_present faults in writable pages
in a writable vma, whereas the file case's force_page_cache_readahead
doesn't even insert the pages into the mm.

> +}
> +
>  /*
>   * Schedule all required I/O operations.  Do not wait for completion.
>   */
> @@ -110,7 +128,7 @@ static long madvise_willneed(struct vm_area_struct * vma,
>   struct file *file = vma->vm_file;
>
>   if (!file)
> - return -EBADF;
> + return madvice_willneed_anon(vma, prev, start, end);
>
>   if (file->f_mapping->a_ops->get_xip_page) {
>   /* no bad return value, but ignore advice */

And there's a correctly invisible hunk to the patch too: this
extension of MADV_WILLNEED also does not require down_write of
mmap_sem, so madvise_need_mmap_write can remain unchanged.

Hugh


[AppArmor 14/47] Add a struct vfsmount parameter to vfs_rmdir()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.
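
To illustrate why a dentry alone is not enough to compute a pathname, here is a small user-space model. It is a sketch under stated assumptions, not kernel code: toy_dentry, toy_mount and toy_path() are hypothetical names, and the mountpoint strings are assumed to have no trailing slash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for struct dentry and struct vfsmount (hypothetical). */
struct toy_dentry {
    const char *name;
    const struct toy_dentry *parent;    /* NULL at the filesystem root */
};

struct toy_mount {
    const char *mountpoint;             /* e.g. "/mnt/a", no trailing slash */
    const struct toy_dentry *root;      /* root dentry of the mounted tree */
};

/* Write the mountpoint, then each component from the mount root down to d. */
static int toy_append(const struct toy_dentry *d, const struct toy_mount *mnt,
                      char *buf, size_t len)
{
    size_t used;
    int n;

    assert(d->name != NULL);
    if (d == mnt->root || d->parent == NULL) {
        n = snprintf(buf, len, "%s", mnt->mountpoint);
        return (n >= 0 && (size_t)n < len) ? 0 : -1;
    }
    if (toy_append(d->parent, mnt, buf, len) != 0)
        return -1;
    used = strlen(buf);
    n = snprintf(buf + used, len - used, "/%s", d->name);
    return (n >= 0 && (size_t)n < len - used) ? 0 : -1;
}

/* Returns 0 on success, -1 if the mount is unknown or buf is too small. */
static int toy_path(const struct toy_dentry *d, const struct toy_mount *mnt,
                    char *buf, size_t len)
{
    if (d == NULL || mnt == NULL || buf == NULL || len == 0)
        return -1;      /* models a hook called with mnt == NULL */
    return toy_append(d, mnt, buf, len);
}
```

In this model the same dentry tree mounted at two places yields two different pathnames, and a NULL mount yields none at all, which is roughly the situation a pathname-based LSM would face without the extra parameter.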

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c   |4 +++-
 fs/namei.c|4 ++--
 fs/nfsd/nfs4recover.c |2 +-
 fs/nfsd/vfs.c |8 +---
 fs/reiserfs/xattr.c   |2 +-
 fs/unionfs/unlink.c   |5 -
 include/linux/fs.h|2 +-
 7 files changed, 17 insertions(+), 10 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -532,14 +532,16 @@ out:
 static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry;
int rc;
 
lower_dentry = ecryptfs_dentry_to_lower(dentry);
+   lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
dget(dentry);
lower_dir_dentry = lock_parent(lower_dentry);
dget(lower_dentry);
-   rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
+   rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry, lower_mnt);
dput(lower_dentry);
if (!rc)
d_delete(lower_dentry);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2202,7 +2202,7 @@ void dentry_unhash(struct dentry *dentry
spin_unlock(&dcache_lock);
 }
 
-int vfs_rmdir(struct inode *dir, struct dentry *dentry)
+int vfs_rmdir(struct inode *dir, struct dentry *dentry,struct vfsmount *mnt)
 {
int error = may_delete(dir, dentry, 1);
 
@@ -2269,7 +2269,7 @@ static long do_rmdir(int dfd, const char
error = mnt_want_write(nd.path.mnt);
if (error)
goto exit3;
-   error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
+   error = vfs_rmdir(nd.path.dentry->d_inode, dentry, nd.path.mnt);
mnt_drop_write(nd.path.mnt);
 exit3:
dput(dentry);
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -279,7 +279,7 @@ nfsd4_clear_clid_dir(struct dentry *dir,
 * a kernel from the future */
nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file);
mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-   status = vfs_rmdir(dir->d_inode, dentry);
+   status = vfs_rmdir(dir->d_inode, dentry, rec_dir.path.mnt);
mutex_unlock(&dir->d_inode->i_mutex);
return status;
 }
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1734,6 +1734,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
char *fname, int flen)
 {
struct dentry   *dentry, *rdentry;
+   struct svc_export *exp;
struct inode*dirp;
__be32  err;
int host_err;
@@ -1748,6 +1749,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
fh_lock_nested(fhp, I_MUTEX_PARENT);
dentry = fhp->fh_dentry;
dirp = dentry->d_inode;
+   exp = fhp->fh_export;
 
rdentry = lookup_one_len(fname, dentry, flen);
host_err = PTR_ERR(rdentry);
@@ -1765,21 +1767,21 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 
if (type != S_IFDIR) { /* It's UNLINK */
 #ifdef MSNFS
-   if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
+   if ((exp->ex_flags & NFSEXP_MSNFS) &&
(atomic_read(&rdentry->d_count) > 1)) {
host_err = -EPERM;
} else
 #endif
host_err = vfs_unlink(dirp, rdentry);
} else { /* It's RMDIR */
-   host_err = vfs_rmdir(dirp, rdentry);
+   host_err = vfs_rmdir(dirp, rdentry, exp->ex_path.mnt);
}
 
dput(rdentry);
 
if (host_err)
goto out_nfserr;
-   if (EX_ISSYNC(fhp->fh_export))
+   if (EX_ISSYNC(exp))
host_err = nfsd_sync_dir(dentry);
 
 out_nfserr:
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -747,7 +747,7 @@ int reiserfs_delete_xattrs(struct inode 
if (dir->d_inode->i_nlink <= 2) {
root = get_xa_root(inode->i_sb, XATTR_REPLACE);
reiserfs_write_lock_xattrs(inode->i_sb);
-   err = vfs_rmdir(root->d_inode, dir);
+   err = vfs_rmdir(root->d_inode, dir, NULL);
reiserfs_write_unlock_xattrs(inode->i_sb);
dput(root);
} else {
--- a/fs/unionfs/unlink.c
+++ b/fs/unionfs/unlink.c
@@ -125,6 +125,7 @@ static int unionfs_rmdir_first(struct in
 {
int err;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry = NULL;
 
/* Here we need to remove whiteout entries. */
@@ -133,6 +134,7 @@ static int unionfs_rmdir_first(struct in
goto out;
 
lower_dentry = unionfs_lower_dentry(dentry);
+   lower_mnt = unionfs_lower_mnt(dentry);
 
lower_dir_dentry = lock_parent(lower_dentry);
 
@@ -140,7 +142,8 @@ static int unionfs_rmdir_first(struct in
dget(lower_dentry);
err = 

[AppArmor 15/47] Pass struct vfsmount to the inode_rmdir LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |   10 +++---
 security/dummy.c |3 ++-
 security/security.c  |5 +++--
 security/selinux/hooks.c |3 ++-
 5 files changed, 15 insertions(+), 8 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2219,7 +2219,7 @@ int vfs_rmdir(struct inode *dir, struct 
if (d_mountpoint(dentry))
error = -EBUSY;
else {
-   error = security_inode_rmdir(dir, dentry);
+   error = security_inode_rmdir(dir, dentry, mnt);
if (!error) {
error = dir->i_op->rmdir(dir, dentry);
if (!error)
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -349,6 +349,7 @@ struct request_sock;
  * Check the permission to remove a directory.
  * @dir contains the inode structure of parent of the directory to be removed.
  * @dentry contains the dentry structure of directory to be removed.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * Return 0 if permission is granted.
  * @inode_mknod:
  * Check permissions when creating a special file (or a socket or a fifo
@@ -1290,7 +1291,8 @@ struct security_operations {
  struct vfsmount *mnt, const char *old_name);
int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
struct vfsmount *mnt, int mode);
-   int (*inode_rmdir) (struct inode *dir, struct dentry *dentry);
+   int (*inode_rmdir) (struct inode *dir, struct dentry *dentry,
+   struct vfsmount *mnt);
int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
struct vfsmount *mnt, int mode, dev_t dev);
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
@@ -1557,7 +1559,8 @@ int security_inode_symlink(struct inode 
   struct vfsmount *mnt, const char *old_name);
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode);
-int security_inode_rmdir(struct inode *dir, struct dentry *dentry);
+int security_inode_rmdir(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt);
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -1901,7 +1904,8 @@ static inline int security_inode_mkdir (
 }
 
 static inline int security_inode_rmdir (struct inode *dir,
-   struct dentry *dentry)
+   struct dentry *dentry,
+   struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -320,7 +320,8 @@ static int dummy_inode_mkdir (struct ino
return 0;
 }
 
-static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry)
+static int dummy_inode_rmdir (struct inode *inode, struct dentry *dentry,
+ struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -389,11 +389,12 @@ int security_inode_mkdir(struct inode *d
return security_ops->inode_mkdir(dir, dentry, mnt, mode);
 }
 
-int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
+int security_inode_rmdir(struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_rmdir(dir, dentry);
+   return security_ops->inode_rmdir(dir, dentry, mnt);
 }
 
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2454,7 +2454,8 @@ static int selinux_inode_mkdir(struct in
return may_create(dir, dentry, SECCLASS_DIR);
 }
 
-static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry)
+static int selinux_inode_rmdir(struct inode *dir, struct dentry *dentry,
+  struct vfsmount *mnt)
 {
return may_link(dir, dentry, MAY_RMDIR);
 }



[AppArmor 16/47] Call lsm hook before unhashing dentry in vfs_rmdir()

2007-12-20 Thread John
If we unhash the dentry before calling the security_inode_rmdir hook,
we cannot compute the file's pathname in the hook anymore. AppArmor
needs to know the filename in order to decide whether a file may be
deleted, though.
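
The ordering problem can be modelled in a few lines of user-space C. Everything here is a simplified assumption rather than the kernel's behaviour: model_dentry, model_name() and policy_hook() are hypothetical, and "unhashed" is reduced to a flag that makes the name unavailable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy dentry: once `hashed` is cleared, the name can no longer be resolved. */
struct model_dentry {
    const char *name;
    int hashed;
};

typedef int (*model_hook)(const struct model_dentry *);

static const char *model_name(const struct model_dentry *d)
{
    return d->hashed ? d->name : NULL;
}

/* Toy pathname-based policy: needs the name in order to decide. */
static int policy_hook(const struct model_dentry *d)
{
    const char *name = model_name(d);

    if (name == NULL)
        return -EACCES;     /* no name available, so fail closed */
    return strcmp(name, "protected") == 0 ? -EPERM : 0;
}

/* hook_first != 0 models the patched order, 0 models the old order. */
static int model_rmdir(struct model_dentry *d, model_hook hook, int hook_first)
{
    int err;

    assert(d != NULL && hook != NULL);
    if (hook_first) {
        err = hook(d);      /* name still resolvable here */
        if (err)
            return err;     /* dentry left untouched on refusal */
        d->hashed = 0;
        return 0;
    }
    d->hashed = 0;          /* old order: unhash first ... */
    return hook(d);         /* ... so the hook sees no name */
}
```

In this model the hook-first order lets the policy both allow and refuse by name, while the unhash-first order can only fail blindly.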

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 fs/namei.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2212,6 +2212,10 @@ int vfs_rmdir(struct inode *dir, struct 
if (!dir->i_op || !dir->i_op->rmdir)
return -EPERM;
 
+   error = security_inode_rmdir(dir, dentry, mnt);
+   if (error)
+   return error;
+
DQUOT_INIT(dir);
 
mutex_lock(&dentry->d_inode->i_mutex);
@@ -2219,12 +2223,9 @@ int vfs_rmdir(struct inode *dir, struct 
if (d_mountpoint(dentry))
error = -EBUSY;
else {
-   error = security_inode_rmdir(dir, dentry, mnt);
-   if (!error) {
-   error = dir->i_op->rmdir(dir, dentry);
-   if (!error)
-   dentry->d_inode->i_flags |= S_DEAD;
-   }
+   error = dir->i_op->rmdir(dir, dentry);
+   if (!error)
+   dentry->d_inode->i_flags |= S_DEAD;
}
mutex_unlock(&dentry->d_inode->i_mutex);
if (!error) {



[AppArmor 17/47] Add a struct vfsmount parameter to vfs_unlink()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |3 ++-
 fs/namei.c  |5 +++--
 fs/nfsd/nfs4recover.c   |2 +-
 fs/nfsd/vfs.c   |2 +-
 fs/unionfs/commonfops.c |4 +++-
 fs/unionfs/copyup.c |3 ++-
 fs/unionfs/dirhelper.c  |5 -
 fs/unionfs/inode.c  |   22 --
 fs/unionfs/rename.c |8 ++--
 fs/unionfs/sioq.c   |2 +-
 fs/unionfs/sioq.h   |1 +
 fs/unionfs/unlink.c |5 -
 include/linux/fs.h  |2 +-
 ipc/mqueue.c|2 +-
 14 files changed, 46 insertions(+), 20 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -443,10 +443,11 @@ static int ecryptfs_unlink(struct inode 
 {
int rc = 0;
struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+   struct vfsmount *lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
struct inode *lower_dir_inode = ecryptfs_inode_to_lower(dir);
 
lock_parent(lower_dentry);
-   rc = vfs_unlink(lower_dir_inode, lower_dentry);
+   rc = vfs_unlink(lower_dir_inode, lower_dentry, lower_mnt);
if (rc) {
printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
goto out_unlock;
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2288,7 +2288,7 @@ asmlinkage long sys_rmdir(const char __u
return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+int vfs_unlink(struct inode *dir, struct dentry *dentry, struct vfsmount *mnt)
 {
int error = may_delete(dir, dentry, 0);
 
@@ -2356,7 +2356,8 @@ static long do_unlinkat(int dfd, const c
error = mnt_want_write(nd.path.mnt);
if (error)
goto exit2;
-   error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+   error = vfs_unlink(nd.path.dentry->d_inode, dentry,
+  nd.path.mnt);
mnt_drop_write(nd.path.mnt);
exit2:
dput(dentry);
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -264,7 +264,7 @@ nfsd4_remove_clid_file(struct dentry *di
return -EINVAL;
}
mutex_lock_nested(&dir->d_inode->i_mutex, I_MUTEX_PARENT);
-   status = vfs_unlink(dir->d_inode, dentry);
+   status = vfs_unlink(dir->d_inode, dentry, rec_dir.path.mnt);
mutex_unlock(&dir->d_inode->i_mutex);
return status;
 }
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1772,7 +1772,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
host_err = -EPERM;
} else
 #endif
-   host_err = vfs_unlink(dirp, rdentry);
+   host_err = vfs_unlink(dirp, rdentry, exp->ex_path.mnt);
} else { /* It's RMDIR */
host_err = vfs_rmdir(dirp, rdentry, exp->ex_path.mnt);
}
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -34,6 +34,7 @@ static int copyup_deleted_file(struct fi
int err;
struct dentry *tmp_dentry = NULL;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
struct dentry *lower_dir_dentry = NULL;
 
lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);
@@ -82,13 +83,14 @@ retry:
 
/* bring it to the same state as an unlinked file */
lower_dentry = unionfs_lower_dentry_idx(dentry, dbstart(dentry));
+   lower_mnt = unionfs_lower_mnt_idx(dentry, dbstart(dentry));
if (!unionfs_lower_inode_idx(dentry->d_inode, bindex)) {
atomic_inc(&lower_dentry->d_inode->i_count);
unionfs_set_lower_inode_idx(dentry->d_inode, bindex,
lower_dentry->d_inode);
}
lower_dir_dentry = lock_parent(lower_dentry);
-   err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
+   err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry, lower_mnt);
unlock_dir(lower_dir_dentry);
 
 out:
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -491,7 +491,8 @@ out_unlink:
 * quota, or something else happened so let's unlink; we don't
 * really care about the return value of vfs_unlink
 */
-   vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
+   vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry,
+  new_lower_mnt);
 
if (copyup_file) {
/* need to close the file */
--- a/fs/unionfs/dirhelper.c
+++ b/fs/unionfs/dirhelper.c
@@ -29,6 +29,7 @@ int do_delete_whiteouts(struct dentry *d
int err = 0;
struct dentry *lower_dir_dentry = NULL;
struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt;
char *name = NULL, *p;
struct inode *lower_dir;
int i;
@@ 

[AppArmor 18/47] Pass struct vfsmount to the inode_unlink LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |2 +-
 include/linux/security.h |   10 +++---
 security/dummy.c |3 ++-
 security/security.c  |5 +++--
 security/selinux/hooks.c |5 +++--
 5 files changed, 16 insertions(+), 9 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2304,7 +2304,7 @@ int vfs_unlink(struct inode *dir, struct
if (d_mountpoint(dentry))
error = -EBUSY;
else {
-   error = security_inode_unlink(dir, dentry);
+   error = security_inode_unlink(dir, dentry, mnt);
if (!error)
error = dir->i_op->unlink(dir, dentry);
}
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -329,6 +329,7 @@ struct request_sock;
  * Check the permission to remove a hard link to a file. 
  * @dir contains the inode structure of parent directory of the file.
  * @dentry contains the dentry structure for file to be unlinked.
+ * @mnt is the vfsmount corresponding to @dentry (may be NULL).
  * Return 0 if permission is granted.
  * @inode_symlink:
  * Check the permission to create a symbolic link to a file.
@@ -1286,7 +1287,8 @@ struct security_operations {
int (*inode_link) (struct dentry *old_dentry, struct vfsmount *old_mnt,
   struct inode *dir, struct dentry *new_dentry,
   struct vfsmount *new_mnt);
-   int (*inode_unlink) (struct inode *dir, struct dentry *dentry);
+   int (*inode_unlink) (struct inode *dir, struct dentry *dentry,
+struct vfsmount *mnt);
int (*inode_symlink) (struct inode *dir, struct dentry *dentry,
  struct vfsmount *mnt, const char *old_name);
int (*inode_mkdir) (struct inode *dir, struct dentry *dentry,
@@ -1554,7 +1556,8 @@ int security_inode_create(struct inode *
 int security_inode_link(struct dentry *old_dentry, struct vfsmount *old_mnt,
struct inode *dir, struct dentry *new_dentry,
struct vfsmount *new_mnt);
-int security_inode_unlink(struct inode *dir, struct dentry *dentry);
+int security_inode_unlink(struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt);
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
   struct vfsmount *mnt, const char *old_name);
 int security_inode_mkdir(struct inode *dir, struct dentry *dentry,
@@ -1882,7 +1885,8 @@ static inline int security_inode_link (s
 }
 
 static inline int security_inode_unlink (struct inode *dir,
-struct dentry *dentry)
+struct dentry *dentry,
+struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -303,7 +303,8 @@ static int dummy_inode_link (struct dent
return 0;
 }
 
-static int dummy_inode_unlink (struct inode *inode, struct dentry *dentry)
+static int dummy_inode_unlink (struct inode *inode, struct dentry *dentry,
+  struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -366,11 +366,12 @@ int security_inode_link(struct dentry *o
 new_dentry, new_mnt);
 }
 
-int security_inode_unlink(struct inode *dir, struct dentry *dentry)
+int security_inode_unlink(struct inode *dir, struct dentry *dentry,
+ struct vfsmount *mnt)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_unlink(dir, dentry);
+   return security_ops->inode_unlink(dir, dentry, mnt);
 }
 
 int security_inode_symlink(struct inode *dir, struct dentry *dentry,
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2432,11 +2432,12 @@ static int selinux_inode_link(struct den
return may_link(dir, old_dentry, MAY_LINK);
 }
 
-static int selinux_inode_unlink(struct inode *dir, struct dentry *dentry)
+static int selinux_inode_unlink(struct inode *dir, struct dentry *dentry,
+   struct vfsmount *mnt)
 {
int rc;
 
-   rc = secondary_ops->inode_unlink(dir, dentry);
+   rc = secondary_ops->inode_unlink(dir, dentry, mnt);
if (rc)
return rc;
return may_link(dir, dentry, MAY_UNLINK);



[AppArmor 19/47] Add struct vfsmount parameters to vfs_rename()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/ecryptfs/inode.c |7 ++-
 fs/namei.c  |   19 ---
 fs/nfsd/vfs.c   |3 ++-
 fs/unionfs/rename.c |6 +-
 include/linux/fs.h  |2 +-
 5 files changed, 26 insertions(+), 11 deletions(-)

--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -588,19 +588,24 @@ ecryptfs_rename(struct inode *old_dir, s
 {
int rc;
struct dentry *lower_old_dentry;
+   struct vfsmount *lower_old_mnt;
struct dentry *lower_new_dentry;
+   struct vfsmount *lower_new_mnt;
struct dentry *lower_old_dir_dentry;
struct dentry *lower_new_dir_dentry;
 
lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
+   lower_old_mnt = ecryptfs_dentry_to_lower_mnt(old_dentry);
lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
+   lower_new_mnt = ecryptfs_dentry_to_lower_mnt(new_dentry);
dget(lower_old_dentry);
dget(lower_new_dentry);
lower_old_dir_dentry = dget_parent(lower_old_dentry);
lower_new_dir_dentry = dget_parent(lower_new_dentry);
lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
rc = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
-   lower_new_dir_dentry->d_inode, lower_new_dentry);
+   lower_old_mnt, lower_new_dir_dentry->d_inode,
+   lower_new_dentry, lower_new_mnt);
if (rc)
goto out_lock;
fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode);
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2600,7 +2600,8 @@ asmlinkage long sys_link(const char __us
  *locking].
  */
 static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+ struct vfsmount *old_mnt, struct inode *new_dir,
+ struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
int error = 0;
struct inode *target;
@@ -2643,7 +2644,8 @@ static int vfs_rename_dir(struct inode *
 }
 
 static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
-   struct inode *new_dir, struct dentry *new_dentry)
+   struct vfsmount *old_mnt, struct inode *new_dir,
+   struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
struct inode *target;
int error;
@@ -2671,7 +2673,8 @@ static int vfs_rename_other(struct inode
 }
 
 int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-  struct inode *new_dir, struct dentry *new_dentry)
+   struct vfsmount *old_mnt, struct inode *new_dir,
+   struct dentry *new_dentry, struct vfsmount *new_mnt)
 {
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
@@ -2700,9 +2703,11 @@ int vfs_rename(struct inode *old_dir, st
old_name = fsnotify_oldname_init(old_dentry->d_name.name);
 
if (is_dir)
-   error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
+   error = vfs_rename_dir(old_dir, old_dentry, old_mnt,
+  new_dir, new_dentry, new_mnt);
else
-   error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
+   error = vfs_rename_other(old_dir, old_dentry, old_mnt,
+new_dir, new_dentry, new_mnt);
if (!error) {
const char *new_name = old_dentry->d_name.name;
fsnotify_move(old_dir, new_dir, old_name, new_name, is_dir,
@@ -2777,8 +2782,8 @@ static int do_rename(int olddfd, const c
error = mnt_want_write(oldnd.path.mnt);
if (error)
goto exit5;
-   error = vfs_rename(old_dir->d_inode, old_dentry,
-  new_dir->d_inode, new_dentry);
+   error = vfs_rename(old_dir->d_inode, old_dentry, oldnd.path.mnt,
+  new_dir->d_inode, new_dentry, newnd.path.mnt);
mnt_drop_write(oldnd.path.mnt);
 exit5:
dput(new_dentry);
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1697,7 +1697,8 @@ nfsd_rename(struct svc_rqst *rqstp, stru
if (host_err)
goto out_dput_new;
 
-   host_err = vfs_rename(fdir, odentry, tdir, ndentry);
+   host_err = vfs_rename(fdir, odentry, ffhp->fh_export->ex_path.mnt,
+ tdir, ndentry, tfhp->fh_export->ex_path.mnt);
if (!host_err && EX_ISSYNC(tfhp->fh_export)) {
host_err = nfsd_sync_dir(tdentry);
if (!host_err)
--- a/fs/unionfs/rename.c
+++ b/fs/unionfs/rename.c
@@ -24,6 +24,7 @@ static int __unionfs_rename(struct inode
 {
int err = 

[AppArmor 20/47] Pass struct vfsmount to the inode_rename LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c   |6 --
 include/linux/security.h |   13 ++---
 security/dummy.c |4 +++-
 security/security.c  |7 ---
 security/selinux/hooks.c |8 ++--
 5 files changed, 27 insertions(+), 11 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2616,7 +2616,8 @@ static int vfs_rename_dir(struct inode *
return error;
}
 
-   error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
+   error = security_inode_rename(old_dir, old_dentry, old_mnt,
+ new_dir, new_dentry, new_mnt);
if (error)
return error;
 
@@ -2650,7 +2651,8 @@ static int vfs_rename_other(struct inode
struct inode *target;
int error;
 
-   error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
+   error = security_inode_rename(old_dir, old_dentry, old_mnt,
+ new_dir, new_dentry, new_mnt);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -367,8 +367,10 @@ struct request_sock;
  * Check for permission to rename a file or directory.
  * @old_dir contains the inode structure for parent of the old link.
  * @old_dentry contains the dentry structure of the old link.
+ * @old_mnt is the vfsmount corresponding to @old_dentry (may be NULL).
  * @new_dir contains the inode structure for parent of the new link.
  * @new_dentry contains the dentry structure of the new link.
+ * @new_mnt is the vfsmount corresponding to @new_dentry (may be NULL).
  * Return 0 if permission is granted.
  * @inode_readlink:
  * Check the permission to read the symbolic link.
@@ -1298,7 +1300,9 @@ struct security_operations {
int (*inode_mknod) (struct inode *dir, struct dentry *dentry,
struct vfsmount *mnt, int mode, dev_t dev);
int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
-struct inode *new_dir, struct dentry *new_dentry);
+struct vfsmount *old_mnt,
+struct inode *new_dir, struct dentry *new_dentry,
+struct vfsmount *new_mnt);
int (*inode_readlink) (struct dentry *dentry, struct vfsmount *mnt);
int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
int (*inode_permission) (struct inode *inode, int mask, struct 
nameidata *nd);
@@ -1567,7 +1571,8 @@ int security_inode_rmdir(struct inode *d
 int security_inode_mknod(struct inode *dir, struct dentry *dentry,
 struct vfsmount *mnt, int mode, dev_t dev);
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
-  struct inode *new_dir, struct dentry *new_dentry);
+ struct vfsmount *old_mnt, struct inode *new_dir,
+ struct dentry *new_dentry, struct vfsmount *new_mnt);
 int security_inode_readlink(struct dentry *dentry, struct vfsmount *mnt);
 int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
 int security_inode_permission(struct inode *inode, int mask, struct nameidata 
*nd);
@@ -1924,8 +1929,10 @@ static inline int security_inode_mknod (
 
 static inline int security_inode_rename (struct inode *old_dir,
 struct dentry *old_dentry,
+struct vfsmount *old_mnt,
 struct inode *new_dir,
-struct dentry *new_dentry)
+struct dentry *new_dentry,
+struct vfsmount *new_mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -335,8 +335,10 @@ static int dummy_inode_mknod (struct ino
 
 static int dummy_inode_rename (struct inode *old_inode,
   struct dentry *old_dentry,
+  struct vfsmount *old_mnt,
   struct inode *new_inode,
-  struct dentry *new_dentry)
+  struct dentry *new_dentry,
+  struct vfsmount *new_mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -407,13 +407,14 @@ int security_inode_mknod(struct inode *d
 }
 
 int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
-  struct inode *new_dir, struct dentry *new_dentry)
+ struct vfsmount *old_mnt, struct inode *new_dir,
+ 

[AppArmor 21/47] Add a struct vfsmount parameter to vfs_setxattr()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.
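
Why the mount matters for pathnames can be shown with a small userspace
sketch. The toy_* types and toy_path() are hypothetical stand-ins, not the
kernel's dentry/vfsmount or any real path helper; the sketch assumes a
mountpoint string without a trailing slash. The same dentry chain yields a
different pathname depending on which mount it was reached through.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the kernel types; NOT the real definitions. */
struct toy_dentry {
	const char *name;		/* component name */
	struct toy_dentry *parent;	/* NULL at the filesystem root */
};

struct toy_vfsmount {
	const char *mountpoint;		/* no trailing slash, e.g. "/mnt/a" */
};

/*
 * Write "<mountpoint>/<component>/..." into buf.
 * Returns 0 on success, -1 if mnt is NULL, the chain is deeper than
 * 32 components, or buf is too small.
 */
static int toy_path(const struct toy_dentry *d, const struct toy_vfsmount *mnt,
		    char *buf, size_t len)
{
	const struct toy_dentry *chain[32];
	size_t depth = 0, used;
	int n;

	if (!mnt)
		return -1;	/* a dentry alone does not name a path */

	/* Collect components from the leaf up to, but excluding, the root. */
	for (; d && d->parent; d = d->parent) {
		if (depth == 32)
			return -1;
		chain[depth++] = d;
	}

	n = snprintf(buf, len, "%s", mnt->mountpoint);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	/* Append the components in root-to-leaf order. */
	while (depth--) {
		n = snprintf(buf + used, len - used, "/%s", chain[depth]->name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

With one filesystem mounted at two places, a caller holding only the dentry
cannot tell which of the two names the user actually used; passing the
vfsmount alongside removes that ambiguity.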

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/vfs.c |   16 +++-
 fs/unionfs/copyup.c   |   18 --
 fs/unionfs/xattr.c|7 +--
 fs/xattr.c|   16 
 include/linux/xattr.h |3 ++-
 5 files changed, 38 insertions(+), 22 deletions(-)

--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -426,7 +426,8 @@ static ssize_t nfsd_getxattr(struct dent
 
 #if defined(CONFIG_NFSD_V4)
 static int
-set_nfsv4_acl_one(struct dentry *dentry, struct posix_acl *pacl, char *key)
+set_nfsv4_acl_one(struct dentry *dentry, struct vfsmount *mnt,
+ struct posix_acl *pacl, char *key)
 {
int len;
size_t buflen;
@@ -445,7 +446,7 @@ set_nfsv4_acl_one(struct dentry *dentry,
goto out;
}
 
-   error = vfs_setxattr(dentry, key, buf, len, 0);
+   error = vfs_setxattr(dentry, mnt, key, buf, len, 0);
 out:
kfree(buf);
return error;
@@ -458,6 +459,7 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
__be32 error;
int host_error;
struct dentry *dentry;
+   struct vfsmount *mnt;
struct inode *inode;
struct posix_acl *pacl = NULL, *dpacl = NULL;
unsigned int flags = 0;
@@ -468,6 +470,7 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
return error;
 
dentry = fhp->fh_dentry;
+   mnt = fhp->fh_export->ex_path.mnt;
inode = dentry->d_inode;
if (S_ISDIR(inode->i_mode))
flags = NFS4_ACL_DIR;
@@ -478,12 +481,14 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqst
} else if (host_error < 0)
goto out_nfserr;
 
-   host_error = set_nfsv4_acl_one(dentry, pacl, POSIX_ACL_XATTR_ACCESS);
+   host_error = set_nfsv4_acl_one(dentry, mnt, pacl,
+  POSIX_ACL_XATTR_ACCESS);
if (host_error < 0)
goto out_release;
 
if (S_ISDIR(inode->i_mode))
-   host_error = set_nfsv4_acl_one(dentry, dpacl, 
POSIX_ACL_XATTR_DEFAULT);
+   host_error = set_nfsv4_acl_one(dentry, mnt, dpacl,
+  POSIX_ACL_XATTR_DEFAULT);
 
 out_release:
posix_acl_release(pacl);
@@ -2057,7 +2062,8 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
size = 0;
 
if (size)
-   error = vfs_setxattr(fhp->fh_dentry, name, value, size, 0);
+   error = vfs_setxattr(fhp->fh_dentry, fhp->fh_export->ex_path.mnt,
+name, value, size,0);
else {
if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
error = 0;
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -26,7 +26,9 @@
 #ifdef CONFIG_UNION_FS_XATTR
 /* copyup all extended attrs for a given dentry */
 static int copyup_xattrs(struct dentry *old_lower_dentry,
-struct dentry *new_lower_dentry)
+struct vfsmount *old_lower_mnt,
+struct dentry *new_lower_dentry,
+struct vfsmount *new_lower_mnt)
 {
int err = 0;
ssize_t list_size = -1;
@@ -82,8 +84,8 @@ static int copyup_xattrs(struct dentry *
goto out;
}
/* Don't lock here since vfs_setxattr does it for us. */
-   err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
-  size, 0);
+   err = vfs_setxattr(new_lower_dentry, new_lower_mnt, name_list,
+  attr_value, size, 0);
/*
 * Selinux depends on security.* xattrs, so to maintain
 * the security of copied-up files, if Selinux is active,
@@ -93,8 +95,8 @@ static int copyup_xattrs(struct dentry *
 */
if (err == -EPERM && !capable(CAP_FOWNER)) {
cap_raise(current->cap_effective, CAP_FOWNER);
-   err = vfs_setxattr(new_lower_dentry, name_list,
-  attr_value, size, 0);
+   err = vfs_setxattr(new_lower_dentry, new_lower_mnt,
+  name_list, attr_value, size, 0);
cap_lower(current->cap_effective, CAP_FOWNER);
}
if (err < 0)
@@ -380,6 +382,7 @@ int copyup_dentry(struct inode *dir, str
struct dentry *new_lower_dentry;
struct vfsmount *new_lower_mnt;
struct dentry *old_lower_dentry = NULL;
+   struct vfsmount *old_lower_mnt;
struct super_block *sb;
int err = 0;
int old_bindex;
@@ -413,6 +416,8 @@ int copyup_dentry(struct inode *dir, str
}
 
old_lower_dentry = 

[AppArmor 22/47] Pass struct vfsmount to the inode_setxattr LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c   |4 ++--
 include/linux/security.h |   35 +--
 security/commoncap.c |4 ++--
 security/dummy.c |9 ++---
 security/security.c  |   14 --
 security/selinux/hooks.c |8 ++--
 6 files changed, 45 insertions(+), 29 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -78,7 +78,7 @@ vfs_setxattr(struct dentry *dentry, stru
return error;
 
mutex_lock(&inode->i_mutex);
-   error = security_inode_setxattr(dentry, name, value, size, flags);
+   error = security_inode_setxattr(dentry, mnt, name, value, size, flags);
if (error)
goto out;
error = -EOPNOTSUPP;
@@ -86,7 +86,7 @@ vfs_setxattr(struct dentry *dentry, stru
error = inode->i_op->setxattr(dentry, name, value, size, flags);
if (!error) {
fsnotify_xattr(dentry);
-   security_inode_post_setxattr(dentry, name, value,
+   security_inode_post_setxattr(dentry, mnt, name, value,
 size, flags);
}
} else if (!strncmp(name, XATTR_SECURITY_PREFIX,
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -57,7 +57,7 @@ extern void cap_capset_set (struct task_
 extern int cap_bprm_set_security (struct linux_binprm *bprm);
 extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe);
 extern int cap_bprm_secureexec(struct linux_binprm *bprm);
-extern int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, 
size_t size, int flags);
+extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, 
char *name, void *value, size_t size, int flags);
 extern int cap_inode_removexattr(struct dentry *dentry, char *name);
 extern int cap_inode_need_killpriv(struct dentry *dentry);
 extern int cap_inode_killpriv(struct dentry *dentry);
@@ -415,11 +415,11 @@ struct request_sock;
  * inode.
  * @inode_setxattr:
  * Check permission before setting the extended attributes
- * @value identified by @name for @dentry.
+ * @value identified by @name for @dentry and @mnt.
  * Return 0 if permission is granted.
  * @inode_post_setxattr:
  * Update inode security field after successful setxattr operation.
- * @value identified by @name for @dentry.
+ * @value identified by @name for @dentry and @mnt.
  * @inode_getxattr:
  * Check permission before obtaining the extended attributes
  * identified by @name for @dentry.
@@ -1310,9 +1310,11 @@ struct security_operations {
  struct iattr *attr);
int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
 void (*inode_delete) (struct inode *inode);
-   int (*inode_setxattr) (struct dentry *dentry, char *name, void *value,
-  size_t size, int flags);
-   void (*inode_post_setxattr) (struct dentry *dentry, char *name, void 
*value,
+   int (*inode_setxattr) (struct dentry *dentry, struct vfsmount *mnt,
+  char *name, void *value, size_t size, int flags);
+   void (*inode_post_setxattr) (struct dentry *dentry,
+struct vfsmount *mnt,
+char *name, void *value,
 size_t size, int flags);
int (*inode_getxattr) (struct dentry *dentry, char *name);
int (*inode_listxattr) (struct dentry *dentry);
@@ -1580,10 +1582,11 @@ int security_inode_setattr(struct dentry
   struct iattr *attr);
 int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
 void security_inode_delete(struct inode *inode);
-int security_inode_setxattr(struct dentry *dentry, char *name,
-void *value, size_t size, int flags);
-void security_inode_post_setxattr(struct dentry *dentry, char *name,
-  void *value, size_t size, int flags);
+int security_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt,
+   char *name, void *value, size_t size, int flags);
+void security_inode_post_setxattr(struct dentry *dentry, struct vfsmount *mnt,
+ char *name, void *value, size_t size,
+ int flags);
 int security_inode_getxattr(struct dentry *dentry, char *name);
 int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct dentry *dentry, char *name);
@@ -1971,14 +1974,18 @@ static inline int security_inode_getattr
 static inline void security_inode_delete (struct inode *inode)
 { }
 
-static inline int 

[AppArmor 23/47] Add a struct vfsmount parameter to vfs_getxattr()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/nfs4xdr.c |2 +-
 fs/nfsd/vfs.c |   21 -
 fs/unionfs/copyup.c   |2 +-
 fs/unionfs/xattr.c|4 +++-
 fs/xattr.c|   14 --
 include/linux/nfsd/nfsd.h |3 ++-
 include/linux/xattr.h |2 +-
 7 files changed, 28 insertions(+), 20 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1501,7 +1501,7 @@ nfsd4_encode_fattr(struct svc_fh *fhp, s
}
if (bmval0 & (FATTR4_WORD0_ACL | FATTR4_WORD0_ACLSUPPORT
| FATTR4_WORD0_SUPPORTED_ATTRS)) {
-   err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
+   err = nfsd4_get_nfs4_acl(rqstp, dentry, exp->ex_path.mnt, &acl);
aclsupport = (err == 0);
if (bmval0 & FATTR4_WORD0_ACL) {
if (err == -EOPNOTSUPP)
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -408,11 +408,12 @@ out_nfserr:
 #if defined(CONFIG_NFSD_V2_ACL) || \
 defined(CONFIG_NFSD_V3_ACL) || \
 defined(CONFIG_NFSD_V4)
-static ssize_t nfsd_getxattr(struct dentry *dentry, char *key, void **buf)
+static ssize_t nfsd_getxattr(struct dentry *dentry, struct vfsmount *mnt,
+char *key, void **buf)
 {
ssize_t buflen;
 
-   buflen = vfs_getxattr(dentry, key, NULL, 0);
+   buflen = vfs_getxattr(dentry, mnt, key, NULL, 0);
if (buflen <= 0)
return buflen;
 
@@ -420,7 +421,7 @@ static ssize_t nfsd_getxattr(struct dent
if (!*buf)
return -ENOMEM;
 
-   return vfs_getxattr(dentry, key, *buf, buflen);
+   return vfs_getxattr(dentry, mnt, key, *buf, buflen);
 }
 #endif
 
@@ -501,13 +502,13 @@ out_nfserr:
 }
 
 static struct posix_acl *
-_get_posix_acl(struct dentry *dentry, char *key)
+_get_posix_acl(struct dentry *dentry, struct vfsmount *mnt, char *key)
 {
void *buf = NULL;
struct posix_acl *pacl = NULL;
int buflen;
 
-   buflen = nfsd_getxattr(dentry, key, &buf);
+   buflen = nfsd_getxattr(dentry, mnt, key, &buf);
if (!buflen)
buflen = -ENODATA;
if (buflen <= 0)
@@ -519,14 +520,15 @@ _get_posix_acl(struct dentry *dentry, ch
 }
 
 int
-nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct 
nfs4_acl **acl)
+nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
+  struct vfsmount *mnt, struct nfs4_acl **acl)
 {
struct inode *inode = dentry->d_inode;
int error = 0;
struct posix_acl *pacl = NULL, *dpacl = NULL;
unsigned int flags = 0;
 
-   pacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_ACCESS);
+   pacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_ACCESS);
if (IS_ERR(pacl) && PTR_ERR(pacl) == -ENODATA)
pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
if (IS_ERR(pacl)) {
@@ -536,7 +538,7 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqst
}
 
if (S_ISDIR(inode->i_mode)) {
-   dpacl = _get_posix_acl(dentry, POSIX_ACL_XATTR_DEFAULT);
+   dpacl = _get_posix_acl(dentry, mnt, POSIX_ACL_XATTR_DEFAULT);
if (IS_ERR(dpacl) && PTR_ERR(dpacl) == -ENODATA)
dpacl = NULL;
else if (IS_ERR(dpacl)) {
@@ -2017,7 +2019,8 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i
return ERR_PTR(-EOPNOTSUPP);
}
 
-   size = nfsd_getxattr(fhp->fh_dentry, name, &value);
+   size = nfsd_getxattr(fhp->fh_dentry, fhp->fh_export->ex_path.mnt, name,
+&value);
if (size < 0)
return ERR_PTR(size);
 
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -72,7 +72,7 @@ static int copyup_xattrs(struct dentry *
 
/* Lock here since vfs_getxattr doesn't lock for us */
mutex_lock(&old_lower_dentry->d_inode->i_mutex);
-   size = vfs_getxattr(old_lower_dentry, name_list,
+   size = vfs_getxattr(old_lower_dentry, old_lower_mnt, name_list,
attr_value, XATTR_SIZE_MAX);
mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
if (size < 0) {
--- a/fs/unionfs/xattr.c
+++ b/fs/unionfs/xattr.c
@@ -43,6 +43,7 @@ ssize_t unionfs_getxattr(struct dentry *
 size_t size)
 {
struct dentry *lower_dentry = NULL;
+   struct vfsmount *lower_mnt;
int err = -EOPNOTSUPP;
 
unionfs_read_lock(dentry->d_sb);
@@ -54,8 +55,9 @@ ssize_t unionfs_getxattr(struct dentry *
}
 
lower_dentry = unionfs_lower_dentry(dentry);
+   lower_mnt = unionfs_lower_mnt(dentry);
 
-   err = vfs_getxattr(lower_dentry, (char *) name, value, size);
+   err = 

[AppArmor 24/47] Pass struct vfsmount to the inode_getxattr LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c   |2 +-
 include/linux/security.h |   11 +++
 security/dummy.c |3 ++-
 security/security.c  |5 +++--
 security/selinux/hooks.c |3 ++-
 5 files changed, 15 insertions(+), 9 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -141,7 +141,7 @@ vfs_getxattr(struct dentry *dentry, stru
if (error)
return error;
 
-   error = security_inode_getxattr(dentry, name);
+   error = security_inode_getxattr(dentry, mnt, name);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -422,7 +422,7 @@ struct request_sock;
  * @value identified by @name for @dentry and @mnt.
  * @inode_getxattr:
  * Check permission before obtaining the extended attributes
- * identified by @name for @dentry.
+ * identified by @name for @dentry and @mnt.
  * Return 0 if permission is granted.
  * @inode_listxattr:
  * Check permission before obtaining the list of extended attribute 
@@ -1316,7 +1316,8 @@ struct security_operations {
 struct vfsmount *mnt,
 char *name, void *value,
 size_t size, int flags);
-   int (*inode_getxattr) (struct dentry *dentry, char *name);
+   int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt,
+  char *name);
int (*inode_listxattr) (struct dentry *dentry);
int (*inode_removexattr) (struct dentry *dentry, char *name);
int (*inode_need_killpriv) (struct dentry *dentry);
@@ -1587,7 +1588,8 @@ int security_inode_setxattr(struct dentr
 void security_inode_post_setxattr(struct dentry *dentry, struct vfsmount *mnt,
  char *name, void *value, size_t size,
  int flags);
-int security_inode_getxattr(struct dentry *dentry, char *name);
+int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt,
+   char *name);
 int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct dentry *dentry, char *name);
 int security_inode_need_killpriv(struct dentry *dentry);
@@ -1988,7 +1990,8 @@ static inline void security_inode_post_s
 int flags)
 { }
 
-static inline int security_inode_getxattr (struct dentry *dentry, char *name)
+static inline int security_inode_getxattr (struct dentry *dentry,
+   struct vfsmount *mnt, char *name)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -393,7 +393,8 @@ static void dummy_inode_post_setxattr (s
 {
 }
 
-static int dummy_inode_getxattr (struct dentry *dentry, char *name)
+static int dummy_inode_getxattr (struct dentry *dentry,
+ struct vfsmount *mnt, char *name)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -478,11 +478,12 @@ void security_inode_post_setxattr(struct
security_ops->inode_post_setxattr(dentry, mnt, name, value, size, flags);
 }
 
-int security_inode_getxattr(struct dentry *dentry, char *name)
+int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt,
+   char *name)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_getxattr(dentry, name);
+   return security_ops->inode_getxattr(dentry, mnt, name);
 }
 
 int security_inode_listxattr(struct dentry *dentry)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2636,7 +2636,8 @@ static void selinux_inode_post_setxattr(
return;
 }
 
-static int selinux_inode_getxattr (struct dentry *dentry, char *name)
+static int selinux_inode_getxattr (struct dentry *dentry, struct vfsmount *mnt,
+  char *name)
 {
return dentry_has_perm(current, NULL, dentry, FILE__GETATTR);
 }



[AppArmor 25/47] Add a struct vfsmount parameter to vfs_listxattr()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/unionfs/copyup.c   |5 +++--
 fs/unionfs/xattr.c|4 +++-
 fs/xattr.c|   25 ++---
 include/linux/xattr.h |2 +-
 4 files changed, 21 insertions(+), 15 deletions(-)

--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -37,7 +37,7 @@ static int copyup_xattrs(struct dentry *
char *name_list_buf = NULL;
 
/* query the actual size of the xattr list */
-   list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
+   list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, NULL, 0);
if (list_size <= 0) {
err = list_size;
goto out;
@@ -53,7 +53,8 @@ static int copyup_xattrs(struct dentry *
name_list_buf = name_list; /* save for kfree at end */
 
/* now get the actual xattr list of the source file */
-   list_size = vfs_listxattr(old_lower_dentry, name_list, list_size);
+   list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, name_list,
+ list_size);
if (list_size <= 0) {
err = list_size;
goto out;
--- a/fs/unionfs/xattr.c
+++ b/fs/unionfs/xattr.c
@@ -134,6 +134,7 @@ out:
 ssize_t unionfs_listxattr(struct dentry *dentry, char *list, size_t size)
 {
struct dentry *lower_dentry = NULL;
+   struct vfsmount *lower_mnt;
int err = -EOPNOTSUPP;
char *encoded_list = NULL;
 
@@ -146,9 +147,10 @@ ssize_t unionfs_listxattr(struct dentry 
}
 
lower_dentry = unionfs_lower_dentry(dentry);
+   lower_mnt = unionfs_lower_mnt(dentry);
 
encoded_list = list;
-   err = vfs_listxattr(lower_dentry, encoded_list, size);
+   err = vfs_listxattr(lower_dentry, lower_mnt, encoded_list, size);
 
 out:
unionfs_check_dentry(dentry);
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -168,18 +168,20 @@ nolsm:
 EXPORT_SYMBOL_GPL(vfs_getxattr);
 
 ssize_t
-vfs_listxattr(struct dentry *d, char *list, size_t size)
+vfs_listxattr(struct dentry *dentry, struct vfsmount *mnt, char *list,
+ size_t size)
 {
+   struct inode *inode = dentry->d_inode;
ssize_t error;
 
-   error = security_inode_listxattr(d);
+   error = security_inode_listxattr(dentry);
if (error)
return error;
error = -EOPNOTSUPP;
-   if (d->d_inode->i_op && d->d_inode->i_op->listxattr) {
-   error = d->d_inode->i_op->listxattr(d, list, size);
-   } else {
-   error = security_inode_listsecurity(d->d_inode, list, size);
+   if (inode->i_op && inode->i_op->listxattr)
+   error = inode->i_op->listxattr(dentry, list, size);
+   else {
+   error = security_inode_listsecurity(inode, list, size);
if (size && error > size)
error = -ERANGE;
}
@@ -400,7 +402,8 @@ sys_fgetxattr(int fd, char __user *name,
  * Extended attribute LIST operations
  */
 static ssize_t
-listxattr(struct dentry *d, char __user *list, size_t size)
+listxattr(struct dentry *dentry, struct vfsmount *mnt, char __user *list,
+ size_t size)
 {
ssize_t error;
char *klist = NULL;
@@ -413,7 +416,7 @@ listxattr(struct dentry *d, char __user 
return -ENOMEM;
}
 
-   error = vfs_listxattr(d, klist, size);
+   error = vfs_listxattr(dentry, mnt, klist, size);
if (error > 0) {
if (size && copy_to_user(list, klist, error))
error = -EFAULT;
@@ -435,7 +438,7 @@ sys_listxattr(char __user *path, char __
error = user_path_walk(path, &nd);
if (error)
return error;
-   error = listxattr(nd.path.dentry, list, size);
+   error = listxattr(nd.path.dentry, nd.path.mnt, list, size);
path_put(&nd.path);
return error;
 }
@@ -449,7 +452,7 @@ sys_llistxattr(char __user *path, char _
error = user_path_walk_link(path, &nd);
if (error)
return error;
-   error = listxattr(nd.path.dentry, list, size);
+   error = listxattr(nd.path.dentry, nd.path.mnt, list, size);
path_put(&nd.path);
return error;
 }
@@ -464,7 +467,7 @@ sys_flistxattr(int fd, char __user *list
if (!f)
return error;
audit_inode(NULL, f->f_path.dentry);
-   error = listxattr(f->f_path.dentry, list, size);
+   error = listxattr(f->f_path.dentry, f->f_path.mnt, list, size);
fput(f);
return error;
 }
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -48,7 +48,7 @@ struct xattr_handler {
 
 ssize_t xattr_getsecurity(struct inode *, const char *, void *, size_t);
 ssize_t vfs_getxattr(struct dentry *, struct vfsmount *, char *, void *, 
size_t);
-ssize_t vfs_listxattr(struct 

[AppArmor 26/47] Pass struct vfsmount to the inode_listxattr LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c   |2 +-
 include/linux/security.h |9 +
 security/dummy.c |2 +-
 security/security.c  |4 ++--
 security/selinux/hooks.c |2 +-
 5 files changed, 10 insertions(+), 9 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -174,7 +174,7 @@ vfs_listxattr(struct dentry *dentry, str
struct inode *inode = dentry->d_inode;
ssize_t error;
 
-   error = security_inode_listxattr(dentry);
+   error = security_inode_listxattr(dentry, mnt);
if (error)
return error;
error = -EOPNOTSUPP;
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -426,7 +426,7 @@ struct request_sock;
  * Return 0 if permission is granted.
  * @inode_listxattr:
  * Check permission before obtaining the list of extended attribute 
- * names for @dentry.
+ * names for @dentry and @mnt.
  * Return 0 if permission is granted.
  * @inode_removexattr:
  * Check permission before removing the extended attribute
@@ -1318,7 +1318,7 @@ struct security_operations {
 size_t size, int flags);
int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt,
   char *name);
-   int (*inode_listxattr) (struct dentry *dentry);
+   int (*inode_listxattr) (struct dentry *dentry, struct vfsmount *mnt);
int (*inode_removexattr) (struct dentry *dentry, char *name);
int (*inode_need_killpriv) (struct dentry *dentry);
int (*inode_killpriv) (struct dentry *dentry);
@@ -1590,7 +1590,7 @@ void security_inode_post_setxattr(struct
  int flags);
 int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt,
char *name);
-int security_inode_listxattr(struct dentry *dentry);
+int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt);
 int security_inode_removexattr(struct dentry *dentry, char *name);
 int security_inode_need_killpriv(struct dentry *dentry);
 int security_inode_killpriv(struct dentry *dentry);
@@ -1996,7 +1996,8 @@ static inline int security_inode_getxatt
return 0;
 }
 
-static inline int security_inode_listxattr (struct dentry *dentry)
+static inline int security_inode_listxattr (struct dentry *dentry,
+   struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -399,7 +399,7 @@ static int dummy_inode_getxattr (struct 
return 0;
 }
 
-static int dummy_inode_listxattr (struct dentry *dentry)
+static int dummy_inode_listxattr (struct dentry *dentry, struct vfsmount *mnt)
 {
return 0;
 }
--- a/security/security.c
+++ b/security/security.c
@@ -486,11 +486,11 @@ int security_inode_getxattr(struct dentr
return security_ops->inode_getxattr(dentry, mnt, name);
 }
 
-int security_inode_listxattr(struct dentry *dentry)
+int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_listxattr(dentry);
+   return security_ops->inode_listxattr(dentry, mnt);
 }
 
 int security_inode_removexattr(struct dentry *dentry, char *name)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2642,7 +2642,7 @@ static int selinux_inode_getxattr (struc
return dentry_has_perm(current, NULL, dentry, FILE__GETATTR);
 }
 
-static int selinux_inode_listxattr (struct dentry *dentry)
+static int selinux_inode_listxattr (struct dentry *dentry, struct vfsmount 
*mnt)
 {
return dentry_has_perm(current, NULL, dentry, FILE__GETATTR);
 }



Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-20 Thread Björn Steinbrink
On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote:
>
>
> On Sun, 16 Dec 2007, Krzysztof Oledzki wrote:
> >
> > I'll confirm this tomorrow but it seems that even switching to data=ordered
> > (AFAIK default on ext3) is indeed enough to cure this problem.
>
> Ok, do we actually have any ext3 expert following this? I have no idea
> about what the journalling code does, but I have painful memories of ext3
> doing really odd buffer-head-based IO and totally bypassing all the normal
> page dirty logic.
>
> Judging by the symptoms (sorry for not following this well, it came up
> while I was mostly away travelling), something probably *does* clear the
> dirty bit on the pages, but the dirty *accounting* is not done properly,
> so the kernel keeps thinking it has dirty pages.
>
> Now, a simple grep shows that ext3 does not actually do any
> ClearPageDirty() or similar on its own, although maybe I missed some other
> subtle way this can happen. And the *normal* VFS routines that do
> ClearPageDirty should all be doing the proper accounting.
>
> So I see a couple of possible cases:
>
>  - actually clearing the PG_dirty bit somehow, without doing the
>    accounting.
>
>    This looks very unlikely. PG_dirty is always cleared by some variant of
>    *ClearPageDirty(), and that bit definition isn't used for anything
>    else in the whole kernel judging by grep (the page allocator tests
>    the bit, that's it).

OK, so I looked for PG_dirty anyway.

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
bail out if the page is dirty.

Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
truncate_complete_page, because it called cancel_dirty_page (and thus
cleared PG_dirty) after try_to_free_buffers was called via
do_invalidatepage.

Now, if I'm not mistaken, we can end up as follows.

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
    ext3_invalidatepage()
      journal_invalidatepage()
        journal_unmap_buffer()
          __dispose_buffer()
            __journal_unfile_buffer()
              __journal_temp_unlink_buffer()
                mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

If journal_unmap_buffer then returns 0, try_to_free_buffers is not
called and neither is cancel_dirty_page, so the dirty pages accounting
is not decreased again.
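
To make the suspected imbalance concrete, here is a toy userspace model of
that sequence. It is only a sketch of the hypothesis above, not kernel
code: the model_* names are made up, the counter stands in for the global
dirty-page accounting, and the two flags stand in for "the journal code
re-dirties the page" and "try_to_free_buffers runs and cancels the dirty
state again".

```c
#include <stdbool.h>

/* Toy model; all names are hypothetical. */
struct model_vm { long nr_dirty; };	/* stands in for the dirty counter */
struct model_page { bool dirty; };	/* stands in for PG_dirty */

/* Set the dirty bit and account for it, once per clean->dirty transition. */
static void model_set_page_dirty(struct model_vm *vm, struct model_page *pg)
{
	if (!pg->dirty) {
		pg->dirty = true;
		vm->nr_dirty++;
	}
}

/* Clear the dirty bit and undo the accounting. */
static void model_cancel_dirty_page(struct model_vm *vm, struct model_page *pg)
{
	if (pg->dirty) {
		pg->dirty = false;
		vm->nr_dirty--;
	}
}

/*
 * Run the truncate sequence on one initially dirty page and return the
 * counter value left behind once the page is gone. 0 means balanced.
 */
static long model_truncate(bool journal_redirties, bool buffers_freed)
{
	struct model_vm vm = { 0 };
	struct model_page pg = { false };

	model_set_page_dirty(&vm, &pg);		/* page starts out dirty */

	model_cancel_dirty_page(&vm, &pg);	/* cancel_dirty_page() */
	if (journal_redirties)
		model_set_page_dirty(&vm, &pg);	/* mark_buffer_dirty() */
	if (buffers_freed)
		model_cancel_dirty_page(&vm, &pg); /* try_to_free_buffers path */

	return vm.nr_dirty;
}
```

In this model only the combination "re-dirtied, buffers not freed" leaves
the counter at 1 after the page has been dropped, which matches the leak
described above.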

As try_to_free_buffers got its ext3 hack back in
ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
the accounting fix in cancel_dirty_page, of course).


On a side note, before 8368e328dfe1c534957051333a87b3210a12743b the task
io accounting for cancelled writes always happened if the page
was dirty, regardless of page->mapping. This was also already true for
the old test_clear_page_dirty code, and the commit log for
8368e328dfe1c534957051333a87b3210a12743b doesn't mention that semantic
change either, so maybe the if (account_size) block should be moved
out of the if (mapping && ...) block?

Björn - not sending patches because he needs sleep and wouldn't have a
damn clue about what to write as a commit message anyway


[AppArmor 27/47] Add a struct vfsmount parameter to vfs_removexattr()

2007-12-20 Thread John
The vfsmount will be passed down to the LSM hook so that LSMs can compute
pathnames.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/vfs.c |7 ---
 fs/unionfs/xattr.c|4 +++-
 fs/xattr.c|   12 ++--
 include/linux/xattr.h |2 +-
 4 files changed, 14 insertions(+), 11 deletions(-)

--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2032,6 +2032,7 @@ nfsd_get_posix_acl(struct svc_fh *fhp, i
 int
 nfsd_set_posix_acl(struct svc_fh *fhp, int type, struct posix_acl *acl)
 {
+   struct vfsmount *mnt;
struct inode *inode = fhp->fh_dentry->d_inode;
char *name;
void *value = NULL;
@@ -2064,14 +2065,14 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
} else
size = 0;
 
+   mnt = fhp->fh_export->ex_path.mnt;
if (size)
-   error = vfs_setxattr(fhp->fh_dentry, fhp->fh_export->ex_path.mnt,
-name, value, size,0);
+   error = vfs_setxattr(fhp->fh_dentry, mnt, name, value, size, 0);
else {
if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
error = 0;
else {
-   error = vfs_removexattr(fhp->fh_dentry, name);
+   error = vfs_removexattr(fhp->fh_dentry, mnt, name);
if (error == -ENODATA)
error = 0;
}
--- a/fs/unionfs/xattr.c
+++ b/fs/unionfs/xattr.c
@@ -106,6 +106,7 @@ out:
 int unionfs_removexattr(struct dentry *dentry, const char *name)
 {
struct dentry *lower_dentry = NULL;
+   struct vfsmount *lower_mnt;
int err = -EOPNOTSUPP;
 
unionfs_read_lock(dentry->d_sb);
@@ -117,8 +118,9 @@ int unionfs_removexattr(struct dentry *d
}
 
lower_dentry = unionfs_lower_dentry(dentry);
+   lower_mnt = unionfs_lower_mnt(dentry);
 
-   err = vfs_removexattr(lower_dentry, (char *) name);
+   err = vfs_removexattr(lower_dentry, lower_mnt, (char *) name);
 
 out:
unionfs_check_dentry(dentry);
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -190,7 +190,7 @@ vfs_listxattr(struct dentry *dentry, str
 EXPORT_SYMBOL_GPL(vfs_listxattr);
 
 int
-vfs_removexattr(struct dentry *dentry, char *name)
+vfs_removexattr(struct dentry *dentry, struct vfsmount *mnt, char *name)
 {
struct inode *inode = dentry->d_inode;
int error;
@@ -476,7 +476,7 @@ sys_flistxattr(int fd, char __user *list
  * Extended attribute REMOVE operations
  */
 static long
-removexattr(struct dentry *d, char __user *name)
+removexattr(struct dentry *dentry, struct vfsmount *mnt, char __user *name)
 {
int error;
char kname[XATTR_NAME_MAX + 1];
@@ -487,7 +487,7 @@ removexattr(struct dentry *d, char __use
if (error < 0)
return error;
 
-   return vfs_removexattr(d, kname);
+   return vfs_removexattr(dentry, mnt, kname);
 }
 
 asmlinkage long
@@ -499,7 +499,7 @@ sys_removexattr(char __user *path, char 
error = user_path_walk(path, &nd);
if (error)
return error;
-   error = removexattr(nd.path.dentry, name);
+   error = removexattr(nd.path.dentry, nd.path.mnt, name);
path_put(&nd.path);
return error;
 }
@@ -513,7 +513,7 @@ sys_lremovexattr(char __user *path, char
error = user_path_walk_link(path, &nd);
if (error)
return error;
-   error = removexattr(nd.path.dentry, name);
+   error = removexattr(nd.path.dentry, nd.path.mnt, name);
path_put(&nd.path);
return error;
 }
@@ -530,7 +530,7 @@ sys_fremovexattr(int fd, char __user *na
return error;
dentry = f->f_path.dentry;
audit_inode(NULL, dentry);
-   error = removexattr(dentry, name);
+   error = removexattr(dentry, f->f_path.mnt, name);
fput(f);
return error;
 }
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -51,7 +51,7 @@ ssize_t vfs_getxattr(struct dentry *, st
 ssize_t vfs_listxattr(struct dentry *d, struct vfsmount *, char *list, size_t 
size);
 int vfs_setxattr(struct dentry *, struct vfsmount *, char *, void *, size_t,
 int);
-int vfs_removexattr(struct dentry *, char *);
+int vfs_removexattr(struct dentry *, struct vfsmount *, char *);
 
 ssize_t generic_getxattr(struct dentry *dentry, const char *name, void 
*buffer, size_t size);
 ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t 
buffer_size);



[AppArmor 28/47] Pass struct vfsmount to the inode_removexattr LSM hook

2007-12-20 Thread John
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/xattr.c   |2 +-
 include/linux/security.h |   13 -
 security/commoncap.c |3 ++-
 security/dummy.c |3 ++-
 security/security.c  |5 +++--
 security/selinux/hooks.c |3 ++-
 6 files changed, 18 insertions(+), 11 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -202,7 +202,7 @@ vfs_removexattr(struct dentry *dentry, s
if (error)
return error;
 
-   error = security_inode_removexattr(dentry, name);
+   error = security_inode_removexattr(dentry, mnt, name);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -58,7 +58,7 @@ extern int cap_bprm_set_security (struct
 extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe);
 extern int cap_bprm_secureexec(struct linux_binprm *bprm);
 extern int cap_inode_setxattr(struct dentry *dentry, struct vfsmount *mnt, 
char *name, void *value, size_t size, int flags);
-extern int cap_inode_removexattr(struct dentry *dentry, char *name);
+extern int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt, 
char *name);
 extern int cap_inode_need_killpriv(struct dentry *dentry);
 extern int cap_inode_killpriv(struct dentry *dentry);
 extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t 
old_suid, int flags);
@@ -1319,7 +1319,8 @@ struct security_operations {
int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt,
   char *name);
int (*inode_listxattr) (struct dentry *dentry, struct vfsmount *mnt);
-   int (*inode_removexattr) (struct dentry *dentry, char *name);
+   int (*inode_removexattr) (struct dentry *dentry, struct vfsmount *mnt,
+ char *name);
int (*inode_need_killpriv) (struct dentry *dentry);
int (*inode_killpriv) (struct dentry *dentry);
int (*inode_getsecurity)(const struct inode *inode, const char *name, 
void **buffer, bool alloc);
@@ -1591,7 +1592,8 @@ void security_inode_post_setxattr(struct
 int security_inode_getxattr(struct dentry *dentry, struct vfsmount *mnt,
char *name);
 int security_inode_listxattr(struct dentry *dentry, struct vfsmount *mnt);
-int security_inode_removexattr(struct dentry *dentry, char *name);
+int security_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt,
+  char *name);
 int security_inode_need_killpriv(struct dentry *dentry);
 int security_inode_killpriv(struct dentry *dentry);
 int security_inode_getsecurity(const struct inode *inode, const char *name, 
void **buffer, bool alloc);
@@ -2002,9 +2004,10 @@ static inline int security_inode_listxat
return 0;
 }
 
-static inline int security_inode_removexattr (struct dentry *dentry, char 
*name)
+static inline int security_inode_removexattr (struct dentry *dentry,
+ struct vfsmount *mnt, char *name)
 {
-   return cap_inode_removexattr(dentry, name);
+   return cap_inode_removexattr(dentry, mnt, name);
 }
 
 static inline int security_inode_need_killpriv(struct dentry *dentry)
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -393,7 +393,8 @@ int cap_inode_setxattr(struct dentry *de
return 0;
 }
 
-int cap_inode_removexattr(struct dentry *dentry, char *name)
+int cap_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt,
+ char *name)
 {
if (!strcmp(name, XATTR_NAME_CAPS)) {
if (!capable(CAP_SETFCAP))
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -404,7 +404,8 @@ static int dummy_inode_listxattr (struct
return 0;
 }
 
-static int dummy_inode_removexattr (struct dentry *dentry, char *name)
+static int dummy_inode_removexattr (struct dentry *dentry, struct vfsmount 
*mnt,
+   char *name)
 {
if (!strncmp(name, XATTR_SECURITY_PREFIX,
 sizeof(XATTR_SECURITY_PREFIX) - 1) &&
--- a/security/security.c
+++ b/security/security.c
@@ -493,11 +493,12 @@ int security_inode_listxattr(struct dent
return security_ops->inode_listxattr(dentry, mnt);
 }
 
-int security_inode_removexattr(struct dentry *dentry, char *name)
+int security_inode_removexattr(struct dentry *dentry, struct vfsmount *mnt,
+  char *name)
 {
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
-   return security_ops->inode_removexattr(dentry, name);
+   return security_ops->inode_removexattr(dentry, mnt, name);
 }
 
 int security_inode_need_killpriv(struct dentry *dentry)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2647,7 +2647,8 @@ static int 

[AppArmor 29/47] Fix __d_path() for lazy unmounts and make it unambiguous

2007-12-20 Thread John
First, when __d_path() hits a lazily unmounted mount point, it tries to prepend
the name of the lazily unmounted dentry to the path name.  It gets this wrong,
and also overwrites the slash that separates the name from the following
pathname component. This patch fixes that; if a process was in directory
/foo/bar and /foo got lazily unmounted, the old result was ``foobar'' (note the
missing slash), while the new result with this patch is ``foo/bar''.

Second, it isn't always possible to tell from the __d_path() result whether the
specified root and rootmnt (i.e., the chroot) was reached.  We need an
unambiguous result for AppArmor at least though, so we make sure that paths
will only start with a slash if the path leads all the way up to the root.

We also add a @fail_deleted argument, which allows us to get rid of some of the
mess in sys_getcwd().

This patch leaves getcwd() and d_path() as they were before for everything
except for bind-mounted directories; for them, it reports ``/foo/bar'' instead
of ``foobar'' in the example described above.
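
An illustrative userspace sketch of the two behaviours described above, not
the kernel code: the path is assembled from the end of the buffer toward the
front, and it begins with a slash only when the walk reaches the given root.
struct node and build_path() are hypothetical names introduced here; the real
__d_path() also deals with mount points, locking and deleted entries, all of
which this sketch leaves out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent pointer. */
struct node {
	const char *name;
	const struct node *parent;	/* NULL at the top of a detached tree */
};

/*
 * Build the path of @n relative to @root, filling @buf from the end.
 * Returns a pointer into @buf, or NULL if the buffer is too small.
 * The result starts with '/' only if the walk actually reached @root.
 */
static char *build_path(const struct node *n, const struct node *root,
			char *buf, int buflen)
{
	char *p;

	if (buflen < 2)
		return NULL;
	p = buf + --buflen;
	*p = '\0';

	while (n != root) {
		int namelen = (int)strlen(n->name);

		if (!n->parent) {
			/* Detached from @root: emit the name, no leading slash. */
			if (buflen < namelen)
				return NULL;
			p -= namelen;
			memcpy(p, n->name, namelen);
			return p;
		}
		if (buflen < namelen + 1)
			return NULL;
		buflen -= namelen + 1;
		p -= namelen;
		memcpy(p, n->name, namelen);
		*--p = '/';
		n = n->parent;
	}
	/* Reached @root with nothing prepended: the path is "/" itself. */
	if (*p != '/') {
		if (buflen < 1)
			return NULL;
		*--p = '/';
	}
	return p;
}
```

With a chain root <- foo <- bar this yields /foo/bar; if foo has no parent
link (standing in for a lazily unmounted tree), it yields foo/bar.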

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]
Acked-by: Alan Cox [EMAIL PROTECTED]

---
 fs/dcache.c |  166 ++--
 1 file changed, 96 insertions(+), 70 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1748,51 +1748,49 @@ shouldnt_be_hashed:
 }
 
 /**
- * d_path - return the path of a dentry
- * @dentry: dentry to report
- * @vfsmnt: vfsmnt to which the dentry belongs
- * @root: root dentry
- * @rootmnt: vfsmnt to which the root dentry belongs
+ * __d_path - return the path of a dentry
+ * @path: path to report
+ * @root: root path
  * @buffer: buffer to return value in
  * @buflen: buffer length
+ * @fail_deleted: what to return for deleted files
  *
- * Convert a dentry into an ASCII path name. If the entry has been deleted
+ * If @path is not connected to @root, the path returned will be relative
+ * (i.e., it will not start with a slash).
  * the string " (deleted)" is appended. Note that this is ambiguous.
  *
  * Returns the buffer or an error code if the path was too long.
  *
- * buflen should be positive. Caller holds the dcache_lock.
+ * Returns the buffer or an error code.
  */
-static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
-  struct path *root, char *buffer, int buflen)
+static char *__d_path(struct path *path, struct path *root,
+ char *buffer, int buflen, int fail_deleted)
 {
-   char * end = buffer+buflen;
-   char * retval;
-   int namelen;
+   int namelen, is_slash;
+   struct dentry *dentry = path->dentry;
+   struct vfsmount *vfsmnt = path->mnt;
+
+   if (buflen < 2)
+   return ERR_PTR(-ENAMETOOLONG);
+   buffer += --buflen;
+   *buffer = '\0';
 
-   *--end = '\0';
-   buflen--;
+   spin_lock(&dcache_lock);
if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
-   buflen -= 10;
-   end -= 10;
-   if (buflen < 0)
+   if (fail_deleted) {
+   buffer = ERR_PTR(-ENOENT);
+   goto out;
+   }
+   if (buflen < 10)
goto Elong;
-   memcpy(end, " (deleted)", 10);
+   buflen -= 10;
+   buffer -= 10;
+   memcpy(buffer, " (deleted)", 10);
}
-
-   if (buflen < 1)
-   goto Elong;
-   /* Get '/' right */
-   retval = end-1;
-   *retval = '/';
-
-   for (;;) {
+   while (dentry != root->dentry || vfsmnt != root->mnt) {
struct dentry * parent;
 
-   if (dentry == root->dentry && vfsmnt == root->mnt)
-   break;
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
-   /* Global root? */
spin_lock(&vfsmount_lock);
if (vfsmnt->mnt_parent == vfsmnt) {
spin_unlock(&vfsmount_lock);
@@ -1806,28 +1804,67 @@ static char *__d_path(struct dentry *den
parent = dentry->d_parent;
prefetch(parent);
namelen = dentry->d_name.len;
-   buflen -= namelen + 1;
-   if (buflen < 0)
+   if (buflen < namelen + 1)
goto Elong;
-   end -= namelen;
-   memcpy(end, dentry->d_name.name, namelen);
-   *--end = '/';
-   retval = end;
+   buflen -= namelen + 1;
+   buffer -= namelen;
+   memcpy(buffer, dentry->d_name.name, namelen);
+   *--buffer = '/';
dentry = parent;
}
+   /* Get '/' right. */
+   if (*buffer != '/')
+   *--buffer = '/';
 
-   return retval;
+out:
+   spin_unlock(&dcache_lock);
+   return buffer;
 
 global_root:
+   /*
+* We went 

[AppArmor 30/47] Make d_path() consistent across mount operations

2007-12-20 Thread John
The path that __d_path() computes can become slightly inconsistent when it
races with mount operations: it grabs the vfsmount_lock when traversing mount
points but immediately drops it again, only to re-grab it when it reaches the
next mount point.  The result is that the filename computed is not always
consistent, and the file may never have had that name. (This is unlikely, but
still possible.)

Fix this by grabbing the vfsmount_lock when the first mount point is reached,
and holding onto it until the d_cache lookup is completed.
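
The change in locking discipline can be sketched in userspace as follows. This
is only a model under stated assumptions: struct counting_lock, struct step
and walk_holding_lock() are hypothetical, and the "lock" merely counts
acquisitions so the behaviour can be observed. The point it illustrates is
that the lock is taken lazily at the first mount-point crossing and released
once, at the end of the walk, instead of being dropped and re-taken at every
crossing.

```c
#include <stddef.h>

/* Hypothetical lock that only counts acquisitions, to keep the sketch testable. */
struct counting_lock {
	int held;
	int acquisitions;
};

static void lock_acquire(struct counting_lock *l)
{
	l->held = 1;
	l->acquisitions++;
}

static void lock_release(struct counting_lock *l)
{
	l->held = 0;
}

/* Hypothetical path element; is_mount_root marks a mount-point crossing. */
struct step {
	int is_mount_root;
};

/*
 * Walk @n steps. The lock is taken at the first mount crossing and kept
 * until the walk is done. Returns the number of crossings seen.
 */
static int walk_holding_lock(const struct step *steps, size_t n,
			     struct counting_lock *l)
{
	int locked = 0, crossings = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!steps[i].is_mount_root)
			continue;
		if (!locked) {
			lock_acquire(l);
			locked = 1;
		}
		crossings++;
	}
	if (locked)
		lock_release(l);
	return crossings;
}
```

A walk with three crossings acquires the lock once; a walk with none never
touches it.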

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/dcache.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1766,7 +1766,7 @@ shouldnt_be_hashed:
 static char *__d_path(struct path *path, struct path *root,
  char *buffer, int buflen, int fail_deleted)
 {
-   int namelen, is_slash;
+int namelen, is_slash, vfsmount_locked = 0;
struct dentry *dentry = path->dentry;
struct vfsmount *vfsmnt = path->mnt;
 
@@ -1791,14 +1791,14 @@ static char *__d_path(struct path *path,
struct dentry * parent;
 
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
-   spin_lock(&vfsmount_lock);
-   if (vfsmnt->mnt_parent == vfsmnt) {
-   spin_unlock(&vfsmount_lock);
-   goto global_root;
+   if (!vfsmount_locked) {
+   spin_lock(&vfsmount_lock);
+   vfsmount_locked = 1;
}
+   if (vfsmnt->mnt_parent == vfsmnt)
+   goto global_root;
dentry = vfsmnt->mnt_mountpoint;
vfsmnt = vfsmnt->mnt_parent;
-   spin_unlock(&vfsmount_lock);
continue;
}
parent = dentry->d_parent;
@@ -1817,6 +1817,8 @@ static char *__d_path(struct path *path,
*--buffer = '/';
 
 out:
+   if (vfsmount_locked)
+   spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
return buffer;
 



[AppArmor 31/47] Add d_namespace_path() to compute namespace relative pathnames

2007-12-20 Thread John
In AppArmor, we are interested in pathnames relative to the namespace root.
This is the same as d_path() except for the root where the search ends. Add
a function for computing the namespace-relative path.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/dcache.c|4 ++--
 fs/namespace.c |   26 ++
 include/linux/dcache.h |1 +
 include/linux/mount.h  |3 +++
 4 files changed, 32 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1763,8 +1763,8 @@ shouldnt_be_hashed:
  *
  * Returns the buffer or an error code.
  */
-static char *__d_path(struct path *path, struct path *root,
- char *buffer, int buflen, int fail_deleted)
+char *__d_path(struct path *path, struct path *root, char *buffer, int buflen,
+  int fail_deleted)
 {
 int namelen, is_slash, vfsmount_locked = 0;
struct dentry *dentry = path->dentry;
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -26,6 +26,7 @@
 #include <linux/security.h>
 #include <linux/mount.h>
 #include <linux/ramfs.h>
+#include <linux/path.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -2167,3 +2168,28 @@ void __put_mnt_ns(struct mnt_namespace *
release_mounts(&umount_list);
kfree(ns);
 }
+
+char *d_namespace_path(struct path *path, char *buf, int buflen)
+{
+   struct path root = { NULL, NULL };
+   struct vfsmount *rootmnt;
+   char *res;
+
+   read_lock(&current->fs->lock);
+   rootmnt = mntget(current->fs->root.mnt);
+   read_unlock(&current->fs->lock);
+   spin_lock(&vfsmount_lock);
+   if (rootmnt->mnt_ns)
+   root.mnt = mntget(rootmnt->mnt_ns->root);
+   spin_unlock(&vfsmount_lock);
+   mntput(rootmnt);
+   if (root.mnt)
+   root.dentry = dget(root.mnt->mnt_root);
+   res = __d_path(path, &root, buf, buflen, 1);
+   path_put(&root);
+   /* Prevent empty path for lazily unmounted filesystems. */
+   if (!IS_ERR(res) && *res == '\0')
+   *--res = '.';
+   return res;
+}
+EXPORT_SYMBOL(d_namespace_path);
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -302,6 +302,7 @@ extern int d_validate(struct dentry *, s
 extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...);
 
 extern char *d_path(struct path *, char *, int);
+extern char *__d_path(struct path *, struct path *, char *, int, int);
 
 /* Allocation counts.. */
 
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -16,6 +16,7 @@
 #include <linux/list.h>
 #include <linux/nodemask.h>
 #include <linux/spinlock.h>
+#include <linux/path.h>
 #include <asm/atomic.h>
 
 struct super_block;
@@ -114,5 +115,7 @@ extern void shrink_submounts(struct vfsm
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
 
+extern char *d_namespace_path(struct path *, char *, int);
+
 #endif
 #endif /* _LINUX_MOUNT_H */



[AppArmor 32/47] From: Miklos Szeredi [EMAIL PROTECTED]

2007-12-20 Thread John
Add a new file operation: f_op->fgetattr(), that is invoked by
fstat().  Fall back to i_op->getattr() if it is not defined.

We need this because fstat() semantics can in some cases be better
implemented if the filesystem has the open file available.

Let's take the following example: we have a network filesystem, with
the server implemented as an unprivileged userspace process running on
a UNIX system (this is basically what sshfs does).

We want the filesystem to follow the familiar UNIX file semantics as
closely as possible.  If for example we have this sequence of events,
we still would like fstat to work correctly:

 1) file X is opened on client
 2) file X is renamed to Y on server
 3) fstat() is performed on open file descriptor on client

This is only possible if the filesystem server actually uses fstat()
on a file descriptor obtained when the file was opened.  Which means,
the filesystem client needs a way to get this information from the
VFS.

Even if we assume that the remote filesystem never changes, it is
difficult to implement open-unlink-fstat semantics correctly in the
client without having this information.
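
For reference, the open/rename/fstat sequence listed above can be reproduced
from userspace with plain POSIX calls. The sketch below runs against whatever
filesystem holds the given paths (a local one in the simplest case), so it
only demonstrates the semantics a network filesystem would like to match; it
does not exercise the new file operation. The function name and the paths
passed to it are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open @oldpath, rename it to @newpath, then stat it both ways.
 * Returns 0 if fstat() on the descriptor succeeds and stat() on the old
 * name fails, 1 on any other outcome, and -1 if setup failed.
 */
static int fstat_survives_rename(const char *oldpath, const char *newpath)
{
	struct stat by_fd, by_name;
	int fd, fd_ok, name_gone;

	fd = open(oldpath, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (rename(oldpath, newpath) != 0) {
		close(fd);
		unlink(oldpath);
		return -1;
	}

	/* The descriptor still identifies the file after the rename. */
	fd_ok = fstat(fd, &by_fd) == 0;
	/* The old name no longer resolves. */
	name_gone = stat(oldpath, &by_name) != 0;

	close(fd);
	unlink(newpath);
	return (fd_ok && name_gone) ? 0 : 1;
}
```

A filesystem that can only look attributes up by name would have to fail or
guess in the fstat() step, which is the case the patch is meant to address.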

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---

---
 fs/fuse/file.c |   13 +
 fs/stat.c  |   29 -
 include/linux/fs.h |1 +
 3 files changed, 42 insertions(+), 1 deletion(-)

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -871,6 +871,17 @@ static int fuse_file_flock(struct file *
return err;
 }
 
+static int fuse_file_fgetattr(struct file *file, struct kstat *stat)
+{
+   struct inode *inode = file->f_dentry->d_inode;
+   struct fuse_conn *fc = get_fuse_conn(inode);
+
+   if (!fuse_allow_task(fc, current))
+   return -EACCES;
+
+   return fuse_update_attributes(inode, stat, file, NULL);
+}
+
 static sector_t fuse_bmap(struct address_space *mapping, sector_t block)
 {
struct inode *inode = mapping->host;
@@ -920,6 +931,7 @@ static const struct file_operations fuse
.fsync  = fuse_fsync,
.lock   = fuse_file_lock,
.flock  = fuse_file_flock,
+   .fgetattr   = fuse_file_fgetattr,
.splice_read= generic_file_splice_read,
 };
 
@@ -933,6 +945,7 @@ static const struct file_operations fuse
.fsync  = fuse_fsync,
.lock   = fuse_file_lock,
.flock  = fuse_file_flock,
+   .fgetattr   = fuse_file_fgetattr,
/* no mmap and splice_read */
 };
 
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -55,6 +55,33 @@ int vfs_getattr(struct vfsmount *mnt, st
 
 EXPORT_SYMBOL(vfs_getattr);
 
+/*
+ * Perform getattr on an open file
+ *
+ * Fall back to i_op->getattr (or generic_fillattr) if the filesystem
+ * doesn't define an f_op->fgetattr operation.
+ */
+static int vfs_fgetattr(struct file *file, struct kstat *stat)
+{
+   struct vfsmount *mnt = file->f_path.mnt;
+   struct dentry *dentry = file->f_path.dentry;
+   struct inode *inode = dentry->d_inode;
+   int retval;
+
+   retval = security_inode_getattr(mnt, dentry);
+   if (retval)
+   return retval;
+
+   if (file->f_op && file->f_op->fgetattr) {
+   return file->f_op->fgetattr(file, stat);
+   } else if (inode->i_op->getattr) {
+   return inode->i_op->getattr(mnt, dentry, stat);
+   } else {
+   generic_fillattr(inode, stat);
+   return 0;
+   }
+}
+
 int vfs_stat_fd(int dfd, char __user *name, struct kstat *stat)
 {
struct nameidata nd;
@@ -101,7 +128,7 @@ int vfs_fstat(unsigned int fd, struct ks
int error = -EBADF;
 
if (f) {
-   error = vfs_getattr(f->f_path.mnt, f->f_path.dentry, stat);
+   error = vfs_fgetattr(f, stat);
fput(f);
}
return error;
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1190,6 +1190,7 @@ struct file_operations {
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t 
*, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info 
*, size_t, unsigned int);
int (*setlease)(struct file *, long, struct file_lock **);
+   int (*fgetattr)(struct file *, struct kstat *);
 };
 
 struct inode_operations {



[AppArmor 33/47] VFS: new fsetattr() file operation

2007-12-20 Thread John
From: Miklos Szeredi [EMAIL PROTECTED]

Add a new file operation: f_op->fsetattr(), that is invoked by
ftruncate, fchmod, fchown and utimensat.  Fall back to i_op->setattr()
if it is not defined.

For the reasons why we need this, see patch adding fgetattr().

ftruncate() already passed the open file to the filesystem via the
ia_file member of struct iattr.  However it is cleaner to have a
separate file operation for this, so remove ia_file and convert
existing users: fuse, AFS, and reiserfs4.  Use ATTR_FILE to allow
LSMs to distinguish file descriptor based operations.
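
The fallback rule stated above (use the file-based hook when one is defined,
otherwise the inode-based one) can be sketched in plain C with two tables of
function pointers. Everything in this block is a hypothetical model: the
structures are loosely shaped after f_op and i_op but are not the kernel
definitions, and the handlers only record which path was taken.

```c
#include <stddef.h>

struct attr_change { int new_mode; };	/* hypothetical stand-in for struct iattr */
struct obj;

/* Hypothetical operation tables, loosely modelled on f_op and i_op. */
struct file_ops  { int (*fsetattr)(struct obj *o, const struct attr_change *a); };
struct inode_ops { int (*setattr)(struct obj *o, const struct attr_change *a); };

struct obj {
	const struct file_ops *f_op;	/* NULL when no open file is available */
	const struct inode_ops *i_op;
	int mode;
	int via_file;			/* records which hook ran */
};

static int demo_fsetattr(struct obj *o, const struct attr_change *a)
{
	o->mode = a->new_mode;
	o->via_file = 1;
	return 0;
}

static int demo_setattr(struct obj *o, const struct attr_change *a)
{
	o->mode = a->new_mode;
	o->via_file = 0;
	return 0;
}

/* Prefer the file-based hook; fall back to the inode-based one. */
static int change_attr(struct obj *o, const struct attr_change *a)
{
	if (o->f_op && o->f_op->fsetattr)
		return o->f_op->fsetattr(o, a);
	if (o->i_op && o->i_op->setattr)
		return o->i_op->setattr(o, a);
	return -1;	/* neither hook is defined in this sketch */
}
```

An object whose file table lacks fsetattr, or that has no file table at all,
ends up in demo_setattr(); only an object with a populated file table reaches
demo_fsetattr().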

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/afs/dir.c |1 +
 fs/afs/file.c|1 +
 fs/afs/inode.c   |   19 +++
 fs/afs/internal.h|1 +
 fs/attr.c|   18 ++
 fs/fuse/dir.c|   20 +---
 fs/fuse/file.c   |7 +++
 fs/fuse/fuse_i.h |4 
 fs/open.c|   25 +
 fs/reiser4/plugin/file/cryptcompress.c   |   12 +++-
 fs/reiser4/plugin/file/file.c|   27 ---
 fs/reiser4/plugin/file/file.h|3 +++
 fs/reiser4/plugin/file/file_conversion.c |7 +++
 fs/reiser4/plugin/object.c   |5 -
 fs/reiser4/plugin/plugin.h   |1 +
 fs/utimes.c  |2 +-
 include/linux/fs.h   |9 ++---
 17 files changed, 114 insertions(+), 48 deletions(-)

--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -45,6 +45,7 @@ const struct file_operations afs_dir_fil
.release= afs_release,
.readdir= afs_readdir,
.lock   = afs_lock,
+   .fsetattr   = afs_fsetattr,
 };
 
 const struct inode_operations afs_dir_inode_operations = {
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -36,6 +36,7 @@ const struct file_operations afs_file_op
.fsync  = afs_fsync,
.lock   = afs_lock,
.flock  = afs_flock,
+   .fsetattr   = afs_fsetattr,
 };
 
 const struct inode_operations afs_file_inode_operations = {
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -358,7 +358,8 @@ void afs_clear_inode(struct inode *inode
 /*
  * set the attributes of an inode
  */
-int afs_setattr(struct dentry *dentry, struct iattr *attr)
+static int afs_do_setattr(struct dentry *dentry, struct iattr *attr,
+  struct file *file)
 {
struct afs_vnode *vnode = AFS_FS_I(dentry->d_inode);
struct key *key;
@@ -380,8 +381,8 @@ int afs_setattr(struct dentry *dentry, s
afs_writeback_all(vnode);
}
 
-   if (attr->ia_valid & ATTR_FILE) {
-   key = attr->ia_file->private_data;
+   if (file) {
+   key = file->private_data;
} else {
key = afs_request_key(vnode->volume->cell);
if (IS_ERR(key)) {
@@ -391,10 +392,20 @@ int afs_setattr(struct dentry *dentry, s
}
 
ret = afs_vnode_setattr(vnode, key, attr);
-   if (!(attr->ia_valid & ATTR_FILE))
+   if (!file)
key_put(key);
 
 error:
_leave(" = %d", ret);
return ret;
 }
+
+int afs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+   return afs_do_setattr(dentry, attr, NULL);
+}
+
+int afs_fsetattr(struct file *file, struct iattr *attr)
+{
+   return afs_do_setattr(file->f_path.dentry, attr, file);
+}
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -550,6 +550,7 @@ extern void afs_zap_data(struct afs_vnod
 extern int afs_validate(struct afs_vnode *, struct key *);
 extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
 extern int afs_setattr(struct dentry *, struct iattr *);
+extern int afs_fsetattr(struct file *, struct iattr *);
 extern void afs_clear_inode(struct inode *);
 
 /*
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -100,8 +100,8 @@ int inode_setattr(struct inode * inode, 
 }
 EXPORT_SYMBOL(inode_setattr);
 
-int notify_change(struct dentry *dentry, struct vfsmount *mnt,
- struct iattr *attr)
+int fnotify_change(struct dentry *dentry, struct vfsmount *mnt,
+  struct iattr *attr, struct file *file)
 {
struct inode *inode = dentry->d_inode;
mode_t mode = inode->i_mode;
@@ -160,8 +160,12 @@ int notify_change(struct dentry *dentry,
 
if (inode->i_op && inode->i_op->setattr) {
error = security_inode_setattr(dentry, mnt, attr);
-   if (!error)
-   error = inode->i_op->setattr(dentry, attr);
+   if (!error) {
+   if (file && file->f_op && file->f_op->fsetattr)
+   error = file->f_op->fsetattr(file, attr);
+   else
+ 

[AppArmor 34/47] Pass struct file down the inode_*xattr security LSM hooks

2007-12-20 Thread John
This allows LSMs to also distinguish between file descriptor and path
access for the xattr operations. (The other relevant operations are
covered by the setattr hook.)

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/nfsd/vfs.c|   12 +
 fs/unionfs/copyup.c  |   12 +
 fs/unionfs/xattr.c   |9 +++
 fs/xattr.c   |   60 +--
 include/linux/security.h |   40 ++-
 include/linux/xattr.h|   10 ---
 security/commoncap.c |4 +--
 security/dummy.c |   10 ---
 security/security.c  |   21 +---
 security/selinux/hooks.c |   10 ---
 10 files changed, 107 insertions(+), 81 deletions(-)

--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -413,7 +413,7 @@ static ssize_t nfsd_getxattr(struct dent
 {
ssize_t buflen;
 
-   buflen = vfs_getxattr(dentry, mnt, key, NULL, 0);
+   buflen = vfs_getxattr(dentry, mnt, key, NULL, 0, NULL);
if (buflen <= 0)
return buflen;
 
@@ -421,7 +421,7 @@ static ssize_t nfsd_getxattr(struct dent
if (!*buf)
return -ENOMEM;
 
-   return vfs_getxattr(dentry, mnt, key, *buf, buflen);
+   return vfs_getxattr(dentry, mnt, key, *buf, buflen, NULL);
 }
 #endif
 
@@ -447,7 +447,7 @@ set_nfsv4_acl_one(struct dentry *dentry,
goto out;
}
 
-   error = vfs_setxattr(dentry, mnt, key, buf, len, 0);
+   error = vfs_setxattr(dentry, mnt, key, buf, len, 0, NULL);
 out:
kfree(buf);
return error;
@@ -2067,12 +2067,14 @@ nfsd_set_posix_acl(struct svc_fh *fhp, i
 
mnt = fhp->fh_export->ex_path.mnt;
if (size)
-   error = vfs_setxattr(fhp->fh_dentry, mnt, name, value, size, 0);
+   error = vfs_setxattr(fhp->fh_dentry, mnt, name, value, size, 0,
+NULL);
else {
if (!S_ISDIR(inode->i_mode) && type == ACL_TYPE_DEFAULT)
error = 0;
else {
-   error = vfs_removexattr(fhp->fh_dentry, mnt, name);
+   error = vfs_removexattr(fhp->fh_dentry, mnt, name,
+   NULL);
if (error == -ENODATA)
error = 0;
}
--- a/fs/unionfs/copyup.c
+++ b/fs/unionfs/copyup.c
@@ -37,7 +37,8 @@ static int copyup_xattrs(struct dentry *
char *name_list_buf = NULL;
 
/* query the actual size of the xattr list */
-   list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, NULL, 0);
+   list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, NULL, 0,
+ NULL);
if (list_size <= 0) {
err = list_size;
goto out;
@@ -54,7 +55,7 @@ static int copyup_xattrs(struct dentry *
 
/* now get the actual xattr list of the source file */
list_size = vfs_listxattr(old_lower_dentry, old_lower_mnt, name_list,
- list_size);
+ list_size, NULL);
if (list_size <= 0) {
err = list_size;
goto out;
@@ -74,7 +75,7 @@ static int copyup_xattrs(struct dentry *
/* Lock here since vfs_getxattr doesn't lock for us */
mutex_lock(&old_lower_dentry->d_inode->i_mutex);
size = vfs_getxattr(old_lower_dentry, old_lower_mnt, name_list,
-   attr_value, XATTR_SIZE_MAX);
+   attr_value, XATTR_SIZE_MAX, NULL);
mutex_unlock(&old_lower_dentry->d_inode->i_mutex);
if (size < 0) {
err = size;
@@ -86,7 +87,7 @@ static int copyup_xattrs(struct dentry *
}
/* Don't lock here since vfs_setxattr does it for us. */
err = vfs_setxattr(new_lower_dentry, new_lower_mnt, name_list,
-  attr_value, size, 0);
+  attr_value, size, 0, NULL);
/*
 * Selinux depends on security.* xattrs, so to maintain
 * the security of copied-up files, if Selinux is active,
@@ -97,7 +98,8 @@ static int copyup_xattrs(struct dentry *
if (err == -EPERM && !capable(CAP_FOWNER)) {
cap_raise(current->cap_effective, CAP_FOWNER);
err = vfs_setxattr(new_lower_dentry, new_lower_mnt,
-  name_list, attr_value, size, 0);
+  name_list, attr_value, size, 0,
+  NULL);
cap_lower(current->cap_effective, CAP_FOWNER);
}
if (err < 0)
--- a/fs/unionfs/xattr.c
+++ b/fs/unionfs/xattr.c
@@ 

[AppArmor 35/47] Factor out sysctl pathname code

2007-12-20 Thread John
Convert the selinux sysctl pathname computation code into a standalone
function.
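
To make the result of the factored-out function concrete, here is a userspace
rendering of the same walk over a parent chain. struct table and
table_pathname() are hypothetical stand-ins introduced for this sketch (only
a name and a parent pointer are modelled); the example names "kernel" and
"hostname" are likewise chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ctl_table: only the fields the walk needs. */
struct table {
	const char *procname;
	const struct table *parent;
};

/*
 * Prepend each procname and a '/', leaf first, then prepend the fixed
 * "/sys" prefix. Returns a pointer into @buffer, or NULL if the result
 * does not fit.
 */
static char *table_pathname(const struct table *table, char *buffer, int buflen)
{
	if (buflen < 1)
		return NULL;
	buffer += --buflen;
	*buffer = '\0';

	while (table) {
		int namelen = (int)strlen(table->procname);

		if (buflen < namelen + 1)
			return NULL;
		buflen -= namelen + 1;
		buffer -= namelen;
		memcpy(buffer, table->procname, namelen);
		*--buffer = '/';
		table = table->parent;
	}
	if (buflen < 4)
		return NULL;
	buffer -= 4;
	memcpy(buffer, "/sys", 4);
	return buffer;
}
```

For a leaf "hostname" whose parent is "kernel", the sketch produces
/sys/kernel/hostname, and it returns NULL when the buffer is one byte too
short for that string plus its terminator.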

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]
Reviewed-by: James Morris [EMAIL PROTECTED]

---
 include/linux/sysctl.h   |2 ++
 kernel/sysctl.c  |   27 +++
 security/selinux/hooks.c |   35 +--
 3 files changed, 34 insertions(+), 30 deletions(-)

--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -977,6 +977,8 @@ extern int proc_doulongvec_minmax(struct
 extern int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int,
  struct file *, void __user *, size_t *, 
loff_t *);
 
+extern char *sysctl_pathname(ctl_table *, char *, int);
+
 extern int do_sysctl (int __user *name, int nlen,
  void __user *oldval, size_t __user *oldlenp,
  void __user *newval, size_t newlen);
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1443,6 +1443,33 @@ void register_sysctl_root(struct ctl_tab
spin_unlock(&sysctl_lock);
 }
 
+char *sysctl_pathname(ctl_table *table, char *buffer, int buflen)
+{
+   if (buflen < 1)
+   return NULL;
+   buffer += --buflen;
+   *buffer = '\0';
+
+   while (table) {
+   int namelen = strlen(table->procname);
+
+   if (buflen < namelen + 1)
+   return NULL;
+   buflen -= namelen + 1;
+   buffer -= namelen;
+   memcpy(buffer, table->procname, namelen);
+   *--buffer = '/';
+   table = table->parent;
+   }
+   if (buflen < 4)
+   return NULL;
+   buffer -= 4;
+   memcpy(buffer, "/sys", 4);
+
+   return buffer;
+}
+EXPORT_SYMBOL(sysctl_pathname);
+
 #ifdef CONFIG_SYSCTL_SYSCALL
 int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user 
*oldlenp,
   void __user *newval, size_t newlen)
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1659,40 +1659,15 @@ static int selinux_capable(struct task_s
 
 static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
 {
-   int buflen, rc;
-   char *buffer, *path, *end;
+   char *buffer, *path;
+   int rc = -ENOMEM;
 
-   rc = -ENOMEM;
buffer = (char*)__get_free_page(GFP_KERNEL);
if (!buffer)
goto out;
-
-   buflen = PAGE_SIZE;
-   end = buffer+buflen;
-   *--end = '\0';
-   buflen--;
-   path = end-1;
-   *path = '/';
-   while (table) {
-   const char *name = table->procname;
-   size_t namelen = strlen(name);
-   buflen -= namelen + 1;
-   if (buflen < 0)
-   goto out_free;
-   end -= namelen;
-   memcpy(end, name, namelen);
-   *--end = '/';
-   path = end;
-   table = table->parent;
-   }
-   buflen -= 4;
-   if (buflen < 0)
-   goto out_free;
-   end -= 4;
-   memcpy(end, "/sys", 4);
-   path = end;
-   rc = security_genfs_sid("proc", path, tclass, sid);
-out_free:
+   path = sysctl_pathname(table, buffer, PAGE_SIZE);
+   if (path)
+   rc = security_genfs_sid("proc", path, tclass, sid);
free_page((unsigned long)buffer);
 out:
return rc;



[AppArmor 36/47] Allow permission functions to tell between parent and leaf checks

2007-12-20 Thread John
Set the LOOKUP_CONTINUE flag when checking parent permissions. This allows
permission functions to tell between parent and leaf checks.
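
A small model of the idea, under stated assumptions: the flag value, struct
demo_nameidata and both helpers below are hypothetical and exist only for this
sketch. The caller sets a flag bit before the parent-directory check, and the
permission hook reads that bit to decide which kind of check it is handling.

```c
#include <stddef.h>

/* Hypothetical flag value and structure, for illustration only. */
#define DEMO_LOOKUP_CONTINUE 0x4

struct demo_nameidata {
	unsigned int flags;
};

enum check_kind { CHECK_LEAF, CHECK_PARENT };

/* What a permission hook could do: classify the check from the flag. */
static enum check_kind classify(const struct demo_nameidata *nd)
{
	if (nd && (nd->flags & DEMO_LOOKUP_CONTINUE))
		return CHECK_PARENT;
	return CHECK_LEAF;
}

/* What the caller does before checking the parent directory. */
static enum check_kind check_parent_dir(struct demo_nameidata *nd)
{
	if (nd)
		nd->flags |= DEMO_LOOKUP_CONTINUE;
	return classify(nd);
}
```

Note that when the caller has no nameidata to pass (NULL here), the hook
cannot see the flag and falls back to treating the check as a leaf check.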

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |2 ++
 1 file changed, 2 insertions(+)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1511,6 +1511,8 @@ static inline int may_create(struct inod
return -EEXIST;
if (IS_DEADDIR(dir))
return -ENOENT;
+   if (nd)
+   nd->flags |= LOOKUP_CONTINUE;
return permission(dir,MAY_WRITE | MAY_EXEC, nd);
 }
 

-- 



[AppArmor 38/47] Switch to vfs_permission() in sys_fchdir()

2007-12-20 Thread John
Switch from file_permission() to vfs_permission() in sys_fchdir(): this
avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/open.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/fs/open.c
+++ b/fs/open.c
@@ -512,8 +512,8 @@ out:
 
 asmlinkage long sys_fchdir(unsigned int fd)
 {
+   struct nameidata nd;
struct file *file;
-   struct inode *inode;
int error;
 
error = -EBADF;
@@ -521,15 +521,16 @@ asmlinkage long sys_fchdir(unsigned int 
if (!file)
goto out;
 
-   inode = file->f_path.dentry->d_inode;
+   nd.path = file->f_path;
+   nd.flags = 0;
 
error = -ENOTDIR;
-   if (!S_ISDIR(inode->i_mode))
+   if (!S_ISDIR(nd.path.dentry->d_inode->i_mode))
goto out_putf;
 
-   error = file_permission(file, MAY_EXEC);
+   error = vfs_permission(&nd, MAY_EXEC);
if (!error)
-   set_fs_pwd(current->fs, &file->f_path);
+   set_fs_pwd(current->fs, &nd.path);
 out_putf:
fput(file);
 out:

-- 



[AppArmor 39/47] Fix file_permission()

2007-12-20 Thread John
We cannot easily switch from file_permission() to vfs_permission()
everywhere, so fix file_permission() to not use a NULL nameidata
for the remaining users.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -313,7 +313,13 @@ int vfs_permission(struct nameidata *nd,
  */
 int file_permission(struct file *file, int mask)
 {
-   return permission(file->f_path.dentry->d_inode, mask, NULL);
+   struct nameidata nd;
+
+   nd.path.dentry = file->f_path.dentry;
+   nd.path.mnt = file->f_path.mnt;
+   nd.flags = LOOKUP_ACCESS;
+
+   return permission(nd.path.dentry->d_inode, mask, &nd);
 }
 
 /*

-- 



[AppArmor 40/47] Export audit subsystem for use by modules

2007-12-20 Thread John
Update kernel audit range comments to show AppArmor's registered range of
1500-1599.  This range used to be reserved for LSPP but LSPP uses the
SE Linux range and the range was given to AppArmor.
Adds necessary export symbols for audit subsystem routines.
Changes audit_log_vformat to be externally visible (analogous to vprintf).
Patch is not in mainline -- pending AppArmor code submission to lkml.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 include/linux/audit.h |   12 +++-
 kernel/audit.c|6 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -33,7 +33,7 @@
  * 1200 - 1299 messages internal to the audit daemon
  * 1300 - 1399 audit event messages
  * 1400 - 1499 SE Linux use
- * 1500 - 1599 kernel LSPP events
+ * 1500 - 1599 AppArmor use
  * 1600 - 1699 kernel crypto events
  * 1700 - 1799 kernel anomaly records
  * 1800 - 1999 future kernel use (maybe integrity labels and related events)
@@ -118,6 +118,13 @@
 #define AUDIT_MAC_UNLBL_STCADD 1416 /* NetLabel: add a static label */
 #define AUDIT_MAC_UNLBL_STCDEL 1417 /* NetLabel: del a static label */
 
+#define AUDIT_APPARMOR_AUDIT   1501 /* AppArmor audited grants */
+#define AUDIT_APPARMOR_ALLOWED 1502 /* Allowed Access for learning */
+#define AUDIT_APPARMOR_DENIED  1503
+#define AUDIT_APPARMOR_HINT    1504 /* Process Tracking information */
+#define AUDIT_APPARMOR_STATUS  1505 /* Changes in config */
+#define AUDIT_APPARMOR_ERROR   1506 /* Internal AppArmor Errors */
+
 #define AUDIT_FIRST_KERN_ANOM_MSG   1700
 #define AUDIT_LAST_KERN_ANOM_MSG    1799
 #define AUDIT_ANOM_PROMISCUOUS  1700 /* Device changed promiscuous mode */
@@ -515,6 +522,9 @@ extern void audit_log(struct audit_
  __attribute__((format(printf,4,5)));
 
 extern struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type);
+extern void audit_log_vformat(struct audit_buffer *ab,
+ const char *fmt, va_list args)
+   __attribute__((format(printf,2,0)));
 extern void audit_log_format(struct audit_buffer *ab,
  const char *fmt, ...)
__attribute__((format(printf,2,3)));
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1215,8 +1215,7 @@ static inline int audit_expand(struct au
  * will be called a second time.  Currently, we assume that a printk
  * can't format message larger than 1024 bytes, so we don't either.
  */
-static void audit_log_vformat(struct audit_buffer *ab, const char *fmt,
- va_list args)
+void audit_log_vformat(struct audit_buffer *ab, const char *fmt, va_list args)
 {
int len, avail;
struct sk_buff *skb;
@@ -1472,3 +1471,6 @@ EXPORT_SYMBOL(audit_log_start);
 EXPORT_SYMBOL(audit_log_end);
 EXPORT_SYMBOL(audit_log_format);
 EXPORT_SYMBOL(audit_log);
+EXPORT_SYMBOL_GPL(audit_log_vformat);
+EXPORT_SYMBOL_GPL(audit_log_untrustedstring);
+EXPORT_SYMBOL_GPL(audit_log_d_path);
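
The reason for exporting a `va_list` variant, as the description says, is the same as for vprintf(): a caller that is itself variadic cannot forward its arguments to another `...` function, only to one that takes a `va_list`. The userspace sketch below illustrates the pattern under stated assumptions. `struct log_buf`, `log_vformat()`, `log_format()` and `module_log()` are hypothetical names, and vsnprintf() stands in for the kernel's formatting into an skb.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical fixed-size buffer standing in for struct audit_buffer. */
struct log_buf {
	char data[128];
	size_t len;
};

/* va_list variant: callable from any variadic wrapper, like vprintf(). */
static void log_vformat(struct log_buf *lb, const char *fmt, va_list args)
{
	size_t avail = sizeof(lb->data) - lb->len;
	int n = vsnprintf(lb->data + lb->len, avail, fmt, args);

	if (n < 0)
		return;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	lb->len += ((size_t)n < avail) ? (size_t)n : avail - 1;
}

/* Variadic front end, analogous in role to audit_log_format(). */
static void log_format(struct log_buf *lb, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	log_vformat(lb, fmt, args);
	va_end(args);
}

/* A module-side wrapper that adds its own prefix before forwarding. */
static void module_log(struct log_buf *lb, const char *fmt, ...)
{
	va_list args;

	log_format(lb, "type=%d ", 1503);
	va_start(args, fmt);
	log_vformat(lb, fmt, args);
	va_end(args);
}
```

Without access to the `va_list` entry point, `module_log()` would have to format into a temporary buffer first and pass the result on as a `%s` argument.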

-- 



[AppArmor 41/47] AppArmor: Main Part

2007-12-20 Thread John
The underlying functions by which the AppArmor LSM hooks are implemented.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/main.c | 1357 +++
 1 file changed, 1357 insertions(+)

--- /dev/null
+++ b/security/apparmor/main.c
@@ -0,0 +1,1357 @@
+/*
+ * Copyright (C) 2002-2007 Novell/SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * AppArmor Core
+ */
+
+#include <linux/security.h>
+#include <linux/namei.h>
+#include <linux/audit.h>
+#include <linux/mount.h>
+#include <linux/ptrace.h>
+
+#include "apparmor.h"
+
+#include "inline.h"
+
+/*
+ * Table of capability names: we generate it from capabilities.h.
+ */
+static const char *capability_names[] = {
+#include "capability_names.h"
+};
+
+struct aa_namespace *default_namespace;
+
+static int aa_inode_mode(struct inode *inode)
+{
+   /* if the inode doesn't exist the user is creating it */
+   if (!inode || current->fsuid == inode->i_uid)
+   return AA_USER_SHIFT;
+   return AA_OTHER_SHIFT;
+}
+
+/**
+ * aa_file_denied - check for @mask access on a file
+ * @profile: profile to check against
+ * @name: pathname of file
+ * @mask: permission mask requested for file
+ *
+ * Return %0 on success, or else the permissions in @mask that the
+ * profile denies.
+ */
+static int aa_file_denied(struct aa_profile *profile, const char *name,
+ int mask)
+{
+   return (mask & ~aa_match(profile->file_rules, name));
+}
+
+/**
+ * aa_link_denied - check for permission to link a file
+ * @profile: profile to check against
+ * @link: pathname of link being created
+ * @target: pathname of target to be linked to
+ * @target_mode: UGO shift for target inode
+ * @request_mask: the permissions subset valid only if link succeeds
+ * Return %0 on success, or else the permissions that the profile denies.
+ */
+static int aa_link_denied(struct aa_profile *profile, const char *link,
+ const char *target, int target_mode,
+ int *request_mask)
+{
+   unsigned int state;
+   int l_mode, t_mode, denied_mask = 0;
+   int link_mask = AA_MAY_LINK << target_mode;
+
+   *request_mask = link_mask;
+
+   l_mode = aa_match_state(profile->file_rules, DFA_START, link, &state);
+   if (l_mode & link_mask) {
+   int mode;
+   /* test to see if target can be paired with link */
+   state = aa_dfa_null_transition(profile->file_rules, state);
+   mode = aa_match_state(profile->file_rules, state, target,
+ NULL);
+
+   if (!(mode & link_mask))
+   denied_mask |= link_mask;
+   if (!(mode & (AA_LINK_SUBSET_TEST << target_mode)))
+   return denied_mask;
+   }
+
+   /* do link perm subset test */
+   t_mode = aa_match(profile->file_rules, target);
+
+   /* Ignore valid-profile-transition flags. */
+   l_mode &= ~AA_SHARED_PERMS;
+   t_mode &= ~AA_SHARED_PERMS;
+
+   *request_mask = l_mode | link_mask;
+
+   /* Link always requires 'l' on the link for both parts of the pair.
+* If a subset test is required a permission subset test of the
+* perms for the link are done against the user:group:other of the
+* target's 'r', 'w', 'x', 'a', 'z', and 'm' permissions.
+*
+* If the link has 'x', an exact match of all the execute flags
+* ('i', 'u', 'p').  safe exec is treated as a subset of unsafe exec
+*/
+#define SUBSET_PERMS (AA_FILE_PERMS & ~AA_LINK_BITS)
+   denied_mask |= ~l_mode & link_mask;
+   if (l_mode & SUBSET_PERMS) {
+   denied_mask |= (l_mode & SUBSET_PERMS) & ~t_mode;
+   if (denied_mask & AA_EXEC_BITS)
+   denied_mask |= l_mode & AA_ALL_EXEC_MODS;
+   else if (l_mode & AA_EXEC_BITS) {
+   if (!(l_mode & AA_USER_EXEC_UNSAFE))
+   l_mode |= t_mode & AA_USER_EXEC_UNSAFE;
+   if (l_mode & AA_USER_EXEC &&
+   (l_mode & AA_USER_EXEC_MODS) !=
+   (t_mode & AA_USER_EXEC_MODS))
+   denied_mask |= AA_USER_EXEC |
+   (l_mode & AA_USER_EXEC_MODS);
+   if (!(l_mode & AA_OTHER_EXEC_UNSAFE))
+   l_mode |= t_mode & AA_OTHER_EXEC_UNSAFE;
+   if (l_mode & AA_OTHER_EXEC &&
+   (l_mode & AA_OTHER_EXEC_MODS) !=
+   (t_mode & AA_OTHER_EXEC_MODS))
+   denied_mask |= AA_OTHER_EXEC |
+   (l_mode  
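
Both aa_file_denied() and the link subset test in this (truncated) excerpt reduce to the same bit operation: the denied set is whatever was requested and is not in the allowed set. A small sketch of that idea follows; `denied_bits()` and `is_perm_subset()` are hypothetical helper names, and the sketch deliberately ignores the exec-modifier handling shown in the patch.

```c
/* Bits requested but not allowed; zero means the request is permitted. */
static unsigned int denied_bits(unsigned int requested, unsigned int allowed)
{
	return requested & ~allowed;
}

/* A link passes the subset test when it grants nothing the target lacks. */
static int is_perm_subset(unsigned int link_perms, unsigned int target_perms)
{
	return denied_bits(link_perms, target_perms) == 0;
}
```

Returning the denied bits rather than a plain yes/no lets the caller record exactly which permissions were refused.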

[AppArmor 42/47] AppArmor: Module and LSM hooks

2007-12-20 Thread John
Module parameters, LSM hooks, initialization and teardown.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/lsm.c |  815 
 1 file changed, 815 insertions(+)

--- /dev/null
+++ b/security/apparmor/lsm.c
@@ -0,0 +1,815 @@
+/*
+ * Copyright (C) 1998-2007 Novell/SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * AppArmor LSM interface
+ */
+
+#include <linux/security.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/ctype.h>
+#include <linux/sysctl.h>
+#include <linux/audit.h>
+
+#include "apparmor.h"
+#include "inline.h"
+
+/* Boot time disable flag */
+int apparmor_enabled = CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE;
+
+static int __init apparmor_enabled_setup(char *str)
+{
+   apparmor_enabled = simple_strtol(str, NULL, 0);
+   return 1;
+}
+__setup("apparmor=", apparmor_enabled_setup);
+
+
+static int param_set_aabool(const char *val, struct kernel_param *kp);
+static int param_get_aabool(char *buffer, struct kernel_param *kp);
+#define param_check_aabool(name, p) __param_check(name, p, int)
+
+static int param_set_aauint(const char *val, struct kernel_param *kp);
+static int param_get_aauint(char *buffer, struct kernel_param *kp);
+#define param_check_aauint(name, p) __param_check(name, p, int)
+
+/* Flag values, also controllable via /sys/module/apparmor/parameters
+ * We define special types as we want to do additional mediation.
+ *
+ * Complain mode -- in complain mode access failures result in auditing only
+ * and task is allowed access.  audit events are processed by userspace to
+ * generate policy.  Default is 'enforce' (0).
+ * Value is also togglable per profile and referenced when global value is
+ * enforce.
+ */
+int apparmor_complain = 0;
+module_param_named(complain, apparmor_complain, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_complain, "Toggle AppArmor complain mode");
+
+/* Debug mode */
+int apparmor_debug = 0;
+module_param_named(debug, apparmor_debug, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_debug, "Toggle AppArmor debug mode");
+
+/* Audit mode */
+int apparmor_audit = 0;
+module_param_named(audit, apparmor_audit, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_audit, "Toggle AppArmor audit mode");
+
+/* Syscall logging mode */
+int apparmor_logsyscall = 0;
+module_param_named(logsyscall, apparmor_logsyscall, aabool, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_logsyscall, "Toggle AppArmor logsyscall mode");
+
+/* Maximum pathname length before accesses will start getting rejected */
+unsigned int apparmor_path_max = 2 * PATH_MAX;
+module_param_named(path_max, apparmor_path_max, aauint, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(apparmor_path_max, "Maximum pathname length allowed");
+
+static int param_set_aabool(const char *val, struct kernel_param *kp)
+{
+   if (aa_task_context(current))
+   return -EPERM;
+   return param_set_bool(val, kp);
+}
+
+static int param_get_aabool(char *buffer, struct kernel_param *kp)
+{
+   if (aa_task_context(current))
+   return -EPERM;
+   return param_get_bool(buffer, kp);
+}
+
+static int param_set_aauint(const char *val, struct kernel_param *kp)
+{
+   if (aa_task_context(current))
+   return -EPERM;
+   return param_set_uint(val, kp);
+}
+
+static int param_get_aauint(char *buffer, struct kernel_param *kp)
+{
+   if (aa_task_context(current))
+   return -EPERM;
+   return param_get_uint(buffer, kp);
+}
+
+static int aa_reject_syscall(struct task_struct *task, gfp_t flags,
+const char *name)
+{
+   struct aa_profile *profile = aa_get_profile(task);
+   int error = 0;
+
+   if (profile) {
+   error = aa_audit_syscallreject(profile, flags, name);
+   aa_put_profile(profile);
+   }
+
+   return error;
+}
+
+static int apparmor_ptrace(struct task_struct *parent,
+  struct task_struct *child)
+{
+   struct aa_task_context *cxt;
+   int error = 0;
+
+   /*
+* parent can ptrace child when
+* - parent is unconfined
+* - parent & child are in the same namespace &&
+*   - parent is in complain mode
+*   - parent and child are confined by the same profile
+*   - parent profile has CAP_SYS_PTRACE
+*/
+
+   rcu_read_lock();
+   cxt = aa_task_context(parent);
+   if (cxt) {
+   if (parent->nsproxy != child->nsproxy) {
+   struct aa_audit sa;
+   memset(&sa, 0, sizeof(sa));
+   sa.operation = "ptrace";
+   
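
The custom `aabool`/`aauint` parameter types earlier in this file exist so that a confined task cannot read or flip the module flags: each accessor checks the caller first and only then defers to the stock handler. A userspace sketch of that guard pattern follows, assuming a hypothetical `guarded_set_bool()` in which an explicit `caller_confined` argument stands in for the aa_task_context(current) check and a small string match stands in for param_set_bool().

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical guarded setter: refuse the write when the caller is
 * confined, otherwise accept "1"/"y"/"Y" or "0"/"n"/"N".
 */
static int guarded_set_bool(int *flag, const char *val, int caller_confined)
{
	if (caller_confined)
		return -EPERM;

	if (!strcmp(val, "1") || !strcmp(val, "y") || !strcmp(val, "Y"))
		*flag = 1;
	else if (!strcmp(val, "0") || !strcmp(val, "n") || !strcmp(val, "N"))
		*flag = 0;
	else
		return -EINVAL;

	return 0;
}
```

Doing the check before any parsing means a rejected write leaves the flag untouched, whatever value was supplied.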

[PATCH] IB/ehca: Forward event client-reregister-required to registered clients

2007-12-20 Thread Hoang-Nam Nguyen
This patch allows ehca to forward the event client-reregister-required to
registered clients. Such an event is generated by the switch, e.g. after
it reboots.

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_irq.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 3f617b2..4c734ec 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -62,6 +62,7 @@
 #define NEQE_PORT_NUMBER   EHCA_BMASK_IBM( 8, 15)
 #define NEQE_PORT_AVAILABILITY EHCA_BMASK_IBM(16, 16)
 #define NEQE_DISRUPTIVE    EHCA_BMASK_IBM(16, 16)
+#define NEQE_SPECIFIC_EVENT    EHCA_BMASK_IBM(16, 23)
 
 #define ERROR_DATA_LENGTH  EHCA_BMASK_IBM(52, 63)
 #define ERROR_DATA_TYPE    EHCA_BMASK_IBM( 0,  7)
@@ -354,6 +355,7 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
 {
u8 ec   = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe);
u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe);
+   u8 spec_event;
 
switch (ec) {
case 0x30: /* port availability change */
@@ -394,6 +396,16 @@ static void parse_ec(struct ehca_shca *shca, u64 eqe)
case 0x33:  /* trace stopped */
ehca_err(&shca->ib_device, "Traced stopped.");
break;
+   case 0x34: /* util async event */
+   spec_event = EHCA_BMASK_GET(NEQE_SPECIFIC_EVENT, eqe);
+   if (spec_event == 0x80) /* client reregister required */
+   dispatch_port_event(shca, port,
+   IB_EVENT_CLIENT_REREGISTER,
+   "client reregister req.");
+   else
+   ehca_warn(&shca->ib_device, "Unknown util async "
+ "event %x on port %x", spec_event, port);
+   break;
default:
ehca_err(&shca->ib_device, "Unknown event code: %x on %s.",
 ec, shca->ib_device.name);
-- 
1.5.2
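
The new NEQE_SPECIFIC_EVENT field is described as bits 16 to 23 of the event queue entry. The sketch below shows how such a field could be extracted, assuming (as the `_IBM` suffix suggests, but this excerpt does not show) that bit 0 denotes the most significant bit of the 64-bit word. `ibm_field()` is a hypothetical helper, not the driver's EHCA_BMASK_GET macro.

```c
#include <stdint.h>

/*
 * Extract bits [from, to] of a 64-bit word using IBM bit numbering,
 * where bit 0 is the most significant bit. Requires from <= to <= 63.
 */
static uint64_t ibm_field(uint64_t value, unsigned int from, unsigned int to)
{
	unsigned int width = to - from + 1;
	unsigned int shift = 63 - to;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (value >> shift) & mask;
}
```

Under that assumption, an entry whose second and third bytes are 0x01 and 0x80 carries port 1 and the client-reregister code 0x80 tested in parse_ec().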




[AppArmor 43/47] AppArmor: Profile loading and manipulation, pathname matching

2007-12-20 Thread John
Pathname matching, transition table loading, profile loading and
manipulation.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/match.c|  299 +
 security/apparmor/match.h|   85 +++
 security/apparmor/module_interface.c |  805 +++
 3 files changed, 1189 insertions(+)

--- /dev/null
+++ b/security/apparmor/match.c
@@ -0,0 +1,299 @@
+/*
+ * Copyright (C) 2007 Novell/SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * Regular expression transition table matching
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include "apparmor.h"
+#include "match.h"
+
+static struct table_header *unpack_table(void *blob, size_t bsize)
+{
+   struct table_header *table = NULL;
+   struct table_header th;
+   size_t tsize;
+
+   if (bsize < sizeof(struct table_header))
+   goto out;
+
+   th.td_id = be16_to_cpu(*(u16 *) (blob));
+   th.td_flags = be16_to_cpu(*(u16 *) (blob + 2));
+   th.td_lolen = be32_to_cpu(*(u32 *) (blob + 8));
+   blob += sizeof(struct table_header);
+
+   if (!(th.td_flags == YYTD_DATA16 || th.td_flags == YYTD_DATA32 ||
+   th.td_flags == YYTD_DATA8))
+   goto out;
+
+   tsize = table_size(th.td_lolen, th.td_flags);
+   if (bsize < tsize)
+   goto out;
+
+   table = kmalloc(tsize, GFP_KERNEL);
+   if (table) {
+   *table = th;
+   if (th.td_flags == YYTD_DATA8)
+   UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
+u8, byte_to_byte);
+   else if (th.td_flags == YYTD_DATA16)
+   UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
+u16, be16_to_cpu);
+   else
+   UNPACK_ARRAY(table->td_data, blob, th.td_lolen,
+u32, be32_to_cpu);
+   }
+
+out:
+   return table;
+}
+
+int unpack_dfa(struct aa_dfa *dfa, void *blob, size_t size)
+{
+   int hsize, i;
+   int error = -ENOMEM;
+
+   /* get dfa table set header */
+   if (size < sizeof(struct table_set_header))
+   goto fail;
+
+   if (ntohl(*(u32 *)blob) != YYTH_MAGIC)
+   goto fail;
+
+   hsize = ntohl(*(u32 *)(blob + 4));
+   if (size < hsize)
+   goto fail;
+
+   blob += hsize;
+   size -= hsize;
+
+   error = -EPROTO;
+   while (size > 0) {
+   struct table_header *table;
+   table = unpack_table(blob, size);
+   if (!table)
+   goto fail;
+
+   switch(table->td_id) {
+   case YYTD_ID_ACCEPT:
+   case YYTD_ID_BASE:
+   dfa->tables[table->td_id - 1] = table;
+   if (table->td_flags != YYTD_DATA32)
+   goto fail;
+   break;
+   case YYTD_ID_DEF:
+   case YYTD_ID_NXT:
+   case YYTD_ID_CHK:
+   dfa->tables[table->td_id - 1] = table;
+   if (table->td_flags != YYTD_DATA16)
+   goto fail;
+   break;
+   case YYTD_ID_EC:
+   dfa->tables[table->td_id - 1] = table;
+   if (table->td_flags != YYTD_DATA8)
+   goto fail;
+   break;
+   default:
+   kfree(table);
+   goto fail;
+   }
+
+   blob += table_size(table->td_lolen, table->td_flags);
+   size -= table_size(table->td_lolen, table->td_flags);
+   }
+
+   return 0;
+
+fail:
+   for (i = 0; i < ARRAY_SIZE(dfa->tables); i++) {
+   if (dfa->tables[i]) {
+   kfree(dfa->tables[i]);
+   dfa->tables[i] = NULL;
+   }
+   }
+   return error;
+}
+
+/**
+ * verify_dfa - verify that all the transitions and states in the dfa tables
+ *  are in bounds.
+ * @dfa: dfa to test
+ *
+ * assumes dfa has gone through the verification done by unpacking
+ */
+int verify_dfa(struct aa_dfa *dfa)
+{
+   size_t i, state_count, trans_count;
+   int error = -EPROTO;
+
+   /* check that required tables exist */
+   if (!(dfa->tables[YYTD_ID_ACCEPT -1 ] &&
+ dfa->tables[YYTD_ID_DEF - 1] &&
+ dfa->tables[YYTD_ID_BASE - 1] &&
+ dfa->tables[YYTD_ID_NXT - 1] &&
+ dfa->tables[YYTD_ID_CHK - 1]))
+   goto out;
+
+   /* accept.size == default.size == base.size */
+   state_count = 
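
unpack_table() above follows a common pattern for untrusted binary input: check that the fixed header fits, decode the big-endian fields, then check that the payload the header announces also fits. A userspace sketch of that pattern follows. The 12-byte header size and the field offsets 0, 2 and 8 mirror the reads in unpack_table(), but `struct hdr`, `HDR_SIZE` and `parse_hdr()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE 12	/* assumed header size: id, flags, 4 unused bytes, count */

struct hdr {
	uint16_t id;
	uint16_t flags;
	uint32_t count;
};

static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode a table header from blob. Returns 0 on success, or -1 if the
 * blob is too short for the header or for count elements of elem_size.
 */
static int parse_hdr(const unsigned char *blob, size_t size,
		     size_t elem_size, struct hdr *out)
{
	if (size < HDR_SIZE)
		return -1;

	out->id = get_be16(blob);
	out->flags = get_be16(blob + 2);
	out->count = get_be32(blob + 8);

	/* Divide rather than multiply so a huge count cannot overflow. */
	if (elem_size == 0 || out->count > (size - HDR_SIZE) / elem_size)
		return -1;

	return 0;
}
```

Reading the fields byte by byte also avoids the unaligned pointer casts that the kernel version relies on, which matters on architectures that trap on misaligned loads.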

[AppArmor 44/47] AppArmor: all the rest

2007-12-20 Thread John
All the things that didn't nicely fit in a category on their own: kbuild
code, declarations and inline functions, /sys/kernel/security/apparmor
filesystem for controlling apparmor from user space, profile list
functions, locking documentation, /proc/$pid/task/$tid/attr/current
access.

Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/apparmor/Kconfig  |9 +
 security/apparmor/Makefile |   13 +
 security/apparmor/apparmor.h   |  323 +
 security/apparmor/apparmorfs.c |  252 +++
 security/apparmor/inline.h |  240 ++
 security/apparmor/list.c   |  156 +++
 security/apparmor/locking.txt  |   68 
 security/apparmor/procattr.c   |  194 
 8 files changed, 1255 insertions(+)

--- /dev/null
+++ b/security/apparmor/Kconfig
@@ -0,0 +1,9 @@
+config SECURITY_APPARMOR
+   tristate "AppArmor support"
+   depends on SECURITY!=n
+   help
+ This enables the AppArmor security module.
+ Required userspace tools (if they are not included in your
+ distribution) and further information may be found at
+ http://forge.novell.com/modules/xfmod/project/?apparmor
+ If you are unsure how to answer this question, answer N.
--- /dev/null
+++ b/security/apparmor/Makefile
@@ -0,0 +1,13 @@
+# Makefile for AppArmor Linux Security Module
+#
+obj-$(CONFIG_SECURITY_APPARMOR) += apparmor.o
+
+apparmor-y := main.o list.o procattr.o lsm.o apparmorfs.o \
+ module_interface.o match.o
+
+quiet_cmd_make-caps = GEN $@
+cmd_make-caps = sed -n -e /CAP_FS_MASK/d -e s/^\#define[ 
\\t]\\+CAP_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\$$/[\\2]  = \\\1\,/p $ 
| tr A-Z a-z  $@
+
+$(obj)/main.o : $(obj)/capability_names.h
+$(obj)/capability_names.h : $(srctree)/include/linux/capability.h
+   $(call cmd,make-caps)
--- /dev/null
+++ b/security/apparmor/apparmor.h
@@ -0,0 +1,323 @@
+/*
+ * Copyright (C) 1998-2007 Novell/SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ * AppArmor internal prototypes
+ */
+
+#ifndef __APPARMOR_H
+#define __APPARMOR_H
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/binfmts.h>
+#include <linux/rcupdate.h>
+
+/*
+ * We use MAY_READ, MAY_WRITE, MAY_EXEC, MAY_APPEND and the following flags
+ * for profile permissions
+ */
+#define AA_MAY_LINK            0x0010
+#define AA_MAY_LOCK            0x0020
+#define AA_EXEC_MMAP           0x0040
+#define AA_EXEC_UNSAFE         0x0080
+#define AA_EXEC_MOD_0          0x0100
+#define AA_EXEC_MOD_1          0x0200
+#define AA_BASE_PERMS          (MAY_READ | MAY_WRITE | MAY_EXEC | \
+                                MAY_APPEND | AA_MAY_LINK | \
+                                AA_MAY_LOCK | AA_EXEC_MMAP | \
+                                AA_EXEC_UNSAFE | AA_EXEC_MOD_0 | \
+                                AA_EXEC_MOD_1)
+#define AA_LINK_SUBSET_TEST    0x0020
+
+#define AA_EXEC_UNCONFINED     0
+#define AA_EXEC_INHERIT        AA_EXEC_MOD_0
+#define AA_EXEC_PROFILE        AA_EXEC_MOD_1
+#define AA_EXEC_PIX            (AA_EXEC_MOD_0 | AA_EXEC_MOD_1)
+
+#define AA_EXEC_MODIFIERS      (AA_EXEC_MOD_0 | AA_EXEC_MOD_1)
+
+#define AA_USER_SHIFT          0
+#define AA_OTHER_SHIFT         10
+
+#define AA_USER_PERMS          (AA_BASE_PERMS << AA_USER_SHIFT)
+#define AA_OTHER_PERMS         (AA_BASE_PERMS << AA_OTHER_SHIFT)
+
+#define AA_FILE_PERMS          (AA_USER_PERMS | AA_OTHER_PERMS)
+
+#define AA_LINK_BITS           ((AA_MAY_LINK << AA_USER_SHIFT) | \
+                                (AA_MAY_LINK << AA_OTHER_SHIFT))
+
+#define AA_USER_EXEC           (MAY_EXEC << AA_USER_SHIFT)
+#define AA_OTHER_EXEC          (MAY_EXEC << AA_OTHER_SHIFT)
+
+#define AA_USER_EXEC_MODS      (AA_EXEC_MODIFIERS << AA_USER_SHIFT)
+#define AA_OTHER_EXEC_MODS     (AA_EXEC_MODIFIERS << AA_OTHER_SHIFT)
+
+#define AA_USER_EXEC_UNSAFE    (AA_EXEC_UNSAFE << AA_USER_SHIFT)
+#define AA_OTHER_EXEC_UNSAFE   (AA_EXEC_UNSAFE << AA_OTHER_SHIFT)
+
+#define AA_EXEC_BITS           (AA_USER_EXEC | AA_OTHER_EXEC)
+
+#define AA_ALL_EXEC_MODS       (AA_USER_EXEC_MODS | \
+                                AA_OTHER_EXEC_MODS)
+
+/* shared permissions that are not duplicated in user:group:other */
+#define AA_CHANGE_PROFILE      0x4000
+
+#define AA_SHARED_PERMS        (AA_CHANGE_PROFILE)
+
+#define AA_VALID_PERM_MASK     (AA_FILE_PERMS | AA_SHARED_PERMS)
+
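
The definitions above pack two copies of the ten base permission bits into one word, one for the file owner at shift 0 and one for everyone else at shift 10. The sketch below shows how a request could be moved into the right copy before testing it. The `PERM_*` values and the helpers `perm_shift()` and `denied_for()` are hypothetical, chosen only to mirror AA_USER_SHIFT and AA_OTHER_SHIFT.

```c
/* Hypothetical base permission bits and per-class shifts. */
#define PERM_EXEC   0x0001u
#define PERM_WRITE  0x0002u
#define PERM_READ   0x0004u
#define USER_SHIFT  0
#define OTHER_SHIFT 10	/* same value as AA_OTHER_SHIFT above */

/* Owner requests use the low copy of the bits, all others the high copy. */
static int perm_shift(unsigned int fsuid, unsigned int owner_uid)
{
	return fsuid == owner_uid ? USER_SHIFT : OTHER_SHIFT;
}

/* Move a base request into the caller's class, then report what is denied. */
static unsigned int denied_for(unsigned int allowed, unsigned int base_request,
			       int shift)
{
	return (base_request << shift) & ~allowed;
}
```

With a rule set that grants the owner read and write but others only read, a write by a non-owner comes back as the write bit in the shifted position, which identifies both the permission and the class that was refused.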

[AppArmor 45/47] Add AppArmor LSM to security/Makefile

2007-12-20 Thread John
Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]

---
 security/Kconfig  |1 +
 security/Makefile |1 +
 security/apparmor/Kconfig |   22 --
 3 files changed, 22 insertions(+), 2 deletions(-)

--- a/security/Kconfig
+++ b/security/Kconfig
@@ -105,6 +105,7 @@ config SECURITY_ROOTPLUG
 
 source security/selinux/Kconfig
 source security/smack/Kconfig
+source security/apparmor/Kconfig
 
 endmenu
 
--- a/security/Makefile
+++ b/security/Makefile
@@ -16,5 +16,6 @@ obj-$(CONFIG_SECURITY)+= security.o d
 # Must precede capability.o in order to stack properly.
 obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
 obj-$(CONFIG_SECURITY_SMACK)   += commoncap.o smack/built-in.o
+obj-$(CONFIG_SECURITY_APPARMOR)    += commoncap.o apparmor/
 obj-$(CONFIG_SECURITY_CAPABILITIES)    += commoncap.o capability.o
 obj-$(CONFIG_SECURITY_ROOTPLUG)    += commoncap.o root_plug.o
--- a/security/apparmor/Kconfig
+++ b/security/apparmor/Kconfig
@@ -1,9 +1,27 @@
 config SECURITY_APPARMOR
-   tristate "AppArmor support"
-   depends on SECURITY!=n
+   bool "AppArmor support"
+   depends on SECURITY
+   select AUDIT
help
  This enables the AppArmor security module.
  Required userspace tools (if they are not included in your
  distribution) and further information may be found at
  http://forge.novell.com/modules/xfmod/project/?apparmor
+
  If you are unsure how to answer this question, answer N.
+
+config SECURITY_APPARMOR_BOOTPARAM_VALUE
+   int "AppArmor boot parameter default value"
+   depends on SECURITY_APPARMOR
+   range 0 1
+   default 1
+   help
+ This option sets the default value for the kernel parameter
+ 'apparmor', which allows AppArmor to be enabled or disabled
+ at boot.  If this option is set to 0 (zero), the AppArmor
+ kernel parameter will default to 0, disabling AppArmor at
+ bootup.  If this option is set to 1 (one), the AppArmor
+ kernel parameter will default to 1, enabling AppArmor at
+ bootup.
+
+ If you are unsure how to answer this question, answer 1.

-- 



[AppArmor 46/47] add simple network toggles to apparmor

2007-12-20 Thread John
Signed-off-by: John Johansen [EMAIL PROTECTED]
Signed-off-by: Jesse Michael [EMAIL PROTECTED]

---
 security/apparmor/Makefile   |7 +
 security/apparmor/apparmor.h |7 +
 security/apparmor/lsm.c  |  129 ++-
 security/apparmor/main.c |   96 ++
 security/apparmor/module_interface.c |   20 +
 5 files changed, 255 insertions(+), 4 deletions(-)

--- a/security/apparmor/Makefile
+++ b/security/apparmor/Makefile
@@ -8,6 +8,11 @@ apparmor-y := main.o list.o procattr.o l
 quiet_cmd_make-caps = GEN $@
 cmd_make-caps = sed -n -e /CAP_FS_MASK/d -e s/^\#define[ 
\\t]\\+CAP_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\$$/[\\2]  = \\\1\,/p $ 
| tr A-Z a-z  $@
 
-$(obj)/main.o : $(obj)/capability_names.h
+quiet_cmd_make-af = GEN $@
+cmd_make-af = sed -n -e /AF_MAX/d -e /AF_LOCAL/d -e s/^\#define[ 
\\t]\\+AF_\\([A-Z0-9_]\\+\\)[ \\t]\\+\\([0-9]\\+\\)\\(.*\\)\$$/[\\2]  = 
\\\1\,/p $ | tr A-Z a-z  $@
+
+$(obj)/main.o : $(obj)/capability_names.h $(obj)/af_names.h
 $(obj)/capability_names.h : $(srctree)/include/linux/capability.h
$(call cmd,make-caps)
+$(obj)/af_names.h : $(srctree)/include/linux/socket.h
+   $(call cmd,make-af)
--- a/security/apparmor/apparmor.h
+++ b/security/apparmor/apparmor.h
@@ -16,6 +16,8 @@
 #include <linux/fs.h>
 #include <linux/binfmts.h>
 #include <linux/rcupdate.h>
+#include <linux/socket.h>
+#include <net/sock.h>
 
 /*
  * We use MAY_READ, MAY_WRITE, MAY_EXEC, MAY_APPEND and the following flags
@@ -169,6 +171,7 @@ struct aa_profile {
struct list_head task_contexts;
spinlock_t lock;
unsigned long int_flags;
+   u16 network_families[AF_MAX];
 };
 
 extern struct list_head profile_ns_list;
@@ -215,6 +218,7 @@ struct aa_audit {
int request_mask, denied_mask;
struct iattr *iattr;
pid_t task, parent;
+   int family, type, protocol;
int error_code;
 };
 
@@ -276,6 +280,9 @@ extern void aa_change_task_context(struc
   struct aa_profile *previous_profile);
 extern int aa_may_ptrace(struct aa_task_context *cxt,
 struct aa_profile *tracee);
+extern int aa_net_perm(struct aa_profile *profile, char *operation,
+   int family, int type, int protocol);
+extern int aa_revalidate_sk(struct sock *sk, char *operation);
 
 /* list.c */
 extern struct aa_namespace *__aa_find_namespace(const char *name,
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -18,6 +18,7 @@
 #include <linux/ctype.h>
 #include <linux/sysctl.h>
 #include <linux/audit.h>
+#include <net/sock.h>
 
 #include "apparmor.h"
 #include "inline.h"
@@ -616,6 +617,117 @@ static void apparmor_task_free_security(
aa_release(task);
 }
 
+static int apparmor_socket_create(int family, int type, int protocol, int kern)
+{
+   struct aa_profile *profile;
+   int error = 0;
+
+   if (kern)
+   return 0;
+
+   profile = aa_get_profile(current);
+   if (profile)
+   error = aa_net_perm(profile, "socket_create", family,
+   type, protocol);
+   aa_put_profile(profile);
+
+   return error;
+}
+
+static int apparmor_socket_post_create(struct socket *sock, int family,
+   int type, int protocol, int kern)
+{
+   struct sock *sk = sock->sk;
+
+   if (kern)
+   return 0;
+
+   return aa_revalidate_sk(sk, "socket_post_create");
+}
+
+static int apparmor_socket_bind(struct socket *sock,
+   struct sockaddr *address, int addrlen)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_bind");
+}
+
+static int apparmor_socket_connect(struct socket *sock,
+   struct sockaddr *address, int addrlen)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_connect");
+}
+
+static int apparmor_socket_listen(struct socket *sock, int backlog)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_listen");
+}
+
+static int apparmor_socket_accept(struct socket *sock, struct socket *newsock)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_accept");
+}
+
+static int apparmor_socket_sendmsg(struct socket *sock,
+   struct msghdr *msg, int size)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_sendmsg");
+}
+
+static int apparmor_socket_recvmsg(struct socket *sock,
+  struct msghdr *msg, int size, int flags)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_recvmsg");
+}
+
+static int apparmor_socket_getsockname(struct socket *sock)
+{
+   struct sock *sk = sock->sk;
+
+   return aa_revalidate_sk(sk, "socket_getsockname");
+}
+static int 
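
This excerpt adds a `u16 network_families[AF_MAX]` array to the profile but is cut off before showing how aa_net_perm() consults it. One plausible reading, sketched below purely as an assumption, is a per-family bitmask with one bit per socket type. `struct net_profile`, `NFAMILIES`, `net_allow()` and `net_perm()` are hypothetical names, and the patch's actual check may differ.

```c
#include <errno.h>
#include <stdint.h>

#define NFAMILIES 34	/* hypothetical stand-in for AF_MAX */

struct net_profile {
	uint16_t families[NFAMILIES];	/* bit N set: socket type N allowed */
};

/* Grant one (family, type) pair; out-of-range input is ignored. */
static void net_allow(struct net_profile *p, int family, int type)
{
	if (family < 0 || family >= NFAMILIES || type < 0 || type >= 16)
		return;
	p->families[family] |= (uint16_t)(1u << type);
}

/* Return 0 if the pair is allowed, -EACCES if not, -EINVAL if out of range. */
static int net_perm(const struct net_profile *p, int family, int type)
{
	if (family < 0 || family >= NFAMILIES || type < 0 || type >= 16)
		return -EINVAL;
	return (p->families[family] & (1u << type)) ? 0 : -EACCES;
}
```

A table of this shape keeps the per-socket check to one array index and one mask test, which suits hooks such as sendmsg and recvmsg that run on every call.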

[AppArmor 37/47] Switch to vfs_permission() in do_path_lookup()

2007-12-20 Thread John
Switch from file_permission() to vfs_permission() in do_path_lookup():
this avoids calling permission() with a NULL nameidata here.

Signed-off-by: Andreas Gruenbacher [EMAIL PROTECTED]
Signed-off-by: John Johansen [EMAIL PROTECTED]

---
 fs/namei.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1150,24 +1150,21 @@ static int fastcall do_path_lookup(int d
 		path_get(&fs->pwd);
 		read_unlock(&fs->lock);
 	} else {
-		struct dentry *dentry;
-
 		file = fget_light(dfd, &fput_needed);
 		retval = -EBADF;
 		if (!file)
 			goto out_fail;
 
-		dentry = file->f_path.dentry;
+		nd->path = file->f_path;
 
 		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
+		if (!S_ISDIR(nd->path.dentry->d_inode->i_mode))
 			goto fput_fail;
 
-		retval = file_permission(file, MAY_EXEC);
+		retval = vfs_permission(nd, MAY_EXEC);
 		if (retval)
 			goto fput_fail;
 
-		nd->path = file->f_path;
 		path_get(&file->f_path);
 
 		fput_light(file, fput_needed);
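
A note on why the assignment to nd->path moves up in the hunk above: the
check now goes through the nameidata, so the path has to be filled in
before the check runs. The following is only a userspace sketch of that
ordering, not kernel code. Every *_model name is a hypothetical stand-in
invented for illustration, and the assumption that file_permission() hands
a NULL nameidata to permission() is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct inode_model     { int exec_allowed; };
struct dentry_model    { struct inode_model *d_inode; };
struct path_model      { struct dentry_model *dentry; };
struct file_model      { struct path_model f_path; };
struct nameidata_model { struct path_model path; };

#define MODEL_EACCES 13

/* Records whether the most recent check was given a nameidata. */
static int last_check_had_nd;

/* Stand-in for permission(): a path-based hook would need nd here. */
static int permission_model(struct inode_model *inode,
			    struct nameidata_model *nd)
{
	last_check_had_nd = (nd != NULL);
	return inode->exec_allowed ? 0 : -MODEL_EACCES;
}

/* Old call: no nameidata is available, so NULL is passed down. */
static int file_permission_model(struct file_model *file)
{
	return permission_model(file->f_path.dentry->d_inode, NULL);
}

/* New call: the nameidata supplies both the inode and the path. */
static int vfs_permission_model(struct nameidata_model *nd)
{
	return permission_model(nd->path.dentry->d_inode, nd);
}

/* Ordering after the patch: fill nd->path first, then check through nd. */
static int lookup_check_model(struct nameidata_model *nd,
			      struct file_model *file)
{
	nd->path = file->f_path;
	return vfs_permission_model(nd);
}
```

Calling file_permission_model() leaves last_check_had_nd at 0, while
lookup_check_model() leaves it at 1, which is the property the patch is
after.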



[AppArmor 47/47]

---
 Documentation/lsm/AppArmor-Security-Goal.txt |  134 +++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)

2007-12-20 Thread John
--- /dev/null
+++ b/Documentation/lsm/AppArmor-Security-Goal.txt
@@ -0,0 +1,134 @@
+AppArmor Security Goal
+Crispin Cowan, PhD
+MercenaryLinux.com
+
+This document specifies the security goal that AppArmor is intended to
+achieve, so that users can evaluate whether AppArmor will meet their
+needs, and kernel developers can evaluate whether AppArmor is living up
+to its claims. This document is *not* a general purpose explanation of
+how AppArmor works, nor is it an explanation for why one might want to
+use AppArmor rather than some other system.
+
+AppArmor is intended to limit system damage from attackers exploiting
+vulnerabilities in applications that the system hosts. The threat is
+that an attacker can cause a vulnerable application to do something
+unexpected and undesirable. AppArmor addresses this threat by confining
+the application to access only the resources it needs to access to
+execute properly, effectively imposing least privilege execution on
+the application.
+
+Applications interact with the rest of the system via resources
+including files, interprocess communication, networking, capabilities,
+and execution of other applications. The purpose of least privilege is
+to bound the damage that a malicious user or code can do by removing
+access to resources that the application does not need for its intended
+function. This is true for all access control systems, including AppArmor.
+
+The attacker is someone trying to gain the privileges of a process for
+themselves. For instance, a policy for a web server might grant
+read-only access to most web documents, preventing an attacker who can
+corrupt the web server from defacing the web pages. A web server has
+access to the web server's local file system, and a network attacker
+trying to hack the web server does not have such file access. An e-mail
+attacker attempting to infect the recipient of the e-mail does not have
+access to the files that the victim user's mail client does. By limiting
+the scope of access for an application, AppArmor can limit the damage an
+attacker can do by exploiting vulnerabilities in applications.
+
+An application is one or more related processes performing a function,
+e.g. the gang of processes that constitute an Apache web server, or a
+Postfix mail server. AppArmor *only* confines processes that the
+AppArmor policy says it should confine, and other processes are
+permitted to do anything that DAC permits. This is sometimes known as a
+targeted security policy.
+
+AppArmor does not provide a default policy that applies to all
+processes. So to defend an entire host, you have to piece-wise confine
+each process that is exposed to potential attack. For instance, to
+defend a system against network attack, place AppArmor profiles around
+every application that accesses the network. This limits the damage a
+network attacker can do to the file system to only those files granted
+by the profiles for the network-available applications. Similarly, to
+defend a system against attack from the console, place AppArmor profiles
+around every application that accesses the keyboard and mouse. The
+system is defended in that the worst the attacker can do to corrupt
+the system is limited to the transitive closure of what the confined
+processes are allowed to access.
+
+AppArmor currently mediates access to files, ability to use POSIX.1e
+Capabilities, and coarse-grained control on network access. This is
+sufficient to prevent a confined process from *directly* corrupting the
+file system. It is not sufficient to prevent a confined process from
+*indirectly* corrupting the system by influencing some other process to
+do the dirty deed. But to do so requires a complicit process that can be
+manipulated through another channel such as IPC. A complicit process
+is either a malicious process the attacker somehow got control of, or is
+a process that is actively listening to IPC of some kind and can be
+corrupted via IPC.
+
+The only IPC that AppArmor mediates is access to named sockets, FIFOs,
+etc. that appear in the file system name space, a side effect of
+AppArmor's file access mediation. Future versions of AppArmor will
+mediate more resources, including finer-grained network access controls,
+and controls on various forms of IPC.
+
+AppArmor specifies the programs to be confined and the resources they
+can access in a syntax similar to how users are accustomed to accessing
+those resources. So file access controls are specified using absolute
+paths with respect to the name space the process is in. POSIX.1e
+capabilities are specified by name. Network access controls currently
+are specified by simply naming the protocol that can be used (e.g. tcp,
+udp), and in the future will be more general, resembling firewall rules.
+
+Thus the AppArmor security goal should be considered piecewise from the
+point of view of a single confined process: that process should only be
+able to access the resources specified in its 
