from:"David Howells"

Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-10 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

> Grumble. Yet another thing to undo in the near future. I still
> hope to suggest what I would consider a viable alternative soon.

Use a struct key with the overrides attached?  The key can be generated by
SELinux or whatever module is there.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/16] Permit filesystem local caching [try #3]

2007-08-11 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

> How would you expect an LSM that is not SELinux to interface with
> CacheFiles?

You have to understand that I didn't know that much about the LSM interface,
so I asked advice of the Red Hat security people, who, naturally, pointed me
at the SELinux mailing list.  I knew my stuff would have to work with SELinux
to be used with RH stuff.

Furthermore, as you pointed out, there aren't any other LSM modules upstream
yet for me to work against.  I would like CacheFiles to work with all LSM
modules in general, but I don't know how to do that yet.

I'm open to suggestion as to how to modify things to support any LSM.


Btw, do you understand the problems that CacheFiles has to deal with?  If I
set this down clearly, this may help you or someone else suggest a better way
to do things.

  (1) Some random process tries to access a file on a network filesystem
  (NFS example).

  (2) NFS goes to the cache to attempt to read the data from there prior to
  going to the network.

  (3) The cache driver wants to access the files in the cache, but it's
  running in the security context of either the aforementioned random
  process, or one of FS-Cache's thread pool.

  This security context, however, doesn't necessarily give it the rights
  to access what's in the cache, so the driver has to be permitted to act
  as a context appropriate to accessing the cache, without changing the
  overall security context of the random process (which would impact
  things trying to act on that process - kill() for example).

  (4) Assuming the data is found in the cache, all well and good, but if it
  isn't, the cache driver will have to create some files in the cache.

  Now, if the cache driver just went ahead and created the files, they
  could end up with their own security contexts being derived from the
  random process's security context, thus potentially making it impossible
  for other processes to access the cache.

  So the file-creation part of the security context must also be
  overridden temporarily, assuming that whatever LSM is in force has such
  a concept.
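The four steps above can be modelled in a few lines of user-space C. This is a minimal sketch, assuming a task carries three labels (the one others act upon, the one it acts as, and the one new files receive); the struct names, functions and label strings are all hypothetical. The point it shows is that the override in steps (3) and (4) touches only the acting and file-creation labels and is undone afterwards, so anything acting on the task (kill(), for instance) still sees its original label.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of a task's security state. */
struct task_sec {
	const char *label;     /* governs what other tasks may do to this one */
	const char *act_as;    /* governs what this task may do to objects */
	const char *create_as; /* label given to files this task creates */
};

/* State displaced by the override, kept on the caller's stack. */
struct saved_ctx {
	const char *act_as;
	const char *create_as;
};

void begin_cache_access(struct task_sec *t, struct saved_ctx *save,
			const char *cache_label)
{
	save->act_as = t->act_as;
	save->create_as = t->create_as;
	t->act_as = cache_label;    /* step (3): act with the cache's rights */
	t->create_as = cache_label; /* step (4): new cache files get the cache label */
}

void end_cache_access(struct task_sec *t, const struct saved_ctx *save)
{
	t->act_as = save->act_as;
	t->create_as = save->create_as;
}

/* Label a file created now would receive. */
const char *label_for_new_file(const struct task_sec *t)
{
	return t->create_as;
}
```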

Part of the problem is that the VFS does not pass around the security context
as which the VFS routines act, but rather gets them from the task_struct.  For
the most part, this is entirely sufficient, but in the cache driver case, it's
a problem.

David

Re: [PATCH 00/16] Permit filesystem local caching [try #3]

2007-08-13 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

> I haven't looked into the issues at all and I bet there are plenty,
> maybe in audit and places outside of the security realm, but this
> looks like a clean approach from the LSM interface standpoint. Do
> you want the entire task or just task->security?

It would probably have to be the task struct, lest the security information
(for which I've no refcount held) went away whilst I was trying to access it.

> I could see it either way, but I suspect the task is your best bet. If you
> call security_act_as() twice, then security_act_as_self() do you pop a
> stack, or return to the initial state?

Good point.  I've pondered that.  What I have at the moment partly acts like a
stack in that I store some of the shifted-out context on the machine stack (in
struct cachefiles_secctx).  The act-as context should probably be shifted too,
in addition to the old file-creation SID and the fsuid/fsgid.

> How about security_act_as(NULL) returning you to the initial state, and
> dropping security_act_as_self()?

That would be fine.
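To make the difference from a stack concrete, here is a minimal user-space sketch of the semantics Casey suggests, assuming the acting state is a single integer SID; `act_as()` and the struct are hypothetical stand-ins for the proposed security_act_as(). Two successive overrides do not nest: passing NULL always goes straight back to the task's own SID, not to the previous override.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the task's own SID plus the SID it currently acts as. */
struct task {
	int sid;     /* initial (own) security ID */
	int act_sid; /* ID used when acting upon objects */
};

/* Non-NULL switches the acting SID; NULL restores the initial state.
 * There is no stack, so nested calls simply overwrite each other. */
void act_as(struct task *t, const int *sid)
{
	t->act_sid = sid ? *sid : t->sid;
}
```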

Actually, to address Stephen Smalley's requirements also, how about making
things a bit more complex.  Have the following suite of functions:

 (1) int security_get_context(struct sec **_context);

This allocates and gives the caller a blob that describes the current
context of all the LSM module states attached to the current task and
stores a pointer to it in *_context.

 (2) int security_push(struct sec *context, struct sec **_old_context)

This causes all the LSM modules on the current task to switch to a new
acting state, passing back the old state.  It does not change how
other tasks do things to this one.

 (3) int security_pop(struct sec *context)

This causes all the LSM modules on the current task to switch to a new
acting state, deleting the old state.  It does not change how
other tasks do things to this one.

 (4) int security_delete_context(struct sec *context)

This deletes a context blob.
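A user-space model of that suite follows. It is a sketch, assuming a context holds a single acting SID and that a global pointer stands in for the acting state of the current task; none of this is existing kernel code. It shows the key property of the push/pop pair: push hands the displaced state back to the caller, and pop reinstalls it while freeing the state being discarded.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Model context: one acting SID (a real blob would cover every LSM). */
struct sec {
	int acting_sid;
};

/* Stand-in for the acting state of the current task; never freed. */
static struct sec initial = { .acting_sid = 1 };
struct sec *current_acting = &initial;

int security_get_context(struct sec **_context)
{
	struct sec *c = malloc(sizeof(*c));

	if (!c)
		return -ENOMEM;
	*c = *current_acting; /* snapshot of the current acting state */
	*_context = c;
	return 0;
}

int security_push(struct sec *context, struct sec **_old_context)
{
	*_old_context = current_acting; /* caller keeps the displaced state */
	current_acting = context;
	return 0;
}

int security_pop(struct sec *context)
{
	struct sec *old = current_acting;

	current_acting = context;
	if (old != &initial)
		free(old); /* "deleting the old state" */
	return 0;
}

int security_delete_context(struct sec *context)
{
	if (context != &initial)
		free(context);
	return 0;
}
```

Note that in this model pop consumes the context that push installed, so the caller must not delete it again afterwards.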

The context blob could then be structured very simply.  Give each loaded LSM
module an integer index as it is registered.  Having a limit to the number of
LSM modules would make things simpler.  The blob would then be an array of
void pointers, one per LSM module, indexed by the integer index for each one.
If you don't have a limit on the number of LSM modules, you'd also need a
count of slots in the blob.

Any LSM module that wanted to implement the above three functions would fill
in or otherwise use the slot that belongs to it.  Otherwise the slot would
just be left NULL.

For example:

        context --->+--------+     +---------+
                    | SLOT 0 |---->| SELINUX |
                    +--------+     +---------+
                    | SLOT 1 |---->| THINGY  |
                    +--------+     +---------+
                    | ...    |
                    +--------+     +---------+
                    | SLOT N |---->| AUDIT   |
                    +--------+     +---------+
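The slot layout can be sketched as below, assuming a fixed limit on the number of registered LSMs (the simpler of the two options described). `MAX_LSM_SLOTS`, `register_lsm_slot()` and the other names are hypothetical. Each LSM receives an index at registration and reads only its own slot; a NULL slot means that module has nothing to say about this context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LSM_SLOTS 4 /* assumed fixed limit on registered LSMs */

/* One void pointer per registered LSM; NULL if that LSM left it unfilled. */
struct sec_blob {
	void *slot[MAX_LSM_SLOTS];
};

static int nr_lsms;

/* Hand out the next slot index at registration time; -1 if full. */
int register_lsm_slot(void)
{
	if (nr_lsms >= MAX_LSM_SLOTS)
		return -1;
	return nr_lsms++;
}

/* calloc() leaves every slot NULL, i.e. "ignored" by its owner. */
struct sec_blob *alloc_blob(void)
{
	return calloc(1, sizeof(struct sec_blob));
}

/* An LSM looks only at the slot that belongs to it. */
void *lsm_slot(const struct sec_blob *b, int idx)
{
	return b->slot[idx];
}
```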

For Stephen and NFS, he could then generate a context from NFS which nfsd
could then put in place.  Perhaps any unfilled slot would be ignored by the
LSM module to which it belonged.

David

Re: [PATCH 00/16] Permit filesystem local caching [try #3]

2007-08-13 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

> Seems like over-design - we don't need to support LSM stacking, and we
> don't need to support pushing/popping more than one level of context.

It will, at some point hopefully, be possible for someone to try, say, NFS
exporting a cached ISO9660 mount (CDROM) - in which case, we should allow
for two levels of stack.  If we can pass the displaced context to the caller
to restore later then that allows for more or less unlimited depth.
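The reason caller-held state gives unlimited depth can be shown with a two-level model of the nfsd-over-cached-ISO9660 case. This is a sketch with hypothetical names (`override_ctx()`, `revert_ctx()`, the three contexts); each level saves whatever it displaced in a local variable, so the call stack itself becomes the context stack and no fixed-depth storage is needed.

```c
#include <assert.h>
#include <string.h>

struct ctx {
	const char *name;
};

const struct ctx task_ctx = { "task" };
const struct ctx nfsd_ctx = { "nfsd" };
const struct ctx cache_ctx = { "cachefiles" };

const struct ctx *current_ctx = &task_ctx;
const char *innermost; /* records what the innermost level saw */

/* Install new_ctx and hand the displaced context back to the caller. */
const struct ctx *override_ctx(const struct ctx *new_ctx)
{
	const struct ctx *old = current_ctx;

	current_ctx = new_ctx;
	return old;
}

void revert_ctx(const struct ctx *old)
{
	current_ctx = old;
}

/* Level 2: the cache driver overrides while nfsd's override is live. */
void cachefiles_read(void)
{
	const struct ctx *saved = override_ctx(&cache_ctx);

	innermost = current_ctx->name;
	revert_ctx(saved);
}

/* Level 1: nfsd overrides the task's context, then reads via the cache. */
void nfsd_serve(void)
{
	const struct ctx *saved = override_ctx(&nfsd_ctx);

	cachefiles_read();
	assert(current_ctx == &nfsd_ctx); /* level 2 restored level 1 */
	revert_ctx(saved);
}
```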

It occurs to me that the following is almost good enough, but not quite:

  (1) int security_get_context(void **_context);

This allocates and gives the caller a blob that describes the current
context of all the LSM module states attached to the current task and
stores a pointer to it in *_context.

  (2) int security_push(void *context, void **_old_context)

This causes all the LSM modules on the current task to switch to a new
acting state, passing back the old state.  It does not change how
other tasks do things to this one.

  (3) int security_pop(void *context)

This causes all the LSM modules on the current task to switch to a new
acting state, deleting the old state.  It does not change how
other tasks do things to this one.

  (4) int security_delete_context(void *context)

I still need a way to transform the cachefilesd context into the kernel's
context.  See patch:

   Subject: [Linux-cachefs] [PATCH 12/16] CacheFiles: Get the SID under which
the CacheFiles module should operate [try #3]

However, this seems to add a fairly generic transformation, so that could be
generalised:

  (5) int security_xfrm_to_kernel_context(void *from, void **_to);

> What was the objection again to the original interface, aside from
> replacing u32 secids with void* security blobs?

I got the impression that Casey thought much of this was tied to SELinux, but
rereading his/her emails, I'm not so certain.  Maybe that's sufficient.  Casey?

However, I've realised a problem (as outlined above) with what I've got.
Namely its stack isn't necessarily deep enough.  Alternatively, nfsd perhaps
should suppress caching on what it reads.

David

Re: [PATCH 00/16] Permit filesystem local caching [try #3]

2007-08-13 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

>>   (1) int security_get_context(void **_context);
>>
>>   This allocates and gives the caller a blob that describes the current
>>   context of all the LSM module states attached to the current task and
>>   stores a pointer to it in *_context.
>
> Is this intended to be anything more than a copy of current->security?

It has to be sufficient to fully effect security_push().

> I assume that you're talking about the LSM specific data changing,
> not the LSM itself.

Yes.

> If you change the task->security information you are definitely going
> to change what other tasks can do to the calling task.

I dealt with that in my current act-as patch.  Under SELinux a task has two
primary labels.  One with which it is labelled and is used to govern effects
upon it, and one that is used to act upon things and follows changes to the
former.

>>   (5) int security_xfrm_to_kernel_context(void *from, void **_to);
>
> Woof. What are you transforming from?

In the CacheFiles case, from the cachefilesd daemon's security label to the
label the cache driver acts as on behalf of other processes.

> That's the really nice thing about cans of worms.
> They come in six-packs.

Yeah...

David

Re: [PATCH 00/16] Permit filesystem local caching [try #3]

2007-08-14 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

> With Smack you can leave the label alone, raise CAP_MAC_OVERRIDE,
> do your business of setting the label correctly, and then drop
> the capability. No new hooks required.

That sounds like a contradiction.  How can you both leave it alone and set it?

David

Adding a security parameter to VFS functions

2007-08-15 Thread David Howells


Hi Linus, Al,

Would you object greatly to functions like vfs_mkdir() gaining a security
parameter?  What I'm thinking of is this:

int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode,
  struct security *security)

Where the security context is the state of the context at the time the call
was issued:

struct security {
	uid_t			fsuid;
	gid_t			fsgid;
	struct group_info	*group_info;
	void			*security;
	struct key		*session_keyring;
	struct key		*process_keyring;
	struct key		*thread_keyring;

	/* And perhaps: */
	struct audit_context	*audit_context;
	seccomp_t		seccomp;
};

This would, for the most part, be a temporary affair, set up by callers such as
sys_mkdir()/sys_mkdirat() from data held in task_struct.

This information would then be passed into the filesystem and LSM layers so
that files, directories, etc. can be created, opened, deleted, or otherwise
mangled based on these security items, rather than the one in whichever
task_struct is current.
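A user-space model of the proposal, assuming only fsuid/fsgid matter for ownership of the new directory; `model_vfs_mkdir()`, `model_sys_mkdir()` and `struct inode_model` are hypothetical. The filesystem layer takes ownership from the explicit parameter, while the syscall layer snapshots the task's values, so a service such as cachefiles could call the inner function with its own security details and leave the task untouched.

```c
#include <assert.h>
#include <sys/types.h>

/* Cut-down version of the proposed parameter block. */
struct security {
	uid_t fsuid;
	gid_t fsgid;
};

struct task {
	struct security sec;
};

struct inode_model {
	uid_t uid;
	gid_t gid;
};

struct task *current; /* stand-in for the kernel's current task */

/* Filesystem layer: ownership comes from the explicit parameter only. */
void model_vfs_mkdir(struct inode_model *new_dir,
		     const struct security *security)
{
	new_dir->uid = security->fsuid;
	new_dir->gid = security->fsgid;
}

/* Syscall layer: builds the temporary block from the task's data. */
void model_sys_mkdir(struct inode_model *new_dir)
{
	struct security sec = current->sec;

	model_vfs_mkdir(new_dir, &sec);
}
```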


The reason for doing this would be to support an act-as interface, so that
services such as nfsd and cachefiles could act with different security details
to the ones attached to the task.  This would have a couple of potential
benefits:

 (1) nfsd threads don't have to keep changing their security contexts.

 (2) cachefiles can act on behalf of a process without changing its security
 context.


Note that I/O operations such as read, write and ioctl would *not* be passed
this data as the file struct should contain the relevant security information.
Similarly, page I/O operations would also not need alteration as the VMA
covering the region points to a file struct, which holds the appropriate
security.
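The reasoning here is that a file remembers the credentials it was opened with, so later I/O is decided by those rather than by whoever performs it. A minimal sketch, assuming a toy permission rule (fsuid 0 or the file owner may write); `f_cred`, `model_open()` and `model_may_write()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

struct cred {
	uid_t fsuid;
};

/* Model file: remembers the credentials it was opened with. */
struct file {
	const struct cred *f_cred;
	uid_t owner;
};

void model_open(struct file *f, const struct cred *opener, uid_t owner)
{
	f->f_cred = opener;
	f->owner = owner;
}

/* The check uses f->f_cred; the caller's own cred is deliberately ignored. */
bool model_may_write(const struct file *f, const struct cred *caller)
{
	(void)caller;
	return f->f_cred->fsuid == 0 || f->f_cred->fsuid == f->owner;
}
```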

David

[PATCH 2/3] CRED: Split the task security data and move part of it into struct cred

2007-09-19 Thread David Howells

Move into the cred struct the part of the task security data that defines how a
task acts upon an object.  The part that defines how something acts upon a task
remains attached to the task.

For SELinux this requires some of task_security_struct to be split off into
cred_security_struct which is then attached to struct cred.  Note that the
contents of cred_security_struct may not be changed without the generation of a
new struct cred.

The split is as follows:

 (*) create_sid, keycreate_sid and sockcreate_sid just move across.

 (*) sid is split into victim_sid - which remains - and action_sid - which
 migrates.

 (*) osid, exec_sid and ptrace_sid remain.

victim_sid is the SID used to govern actions upon the task.  action_sid is used
to govern actions made by the task.
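The split can be illustrated with heavily simplified versions of the two structures; only the field names victim_sid, action_sid and create_sid come from the patch, and the rest (the task model and the two accessors) is hypothetical. Pointing the task at an override cred changes the SID used for its own actions, while checks made against the task still see the unchanged victim_sid.

```c
#include <assert.h>

/* Stays attached to the task: how others act upon it. */
struct task_security_struct {
	unsigned int victim_sid;
};

/* Moves into struct cred: how the task acts upon objects. */
struct cred_security_struct {
	unsigned int action_sid;
	unsigned int create_sid;
};

struct task {
	struct task_security_struct tsec;
	const struct cred_security_struct *csec;
};

/* Checks on actions made by the task use action_sid... */
unsigned int sid_for_acting(const struct task *t)
{
	return t->csec->action_sid;
}

/* ...checks on actions upon the task (kill(), ptrace) use victim_sid. */
unsigned int sid_for_target(const struct task *t)
{
	return t->tsec.victim_sid;
}
```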

When accessing the cred_security_struct of another process, RCU read procedures
must be observed.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/cred.h  |1 
 include/linux/security.h  |   34 +++
 kernel/cred.c |7 +
 security/dummy.c  |   11 +
 security/selinux/exports.c|6 
 security/selinux/hooks.c  |  497 +++--
 security/selinux/include/objsec.h |   16 +
 security/selinux/selinuxfs.c  |8 -
 security/selinux/xfrm.c   |6 
 9 files changed, 380 insertions(+), 206 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 22ae610..6c6feec 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -26,6 +26,7 @@ struct cred {
gid_t   gid;/* fsgid as was */
struct rcu_head exterminate;/* cred destroyer */
struct group_info   *group_info;
+   void*security;
 
/* caches for references to the three task keyrings
 * - note that key_ref_t isn't typedef'd at this point, hence the odd
diff --git a/include/linux/security.h b/include/linux/security.h
index 1a15526..e5ed2ea 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -504,6 +504,18 @@ struct request_sock;
  * @file contains the file structure being received.
  * Return 0 if permission is granted.
  *
+ * Security hooks for credential structure operations.
+ *
+ * @cred_dup:
+ * Duplicate the credentials onto a duplicated cred structure.
+ * @cred points to the credentials structure.  cred->security points to the
+ * security struct that was attached to the original cred struct, but it
+ * lacks a reference for the duplication if reference counting is needed.
+ *
+ * @cred_destroy:
+ * Destroy the credentials attached to a cred structure.
+ * @cred points to the credentials structure that is to be destroyed.
+ *
  * Security hooks for task operations.
  *
  * @task_create:
@@ -1257,6 +1269,9 @@ struct security_operations {
struct fown_struct * fown, int sig);
int (*file_receive) (struct file * file);
 
+   int (*cred_dup)(struct cred *cred);
+   void (*cred_destroy)(struct cred *cred);
+
int (*task_create) (unsigned long clone_flags);
int (*task_alloc_security) (struct task_struct * p);
void (*task_free_security) (struct task_struct * p);
@@ -1864,6 +1879,16 @@ static inline int security_file_receive (struct file *file)
return security_ops->file_receive (file);
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return security_ops->cred_dup(cred);
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+   security_ops->cred_destroy(cred);
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return security_ops->task_create (clone_flags);
@@ -2546,6 +2571,15 @@ static inline int security_file_receive (struct file *file)
return 0;
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return 0;
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return 0;
diff --git a/kernel/cred.c b/kernel/cred.c
index e96dafe..6a9dda2 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -92,6 +92,12 @@ struct cred *dup_cred(const struct cred *pcred)
if (likely(cred)) {
*cred = *pcred;
atomic_set(&cred->usage, 1);
+
+   if (security_cred_dup(cred) < 0) {
+   kfree(cred);
+   return NULL;
+   }
+
get_group_info(cred->group_info);
key_get(key_ref_to_ptr(cred->session_keyring));
key_get(key_ref_to_ptr(cred->process_keyring));
@@ -109,6 +115,7 @@ static void put_cred_rcu(struct rcu_head *rcu)
 {
struct cred *cred = container_of(rcu, struct cred, exterminate);
 
+   security_cred_destroy(cred);
put_group_info(cred->group_info

[PATCH 0/3] Introduce credential record

2007-09-19 Thread David Howells



Hi Al, Christoph, Trond, Stephen, Casey,

Here's a set of patches that implement a very basic set of COW credentials.  It
compiles, links and runs for x86_64 with EXT3, (V)FAT, NFS, AFS, SELinux and
keyrings all enabled.  Most other filesystems are disabled, apart from things
like proc.  It is not intended to completely cover the kernel at this point.

The cred struct contains the credentials that the kernel needs to act upon
something or to create something.  Credentials that govern how a task may be
acted upon remain in the task struct.

Because keyrings and effective capabilities can be installed or changed in one
process by another process, they are shadowed by the cred structure rather than
residing there.  Additionally, the session and process keyrings are shared
between all the threads of a process.  The shadowing is performed by
update_current_cred() which is invoked on entry to any system call that might
need it.

A thread's cred struct may be read by that thread without any RCU precautions
as only that thread may replace its own cred struct.  To change a thread's
credentials, dup_cred() should be called to create a new copy, the copy should
be changed, and then set_current_cred() should be called to make it live.  Once
live, it may not be changed as it may then be shared with file descriptors, RPC
calls and other threads.  RCU will be used to dispose of the old structure.
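The copy-on-write rule above can be modelled in user space with a plain reference count. This is a sketch under stated assumptions: RCU is replaced by immediate freeing, the count is a plain int rather than atomic_t, and `put_cred()` is a hypothetical release helper. It shows that modifying a duplicate and installing it never disturbs a holder of the old structure, such as an open file.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

struct cred {
	int usage;   /* reference count (atomic_t in the kernel) */
	uid_t fsuid;
};

struct cred *current_cred; /* stand-in for the current thread's cred */

struct cred *get_cred(struct cred *c)
{
	c->usage++;
	return c;
}

void put_cred(struct cred *c)
{
	if (--c->usage == 0)
		free(c); /* the kernel would defer this via RCU */
}

/* Make a private, modifiable copy with a single reference. */
struct cred *dup_cred(const struct cred *old)
{
	struct cred *c = malloc(sizeof(*c));

	if (c) {
		*c = *old;
		c->usage = 1;
	}
	return c;
}

/* Install new (consuming the caller's ref) and drop the ref on the old one. */
void set_current_cred(struct cred *new)
{
	struct cred *old = current_cred;

	current_cred = new;
	put_cred(old);
}
```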


The three patches are:

 (1) Introduce struct cred and migrate fsuid, fsgid, the groups list and the
 keyrings pointer to it.

 (2) Introduce a security pointer into the cred struct and add LSM hooks to
 duplicate the information pointed to thereby and to free it.

 Make SELinux implement the hooks, splitting out some of the task security
 data to be associated with struct cred instead.

 (3) Migrate the effective capabilities mask into the cred struct.


I plan on adding a fourth patch that will allow the LSM security contents of a
cred struct to be manipulated by the kernel - something that cachefiles and
possibly NFSd will require.

To substitute a temporary set of credentials, the cred struct attached to the
task should be altered, like so:

static struct cred *my_special_cred;

int get_privileged_creds(...)
{
	/* get special privileged creds */
	my_special_cred = dup_cred(current->cred);
	if (!my_special_cred)
		return -ENOMEM;
	change_fsuid(my_special_cred, 123);
	return 0;
}

int do_stuff(...)
{
	struct cred *cred;

	/* rotate in the new creds, saving the old */
	cred = __set_current_cred(get_cred(my_special_cred));

	do_privileged_stuff();

	/* restore the old creds */
	set_current_cred(cred);
	return 0;
}

One thing I'm not certain about is how this should interact with /proc, which
can display some of the stuff in the cred struct.  I think it may be necessary
to have a real cred pointer and an effective cred pointer, with the contents of
/proc coming from the real, but the effective governing what actually goes on.
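The real/effective idea can be sketched as two pointers on the task, assuming only fsuid is of interest; the struct layout and both accessors are hypothetical. A temporary override replaces only the effective pointer, so /proc keeps reporting the task's real credentials while access checks use the override.

```c
#include <assert.h>
#include <sys/types.h>

struct cred {
	uid_t fsuid;
};

struct task {
	const struct cred *real_cred; /* what /proc displays */
	const struct cred *cred;      /* what governs actual access */
};

uid_t proc_status_fsuid(const struct task *t)
{
	return t->real_cred->fsuid;
}

uid_t fsuid_for_access(const struct task *t)
{
	return t->cred->fsuid;
}
```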

David

Re: [PATCH 2/3] CRED: Split the task security data and move part of it into struct cred

2007-09-19 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

>> Move into the cred struct the part of the task security data that defines
>> how a task acts upon an object.  The part that defines how something acts
>> upon a task remains attached to the task.
>
> This seems to me to be an unnatural and inappropriate separation.  Move the
> whole of the security blob into the cred if you must have a cred (which I
> was so glad Linux didn't have after having dealt with it in Solaris)
> rather than having two blobs to deal with.

The separation is necessary for a few reasons:

 (1) The task victimisation context must *not* be changed by a temporary
 override of the action and creation contexts for purposes such as
 cachefiles.

 (2) If the victimisation context is not included in the override cred, then I
 only need one copy of the override cred to do *all* the work for
 cachefiles.  I can share that singular override blob across every task
 that wishes to access the cache.

 (3) If the victimisation context is moved to the override cred, I have to
 create a new context every time I want to apply the override.  This means
 I have to deal with the possibility of OOM at such points.  I could cache
 the contexts, but that's messy - and unnecessary.
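Points (2) and (3) can be sketched as follows, assuming a single override cred built once and shared by every task that uses the cache; the structures, `begin_override()` and `end_override()` are hypothetical. Applying the override only takes a reference and swaps a pointer, so there is no allocation and hence no OOM path, and each task's victim SID is never touched.

```c
#include <assert.h>

struct cred {
	int usage;
	unsigned int action_sid;
	unsigned int create_sid;
};

struct task {
	unsigned int victim_sid; /* stays on the task, never overridden */
	struct cred *cred;
};

/* Built once (e.g. when the cache comes online) and shared thereafter. */
struct cred cachefiles_override = { 1, 100, 100 };

struct cred *begin_override(struct task *t)
{
	struct cred *old = t->cred;

	cachefiles_override.usage++; /* take a ref: no allocation needed */
	t->cred = &cachefiles_override;
	return old;
}

void end_override(struct task *t, struct cred *old)
{
	t->cred = old;
	cachefiles_override.usage--;
}
```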

> If an LSM requires a different treatment between when a task is a subject and
> when it is an object the LSM should handle that itself.

Indeed, but I can help it to do so by providing separate security pointers on
the task struct and the cred struct.

> So put all these fields into one blob and attach them to the cred.

The separation is, I think, the correct thing to do.

> Actually, if you put all these fields in the task blob maybe you
> don't need to do your COW thing at all.

Whilst that is true, one of the purposes of this is to make it easier and
cleaner to effect the override.  Every field in the cred struct potentially
must be overridden.  That's a lot of context to save each time I need to apply
the override and a lot of context to restore each time I want to restore it.

With these patches, all I need to do is to take a ref and swap the cred
pointers with a memory barrier to satisfy the RCU, and then swap them back
again and release the ref.  It's much, much simpler.

Furthermore, with respect to LSM and SELinux, I think I can remove the SELinux
specific knowledge currently present in cachefiles by saying to LSM "give me a
cred for kernel service X".  With SELinux this can do all the transformations
necessary to give me the appropriate action SID and file creation SID without
me needing to know that these concepts exist.  I just apply the cred I'm given
as an override.

With your suggestion, I either have to do a full set of transformations each
time I want to apply the override, or I have to know about SELinux or
whatever's internals.  Your objection to my earlier patch was this very point.

David

Re: [PATCH 3/3] CRED: Move the effective capabilities into the cred struct

2007-09-20 Thread David Howells

Andrew Morgan [EMAIL PROTECTED] wrote:

> OOC If we were to simply drop support for one process changing the
> capabilities of another, would we need this patch?

Well, the patch could be less, but there's still the possibility of a kernel
service wanting to override the capabilities mask.

David

[PATCH 15/22] NFS: Configuration and mount option changes to enable local caching on NFS

2007-09-21 Thread David Howells

Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig|8 
 fs/nfs/client.c   |   14 ++
 fs/nfs/internal.h |2 ++
 fs/nfs/super.c|   40 ++--
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 8ae7eda..ebc7341 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1597,6 +1597,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+   bool "Provide NFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
 config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index f1783b2..0de4db4 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -543,7 +543,8 @@ error:
 /*
  * Create a version 2 or 3 client
  */
-static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data)
+static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data,
+  unsigned int extra_options)
 {
struct nfs_client *clp;
int error, nfsvers = 2;
@@ -580,6 +581,7 @@ static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_dat
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+   server->options = extra_options;
 
/* Start lockd here, before we might error out */
error = nfs_start_lockd(server);
@@ -776,6 +778,7 @@ void nfs_free_server(struct nfs_server *server)
  * - keyed on server and FSID
  */
 struct nfs_server *nfs_create_server(const struct nfs_mount_data *data,
+unsigned extra_options,
 struct nfs_fh *mntfh)
 {
struct nfs_server *server;
@@ -787,7 +790,7 @@ struct nfs_server *nfs_create_server(const struct nfs_mount_data *data,
return ERR_PTR(-ENOMEM);
 
/* Get a client representation */
-   error = nfs_init_server(server, data);
+   error = nfs_init_server(server, data, extra_options);
if (error < 0)
goto error;
 
@@ -911,7 +914,8 @@ error:
  * Create a version 4 volume record
  */
 static int nfs4_init_server(struct nfs_server *server,
-   const struct nfs4_mount_data *data, rpc_authflavor_t authflavour)
+   const struct nfs4_mount_data *data, rpc_authflavor_t authflavour,
+   unsigned int extra_options)
 {
int error;
 
@@ -930,6 +934,7 @@ static int nfs4_init_server(struct nfs_server *server,
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+   server->options = extra_options;
 
error = nfs_init_server_rpcclient(server, authflavour);
 
@@ -948,6 +953,7 @@ struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *data,
  const char *mntpath,
  const char *ip_addr,
  rpc_authflavor_t authflavour,
+ unsigned int extra_options,
  struct nfs_fh *mntfh)
 {
struct nfs_fattr fattr;
@@ -967,7 +973,7 @@ struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *data,
goto error;
 
/* set up the general RPC client */
-   error = nfs4_init_server(server, data, authflavour);
+   error = nfs4_init_server(server, data, authflavour, extra_options);
if (error < 0)
goto error;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 76cf55d..34ef000 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -33,6 +33,7 @@ extern struct rpc_program nfs_program;
 extern void nfs_put_client(struct nfs_client *);
 extern struct nfs_client *nfs_find_client(const struct sockaddr_in *, int);
 extern struct nfs_server *nfs_create_server(const struct nfs_mount_data *,
+   unsigned int,
struct nfs_fh *);
 extern struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *,
 const char *,
@@ -40,6 +41,7 @@ extern struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *,
 const char *,
 const char *,
 rpc_authflavor_t

[PATCH 14/22] NFS: Use local caching

2007-09-21 Thread David Howells

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

http://people.redhat.com/steved/fscache/util-linux/

To mount an NFS filesystem to use caching, add an fsc option to the mount:

mount warthog:/ /a -o fsc
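How an "fsc" option might end up in the extra_options bitmask handed to nfs_create_server() is sketched below. This is a user-space model only: the flag name `NFS_OPTION_FSCACHE` and `parse_extra_options()` are assumptions made for illustration, not the patch's actual parser. The option string is scanned for an exact "fsc" token, so "fscx" does not match.

```c
#include <assert.h>
#include <string.h>

#define NFS_OPTION_FSCACHE 0x1u /* assumed flag bit for "fsc" */

/* Scan a comma-separated mount option string for an exact "fsc" token. */
unsigned int parse_extra_options(const char *opts)
{
	unsigned int extra = 0;
	const char *p = opts;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == 3 && strncmp(p, "fsc", 3) == 0)
			extra |= NFS_OPTION_FSCACHE;
		p += len;
		if (*p == ',')
			p++;
	}
	return extra;
}
```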

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile   |1 
 fs/nfs/client.c   |5 +
 fs/nfs/file.c |   51 ++
 fs/nfs/fscache-def.c  |  288 +++
 fs/nfs/fscache.c  |  372 +
 fs/nfs/fscache.h  |  144 +
 fs/nfs/inode.c|   48 +-
 fs/nfs/read.c |   28 +++
 fs/nfs/sysctl.c   |   44 +
 include/linux/nfs_fs.h|8 +
 include/linux/nfs_fs_sb.h |7 +
 11 files changed, 986 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index b55cb23..07c9345 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
 nfs-objs   := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a49f9fe..f1783b2 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -41,6 +41,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY	NFSDBG_CLIENT
 
@@ -137,6 +138,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+   nfs_fscache_get_client_cookie(clp);
+
return clp;
 
 error_3:
@@ -168,6 +171,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
nfs4_shutdown_client(clp);
 
+   nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 579cf8a..640179b 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -34,6 +34,8 @@
 
 #include delegation.h
 #include iostat.h
+#include internal.h
+#include fscache.h
 
 #define NFSDBG_FACILITYNFSDBG_FILE
 
@@ -54,6 +56,12 @@ static int nfs_check_flags(int flags);
 static int nfs_lock(struct file *filp, int cmd, struct file_lock *fl);
 static int nfs_flock(struct file *filp, int cmd, struct file_lock *fl);
 static int nfs_setlease(struct file *file, long arg, struct file_lock **fl);
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+   .fault  = filemap_fault,
+   .page_mkwrite   = nfs_file_page_mkwrite,
+};
 
 const struct file_operations nfs_file_operations = {
.llseek = nfs_file_llseek,
@@ -259,6 +267,9 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
status = nfs_revalidate_mapping(inode, file->f_mapping);
if (!status)
status = generic_file_mmap(file, vma);
+   if (!status)
+   vma->vm_ops = &nfs_fs_vm_operations;
+
return status;
 }
 
@@ -311,22 +322,48 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse
return status;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page->mapping->host, page);
+
+   nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
+/*
+ * Release the private state associated with a page
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
/* If PagePrivate() is set, then the page is not freeable */
-   return 0;
+   if (PagePrivate(page))
+   return 0;
+   return nfs_fscache_release_page(page, gfp);
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
+   wait_on_page_fscache_write(page
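
The release rule in nfs_release_page() above can be modelled in userspace as a minimal sketch. This is not NFS code: the flag values, struct model_page and model_fscache_release_page() are hypothetical, and the stand-in for nfs_fscache_release_page() simply assumes the cache always agrees to let go of the page.

```c
/* Hypothetical flag bits for this model only; the real PG_private and
 * PG_fscache bit numbers are defined in include/linux/page-flags.h. */
#define MODEL_PG_PRIVATE (1UL << 0)
#define MODEL_PG_FSCACHE (1UL << 1)

struct model_page {
	unsigned long flags;
};

/* Hypothetical stand-in for nfs_fscache_release_page(): assume the cache
 * always drops its interest in the page, clearing the fscache mark. */
static int model_fscache_release_page(struct model_page *page)
{
	page->flags &= ~MODEL_PG_FSCACHE;
	return 1;
}

/* Same decision order as nfs_release_page() in the patch: a page carrying
 * NFS private state may not be released (return 0); otherwise the cache
 * layer gets to decide. */
int model_nfs_release_page(struct model_page *page)
{
	if (page->flags & MODEL_PG_PRIVATE)
		return 0;
	return model_fscache_release_page(page);
}
```

The ordering matters: NFS private state is checked first, so the cache is never asked to drop a page that NFS still considers busy.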

[PATCH 10/22] CacheFiles: Add a hook to write a single page of data to an inode

2007-09-21 Thread David Howells

Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the prepare_write() and
commit_write() address_space operations to bound a copy directly into the page
cache.

Hook the Ext2 and Ext3 operations to the generic implementation.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/ext2/inode.c    |    2 +
 fs/ext3/inode.c    |    3 ++
 include/linux/fs.h |    7 
 mm/filemap.c       |   95 
 4 files changed, 107 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0079b2c..b3e4b50 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -695,6 +695,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 const struct address_space_operations ext2_aops_xip = {
@@ -713,6 +714,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 /*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index de4e316..93809eb 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1713,6 +1713,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_writeback_aops = {
@@ -1727,6 +1728,7 @@ static const struct address_space_operations ext3_writeback_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_journalled_aops = {
@@ -1740,6 +1742,7 @@ static const struct address_space_operations ext3_journalled_aops = {
.bmap   = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage= ext3_releasepage,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bf35441..38f67ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -434,6 +434,11 @@ struct address_space_operations {
int (*migratepage) (struct address_space *,
struct page *, struct page *);
int (*launder_page) (struct page *);
+   /* write the contents of the source page over the page at the specified
+    * index in the target address space (the source page does not need to
+    * be related to the target address space) */
+   int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
 };
 
 struct backing_dev_info;
@@ -1669,6 +1674,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+   pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern void do_generic_mapping_read(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a61923..21aeee9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2016,6 +2016,101 @@ zero_length_segment:
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping - The address space of the target inode
+ * @index - The target page in the target inode to fill
+ * @source - The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified.  Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association with the file or the
+ * target page offset.
+ */
+int
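
The EOF rule in the generic_file_buffered_write_one_page() comment above (write only the part of the page that lies before EOF, never extend the file) can be illustrated with a userspace sketch. The in-memory struct model_file, the 4096-byte page size and the choice to write nothing for a page wholly beyond EOF are assumptions of this model, not details taken from the truncated patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* Hypothetical in-memory file: a backing buffer and an EOF position that
 * the write below never moves. */
struct model_file {
	unsigned char *data; /* must hold at least `size` bytes */
	uint64_t size;       /* current EOF */
};

/* Copy one page from @source over page @index of @file.  Only the part
 * that lies before EOF is written; returns the number of bytes copied. */
size_t model_write_one_page(struct model_file *file, uint64_t index,
			    const unsigned char *source)
{
	uint64_t pos = index * MODEL_PAGE_SIZE;
	size_t len = MODEL_PAGE_SIZE;

	if (pos >= file->size)
		return 0; /* assumption: a page wholly past EOF is skipped */
	if (file->size - pos < len)
		len = (size_t)(file->size - pos); /* clamp at EOF, no extension */
	memcpy(file->data + pos, source, len);
	return len;
}
```

For a 4196-byte file, writing page 0 copies a full 4096 bytes, while writing page 1 copies only the first 100 bytes and leaves the file size unchanged.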

[PATCH 20/22] AFS: Implement shared-writable mmap

2007-09-21 Thread David Howells

Implement shared-writable mmap for AFS.

The key with which to access the file is obtained from the VMA at the point
where the PTE is made writable by the page_mkwrite() VMA op and cached in the
affected page.

If there's an outstanding write on the page made with a different key, then
page_mkwrite() will flush it before attaching a record of the new key.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/file.c     |   20 +++-
 fs/afs/internal.h |    1 +
 fs/afs/write.c    |   35 +++
 3 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 525f7c5..1323df4 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -22,6 +22,7 @@ static int afs_readpage(struct file *file, struct page *page);
 static void afs_invalidatepage(struct page *page, unsigned long offset);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 static int afs_launder_page(struct page *page);
+static int afs_mmap(struct file *file, struct vm_area_struct *vma);
 
 const struct file_operations afs_file_operations = {
.open   = afs_open,
@@ -31,7 +32,7 @@ const struct file_operations afs_file_operations = {
.write  = do_sync_write,
.aio_read   = generic_file_aio_read,
.aio_write  = afs_file_write,
-   .mmap   = generic_file_readonly_mmap,
+   .mmap   = afs_mmap,
.splice_read= generic_file_splice_read,
.fsync  = afs_fsync,
.lock   = afs_lock,
@@ -56,6 +57,11 @@ const struct address_space_operations afs_fs_aops = {
.writepages = afs_writepages,
 };
 
+static struct vm_operations_struct afs_file_vm_ops = {
+   .fault  = filemap_fault,
+   .page_mkwrite   = afs_page_mkwrite,
+};
+
 /*
  * open an AFS file or directory and attach a key to it
  */
@@ -295,3 +301,15 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
_leave(" = 0");
return 0;
 }
+
+/*
+ * memory map part of an AFS file
+ */
+static int afs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   _enter();
+
+   file_accessed(file);
+   vma->vm_ops = &afs_file_vm_ops;
+   return 0;
+}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e1bcce0..12afccc 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -743,6 +743,7 @@ extern ssize_t afs_file_write(struct kiocb *, const struct iovec *,
  unsigned long, loff_t);
 extern int afs_writeback_all(struct afs_vnode *);
 extern int afs_fsync(struct file *, struct dentry *, int);
+extern int afs_page_mkwrite(struct vm_area_struct *, struct page *);
 
 
 /*/
diff --git a/fs/afs/write.c b/fs/afs/write.c
index ac621e8..dd471f0 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -155,6 +155,8 @@ static int afs_prepare_page(struct afs_vnode *vnode, struct page *page,
  * prepare to perform part of a write to a page
  * - the caller holds the page locked, preventing it from being written out or
  *   modified by anyone else
+ * - may be called from afs_page_mkwrite() to set up a page for modification
+ *   through shared-writable mmap
  */
 int afs_prepare_write(struct file *file, struct page *page,
  unsigned offset, unsigned to)
@@ -833,3 +835,36 @@ int afs_fsync(struct file *file, struct dentry *dentry, int datasync)
_leave(" = %d", ret);
return ret;
 }
+
+/*
+ * notification that a previously read-only page is about to become writable
+ * - if it returns an error, the caller will deliver a bus error signal
+ *
+ * we use this to make a record of the key with which the writeback should be
+ * performed and to flush any outstanding writes made with a different key
+ *
+ * the key to be used is attached to the struct file pinned by the VMA
+ */
+int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+   struct afs_vnode *vnode = AFS_FS_I(vma->vm_file->f_mapping->host);
+   struct key *key = vma->vm_file->private_data;
+   int ret;
+
+   _enter("{{%x:%u},%x},{%lx}",
+  vnode->fid.vid, vnode->fid.vnode, key_serial(key), page->index);
+
+   do {
+   lock_page(page);
+   if (page->mapping == vma->vm_file->f_mapping)
+   ret = afs_prepare_write(vma->vm_file, page, 0,
+   PAGE_SIZE);
+   else
+   ret = 0; /* seems there was interference - let the
+ * caller deal with it */
+   unlock_page(page);
+   } while (ret == AOP_TRUNCATED_PAGE);
+
+   _leave(" = %d", ret);
+   return ret;
+}
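
The key bookkeeping described in this patch's commit message (record the writer's key when a page becomes writable, and flush outstanding data written under a different key first) can be modelled as below. This is a hypothetical sketch: keys are plain integers, and model_flush() merely counts writebacks instead of issuing them.

```c
/* Hypothetical model: each page remembers the key (an integer id here) of
 * the writer whose modifications are still unflushed; 0 means clean. */
struct model_page {
	int dirty_key;
	int flushes; /* number of simulated writebacks, for illustration */
};

/* Simulated writeback of the outstanding data under the recorded key. */
static void model_flush(struct model_page *page)
{
	page->flushes++;
	page->dirty_key = 0;
}

/* Called when a read-only mapping of @page is about to become writable by
 * a process acting with @key.  Data written under a different key is
 * flushed first, so each writeback carries only one key's modifications. */
int model_page_mkwrite(struct model_page *page, int key)
{
	if (page->dirty_key != 0 && page->dirty_key != key)
		model_flush(page);
	page->dirty_key = key;
	return 0;
}
```

A second write fault under the same key costs nothing extra; only a change of key forces the flush.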


[PATCH 22/22] FS-Cache: Make kAFS use FS-Cache

2007-09-21 Thread David Howells

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.  The kAFS filesystem will use caching
automatically if it's available.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig         |    8 +
 fs/afs/Makefile    |    3 
 fs/afs/cache.c     |  505 ++--
 fs/afs/cache.h     |   15 --
 fs/afs/cell.c      |   16 +-
 fs/afs/file.c      |  212 +-
 fs/afs/fsclient.c  |   32 ++-
 fs/afs/inode.c     |   25 +--
 fs/afs/internal.h  |   53 ++---
 fs/afs/main.c      |   27 +--
 fs/afs/mntpt.c     |    4 
 fs/afs/rxrpc.c     |    1 
 fs/afs/vlclient.c  |    2 
 fs/afs/vlocation.c |   23 +-
 fs/afs/volume.c    |   14 -
 fs/afs/write.c     |    6 -
 16 files changed, 563 insertions(+), 383 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index ebc7341..158a8d8 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2059,6 +2059,14 @@ config AFS_DEBUG
 
  If unsure, say N.
 
+config AFS_FSCACHE
+   bool "Provide AFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want AFS data to be cached locally on disk through
+ the generic filesystem cache manager
+
 config 9P_FS
tristate "Plan 9 Resource Sharing Support (9P2000) (Experimental)"
depends on INET && NET_9P && EXPERIMENTAL
diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index a666710..4f64b95 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -2,7 +2,10 @@
 # Makefile for Red Hat Linux AFS client.
 #
 
+afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
+
 kafs-objs := \
+   $(afs-cache-y) \
callback.o \
cell.o \
cmservice.o \
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index de0d7de..a5d6a70 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -9,248 +9,399 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-   const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
-   .name   = "cell_ix",
-   .data_size  = sizeof(struct afs_cache_cell),
-   .keys[0]= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-   .match  = afs_cell_cache_match,
-   .update = afs_cell_cache_update,
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include "internal.h"
+
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+  const void *buffer,
+  uint16_t buflen);
+
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+   const void *buffer,
+   uint16_t buflen);
+
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+void *buffer, uint16_t buflen);
+
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+   const void *buffer,
+   uint16_t buflen);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+static struct fscache_netfs_operations afs_cache_ops = {
+};
+
+struct fscache_netfs afs_cache_netfs = {
+   .name   = "afs",
+   .version = 0,
+   .ops    = &afs_cache_ops,
+};
+
+struct fscache_cookie_def afs_cell_cache_index_def = {
+   .name   = "AFS.cell",
+   .type   = FSCACHE_COOKIE_TYPE_INDEX,
+   .get_key= afs_cell_cache_get_key,
+   .get_aux
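
The cookie definitions above hand FS-Cache a get_key callback with the parameter list shown for afs_cell_cache_get_key(). Its body is not in this excerpt, so the following is only a sketch of what such a callback might look like, assuming it copies the index key into @buffer and returns the key length (0 if the buffer is too small). struct model_cell and its name field are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the netfs object attached to a cookie. */
struct model_cell {
	const char *name;
};

/* get_key-style callback with the same parameters as
 * afs_cell_cache_get_key().  Assumption: the key is the cell name without
 * a trailing NUL, and 0 signals that it does not fit in @buflen bytes. */
uint16_t model_cell_get_key(const void *cookie_netfs_data,
			    void *buffer, uint16_t buflen)
{
	const struct model_cell *cell = cookie_netfs_data;
	size_t klen = strlen(cell->name);

	if (klen > buflen)
		return 0;
	memcpy(buffer, cell->name, klen);
	return (uint16_t)klen;
}
```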

Re: [PATCH 00/22] Introduce credential record

2007-09-21 Thread David Howells


This patch set is available for download as a tarball from:

http://people.redhat.com/~dhowells/nfs/nfs+fscache-23.tar.bz2

David

[PATCH 06/22] FS-Cache: Recruit a couple of page flags for cache management

2007-09-21 Thread David Howells

Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_owner_priv_2)

 The marked page is backed by a local cache and is pinning resources in the
 cache driver.

 (2) PG_fscache_write (PG_owner_priv_3)

 The marked page is being written to the local cache.  The page may not be
 modified whilst this is in progress.

If PG_fscache is set, then things that checked for PG_private will now also
check for that.  This includes things like truncation and page invalidation.
The function page_has_private() has been added to detect this.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/splice.c                |    2 +-
 include/linux/page-flags.h |   30 +-
 include/linux/pagemap.h    |   11 +++
 mm/filemap.c               |   16 
 mm/migrate.c               |    2 +-
 mm/page_alloc.c            |    3 +++
 mm/readahead.c             |    9 +
 mm/swap.c                  |    4 ++--
 mm/swap_state.c            |    4 ++--
 mm/truncate.c              |   10 +-
 mm/vmscan.c                |    2 +-
 11 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index ceb1f07..1a8b80c 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 */
wait_on_page_writeback(page);
 
-   if (PagePrivate(page))
+   if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..eaf9854 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,19 +83,24 @@
 #define PG_private 11  /* If pagecache, has fs-private data */
 
 #define PG_writeback   12  /* Page is under writeback */
+#define PG_owner_priv_2    13  /* Owner use. If pagecache, fs may use */
 #define PG_compound        14  /* Part of a compound page */
 #define PG_swapcache       15  /* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk    16  /* Has blocks allocated on-disk */
 #define PG_reclaim         17  /* To be reclaimed asap */
+#define PG_owner_priv_3    18  /* Owner use. If pagecache, fs may use */
 #define PG_buddy           19  /* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2/3 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache PG_owner_priv_2 /* Backed by local cache */
+#define PG_fscache_write   PG_owner_priv_3 /* Writing to local cache */
+
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -199,6 +204,18 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,  \
&(page)->flags)
 
+#define PageFsCache(page)          test_bit(PG_fscache, &(page)->flags)
+#define SetPageFsCache(page)       set_bit(PG_fscache, &(page)->flags)
+#define ClearPageFsCache(page)     clear_bit(PG_fscache, &(page)->flags)
+#define TestSetPageFsCache(page)   test_and_set_bit(PG_fscache, &(page)->flags)
+#define TestClearPageFsCache(page) test_and_clear_bit(PG_fscache, &(page)->flags)
+
+#define PageFsCacheWrite(page)          test_bit(PG_fscache_write, &(page)->flags)
+#define SetPageFsCacheWrite(page)       set_bit(PG_fscache_write, &(page)->flags)
+#define ClearPageFsCacheWrite(page)     clear_bit(PG_fscache_write, &(page)->flags)
+#define TestSetPageFsCacheWrite(page)   test_and_set_bit(PG_fscache_write, &(page)->flags)
+#define TestClearPageFsCacheWrite(page) test_and_clear_bit(PG_fscache_write, &(page)->flags)
+
 #define PageBuddy(page)        test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)   __set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
@@ -272,4 +289,15 @@ static inline void set_page_writeback(struct page *page)
test_set_page_writeback(page);
 }
 
+/**
+ * page_has_private - Determine if page has private stuff
+ * @page: The page to be checked
+ *
+ * Determine if a page has private stuff, indicating that release routines
+ * should be invoked upon it.
+ */
+#define page_has_private(page)                 \
+   ((page)->flags & ((1 << PG_private) |       \
+                     (1 << PG_fscache)))
+
 #endif /* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8a83537..d1049b6 100644
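
The page_has_private() macro added above is a single mask test over two bits. A userspace rendition, using the bit numbers from the page-flags.h hunk (PG_private is 11, and PG_fscache aliases PG_owner_priv_2, which is 13), might look like this. The MODEL_ names are hypothetical, and 1UL is used where the patch uses a plain int constant, which is equivalent for bits this low.

```c
#include <stdbool.h>

/* Hypothetical names; bit numbers copied from the page-flags.h hunk. */
enum {
	MODEL_PG_private = 11,
	MODEL_PG_fscache = 13, /* alias of PG_owner_priv_2 */
};

/* True if either bit is set, meaning release/invalidate routines should be
 * invoked on the page. */
bool model_page_has_private(unsigned long flags)
{
	return (flags & ((1UL << MODEL_PG_private) |
			 (1UL << MODEL_PG_fscache))) != 0;
}
```

A page marked only PG_writeback (bit 12) is not reported, which is why call sites that previously tested PagePrivate() can switch to page_has_private() without picking up unrelated flags.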

[PATCH 05/22] FS-Cache: Release page-private after failed readahead

2007-09-21 Thread David Howells

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 39bf45d..12d1378 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -15,6 +15,7 @@
 #include <linux/backing-dev.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
+#include <linux/buffer_head.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+struct page *page)
+{
+   if (PagePrivate(page)) {
+   if (TestSetPageLocked(page))
+   BUG();
+   page->mapping = mapping;
+   do_invalidatepage(page, 0);
+   page->mapping = NULL;
+   unlock_page(page);
+   }
+   page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+   struct page *victim;
+
+   while (!list_empty(pages)) {
+   victim = list_to_page(pages);
+   list_del(&victim->lru);
+   read_cache_pages_invalidate_page(mapping, victim);
+   }
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -74,14 +110,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
-   page_cache_release(page);
+   read_cache_pages_invalidate_page(mapping, page);
continue;
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
__pagevec_lru_add(&lru_pvec);
if (ret) {
-   put_pages_list(pages);
+   read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);
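
The cleanup rule that read_cache_pages() gains in this patch (once the filler fails, each page still on the list gets a chance to drop its private state before its reference is released) can be sketched as follows. This is a hypothetical model: the filler is simulated by a per-page result code, do_invalidatepage() is reduced to setting a flag, and the add_to_page_cache() failure path is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page on the readahead list. */
struct model_page {
	int fill_result;  /* what the simulated filler returns for this page */
	bool has_private; /* e.g. marked by a netfs as cached locally */
	bool invalidated; /* the fs was given a chance to clean up */
	bool released;    /* our reference was dropped */
};

/* Counterpart of read_cache_pages_invalidate_page(): let the fs drop its
 * private state first, then release the page. */
static void model_invalidate_page(struct model_page *page)
{
	if (page->has_private)
		page->invalidated = true; /* stands in for do_invalidatepage() */
	page->released = true;
}

/* Hand pages to the simulated filler in order.  On the first error, every
 * page not yet handed over is invalidated and released, and the error is
 * returned; otherwise 0 is returned. */
int model_read_cache_pages(struct model_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = pages[i].fill_result; /* simulated filler(data, page) */

		if (ret) {
			for (size_t j = i + 1; j < n; j++)
				model_invalidate_page(&pages[j]);
			return ret;
		}
	}
	return 0;
}
```

Before the patch, the remaining pages were simply freed with put_pages_list(), which gave a netfs such as NFS no opportunity to release the cache resources it had attached to them.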


[PATCH 02/22] CRED: Split the task security data and move part of it into struct cred

2007-09-21 Thread David Howells

Move into the cred struct the part of the task security data that defines how a
task acts upon an object.  The part that defines how something acts upon a task
remains attached to the task.

For SELinux this requires some of task_security_struct to be split off into
cred_security_struct which is then attached to struct cred.  Note that the
contents of cred_security_struct may not be changed without the generation of a
new struct cred.

The split is as follows:

 (*) create_sid, keycreate_sid and sockcreate_sid just move across.

 (*) sid is split into victim_sid - which remains - and action_sid - which
 migrates.

 (*) osid, exec_sid and ptrace_sid remain.

victim_sid is the SID used to govern actions upon the task.  action_sid is used
to govern actions made by the task.

When accessing the cred_security_struct of another process, RCU read procedures
must be observed.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/cred.h              |    1 
 include/linux/security.h          |   33 ++
 kernel/cred.c                     |    7 +
 security/dummy.c                  |   11 +
 security/selinux/exports.c        |    6 
 security/selinux/hooks.c          |  497 +++--
 security/selinux/include/objsec.h |   16 +
 security/selinux/selinuxfs.c      |    8 -
 security/selinux/xfrm.c           |    6 
 9 files changed, 379 insertions(+), 206 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index f3d98a8..98d5279 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -26,6 +26,7 @@ struct cred {
gid_t   gid;/* fsgid as was */
struct rcu_head exterminate;/* cred destroyer */
struct group_info   *group_info;
+   void            *security;
 
/* caches for references to the three task keyrings
 * - note that key_ref_t isn't typedef'd at this point, hence the odd
diff --git a/include/linux/security.h b/include/linux/security.h
index 1a15526..74cc204 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -504,6 +504,17 @@ struct request_sock;
  * @file contains the file structure being received.
  * Return 0 if permission is granted.
  *
+ * Security hooks for credential structure operations.
+ *
+ * @cred_dup:
+ * Duplicate the credentials onto a duplicated cred structure.
+ * @cred points to the credentials structure.  cred->security points to the
+ * security struct that was attached to the original cred struct, but it
+ * lacks a reference for the duplication if reference counting is needed.
+ * @cred_destroy:
+ * Destroy the credentials attached to a cred structure.
+ * @cred points to the credentials structure that is to be destroyed.
+ *
  * Security hooks for task operations.
  *
  * @task_create:
@@ -1257,6 +1268,9 @@ struct security_operations {
struct fown_struct * fown, int sig);
int (*file_receive) (struct file * file);
 
+   int (*cred_dup)(struct cred *cred);
+   void (*cred_destroy)(struct cred *cred);
+
int (*task_create) (unsigned long clone_flags);
int (*task_alloc_security) (struct task_struct * p);
void (*task_free_security) (struct task_struct * p);
@@ -1864,6 +1878,16 @@ static inline int security_file_receive (struct file *file)
return security_ops->file_receive (file);
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return security_ops->cred_dup(cred);
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+   return security_ops->cred_destroy(cred);
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return security_ops-task_create (clone_flags);
@@ -2546,6 +2570,15 @@ static inline int security_file_receive (struct file *file)
return 0;
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return 0;
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return 0;
diff --git a/kernel/cred.c b/kernel/cred.c
index 4710b60..5b827cb 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -92,6 +92,12 @@ struct cred *dup_cred(const struct cred *pcred)
if (likely(cred)) {
*cred = *pcred;
atomic_set(&cred->usage, 1);
+
+   if (security_cred_dup(cred) < 0) {
+   kfree(cred);
+   return NULL;
+   }
+
get_group_info(cred->group_info);
key_get(key_ref_to_ptr(cred->session_keyring));
key_get(key_ref_to_ptr(cred->process_keyring));
@@ -109,6 +115,7 @@ static void put_cred_rcu(struct rcu_head *rcu)
 {
struct cred *cred = container_of(rcu, struct cred, exterminate);
 
+   security_cred_destroy(cred);
put_group_info(cred->group_info);
key_ref_put
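
The duplication and teardown order that this patch gives dup_cred() and put_cred_rcu() can be modelled in userspace as below. It is a simplified sketch under stated assumptions: struct model_cred keeps only a usage count and the security pointer, the hypothetical hooks deep-copy an int-sized blob rather than taking a reference, and RCU deferral is omitted.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified credential record: a usage count and
 * an opaque security blob, like the patch's new "void *security" member. */
struct model_cred {
	int usage;
	void *security;
};

/* Hypothetical cred_dup hook: give the new cred its own copy of the blob.
 * On entry cred->security still points at the original's blob. */
static int model_cred_dup(struct model_cred *cred)
{
	int *copy = malloc(sizeof *copy);

	if (!copy)
		return -1;
	*copy = *(const int *)cred->security;
	cred->security = copy;
	return 0;
}

/* Hypothetical cred_destroy hook: free this cred's blob. */
static void model_cred_destroy(struct model_cred *cred)
{
	free(cred->security);
	cred->security = NULL;
}

/* Same shape as dup_cred() in the patch: copy the record, reset the usage
 * count, then let the security module duplicate its data, undoing the
 * allocation if that fails. */
struct model_cred *model_dup_cred(const struct model_cred *pcred)
{
	struct model_cred *cred = malloc(sizeof *cred);

	if (cred) {
		*cred = *pcred;
		cred->usage = 1;
		if (model_cred_dup(cred) < 0) {
			free(cred);
			return NULL;
		}
	}
	return cred;
}

/* Drop a reference; the security hook runs before the record is freed. */
void model_put_cred(struct model_cred *cred)
{
	if (--cred->usage == 0) {
		model_cred_destroy(cred);
		free(cred);
	}
}
```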

Re: [PATCH 10/22] CacheFiles: Add a hook to write a single page of data to an inode

2007-09-21 Thread David Howells

Trond Myklebust [EMAIL PROTECTED] wrote:

 So why do you need a new address space operation? AFAICS the generic
 implementation will work for pretty much everyone who supports the
 existing prepare_write()/commit_write().

Because Christoph decreed that I wasn't allowed to call prepare_write() and
commit_write() directly.  It's possible that the method should be in the
inode_operations rather than on the address space.

 Furthermore, you don't appear to supply any alternative optimised
 implementations...

Optimised in what fashion?

David

Re: [PATCH 11/22] CacheFiles: Permit the page lock state to be monitored

2007-09-21 Thread David Howells


Trond Myklebust [EMAIL PROTECTED] wrote:

  This is used by CacheFiles to detect read completion on a page in the
  backing filesystem so that it can then copy the data to the waiting netfs
  page.
 
 Won't it in any case want to lock the page too?

No.  Why would it?  All it wants to do is to read the page (copying it to the
netfs's page), assuming it becomes PG_uptodate.

 That would be the only way to ensure that the page is still mapped into the
 address space when you're writing it out...

I don't understand what you're getting at.  Write the page out where?  We've
just read it in from the cache, so why would we be writing it back out?

David

Re: [PATCH 14/22] NFS: Use local caching

2007-09-21 Thread David Howells

David Howells [EMAIL PROTECTED] wrote:

 Peter Staubach [EMAIL PROTECTED] wrote:
 
  Did I miss the section where the modified semantics about which
  mounted file systems can use the cache and which ones can not
  was implemented?
 
 Yes.

fs/nfs/super.c:

    case Opt_sharecache:
        mnt->flags &= ~NFS_MOUNT_UNSHARED;
        break;
    case Opt_nosharecache:
        mnt->flags |= NFS_MOUNT_UNSHARED;
        mnt->options &= ~NFS_OPTION_FSCACHE;
        break;
    case Opt_fscache:
        /* sharing is mandatory with fscache */
        mnt->options |= NFS_OPTION_FSCACHE;
        mnt->flags &= ~NFS_MOUNT_UNSHARED;
        break;
    case Opt_nofscache:
        mnt->options &= ~NFS_OPTION_FSCACHE;
        break;

Hmmm...  Actually, I'm not sure this is sufficient.

David
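
To see how the four mount options quoted above interact, the switch can be replayed in userspace. In this sketch the flag values and struct model_mnt are hypothetical; only the bit manipulations mirror the fs/nfs/super.c excerpt. Because fscache and nosharecache each undo the other's effect, whichever of the two appears last in the option string wins.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values for this model; the real NFS_MOUNT_UNSHARED and
 * NFS_OPTION_FSCACHE definitions live in the NFS headers. */
#define MODEL_MOUNT_UNSHARED 0x1u
#define MODEL_OPTION_FSCACHE 0x1u

struct model_mnt {
	unsigned int flags;
	unsigned int options;
};

/* Apply one option with the same bit operations as the quoted switch.
 * Returns false for an option this model does not know. */
bool model_apply_option(struct model_mnt *mnt, const char *opt)
{
	if (strcmp(opt, "sharecache") == 0) {
		mnt->flags &= ~MODEL_MOUNT_UNSHARED;
	} else if (strcmp(opt, "nosharecache") == 0) {
		mnt->flags |= MODEL_MOUNT_UNSHARED;
		mnt->options &= ~MODEL_OPTION_FSCACHE;
	} else if (strcmp(opt, "fscache") == 0) {
		/* sharing is mandatory with fscache */
		mnt->options |= MODEL_OPTION_FSCACHE;
		mnt->flags &= ~MODEL_MOUNT_UNSHARED;
	} else if (strcmp(opt, "nofscache") == 0) {
		mnt->options &= ~MODEL_OPTION_FSCACHE;
	} else {
		return false;
	}
	return true;
}
```

So "fscache,nosharecache" yields an unshared mount without caching, while "nosharecache,fscache" yields a shared, cached mount.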

[PATCH 03/24] CRED: Alter security_task_getsecid() and similar to return both task SIDs

2007-09-26 Thread David Howells

Alter security_task_getsecid(), selinux_get_task_sid() and associated functions
to return both the objective/victim and subjective/action task SIDs.  Either
result may be omitted by passing a NULL result pointer.

Interestingly, AF_NETLINK calls directly into SELinux.  I suspect this is
incorrect; it should probably use security_task_getsecid() instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 drivers/usb/core/devio.c          |    4 ++--
 include/linux/security.h          |   18 +-
 include/linux/selinux.h           |   15 ++-
 kernel/auditsc.c                  |   14 +++---
 net/netlabel/netlabel_unlabeled.c |    2 +-
 net/netlink/af_netlink.c          |    2 +-
 security/dummy.c                  |    3 ++-
 security/selinux/exports.c        |   22 ++
 security/selinux/hooks.c          |    5 +++--
 security/selinux/xfrm.c           |   12 ++--
 10 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 927a181..1e651d8 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -579,7 +579,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
ps->disc_euid = current->euid;
ps->disccontext = NULL;
ps->ifclaimed = 0;
-   security_task_getsecid(current, &ps->secid);
+   security_task_getsecid(current, NULL, &ps->secid);
smp_wmb();
list_add_tail(&ps->list, &dev->filelist);
file->private_data = ps;
@@ -1069,7 +1069,7 @@ static int proc_do_submiturb(struct dev_state *ps, struct usbdevfs_urb *uurb,
as->pid = get_pid(task_pid(current));
as->uid = current->uid;
as->euid = current->euid;
-   security_task_getsecid(current, &as->secid);
+   security_task_getsecid(current, NULL, &as->secid);
if (!(uurb->endpoint & USB_DIR_IN)) {
if (copy_from_user(as->urb->transfer_buffer, uurb->buffer, as->urb->transfer_buffer_length)) {
free_async(as);
diff --git a/include/linux/security.h b/include/linux/security.h
index 74cc204..093 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -584,7 +584,12 @@ struct request_sock;
  * Return 0 if permission is granted.
  * @task_getsecid:
  * Retrieve the security identifier of the process @p.
- * @p contains the task_struct for the process and place is into @secid.
+ * @p contains the task_struct for the process to be interrogated.  The
+ * security ID of the task itself is placed in [EMAIL PROTECTED], and the
+ * security ID as which the task is currently acting is placed in
+ * [EMAIL PROTECTED]  Either result pointer may be NULL if that particular
+ * ID is not required.
+ *
  * @task_setgroups:
  * Check permission before setting the supplementary group set of the
  * current process.
@@ -1281,7 +1286,8 @@ struct security_operations {
int (*task_setpgid) (struct task_struct * p, pid_t pgid);
int (*task_getpgid) (struct task_struct * p);
int (*task_getsid) (struct task_struct * p);
-   void (*task_getsecid) (struct task_struct * p, u32 * secid);
+   void (*task_getsecid) (struct task_struct * p,
+  u32 * object_secid, u32 * subject_secid);
int (*task_setgroups) (struct group_info *group_info);
int (*task_setnice) (struct task_struct * p, int nice);
int (*task_setioprio) (struct task_struct * p, int ioprio);
@@ -1936,9 +1942,10 @@ static inline int security_task_getsid (struct task_struct *p)
return security_ops->task_getsid (p);
 }
 
-static inline void security_task_getsecid (struct task_struct *p, u32 *secid)
+static inline void security_task_getsecid (struct task_struct *p,
+  u32 *object_secid, u32 *action_secid)
 {
-   security_ops->task_getsecid (p, secid);
+   security_ops->task_getsecid (p, object_secid, action_secid);
 }
 
 static inline int security_task_setgroups (struct group_info *group_info)
@@ -2625,7 +2632,8 @@ static inline int security_task_getsid (struct task_struct *p)
return 0;
 }
 
-static inline void security_task_getsecid (struct task_struct *p, u32 *secid)
+static inline void security_task_getsecid (struct task_struct *p,
+  u32 *object_secid, u32 *action_secid)
 { }
 
 static inline int security_task_setgroups (struct group_info *group_info)
diff --git a/include/linux/selinux.h b/include/linux/selinux.h
index d1b7ca6..26cdec3 100644
--- a/include/linux/selinux.h
+++ b/include/linux/selinux.h
@@ -101,12 +101,16 @@ void selinux_get_ipc_sid(const struct kern_ipc_perm *ipcp, u32 *sid);
 
 /**
  * selinux_get_task_sid - return the SID of task
- * @tsk: the task whose SID will be returned
- * @sid: pointer to security context ID to be filled in.
+ * @tsk: the task to be queried.
+ * @object_sid: optional pointer

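The reworked @task_getsecid comment separates two IDs: the one that governs how others act upon the task and the one the task is currently acting as, and it allows either result pointer to be NULL. The usbdevio callers above pass NULL for the first and keep only the second. A minimal userspace sketch of that contract follows; struct model_task and model_task_getsecid() are hypothetical stand-ins, not the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a task's security data. */
struct model_task {
	uint32_t object_sid;	/* how other processes see this task */
	uint32_t subject_sid;	/* how this task is currently acting */
};

/*
 * Sketch of the two-result contract: fill in only the IDs the
 * caller asked for; a NULL pointer means "not required".
 */
static void model_task_getsecid(const struct model_task *p,
				uint32_t *object_secid,
				uint32_t *subject_secid)
{
	assert(p != NULL);
	if (object_secid)
		*object_secid = p->object_sid;
	if (subject_secid)
		*subject_secid = p->subject_sid;
}
```

A caller that records who performed an operation, as devio.c does for ps->secid, would ask only for the acting ID.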
[PATCH 02/24] CRED: Split the task security data and move part of it into struct cred

2007-09-26 Thread David Howells

Move into the cred struct the part of the task security data that defines how a
task acts upon an object.  The part that defines how something acts upon a task
remains attached to the task.

For SELinux this requires some of task_security_struct to be split off into
cred_security_struct which is then attached to struct cred.  Note that the
contents of cred_security_struct may not be changed without the generation of a
new struct cred.

The split is as follows:

 (*) create_sid, keycreate_sid and sockcreate_sid just move across.

 (*) sid is split into victim_sid - which remains - and action_sid - which
 migrates.

 (*) osid, exec_sid and ptrace_sid remain.

victim_sid is the SID used to govern actions upon the task.  action_sid is used
to govern actions made by the task.

When accessing the cred_security_struct of another process, RCU read procedures
must be observed.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/cred.h  |1 
 include/linux/security.h  |   33 ++
 kernel/cred.c |7 +
 security/dummy.c  |   11 +
 security/selinux/exports.c|6 
 security/selinux/hooks.c  |  497 +++--
 security/selinux/include/objsec.h |   16 +
 security/selinux/selinuxfs.c  |8 -
 security/selinux/xfrm.c   |6 
 9 files changed, 379 insertions(+), 206 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 0cc4400..7e35b2f 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -26,6 +26,7 @@ struct cred {
gid_t   gid;/* fsgid as was */
struct rcu_head exterminate;/* cred destroyer */
struct group_info   *group_info;
+   void*security;
 
/* caches for references to the three task keyrings
 * - note that key_ref_t isn't typedef'd at this point, hence the odd
diff --git a/include/linux/security.h b/include/linux/security.h
index 1a15526..74cc204 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -504,6 +504,17 @@ struct request_sock;
  * @file contains the file structure being received.
  * Return 0 if permission is granted.
  *
+ * Security hooks for credential structure operations.
+ *
+ * @cred_dup:
+ * Duplicate the credentials onto a duplicated cred structure.
+ * @cred points to the credentials structure.  cred->security points to the
+ * security struct that was attached to the original cred struct, but it
+ * lacks a reference for the duplication if reference counting is needed.
+ * @cred_destroy:
+ * Destroy the credentials attached to a cred structure.
+ * @cred points to the credentials structure that is to be destroyed.
+ *
  * Security hooks for task operations.
  *
  * @task_create:
@@ -1257,6 +1268,9 @@ struct security_operations {
struct fown_struct * fown, int sig);
int (*file_receive) (struct file * file);
 
+   int (*cred_dup)(struct cred *cred);
+   void (*cred_destroy)(struct cred *cred);
+
int (*task_create) (unsigned long clone_flags);
int (*task_alloc_security) (struct task_struct * p);
void (*task_free_security) (struct task_struct * p);
@@ -1864,6 +1878,16 @@ static inline int security_file_receive (struct file *file)
return security_ops->file_receive (file);
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return security_ops->cred_dup(cred);
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+   return security_ops->cred_destroy(cred);
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return security_ops->task_create (clone_flags);
@@ -2546,6 +2570,15 @@ static inline int security_file_receive (struct file *file)
return 0;
 }
 
+static inline int security_cred_dup(struct cred *cred)
+{
+   return 0;
+}
+
+static inline void security_cred_destroy(struct cred *cred)
+{
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return 0;
diff --git a/kernel/cred.c b/kernel/cred.c
index 5b56b2b..9868eef 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -94,6 +94,12 @@ struct cred *dup_cred(const struct cred *pcred)
if (likely(cred)) {
*cred = *pcred;
atomic_set(&cred->usage, 1);
+
+   if (security_cred_dup(cred) < 0) {
+   kfree(cred);
+   return NULL;
+   }
+
get_group_info(cred->group_info);
 #ifdef CONFIG_KEYS
key_get(key_ref_to_ptr(cred->session_keyring));
@@ -113,6 +119,7 @@ static void put_cred_rcu(struct rcu_head *rcu)
 {
struct cred *cred = container_of(rcu, struct cred, exterminate);
 
+   security_cred_destroy(cred);
put_group_info(cred->group_info);
key_ref_put(cred->session_keyring);
key_ref_put

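The dup_cred() change makes the copy fail cleanly if the LSM cannot attach its data, and put_cred_rcu() now gives the LSM a chance to drop it. Below is a userspace model of that lifecycle under one assumption: that the security blob is shared and reference counted. The patch's own comment only says the duplicate lacks a reference "if reference counting is needed", so a module could equally allocate a private copy. All model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical LSM data hung off cred->security; assumed refcounted. */
struct model_sec {
	int refs;
	unsigned int action_sid;
};

struct model_cred {
	int usage;
	struct model_sec *security;
};

/* cred_dup stand-in: the copy shares the blob, so take a reference. */
static int model_cred_dup(struct model_cred *cred)
{
	if (!cred->security)
		return -EINVAL;
	cred->security->refs++;
	return 0;
}

/* cred_destroy stand-in: drop the reference, freeing on the last one. */
static void model_cred_destroy(struct model_cred *cred)
{
	struct model_sec *sec = cred->security;

	if (sec) {
		assert(sec->refs > 0);
		if (--sec->refs == 0)
			free(sec);
	}
	cred->security = NULL;
}

/* Same shape as the patched dup_cred(): copy, reset usage, let the LSM fix up. */
static struct model_cred *model_dup_cred(const struct model_cred *pcred)
{
	struct model_cred *cred = malloc(sizeof(*cred));

	if (cred) {
		*cred = *pcred;
		cred->usage = 1;
		if (model_cred_dup(cred) < 0) {
			free(cred);
			return NULL;
		}
	}
	return cred;
}
```

The failure branch matters because the struct copy has already duplicated the security pointer; freeing the copy without calling the destroy hook is only safe because the hook never took its reference.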
[PATCH 04/24] CRED: Move the effective capabilities into the cred struct

2007-09-26 Thread David Howells

Move the effective capabilities mask from the task struct into the credentials
record.

Note that the effective capabilities mask in the cred struct shadows that in
the task_struct because a thread can have its capabilities masks changed by
another thread.  The shadowing is performed by update_current_cred() which is
invoked on entry to any system call that might need it.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/buffer.c   |3 +++
 fs/ioprio.c   |3 +++
 fs/open.c |   27 +--
 fs/proc/array.c   |2 +-
 fs/readdir.c  |3 +++
 include/linux/cred.h  |2 ++
 include/linux/init_task.h |2 +-
 include/linux/sched.h |2 +-
 ipc/msg.c |3 +++
 ipc/sem.c |3 +++
 ipc/shm.c |3 +++
 kernel/acct.c |3 +++
 kernel/capability.c   |3 +++
 kernel/compat.c   |3 +++
 kernel/cred.c |   36 +---
 kernel/exit.c |2 ++
 kernel/fork.c |6 +-
 kernel/futex.c|3 +++
 kernel/futex_compat.c |3 +++
 kernel/kexec.c|3 +++
 kernel/module.c   |6 ++
 kernel/ptrace.c   |3 +++
 kernel/sched.c|9 +
 kernel/signal.c   |6 ++
 kernel/sys.c  |   39 +++
 kernel/sysctl.c   |3 +++
 kernel/time.c |9 +
 kernel/uid16.c|3 +++
 mm/mempolicy.c|6 ++
 mm/migrate.c  |3 +++
 mm/mlock.c|4 
 mm/mmap.c |3 +++
 mm/mremap.c   |3 +++
 mm/oom_kill.c |9 +++--
 mm/swapfile.c |6 ++
 net/compat.c  |6 ++
 net/socket.c  |   45 +
 security/commoncap.c  |   32 +---
 security/dummy.c  |   22 ++
 39 files changed, 282 insertions(+), 50 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0e5ec37..9aabf79 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2909,6 +2909,9 @@ asmlinkage long sys_bdflush(int func, long data)
 {
static int msg_count;
 
+   if (update_current_cred() < 0)
+   return -ENOMEM;
+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
 
diff --git a/fs/ioprio.c b/fs/ioprio.c
index 10d2c21..d32b7b7 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -63,6 +63,9 @@ asmlinkage long sys_ioprio_set(int which, int who, int ioprio)
struct pid *pgrp;
int ret;
 
+   if (update_current_cred() < 0)
+   return -ENOMEM;
+
switch (class) {
case IOPRIO_CLASS_RT:
if (!capable(CAP_SYS_ADMIN))
diff --git a/fs/open.c b/fs/open.c
index 0c05863..f765ec5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -450,7 +450,7 @@ out:
 asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode)
 {
struct nameidata nd;
-   kernel_cap_t old_cap;
+   kernel_cap_t old_cap, want_cap = CAP_EMPTY_SET;
struct cred *cred;
int res;
 
@@ -461,33 +461,26 @@ asmlinkage long sys_faccessat(int dfd, const char __user *filename, int mode)
if (res < 0)
return res;
 
-   old_cap = current->cap_effective;
+   /* Clear the capabilities if we switch to a non-root user */
+   if (!current->uid)
+   want_cap = current->cap_permitted;
+
+   old_cap = current->cred->cap_effective;
 
if (current->cred->uid != current->uid ||
-   current->cred->gid != current->gid) {
+   current->cred->gid != current->gid ||
+   current->cred->cap_effective != want_cap) {
cred = dup_cred(current->cred);
if (!cred)
return -ENOMEM;
 
change_fsuid(cred, current->uid);
change_fsgid(cred, current->gid);
+   change_cap(cred, want_cap);
} else {
cred = get_current_cred();
}
 
-   /*
-* Clear the capabilities if we switch to a non-root user
-*
-* FIXME: There is a race here against sys_capset.  The
-* capabilities can change yet we will restore the old
-* value below.  We should hold task_capabilities_lock,
-* but we cannot because user_path_walk can sleep.
-*/
-   if (current->uid)
-   cap_clear(current->cap_effective);
-   else
-   current->cap_effective = current->cap_permitted;
-
cred = __set_current_cred(cred);
res = __user_walk_fd(dfd, filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd);
if (res)
@@ -506,8 +499,6 @@ out_path_release:
path_release(&nd);
 out:
set_current_cred(cred);
-   current

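The shadowing described in this patch means each affected system call first brings the cred's copy of the effective capabilities into line with the task's own mask, replacing the cred rather than editing it, since a cred may be shared. The body of update_current_cred() is not part of this excerpt, so the following is only a hypothetical userspace model of that copy-on-write refresh, with the capability set modelled as a plain bitmask.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef unsigned int model_cap_t;	/* capability set as a plain bitmask */

struct model_cred {
	int usage;			/* reference count */
	model_cap_t cap_effective;	/* shadow of the task's mask */
};

struct model_task {
	model_cap_t cap_effective;	/* may be changed by another thread */
	struct model_cred *cred;
};

static void model_put_cred(struct model_cred *cred)
{
	assert(cred->usage > 0);
	if (--cred->usage == 0)
		free(cred);
}

/* Hypothetical analogue of update_current_cred(). */
static int model_update_current_cred(struct model_task *t)
{
	struct model_cred *fresh;

	if (t->cred->cap_effective == t->cap_effective)
		return 0;			/* shadow already current */

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return -ENOMEM;
	*fresh = *t->cred;			/* copy, never edit a shared cred */
	fresh->usage = 1;
	fresh->cap_effective = t->cap_effective;

	model_put_cred(t->cred);
	t->cred = fresh;
	return 0;
}
```

The -ENOMEM return is what the new `if (update_current_cred() < 0) return -ENOMEM;` checks at the top of each system call propagate.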
[PATCH 06/24] CRED: Request a credential record for a kernel service

2007-09-26 Thread David Howells

Request a credential record for the named kernel service.  This produces a
cred struct with appropriate DAC and MAC controls for effecting that service.
It may be used to override the credentials on a task to do work on that task's
behalf.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/cred.h |2 +
 include/linux/security.h |   43 +
 kernel/cred.c|   68 ++
 security/dummy.c |   13 +
 security/selinux/hooks.c |   47 
 5 files changed, 173 insertions(+), 0 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 78924d5..b2d0ac9 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -51,6 +51,8 @@ extern void change_fsgid(struct cred *, gid_t);
 extern void change_groups(struct cred *, struct group_info *);
 extern void change_cap(struct cred *, kernel_cap_t);
 extern struct cred *dup_cred(const struct cred *);
+extern struct cred *get_kernel_cred(const char *, struct task_struct *);
+extern int change_create_files_as(struct cred *, struct inode *);
 
 /**
  * get_cred - Get an extra reference on a credentials record
diff --git a/include/linux/security.h b/include/linux/security.h
index 093..b7c06c3 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -514,6 +514,18 @@ struct request_sock;
  * @cred_destroy:
  * Destroy the credentials attached to a cred structure.
  * @cred points to the credentials structure that is to be destroyed.
+ * @cred_kernel_act_as:
+ * Set the credentials for a kernel service to act as (subjective context).
+ * @cred points to the credentials structure to be filled in.
+ * @service names the service making the request.
+ * @daemon: A userspace daemon to be used as a base for the context.
+ * Return 0 if successful.
+ * @cred_create_files_as:
+ * Set the file creation context in a credentials record to be the same as
+ * the objective context of an inode.
+ * @cred points to the credentials structure to be altered.
+ * @inode points to the inode to use as a reference.
+ * Return 0 if successful.
  *
  * Security hooks for task operations.
  *
@@ -1275,6 +1287,9 @@ struct security_operations {
 
int (*cred_dup)(struct cred *cred);
void (*cred_destroy)(struct cred *cred);
+   int (*cred_kernel_act_as)(struct cred *cred, const char *service,
+  struct task_struct *daemon);
+   int (*cred_create_files_as)(struct cred *cred, struct inode *inode);
 
int (*task_create) (unsigned long clone_flags);
int (*task_alloc_security) (struct task_struct * p);
@@ -1894,6 +1909,21 @@ static inline void security_cred_destroy(struct cred *cred)
return security_ops->cred_destroy(cred);
 }
 
+static inline int security_cred_kernel_act_as(struct cred *cred,
+  const char *service,
+  struct task_struct *daemon)
+{
+   return security_ops->cred_kernel_act_as(cred, service, daemon);
+}
+
+static inline int security_cred_create_files_as(struct cred *cred,
+   struct inode *inode)
+{
+   if (IS_PRIVATE(inode))
+   return -EINVAL;
+   return security_ops->cred_create_files_as(cred, inode);
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return security_ops->task_create (clone_flags);
@@ -2586,6 +2616,19 @@ static inline void security_cred_destroy(struct cred *cred)
 {
 }
 
+static inline int security_cred_kernel_act_as(struct cred *cred,
+ const char *service,
+ struct task_struct *daemon)
+{
+   return 0;
+}
+
+static inline int security_cred_create_files_as(struct cred *cred,
+   struct inode *inode)
+{
+   return 0;
+}
+
 static inline int security_task_create (unsigned long clone_flags)
 {
return 0;
diff --git a/kernel/cred.c b/kernel/cred.c
index f545634..294b33a 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -210,3 +210,71 @@ void change_cap(struct cred *cred, kernel_cap_t cap)
 }
 
 EXPORT_SYMBOL(change_cap);
+
+/**
+ * get_kernel_cred - Get credentials for a named kernel service
+ * @service: The name of the service
+ * @daemon: A userspace daemon to be used as a base for the context
+ *
+ * Get a set of credentials for a specific kernel service.  These can then be
+ * used to override a task's credentials so that work can be done on behalf of
+ * that task.
+ *
+ * @daemon is used to provide a base for the security context, but can be NULL.
+ * If @daemon is supplied, then the cred's uid, gid and groups list will be
+ * derived from that; otherwise they'll be set to 0 and no groups.
+ *
+ * @daemon is also passed to the LSM module as a base from

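The get_kernel_cred() comment spells out the DAC part: take uid, gid and groups from @daemon if one is given, otherwise use 0 and no groups, and then let the LSM fill in the acting context through cred_kernel_act_as. The sketch below models only that decision in userspace. The structures, the service name "cachefiles" used in the usage example and the trivial SID choice in the hook are illustrative assumptions, not SELinux's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the userspace daemon passed as @daemon. */
struct model_daemon {
	unsigned int uid, gid;
	int ngroups;
	unsigned int sid;	/* the daemon's own security ID */
};

/* Hypothetical subset of struct cred. */
struct model_kcred {
	unsigned int uid, gid;
	int ngroups;
	unsigned int act_sid;	/* subjective context set by the LSM */
};

#define MODEL_KERNEL_SID 1u	/* invented fallback SID */

/* Placeholder for the cred_kernel_act_as hook; a real LSM consults policy. */
static int model_kernel_act_as(struct model_kcred *cred, const char *service,
			       const struct model_daemon *daemon)
{
	if (!service)
		return -EINVAL;
	cred->act_sid = daemon ? daemon->sid : MODEL_KERNEL_SID;
	return 0;
}

/* DAC part as described in the get_kernel_cred() comment, then the LSM. */
static int model_get_kernel_cred(struct model_kcred *cred, const char *service,
				 const struct model_daemon *daemon)
{
	assert(cred != NULL);
	memset(cred, 0, sizeof(*cred));	/* no daemon: uid 0, gid 0, no groups */
	if (daemon) {
		cred->uid = daemon->uid;
		cred->gid = daemon->gid;
		cred->ngroups = daemon->ngroups;
	}
	return model_kernel_act_as(cred, service, daemon);
}
```

For example, `model_get_kernel_cred(&c, "cachefiles", NULL)` yields uid 0, gid 0, no groups and the fallback SID, while passing a daemon copies its IDs.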
[PATCH 07/24] FS-Cache: Release page->private after failed readahead

2007-09-26 Thread David Howells

The attached patch causes read_cache_pages() to release page->private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 39bf45d..12d1378 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -15,6 +15,7 @@
 #include <linux/backing-dev.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
+#include <linux/buffer_head.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+struct page *page)
+{
+   if (PagePrivate(page)) {
+   if (TestSetPageLocked(page))
+   BUG();
+   page->mapping = mapping;
+   do_invalidatepage(page, 0);
+   page->mapping = NULL;
+   unlock_page(page);
+   }
+   page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+   struct page *victim;
+
+   while (!list_empty(pages)) {
+   victim = list_to_page(pages);
+   list_del(&victim->lru);
+   read_cache_pages_invalidate_page(mapping, victim);
+   }
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -74,14 +110,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
-   page_cache_release(page);
+   read_cache_pages_invalidate_page(mapping, page);
continue;
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
__pagevec_lru_add(&lru_pvec);
if (ret) {
-   put_pages_list(pages);
+   read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);

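Before this patch, a failed read_cache_pages() simply dropped the remaining page references, so private state the caller had attached (such as NFS marking pages cached on local disk) had no chance to be cleaned up. The model below mirrors the new filler-failure path in userspace: pages still on the list are invalidated first if they carry private state, then released. The structure and field names are hypothetical, and clearing the private flag stands in for whatever the filesystem's invalidatepage op actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the page state the patch cares about. */
struct model_page {
	bool private;		/* stands in for PG_private */
	bool locked;		/* stands in for PG_locked */
	bool read_error;	/* makes the model filler fail on this page */
	int invalidated;	/* how many times fs cleanup ran */
	int refs;		/* page reference count */
};

/* Mirrors read_cache_pages_invalidate_page(). */
static void model_invalidate_page(struct model_page *page)
{
	if (page->private) {
		assert(!page->locked);	/* the patch BUG()s on an already-locked page */
		page->locked = true;
		page->invalidated++;	/* do_invalidatepage(page, 0) */
		page->private = false;	/* assume the fs drops its private state */
		page->locked = false;
	}
	page->refs--;			/* page_cache_release() */
}

/* Walk the pages; on the first read error, invalidate the rest. */
static int model_read_cache_pages(struct model_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].read_error) {
			for (size_t j = i + 1; j < n; j++)
				model_invalidate_page(&pages[j]);
			return -EIO;
		}
	}
	return 0;
}
```

As in the patch, the page whose read failed is not touched by this cleanup; it has already been handed to the page cache and the LRU pagevec.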

[PATCH 09/24] FS-Cache: Provide an add_wait_queue_tail() function

2007-09-26 Thread David Howells

Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/wait.h |1 +
 kernel/wait.c|   18 ++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..4cae7db 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait) (!(wait) || ((wait)->private))
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   unsigned long flags;
+
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue_tail(q, wait);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
unsigned long flags;

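add_wait_queue_tail() differs from add_wait_queue() only in where the waiter is linked, which decides its position when a wake-up walks the queue from the front. The userspace model below uses a circular doubly linked list in the style of <linux/list.h> to show the resulting order. It leaves out the spinlock and the WQ_FLAG_EXCLUSIVE handling, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head->prev = head;
}

static void model_list_insert(struct model_list *n, struct model_list *prev,
			      struct model_list *next)
{
	assert(n != NULL);
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like __add_wait_queue(): link at the front. */
static void model_add_head(struct model_list *q, struct model_list *n)
{
	model_list_insert(n, q, q->next);
}

/* Like __add_wait_queue_tail(): link at the back. */
static void model_add_tail(struct model_list *q, struct model_list *n)
{
	model_list_insert(n, q->prev, q);
}

struct model_waiter {
	struct model_list link;
	int id;
};

/* Walk from the front, as a wake-up does, recording waiter ids. */
static size_t model_wake_order(struct model_list *q, int *order, size_t max)
{
	size_t n = 0;

	for (struct model_list *p = q->next; p != q && n < max; p = p->next) {
		struct model_waiter *w = (struct model_waiter *)
			((char *)p - offsetof(struct model_waiter, link));
		order[n++] = w->id;
	}
	return n;
}
```

Adding waiters 1 and 2 at the head and then 3 at the tail gives the order 2, 1, 3: the tail waiter is visited last.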

[PATCH 11/24] CacheFiles: Add missing copy_page export for ia64

2007-09-26 Thread David Howells

This one-line patch adds the export of copy_page() that is missing on ia64 and
that the CacheFiles patches depend on.  The patch is not yet upstream, but is
required for CacheFiles on ia64.  It will be pushed upstream when CacheFiles
goes upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);


[PATCH 13/24] CacheFiles: Permit the page lock state to be monitored

2007-09-26 Thread David Howells

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |5 +
 mm/filemap.c|   19 +++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d1049b6..452fdcf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -220,6 +220,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
 extern void end_page_fscache_write(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 21aeee9..e48e862 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -518,6 +518,25 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page: Page defining the wait queue of interest
+ * @waiter: Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+   wait_queue_head_t *q = page_waitqueue(page);
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue(q, waiter);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *

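add_page_wait_queue() lets a caller hang its own waiter, with its own wake function, on the queue that unlock_page() wakes, which is how CacheFiles can learn that a read into a backing page has finished. A simplified userspace model follows. The kernel's page_waitqueue() hands out a hashed queue that may be shared with other pages, so the model lets pages share a queue and its example wake function ignores wake-ups meant for a different page. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_waiter;
typedef void (*model_wake_fn)(struct model_waiter *w, void *key);

struct model_waiter {
	model_wake_fn func;
	void *watch;			/* the page this waiter cares about */
	int completions;		/* times that page was seen unlocked */
	struct model_waiter *next;
};

/* May be shared by several pages, like a hashed page waitqueue. */
struct model_waitqueue {
	struct model_waiter *head;
};

struct model_page {
	bool locked;
	struct model_waitqueue *wq;
};

/* Counterpart of add_page_wait_queue(): link at the front. */
static void model_add_page_wait_queue(struct model_page *page,
				      struct model_waiter *w)
{
	w->next = page->wq->head;
	page->wq->head = w;
}

/* Counterpart of unlock_page(): clear the lock and wake every waiter. */
static void model_unlock_page(struct model_page *page)
{
	assert(page->locked);
	page->locked = false;
	for (struct model_waiter *w = page->wq->head; w; w = w->next)
		w->func(w, page);
}

/* Example monitor: ignore wake-ups meant for other pages. */
static void model_read_done(struct model_waiter *w, void *key)
{
	struct model_page *page = key;

	if (page == w->watch && !page->locked)
		w->completions++;
}
```

Unlocking an unrelated page that happens to share the queue still calls the monitor, which is why the key check is needed.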

[PATCH 14/24] CacheFiles: Export things for CacheFiles

2007-09-26 Thread David Howells

Export a number of functions for CacheFiles's use.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/super.c   |2 ++
 kernel/auditsc.c |2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 28e7370..0e8c0e2 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -270,6 +270,8 @@ int fsync_super(struct super_block *sb)
return sync_blockdev(sb->s_bdev);
 }
 
+EXPORT_SYMBOL_GPL(fsync_super);
+
 /**
  * generic_shutdown_super  -   common helper for ->kill_sb()
  * @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index d87f7ac..159d5d2 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1531,6 +1531,8 @@ add_names:
}
 }
 
+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
 /**
  * auditsc_get_stamp - get local copies of audit_context values
  * @ctx: audit_context for the task


[PATCH 17/24] NFS: Configuration and mount option changes to enable local caching on NFS

2007-09-26 Thread David Howells

Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig|8 
 fs/nfs/client.c   |   14 ++
 fs/nfs/internal.h |2 ++
 fs/nfs/super.c|   40 ++--
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 8ae7eda..ebc7341 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1597,6 +1597,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+   bool "Provide NFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
 config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index f1783b2..0de4db4 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -543,7 +543,8 @@ error:
 /*
  * Create a version 2 or 3 client
  */
-static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data)
+static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_data *data,
+  unsigned int extra_options)
 {
struct nfs_client *clp;
int error, nfsvers = 2;
@@ -580,6 +581,7 @@ static int nfs_init_server(struct nfs_server *server, const struct nfs_mount_dat
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+   server->options = extra_options;
 
/* Start lockd here, before we might error out */
error = nfs_start_lockd(server);
@@ -776,6 +778,7 @@ void nfs_free_server(struct nfs_server *server)
  * - keyed on server and FSID
  */
 struct nfs_server *nfs_create_server(const struct nfs_mount_data *data,
+unsigned extra_options,
 struct nfs_fh *mntfh)
 {
struct nfs_server *server;
@@ -787,7 +790,7 @@ struct nfs_server *nfs_create_server(const struct nfs_mount_data *data,
return ERR_PTR(-ENOMEM);
 
/* Get a client representation */
-   error = nfs_init_server(server, data);
+   error = nfs_init_server(server, data, extra_options);
if (error < 0)
goto error;
 
@@ -911,7 +914,8 @@ error:
  * Create a version 4 volume record
  */
 static int nfs4_init_server(struct nfs_server *server,
-   const struct nfs4_mount_data *data, rpc_authflavor_t authflavour)
+   const struct nfs4_mount_data *data, rpc_authflavor_t authflavour,
+   unsigned int extra_options)
 {
int error;
 
@@ -930,6 +934,7 @@ static int nfs4_init_server(struct nfs_server *server,
server->acregmax = data->acregmax * HZ;
server->acdirmin = data->acdirmin * HZ;
server->acdirmax = data->acdirmax * HZ;
+   server->options = extra_options;
 
error = nfs_init_server_rpcclient(server, authflavour);
 
@@ -948,6 +953,7 @@ struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *data,
  const char *mntpath,
  const char *ip_addr,
  rpc_authflavor_t authflavour,
+ unsigned int extra_options,
  struct nfs_fh *mntfh)
 {
struct nfs_fattr fattr;
@@ -967,7 +973,7 @@ struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *data,
goto error;
 
/* set up the general RPC client */
-   error = nfs4_init_server(server, data, authflavour);
+   error = nfs4_init_server(server, data, authflavour, extra_options);
if (error < 0)
goto error;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 76cf55d..34ef000 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -33,6 +33,7 @@ extern struct rpc_program nfs_program;
 extern void nfs_put_client(struct nfs_client *);
 extern struct nfs_client *nfs_find_client(const struct sockaddr_in *, int);
 extern struct nfs_server *nfs_create_server(const struct nfs_mount_data *,
+   unsigned int,
struct nfs_fh *);
 extern struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *,
 const char *,
@@ -40,6 +41,7 @@ extern struct nfs_server *nfs4_create_server(const struct nfs4_mount_data *,
 const char *,
 const char *,
 rpc_authflavor_t

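This patch threads an extra_options word through nfs_create_server() and nfs4_create_server() into server->options, presumably carrying the "fsc" mount option. The option-parsing part of fs/nfs/super.c is not shown in this excerpt, so the sketch below is only a hypothetical userspace model of that flow; MODEL_OPTION_FSCACHE and the parser are invented for illustration. It also encodes one reading of the new Kconfig dependency: a modular NFS needs FSCACHE built in or modular, while a built-in NFS needs FSCACHE built in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_OPTION_FSCACHE 0x1u	/* hypothetical bit in extra_options */

/* Scan a comma-separated option string such as "rw,fsc,vers=3". */
static unsigned int model_parse_extra_options(const char *opts)
{
	unsigned int extra = 0;

	assert(opts != NULL);
	while (*opts) {
		size_t len = strcspn(opts, ",");

		if (len == 3 && strncmp(opts, "fsc", 3) == 0)
			extra |= MODEL_OPTION_FSCACHE;
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return extra;
}

/* Kconfig tristate values for NFS_FS and FSCACHE. */
enum model_tristate { TRI_N, TRI_M, TRI_Y };

/* NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y */
static bool model_nfs_fscache_selectable(enum model_tristate nfs,
					 enum model_tristate fscache)
{
	return (nfs == TRI_M && fscache != TRI_N) ||
	       (nfs == TRI_Y && fscache == TRI_Y);
}
```

Requiring FSCACHE=y for a built-in NFS keeps built-in code from calling into a module that may not be loaded.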
[PATCH 16/24] NFS: Use local caching

2007-09-26 Thread David Howells

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

http://people.redhat.com/steved/fscache/util-linux/

To mount an NFS filesystem to use caching, add an fsc option to the mount:

mount warthog:/ /a -o fsc

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile   |1 
 fs/nfs/client.c   |5 +
 fs/nfs/file.c |   51 ++
 fs/nfs/fscache-def.c  |  288 +++
 fs/nfs/fscache.c  |  372 +
 fs/nfs/fscache.h  |  144 +
 fs/nfs/inode.c|   48 +-
 fs/nfs/read.c |   28 +++
 fs/nfs/sysctl.c   |   44 +
 include/linux/nfs_fs.h|8 +
 include/linux/nfs_fs_sb.h |7 +
 11 files changed, 986 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index b55cb23..07c9345 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
 nfs-objs   := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a49f9fe..f1783b2 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -41,6 +41,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY NFSDBG_CLIENT
 
@@ -137,6 +138,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+   nfs_fscache_get_client_cookie(clp);
+
return clp;
 
 error_3:
@@ -168,6 +171,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
nfs4_shutdown_client(clp);
 
+   nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 579cf8a..640179b 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -34,6 +34,8 @@
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY NFSDBG_FILE
 
@@ -54,6 +56,12 @@ static int nfs_check_flags(int flags);
 static int nfs_lock(struct file *filp, int cmd, struct file_lock *fl);
 static int nfs_flock(struct file *filp, int cmd, struct file_lock *fl);
 static int nfs_setlease(struct file *file, long arg, struct file_lock **fl);
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+   .fault  = filemap_fault,
+   .page_mkwrite   = nfs_file_page_mkwrite,
+};
 
 const struct file_operations nfs_file_operations = {
.llseek = nfs_file_llseek,
@@ -259,6 +267,9 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
status = nfs_revalidate_mapping(inode, file->f_mapping);
if (!status)
status = generic_file_mmap(file, vma);
+   if (!status)
+   vma->vm_ops = &nfs_fs_vm_operations;
+
return status;
 }
 
@@ -311,22 +322,48 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse
return status;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page->mapping->host, page);
+
+   nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
+/*
+ * Release the private state associated with a page
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
/* If PagePrivate() is set, then the page is not freeable */
-   return 0;
+   if (PagePrivate(page))
+   return 0;
+   return nfs_fscache_release_page(page, gfp);
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
+   wait_on_page_fscache_write(page

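The patched nfs_release_page() above never frees a page that still has PG_private set. In all other cases it leaves the decision to nfs_fscache_release_page(). The userspace model below is a minimal sketch of that decision. The flag names and the helper functions are hypothetical. It also assumes a simplified cache policy that refuses release while PG_fscache is set, whereas the real FS-Cache helper can wait, depending on the gfp flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits standing in for PG_private and PG_fscache. */
enum {
	SKETCH_PG_PRIVATE = 1u << 0,
	SKETCH_PG_FSCACHE = 1u << 1,
};

/* Hypothetical stand-in for nfs_fscache_release_page(): this sketch
 * assumes the cache refuses while it still has an interest in the page. */
static bool nfs_sketch_fscache_release(unsigned flags)
{
	return !(flags & SKETCH_PG_FSCACHE);
}

/* Same shape as the patched nfs_release_page(): private NFS state
 * (such as a pending write request) always pins the page; otherwise the
 * cache decides.  Returns true if the page may be freed. */
static bool nfs_sketch_release_page(unsigned flags)
{
	if (flags & SKETCH_PG_PRIVATE)
		return false;
	return nfs_sketch_fscache_release(flags);
}
```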
[PATCH 18/24] NFS: Display local caching state

2007-09-26 Thread David Howells

Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |7 ---
 fs/nfs/fscache.h |   12 
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 0de4db4..d350668 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1319,7 +1319,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
/* display header on line 1 */
if (v == &nfs_volume_list) {
-   seq_puts(m, "NV SERVER   PORT DEV FSID\n");
+   seq_puts(m, "NV SERVER   PORT DEV FSID  FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1333,12 +1333,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 (unsigned long long) server->fsid.major,
 (unsigned long long) server->fsid.minor);
 
-   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
   clp->cl_nfsversion,
   NIPQUAD(clp->cl_addr.sin_addr),
   ntohs(clp->cl_addr.sin_port),
   dev,
-  fsid);
+  fsid,
+  nfs_server_fscache_state(server));
 
return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 44bb0d1..77f3450 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -56,6 +56,17 @@ extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
 extern int nfs_fscache_release_page(struct page *, gfp_t);
 
 /*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+   if (server->nfs_client->fscache &&
+   (server->options & NFS_OPTION_FSCACHE))
+   return "yes";
+   return "no ";
+}
+
+/*
  * release the caching state associated with a page if undergoing complete page
  * invalidation
  */
@@ -110,6 +121,7 @@ static inline void nfs_fscache_unregister(void) {}
 static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
 static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
 static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
 
 static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
 static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}


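The new FSC column in /proc/fs/nfsfs/volumes reports "yes" only when two conditions hold: the client has a cache cookie, and the mount requested caching. Both possible strings are three characters wide. Below is a minimal userspace sketch of that logic. The struct and function names are hypothetical, and the other columns are simplified to plain strings rather than the real address and port fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the two conditions that
 * nfs_server_fscache_state() tests. */
struct fsc_server {
	bool client_has_cookie;   /* server->nfs_client->fscache != NULL */
	bool fsc_mount_option;    /* NFS_OPTION_FSCACHE set in options */
};

static const char *fsc_state(const struct fsc_server *s)
{
	if (s->client_has_cookie && s->fsc_mount_option)
		return "yes";
	return "no ";	/* same width as "yes" */
}

/* Append the state as a trailing column, in the style of the
 * nfs_volume_list_show() change; returns the snprintf() length. */
static int fsc_format_volume(char *buf, size_t len, int version,
			     const char *dev, const char *fsid,
			     const struct fsc_server *s)
{
	return snprintf(buf, len, "v%d %-7s %-17s %s\n",
			version, dev, fsid, fsc_state(s));
}
```

Because "no " is padded, every row comes out the same length whether or not caching is enabled.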
[PATCH 20/24] AFS: Add a function to excise a rejected write from the pagecache

2007-09-26 Thread David Howells

Add a function - cancel_rejected_write() - to excise a rejected write from the
pagecache.  This function is related to the truncation family of routines.  It
permits the pages modified by a network filesystem client (such as AFS) to be
excised and discarded from the pagecache if the attempt to write them back to
the server fails.

The dirty and writeback states of the afflicted pages are cancelled and the
pages themselves are detached for recycling.  All PTEs referring to those
pages are removed.

Note that the locking is tricky as it's very easy to deadlock against
truncate() and other routines once the pages have been unlocked as part of the
writeback process.  To this end, the PG_error flag is set, then the
PG_writeback flag is cleared, and only *then* can lock_page() be called.
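The ordering rule above can be shown as a small state model. The sketch below records only the sequence of flag operations: set PG_error, clear PG_writeback, and only then take the lock. It cannot demonstrate a real deadlock. The struct and function names are hypothetical, and the re-checks of page->mapping that the real code performs are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page state touched by the cancellation. */
struct rej_page {
	bool error;      /* PG_error */
	bool writeback;  /* PG_writeback */
	bool dirty;      /* PG_dirty */
	bool locked;     /* PG_locked */
	bool cached;     /* still attached to the mapping */
};

/* Returns true if this call excised the page, false if an earlier call
 * already did (PG_error was set). */
static bool rej_cancel_page(struct rej_page *p)
{
	bool already = p->error;

	p->error = true;                 /* TestSetPageError() */
	if (already) {
		/* the real code BUG()s if the page is still busy */
		assert(!p->writeback && !p->dirty);
		return false;
	}
	p->writeback = false;            /* end_page_writeback() first */
	p->locked = true;                /* lock_page(), only now */
	p->dirty = false;                /* truncation drops the dirty state */
	p->cached = false;               /* ... and detaches the page */
	p->locked = false;               /* unlock_page() */
	return true;
}
```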

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/mm.h |5 ++-
 mm/truncate.c  |   83 
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1692dd6..49863df 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1091,12 +1091,13 @@ extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 extern unsigned long do_brk(unsigned long, unsigned long);
 
-/* filemap.c */
-extern unsigned long page_unuse(struct page *);
+/* truncate.c */
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
   loff_t lstart, loff_t lend);
+extern void cancel_rejected_write(struct address_space *, pgoff_t, pgoff_t);
 
+/* filemap.c */
 /* generic vm_area_ops exported for stackable file systems */
 extern int filemap_fault(struct vm_area_struct *, struct vm_fault *);
 
diff --git a/mm/truncate.c b/mm/truncate.c
index cb0..92a68f7 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -462,3 +462,86 @@ int invalidate_inode_pages2(struct address_space *mapping)
return invalidate_inode_pages2_range(mapping, 0, -1);
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
+
+/*
+ * Cancel that part of a rejected write that affects a particular page
+ */
+static void cancel_rejected_page(struct address_space *mapping,
+struct page *page, pgoff_t *_next)
+{
+   if (!TestSetPageError(page)) {
+   /* can't lock the page until we've cleared PG_writeback lest we
+    * deadlock with truncate (amongst other things) */
+   end_page_writeback(page);
+   if (page->mapping == mapping) {
+   lock_page(page);
+   if (page->mapping == mapping) {
+   truncate_complete_page(mapping, page);
+   *_next = page->index + 1;
+   }
+   unlock_page(page);
+   }
+   } else if (PageWriteback(page) || PageDirty(page)) {
+   BUG();
+   }
+}
+
+/**
+ * cancel_rejected_write - Cancel a write on a contiguous set of pages
+ * @mapping: mapping affected
+ * @start: first page in set
+ * @end: last page in set
+ *
+ * Cancel a write of a contiguous set of pages when the writeback was rejected
+ * by the target medium or server.
+ *
+ * The pages in question are detached and discarded from the pagecache, and the
+ * writeback and dirty states are cleared prior to invalidation.  The caller
+ * must make sure that all the pages in the range are present in the pagecache,
+ * and the caller must hold PG_writeback on each of them.  NOTE! All the pages
+ * are locked and unlocked as part of this process, so the caller must take
+ * care to avoid deadlock.
+ *
+ * The PTEs pointing to those pages are also cleared, leading to the PTEs being
+ * reset when new pages are allocated and the contents reloaded.
+ */
+void cancel_rejected_write(struct address_space *mapping,
+  pgoff_t start, pgoff_t end)
+{
+   struct pagevec pvec;
+   pgoff_t n;
+   int i;
+
+   BUG_ON(mapping->nrpages < end - start + 1);
+
+   /* dispose of any PTEs pointing to the affected pages */
+   unmap_mapping_range(mapping,
+   (loff_t)start << PAGE_CACHE_SHIFT,
+   (loff_t)(end - start + 1) << PAGE_CACHE_SHIFT,
+   0);
+
+   pagevec_init(&pvec, 0);
+   do {
+   cond_resched();
+   n = end - start + 1;
+   if (n > PAGEVEC_SIZE)
+   n = PAGEVEC_SIZE;
+   n = pagevec_lookup(&pvec, mapping, start, n);
+   for (i = 0; i < n; i++) {
+   struct page *page = pvec.pages[i];
+
+   if (page->index < start || page->index > end)
+   continue;
+   start++;
+   cancel_rejected_page(mapping, page, start

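The loop in cancel_rejected_write() walks [start, end] in batches of at most PAGEVEC_SIZE pages. Each batch comes from a gang lookup, and `start` advances past each page it handles. The sketch below loosely mirrors that loop in userspace. The page cache is a sorted array of indices, and the lookup helper and batch size are hypothetical. The patch text above is cut off before the loop's exit condition, so the termination logic here is this sketch's own assumption.

```c
#include <assert.h>
#include <stddef.h>

#define RANGE_BATCH 14	/* stand-in for PAGEVEC_SIZE */

/* Hypothetical gang lookup: copy up to nr indices >= start from a sorted
 * array of cached page indices, much as pagevec_lookup() returns pages. */
static size_t range_lookup(const unsigned long *cache, size_t ncache,
			   unsigned long start, unsigned long *out, size_t nr)
{
	size_t found = 0;

	for (size_t i = 0; i < ncache && found < nr; i++)
		if (cache[i] >= start)
			out[found++] = cache[i];
	return found;
}

/* Walk [start, end] batch by batch, recording each index that would be
 * cancelled; returns how many were visited. */
static size_t range_walk(const unsigned long *cache, size_t ncache,
			 unsigned long start, unsigned long end,
			 unsigned long *visited)
{
	unsigned long batch[RANGE_BATCH];
	size_t nvisited = 0;

	while (start <= end) {
		size_t n = end - start + 1;

		if (n > RANGE_BATCH)
			n = RANGE_BATCH;
		n = range_lookup(cache, ncache, start, batch, n);
		if (n == 0)
			break;			/* nothing left to look at */
		for (size_t i = 0; i < n; i++) {
			if (batch[i] > end)
				return nvisited;	/* past the range: done */
			visited[nvisited++] = batch[i];
			start = batch[i] + 1;
		}
	}
	return nvisited;
}
```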
[PATCH 21/24] AFS: Improve handling of a rejected writeback

2007-09-26 Thread David Howells

Improve the handling of the case of a server rejecting an attempt to write back
a cached write.  AFS operates a write-back cache, so the following sequence of
events can theoretically occur:

    CLIENT 1                            CLIENT 2
    ================================    ================================
    cat data >/the/file
     (sits in pagecache)
                                        fs setacl -dir /the/dir/of/the/file \
                                            -acl system:administrators rlidka
                                         (write permission removed for client 1)
    sync
     (writeback attempt fails)

The way AFS attempts to handle this is:

 (1) The affected region will be excised and discarded on the basis that it
 can't be written back, yet we don't want it lurking in the page cache
 either.  The contents of the affected region will be reread from the
 server when called for again.

 (2) The EOF size will be set to the current server-based file size - usually
 that which it was before the affected write was made - assuming no
 conflicting write has been appended, and assuming the affected write
 extended the file.


This patch makes the following changes:

 (1) Zero-length short reads no longer produce EBADMSG just because the OpenAFS
 server returns a silly value as the size of the returned data.  This
 prevents excised pages beyond the revised EOF from being reinstantiated
 with a surprise PG_error.

 (2) Writebacks can now be put into a 'rejected' state in which all further
 attempts to write them back will result in excision of the affected pages
 instead.

 (3) Preparing a page for overwriting now reads the whole page instead of just
 those parts of it that aren't to be covered by the copy to be made.  This
 handles the possibility that the copy might fail on EFAULT.  Corollary to
 this, PG_update can now be set by afs_prepare_page() on behalf of
 afs_prepare_write() rather than setting it in afs_commit_write().

 (4) In the case of a conflicting write, afs_prepare_write() will attempt to
 flush the write to the server, and will then wait for PG_writeback to go
 away - after unlocking the page.  This helps prevent deadlock against the
 writeback-rejection handler.  AOP_TRUNCATED_PAGE is then returned to the
 caller to signify that the page has been unlocked, and that it should be
 revalidated.

 (5) The writeback-rejection handler now calls cancel_rejected_write() added by
 the previous patch to excise the affected pages rather than clearing the
 PG_uptodate flag on all the pages.
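Change (1) comes down to the length check in afs_deliver_fs_fetch_data(), shown in the diff below. The sketch reproduces that check in userspace. A count that is negative when read as a signed 32-bit value is treated as "entirely beyond EOF" and becomes 0. Anything larger than a page is still rejected. The function name and the 4096-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FETCH_PAGE_SIZE 4096u	/* assumed page size */

/* wire_count is the data length after ntohl(); on success the
 * accepted length is stored in *count. */
static int fetch_check_len(uint32_t wire_count, uint32_t *count)
{
	if ((int32_t)wire_count < 0) {
		*count = 0;		/* access completely beyond EOF */
		return 0;
	}
	if (wire_count > FETCH_PAGE_SIZE)
		return -EBADMSG;
	*count = wire_count;
	return 0;
}
```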

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/fsclient.c |4 +
 fs/afs/internal.h |1 
 fs/afs/write.c|  154 -
 3 files changed, 85 insertions(+), 74 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 023b95b..04584c0 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -353,7 +353,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call,
 
call->count = ntohl(call->tmp);
_debug("DATA length: %u", call->count);
-   if (call->count > PAGE_SIZE)
+   if ((s32) call->count < 0)
+   call->count = 0; /* access completely beyond EOF */
+   else if (call->count > PAGE_SIZE)
return -EBADMSG;
call-offset = 0;
call-unmarshall++;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 6306438..e1bcce0 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -156,6 +156,7 @@ struct afs_writeback {
AFS_WBACK_PENDING,  /* write pending */
AFS_WBACK_CONFLICTING,  /* conflicting writes posted */
AFS_WBACK_WRITING,  /* writing back */
+   AFS_WBACK_REJECTED, /* the writeback was rejected */
AFS_WBACK_COMPLETE  /* the writeback record has been unlinked */
} state __attribute__((packed));
 };
diff --git a/fs/afs/write.c b/fs/afs/write.c
index a03b92a..ac621e8 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -81,18 +81,16 @@ void afs_put_writeback(struct afs_writeback *wb)
 }
 
 /*
- * partly or wholly fill a page that's under preparation for writing
+ * fill a page that's under preparation for writing
  */
 static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
-unsigned start, unsigned len, struct page *page)
+unsigned len, struct page *page)
 {
int ret;
 
-   _enter(",,%u,%u", start, len);
+   _enter(",,%u,", len);
 
-   ASSERTCMP(start + len, <=, PAGE_SIZE);
-
-   ret = afs_vnode_fetch_data(vnode, key, start, len, page);
+   ret = afs_vnode_fetch_data(vnode, key, 0, len, page);
if (ret < 0) {
if (ret == -ENOENT) {
_debug(got NOENT from server

[PATCH 24/24] FS-Cache: Make kAFS use FS-Cache

2007-09-26 Thread David Howells

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.  The kAFS filesystem will use caching
automatically if it's available.
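The new AFS_FSCACHE option below depends on `AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y`. Kconfig evaluates tristate expressions with n < m < y. `&&` yields the minimum, `||` the maximum, and `SYM=val` yields y when the values match and n otherwise. The sketch encodes those rules in C to show the effect: a built-in AFS cannot use a modular FS-Cache. It models only the expression value, not how kconfig then presents a bool symbol.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that && is min and || is max. */
enum tristate { N = 0, M = 1, Y = 2 };

static enum tristate t_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

static enum tristate t_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* "SYM=val" is y when the symbol has that value, otherwise n. */
static enum tristate t_eq(enum tristate sym, enum tristate val)
{
	return sym == val ? Y : N;
}

/* depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
 * (&& binds tighter than ||) */
static enum tristate afs_fscache_dep(enum tristate afs_fs, enum tristate fscache)
{
	return t_or(t_and(t_eq(afs_fs, M), fscache),
		    t_and(t_eq(afs_fs, Y), t_eq(fscache, Y)));
}
```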

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig |8 +
 fs/afs/Makefile|3 
 fs/afs/cache.c |  505 ++--
 fs/afs/cache.h |   15 --
 fs/afs/cell.c  |   16 +-
 fs/afs/file.c  |  212 +-
 fs/afs/fsclient.c  |   32 ++-
 fs/afs/inode.c |   25 +--
 fs/afs/internal.h  |   53 ++---
 fs/afs/main.c  |   27 +--
 fs/afs/mntpt.c |4 
 fs/afs/rxrpc.c |1 
 fs/afs/vlclient.c  |2 
 fs/afs/vlocation.c |   23 +-
 fs/afs/volume.c|   14 -
 fs/afs/write.c |6 -
 16 files changed, 563 insertions(+), 383 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index ebc7341..158a8d8 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2059,6 +2059,14 @@ config AFS_DEBUG
 
  If unsure, say N.
 
+config AFS_FSCACHE
+   bool "Provide AFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want AFS data to be cached locally on disk through
+ the generic filesystem cache manager
+
 config 9P_FS
tristate "Plan 9 Resource Sharing Support (9P2000) (Experimental)"
depends on INET && NET_9P && EXPERIMENTAL
diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index a666710..4f64b95 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -2,7 +2,10 @@
 # Makefile for Red Hat Linux AFS client.
 #
 
+afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
+
 kafs-objs := \
+   $(afs-cache-y) \
callback.o \
cell.o \
cmservice.o \
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index de0d7de..a5d6a70 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -9,248 +9,399 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-   const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
-   .name   = "cell_ix",
-   .data_size  = sizeof(struct afs_cache_cell),
-   .keys[0]= { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-   .match  = afs_cell_cache_match,
-   .update = afs_cell_cache_update,
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include "internal.h"
+
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+  const void *buffer,
+  uint16_t buflen);
+
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+   const void *buffer,
+   uint16_t buflen);
+
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+void *buffer, uint16_t buflen);
+
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+   const void *buffer,
+   uint16_t buflen);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+static struct fscache_netfs_operations afs_cache_ops = {
+};
+
+struct fscache_netfs afs_cache_netfs = {
+   .name   = "afs",
+   .version= 0,
+   .ops= &afs_cache_ops,
+};
+
+struct fscache_cookie_def afs_cell_cache_index_def = {
+   .name   = "AFS.cell",
+   .type   = FSCACHE_COOKIE_TYPE_INDEX,
+   .get_key= afs_cell_cache_get_key,
+   .get_aux

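The cookie definitions above register get_key, get_aux and check_aux callbacks with FS-Cache, but their bodies are cut off in this message. The sketch below shows, in userspace, the general shape such callbacks could take. get_key writes an index key into a bounded buffer and returns the bytes used. check_aux compares stored auxiliary data against the netfs's current view. The vnode struct, the choice of a data version as the aux data, the "return 0 if it doesn't fit" rule and the verdict enum are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts, modelled loosely on fscache_checkaux_t. */
enum ck_verdict { CK_AUX_OKAY, CK_AUX_OBSOLETE };

/* Hypothetical netfs object: index key fields plus a data version. */
struct ck_vnode {
	uint32_t vnode_id;
	uint32_t unique;
	uint64_t data_version;
};

/* get_key-style callback: never write more than buflen bytes. */
static uint16_t ck_get_key(const void *cookie_netfs_data,
			   void *buffer, uint16_t buflen)
{
	const struct ck_vnode *v = cookie_netfs_data;
	uint16_t klen = sizeof(v->vnode_id) + sizeof(v->unique);

	if (klen > buflen)
		return 0;
	memcpy(buffer, &v->vnode_id, sizeof(v->vnode_id));
	memcpy((char *)buffer + sizeof(v->vnode_id), &v->unique,
	       sizeof(v->unique));
	return klen;
}

/* check_aux-style callback: is the cached object still usable? */
static enum ck_verdict ck_check_aux(void *cookie_netfs_data,
				    const void *buffer, uint16_t buflen)
{
	const struct ck_vnode *v = cookie_netfs_data;
	uint64_t stored;

	if (buflen != sizeof(stored))
		return CK_AUX_OBSOLETE;
	memcpy(&stored, buffer, sizeof(stored));
	return stored == v->data_version ? CK_AUX_OKAY : CK_AUX_OBSOLETE;
}
```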
[PATCH 22/24] AFS: Implement shared-writable mmap

2007-09-26 Thread David Howells

Implement shared-writable mmap for AFS.

The key with which to access the file is obtained from the VMA by the
page_mkwrite() VMA op, at the point where the PTE is made writable, and is
then cached in the affected page.

If there's an outstanding write on the page made with a different key, then
page_mkwrite() will flush it before attaching a record of the new key.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/file.c |   20 +++-
 fs/afs/internal.h |1 +
 fs/afs/write.c|   35 +++
 3 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 525f7c5..1323df4 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -22,6 +22,7 @@ static int afs_readpage(struct file *file, struct page *page);
 static void afs_invalidatepage(struct page *page, unsigned long offset);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 static int afs_launder_page(struct page *page);
+static int afs_mmap(struct file *file, struct vm_area_struct *vma);
 
 const struct file_operations afs_file_operations = {
.open   = afs_open,
@@ -31,7 +32,7 @@ const struct file_operations afs_file_operations = {
.write  = do_sync_write,
.aio_read   = generic_file_aio_read,
.aio_write  = afs_file_write,
-   .mmap   = generic_file_readonly_mmap,
+   .mmap   = afs_mmap,
.splice_read= generic_file_splice_read,
.fsync  = afs_fsync,
.lock   = afs_lock,
@@ -56,6 +57,11 @@ const struct address_space_operations afs_fs_aops = {
.writepages = afs_writepages,
 };
 
+static struct vm_operations_struct afs_file_vm_ops = {
+   .fault  = filemap_fault,
+   .page_mkwrite   = afs_page_mkwrite,
+};
+
 /*
  * open an AFS file or directory and attach a key to it
  */
@@ -295,3 +301,15 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
_leave(" = 0");
return 0;
 }
+
+/*
+ * memory map part of an AFS file
+ */
+static int afs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   _enter("");
+
+   file_accessed(file);
+   vma->vm_ops = &afs_file_vm_ops;
+   return 0;
+}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e1bcce0..12afccc 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -743,6 +743,7 @@ extern ssize_t afs_file_write(struct kiocb *, const struct iovec *,
  unsigned long, loff_t);
 extern int afs_writeback_all(struct afs_vnode *);
 extern int afs_fsync(struct file *, struct dentry *, int);
+extern int afs_page_mkwrite(struct vm_area_struct *, struct page *);
 
 
 /*/
diff --git a/fs/afs/write.c b/fs/afs/write.c
index ac621e8..dd471f0 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -155,6 +155,8 @@ static int afs_prepare_page(struct afs_vnode *vnode, struct page *page,
  * prepare to perform part of a write to a page
  * - the caller holds the page locked, preventing it from being written out or
  *   modified by anyone else
+ * - may be called from afs_page_mkwrite() to set up a page for modification
+ *   through shared-writable mmap
  */
 int afs_prepare_write(struct file *file, struct page *page,
  unsigned offset, unsigned to)
@@ -833,3 +835,36 @@ int afs_fsync(struct file *file, struct dentry *dentry, int datasync)
_leave(" = %d", ret);
return ret;
 }
+
+/*
+ * notification that a previously read-only page is about to become writable
+ * - if it returns an error, the caller will deliver a bus error signal
+ *
+ * we use this to make a record of the key with which the writeback should be
+ * performed and to flush any outstanding writes made with a different key
+ *
+ * the key to be used is attached to the struct file pinned by the VMA
+ */
+int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+   struct afs_vnode *vnode = AFS_FS_I(vma->vm_file->f_mapping->host);
+   struct key *key = vma->vm_file->private_data;
+   int ret;
+
+   _enter("{{%x:%u},%x},{%lx}",
+  vnode->fid.vid, vnode->fid.vnode, key_serial(key), page->index);
+
+   do {
+   lock_page(page);
+   if (page->mapping == vma->vm_file->f_mapping)
+   ret = afs_prepare_write(vma->vm_file, page, 0,
+   PAGE_SIZE);
+   else
+   ret = 0; /* seems there was interference - let the
+ * caller deal with it */
+   unlock_page(page);
+   } while (ret == AOP_TRUNCATED_PAGE);
+
+   _leave(" = %d", ret);
+   return ret;
+}


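afs_page_mkwrite() in the patch above locks the page and calls afs_prepare_write() if the page still belongs to the file. It retries for as long as AOP_TRUNCATED_PAGE comes back; in patch 21 that return means a conflicting write was flushed and the page unlocked. The userspace sketch below models the retry loop. Its names and the "conflicts" counter are hypothetical. Following the convention stated in patch 21, it treats a retry return as meaning the callee has already dropped the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MK_TRUNCATED_PAGE 1	/* stand-in for AOP_TRUNCATED_PAGE */

struct mk_page {
	bool locked;
	bool in_mapping;	/* page->mapping still points at our file */
	int conflicts;		/* writes made with another key, to flush */
	int attempts;		/* how often prepare_write was called */
};

/* Stand-in for afs_prepare_write(): flush one conflicting write, unlock
 * the page and ask the caller to retry; succeed once none remain. */
static int mk_prepare_write(struct mk_page *p)
{
	p->attempts++;
	if (p->conflicts > 0) {
		p->conflicts--;
		p->locked = false;	/* returned unlocked */
		return MK_TRUNCATED_PAGE;
	}
	return 0;
}

static int mk_page_mkwrite(struct mk_page *p)
{
	int ret;

	do {
		p->locked = true;		/* lock_page() */
		if (p->in_mapping)
			ret = mk_prepare_write(p);
		else
			ret = 0;		/* truncated: let the caller cope */
		if (ret != MK_TRUNCATED_PAGE)
			p->locked = false;	/* unlock_page() */
	} while (ret == MK_TRUNCATED_PAGE);
	return ret;
}
```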
Re: [PATCH 01/24] CRED: Introduce a COW credentials record

2007-09-26 Thread David Howells

Al Viro [EMAIL PROTECTED] wrote:

 Umm...  Perhaps a better primitive would be make sure that our cred is
 not shared with anybody, creating a copy and redirecting reference to
 it if needed.

I wanted to make the point that once a cred record has been made live (i.e.
exposed to the rest of the system), it should not be changed.  I'll think
about rewording that.  Also, "making sure that our cred is not shared" does
not work for CacheFiles, where we actually want to create a new set of creds.

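The rule that a live cred record is never modified in place amounts to copy-on-write with reference counting. Changes go into a private copy, which is then published, and anyone holding a reference to the old record (an open file, say) keeps seeing the old values. The sketch below is a much-simplified userspace model. The struct and function names are hypothetical and are not the API proposed in the patch series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified credential record. */
struct cow_cred {
	int usage;		/* reference count */
	unsigned fsuid;
};

static struct cow_cred *cow_get(struct cow_cred *c)
{
	c->usage++;
	return c;
}

static void cow_put(struct cow_cred *c)
{
	if (--c->usage == 0)
		free(c);
}

/* Make a private copy to modify; the live record is left untouched. */
static struct cow_cred *cow_prepare(const struct cow_cred *old)
{
	struct cow_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *old;
	new->usage = 1;
	return new;
}

/* Publish the new record and drop the task's reference on the old one. */
static void cow_commit(struct cow_cred **live, struct cow_cred *new)
{
	struct cow_cred *old = *live;

	*live = new;
	cow_put(old);
}
```

A long operation that takes a reference at the start keeps using the creds it began with, even if the task's creds are replaced part-way through.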
Al Viro [EMAIL PROTECTED] wrote:

  In addition, the default setting of i_uid and i_gid to fsuid and fsgid has
  been moved from the callers of new_inode() into new_inode() itself.
 
 I don't think it's safe; better do something trivial like
   own_inode(inode)
 that would set these (and that's a good splitup candidate, to go in front
 of the series).

I think you're probably right.  I commented on this at the bottom of the cover
note.  One thing I could do is provide a variant on own_inode() that takes a
parent dir inode pointer and does the sticky GID thing - something that several
filesystems do.

 FWIW, the main weakness here is the need of update_current_cred() splattered
 all over the entry points.

Yeah.  I'm not keen on that, but I'm even less keen on sticking something in
everywhere that the cred struct is consulted.  I don't like the idea of making
it implicit in the dereference of current->cred either, and nor does Linus.

 Two problems:
   a) it's a bug source (somebody adds a syscall and forgets to
 add that call / somebody modifies syscall guts and doesn't notice that
 it needs to be added).

It's simpler to check for its existence at the beginning of a syscall.

   b) it's almost always doing nothing, so being lazier would be
 better (event numbers checked in the inlined part, perhaps?)

Linus is against having an inlined part:-/

 The former would be more robust if it had been closer to the places where
 we get to passing current->cred to functions.

You can't do it there because there may be an override in effect.  Or, rather,
if you do do it there, you have to not do it if there's an override set.

 The latter...  When do we actually step into this kind of situation (somebody
 changing keys on us)

There are four cases:

 (1) The request_key() upcall forces us to create a thread keyring.

 (2) The request_key() upcall forces us to create a process keyring.

 (3) A sibling thread instantiates our common process keyring.

 (4) A sibling thread replaces our common session keyring.

The first three could trivially be avoided by creating the thread and process
keyrings in advance: (1) and (2) at request_key() time, (3) at clone time.  It
eats extra resources, but it's easy.

The fourth is more tricky.  A sibling thread can replace our common session
keyring on us at any time.  I suppose we could decree that you can't replace
your session keyring if you've got multiple threads.  That ought to be simple
enough, and I suspect it won't have much impact.

The alternatives are (b) not to include the keyrings in the cred stuff, though
they are relevant; and (c) to make it possible for sibling threads to change
each other's creds.  I'm really not keen on (c) as that means you can't just
dereference your own creds directly without taking locks and stuff.

 and what's the right semantics here?  E.g. if it happens
 in the middle of long read(), do we want to keep using the original keys?

If you're in the middle of a long read(), you should be using the cred struct
attached to file->f_cred, not current->cred, and so that problem should not
arise.

As for long ops that aren't I/O operations on file descriptors, I think it's
reasonable for you to do the entire op with the creds you started off doing it
with.

Don't forget that there's also the cap_effective stuff, which, it appears, can
be changed by someone other than the target process.

David

[PATCH 02/28] KEYS: Check starting keyring as part of search [try #2]

2007-12-05 Thread David Howells

Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario: a user in process A does things that cause keys to be
created in its process session keyring.  The user then does an su to
another user and starts a new process, B.  The two processes now
share the same process session keyring.

Process B does an NFS access which results in an upcall to gssd.
When gssd attempts to instantiate the context key (to be linked
into the process session keyring), it is denied access even though it
has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
  lookup_user_key() (the default: case)
 search_process_keyrings(current)
search_process_keyrings(rka-context)   (recursive call)
   keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the
top-level keyring it is given, but that top-level keyring is neither
fully validated nor checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
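The two changes above decide what happens to the top-level keyring before any descent. If it is itself a match, it is returned unless it is revoked or expired (EAGAIN) or negative (ENOKEY). If it is not a match, it may only be searched when it is none of those things. The sketch models that decision in userspace. The flag names, the struct, and the `matches` field (standing in for the type check plus `match()`) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical bits standing in for KEY_FLAG_REVOKED / KEY_FLAG_NEGATIVE. */
enum { TK_REVOKED = 1u << 0, TK_NEGATIVE = 1u << 1 };
enum { TK_FOUND = 0, TK_DESCEND = 1 };

struct top_key {
	unsigned long flags;
	time_t expiry;		/* 0 means "never expires" */
	bool matches;		/* type == type && match(keyring, description) */
};

/* TK_FOUND if the top keyring is itself the target, TK_DESCEND if it may
 * be searched, otherwise a negative errno as in the patched code. */
static int top_check(const struct top_key *k, time_t now)
{
	bool expired = k->expiry && now >= k->expiry;

	if (k->matches) {
		if ((k->flags & TK_REVOKED) || expired)
			return -EAGAIN;
		if (k->flags & TK_NEGATIVE)
			return -ENOKEY;
		return TK_FOUND;
	}
	if ((k->flags & (TK_REVOKED | TK_NEGATIVE)) || expired)
		return -EAGAIN;
	return TK_DESCEND;
}
```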


Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
struct keyring_list *keylist;
struct timespec now;
-   unsigned long possessed;
+   unsigned long possessed, kflags;
struct key *keyring, *key;
key_ref_t key_ref;
long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
now = current_kernel_time();
err = -EAGAIN;
sp = 0;
+   
+   /* firstly we should check to see if this top-level keyring is what we
+* are looking for */
+   key_ref = ERR_PTR(-EAGAIN);
+   kflags = keyring->flags;
+   if (keyring->type == type && match(keyring, description)) {
+   key = keyring;
+
+   /* check it isn't negative and hasn't expired or been
+* revoked */
+   if (kflags & (1 << KEY_FLAG_REVOKED))
+   goto error_2;
+   if (key->expiry && now.tv_sec >= key->expiry)
+   goto error_2;
+   key_ref = ERR_PTR(-ENOKEY);
+   if (kflags & (1 << KEY_FLAG_NEGATIVE))
+   goto error_2;
+   goto found;
+   }
+
+   /* otherwise, the top keyring must not be revoked, expired, or
+* negatively instantiated if we are to search it */
+   key_ref = ERR_PTR(-EAGAIN);
+   if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+   (keyring->expiry && now.tv_sec >= keyring->expiry))
+   goto error_2;
 
/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
/* iterate through the keys in this keyring first */
for (kix = 0; kix < keylist->nkeys; kix++) {
key = keylist->keys[kix];
+   kflags = key->flags;
 
/* ignore keys not of this type */
if (key->type != type)
continue;
 
/* skip revoked keys and expired keys */
-   if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+   if (kflags & (1 << KEY_FLAG_REVOKED))
continue;
 
if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
context, KEY_SEARCH) < 0)
continue;
 
-   /* we set a different error code if we find a negative key */
-   if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+   /* we set a different error code if we pass a negative key */
+   if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
err = -ENOKEY;
continue;
}


[PATCH 03/28] KEYS: Allow the callout data to be passed as a blob rather than a string [try #2]

2007-12-05 Thread David Howells

Allow the callout data to be passed as a blob rather than a string for internal
kernel services that call any request_key_*() interface other than
request_key().  request_key() itself still takes a NUL-terminated string.

The functions that change are:

request_key_with_auxdata()
request_key_async()
request_key_async_with_auxdata()
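Passing the callout data as a blob means the caller supplies an explicit length. The data may then contain NUL bytes, and a zero length is allowed. request_key() keeps its NUL-terminated string, whose length is implied by the terminator. The sketch contrasts the two forms in userspace. The record type and both helpers are hypothetical, and the assumption that the string form stores strlen() bytes is made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of the data handed to the upcall. */
struct callout_rec {
	void *data;
	size_t len;
};

/* Blob form: explicit length, so embedded NULs survive and len may be 0. */
static int callout_set_blob(struct callout_rec *c, const void *info, size_t len)
{
	c->data = NULL;
	c->len = 0;
	if (len == 0)
		return 0;
	c->data = malloc(len);
	if (!c->data)
		return -1;
	memcpy(c->data, info, len);
	c->len = len;
	return 0;
}

/* String form: the length comes from the NUL terminator, so anything
 * after an embedded NUL is lost. */
static int callout_set_string(struct callout_rec *c, const char *info)
{
	return callout_set_blob(c, info, info ? strlen(info) : 0);
}
```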

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 Documentation/keys-request-key.txt |   11 +---
 Documentation/keys.txt |   14 +++---
 include/linux/key.h|9 ---
 security/keys/internal.h   |9 ---
 security/keys/keyctl.c |7 -
 security/keys/request_key.c|   49 ++--
 security/keys/request_key_auth.c   |   12 +
 7 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 266955d..09b55e4 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -11,26 +11,29 @@ request_key*():
 
struct key *request_key(const struct key_type *type,
const char *description,
-   const char *callout_string);
+   const char *callout_info);
 
 or:
 
struct key *request_key_with_auxdata(const struct key_type *type,
 const char *description,
-const char *callout_string,
+const char *callout_info,
+size_t callout_len,
 void *aux);
 
 or:
 
struct key *request_key_async(const struct key_type *type,
  const char *description,
- const char *callout_string);
+ const char *callout_info,
+ size_t callout_len);
 
 or:
 
struct key *request_key_async_with_auxdata(const struct key_type *type,
   const char *description,
-  const char *callout_string,
+  const char *callout_info,
+  size_t callout_len,
   void *aux);
 
 Or by userspace invoking the request_key system call:
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 51652d3..b82d38d 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -771,7 +771,7 @@ payload contents for more information.
 
struct key *request_key(const struct key_type *type,
const char *description,
-   const char *callout_string);
+   const char *callout_info);
 
 This is used to request a key or keyring with a description that matches
 the description specified according to the key type's match function. This
@@ -793,24 +793,28 @@ payload contents for more information.
 
struct key *request_key_with_auxdata(const struct key_type *type,
 const char *description,
-const char *callout_string,
+const void *callout_info,
+size_t callout_len,
 void *aux);
 
 This is identical to request_key(), except that the auxiliary data is
-passed to the key_type->request_key() op if it exists.
+passed to the key_type->request_key() op if it exists, and the callout_info
+is a blob of length callout_len, if given (the length may be 0).
 
 
 (*) A key can be requested asynchronously by calling one of:
 
struct key *request_key_async(const struct key_type *type,
  const char *description,
- const char *callout_string);
+ const void *callout_info,
+ size_t callout_len);
 
 or:
 
struct key *request_key_async_with_auxdata(const struct key_type *type,
   const char *description,
-  const char *callout_string,
+  const char *callout_info,
+  size_t callout_len,
   void *aux);
 
 which are asynchronous equivalents of request_key() and
diff --git a/include/linux/key.h b/include/linux/key.h
index fcdbd5e..4a6021a 100644
--- a/include/linux/key.h
+++ b/include/linux

[PATCH 28/28] FS-Cache: Make kAFS use FS-Cache [try #2]

2007-12-05 Thread David Howells

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.  The kAFS filesystem will use caching
automatically if it's available.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig         |    8 +
 fs/afs/Makefile    |    3 
 fs/afs/cache.c     |  505 ++--
 fs/afs/cache.h     |   15 --
 fs/afs/cell.c      |   16 +-
 fs/afs/file.c      |  212 +-
 fs/afs/inode.c     |   26 +--
 fs/afs/internal.h  |   53 ++---
 fs/afs/main.c      |   27 +--
 fs/afs/mntpt.c     |    4 
 fs/afs/vlocation.c |   23 +-
 fs/afs/volume.c    |   14 -
 fs/afs/write.c     |    6 -
 13 files changed, 537 insertions(+), 375 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 83d1227..7f3278f 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2120,6 +2120,14 @@ config AFS_DEBUG
 
  If unsure, say N.
 
+config AFS_FSCACHE
+   bool "Provide AFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want AFS data to be cached locally on disk through
+ the generic filesystem cache manager
+
 config 9P_FS
tristate "Plan 9 Resource Sharing Support (9P2000) (Experimental)"
depends on INET && NET_9P && EXPERIMENTAL
diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index a666710..4f64b95 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -2,7 +2,10 @@
 # Makefile for Red Hat Linux AFS client.
 #
 
+afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
+
 kafs-objs := \
+   $(afs-cache-y) \
callback.o \
cell.o \
cmservice.o \
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
index de0d7de..8e179a9 100644
--- a/fs/afs/cache.c
+++ b/fs/afs/cache.c
@@ -9,248 +9,399 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
-   const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
-   .name       = "cell_ix",
-   .data_size  = sizeof(struct afs_cache_cell),
-   .keys[0]    = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
-   .match  = afs_cell_cache_match,
-   .update = afs_cell_cache_update,
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include "internal.h"
+
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+  void *buffer, uint16_t buflen);
+static enum fscache_checkaux afs_cell_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static enum fscache_checkaux afs_vlocation_cache_check_aux(
+   void *cookie_netfs_data, const void *buffer, uint16_t buflen);
+
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+void *buffer, uint16_t buflen);
+
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+   void *buffer, uint16_t buflen);
+static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
+  const void *buffer,
+  uint16_t buflen);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+static struct fscache_netfs_operations afs_cache_ops = {
+};
+
+struct fscache_netfs afs_cache_netfs = {
+   .name       = "afs",
+   .version    = 0,
+   .ops        = &afs_cache_ops,
+};
+
+struct fscache_cookie_def afs_cell_cache_index_def = {
+   .name       = "AFS.cell",
+   .type       = FSCACHE_COOKIE_TYPE_INDEX,
+   .get_key    = afs_cell_cache_get_key,
+   .get_aux    = afs_cell_cache_get_aux,
+   .check_aux  = afs_cell_cache_check_aux,
+};
+
+struct fscache_cookie_def afs_vlocation_cache_index_def = {
+   .name   = AFS.vldb
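The cookie definitions above give FS-Cache a set of callbacks. The userspace sketch below shows the general shape of two of them. A get_key callback copies an index key into the supplied buffer, bounded by buflen, and returns its length. A check_aux callback compares the stored auxiliary data with the current value. The struct afs_cell layout, the vl_serial field and the two-valued enum are simplified assumptions made for illustration. They are not the kAFS or FS-Cache definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for FS-Cache's aux-data verdict (assumption). */
enum checkaux { CHECKAUX_OKAY, CHECKAUX_OBSOLETE };

/* Hypothetical, simplified cell record. */
struct afs_cell {
	char name[64];
	uint32_t vl_serial;	/* hypothetical value kept as aux data */
};

/* get_key: copy the cell name (no NUL) into buffer; 0 if it won't fit. */
static uint16_t cell_get_key(const void *cookie_netfs_data,
			     void *buffer, uint16_t buflen)
{
	const struct afs_cell *cell = cookie_netfs_data;
	size_t klen;

	assert(cell != NULL && buffer != NULL);
	klen = strlen(cell->name);
	if (klen > buflen)
		return 0;
	memcpy(buffer, cell->name, klen);
	return (uint16_t)klen;
}

/* check_aux: is the cached object still valid for this cell? */
static enum checkaux cell_check_aux(void *cookie_netfs_data,
				    const void *buffer, uint16_t buflen)
{
	const struct afs_cell *cell = cookie_netfs_data;
	uint32_t stored;

	if (buflen != sizeof(stored))
		return CHECKAUX_OBSOLETE;
	memcpy(&stored, buffer, sizeof(stored));
	return stored == cell->vl_serial ? CHECKAUX_OKAY : CHECKAUX_OBSOLETE;
}
```

FS-Cache's own verdict type also has a "needs update" case, for aux data that should be rewritten while the cached data is kept. The sketch leaves that case out.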

[PATCH 23/28] AFS: Add TestSetPageError() [try #2]

2007-12-05 Thread David Howells

Add a TestSetPageError() macro to the suite of page flag manipulators.  This
can be used by AFS to prevent over-excision of rejected writes from the page
cache.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/page-flags.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fcc9e23..0350c37 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -130,6 +130,7 @@
 #define PageError(page)        test_bit(PG_error, &(page)->flags)
 #define SetPageError(page)     set_bit(PG_error, &(page)->flags)
 #define ClearPageError(page)   clear_bit(PG_error, &(page)->flags)
+#define TestSetPageError(page) test_and_set_bit(PG_error, &(page)->flags)
 
 #define PageReferenced(page)   test_bit(PG_referenced, &(page)->flags)
 #define SetPageReferenced(page) set_bit(PG_referenced, &(page)->flags)
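TestSetPageError() differs from SetPageError() because it also returns the bit's previous value, and it does both in one atomic step. Of several racing callers, therefore, exactly one sees the bit clear. The userspace analogue below uses C11 atomic_fetch_or() in place of the kernel's test_and_set_bit(). The bit number is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_ERROR_BIT 1	/* placeholder bit number, not the kernel's value */

/* Userspace analogue of test_and_set_bit(): set the bit and return its
 * previous state as a single atomic operation. */
static bool test_and_set_flag(atomic_ulong *flags, unsigned int bit)
{
	unsigned long mask;

	assert(bit < 8 * sizeof(unsigned long));
	mask = 1UL << bit;
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}
```

cancel_rejected_page() relies on this property: only the caller that finds PG_error clear goes on to end writeback and excise the page.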


[PATCH 13/28] CacheFiles: Add missing copy_page export for ia64 [try #2]

2007-12-05 Thread David Howells

This one-line patch fixes the missing export of copy_page introduced
by the cachefile patches.  This patch is not yet upstream, but is required
for cachefile on ia64.  It will be pushed upstream when cachefile goes
upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);


[PATCH 21/28] NFS: Display local caching state [try #2]

2007-12-05 Thread David Howells

Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |    7 ---
 fs/nfs/fscache.h |   15 +++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be38c3c..91ecea3 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1335,7 +1335,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
/* display header on line 1 */
if (v == nfs_volume_list) {
-   seq_puts(m, "NV SERVER   PORT DEV FSID\n");
+   seq_puts(m, "NV SERVER   PORT DEV FSID  FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1349,12 +1349,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 (unsigned long long) server->fsid.major,
 (unsigned long long) server->fsid.minor);
 
-   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
   clp->cl_nfsversion,
   NIPQUAD(clp->cl_addr.sin_addr),
   ntohs(clp->cl_addr.sin_port),
   dev,
-  fsid);
+  fsid,
+  nfs_server_fscache_state(server));
 
return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 144fb58..9a735fc 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -53,6 +53,17 @@ extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
 extern int nfs_fscache_release_page(struct page *, gfp_t);
 
 /*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+   if (server->nfs_client->fscache &&
+   (server->options & NFS_OPTION_FSCACHE))
+   return "yes";
+   return "no ";
+}
+
+/*
  * release the caching state associated with a page if undergoing complete page
  * invalidation
  */
@@ -109,6 +120,10 @@ static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
 static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
 static inline void nfs_fscache_show_stats(struct seq_file *m,
  struct nfs_server *nfss) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+   return "no ";
+}
 
 static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
 static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}
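nfs_server_fscache_state() reports "yes" only when the client holds a cache cookie and the mount requested caching. The "no " string carries a trailing space so that the new FSC column keeps the same width. The userspace sketch below uses cut-down stand-ins for struct nfs_client and struct nfs_server. The field layout and the NFS_OPTION_FSCACHE value are assumptions.

```c
#include <assert.h>
#include <string.h>

#define NFS_OPTION_FSCACHE 0x00000001	/* assumed value for illustration */

/* Cut-down stand-ins for the kernel structures (assumption). */
struct nfs_client { void *fscache; };
struct nfs_server { struct nfs_client *nfs_client; unsigned int options; };

/* "yes" only if the client has a cache cookie AND the mount asked for
 * caching; "no " is padded to the same width as "yes". */
static const char *nfs_server_fscache_state(const struct nfs_server *server)
{
	assert(server != NULL && server->nfs_client != NULL);
	if (server->nfs_client->fscache &&
	    (server->options & NFS_OPTION_FSCACHE))
		return "yes";
	return "no ";
}
```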


[PATCH 20/28] NFS: Configuration and mount option changes to enable local caching on NFS [try #2]

2007-12-05 Thread David Howells

Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig        |    8 
 fs/nfs/client.c   |    2 ++
 fs/nfs/internal.h |    1 +
 fs/nfs/super.c    |   14 ++
 4 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 215b0d6..83d1227 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1650,6 +1650,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+   bool "Provide NFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
 config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index acb2179..be38c3c 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -575,6 +575,7 @@ static int nfs_init_server(struct nfs_server *server,
 
/* Initialise the client representation from the mount data */
server->flags = data->flags & NFS_MOUNT_FLAGMASK;
+   server->options = data->options;
 
if (data->rsize)
server->rsize = nfs_block_size(data->rsize, NULL);
@@ -931,6 +932,7 @@ static int nfs4_init_server(struct nfs_server *server,
/* Initialise the client representation from the mount data */
server->flags = data->flags & NFS_MOUNT_FLAGMASK;
server->caps |= NFS_CAP_ATOMIC_OPEN;
+   server->options = data->options;
 
if (data->rsize)
server->rsize = nfs_block_size(data->rsize, NULL);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f3acf48..ef09e00 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -35,6 +35,7 @@ struct nfs_parsed_mount_data {
int acregmin, acregmax,
acdirmin, acdirmax;
int namlen;
+   unsigned int      options;
unsigned int      bsize;
unsigned int      auth_flavor_len;
rpc_authflavor_t  auth_flavors[1];
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 040d65b..a5a3bd1 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -74,6 +74,7 @@ enum {
Opt_acl, Opt_noacl,
Opt_rdirplus, Opt_nordirplus,
Opt_sharecache, Opt_nosharecache,
+   Opt_fscache, Opt_nofscache,
 
/* Mount options that take integer arguments */
Opt_port,
@@ -123,6 +124,8 @@ static match_table_t nfs_mount_option_tokens = {
{ Opt_nordirplus, "nordirplus" },
{ Opt_sharecache, "sharecache" },
{ Opt_nosharecache, "nosharecache" },
+   { Opt_fscache, "fsc" },
+   { Opt_nofscache, "nofsc" },
 
{ Opt_port, "port=%u" },
{ Opt_rsize, "rsize=%u" },
@@ -459,6 +462,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
seq_printf(m, ",timeo=%lu", 10U * clp->retrans_timeo / HZ);
seq_printf(m, ",retrans=%u", clp->retrans_count);
seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+   if (nfss->options & NFS_OPTION_FSCACHE)
+   seq_printf(m, ",fsc");
 }
 
 /*
@@ -697,6 +702,15 @@ static int nfs_parse_mount_options(char *raw,
break;
case Opt_nosharecache:
mnt->flags |= NFS_MOUNT_UNSHARED;
+   mnt->options &= ~NFS_OPTION_FSCACHE;
+   break;
+   case Opt_fscache:
+   /* sharing is mandatory with fscache */
+   mnt->options |= NFS_OPTION_FSCACHE;
+   mnt->flags &= ~NFS_MOUNT_UNSHARED;
+   break;
+   case Opt_nofscache:
+   mnt->options &= ~NFS_OPTION_FSCACHE;
break;
 
case Opt_port:
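With these parser changes, fsc and nosharecache cancel each other, and whichever appears later in the option string wins. fsc clears NFS_MOUNT_UNSHARED, and nosharecache clears NFS_OPTION_FSCACHE. The userspace sketch below reproduces that last-one-wins behaviour. The flag values, struct mnt_opts and parse_opts() are hypothetical, and strtok() stands in for the kernel's match_token() machinery.

```c
#include <assert.h>
#include <string.h>

#define NFS_MOUNT_UNSHARED  0x1	/* assumed values for illustration */
#define NFS_OPTION_FSCACHE  0x1

struct mnt_opts { unsigned int flags, options; };	/* hypothetical */

/* Apply comma-separated options in order; later ones override earlier. */
static void parse_opts(char *raw, struct mnt_opts *mnt)
{
	char *p;

	assert(raw != NULL && mnt != NULL);
	for (p = strtok(raw, ","); p; p = strtok(NULL, ",")) {
		if (strcmp(p, "nosharecache") == 0) {
			mnt->flags |= NFS_MOUNT_UNSHARED;
			mnt->options &= ~NFS_OPTION_FSCACHE;
		} else if (strcmp(p, "fsc") == 0) {
			/* sharing is mandatory with fscache */
			mnt->options |= NFS_OPTION_FSCACHE;
			mnt->flags &= ~NFS_MOUNT_UNSHARED;
		} else if (strcmp(p, "nofsc") == 0) {
			mnt->options &= ~NFS_OPTION_FSCACHE;
		}
	}
}
```

For example, "fsc,nosharecache" ends up unshared and uncached, while "nosharecache,fsc" ends up shared with caching enabled.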


[PATCH 19/28] NFS: Use local caching [try #2]

2007-12-05 Thread David Howells

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

http://people.redhat.com/steved/fscache/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

mount warthog:/ /a -o fsc

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile           |    1 
 fs/nfs/client.c           |    5 +
 fs/nfs/file.c             |   37 
 fs/nfs/fscache-def.c      |  289 +
 fs/nfs/fscache.c          |  391 +
 fs/nfs/fscache.h          |  148 +
 fs/nfs/inode.c            |   47 +
 fs/nfs/read.c             |   28 +++
 fs/nfs/super.c            |    3 
 fs/nfs/sysctl.c           |    1 
 include/linux/nfs_fs.h    |    9 +
 include/linux/nfs_fs_sb.h |   18 ++
 12 files changed, 968 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..073d04c 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 70587f3..acb2179 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -43,6 +43,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY    NFSDBG_CLIENT
 
@@ -139,6 +140,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+   nfs_fscache_get_client_cookie(clp);
+
return clp;
 
 error_3:
@@ -170,6 +173,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
nfs4_shutdown_client(clp);
 
+   nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index b3bb89f..d492cd7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -35,6 +35,7 @@
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY    NFSDBG_FILE
 
@@ -352,22 +353,48 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
return status < 0 ? status : copied;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page->mapping->host, page);
+
+   nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
+/*
+ * Release the private state associated with a page
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
/* If PagePrivate() is set, then the page is not freeable */
-   return 0;
+   if (PagePrivate(page))
+   return 0;
+   return nfs_fscache_release_page(page, gfp);
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
+   wait_on_page_fscache_write(page);
return nfs_wb_page(page->mapping->host, page);
 }
 
@@ -387,6 +414,11 @@ const struct address_space_operations nfs_file_aops = {
.launder_page = nfs_launder_page,
 };
 
+/*
+ * Notification that a PTE pointing to an NFS page is about to be made
+ * writable, implying that someone is about to modify the page through a
+ * shared-writable mapping
+ */
 static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 {
struct file *filp = vma->vm_file;
@@ -396,6 +428,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
struct address_space *mapping;
loff_t offset;
 
+   /* make sure the cache has finished storing the page */
+   wait_on_page_fscache_write(page);
+
lock_page(page);
mapping = page->mapping;
if (mapping != vma->vm_file->f_path.dentry->d_inode->i_mapping) {
diff --git a/fs/nfs/fscache-def.c b/fs/nfs/fscache

[PATCH 16/28] CacheFiles: Permit the page lock state to be monitored [try #2]

2007-12-05 Thread David Howells

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    5 +
 mm/filemap.c            |   18 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6a1b317..21c35e2 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -223,6 +223,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
 extern void end_page_fscache_write(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index bea1ba6..6872d1b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -521,6 +521,24 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+   wait_queue_head_t *q = page_waitqueue(page);
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue(q, waiter);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *
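add_page_wait_queue() lets a caller attach its own wait_queue_t to a page's wait queue. When the page is unlocked and the queue is woken, that entry's wake function runs. CacheFiles uses this to notice that a read on the backing file has completed. The userspace model below uses a singly linked list of callback entries and leaves out the locking the kernel version takes. struct waiter, wake_all() and copy_on_unlock() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of a wait-queue entry with its own wake function. */
struct waiter {
	struct waiter *next;
	void (*func)(struct waiter *w, void *key);
	void *private;
};

struct wait_head { struct waiter *first; };

/* Like __add_wait_queue(): put the entry at the head of the queue. */
static void add_waiter(struct wait_head *q, struct waiter *w)
{
	w->next = q->first;
	q->first = w;
}

/* Models the wakeup on page unlock: run every queued entry's function. */
static void wake_all(struct wait_head *q, void *key)
{
	struct waiter *w;

	for (w = q->first; w; w = w->next)
		w->func(w, key);
}

/* Example monitor: copy "backing page" data into the netfs buffer. */
struct copy_job { const char *src; char dst[16]; };

static void copy_on_unlock(struct waiter *w, void *key)
{
	struct copy_job *job = w->private;

	(void)key;
	assert(job != NULL);
	strncpy(job->dst, job->src, sizeof(job->dst) - 1);
	job->dst[sizeof(job->dst) - 1] = '\0';
}
```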


[PATCH 09/28] FS-Cache: Release page->private after failed readahead [try #2]

2007-12-05 Thread David Howells

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   39 +--
 1 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index c9c50ca..75aa6b6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -44,6 +44,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+struct page *page)
+{
+   if (PagePrivate(page)) {
+   if (TestSetPageLocked(page))
+   BUG();
+   page->mapping = mapping;
+   do_invalidatepage(page, 0);
+   page->mapping = NULL;
+   unlock_page(page);
+   }
+   page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+   struct page *victim;
+
+   while (!list_empty(pages)) {
+   victim = list_to_page(pages);
+   list_del(&victim->lru);
+   read_cache_pages_invalidate_page(mapping, victim);
+   }
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -65,14 +100,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
list_del(&page->lru);
if (add_to_page_cache_lru(page, mapping,
page->index, GFP_KERNEL)) {
-   page_cache_release(page);
+   read_cache_pages_invalidate_page(mapping, page);
continue;
}
page_cache_release(page);
 
ret = filler(data, page);
if (unlikely(ret)) {
-   put_pages_list(pages);
+   read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);
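Both error paths now drain the remaining pages through read_cache_pages_invalidate_page(). That gives the filesystem's invalidatepage op a chance to drop private state before the page reference is released. The userspace model below shows the drain loop. struct fake_page, its private_set flag and the invalidations counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	struct fake_page *next;
	bool private_set;	/* models PagePrivate() */
	int refcount;
};

static int invalidations;	/* counts simulated do_invalidatepage() calls */

/* Models read_cache_pages_invalidate_page(): clean up private state if
 * present, then drop the page reference in every case. */
static void invalidate_and_release(struct fake_page *page)
{
	if (page->private_set) {
		invalidations++;
		page->private_set = false;
	}
	assert(page->refcount > 0);
	page->refcount--;
}

/* Models read_cache_pages_invalidate_pages(): empty the whole list. */
static void invalidate_list(struct fake_page **head)
{
	while (*head) {
		struct fake_page *victim = *head;

		*head = victim->next;
		invalidate_and_release(victim);
	}
}
```

The kernel loop takes pages from the tail of the list (list_to_page() uses ->prev). The order does not affect the outcome here.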


[PATCH 01/28] KEYS: Increase the payload size when instantiating a key [try #2]

2007-12-05 Thread David Howells

Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
char type[32], *description;
void *payload;
long ret;
+   bool vm;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
/* pull the payload in if one was supplied */
payload = NULL;
 
+   vm = false;
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error2;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error2;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error2;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
key_ref_put(keyring_ref);
  error3:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
  error2:
kfree(description);
  error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
key_ref_t keyring_ref;
void *payload;
long ret;
+   bool vm = false;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
}
 
 error2:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
 error:
return ret;
 


[PATCH 05/28] Security: Change current->fs[ug]id to current_fs[ug]id() [try #2]

2007-12-05 Thread David Howells

Change current->fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be
separated from the task_struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/perfmon.c|4 ++--
 arch/powerpc/platforms/cell/spufs/inode.c |4 ++--
 drivers/isdn/capi/capifs.c|4 ++--
 drivers/usb/core/inode.c  |4 ++--
 fs/9p/fid.c   |2 +-
 fs/9p/vfs_inode.c |4 ++--
 fs/9p/vfs_super.c |4 ++--
 fs/affs/inode.c   |4 ++--
 fs/anon_inodes.c  |4 ++--
 fs/attr.c |4 ++--
 fs/bfs/dir.c  |4 ++--
 fs/cifs/cifsproto.h   |2 +-
 fs/cifs/dir.c |   12 ++--
 fs/cifs/inode.c   |8 
 fs/cifs/misc.c|4 ++--
 fs/coda/cache.c   |6 +++---
 fs/coda/upcall.c  |4 ++--
 fs/devpts/inode.c |4 ++--
 fs/dquot.c|2 +-
 fs/exec.c |4 ++--
 fs/ext2/balloc.c  |2 +-
 fs/ext2/ialloc.c  |4 ++--
 fs/ext2/ioctl.c   |2 +-
 fs/ext3/balloc.c  |2 +-
 fs/ext3/ialloc.c  |4 ++--
 fs/ext4/balloc.c  |2 +-
 fs/ext4/ialloc.c  |4 ++--
 fs/fuse/dev.c |4 ++--
 fs/gfs2/inode.c   |   10 +-
 fs/hfs/inode.c|4 ++--
 fs/hfsplus/inode.c|4 ++--
 fs/hpfs/namei.c   |   24 
 fs/hugetlbfs/inode.c  |   16 
 fs/jffs2/fs.c |4 ++--
 fs/jfs/jfs_inode.c|4 ++--
 fs/locks.c|2 +-
 fs/minix/bitmap.c |4 ++--
 fs/namei.c|8 
 fs/nfsd/vfs.c |4 ++--
 fs/ocfs2/dlm/dlmfs.c  |8 
 fs/ocfs2/namei.c  |4 ++--
 fs/pipe.c |4 ++--
 fs/posix_acl.c|4 ++--
 fs/ramfs/inode.c  |4 ++--
 fs/reiserfs/namei.c   |4 ++--
 fs/sysv/ialloc.c  |4 ++--
 fs/udf/ialloc.c   |4 ++--
 fs/udf/namei.c|2 +-
 fs/ufs/ialloc.c   |4 ++--
 fs/xfs/linux-2.6/xfs_linux.h  |4 ++--
 fs/xfs/xfs_acl.c  |6 +++---
 fs/xfs/xfs_attr.c |2 +-
 fs/xfs/xfs_inode.c|6 +++---
 fs/xfs/xfs_vnodeops.c |8 
 include/linux/fs.h|2 +-
 include/linux/sched.h |3 +++
 ipc/mqueue.c  |4 ++--
 kernel/cgroup.c   |4 ++--
 mm/shmem.c|8 
 net/9p/client.c   |2 +-
 net/socket.c  |4 ++--
 net/sunrpc/auth.c |8 
 security/commoncap.c  |8 
 security/keys/key.c   |2 +-
 security/keys/keyctl.c|2 +-
 security/keys/request_key.c   |   10 +-
 security/keys/request_key_auth.c  |2 +-
 67 files changed, 163 insertions(+), 160 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 73e7c2e..ef383d9 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2206,8 +2206,8 @@ pfm_alloc_fd(struct file **cfile)
DPRINT(("new inode ino=%ld @%p\n", inode->i_ino, inode));
 
inode->i_mode = S_IFCHR|S_IRUGO;
-   inode->i_uid  = current->fsuid;
-   inode->i_gid  = current->fsgid;
+   inode->i_uid  = current_fsuid();
+   inode->i_gid  = current_fsgid();
 
sprintf(name, "[%lu]", inode->i_ino);
this.name = name;
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index c0e968a..4efe7bf 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -85,8 +85,8 @@ spufs_new_inode(struct super_block *sb, int mode)
goto out;
 
inode->i_mode = mode;
-   inode->i_uid = current->fsuid;
-   inode->i_gid = current->fsgid;
+   inode->i_uid = current_fsuid();
+   inode->i_gid
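Once every read of the filesystem credentials goes through an accessor, callers no longer depend on where fsuid and fsgid are stored. The fields can then be moved out of task_struct without touching those callers again. The userspace sketch below illustrates this. struct task, struct task_security and the current_task pointer are hypothetical stand-ins for the kernel's structures and its "current" macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split-out security record. */
struct task_security {
	unsigned int fsuid, fsgid;
};

/* Hypothetical task: credentials are now reached through a pointer. */
struct task {
	struct task_security *sec;
};

static struct task *current_task;	/* stands in for the kernel's "current" */

/* Accessors used instead of reading current->fsuid / current->fsgid. */
static unsigned int current_fsuid(void)
{
	assert(current_task != NULL && current_task->sec != NULL);
	return current_task->sec->fsuid;
}

static unsigned int current_fsgid(void)
{
	assert(current_task != NULL && current_task->sec != NULL);
	return current_task->sec->fsgid;
}
```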

[PATCH 0/7] Permit filesystem local caching

2007-12-05 Thread David Howells



These patches add local caching for network filesystems such as NFS and AFS.

The patches can roughly be broken down into a number of sets:

  (*) 01-keys-inc-payload.diff
  (*) 02-keys-search-keyring.diff
  (*) 03-keys-callout-blob.diff

  Three patches to the keyring code made to help the CIFS people.
  Included because of patches 05-08.

  (*) 04-keys-get-label.diff

  A patch to allow the security label of a key to be retrieved.
  Included because of patches 05-08.

  (*) 05-security-current-fsugid.diff
  (*) 06-security-separate-task-bits.diff
  (*) 07-security-subjective.diff
  (*) 08-security-kernel-service.diff

  Patches to permit the subjective security of a task to be overridden.
  All the security details in task_struct are decanted into a new struct
  that task_struct then has two pointers to: one that defines the
  objective security of that task (how other tasks may affect it) and one
  that defines the subjective security (how it may affect other objects).

  Note that I have dropped the idea of struct cred for the moment.  With
  the amount of stuff that was excluded from it, it wasn't actually any
  use to me.  However, it can be added later.

  Required for cachefiles.

  (*) 09-release-page.diff
  (*) 10-fscache-page-flags.diff
  (*) 11-add_wait_queue_tail.diff
  (*) 12-fscache.diff

  Patches to provide a local caching facility for network filesystems.

  (*) 13-cachefiles-ia64.diff
  (*) 14-cachefiles-ext3-f_mapping.diff
  (*) 15-cachefiles-write.diff
  (*) 16-cachefiles-monitor.diff
  (*) 17-cachefiles-export.diff
  (*) 18-cachefiles.diff

  Patches to provide a local cache in a directory of an already mounted
  filesystem.

  (*) 19-fscache-nfs.diff
  (*) 20-fscache-nfs-mount.diff
  (*) 21-fscache-nfs-display.diff

  Patches to provide NFS with local caching.

  (*) 22-fcrypt-bit-annotate.diff

  A fix for AFS.

  (*) 23-afs-testsetpageerror.diff
  (*) 24-afs-cancel_rejected_write.diff
  (*) 25-afs-rejected-writeback.diff
  (*) 26-afs-opID.diff
  (*) 27-afs-shared-writable-mmap.diff

  Patches to provide AFS with improved write support.

  (*) 28-fscache-afs.diff

  Patches to provide AFS with local caching.
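The objective/subjective split described for patches 05-08 can be pictured as a task that holds two pointers to security records. The objective record is consulted when others act on the task. The subjective record is used when the task acts on objects. An override then swaps only the subjective pointer for the duration of an operation, as a cache driver would. The userspace sketch below shows this. The names (objective, subjective, override_subjective(), revert_subjective()) are hypothetical and are not the patch's API.

```c
#include <assert.h>
#include <stddef.h>

struct task_security { unsigned int fsuid; };	/* hypothetical */

struct task {
	struct task_security *objective;	/* how other tasks may affect it */
	struct task_security *subjective;	/* how it may affect other objects */
};

/* Install an override for the subjective context; return the old one. */
static struct task_security *override_subjective(struct task *t,
						 struct task_security *new_sec)
{
	struct task_security *old = t->subjective;

	assert(new_sec != NULL);
	t->subjective = new_sec;
	return old;
}

/* Restore the subjective context saved by override_subjective(). */
static void revert_subjective(struct task *t, struct task_security *old)
{
	t->subjective = old;
}
```

The objective pointer never changes in this model. Other tasks therefore keep seeing the original identity while the override is in force.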

There are some issues with these patches that I'd like advice on:

 (1) Is the security override stuff acceptable?

 (2) Should the audit context be placed in the task_security struct?

 (3) Should the task security context actually be shared by CLONE_THREAD?
     (Should it be placed in struct thread_group_security?)

 (4) How to handle superblock sharing in NFS?  (I've sent a separate email on
 this)

Andrew, Linus, can you please hold off on taking these patches for the moment.


--
A tarball of the patches is available at:


http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-25.tar.bz2


To use this version of CacheFiles, cachefilesd-0.9 is also required.  It
is available as an SRPM:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David

[PATCH 24/28] AFS: Add a function to excise a rejected write from the pagecache [try #2]

2007-12-05 Thread David Howells

Add a function - cancel_rejected_write() - to excise a rejected write from the
pagecache.  This function is related to the truncation family of routines.  It
permits the pages modified by a network filesystem client (such as AFS) to be
excised and discarded from the pagecache if the attempt to write them back to
the server fails.

The dirty and writeback states of the afflicted pages are cancelled and the
pages themselves are detached for recycling.  All PTEs referring to those
pages are removed.

Note that the locking is tricky: once the pages have been unlocked as part of
the writeback process, it is very easy to deadlock against truncate() and other
routines.  To avoid this, the PG_error flag is set, then the PG_writeback flag
is cleared, and only *then* can lock_page() be called.
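The ordering rule can be illustrated with a small userspace model.  This is a sketch, not kernel code: the bit numbers, the sketch_ names and the spin loop standing in for lock_page() are assumptions for illustration.  It performs the three steps in the required order: claim the page by test-and-setting the error bit, clear the writeback bit, and only then take the page lock.  In the kernel, truncate() can hold the page lock while waiting for writeback to end, so taking the lock with writeback still set would risk deadlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model only (not the kernel's values). */
enum {
	SKETCH_PG_error = 0,
	SKETCH_PG_writeback = 1,
	SKETCH_PG_locked = 2,
};

struct sketch_page {
	atomic_ulong flags;
};

/* Atomically set bit nr and report whether it was already set. */
static bool sketch_test_and_set_bit(int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

static void sketch_clear_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

/*
 * Returns true if this call cancelled the page, false if an earlier call
 * had already claimed it by setting the error bit.
 */
static bool sketch_cancel_rejected_page(struct sketch_page *page)
{
	/* Step 1: claim the page; only one canceller gets past this. */
	if (sketch_test_and_set_bit(SKETCH_PG_error, &page->flags))
		return false;

	/* Step 2: end writeback so that anyone holding the page lock while
	 * waiting on writeback (truncate, for instance) can make progress. */
	sketch_clear_bit(SKETCH_PG_writeback, &page->flags);

	/* Step 3: only now take the page lock. lock_page() would sleep here;
	 * this model simply spins. */
	while (sketch_test_and_set_bit(SKETCH_PG_locked, &page->flags))
		;

	/* ... the page would be removed from the pagecache here ... */

	sketch_clear_bit(SKETCH_PG_locked, &page->flags);
	return true;
}
```

The model omits the patch's BUG() check for a page that is already marked as errored but is still dirty or under writeback.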

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/mm.h |5 ++-
 mm/truncate.c  |   83 
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 520238c..438270f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1005,12 +1005,13 @@ extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 extern unsigned long do_brk(unsigned long, unsigned long);
 
-/* filemap.c */
-extern unsigned long page_unuse(struct page *);
+/* truncate.c */
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
   loff_t lstart, loff_t lend);
+extern void cancel_rejected_write(struct address_space *, pgoff_t, pgoff_t);
 
+/* filemap.c */
 /* generic vm_area_ops exported for stackable file systems */
 extern int filemap_fault(struct vm_area_struct *, struct vm_fault *);
 
diff --git a/mm/truncate.c b/mm/truncate.c
index 5b7d1c5..95fc1a8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -465,3 +465,86 @@ int invalidate_inode_pages2(struct address_space *mapping)
return invalidate_inode_pages2_range(mapping, 0, -1);
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
+
+/*
+ * Cancel that part of a rejected write that affects a particular page
+ */
+static void cancel_rejected_page(struct address_space *mapping,
+				 struct page *page, pgoff_t *_next)
+{
+	if (!TestSetPageError(page)) {
+		/* can't lock the page until we've cleared PG_writeback lest we
+		 * deadlock with truncate (amongst other things) */
+		end_page_writeback(page);
+		if (page->mapping == mapping) {
+			lock_page(page);
+			if (page->mapping == mapping) {
+				truncate_complete_page(mapping, page);
+				*_next = page->index + 1;
+			}
+			unlock_page(page);
+		}
+	} else if (PageWriteback(page) || PageDirty(page)) {
+		BUG();
+	}
+}
+
+/**
+ * cancel_rejected_write - Cancel a write on a contiguous set of pages
+ * @mapping: mapping affected
+ * @start: first page in set
+ * @end: last page in set
+ *
+ * Cancel a write of a contiguous set of pages when the writeback was rejected
+ * by the target medium or server.
+ *
+ * The pages in question are detached and discarded from the pagecache, and the
+ * writeback and dirty states are cleared prior to invalidation.  The caller
+ * must make sure that all the pages in the range are present in the pagecache,
+ * and the caller must hold PG_writeback on each of them.  NOTE! All the pages
+ * are locked and unlocked as part of this process, so the caller must take
+ * care to avoid deadlock.
+ *
+ * The PTEs pointing to those pages are also cleared, leading to the PTEs being
+ * reset when new pages are allocated and the contents reloaded.
+ */
+void cancel_rejected_write(struct address_space *mapping,
+			   pgoff_t start, pgoff_t end)
+{
+	struct pagevec pvec;
+	pgoff_t n;
+	int i;
+
+	BUG_ON(mapping->nrpages < end - start + 1);
+
+	/* dispose of any PTEs pointing to the affected pages */
+	unmap_mapping_range(mapping,
+			    (loff_t)start << PAGE_CACHE_SHIFT,
+			    (loff_t)(end - start + 1) << PAGE_CACHE_SHIFT,
+			    0);
+
+	pagevec_init(&pvec, 0);
+	do {
+		cond_resched();
+		n = end - start + 1;
+		if (n > PAGEVEC_SIZE)
+			n = PAGEVEC_SIZE;
+		n = pagevec_lookup(&pvec, mapping, start, n);
+		for (i = 0; i < n; i++) {
+			struct page *page = pvec.pages[i];
+
+			if (page->index < start || page->index > end)
+				continue;
+			start++;
+			cancel_rejected_page(mapping, page, &start

[PATCH 22/28] fcrypt endianness misannotations [try #2]

2007-12-05 Thread David Howells
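
A 32-bit byte swap is its own inverse, so be32_to_cpu() and cpu_to_be32() yield the same bits on any host.  What this patch changes is which of the two is used, together with the u32/__be32 types, so that sparse sees the S-box entries and key schedule as big-endian values.  Big-endian storage matters because F_ENCRYPT indexes the S-boxes through the bytes of a union in memory order.  The portable sketch below uses hypothetical sketch_ helpers built from shifts and memcpy() (not the kernel's implementations) to check both properties.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_be32(): lay the value out in memory as
 * big-endian bytes, whatever the host byte order. */
static uint32_t sketch_cpu_to_be32(uint32_t x)
{
	const uint8_t b[4] = {
		(uint8_t)(x >> 24), (uint8_t)(x >> 16),
		(uint8_t)(x >> 8), (uint8_t)x,
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Portable stand-in for be32_to_cpu(): read big-endian bytes from memory
 * and build the host-order value. */
static uint32_t sketch_be32_to_cpu(uint32_t be)
{
	uint8_t b[4];

	memcpy(b, &be, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

For the first sbox0 entry, Z(0xea) is 0xea << 3 = 0x750, which is stored as the bytes 00 00 07 50 on every host.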

Signed-off-by: Al Viro [EMAIL PROTECTED]
---

 crypto/fcrypt.c |   88 ---
 1 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/crypto/fcrypt.c b/crypto/fcrypt.c
index d161949..a32cb68 100644
--- a/crypto/fcrypt.c
+++ b/crypto/fcrypt.c
@@ -51,7 +51,7 @@
 #define ROUNDS 16
 
 struct fcrypt_ctx {
-   u32 sched[ROUNDS];
+   __be32 sched[ROUNDS];
 };
 
 /* Rotate right two 32 bit numbers as a 56 bit number */
@@ -73,8 +73,8 @@ do {  \
  * /afs/transarc.com/public/afsps/afs.rel31b.export-src/rxkad/sboxes.h
  */
 #undef Z
-#define Z(x) __constant_be32_to_cpu(x << 3)
-static const u32 sbox0[256] = {
+#define Z(x) __constant_cpu_to_be32(x << 3)
+static const __be32 sbox0[256] = {
Z(0xea), Z(0x7f), Z(0xb2), Z(0x64), Z(0x9d), Z(0xb0), Z(0xd9), Z(0x11),
Z(0xcd), Z(0x86), Z(0x86), Z(0x91), Z(0x0a), Z(0xb2), Z(0x93), Z(0x06),
Z(0x0e), Z(0x06), Z(0xd2), Z(0x65), Z(0x73), Z(0xc5), Z(0x28), Z(0x60),
@@ -110,8 +110,8 @@ static const u32 sbox0[256] = {
 };
 
 #undef Z
-#define Z(x) __constant_be32_to_cpu((x << 27) | (x >> 5))
-static const u32 sbox1[256] = {
+#define Z(x) __constant_cpu_to_be32((x << 27) | (x >> 5))
+static const __be32 sbox1[256] = {
Z(0x77), Z(0x14), Z(0xa6), Z(0xfe), Z(0xb2), Z(0x5e), Z(0x8c), Z(0x3e),
Z(0x67), Z(0x6c), Z(0xa1), Z(0x0d), Z(0xc2), Z(0xa2), Z(0xc1), Z(0x85),
Z(0x6c), Z(0x7b), Z(0x67), Z(0xc6), Z(0x23), Z(0xe3), Z(0xf2), Z(0x89),
@@ -147,8 +147,8 @@ static const u32 sbox1[256] = {
 };
 
 #undef Z
-#define Z(x) __constant_be32_to_cpu(x << 11)
-static const u32 sbox2[256] = {
+#define Z(x) __constant_cpu_to_be32(x << 11)
+static const __be32 sbox2[256] = {
Z(0xf0), Z(0x37), Z(0x24), Z(0x53), Z(0x2a), Z(0x03), Z(0x83), Z(0x86),
Z(0xd1), Z(0xec), Z(0x50), Z(0xf0), Z(0x42), Z(0x78), Z(0x2f), Z(0x6d),
Z(0xbf), Z(0x80), Z(0x87), Z(0x27), Z(0x95), Z(0xe2), Z(0xc5), Z(0x5d),
@@ -184,8 +184,8 @@ static const u32 sbox2[256] = {
 };
 
 #undef Z
-#define Z(x) __constant_be32_to_cpu(x << 19)
-static const u32 sbox3[256] = {
+#define Z(x) __constant_cpu_to_be32(x << 19)
+static const __be32 sbox3[256] = {
Z(0xa9), Z(0x2a), Z(0x48), Z(0x51), Z(0x84), Z(0x7e), Z(0x49), Z(0xe2),
Z(0xb5), Z(0xb7), Z(0x42), Z(0x33), Z(0x7d), Z(0x5d), Z(0xa6), Z(0x12),
Z(0x44), Z(0x48), Z(0x6d), Z(0x28), Z(0xaa), Z(0x20), Z(0x6d), Z(0x57),
@@ -225,7 +225,7 @@ static const u32 sbox3[256] = {
  */
 #define F_ENCRYPT(R, L, sched) \
 do {   \
-   union lc4 { u32 l; u8 c[4]; } u;\
+   union lc4 { __be32 l; u8 c[4]; } u; \
u.l = sched ^ R;\
L ^= sbox0[u.c[0]] ^ sbox1[u.c[1]] ^ sbox2[u.c[2]] ^ sbox3[u.c[3]]; \
 } while(0)
@@ -237,7 +237,7 @@ static void fcrypt_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 {
const struct fcrypt_ctx *ctx = crypto_tfm_ctx(tfm);
struct {
-   u32 l, r;
+   __be32 l, r;
} X;
 
 memcpy(&X, src, sizeof(X));
@@ -269,7 +269,7 @@ static void fcrypt_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 {
const struct fcrypt_ctx *ctx = crypto_tfm_ctx(tfm);
struct {
-   u32 l, r;
+   __be32 l, r;
} X;
 
 memcpy(&X, src, sizeof(X));
@@ -328,22 +328,22 @@ static int fcrypt_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int key
 	k |= (*key) >> 1;
 
/* Use lower 32 bits for schedule, rotate by 11 each round (16 times) */
-	ctx->sched[0x0] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x1] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x2] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x3] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x4] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x5] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x6] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x7] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x8] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0x9] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xa] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xb] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xc] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xd] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xe] = be32_to_cpu(k); ror56_64(k, 11);
-	ctx->sched[0xf] = be32_to_cpu(k);
+	ctx->sched[0x0] = cpu_to_be32(k); ror56_64(k, 11);
+	ctx->sched[0x1] = cpu_to_be32(k); ror56_64(k, 11);
+	ctx->sched[0x2] = cpu_to_be32(k); ror56_64(k, 11);
+	ctx->sched[0x3] = cpu_to_be32(k); ror56_64(k, 11);
+	ctx->sched[0x4] = cpu_to_be32(k); ror56_64(k, 11);
+	ctx->sched[0x5] = cpu_to_be32(k);

[PATCH 15/28] CacheFiles: Add a hook to write a single page of data to an inode [try #2]

2007-12-05 Thread David Howells

Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).  The data source is a single page.

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the write_begin() and
write_end() address_space operations to bind a copy directly into the page
cache.

Hook the Ext2 and Ext3 operations to the generic implementation.
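The kernel-doc for generic_file_buffered_write_one_page() in this patch notes that the file is not extended if the page crosses EOF, so only the first part of the page is written.  A minimal sketch of that length calculation follows.  It assumes 4 KiB pages, and it also assumes (the text does not say) that a page lying wholly beyond EOF writes nothing.  The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Number of bytes of a page-sized source that would be written at page
 * 'index' of a file whose size is i_size, without extending the file.
 */
static size_t sketch_bytes_to_write(uint64_t i_size, uint64_t index)
{
	uint64_t pos = index * SKETCH_PAGE_SIZE;

	if (pos >= i_size)
		return 0;			/* assumed: page wholly beyond EOF */
	if (i_size - pos >= SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;	/* page lies entirely inside the file */
	return (size_t)(i_size - pos);		/* page straddles EOF: partial write */
}
```

For a 10000-byte file, pages 0 and 1 are written in full, page 2 receives 1808 bytes, and page 3 receives none.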

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext2/inode.c|2 ++
 fs/ext3/inode.c|3 +++
 include/linux/fs.h |7 ++
 mm/filemap.c   |   61 
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index b1ab32a..cfa56e6 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -796,6 +796,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 const struct address_space_operations ext2_aops_xip = {
@@ -814,6 +815,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 /*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index bc918d3..435c684 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1780,6 +1780,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_writeback_aops = {
@@ -1794,6 +1795,7 @@ static const struct address_space_operations ext3_writeback_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_journalled_aops = {
@@ -1807,6 +1809,7 @@ static const struct address_space_operations ext3_journalled_aops = {
.bmap   = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage= ext3_releasepage,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 850d3fc..a3c3369 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -479,6 +479,11 @@ struct address_space_operations {
int (*migratepage) (struct address_space *,
struct page *, struct page *);
int (*launder_page) (struct page *);
+	/* write the contents of the source page over the page at the specified
+	 * index in the target address space (the source page does not need to
+	 * be related to the target address space) */
+	int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
 };
 
 /*
@@ -1801,6 +1806,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
 		unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
 		unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+						pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern void do_generic_mapping_read(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8a8e5b8..bea1ba6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2321,6 +2321,67 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping - The address space of the target inode
+ * @index - The target page in the target inode to fill
+ * @source - The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified.  Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association

[PATCH 1/7] KEYS: Increase the payload size when instantiating a key

2007-12-05 Thread David Howells

Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised from 32767 bytes to just under 1MB
(1024 * 1024 - 1 bytes).  If kmalloc() can't allocate a buffer of sufficient
size and the payload is larger than a page, vmalloc() is tried instead.
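Both syscalls follow the same allocation pattern: try kmalloc(), fall back to vmalloc() only for payloads larger than a page, and remember which allocator succeeded so that the matching free routine is used.  The userspace model below is a sketch under stated assumptions: malloc() stands in for both kernel allocators, sketch_small_alloc_fails is a hypothetical switch that lets the fallback path be exercised, and PAGE_SIZE is taken to be 4 KiB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE	4096u			/* assumption */
#define SKETCH_MAX_PAYLOAD	(1024u * 1024u - 1u)	/* limit in this patch */

/* Hypothetical switch that makes the "kmalloc" stand-in fail. */
static bool sketch_small_alloc_fails;

/* malloc()/free() stand in for the kernel allocators in this model. */
static void *sketch_kmalloc(size_t n)
{
	return sketch_small_alloc_fails ? NULL : malloc(n);
}

static void *sketch_vmalloc(size_t n)
{
	return malloc(n);
}

static void sketch_kfree(void *p)
{
	free(p);
}

static void sketch_vfree(void *p)
{
	free(p);
}

/*
 * Returns the buffer or NULL; *vm records which allocator succeeded so the
 * caller frees with the matching routine.
 */
static void *sketch_alloc_payload(size_t plen, bool *vm)
{
	void *payload;

	*vm = false;
	if (plen > SKETCH_MAX_PAYLOAD)
		return NULL;		/* the syscalls return -EINVAL here */
	payload = sketch_kmalloc(plen);
	if (payload)
		return payload;
	if (plen <= SKETCH_PAGE_SIZE)
		return NULL;		/* small payloads get no fallback */
	*vm = true;
	return sketch_vmalloc(plen);
}

static void sketch_free_payload(void *payload, bool vm)
{
	if (vm)
		sketch_vfree(payload);
	else
		sketch_kfree(payload);
}
```

Tracking the allocator in a flag, as the patch does with its vm variable, is what lets the error paths call vfree() or kfree() correctly.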

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
char type[32], *description;
void *payload;
long ret;
+   bool vm;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
/* pull the payload in if one was supplied */
payload = NULL;
 
+   vm = false;
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error2;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error2;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error2;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
key_ref_put(keyring_ref);
  error3:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
  error2:
kfree(description);
  error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
key_ref_t keyring_ref;
void *payload;
long ret;
+   bool vm = false;
 
ret = -EINVAL;
-   if (plen > 32767)
+   if (plen > 1024 * 1024 - 1)
goto error;
 
/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
if (_payload) {
ret = -ENOMEM;
payload = kmalloc(plen, GFP_KERNEL);
-   if (!payload)
-   goto error;
+   if (!payload) {
+   if (plen <= PAGE_SIZE)
+   goto error;
+   vm = true;
+   payload = vmalloc(plen);
+   if (!payload)
+   goto error;
+   }
 
ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
}
 
 error2:
-   kfree(payload);
+   if (!vm)
+   kfree(payload);
+   else
+   vfree(payload);
 error:
return ret;
 


[PATCH 2/7] KEYS: Check starting keyring as part of search

2007-12-05 Thread David Howells

Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) check that it is still valid for searching.

The scenario:  A user in process A does something that causes keys to be
created in its process session keyring.  The user then does an su to
another user and starts a new process, B.  The two processes now
share the same process session keyring.

Process B does an NFS access which results in an upcall to gssd.
When gssd attempts to instantiate the context key (to be linked
into the process session keyring), it is denied access even though it
has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
  lookup_user_key() (the default: case)
 search_process_keyrings(current)
search_process_keyrings(rka-context)   (recursive call)
   keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the
top-level keyring it is given, but that top-level keyring is neither
fully validated nor checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
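The new top-level check can be summarised as a small decision function.  This is a userspace sketch with hypothetical names: it uses strcmp() on the type and description in place of the key type's match() operation, and plain bit masks in place of KEY_FLAG_REVOKED and KEY_FLAG_NEGATIVE.  As in the patch, the flags are read once into a local so that every test sees the same snapshot.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value; defined here only if the libc lacks it */
#endif

/* Stand-ins for KEY_FLAG_REVOKED and KEY_FLAG_NEGATIVE as bit masks. */
enum { SK_REVOKED = 1u << 0, SK_NEGATIVE = 1u << 1 };

struct sketch_key {
	const char *type;
	const char *description;
	unsigned int flags;
	time_t expiry;		/* 0 means the key never expires */
};

enum { SKETCH_FOUND = 1, SKETCH_SEARCH_CHILDREN = 2 };

static bool sketch_expired(const struct sketch_key *k, time_t now)
{
	return k->expiry && now >= k->expiry;
}

/* Returns SKETCH_FOUND, SKETCH_SEARCH_CHILDREN, or a negative errno. */
static int sketch_check_top(const struct sketch_key *keyring,
			    const char *type, const char *description,
			    time_t now)
{
	unsigned int kflags = keyring->flags;	/* read the flags once */

	/* 2) is the top-level keyring itself the thing being looked for? */
	if (strcmp(keyring->type, type) == 0 &&
	    strcmp(keyring->description, description) == 0) {
		if ((kflags & SK_REVOKED) || sketch_expired(keyring, now))
			return -EAGAIN;
		if (kflags & SK_NEGATIVE)
			return -ENOKEY;
		return SKETCH_FOUND;
	}

	/* 1) otherwise it must be valid before its contents are searched */
	if ((kflags & (SK_REVOKED | SK_NEGATIVE)) ||
	    sketch_expired(keyring, now))
		return -EAGAIN;
	return SKETCH_SEARCH_CHILDREN;
}
```

A matching but negative keyring yields -ENOKEY, whereas a non-matching negative keyring is simply not searched (-EAGAIN).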


Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
struct keyring_list *keylist;
struct timespec now;
-   unsigned long possessed;
+   unsigned long possessed, kflags;
struct key *keyring, *key;
key_ref_t key_ref;
long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
now = current_kernel_time();
err = -EAGAIN;
sp = 0;
+   
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;
 
/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;
 
 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;
 
 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;
 
 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 				       context, KEY_SEARCH) < 0)
 			continue;
 
-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
err = -ENOKEY;
continue;
}


[PATCH 11/28] FS-Cache: Provide an add_wait_queue_tail() function [try #2]

2007-12-05 Thread David Howells

Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.
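The only difference from add_wait_queue() is where the entry is linked in: at the front of the list, or just before the list head (that is, at the back).  Wake-ups walk the list from the front, so an entry added at the tail is visited last.  The userspace sketch below models this with a hypothetical circular doubly-linked list in the style of list_head; it omits the spinlock and the WQ_FLAG_EXCLUSIVE handling.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list in the style of the kernel's list_head. */
struct sketch_list {
	struct sketch_list *prev, *next;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->prev = head->next = head;
}

static void sketch_insert_between(struct sketch_list *n,
				  struct sketch_list *prev,
				  struct sketch_list *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* add_wait_queue() behaviour: the new waiter goes to the front. */
static void sketch_add_head(struct sketch_list *n, struct sketch_list *head)
{
	sketch_insert_between(n, head, head->next);
}

/* add_wait_queue_tail() behaviour: the new waiter goes to the back. */
static void sketch_add_tail(struct sketch_list *n, struct sketch_list *head)
{
	sketch_insert_between(n, head->prev, head);
}

struct sketch_waiter {
	int id;
	struct sketch_list link;
};

/* Record waiter ids in the order a wake-up would visit them (front first). */
static int sketch_wake_order(struct sketch_list *head, int *out, int max)
{
	int count = 0;
	struct sketch_list *pos;

	for (pos = head->next; pos != head && count < max; pos = pos->next) {
		struct sketch_waiter *w = (struct sketch_waiter *)
			((char *)pos - offsetof(struct sketch_waiter, link));
		out[count++] = w->id;
	}
	return count;
}
```

Adding waiters 1 and 2 at the head and waiter 3 at the tail gives the wake order 2, 1, 3.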

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/wait.h |2 ++
 kernel/wait.c|   18 ++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..f1038d0 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,8 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait) (!(wait) || ((wait)->private))
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q,
+					  wait_queue_t *wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   unsigned long flags;
+
+	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue_tail(q, wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
unsigned long flags;


[PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2007-12-05 Thread David Howells

Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_owner_priv_2)

 The marked page is backed by a local cache and is pinning resources in the
 cache driver.

 (2) PG_fscache_write (PG_owner_priv_3)

 The marked page is being written to the local cache.  The page may not be
 modified whilst this is in progress.

If PG_fscache is set, then code that checked for PG_private will now also check
for that.  This includes truncation and page invalidation.  The function
page_has_private() has been added to detect this.
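A sketch of the test that page_has_private() is described as performing: a page counts as having private state if either PG_private or PG_fscache is set.  The bit numbers come from this patch (PG_private is 11, PG_fscache aliases PG_owner_priv_2 = 13, and PG_fscache_write aliases PG_owner_priv_3 = 18).  The function body is an assumption, because that part of the diff is not shown here.

```c
#include <stdbool.h>

/* Bit numbers as defined in this patch's page-flags.h hunk. */
#define SKETCH_PG_private	11
#define SKETCH_PG_fscache	13	/* PG_owner_priv_2 */
#define SKETCH_PG_fscache_write	18	/* PG_owner_priv_3 */

/* Assumed behaviour: true if either PG_private or PG_fscache is set. */
static bool sketch_page_has_private(unsigned long flags)
{
	const unsigned long mask = (1UL << SKETCH_PG_private) |
				   (1UL << SKETCH_PG_fscache);

	return (flags & mask) != 0;
}
```

PG_fscache_write on its own does not count, since it only marks an in-progress write to the cache.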

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/splice.c|2 +-
 include/linux/page-flags.h |   38 --
 include/linux/pagemap.h|   11 +++
 mm/filemap.c   |   16 
 mm/migrate.c   |2 +-
 mm/page_alloc.c|3 +++
 mm/readahead.c |9 +
 mm/swap.c  |4 ++--
 mm/swap_state.c|4 ++--
 mm/truncate.c  |   10 +-
 mm/vmscan.c|2 +-
 11 files changed, 83 insertions(+), 18 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 6bdcb61..61edad7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 */
wait_on_page_writeback(page);
 
-   if (PagePrivate(page))
+   if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..fcc9e23 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -77,25 +77,30 @@
 #define PG_active		 6
 #define PG_slab			 7	/* slab debug (Suparna wants this) */
 
-#define PG_owner_priv_1		 8	/* Owner use. If pagecache, fs may use*/
+#define PG_owner_priv_1		 8	/* Owner use. fs may use in pagecache */
 #define PG_arch_1		 9
 #define PG_reserved		10
 #define PG_private		11	/* If pagecache, has fs-private data */
 
 #define PG_writeback		12	/* Page is under writeback */
+#define PG_owner_priv_2		13	/* Owner use. fs may use in pagecache */
 #define PG_compound		14	/* Part of a compound page */
 #define PG_swapcache		15	/* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk		16	/* Has blocks allocated on-disk */
 #define PG_reclaim		17	/* To be reclaimed asap */
+#define PG_owner_priv_3		18	/* Owner use. fs may use in pagecache */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2/3 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache PG_owner_priv_2 /* Backed by local cache */
+#define PG_fscache_write   PG_owner_priv_3 /* Writing to local cache */
+
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -199,6 +204,24 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,	\
 							&(page)->flags)
 
+#define PageFsCache(page)		test_bit(PG_fscache, &(page)->flags)
+#define SetPageFsCache(page)		set_bit(PG_fscache, &(page)->flags)
+#define ClearPageFsCache(page)		clear_bit(PG_fscache, &(page)->flags)
+#define TestSetPageFsCache(page)	test_and_set_bit(PG_fscache, &(page)->flags)
+#define TestClearPageFsCache(page)	test_and_clear_bit(PG_fscache, \
+							   &(page)->flags)
+
+#define PageFsCacheWrite(page)		test_bit(PG_fscache_write, \
+						 &(page)->flags)
+#define SetPageFsCacheWrite(page)	set_bit(PG_fscache_write, \
+						&(page)->flags)
+#define ClearPageFsCacheWrite(page)	clear_bit(PG_fscache_write, \
+						  &(page)->flags)
+#define TestSetPageFsCacheWrite(page)	test_and_set_bit(PG_fscache_write, \
+							 &(page)->flags)
+#define TestClearPageFsCacheWrite(page)	test_and_clear_bit(PG_fscache_write, \
+							   &(page)->flags)
+
 #define PageBuddy(page)		test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page

[PATCH 3/7] KEYS: Allow the callout data to be passed as a blob rather than a string

2007-12-05 Thread David Howells

Allow the callout data to be passed as a blob rather than a string for internal
kernel services that call any request_key_*() interface other than
request_key().  request_key() itself still takes a NUL-terminated string.

The functions that change are:

request_key_with_auxdata()
request_key_async()
request_key_async_with_auxdata()

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 Documentation/keys-request-key.txt |   11 +---
 Documentation/keys.txt |   14 +++---
 include/linux/key.h|9 ---
 security/keys/internal.h   |9 ---
 security/keys/keyctl.c |7 -
 security/keys/request_key.c|   49 ++--
 security/keys/request_key_auth.c   |   12 +
 7 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 266955d..09b55e4 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -11,26 +11,29 @@ request_key*():
 
struct key *request_key(const struct key_type *type,
const char *description,
-   const char *callout_string);
+   const char *callout_info);
 
 or:
 
struct key *request_key_with_auxdata(const struct key_type *type,
 const char *description,
-const char *callout_string,
+const char *callout_info,
+size_t callout_len,
 void *aux);
 
 or:
 
struct key *request_key_async(const struct key_type *type,
  const char *description,
- const char *callout_string);
+ const char *callout_info,
+ size_t callout_len);
 
 or:
 
struct key *request_key_async_with_auxdata(const struct key_type *type,
   const char *description,
-  const char *callout_string,
+  const char *callout_info,
+  size_t callout_len,
   void *aux);
 
 Or by userspace invoking the request_key system call:
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 51652d3..b82d38d 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -771,7 +771,7 @@ payload contents for more information.
 
struct key *request_key(const struct key_type *type,
const char *description,
-   const char *callout_string);
+   const char *callout_info);
 
 This is used to request a key or keyring with a description that matches
 the description specified according to the key type's match function. This
@@ -793,24 +793,28 @@ payload contents for more information.
 
struct key *request_key_with_auxdata(const struct key_type *type,
 const char *description,
-const char *callout_string,
+const void *callout_info,
+size_t callout_len,
 void *aux);
 
 This is identical to request_key(), except that the auxiliary data is
-passed to the key_type->request_key() op if it exists.
+passed to the key_type->request_key() op if it exists, and the callout_info
+is a blob of length callout_len, if given (the length may be 0).
 
 
 (*) A key can be requested asynchronously by calling one of:
 
struct key *request_key_async(const struct key_type *type,
  const char *description,
- const char *callout_string);
+ const void *callout_info,
+ size_t callout_len);
 
 or:
 
struct key *request_key_async_with_auxdata(const struct key_type *type,
   const char *description,
-  const char *callout_string,
+  const char *callout_info,
+  size_t callout_len,
   void *aux);
 
 which are asynchronous equivalents of request_key() and
diff --git a/include/linux/key.h b/include/linux/key.h
index fcdbd5e..4a6021a 100644
--- a/include/linux/key.h
+++ b/include/linux

[PATCH 5/7] Security: Change current->fs[ug]id to current_fs[ug]id()

2007-12-05 Thread David Howells

Change current->fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be
separated from the task_struct.
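The point of the accessor is that callers stop depending on where the fields live.  A minimal sketch follows.  The sketch_cred structure and sketch_current pointer are hypothetical stand-ins for wherever fsuid and fsgid might later be moved; this patch presumably still reads the task_struct fields behind the accessors.

```c
/* Hypothetical layout after fsuid/fsgid have been moved out of the task. */
struct sketch_cred {
	unsigned int fsuid;
	unsigned int fsgid;
};

struct sketch_task {
	struct sketch_cred *cred;
};

/* Stand-in for the kernel's "current" task pointer. */
static struct sketch_task *sketch_current;

/* Callers use these accessors, so only the accessor bodies change if the
 * storage moves again. */
static inline unsigned int sketch_current_fsuid(void)
{
	return sketch_current->cred->fsuid;
}

static inline unsigned int sketch_current_fsgid(void)
{
	return sketch_current->cred->fsgid;
}
```

Once every caller uses the accessors, the fields can be relocated without touching the 60-odd files converted here.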

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/perfmon.c|4 ++--
 arch/powerpc/platforms/cell/spufs/inode.c |4 ++--
 drivers/isdn/capi/capifs.c|4 ++--
 drivers/usb/core/inode.c  |4 ++--
 fs/9p/fid.c   |2 +-
 fs/9p/vfs_inode.c |4 ++--
 fs/9p/vfs_super.c |4 ++--
 fs/affs/inode.c   |4 ++--
 fs/anon_inodes.c  |4 ++--
 fs/attr.c |4 ++--
 fs/bfs/dir.c  |4 ++--
 fs/cifs/cifsproto.h   |2 +-
 fs/cifs/dir.c |   12 ++--
 fs/cifs/inode.c   |8 
 fs/cifs/misc.c|4 ++--
 fs/coda/cache.c   |6 +++---
 fs/coda/upcall.c  |4 ++--
 fs/devpts/inode.c |4 ++--
 fs/dquot.c|2 +-
 fs/exec.c |4 ++--
 fs/ext2/balloc.c  |2 +-
 fs/ext2/ialloc.c  |4 ++--
 fs/ext2/ioctl.c   |2 +-
 fs/ext3/balloc.c  |2 +-
 fs/ext3/ialloc.c  |4 ++--
 fs/ext4/balloc.c  |2 +-
 fs/ext4/ialloc.c  |4 ++--
 fs/fuse/dev.c |4 ++--
 fs/gfs2/inode.c   |   10 +-
 fs/hfs/inode.c|4 ++--
 fs/hfsplus/inode.c|4 ++--
 fs/hpfs/namei.c   |   24 
 fs/hugetlbfs/inode.c  |   16 
 fs/jffs2/fs.c |4 ++--
 fs/jfs/jfs_inode.c|4 ++--
 fs/locks.c|2 +-
 fs/minix/bitmap.c |4 ++--
 fs/namei.c|8 
 fs/nfsd/vfs.c |4 ++--
 fs/ocfs2/dlm/dlmfs.c  |8 
 fs/ocfs2/namei.c  |4 ++--
 fs/pipe.c |4 ++--
 fs/posix_acl.c|4 ++--
 fs/ramfs/inode.c  |4 ++--
 fs/reiserfs/namei.c   |4 ++--
 fs/sysv/ialloc.c  |4 ++--
 fs/udf/ialloc.c   |4 ++--
 fs/udf/namei.c|2 +-
 fs/ufs/ialloc.c   |4 ++--
 fs/xfs/linux-2.6/xfs_linux.h  |4 ++--
 fs/xfs/xfs_acl.c  |6 +++---
 fs/xfs/xfs_attr.c |2 +-
 fs/xfs/xfs_inode.c|6 +++---
 fs/xfs/xfs_vnodeops.c |8 
 include/linux/fs.h|2 +-
 include/linux/sched.h |3 +++
 ipc/mqueue.c  |4 ++--
 kernel/cgroup.c   |4 ++--
 mm/shmem.c|8 
 net/9p/client.c   |2 +-
 net/socket.c  |4 ++--
 net/sunrpc/auth.c |8 
 security/commoncap.c  |8 
 security/keys/key.c   |2 +-
 security/keys/keyctl.c|2 +-
 security/keys/request_key.c   |   10 +-
 security/keys/request_key_auth.c  |2 +-
 67 files changed, 163 insertions(+), 160 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 73e7c2e..ef383d9 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2206,8 +2206,8 @@ pfm_alloc_fd(struct file **cfile)
 	DPRINT(("new inode ino=%ld @%p\n", inode->i_ino, inode));
 
 	inode->i_mode = S_IFCHR|S_IRUGO;
-	inode->i_uid  = current->fsuid;
-	inode->i_gid  = current->fsgid;
+	inode->i_uid  = current_fsuid();
+	inode->i_gid  = current_fsgid();
 
 	sprintf(name, "[%lu]", inode->i_ino);
this.name = name;
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index c0e968a..4efe7bf 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -85,8 +85,8 @@ spufs_new_inode(struct super_block *sb, int mode)
goto out;
 
inode->i_mode = mode;
-   inode->i_uid = current->fsuid;
-   inode->i_gid = current->fsgid;
+   inode->i_uid = current_fsuid();
+   inode->i_gid
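
Replacing direct `current->fsuid` reads with `current_fsuid()` calls means the credentials a task acts with can later be redirected without touching every caller. A minimal userspace sketch of that idea, assuming the accessor reads through an `act_as` pointer as described later in this thread; `model_task`, `model_task_security`, `model_current` and the `model_current_*()` accessors are hypothetical stand-ins, not kernel types:

```c
/* Hypothetical stand-in for the task_security struct discussed in this thread. */
struct model_task_security {
	unsigned int fsuid;
	unsigned int fsgid;
};

/* Hypothetical task: its own credentials plus the set it currently acts as. */
struct model_task {
	struct model_task_security *security;  /* the task's own credentials */
	struct model_task_security *act_as;    /* credentials used for access checks */
};

/* Stand-in for the kernel's "current" task pointer. */
static struct model_task *model_current;

/* Accessors in the style of current_fsuid()/current_fsgid(): callers never
 * touch the credential fields directly, so redirecting act_as changes what
 * every caller sees. */
static unsigned int model_current_fsuid(void)
{
	return model_current->act_as->fsuid;
}

static unsigned int model_current_fsgid(void)
{
	return model_current->act_as->fsgid;
}
```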

[PATCH 04/28] KEYS: Add keyctl function to get a security label [try #2]

2007-12-05 Thread David Howells

Add a keyctl() function to get the security label of a key.

The following is added to Documentation/keys.txt:

 (*) Get the LSM security context attached to a key.

long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
size_t buflen)

 This function returns a string that represents the LSM security context
 attached to a key in the buffer provided.

 Unless there's an error, it always returns the amount of data it could
 produce, even if that's too big for the buffer, but it won't copy more
 than requested to userspace. If the buffer pointer is NULL then no copy
 will take place.

 A NUL character is included at the end of the string if the buffer is
 sufficiently big.  This is included in the returned count.  If no LSM is
 in force then an empty string will be returned.

 A process must have view permission on the key for this function to be
 successful.
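
Because the call reports the full size even when the buffer is too small, a userspace caller would typically ask twice: once with a NULL buffer to learn the size, then again with a buffer of that size. A sketch of that pattern, assuming a getter with the semantics described above; `fake_get_security()` is a hypothetical stand-in for keyctl(KEYCTL_GET_SECURITY, key, buffer, buflen) so the example runs without a kernel key, and the label string is made up:

```c
#include <stdlib.h>
#include <string.h>

/* Signature of a getter with the documented KEYCTL_GET_SECURITY semantics:
 * returns the full length including the NUL (or <= 0 on failure), copies at
 * most buflen bytes, and copies nothing if buffer is NULL. */
typedef long (*label_getter_fn)(long key, char *buffer, size_t buflen);

/* Query the size first, then allocate and fetch. Returns a malloc'd,
 * NUL-terminated string or NULL on failure; the caller frees it. */
static char *read_key_label(label_getter_fn get, long key)
{
	long need = get(key, NULL, 0);
	char *buf;

	if (need <= 0)
		return NULL;
	buf = malloc((size_t)need);
	if (!buf)
		return NULL;
	if (get(key, buf, (size_t)need) != need) {
		/* label changed size between the two calls, or an error */
		free(buf);
		return NULL;
	}
	buf[need - 1] = '\0';  /* the returned count includes the NUL */
	return buf;
}

/* Hypothetical stand-in for the syscall, so the sketch runs anywhere. */
static const char fake_label[] = "system_u:object_r:key_t:s0";

static long fake_get_security(long key, char *buffer, size_t buflen)
{
	size_t len = sizeof fake_label;  /* includes the NUL */

	(void)key;
	if (buffer && buflen > 0)
		memcpy(buffer, fake_label, buflen < len ? buflen : len);
	return (long)len;
}
```

A real caller would probably loop rather than give up if the size changes between the two calls.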

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 Documentation/keys.txt   |   21 +++
 include/linux/keyctl.h   |    1 +
 include/linux/security.h |   20 +-
 security/dummy.c         |    8 ++
 security/keys/compat.c   |    3 ++
 security/keys/keyctl.c   |   66 ++
 security/security.c      |    5 +++
 security/selinux/hooks.c |   21 +--
 8 files changed, 141 insertions(+), 4 deletions(-)

diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index b82d38d..be424b0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -711,6 +711,27 @@ The keyctl syscall functions are:
  The assumed authoritative key is inherited across fork and exec.
 
 
+ (*) Get the LSM security context attached to a key.
+
+   long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
+   size_t buflen)
+
+ This function returns a string that represents the LSM security context
+ attached to a key in the buffer provided.
+
+ Unless there's an error, it always returns the amount of data it could
+ produce, even if that's too big for the buffer, but it won't copy more
+ than requested to userspace. If the buffer pointer is NULL then no copy
+ will take place.
+
+ A NUL character is included at the end of the string if the buffer is
+ sufficiently big.  This is included in the returned count.  If no LSM is
+ in force then an empty string will be returned.
+
+ A process must have view permission on the key for this function to be
+ successful.
+
+
 ===
 KERNEL SERVICES
 ===
diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h
index 3365945..656ee6b 100644
--- a/include/linux/keyctl.h
+++ b/include/linux/keyctl.h
@@ -49,5 +49,6 @@
 #define KEYCTL_SET_REQKEY_KEYRING  14  /* set default request-key keyring */
 #define KEYCTL_SET_TIMEOUT         15  /* set key timeout */
 #define KEYCTL_ASSUME_AUTHORITY    16  /* assume request_key() authorisation */
+#define KEYCTL_GET_SECURITY        17  /* get key security label */
 
 #endif /*  _LINUX_KEYCTL_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index ac05083..8d9e946 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -959,6 +959,17 @@ struct request_sock;
  * @perm describes the combination of permissions required of this key.
  * Return 1 if permission granted, 0 if permission denied and -ve it the
  *  normal permissions model should be effected.
+ * @key_getsecurity:
+ * Get a textual representation of the security context attached to a key
+ * for the purposes of honouring KEYCTL_GET_SECURITY.  This function
+ * allocates the storage for the NUL-terminated string and the caller
+ * should free it.
+ * @key points to the key to be queried.
+ * @_buffer points to a pointer that should be set to point to the
+ *  resulting string (or to NULL if there is no label or an error occurs).
+ * Return the length of the string (including terminating NUL) or -ve if
+ *  an error occurs.
+ * May also return 0 (and a NULL buffer pointer) if there is no label.
  *
  * Security hooks affecting all System V IPC operations.
  *
@@ -1437,7 +1448,7 @@ struct security_operations {
int (*key_permission)(key_ref_t key_ref,
  struct task_struct *context,
  key_perm_t perm);
-
+   int (*key_getsecurity)(struct key *key, char **_buffer);
 #endif /* CONFIG_KEYS */
 
 };
@@ -2567,6 +2578,7 @@ int security_key_alloc(struct key *key, struct task_struct *tsk, unsigned long f
 void security_key_free(struct key *key);
 int security_key_permission(key_ref_t key_ref,
struct task_struct *context, key_perm_t perm);
+int security_key_getsecurity(struct key *key, char **_buffer);
 
 #else
 
@@ -2588,6 +2600,12 @@ static inline int security_key_permission(key_ref_t key_ref
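
The hook contract above (the hook allocates the string, the caller frees it, and 0 with a NULL pointer means "no label") has to be turned into the userspace behaviour described in Documentation/keys.txt. The security/keys/keyctl.c hunk isn't shown here, so the following is only a userspace model of that glue under those assumptions: `model_keyctl_get_security()`, the two hook functions and the label are hypothetical, and memcpy() stands in for copy_to_user():

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook with the documented contract: on success *_buffer gets a
 * malloc'd NUL-terminated string and the return value is its length
 * including the NUL; 0 with *_buffer == NULL means "no label"; negative is
 * an error. */
typedef int (*getsecurity_hook)(const void *key, char **_buffer);

/* Turn the hook's result into the documented userspace semantics: the full
 * length is returned, at most buflen bytes are copied, nothing is copied for
 * a NULL buffer, and "no label" becomes an empty string. */
static long model_keyctl_get_security(getsecurity_hook hook, const void *key,
				      char *buffer, size_t buflen)
{
	char *context = NULL;
	int ret = hook(key, &context);

	if (ret < 0)
		return ret;
	if (ret == 0) {
		/* no LSM label: hand back an empty string (just the NUL) */
		if (buffer && buflen > 0)
			buffer[0] = '\0';
		return 1;
	}
	if (buffer && buflen > 0)
		memcpy(buffer, context, buflen < (size_t)ret ? buflen : (size_t)ret);
	free(context);  /* the hook allocated it; the caller frees it */
	return ret;
}

static int hook_with_label(const void *key, char **_buffer)
{
	static const char label[] = "user_u:object_r:key_t";

	(void)key;
	*_buffer = malloc(sizeof label);
	if (!*_buffer)
		return -ENOMEM;
	memcpy(*_buffer, label, sizeof label);
	return (int)sizeof label;  /* includes the terminating NUL */
}

static int hook_no_label(const void *key, char **_buffer)
{
	(void)key;
	*_buffer = NULL;
	return 0;
}
```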

[PATCH 25/28] AFS: Improve handling of a rejected writeback [try #2]

2007-12-05 Thread David Howells

Improve the handling of the case of a server rejecting an attempt to write back
a cached write.  AFS operates a write-back cache, so the following sequence of
events can theoretically occur:

CLIENT 1                        CLIENT 2
=========================       =========================
cat data >/the/file
 (sits in pagecache)
                                fs setacl -dir /the/dir/of/the/file \
                                    -acl system:administrators rlidka
                                 (write permission removed for client 1)
sync
 (writeback attempt fails)

The way AFS attempts to handle this is:

 (1) The affected region will be excised and discarded on the basis that it
 can't be written back, yet we don't want it lurking in the page cache
 either.  The contents of the affected region will be reread from the
 server when called for again.

 (2) The EOF size will be set to the current server-based file size - usually
 that which it was before the affected write was made - assuming no
 conflicting write has been appended, and assuming the affected write
 extended the file.
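
Both rules come down to simple offset arithmetic. A sketch, assuming 4KiB pages and ignoring the conflicting-append case; both helpers are hypothetical and are not the code in fs/afs/write.c:

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumes 4KiB pages */

/* Page indices covered by a rejected write of bytes [from, to).
 * Assumes to > from (a non-empty write). */
static void rejected_page_range(uint64_t from, uint64_t to,
				uint64_t *first, uint64_t *last)
{
	*first = from >> MODEL_PAGE_SHIFT;
	*last = (to - 1) >> MODEL_PAGE_SHIFT;  /* to is exclusive */
}

/* Revised EOF after the rejection: only a write that extended the file past
 * the server's idea of EOF causes the size to be wound back to the server's
 * size; otherwise the local size is kept. */
static uint64_t revised_eof(uint64_t local_size, uint64_t server_size,
			    uint64_t write_end)
{
	return write_end > server_size ? server_size : local_size;
}
```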


This patch makes the following changes:

 (1) Zero-length short reads no longer produce EBADMSG just because the
 OpenAFS server puts a silly value in the size of the returned data.  This
 prevents excised pages beyond the revised EOF from being reinstantiated
 with a surprise PG_error.

 (2) Writebacks can now be put into a 'rejected' state in which all further
 attempts to write them back will result in excision of the affected pages
 instead.

 (3) Preparing a page for overwriting now reads the whole page instead of just
 those parts of it that aren't to be covered by the copy to be made.  This
 handles the possibility that the copy might fail on EFAULT.  Corollary to
 this, PG_update can now be set by afs_prepare_page() on behalf of
 afs_prepare_write() rather than setting it in afs_commit_write().

 (4) In the case of a conflicting write, afs_prepare_write() will attempt to
 flush the write to the server, and will then wait for PG_writeback to go
 away - after unlocking the page.  This helps prevent deadlock against the
 writeback-rejection handler.  AOP_TRUNCATED_PAGE is then returned to the
 caller to signify that the page has been unlocked, and that it should be
 revalidated.

 (5) The writeback-rejection handler now calls cancel_rejected_write() added by
 the previous patch to excise the affected pages rather than clearing the
 PG_uptodate flag on all the pages.
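
Change (1) hinges on reading the 32-bit length from the wire as signed: a value with the top bit set is treated as "no data, access beyond EOF" rather than as an oversized reply. A userspace model of the check in afs_deliver_fs_fetch_data(), assuming a 4KiB page; `sanitize_fetch_length()` is a hypothetical name:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumes 4KiB pages */

/* A length that is negative when read as a signed 32-bit value is treated as
 * zero (access completely beyond EOF); anything larger than a page is still
 * rejected as a malformed reply. */
static int sanitize_fetch_length(uint32_t wire_be, uint32_t *count)
{
	uint32_t n = ntohl(wire_be);

	if ((int32_t)n < 0)
		n = 0;
	else if (n > MODEL_PAGE_SIZE)
		return -EBADMSG;
	*count = n;
	return 0;
}
```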

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/fsclient.c |    4 +
 fs/afs/internal.h |    1 
 fs/afs/write.c    |  154 -
 3 files changed, 85 insertions(+), 74 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 023b95b..04584c0 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -353,7 +353,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call,
 
call->count = ntohl(call->tmp);
_debug("DATA length: %u", call->count);
-   if (call->count > PAGE_SIZE)
+   if ((s32) call->count < 0)
+   call->count = 0; /* access completely beyond EOF */
+   else if (call->count > PAGE_SIZE)
return -EBADMSG;
call->offset = 0;
call->unmarshall++;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 5ca3625..84b90b0 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -156,6 +156,7 @@ struct afs_writeback {
AFS_WBACK_PENDING,  /* write pending */
AFS_WBACK_CONFLICTING,  /* conflicting writes posted */
AFS_WBACK_WRITING,  /* writing back */
+   AFS_WBACK_REJECTED, /* the writeback was rejected */
AFS_WBACK_COMPLETE  /* the writeback record has been unlinked */
} state __attribute__((packed));
 };
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 9a849ad..add5892 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -81,18 +81,16 @@ void afs_put_writeback(struct afs_writeback *wb)
 }
 
 /*
- * partly or wholly fill a page that's under preparation for writing
+ * fill a page that's under preparation for writing
  */
 static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
-unsigned start, unsigned len, struct page *page)
+unsigned len, struct page *page)
 {
int ret;
 
-   _enter(",,%u,%u", start, len);
+   _enter(",,%u,", len);
 
-   ASSERTCMP(start + len, <=, PAGE_SIZE);
-
-   ret = afs_vnode_fetch_data(vnode, key, start, len, page);
+   ret = afs_vnode_fetch_data(vnode, key, 0, len, page);
if (ret < 0) {
if (ret == -ENOENT) {
_debug("got NOENT from server

[PATCH 26/28] AF_RXRPC: Save the operation ID for debugging [try #2]

2007-12-05 Thread David Howells

Save the operation ID to be used with a call that we're making for display
through /proc/net/rxrpc_calls.  This helps debugging stuck operations as we
then know what they are.
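
Storing operation_ID already converted with htonl() means it can be marshalled straight into the request and dumped later, but every comparison must then be against htonl(CONSTANT), as the afs_deliver_fs_fetch_data() hunk does. A userspace sketch of that convention; `struct model_call` and the helpers are hypothetical, and the opcode values are assumed for illustration:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Opcode values assumed for illustration. */
enum { MODEL_FSFETCHDATA = 130, MODEL_FSFETCHDATA64 = 65537 };

struct model_call {
	uint32_t operation_ID;  /* kept in network byte order */
	uint32_t request[4];
};

static void model_marshal(struct model_call *call, uint32_t op, uint32_t vid)
{
	uint32_t *bp = call->request;

	call->operation_ID = htonl(op);
	*bp++ = call->operation_ID;  /* first word on the wire is the opcode */
	*bp++ = htonl(vid);
}

/* What a /proc-style dump would print: convert back to host order. */
static int model_format_op(const struct model_call *call, char *buf, size_t len)
{
	return snprintf(buf, len, "%u", (unsigned)ntohl(call->operation_ID));
}
```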

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/fsclient.c       |   32 +++-
 fs/afs/rxrpc.c          |    1 +
 fs/afs/vlclient.c       |    2 ++
 include/net/af_rxrpc.h  |    1 +
 net/rxrpc/af_rxrpc.c    |    3 +++
 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/ar-proc.c     |    7 ---
 7 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 04584c0..a468f2d 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -287,6 +287,7 @@ int afs_fs_fetch_file_status(struct afs_server *server,
call->reply2 = volsync;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(FSFETCHSTATUS);
 
/* marshall the parameters */
bp = call->request;
@@ -316,7 +317,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call,
case 0:
call->offset = 0;
call->unmarshall++;
-   if (call->operation_ID != FSFETCHDATA64) {
+   if (call->operation_ID != htonl(FSFETCHDATA64)) {
call->unmarshall++;
goto no_msw;
}
@@ -464,7 +465,7 @@ static int afs_fs_fetch_data64(struct afs_server *server,
call->reply3 = buffer;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
-   call->operation_ID = FSFETCHDATA64;
+   call->operation_ID = htonl(FSFETCHDATA64);
 
/* marshall the parameters */
bp = call->request;
@@ -509,7 +510,7 @@ int afs_fs_fetch_data(struct afs_server *server,
call->reply3 = buffer;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
-   call->operation_ID = FSFETCHDATA;
+   call->operation_ID = htonl(FSFETCHDATA);
 
/* marshall the parameters */
bp = call->request;
@@ -577,6 +578,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
 
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(FSGIVEUPCALLBACKS);
 
/* marshall the parameters */
bp = call->request;
@@ -683,10 +685,11 @@ int afs_fs_create(struct afs_server *server,
call->reply4 = newcb;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(S_ISDIR(mode) ? FSMAKEDIR : FSCREATEFILE);
 
/* marshall the parameters */
bp = call->request;
-   *bp++ = htonl(S_ISDIR(mode) ? FSMAKEDIR : FSCREATEFILE);
+   *bp++ = call->operation_ID;
*bp++ = htonl(vnode->fid.vid);
*bp++ = htonl(vnode->fid.vnode);
*bp++ = htonl(vnode->fid.unique);
@@ -772,10 +775,11 @@ int afs_fs_remove(struct afs_server *server,
call->reply = vnode;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(isdir ? FSREMOVEDIR : FSREMOVEFILE);
 
/* marshall the parameters */
bp = call->request;
-   *bp++ = htonl(isdir ? FSREMOVEDIR : FSREMOVEFILE);
+   *bp++ = call->operation_ID;
*bp++ = htonl(vnode->fid.vid);
*bp++ = htonl(vnode->fid.vnode);
*bp++ = htonl(vnode->fid.unique);
@@ -857,6 +861,7 @@ int afs_fs_link(struct afs_server *server,
call->reply2 = vnode;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(FSLINK);
 
/* marshall the parameters */
bp = call->request;
@@ -954,6 +959,7 @@ int afs_fs_symlink(struct afs_server *server,
call->reply3 = newstatus;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(FSSYMLINK);
 
/* marshall the parameters */
bp = call->request;
@@ -1062,6 +1068,7 @@ int afs_fs_rename(struct afs_server *server,
call->reply2 = new_dvnode;
call->service_id = FS_SERVICE;
call->port = htons(AFS_FS_PORT);
+   call->operation_ID = htonl(FSRENAME);
 
/* marshall the parameters */
bp = call->request;
@@ -1178,6 +1185,7 @@ static int afs_fs_store_data64(struct afs_server *server,
call->last_to = to;
call->send_pages = true;
call->store_version = vnode->status.data_version + 1;
+   call->operation_ID = htonl(FSSTOREDATA64);
 
/* marshall the parameters */
bp = call->request;
@@ -1255,6 +1263,7 @@ int afs_fs_store_data(struct afs_server *server, struct afs_writeback *wb,
call->last_to = to;
call->send_pages = true;
call->store_version = vnode->status.data_version + 1;
+   call->operation_ID = htonl(FSSTOREDATA);
 
/* marshall the parameters */
bp = call->request;
@@ -1303,7 +1312,8 @@ static int afs_deliver_fs_store_status(struct afs_call *call

[PATCH 27/28] AFS: Implement shared-writable mmap [try #2]

2007-12-05 Thread David Howells

Implement shared-writable mmap for AFS.

The key with which to access the file is obtained from the VMA at the point
where the page_mkwrite() VMA op makes the PTE writable, and is then cached in
the affected page.

If there's an outstanding write on the page made with a different key, then
page_mkwrite() will flush it before attaching a record of the new key.
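
The key-switching rule can be modelled in a few lines. This is a hypothetical sketch rather than afs_page_mkwrite() itself: a key is represented as a plain int, and the flush of the older write is reduced to a counter:

```c
/* Hypothetical per-page record of the key an outstanding write was made
 * with (0 means no outstanding write). */
struct model_page {
	int wb_key;
	int flushes;  /* how many times an older write had to be flushed */
};

/* If an outstanding write was made with a different key, flush it first,
 * then record the new key against the page. */
static void model_mkwrite(struct model_page *page, int key)
{
	if (page->wb_key != 0 && page->wb_key != key) {
		page->flushes++;  /* stands in for writing back the old data */
		page->wb_key = 0;
	}
	page->wb_key = key;
}
```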

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/file.c     |   20 +++-
 fs/afs/internal.h |    1 +
 fs/afs/write.c    |   35 +++
 3 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 525f7c5..1323df4 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -22,6 +22,7 @@ static int afs_readpage(struct file *file, struct page *page);
 static void afs_invalidatepage(struct page *page, unsigned long offset);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 static int afs_launder_page(struct page *page);
+static int afs_mmap(struct file *file, struct vm_area_struct *vma);
 
 const struct file_operations afs_file_operations = {
.open   = afs_open,
@@ -31,7 +32,7 @@ const struct file_operations afs_file_operations = {
.write  = do_sync_write,
.aio_read   = generic_file_aio_read,
.aio_write  = afs_file_write,
-   .mmap   = generic_file_readonly_mmap,
+   .mmap   = afs_mmap,
.splice_read= generic_file_splice_read,
.fsync  = afs_fsync,
.lock   = afs_lock,
@@ -56,6 +57,11 @@ const struct address_space_operations afs_fs_aops = {
.writepages = afs_writepages,
 };
 
+static struct vm_operations_struct afs_file_vm_ops = {
+   .fault  = filemap_fault,
+   .page_mkwrite   = afs_page_mkwrite,
+};
+
 /*
  * open an AFS file or directory and attach a key to it
  */
@@ -295,3 +301,15 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
_leave(" = 0");
return 0;
 }
+
+/*
+ * memory map part of an AFS file
+ */
+static int afs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   _enter("");
+
+   file_accessed(file);
+   vma->vm_ops = &afs_file_vm_ops;
+   return 0;
+}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 84b90b0..b3da9ab 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -742,6 +742,7 @@ extern ssize_t afs_file_write(struct kiocb *, const struct iovec *,
  unsigned long, loff_t);
 extern int afs_writeback_all(struct afs_vnode *);
 extern int afs_fsync(struct file *, struct dentry *, int);
+extern int afs_page_mkwrite(struct vm_area_struct *, struct page *);
 
 
 /*/
diff --git a/fs/afs/write.c b/fs/afs/write.c
index add5892..8a3e9e2 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -155,6 +155,8 @@ static int afs_prepare_page(struct afs_vnode *vnode, struct page *page,
  * prepare to perform part of a write to a page
  * - the caller holds the page locked, preventing it from being written out or
  *   modified by anyone else
+ * - may be called from afs_page_mkwrite() to set up a page for modification
+ *   through shared-writable mmap
  */
 int afs_prepare_write(struct file *file, struct page *page,
  unsigned offset, unsigned to)
@@ -833,3 +835,36 @@ int afs_fsync(struct file *file, struct dentry *dentry, int datasync)
_leave(" = %d", ret);
return ret;
 }
+
+/*
+ * notification that a previously read-only page is about to become writable
+ * - if it returns an error, the caller will deliver a bus error signal
+ *
+ * we use this to make a record of the key with which the writeback should be
+ * performed and to flush any outstanding writes made with a different key
+ *
+ * the key to be used is attached to the struct file pinned by the VMA
+ */
+int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+   struct afs_vnode *vnode = AFS_FS_I(vma->vm_file->f_mapping->host);
+   struct key *key = vma->vm_file->private_data;
+   int ret;
+
+   _enter("{{%x:%u},%x},{%lx}",
+  vnode->fid.vid, vnode->fid.vnode, key_serial(key), page->index);
+
+   do {
+   lock_page(page);
+   if (page->mapping == vma->vm_file->f_mapping)
+   ret = afs_prepare_write(vma->vm_file, page, 0,
+   PAGE_SIZE);
+   else
+   ret = 0; /* seems there was interference - let the
+ * caller deal with it */
+   unlock_page(page);
+   } while (ret == AOP_TRUNCATED_PAGE);
+
+   _leave(" = %d", ret);
+   return ret;
+}
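
The do/while in afs_page_mkwrite() exists because afs_prepare_write() may return AOP_TRUNCATED_PAGE after dropping the page lock to flush a conflicting write. A userspace sketch of just that retry structure; the constant's value is a placeholder, locking is only marked by comments, and all the names are hypothetical:

```c
/* Placeholder: the real AOP_TRUNCATED_PAGE comes from <linux/fs.h>. */
#define MODEL_AOP_TRUNCATED_PAGE 0x8001

/* Hypothetical prepare step: returns MODEL_AOP_TRUNCATED_PAGE when it had to
 * drop the page lock, meaning the page must be relocked and revalidated. */
typedef int (*prepare_fn)(void *ctx);

/* Keep going until the prepare step succeeds or fails with a real error.
 * A page that no longer belongs to the file's mapping is left for the
 * caller to deal with. */
static int model_page_mkwrite(const void *page_mapping, const void *file_mapping,
			      prepare_fn prepare, void *ctx)
{
	int ret;

	do {
		/* lock_page() would go here */
		if (page_mapping == file_mapping)
			ret = prepare(ctx);
		else
			ret = 0;
		/* unlock_page() would go here */
	} while (ret == MODEL_AOP_TRUNCATED_PAGE);
	return ret;
}

/* Test helper: reports "truncated" a set number of times, then succeeds. */
struct flaky { int truncations_left; int calls; };

static int flaky_prepare(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->truncations_left > 0) {
		f->truncations_left--;
		return MODEL_AOP_TRUNCATED_PAGE;
	}
	return 0;
}
```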


Re: [PATCH 4/7] KEYS: Add keyctl function to get a security label

2007-12-05 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 inode_getsecurity and getprocattr directly return the strings.
 Admittedly, the whole interface could be cleaned up and made far more
 consistent, but I don't think he necessarily has to go through the
 getsecid + secid_to_secctx sequence if he only wants the secctx.

It's what Daniel Walsh wanted.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-10 Thread David Howells


Stephen Smalley [EMAIL PROTECTED] wrote:

  +   tsec->create_sid = SECINITSID_UNLABELED;
  +   tsec->keycreate_sid = SECINITSID_UNLABELED;
  +   tsec->sockcreate_sid = SECINITSID_UNLABELED;

Cleared means what?  Setting to 0?  Or is there some other constant I should
use for that?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-10 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 From a config file whose pathname would be provided by libselinux (ala
 the way in which dbusd imports contexts), or directly as a context
 returned by a libselinux function.

That sounds too SELinux specific.  How do I do it so that it works for any
LSM?

Is linking against libselinux a viable option if it's not available under
all LSM models?  Is it available under all LSM models?  Perhaps Casey can
answer this one.

  I used to do that, but someone objected...  Possibly Karl MacMillan.
 
 Yes, but I think I disagreed then too.

So, who's right?

 It doesn't fit with how other users of security_kernel_act_as() will
 likely want to work (they will want to just set the context to a
 specified value, whether one obtained from the client or from some local
 source), nor with how type transitions normally work (exec, with the
 program type as the second type field).  I think it will just cause
 confusion and subtle breakage.

It's causing me lots of confusion as it is.  I have been / am being told by
different people to do different things just in dealing with SELinux, and
various people are raising extra requirements or restrictions beyond that.
There doesn't seem to be a consensus.

It sounds like the best option is just to have the kernel nick the userspace
daemon's security context and use that as is, and junk all the restrictions on
what the daemon can do so that the kernel isn't too restricted.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-11 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 All your code has to do is invoke a function provided by libselinux.

Calling libselinux means it's a special case for a specific LSM.

I think the best way to do this, then, has to be to dlopen the appropriate LSM
library.  That way I don't need to do any conditional compilation or linking,
but can build all the bits in to cachefilesd and have the appropriate one
selected by the /etc/cachefilesd.conf.

So, what do I invoke in libselinux, how do I configure it, and how do I
integrate the config into my RPM and install it?

And then what does it give me that I can hand to the kernel (a context string
for SELinux, I presume), how do I get the kernel to make a check on it, how do
I configure the check and how do I install that config from my RPM (I presume
I just need to modify the .fc, .if and .te files that I have already)?

 That mostly works, but it means that an update to policy may require an
 update to /etc/cachefilesd.conf, or that switching from one policy to
 another might likewise require changing that file.  Versus using a
 separate policy-provided config file for the label.

Whilst that's a fair point, if it's in a config file somewhere, then someone
may want to change it or someone may want to provide a second file for a
second cache with a different security label.

 BTW, as should be obvious, some LSMs aren't label-based at all, so it
 would need to be optional.

Aargh.  In which case it might not be possible to make the SELinux context
passing from userspace -> kernel generic for all LSMs :-(

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

 You may need to have an application, say cachefileselinuxcontext, that will
 read the current policy and spit out an appropriate value of whatever,
 but that can be separate and LSM specific without mucking up your basic
 infrastructure applications.

What would I do with such a thing?  How would it get run?  Spat out to where?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

 What sort of authorization are you thinking of? I would expect
 that to have been done by cachefileselinuxcontext (or
 cachefilesspiffylsmcontext) up in userspace. If you're going to
 rely on userspace applications for policy enforcement they need
 to be good enough to count on after all.

It can't be done in userspace, otherwise someone using the cachefilesd
interface can pass an arbitrary context up.  The security context has to be
passed across the file descriptor attached to /dev/cachefiles along with the
other configuration parameters as a text string.  This fd selects the
particular cache context that a particular instance of a running daemon is
using.
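
As a rough illustration of passing the context as a text string over the per-cache file descriptor, the daemon side could look like this. It is a sketch under assumptions: the `secctx` command keyword and the context string are made up, and in practice fd would come from opening /dev/cachefiles; any writable fd (such as a pipe) works for trying it out:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Send a security context to the cache module as one text command on the
 * already-open control fd. The "secctx" keyword is hypothetical. */
static int send_secctx(int fd, const char *ctx)
{
	char cmd[256];
	int n = snprintf(cmd, sizeof cmd, "secctx %s", ctx);

	if (n < 0 || (size_t)n >= sizeof cmd)
		return -EINVAL;  /* context too long for this sketch's buffer */
	if (write(fd, cmd, (size_t)n) != n)
		return -EIO;
	return 0;
}
```

Because the command travels on the same fd that selects the cache instance, each cache opened through its own fd gets its own context.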

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 That sounds workable, although I think he will want a more specific hook
 than security_secctx_to_secid(), or possibly a second hook call, that
 would not only validate the context but authorize the use of it by the
 cachefilesd process.  And then the security_task_kernel_act_as() hook
 just takes the secid as input rather than the task struct of the daemon,
 and applies it.  At that point, nfsd can use the same mechanism for
 setting the acting SID based on the client process after doing its own
 authorization.

I thought using secids was verboten as it made things too specific.

Have you example code for the security hook you mention?  I'm not sure I
understand why security_secctx_to_secid() is not sufficient.

Or is it that I need something that takes a secctx, converts it to a secid and
authorises its use all in one go?  If it's this, why can't that be rolled into
security_task_kernel_act_as()?  That sets up a task_security struct which is
then switched in and out without consultation of the LSM.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

 Put the result into /etc/cachefiles.conf.

Ewww.  Runtime mangling of the configuration.  I suppose it doesn't have to be
in that file with the rest of the config.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

 Yes, but we're talking about writing the configuration information
 to the kernel, not actually making any access checks with it. I
 think. What I think we're talking about (and please correct me David
 if I've stepped into the wrong theatre) is getting the magic
 secctx that cachefiles will use instead of the secctx that the task
 would have otherwise. I don't think we're talking about recomputing
 it on every access, I think David is looking for the blunderbuss
 secctx that he can use any time he needs one.

Indeed.

The way I do it is:

 (1) The daemon opens /dev/cachefiles to begin an instance of a cache.

 (2) The daemon negotiates a security context for the module to use.

 (3) The security context is placed in a task_security structure.

 (4) This task_security struct is attached temporarily to task->act_as each
 time any task attempts to access the cache through the module.

 (5) The task_security struct is discarded when the file descriptor that was
 created in (1) is closed and the cache is withdrawn at the same time.
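
Steps (1) to (5) can be modelled in userspace to show the shape of the override. All the types and functions here are hypothetical stand-ins (a label string plays the part of the LSM context), not the kernel's task_security code:

```c
#include <stddef.h>

/* Hypothetical credentials record and a task that can act as another one. */
struct model_creds { const char *label; };
struct model_task { struct model_creds *own, *act_as; };

/* One cache instance, created when the daemon opens the control device. */
struct model_cache { struct model_creds creds; int open; };

/* Steps (1)-(3): set up the cache with the negotiated context. */
static void cache_open(struct model_cache *c, const char *negotiated_label)
{
	c->creds.label = negotiated_label;
	c->open = 1;
}

/* Step (4): temporarily act as the cache's credentials for one access,
 * then put the task's previous act_as pointer back. */
static const char *cache_access(struct model_cache *c, struct model_task *t)
{
	struct model_creds *saved = t->act_as;
	const char *used;

	t->act_as = &c->creds;
	used = t->act_as->label;  /* the access check would happen here */
	t->act_as = saved;
	return used;
}

/* Step (5): discard the context when the fd is closed. */
static void cache_close(struct model_cache *c)
{
	c->creds.label = NULL;
	c->open = 0;
}
```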

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

  This fd selects the
  particular cache context that a particular instance of a running daemon is
  using.
 
 Yes, but forgive me being slow, I don't see the problem.

I mean that it's not particularly sensible to have an auxiliary interface (say
a separate /sys/cachefiles file) for setting the cache context as there may be
several caches instantiated simply by opening /dev/cachefiles several times
and retaining all the file descriptors involved.

I meant this as information only.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 More likely, run it at build time in your .spec file to generate
 cachefiles.conf,

I don't think sticking it in cachefiles.conf is a good idea necessarily.
That has to be an administrator modifiable file.  Is there a program I could
make cachefiles run directly and capture the output of that could give me the
info I want?

 then run it again maybe upon a policy update or if the user selects a
 different policy.

How do I do that?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-12 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

  Have you example code for the security hook you mention?  I'm not sure I
  understand why security_secctx_to_secid() is not sufficient.
 
 security_secctx_to_secid() would just validate and map a context string to a
 secid.

Validate as in check it's a valid string?  Okay, I see that.

 It wouldn't perform any permission check, as the caller might be a
 kernel-internal user that is just mapping back and forth like current users
 of security_secid_to_secctx, or it might be something that ultimately
 originated from userspace but the hook has no way of knowing why or what set
 of checks would be appropriate.  You'd need a more specific hook for the
 authorization, one that would perform a permission check, e.g. an
 avc_has_perm() call.  Which likely requires defining a new class and
 permissions for your cachefiles kernel interface.

Hmmm...  This is sounding very not-simple.  I don't know how to do this.  I
can probably guess the kernel side by looking at how SECCLASS_KEY is done, but
it sounds like it involves changes to the userspace policy processing tools
too.

It also sounds a bit like overkill, but if it's the right way then I guess it
has to be done.

What does the security class represent in this case?  And can it be generalised
to apply to non-cachefiles stuff too?
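
The distinction Stephen draws is between mapping a context (validation only) and deciding whether a given subject may use it (authorisation). A toy model with the two steps kept separate; the label table, the allow-list and both function names are hypothetical, with the allow-list standing in for an avc_has_perm() check against a new class:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical label table: the index is the secid. Validation only asks
 * "is this a known context?" and maps it; it doesn't care who is asking. */
static const char *const known_ctx[] = {
	"unlabeled_t", "cachefilesd_t", "cachefiles_kernel_t"
};
#define NCTX (sizeof known_ctx / sizeof known_ctx[0])

static int model_secctx_to_secid(const char *ctx, unsigned *secid)
{
	for (unsigned i = 0; i < NCTX; i++) {
		if (strcmp(ctx, known_ctx[i]) == 0) {
			*secid = i;
			return 0;
		}
	}
	return -EINVAL;
}

/* Separate authorisation step: may subject `subj` have the kernel act as
 * `target`? The allow-list stands in for a policy permission check. */
struct allow_rule { unsigned subj, target; };
static const struct allow_rule allow[] = { { 1, 2 } };

static int model_authorize_act_as(unsigned subj, unsigned target)
{
	for (size_t i = 0; i < sizeof allow / sizeof allow[0]; i++)
		if (allow[i].subj == subj && allow[i].target == target)
			return 0;
	return -EACCES;
}
```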

  Or is it that I need something that takes a secctx, converts it to a secid
  and authorises its use all in one go?  If it's this, why can't that be
  rolled into security_task_kernel_act_as()?  That sets up a task_security
  struct which is then switched in and out without consultation of the LSM.
 
 I was under the impression that security_task_kernel_act_as() was being used
 to switch the current task to an acting context, not to initially set up a
 struct for later use.

Definitely the latter.  I guess I wasn't very clear in the patch description.
It also sounds like I need to adjust the naming of certain functions.

 If you go with the latter approach, then what is the lifecycle on that
 struct?

 (1) You create a new task_security struct

 (2) You fill in the fsuid, fsgid, etc.

 (3) You request that the LSM security pointer in it be set to point to the
 context you want (at the moment this is done by attempting a transition
 from the daemon's context):

ret = security_transition_sid(dtsec->sid, SECINITSID_KERNEL,
  SECCLASS_PROCESS, &ksid);

 and, in the current code, it returns an error if you're not allowed to do
 that.  But instead you'd ask it to set a specific context, and it'd set
 that if you're permitted to do that, and give you an error if you're not.

 (4) You then use the task_security at will to override the task->act_as
 pointer in whatever task(s) you're operating on behalf of at the moment.

 (5) When you cease operating on behalf of a task, you revert its act_as
 pointer and drop a reference to your task_security struct.

 (6) When the last ref to the task_security struct goes away, the LSM data
 attached to it is released, as are its groups and keyrings.
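The six-step lifecycle above can be modelled as a reference-counted structure. This is a minimal userspace sketch under stated assumptions: `struct task_security`, `ts_alloc()`, `override_act_as()`, `revert_act_as()` and `demo_policy()` are hypothetical simplifications, with a plain integer standing in for the LSM security pointer and no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified task_security: not the real kernel struct. */
struct task_security {
	int usage;             /* reference count */
	unsigned int fsuid;
	unsigned int fsgid;
	unsigned int secid;    /* stands in for the LSM security pointer */
};

struct task {
	struct task_security *act_as;  /* security the task currently acts as */
};

/* Hypothetical policy hook: only secid 42 may be installed. */
int demo_policy(unsigned int secid)
{
	return secid == 42;
}

/* (1) Create a new struct holding one reference. */
struct task_security *ts_alloc(void)
{
	struct task_security *ts = calloc(1, sizeof(*ts));

	if (ts)
		ts->usage = 1;
	return ts;
}

struct task_security *ts_get(struct task_security *ts)
{
	ts->usage++;
	return ts;
}

/* (6) Drop a reference; the last put frees it.  Returns 1 if freed. */
int ts_put(struct task_security *ts)
{
	assert(ts->usage > 0);
	if (--ts->usage == 0) {
		free(ts);
		return 1;
	}
	return 0;
}

/* (3) Ask the "LSM" to install a specific secid, failing if not permitted. */
int ts_set_secid(struct task_security *ts, unsigned int secid,
		 int (*permitted)(unsigned int))
{
	if (!permitted(secid))
		return -EPERM;
	ts->secid = secid;
	return 0;
}

/* (4) Point the task's act_as at ts, returning the previous value. */
struct task_security *override_act_as(struct task *t, struct task_security *ts)
{
	struct task_security *old = t->act_as;

	t->act_as = ts_get(ts);
	return old;
}

/* (5) Restore the previous act_as and drop the reference taken in (4). */
void revert_act_as(struct task *t, struct task_security *old)
{
	struct task_security *ts = t->act_as;

	t->act_as = old;
	ts_put(ts);
}
```

Step (2) is just the caller filling in `fsuid` and `fsgid` after `ts_alloc()`. A real implementation would need atomic reference counts, since several tasks may hold the same override at once.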

 BTW, it gets a little confusing with your use of task_security for the full
 task security state vs our existing use of task_security_struct within
 SELinux for the task's LSM security blob.

I know.  I thought about it quite a bit, but the problem is there's so much
overloading of various words (eg: security, context), and I wanted to avoid
struct cred for various other reasons.

 I suppose ours could be renamed to task_selinux.

Better still, perhaps, would be to prefix things with selinux_ to make it
namespace clean.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-13 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 It is just a way of carving up the permission space, typically based on
 object type, but it can essentially be arbitrary.  The check in this
 case seems specific to cachefiles since it is controlling an operation
 on the /dev/cachefiles interface that only applies to cachefiles
 internal operations, so making a cachefiles class seems reasonable.

Can you specify what sort of permissions you're thinking of providing for
tasks to operate on this class?  Can an object of this class 'operate' on
other objects, or can only process-class objects do that?

How does an object of this class acquire a label?  What is an object of this
class?  Is it a cache?  Or were you thinking of a module?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-13 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 
 Yes, we could easily make a simple program that just invokes a
 libselinux function that in turn grabs the proper context from some
 context configuration file under /etc/selinux/$SELINUXTYPE/contexts/ and
 outputs it.  Dan can help with that.

That sounds nicely genericisable, perhaps even for any LSM.

/usr/bin/lsm-get-context cachefiles

It does have to be able to come up with different contexts for different
caches, but that can be controlled by changing the name supplied to it.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-13 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 They would correspond with the operations provided by the /dev/cachefiles
 interface, at the granularity you want to support distinctions to be made.

Can this be made simpler by the fact that /dev/cachefiles has its own unique
label (cachefiles_dev_t)?

 Could just be a single 'setcontext' permission if that is all you want to
 control distinctly, or could be a permission per operation.

There is only one operation that makes sense to have a permission: set
context and begin caching.

All the other operations on a file descriptor attached to /dev/cachefiles are
necessary for there to be a managed cache at all, and given that you've
managed to open /dev/cachefiles that's sufficient access for those, I think.

 If the latter, you don't really need a label for the object, and can
 just use the supplied context/secid as the object of the permission
 check, ala:
   rc = avc_has_perm(tsec->sid, secid, SECCLASS_CACHEFILES,
 CACHEFILES__SETCONTEXT);

Ummm.   I was under the impression that the target SID had to be a member of
the target class.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2007-12-13 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 Do any of the interfaces allow a task to act on a cache other than one
 it has created?

No.

 How does the task identify the desired cache?

Each file descriptor opened creates one separate cache instance.  Any commands
sent over that file descriptor affect only the cache instance it is attached
to; similarly, any status data you read only refers to that one cache
instance.

Closing the file descriptor makes the cache go away as far as the kernel is
concerned.  The cachefiles daemon retains its cache dev file descriptor for
the lifetime of the daemon.

 What if there is a conflict between multiple tasks asking for the same
 cache?

As far as the cache daemon is concerned, the file descriptor is its handle to
the cache so the conflict does not arise.
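The one-instance-per-descriptor model described here can be sketched as follows. The handle and instance types and the `cache_set_dir()`/`cache_bind()` operations are hypothetical, only loosely patterned on the text commands the daemon is assumed to write to /dev/cachefiles; the point is that each open creates an unshared instance and close destroys it, so two daemons never contend for one cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache instance: one per open of the control device. */
struct cache_instance {
	char dir[64];   /* cache root, set by a "dir"-style command */
	int bound;      /* nonzero once caching has begun */
};

/* Hypothetical per-descriptor handle owning exactly one instance. */
struct cache_handle {
	struct cache_instance *cache;
};

/* open(): every open creates a fresh, unshared instance. */
struct cache_handle *cache_open(void)
{
	struct cache_handle *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->cache = calloc(1, sizeof(*h->cache));
	if (!h->cache) {
		free(h);
		return NULL;
	}
	return h;
}

/* Commands only ever touch the instance behind this handle. */
int cache_set_dir(struct cache_handle *h, const char *dir)
{
	assert(h != NULL && h->cache != NULL);
	if (h->cache->bound || strlen(dir) >= sizeof(h->cache->dir))
		return -1;
	strcpy(h->cache->dir, dir);
	return 0;
}

/* "Bind": begin caching, which requires a directory to have been set. */
int cache_bind(struct cache_handle *h)
{
	assert(h != NULL && h->cache != NULL);
	if (h->cache->bound || h->cache->dir[0] == '\0')
		return -1;
	h->cache->bound = 1;
	return 0;
}

/* close(): the instance goes away along with the descriptor. */
void cache_close(struct cache_handle *h)
{
	free(h->cache);
	free(h);
}
```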

 secid is being applied as the acting context for the cachefiles kernel
 module, so the above makes sense, even though there isn't really any
 object in view here.  Abstractly, the question we are asking above is:

 Can this task set the context of the cachefiles kernel module to this
 value?

So the following (taken from cachefilesd.te):

allow cachefilesd_t cachefiles_var_t : file { getattr rename unlink };

says, for example, allow:

avc_has_perm(cachefilesd_t,
 cachefiles_var_t,
 SECCLASS_FILE,
 FILE__RENAME,
 ...);
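The correspondence between an allow rule and an access check can be illustrated with a toy lookup table. This is an assumption-laden sketch, not how SELinux actually stores policy (the real AVC caches decisions computed from a compiled binary policy); the permission bit values and `has_perm()` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Made-up permission bits for the "file" class. */
enum {
	FILE__GETATTR = 1u << 0,
	FILE__RENAME  = 1u << 1,
	FILE__UNLINK  = 1u << 2,
	FILE__WRITE   = 1u << 3,
};

struct allow_rule {
	const char *source;   /* subject type */
	const char *target;   /* object type */
	const char *tclass;   /* object class */
	unsigned int perms;   /* permissions the rule grants */
};

/* allow cachefilesd_t cachefiles_var_t : file { getattr rename unlink }; */
static const struct allow_rule policy[] = {
	{ "cachefilesd_t", "cachefiles_var_t", "file",
	  FILE__GETATTR | FILE__RENAME | FILE__UNLINK },
};

/* True only if every requested permission is granted for
 * (source, target, class); anything not allowed is denied. */
bool has_perm(const char *source, const char *target, const char *tclass,
	      unsigned int requested)
{
	unsigned int granted = 0;

	assert(source != NULL && target != NULL && tclass != NULL);
	for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++)
		if (strcmp(policy[i].source, source) == 0 &&
		    strcmp(policy[i].target, target) == 0 &&
		    strcmp(policy[i].tclass, tclass) == 0)
			granted |= policy[i].perms;
	return requested != 0 && (granted & requested) == requested;
}
```

Here `has_perm("cachefilesd_t", "cachefiles_var_t", "file", FILE__RENAME)` plays the part of the `avc_has_perm()` call above and succeeds, while asking for `FILE__WRITE` fails because no rule grants it.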

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2007-12-17 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 I'd much prefer if you would handle this in the filesystem, and have it
 set PG_private whenever fscache needs to receive a callback, and DTRT
 depending on whether PG_fscache etc. is set or not.

That's tricky and slower[*].  One of the things I want to do is to modify
iso9660 to be able to do caching, but PG_private is 'owned' by the generic
buffer cache code.

[*] though perhaps not significantly.

 Also, this wait_on_page_fscache_write / end_page_fscache_write stuff
 seems like it would belong in your fscache headers rather than generic
 mm code (ditto for your PG_fscache checks in the page allocator -- you
 should use their PG_owner_priv_? names for that).

I suppose that's reasonable, though I do want to mention the PG_fscache* bits
in linux/page-flags.h so that anyone looking at those bits to select one to
use can easily see a reason they might not want to.

David

Re: [PATCH 09/28] FS-Cache: Release page-private after failed readahead [try #2]

2007-12-17 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 This is pretty nasty.

Why?  If the fs doesn't set PG_private or PG_fscache on any pages before
calling read_cache_pages(), there's no difference.

Furthermore, the differences only crop up in the error handling paths.

 I would suggest either to have the function return the number of pages that
 were added to pagecache,

Which helps how?

 or just open code it.

Well, I could give an alternative read_cache_pages(), I suppose, for just this
situation, but that means there are two parallel functions which then both
need to be maintained.

David

Re: [PATCH 24/28] AFS: Add a function to excise a rejected write from the pagecache [try #2]

2007-12-17 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 This reintroduces the fault vs truncate race window, which must be fixed.

Hmmm...  perhaps.  I remember that cropped up in NFS, but I'm doing things a
bit differently to NFS.  Remind me again how that worked please.

 Also, it is adding a fair bit of complexity in an area where we should
 instead be reducing it. I think your filesystem should not be doing
 writeback caching of dirty data in the cases where it is so problematic
 (or at least, disallow mmap and read on the dirty data until it has been
 written back or failed).

Eh?  It's a stateless network filesystem.  There's a gap between writing to a
file (perhaps through an mmap) and the pagecache pages being written back in
which someone may change the security on a file and block the writeback.
There's nothing I can do to prevent it, so I have to instead deal with the
consequences should they arise.  See the description of patch 25 for examples.

So you say I shouldn't do any writeback caching at all?

 But otherwise I guess if you really want to discard the dirty data after
 a failed writeback attempt, what's wrong with just invalidate_inode_pages2?

Erm...  Because it deadlocks?

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2007-12-20 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

   I'd much prefer if you would handle this in the filesystem, and have it
   set PG_private whenever fscache needs to receive a callback, and DTRT
   depending on whether PG_fscache etc. is set or not.
 
  That's tricky and slower[*].  One of the things I want to do is to modify
  iso9660 to be able to do caching, but PG_private is 'owned' by the
  generic buffer cache code.
 
 Maybe it is harder, but it is the right way to do it.

You're wrong.  It would mean that PG_private is the logical disjunction of
PG_fscache and some condition not otherwise explicitly stored.  I tried that
with NFS and it was nasty.

As you can no doubt see, it means that you can't distinguish all the states
you used to be able to.

 So you should modify the filesystems rather than core code.

I think you missed what I said:

but PG_private is 'owned' by the generic buffer cache code.

That means more of the core code would have to change - or, at least, change
more.

David

Re: [patch, rfc] mm.h, security.h, key.h and preventing namespace poisoning

2008-01-02 Thread David Howells

James Morris [EMAIL PROTECTED] wrote:

 I suspect it may be useful ensure all global identifiers for the key 
 subsystem are prefixed with key_, as 'copy_keys' does seem a little 
 generic.

Many of the fork helpers are called copy_xxx().

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-02 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 Then make a PG_private2 bit and use that.

To what end?  Are you suggesting I should have:

PG_private2 = PG_private | PG_fscache

That's redundant information and doesn't help anything really.

My suggestion (PG_private and PG_fscache separate and independent) is pretty
efficient to actually render into machine instructions, especially if the two
bits are placed in the lower part of the word.  On x86, the test for both bits
can be done with a single TEST instruction, and on most RISC archs, a MOVE and
a single AND will usually suffice.
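As a sketch of why the combined test is cheap: both bits can be folded into one compile-time mask so the check is a single AND against the flags word. The bit numbers below are taken from the page-flags.h hunk later in this thread (`PG_private` 11, `PG_private_2` 13); `flags_have_private()` is a hypothetical userspace helper, not the kernel's `page_has_private()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers from the page-flags.h hunk in this thread. */
#define PG_private   11
#define PG_private_2 13   /* aliased as PG_fscache */

/* Both bits folded into a single constant mask at compile time. */
#define PRIVATE_BITS_MASK ((1UL << PG_private) | (1UL << PG_private_2))

/* One AND plus a compare-with-zero covers both bits. */
bool flags_have_private(unsigned long flags)
{
	return (flags & PRIVATE_BITS_MASK) != 0;
}
```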

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-07 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 No. I mean call the bit PG_private2. That way non-pagecache and
 filesystems that don't use fscache can use it.

The bit is called PG_owner_priv_2, and then 'subclassed' to PG_fscache, much
like PG_owner_priv_1 is 'subclassed' to PG_checked as was recommended.

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-08 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

  Nick Piggin [EMAIL PROTECTED] wrote:
   No. I mean call the bit PG_private2. That way non-pagecache and
   filesystems that don't use fscache can use it.
 
  The bit is called PG_owner_priv_2, and then 'subclassed' to PG_fscache,
  much like PG_owner_priv_1 is 'subclassed' to PG_checked as was recommended.
 
 It is not owner_priv if you're putting checks and tests into core
 kernel pagecache code for it. owner_priv means a filesystem has it
 _all_ to itself.

Okay, I'll change it if it makes you happy.  Bear in mind, though, you're
dictating instructions that conflict with those other people have dictated.

David

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-09 Thread David Howells

Nick Piggin [EMAIL PROTECTED] wrote:

 It is to make everybody happy. Especially in code that everyone works
 on like mm/ and fs/, you can't just have everybody following their own
 slightly different conventions.

Conventions are what people agree they are.

Anyway, I've attached a revised page flags patch if you can take a quick look
over that.

I'll drop the AFS-caching and AFS-write-fix patches for the moment and
concentrate on trying to get FS-Cache working with NFS.

So if we can agree on the two(?) things you brought up with the remaining
patches:

 (1) Make PG_fscache overload PG_private_2 rather than PG_owner_priv_2.  Would
 that make you happy with patch 10 of 28?

 (2) Then there's patch 9 of 28 - making read_cache_pages() release private
 data that's already attached to pages it then discards due to error.  Are
 you still going to require that I duplicate read_cache_pages()?  Or can
 you accept that sharing is sufficient, especially if PG_private_2 now
 exists?

David
---
FS-Cache: Recruit a couple of page flags for cache management

From: David Howells [EMAIL PROTECTED]

Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_private_2)

 The marked page is backed by a local cache and is pinning resources in the
 cache driver.

 (2) PG_fscache_write (PG_owner_priv_2)

 The marked page is being written to the local cache.  The page may not be
 modified whilst this is in progress.

If PG_fscache is set, then things that checked for PG_private will now also
check for that.  This includes things like truncation and page invalidation.
The function page_has_private() has been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/splice.c|2 +-
 include/linux/page-flags.h |   40 ++--
 include/linux/pagemap.h|   11 +++
 mm/filemap.c   |   16 
 mm/migrate.c   |2 +-
 mm/page_alloc.c|3 +++
 mm/readahead.c |9 +
 mm/swap.c  |4 ++--
 mm/swap_state.c|4 ++--
 mm/truncate.c  |   10 +-
 mm/vmscan.c|2 +-
 11 files changed, 85 insertions(+), 18 deletions(-)


diff --git a/fs/splice.c b/fs/splice.c
index 6bdcb61..61edad7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 */
wait_on_page_writeback(page);
 
-   if (PagePrivate(page))
+   if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..364f8f9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -77,25 +77,32 @@
 #define PG_active   6
 #define PG_slab 7  /* slab debug (Suparna wants this) */
 
-#define PG_owner_priv_1 8  /* Owner use. If pagecache, fs may use*/
+#define PG_owner_priv_1 8  /* Owner use. fs may use in pagecache */
 #define PG_arch_1   9
 #define PG_reserved 10
 #define PG_private 11  /* If pagecache, has fs-private data */
 
 #define PG_writeback   12  /* Page is under writeback */
+#define PG_private_2   13  /* If pagecache, has fs aux data */
 #define PG_compound 14  /* Part of a compound page */
 #define PG_swapcache   15  /* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk 16  /* Has blocks allocated on-disk */
 #define PG_reclaim 17  /* To be reclaimed asap */
+#define PG_owner_priv_2 18  /* Owner use. fs may use in pagecache */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache_write   PG_owner_priv_2 /* Writing to local cache */
+
+/* PG_private_2 causes releasepage() and co to be invoked */
+#define PG_fscache PG_private_2 /* Backed by local cache */
+
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -199,6 +206,24 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,  \
&(page)->flags

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-09 Thread David Howells


Okay.  I can:

 (1) Have cachefilesd (the daemon) pass a security context string to the
 cachefiles kernel module, which can then convert it to a secID.  It'll
 require a security_secctx_to_secid() function, but I'm fairly certain I
 have a patch to add such kicking around somewhere.

 (2) Make security_task_kernel_act_as() take a task_security struct and a
 secID and just assign the latter to the former.  I'm not sure it makes
 sense to do any checks here, other than checking that under SELinux the
 secID is of SECCLASS_PROCESS class.

However, I need to write a check that the cachefilesd daemon is permitted to
nominate the secID it did.  Can someone tell me how to do this?  The obvious
way to do this is to add another PROCESS__xxx security permit specifically for
cachefiles, but that seems like a waste of a bit when there are only two spare
bits.

avc_has_perm(daemon_tsec->sid, nominated_sid,
 SECCLASS_PROCESS, PROCESS__CACHEFILES_USE, NULL);

Now, I recall the addition of another security class being mentioned, which
presumably would give something like:

avc_has_perm(daemon_tsec->sid, nominated_sid,
 SECCLASS_CACHE, CACHE__USE_AS_OVERRIDE, NULL);

And I assume this doesn't care if one, the other or both of the two SIDs
mentioned are of SECCLASS_PROCESS rather than of SECCLASS_CACHE.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-09 Thread David Howells

David Howells [EMAIL PROTECTED] wrote:

 Now, I recall the addition of another security class being mentioned, which
 presumably would give something like:
 
   avc_has_perm(daemon_tsec->sid, nominated_sid,
SECCLASS_CACHE, CACHE__USE_AS_OVERRIDE, NULL);

H...  I can't see how to add a new security class.  I can see that
security classes are defined in various autogenerated header files, but
autogenerated from what?  The "This file is automatically generated.  Do not
edit." message at the top of these files seems to belie the fact they're
actually checked in to GIT as is.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-10 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 If you have a "SELinux:  policy loaded with handle_unknown=allow"
 message in your /var/log/messages, then new classes/perms that are not
 yet known to the policy will be allowed by default, so the operation
 will be permitted by the kernel.

I don't.  How do I set it?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-14 Thread David Howells


Stephen Smalley [EMAIL PROTECTED] wrote:

  avc_has_perm(daemon_tsec->sid, nominated_sid,
   SECCLASS_CACHE, CACHE__USE_AS_OVERRIDE, NULL);
  
  And I assume this doesn't care if one, the other or both of the two SIDs
  mentioned are of SECCLASS_PROCESS rather than of SECCLASS_CACHE.
 
 Right, the latter is reasonable.

Okay...  It looks like I want four security operations/hooks for cachefiles:

 (1) Check that a daemon can nominate a secid for use by the kernel to override
 the process subjective secid.

 (2) Set the secid mentioned in (1).

 (3) Check that the kernel may create files as a particular secid (this could
 be specified indirectly by specifying an inode, which would hide the secid
 inside the LSM).

 (4) Set the fscreate secid mentioned in (3).

Now, it's possible to condense (1) and (2) into a single op, and condense (3)
and (4) into a single op.  That, however, might make the ops unusable by nfsd,
which may well want to bypass the checks or do them elsewhere.
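The difference between split check/set hooks and a condensed test-and-set hook can be sketched as below. Everything here is hypothetical: the numeric secids, the policy tables and the function names stand in for LSM hooks that do not exist in this form.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical override record a kernel service would install. */
struct override_cred {
	unsigned int act_as_secid;    /* subjective secid for access checks */
	unsigned int fscreate_secid;  /* label for files the service creates */
};

/* Stand-in policy tables of permitted (subject, target) secid pairs. */
struct pair {
	unsigned int s, t;
};
static const struct pair may_override[] = { { 100, 200 } };
static const struct pair may_create_as[] = { { 200, 300 } };

static bool in_table(const struct pair *tab, size_t n,
		     unsigned int s, unsigned int t)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].s == s && tab[i].t == t)
			return true;
	return false;
}

/* (1) Check only: may the daemon nominate this override secid? */
int check_act_as(unsigned int daemon_sid, unsigned int secid)
{
	return in_table(may_override, ARRAY_LEN(may_override),
			daemon_sid, secid) ? 0 : -EACCES;
}

/* (2) Set only: trusts the caller to have checked (or to skip checking). */
void set_act_as(struct override_cred *c, unsigned int secid)
{
	assert(c != NULL);
	c->act_as_secid = secid;
}

/* (3) Check only: may this subject create files with this label? */
int check_create_files_as(unsigned int subject, unsigned int file_secid)
{
	return in_table(may_create_as, ARRAY_LEN(may_create_as),
			subject, file_secid) ? 0 : -EACCES;
}

/* (4) Set only. */
void set_create_files_as(struct override_cred *c, unsigned int file_secid)
{
	assert(c != NULL);
	c->fscreate_secid = file_secid;
}

/* Condensed (1)+(2): one test-and-set op.  A caller that wants to skip the
 * check or do it elsewhere cannot use this form. */
int act_as_checked(struct override_cred *c, unsigned int daemon_sid,
		   unsigned int secid)
{
	int ret = check_act_as(daemon_sid, secid);

	if (ret == 0)
		set_act_as(c, secid);
	return ret;
}
```

A user like cachefiles could call `act_as_checked()`, while a user that does its checks elsewhere would call `set_act_as()` directly; keeping the split versions available is what preserves that option.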

Any thoughts?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-14 Thread David Howells

David Howells [EMAIL PROTECTED] wrote:

 Okay...  It looks like I want four security operations/hooks for cachefiles:

FYI, I added the following vectors:

# kernel services that need to override task security
class kernel_service
{
use_as_override
create_files_as
}

The first allows:

avc_has_perm(daemon_tsec->sid, nominated_sid,
 SECCLASS_KERNEL_SERVICE,
 KERNEL_SERVICE__USE_AS_OVERRIDE,
 NULL);

And the second something like:

avc_has_perm(tsec->sid, inode->sid,
 SECCLASS_KERNEL_SERVICE,
 KERNEL_SERVICE__CREATE_FILES_AS,
 NULL);

Rather than specifically dedicating them to the cache, I made them general.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-14 Thread David Howells

Casey Schaufler [EMAIL PROTECTED] wrote:

 Yes, and I would recommend doing so to avoid permission races.
 You're going to have to deal with the case where step (2) fails
 even if you have step (1), so the test and set mindset seems
 prudent to me.

Looking at SELinux, that doesn't get rid of the permission race because there's
no locking.  This may be different for other models.

I was thinking of having steps (2) and (4) not do any checking, but rather
assume that the caller has done the checks before calling the set routines,
possibly by calling the hooks mentioned in (1) and (3).

My main problem is that I don't know how NFSd wants to do things.  I suppose
there *ought* to be rules that say what NFSd is allowed to do.

  and condense (3) and (4) into a single op.  That, however, might make
  the ops unusable by nfsd,
  which may well want to bypass the checks or do them elsewhere.
 
 Again, I don't think you're doing yourself any favors with a separate
 test operation.

It doesn't matter to cachefiles, but it might matter to other users:-/

 On (4) are you suggesting a third attribute value? There's the secid
 of the task originally, the secid you're going to use to do the access
 checks, and the secid you're going to set the file to on creation.

That's correct.  Let me summarise:

 (1) The daemon has an active process security ID (say A).  When the daemon
 nominates an override process security ID (say B) to be used by the
 kernel, the cachefiles module asks the LSM to check that A is allowed to
 nominate B for this purpose.

 (2) The cachefiles module is given a path under which its cache exists.  The
 directory at the base of this path has its own security ID (say C).
 cachefiles wants to create new files in the cache with the same security
 ID as that directory (ie. C).

 However, when cachefiles is creating files in the cache, the security of
 whatever process is doing the access will be overridden with B, so
 cachefiles asks the LSM to check that B is allowed to create files as C.

 Note that this is an instantaneous check in the cache startup stage.  This
 allows caching to be aborted early if the security policy does not permit
 B to create Cs.  Technically this check is superfluous as it's re-checked
 each time vfs_mkdir() or vfs_create() is called.
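The start-up sequence summarised above, with A, B and C as labels, can be sketched as two checks performed before any caching begins. The type names, the permission strings and `cache_startup()` are assumptions for illustration, loosely modelled on the `kernel_service` permissions proposed earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rules; the types play the roles of A, B and C. */
struct rule {
	const char *subject;
	const char *object;
	const char *perm;
};

static const struct rule rules[] = {
	/* A may nominate B as the override subjective label */
	{ "cachefilesd_t", "cachefiles_kernel_t", "use_as_override" },
	/* B may create files labelled C */
	{ "cachefiles_kernel_t", "cachefiles_var_t", "create_files_as" },
};

static bool allowed(const char *subject, const char *object, const char *perm)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(rules[i].subject, subject) == 0 &&
		    strcmp(rules[i].object, object) == 0 &&
		    strcmp(rules[i].perm, perm) == 0)
			return true;
	return false;
}

/* Returns 0 if caching may start, or the number of the failed step. */
int cache_startup(const char *a_daemon, const char *b_override,
		  const char *c_root_dir)
{
	assert(a_daemon != NULL && b_override != NULL && c_root_dir != NULL);

	/* Step 1: may A nominate B? */
	if (!allowed(a_daemon, b_override, "use_as_override"))
		return 1;

	/* Step 2: may B create files labelled C (the root directory's label)?
	 * Checking once here lets a bad policy abort caching early, even though
	 * each later vfs_mkdir()/vfs_create() is checked again anyway. */
	if (!allowed(b_override, c_root_dir, "create_files_as"))
		return 2;

	return 0;
}
```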

  Any thoughts?
 
 Let me see if I understand your current scheme.
 
 You want a (object) secid that is used to access the task.

That depends on what you mean.  cachefilesd (the daemon) will be run with a
security label because there's a security model in place.

I don't actually need to access the daemon, but the daemon does need to do
things for which it requires permission grants.

 You want a (subject) secid that the task uses to accesses objects.

Correct.  This is used as an override by any task that accesses the cache
indirectly through the cachefiles module.

The cachefilesd daemon has its own secid with which it accesses the cache
directly.  The sets of permissions that must be granted by the module's
override subjective secid and by the daemon's subjective secid aren't
identical.

 You want a (newobject) secid that an object gets on creation.

File and directory objects, yes.  The cache is stored on disk as a collection
of files and directories, each of which needs labelling.

 And you want them all to be distinct and settable.

Well, they don't technically have to be different.  The daemon and the module
can be given the same secid, for instance.  However, that secid then grants the
daemon permission to do anything the module can, and vice versa.

The third secid is a file label rather than a process label, and so may or may
not have to be different anyway, depending on the model.

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-15 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

   (3) Check that the kernel may create files as a particular secid (this
   could be specified indirectly by specifying an inode, which would
   hide the secid inside the LSM).
 
 I don't think this check is on the kernel per se but rather the ability
 of the daemon to nominate a secid for use on files created later by the
 kernel module.

Hmmm...  At the moment the cachefiles module works out for itself what the
file label should be by looking at the root directory it was given and
assuming the label on that is what it's going to be using.  Are you suggesting
this should be specified directly instead by the daemon?

David

Re: [PATCH 08/28] SECURITY: Allow kernel services to override LSM settings for task actions [try #2]

2008-01-15 Thread David Howells

Stephen Smalley [EMAIL PROTECTED] wrote:

 The cache files are created by the cachefiles kernel module, not by the
 userspace daemon, and the userspace daemon doesn't need to directly
 read/write them at all

That is correct.

 (but I think it does need to be able to unlink them?).

Indeed.

 The userspace daemon merely identifies the directory where the cache should
 live as part of configuring the cache when enabling it.

That is the way it currently works, yes.
 
 Hence, it is fine to use a fixed label for the cache files (systemhigh
 in a MLS world), and to let the directory's label serve as the basis for
 it.

That is what I currently do.  SELinux rules are provided to grant the
appropriate file accesses to the override label used by the kernel module, so
that it can't go and stamp on files with the wrong label.

 Only the cachefiles kernel module directly reads and writes the files.

Correct.

[PATCH 02/26] KEYS: Check starting keyring as part of search

2008-01-15 Thread David Howells

Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario:  User in process A does things that cause things to be
created in its process session keyring.  The user then does an su to
another user and starts a new process, B.  The two processes now
share the same process session keyring.

Process B does an NFS access which results in an upcall to gssd.
When gssd attempts to instantiate the context key (to be linked
into the process session keyring), it is denied access even though it
has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
  lookup_user_key() (the default: case)
 search_process_keyrings(current)
search_process_keyrings(rka-context)   (recursive call)
   keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the
top-level keyring it is given, but that top-level keyring is neither
fully validated nor checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for
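The two changes listed above can be illustrated with a small user-space model. This is a hypothetical sketch, not the kernel code: `struct toy_key`, `toy_search()`, the `FLAG_*` bit numbers and the string-based type/description match are stand-ins invented for illustration. Only one level of keyring is searched, and the permission checks done by `key_task_permission()` are omitted. The error codes follow the patch: -EAGAIN when nothing usable is found, -ENOKEY when the match (or a key passed over) is negatively instantiated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#ifndef ENOKEY
#define ENOKEY 126 /* Linux value; fallback for non-Linux libcs */
#endif

/* Hypothetical bit numbers standing in for KEY_FLAG_REVOKED/NEGATIVE. */
enum { FLAG_REVOKED = 0, FLAG_NEGATIVE = 1 };

struct toy_key {
	const char *type;          /* e.g. "keyring" or "user" */
	const char *description;
	unsigned long flags;
	time_t expiry;             /* 0 means "never expires" */
	struct toy_key **children; /* NULL-terminated array; NULL if none */
};

static bool toy_expired(const struct toy_key *k, time_t now)
{
	return k->expiry && now >= k->expiry;
}

static bool toy_match(const struct toy_key *k, const char *type,
		      const char *desc)
{
	return strcmp(k->type, type) == 0 && strcmp(k->description, desc) == 0;
}

/* Returns the matching key, or NULL with *err set to -EAGAIN or -ENOKEY. */
static struct toy_key *toy_search(struct toy_key *top, const char *type,
				  const char *desc, time_t now, int *err)
{
	unsigned long kflags;

	assert(top != NULL && err != NULL);
	kflags = top->flags;
	*err = -EAGAIN;

	/* Change 2: is the top-level keyring itself the thing we want? */
	if (toy_match(top, type, desc)) {
		if ((kflags & (1UL << FLAG_REVOKED)) || toy_expired(top, now))
			return NULL;
		if (kflags & (1UL << FLAG_NEGATIVE)) {
			*err = -ENOKEY;
			return NULL;
		}
		return top;
	}

	/* Change 1: otherwise the top keyring must be valid to be searched. */
	if ((kflags & ((1UL << FLAG_REVOKED) | (1UL << FLAG_NEGATIVE))) ||
	    toy_expired(top, now))
		return NULL;

	/* Search the keys directly inside the keyring (one level only). */
	for (struct toy_key **p = top->children; p && *p; p++) {
		struct toy_key *k = *p;
		unsigned long f = k->flags;

		if ((f & (1UL << FLAG_REVOKED)) || toy_expired(k, now) ||
		    !toy_match(k, type, desc))
			continue;
		if (f & (1UL << FLAG_NEGATIVE)) {
			*err = -ENOKEY; /* remember that we passed a negative key */
			continue;
		}
		return k;
	}
	return NULL;
}
```

Without the first check, searching for the top-level keyring by its own type and description would never succeed, because only its children are compared.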


Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)


diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
 	struct keyring_list *keylist;
 	struct timespec now;
-	unsigned long possessed;
+	unsigned long possessed, kflags;
 	struct key *keyring, *key;
 	key_ref_t key_ref;
 	long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	now = current_kernel_time();
 	err = -EAGAIN;
 	sp = 0;
+
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;
 
 	/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
 	/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;
 
 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;
 
 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;
 
 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 					context, KEY_SEARCH) < 0)
 			continue;
 
-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
 			err = -ENOKEY;
 			continue;
 		}

