Re: [PATCH 5/5 linux-next V2 (resend)] cifs: Combine set_file_size by handle and path into one function
On Sun, 8 Dec 2013 14:13:37 -0700 Tim Gardner t...@tpi.com wrote: As suggested by Jeff Layton: Now that I look though, it's clear to me that cifs_set_file_size is just wrong. Currently it calls ops-set_path_size and if that fails with certain error codes it tries to do a SMBLegacyOpen, etc. That's going to fall on its face if this is a SMB2/3 connection. Here's what should be done instead, I think: - get rid of both set_path_size and set_file_size operations. The version specific abstraction was implemented at the wrong level, IMO. - implement a new protocol level operation that basically takes the same args as cifs_set_file_size. cifs_setattr_unix/_nounix should call this operation. - the set_path_size operation for SMB1 should basically be the function that is cifs_set_file_size today (though that should probably be renamed). That function should be restructured to do the following: + look for an open filehandle + if there isn't one, open the file + call CIFSSMBSetFileSize + fall back to zero-length write if that fails + close the file if we opened it This patch also moves all of the SMBv1 legacy hacks from cifs_set_file_size() into the SMBv1 specific handler smb_set_file_size() more or less intact. SMBv2 and higher are more regular and orthogonal, so the v2 handler smb2_set_file_size() no longer contains the legacy hacks. Cc: Steve French sfre...@samba.org Cc: Jeff Layton jlay...@redhat.com Cc: Dean Gehnert de...@tpi.com Signed-off-by: Tim Gardner t...@tpi.com --- V2 - this is a new patch in the V2 series. I know this is kind of a giant patch, but there really doesn't seem to be any intermediate steps. Its pretty much all or nothing. I've only tested against a Linux CIFS server running a 3.2 kernel since I no longer have easy access to smbv2 servers (iOS 10.8 or Windows 7). resent with corrected Cc's rtg fs/cifs/cifsglob.h |9 ++--- fs/cifs/inode.c| 108 +-- fs/cifs/smb1ops.c | 109 +++- fs/cifs/smb2ops.c | 57 --- 4 files changed, 170 insertions(+), 113 deletions(-) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 8fd8eb2..6113ce8 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -269,12 +269,9 @@ struct smb_version_operations { int (*get_srv_inum)(const unsigned int, struct cifs_tcon *, struct cifs_sb_info *, const char *, u64 *uniqueid, FILE_ALL_INFO *); - /* set size by path */ - int (*set_file_size_by_path)(const unsigned int, struct cifs_tcon *, - const char *, __u64, struct cifs_sb_info *, bool); - /* set size by file handle */ - int (*set_file_size_by_handle)(const unsigned int, struct cifs_tcon *, - struct cifsFileInfo *, __u64, bool); + /* set file size */ + int (*set_file_size)(struct inode *inode, struct iattr *attrs, + unsigned int xid, char *full_path); /* set attributes */ int (*set_file_info)(struct inode *, const char *, FILE_BASIC_INFO *, const unsigned int); diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index e332038..3edeafd 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1899,118 +1899,28 @@ static void cifs_setsize(struct inode *inode, loff_t offset) truncate_pagecache(inode, offset); } -/* - * Legacy hack to set file length. - */ -static int -cifs_legacy_set_file_size(unsigned int xid, __u16 netfid, __u32 pid, - unsigned int length, struct cifs_tcon *tcon, - char *full_path) -{ - int rc; - unsigned int bytes_written; - struct cifs_io_parms io_parms; - - io_parms.netfid = netfid; - io_parms.pid = pid; - io_parms.tcon = tcon; - io_parms.offset = 0; - io_parms.length = length; - rc = CIFSSMBWrite(xid, io_parms, bytes_written, - NULL, NULL, 1); - cifs_dbg(FYI, %s len %d rc %d on %s\n, __func__, length, rc, - full_path); - return rc; -} - static int cifs_set_file_size(struct inode *inode, struct iattr *attrs, unsigned int xid, char *full_path) { - int rc; - struct cifsFileInfo *open_file; + int rc = -ENOSYS; struct cifsInodeInfo *cifsInode = CIFS_I(inode); struct cifs_sb_info *cifs_sb = CIFS_SB(inode-i_sb); struct tcon_link *tlink = NULL; struct cifs_tcon *tcon = NULL; struct TCP_Server_Info *server; - /* - * To avoid spurious oplock breaks from server, in the case of - * inodes that we already have open, avoid doing path based - * setting of file size if we can do it by handle. - * This keeps our caching token (oplock
Re: [Nfs-ganesha-devel] [RFC PATCH v2 3/5] locks: add new private lock type that is owned by the filp
On Wed, 20 Nov 2013 14:50:32 -0500 J. Bruce Fields bfie...@fieldses.org wrote: On Wed, Nov 20, 2013 at 11:45:04AM -0500, Jeff Layton wrote: Due to some unfortunate history, POSIX locks have very strange and unhelpful semantics. The thing that usually catches people by surprise is that they are dropped whenever the process closes any file descriptor associated with the inode. This is extremely problematic for people developing file servers that need to implement byte-range locks. Developers often need a lock management facility to ensure that file descriptors are not closed until all of the locks associated with the inode are finished. This patchset adds a new type of lock that attempts to address this issue. These locks work just like normal POSIX read/write locks, but have semantics that are more like BSD locks with respect to inheritance and behavior on close. This is implemented primarily by changing how fl_owner field is set for these locks. Instead of having them owned by the files_struct of the process, they are instead owned by the filp on which they were acquired. Thus, they are inherited across fork() and are only released when the last reference to a filp is put. These new semantics prevent them from being merged with classic POSIX locks, even if they are acquired by the same process. These locks will also conflict with classic POSIX locks even if they are acquired by the same process or on the same file descriptor. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 22 +- include/uapi/asm-generic/fcntl.h | 15 +++ 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/fs/locks.c b/fs/locks.c index 86cafc3..3b278a6 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -348,6 +348,26 @@ static int posix_assign_type(struct file_lock *fl, long type) { int err; + /* +* FL_FILP_PRIVATE locks are owned by the filp upon which they were +* acquired, regardless of what task is dealing with them. Set the +* fl_owner appropriately. +*/ + switch (type) { + case F_RDLCKP: + type = F_RDLCK; + fl-fl_owner = (fl_owner_t)fl-fl_file; + break; + case F_WRLCKP: + type = F_WRLCK; + fl-fl_owner = (fl_owner_t)fl-fl_file; + break; + case F_UNLCKP: + type = F_UNLCK; + fl-fl_owner = (fl_owner_t)fl-fl_file; + break; + } + After this fl_owner gets set to current-files in flock{64}_to_posix_lock and then reset here. That seems like a trap for the unwary reader. Could you do something like rename this flock_to_posix_lock_common and move all the 32/64-bit-independent initialization here? Looks like there's way more duplication than necessary between those two cases. Ok, finally got a chance to start looking at this again... I agree that at first glance, it looks like there is a lot of duplication between flock_to_posix_lock and flock64_to_posix_lock, but the problem we have is that we need to initialize the file_locks with info from two different types of structs that were passed in from userland (flock and flock64). I don't see a clean way to consolidate the two given that... (Also, why do we have an fl_owner_t instead of using a void?) --b. err = assign_type(fl, type); if (err) return err; @@ -2225,7 +2245,7 @@ void locks_remove_filp(struct file *filp) while ((fl = *before) != NULL) { if (fl-fl_file == filp) { - if (IS_FLOCK(fl)) { + if (IS_FLOCK(fl) || IS_POSIX(fl)) { locks_delete_lock(before); continue; } diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index 95e46c8..6b7b68a 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -151,6 +151,21 @@ struct f_owner_ex { #define F_UNLCK2 #endif +/* + * fd private POSIX locks. + * + * Usually POSIX locks held by a process are released on *any* close and are + * not inherited across a fork(). + * + * These lock types will conflict with normal POSIX locks, but are owned + * by the fd, not the process. This means that they are inherited across + * fork() like BSD (flock) locks, and they are only closed when the last + * reference to the the filp against which were acquired is closed. + */ +#define F_RDLCKP 5 +#define F_WRLCKP 6 +#define F_UNLCKP 7 + /* for old implementation of bsd flock () */ #ifndef F_EXLCK #define F_EXLCK4 /* or 3 */ -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to majord...@vger.kernel.org More majordomo
Re: [PATCH 4/4 linux-next V3] cifs: Combine set_file_size by handle and path into one function
On Mon, 9 Dec 2013 12:41:56 -0700 Tim Gardner t...@tpi.com wrote: As suggested by Jeff Layton: Now that I look though, it's clear to me that cifs_set_file_size is just wrong. Currently it calls ops-set_path_size and if that fails with certain error codes it tries to do a SMBLegacyOpen, etc. That's going to fall on its face if this is a SMB2/3 connection. Here's what should be done instead, I think: - get rid of both set_path_size and set_file_size operations. The version specific abstraction was implemented at the wrong level, IMO. - implement a new protocol level operation that basically takes the same args as cifs_set_file_size. cifs_setattr_unix/_nounix should call this operation. - the set_path_size operation for SMB1 should basically be the function that is cifs_set_file_size today (though that should probably be renamed). That function should be restructured to do the following: + look for an open filehandle + if there isn't one, open the file + call CIFSSMBSetFileSize + fall back to zero-length write if that fails + close the file if we opened it This patch also moves all of the SMBv1 legacy hacks from cifs_set_file_size() into the SMBv1 specific handler smb_set_file_size() more or less intact. SMBv2 and higher are more regular and orthogonal, so the v2 handler smb2_set_file_size() no longer contains the legacy hacks. Move cifsFileInfo_put(open_file) below the first call to cifs_legacy_set_file_size() so that the reference on open_file stays valid throughout. Cc: Steve French sfre...@samba.org Cc: Jeff Layton jlay...@redhat.com Cc: Dean Gehnert de...@tpi.com Signed-off-by: Tim Gardner t...@tpi.com --- V3 - dropped 'cifs: fix incorrect reference count check' which impacted this patch. Also moved cifsFileInfo_put(open_file) according to V2 review comments from Jeff Layton. V2 - this is a new patch in the V2 series. I know this is kind of a giant patch, but there really doesn't seem to be any intermediate steps. Its pretty much all or nothing. I've only tested against a Linux CIFS server running a 3.2 kernel since I no longer have easy access to smbv2 servers (iOS 10.8 or Windows 7). fs/cifs/cifsglob.h |9 ++--- fs/cifs/inode.c| 108 +-- fs/cifs/smb1ops.c | 109 +++- fs/cifs/smb2ops.c | 57 --- 4 files changed, 170 insertions(+), 113 deletions(-) diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 8fd8eb2..6113ce8 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -269,12 +269,9 @@ struct smb_version_operations { int (*get_srv_inum)(const unsigned int, struct cifs_tcon *, struct cifs_sb_info *, const char *, u64 *uniqueid, FILE_ALL_INFO *); - /* set size by path */ - int (*set_file_size_by_path)(const unsigned int, struct cifs_tcon *, - const char *, __u64, struct cifs_sb_info *, bool); - /* set size by file handle */ - int (*set_file_size_by_handle)(const unsigned int, struct cifs_tcon *, - struct cifsFileInfo *, __u64, bool); + /* set file size */ + int (*set_file_size)(struct inode *inode, struct iattr *attrs, + unsigned int xid, char *full_path); /* set attributes */ int (*set_file_info)(struct inode *, const char *, FILE_BASIC_INFO *, const unsigned int); diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 3f710c6..3edeafd 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1899,118 +1899,28 @@ static void cifs_setsize(struct inode *inode, loff_t offset) truncate_pagecache(inode, offset); } -/* - * Legacy hack to set file length. - */ -static int -cifs_legacy_set_file_size(unsigned int xid, __u16 netfid, __u32 pid, - unsigned int length, struct cifs_tcon *tcon, - char *full_path) -{ - int rc; - unsigned int bytes_written; - struct cifs_io_parms io_parms; - - io_parms.netfid = netfid; - io_parms.pid = pid; - io_parms.tcon = tcon; - io_parms.offset = 0; - io_parms.length = length; - rc = CIFSSMBWrite(xid, io_parms, bytes_written, - NULL, NULL, 1); - cifs_dbg(FYI, %s len %d rc %d on %s\n, __func__, length, rc, - full_path); - return rc; -} - static int cifs_set_file_size(struct inode *inode, struct iattr *attrs, unsigned int xid, char *full_path) { - int rc; - struct cifsFileInfo *open_file; + int rc = -ENOSYS; struct cifsInodeInfo *cifsInode = CIFS_I(inode); struct cifs_sb_info *cifs_sb = CIFS_SB(inode-i_sb); struct tcon_link *tlink = NULL; struct cifs_tcon *tcon = NULL
Re: [PATCH 1/9] vfs: check submounts and drop atomically
to has_unlinked_ancestor(). + * Returns 0 if successfully unhashed @parent. If there were submounts then + * return -EBUSY. + * + * @dentry: dentry to prune and drop + */ +int check_submounts_and_drop(struct dentry *dentry) +{ + int ret = 0; + + /* Negative dentries can be dropped without further checks */ + if (!dentry-d_inode) { + d_drop(dentry); + goto out; + } + + spin_lock(dentry-d_lock); + if (d_unhashed(dentry)) + goto out_unlock; + if (list_empty(dentry-d_subdirs)) { + if (d_mountpoint(dentry)) { + ret = -EBUSY; + goto out_unlock; + } + __d_drop(dentry); + goto out_unlock; + } + spin_unlock(dentry-d_lock); + + for (;;) { + LIST_HEAD(dispose); + ret = __check_submounts_and_drop(dentry, dispose); + if (!list_empty(dispose)) + shrink_dentry_list(dispose); + + if (ret = 0) + break; + + cond_resched(); + } + +out: + return ret; + +out_unlock: + spin_unlock(dentry-d_lock); + goto out; +} +EXPORT_SYMBOL(check_submounts_and_drop); + /** * __d_alloc - allocate a dcache entry * @sb: filesystem it will belong to diff --git a/include/linux/dcache.h b/include/linux/dcache.h index b90337c..41b21ca 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -251,6 +251,7 @@ extern void d_prune_aliases(struct inode *); /* test whether we have any submounts in a subdir tree */ extern int have_submounts(struct dentry *); +extern int check_submounts_and_drop(struct dentry *); /* * This adds the entry to the hash queues. Nice work... Looks reasonable overall. The only (extremely minor) nit I have is that it might be nice to replace some of the direct struct list_head accesses with helper macros. Acked-by: Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/9] vfs: check unlinked ancestors before mount
On Thu, 8 Aug 2013 17:24:43 +0200 Miklos Szeredi mik...@szeredi.hu wrote: From: Miklos Szeredi mszer...@suse.cz We check submounts before doing d_drop() on a non-empty directory dentry in NFS (have_submounts()), but we do not exclude a racing mount. Nor do we prevent mounts to be added to the disconnected subtree using relative paths after the d_drop(). This patch fixes these issues by checking for unlinked (unhashed, non-root) ancestors before proceeding with the mount. This is done after setting DCACHE_MOUNTED on the soon-to-be mountpoint and with the rename seqlock taken for write. This ensures that the only one of check_submounts_and_drop() or has_unlinked_ancestor() can succeed. Signed-off-by: Miklos Szeredi mszer...@suse.cz CC: David Howells dhowe...@redhat.com CC: Steven Whitehouse swhit...@redhat.com CC: Trond Myklebust trond.mykleb...@netapp.com CC: Greg Kroah-Hartman gre...@linuxfoundation.org --- fs/dcache.c| 35 +++ fs/internal.h | 1 + fs/namespace.c | 9 + 3 files changed, 45 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 020004d..1333445 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1103,6 +1103,41 @@ rename_retry: } EXPORT_SYMBOL(have_submounts); +static bool __has_unlinked_ancestor(struct dentry *dentry) +{ + struct dentry *this; + + for (this = dentry; !IS_ROOT(this); this = this-d_parent) { + int is_unhashed; + + /* Need exclusion wrt. check_submounts_and_drop() */ + spin_lock(this-d_lock); + is_unhashed = d_unhashed(this); + spin_unlock(this-d_lock); + + if (is_unhashed) + return true; + } + return false; +} + +/* + * Called by mount code to check if the mountpoint is reachable (e.g. NFS can + * unhash a directory dentry and then the complete subtree can become + * unreachable). + */ +bool has_unlinked_ancestor(struct dentry *dentry) +{ + bool found; + + /* Need exclusion wrt. check_submounts_and_drop() */ + write_seqlock(rename_lock); + found = __has_unlinked_ancestor(dentry); + write_sequnlock(rename_lock); + + return found; +} + /* * Search the dentry child list of the specified parent, * and move any unused dentries to the end of the unused diff --git a/fs/internal.h b/fs/internal.h index 7c5f01c..d232355 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, bool); * dcache.c */ extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); +extern bool has_unlinked_ancestor(struct dentry *dentry); /* * read_write.c diff --git a/fs/namespace.c b/fs/namespace.c index 7b1ca9b..bb92a9c 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry) } dentry-d_flags |= DCACHE_MOUNTED; spin_unlock(dentry-d_lock); + + if (has_unlinked_ancestor(dentry)) { + spin_lock(dentry-d_lock); + dentry-d_flags = ~DCACHE_MOUNTED; + spin_unlock(dentry-d_lock); + kfree(mp); + return ERR_PTR(-ENOENT); + } + mp-m_dentry = dentry; mp-m_count = 1; list_add(mp-m_hash, chain); Looks reasonable. For posterity, it might be worth mentioning the potential regression that you described earlier (racing with rename from another host). That way if someone hits it in the future they might be able to zero in on this change as the culprit more easily. Acked-by: Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/9] nfs: use check_submounts_and_drop()
On Thu, 8 Aug 2013 17:24:46 +0200 Miklos Szeredi mik...@szeredi.hu wrote: From: Miklos Szeredi mszer...@suse.cz Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi mszer...@suse.cz CC: Trond Myklebust trond.mykleb...@netapp.com --- fs/nfs/dir.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index e474ca2b..7468735 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1135,14 +1135,13 @@ out_zap_parent: if (inode S_ISDIR(inode-i_mode)) { /* Purge readdir caches. */ nfs_zap_caches(inode); - /* If we have submounts, don't unhash ! */ - if (have_submounts(dentry)) - goto out_valid; if (dentry-d_flags DCACHE_DISCONNECTED) goto out_valid; - shrink_dcache_parent(dentry); } - d_drop(dentry); + /* If we have submounts, don't unhash ! */ + if (check_submounts_and_drop(dentry) != 0) + goto out_valid; + dput(parent); dfprintk(LOOKUPCACHE, NFS: %s(%s/%s) is invalid\n, __func__, dentry-d_parent-d_name.name, Acked-by: Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 17:23:24 -0700 Andy Lutomirski l...@amacapital.net wrote: Hi various people who care about user-space NFS servers and/or security-relevant APIs. I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the subjective and objective (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. Rules: - credfd_activate fails (-EINVAL) if fd is not a credfd. - credfd_activate fails (-EPERM) if the fd's userns doesn't match current's userns. credfd_activate is not intended to be a substitute for setns. - credfd_activate will fail (-EPERM) if LSM does not allow the switch. This probably needs to be a new selinux action -- dyntransition is too restrictive. Optional: - credfd_create always sets cloexec, because the alternative is silly. - credfd_activate fails (-EINVAL) if dumpable. This is because we don't want a privileged daemon to be ptraced while impersonating someone else. - optional: both credfd_create and credfd_activate fail if !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). The first question: does this solve Ganesha's problem? The second question: is this safe? I can see two major concerns. The bigger concern is that having these syscalls available will allow users to exploit things that were previously secure. For example, maybe some configuration assumes that a task running as uid==1 can't switch to uid==2, even with uid 2's consent. Similar issues happen with capabilities. If CAP_SYS_ADMIN is not required, then this is no longer really true. Alternatively, something running as uid == 0 with heavy capability restrictions in a mount namespace (but not a uid namespace) could pass a credfd out of the namespace. This could break things like Docker pretty badly. CAP_SYS_ADMIN guards against this to some extent. But I think that Docker is already totally screwed if a Docker root task can receive an O_DIRECTORY or O_PATH fd out of the container, so it's not entirely clear that the situation is any worse, even without requiring CAP_SYS_ADMIN. The second concern is that it may be difficult to use this correctly. There's a reason that real_cred and cred exist, but it's not really well set up for being used. As a simple way to stay safe, Ganesha could only use credfds that have real_uid == 0. --Andy I still don't quite grok why having this special credfd_create call buys you anything over simply doing what Al had originally suggested -- switch creds using all of the different syscalls and then simply caching that in a normal fd: fd = open(/dev/null, O_PATH...); ...it seems to me that the credfd_activate call will still need to do the same permission checking that all of the individual set*id() calls require (and all of the other stuff like changing selinux contexts, etc). IOW, this fd is just a handle for passing around a struct cred, but I don't see why having access to that handle would allow you to do something you couldn't already do anyway. Am I missing something obvious here? -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 20:05:16 -0700 Andy Lutomirski l...@amacapital.net wrote: On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com wrote: On Wed, 26 Mar 2014 17:23:24 -0700 Andy Lutomirski l...@amacapital.net wrote: Hi various people who care about user-space NFS servers and/or security-relevant APIs. I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the subjective and objective (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. Rules: - credfd_activate fails (-EINVAL) if fd is not a credfd. - credfd_activate fails (-EPERM) if the fd's userns doesn't match current's userns. credfd_activate is not intended to be a substitute for setns. - credfd_activate will fail (-EPERM) if LSM does not allow the switch. This probably needs to be a new selinux action -- dyntransition is too restrictive. Optional: - credfd_create always sets cloexec, because the alternative is silly. - credfd_activate fails (-EINVAL) if dumpable. This is because we don't want a privileged daemon to be ptraced while impersonating someone else. - optional: both credfd_create and credfd_activate fail if !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). The first question: does this solve Ganesha's problem? The second question: is this safe? I can see two major concerns. The bigger concern is that having these syscalls available will allow users to exploit things that were previously secure. For example, maybe some configuration assumes that a task running as uid==1 can't switch to uid==2, even with uid 2's consent. Similar issues happen with capabilities. If CAP_SYS_ADMIN is not required, then this is no longer really true. Alternatively, something running as uid == 0 with heavy capability restrictions in a mount namespace (but not a uid namespace) could pass a credfd out of the namespace. This could break things like Docker pretty badly. CAP_SYS_ADMIN guards against this to some extent. But I think that Docker is already totally screwed if a Docker root task can receive an O_DIRECTORY or O_PATH fd out of the container, so it's not entirely clear that the situation is any worse, even without requiring CAP_SYS_ADMIN. The second concern is that it may be difficult to use this correctly. There's a reason that real_cred and cred exist, but it's not really well set up for being used. As a simple way to stay safe, Ganesha could only use credfds that have real_uid == 0. --Andy I still don't quite grok why having this special credfd_create call buys you anything over simply doing what Al had originally suggested -- switch creds using all of the different syscalls and then simply caching that in a normal fd: fd = open(/dev/null, O_PATH...); ...it seems to me that the credfd_activate call will still need to do the same permission checking that all of the individual set*id() calls require (and all of the other stuff like changing selinux contexts, etc). IOW, this fd is just a handle for passing around a struct cred, but I don't see why having access to that handle would allow you to do something you couldn't already do anyway. Am I missing something obvious here? Not really. I think I didn't adequately explain a piece of this. I think that what you're suggesting is for an fd to encode a set of credentials but not to grant permission to use those credentials. So switch_creds(fd) is more or less the same thing as switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac label, ...). switch_creds needs to verify that the caller can dyntransition to the label, set all the ids, etc., but it avoids allocating anything and running RCU callbacks. The trouble with this is that the verification needed is complicated and expensive. And I think that my proposal is potentially more useful. Is it really though? My understanding of the problem was that it was the syscall (context switching) overhead + having to do a bunch of RCU critical stuff that was the problem. If we can do all of this in the context of a single RCU critical section, isn't that still a win? As to the complicated part...maybe but it doesn't seem like it would have to be. We could simply return -EINVAL or something if the old struct cred doesn't have fields that match the ones we're replacing and that we don't expect to see changed. A credfd is like a struct cred, but possession of a credfd carries the permission to use those credentials. So, for example, credfd_activate to switch to a given uid might work even if setresuid to that uid would be disallowed
Re: Thoughts on credential switching
On Thu, 27 Mar 2014 13:46:06 +0100 Florian Weimer fwei...@redhat.com wrote: On 03/27/2014 01:23 AM, Andy Lutomirski wrote: I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the subjective and objective (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. This interface does not address the long-term lack of POSIX compliance in setuid and friends, which are required to be process-global and not thread-specific (as they are on the kernel side). glibc works around this by reserving a signal and running set*id on every thread in a special signal handler. This is just crass, and it is likely impossible to restore the original process state in case of partial failure. We really need kernel support to perform the process-wide switch in an all-or-nothing manner. I disagree. We're treading new ground here with this syscall. It's not defined by POSIX so we're under no obligation to follow its silly directives in this regard. Per-process cred switching doesn't really make much sense in this day and age, IMO. Wasn't part of the spec was written before threading existed The per-process credential switching is pretty universally a pain in the ass for anyone who wants to write something like a threaded file server. Jeremy Allison had to jump through some rather major hoops to work around it for Samba [1]. I think we want to strive to make this a per-task credential switch and ignore that part of the POSIX spec. That said, I think we will need to understand and document what we expect to occur if someone does this new switch_creds(fd) call and then subsequently calls something like setuid(), if only to ensure that we don't get blindsided by it. [1]: http://sourceware.org/ml/libc-help/2012-07/msg4.html -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Thu, 27 Mar 2014 14:06:32 +0100 Florian Weimer fwei...@redhat.com wrote: On 03/27/2014 02:02 PM, Jeff Layton wrote: This interface does not address the long-term lack of POSIX compliance in setuid and friends, which are required to be process-global and not thread-specific (as they are on the kernel side). glibc works around this by reserving a signal and running set*id on every thread in a special signal handler. This is just crass, and it is likely impossible to restore the original process state in case of partial failure. We really need kernel support to perform the process-wide switch in an all-or-nothing manner. I disagree. We're treading new ground here with this syscall. It's not defined by POSIX so we're under no obligation to follow its silly directives in this regard. Per-process cred switching doesn't really make much sense in this day and age, IMO. Wasn't part of the spec was written before threading existed Okay, then we need to add a separate set of system calls. I really, really want to get rid of that signal handler mess in glibc, with its partial failures. I agree, it's a hack, but hardly anyone these days really wants to switch creds on a per-process basis. It's just that we're saddled with a spec for those calls that was written before threads really existed. The kernel syscalls already do the right thing as far as I'm concerned. What would be nice however is a blessed glibc interface to them that didn't involve all of the signal handling stuff. Then samba et. al. wouldn't need to call syscall() directly to get at them. The per-process credential switching is pretty universally a pain in the ass for anyone who wants to write something like a threaded file server. Jeremy Allison had to jump through some rather major hoops to work around it for Samba [1]. I think we want to strive to make this a per-task credential switch and ignore that part of the POSIX spec. Yeah, I get that, setfsuid/setfsgid already behaves this way. (Current directory and umask are equally problematic, but it's possible to avoid most issues.) That said, I think we will need to understand and document what we expect to occur if someone does this new switch_creds(fd) call and then subsequently calls something like setuid(), if only to ensure that we don't get blindsided by it. Currently, from the kernel perspective, this is not really a problem because the credentials are always per-task. It's just that a conforming user space needs the process-wide credentials. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Wed, 26 Mar 2014 20:25:35 -0700 Jeff Layton jlay...@redhat.com wrote: On Wed, 26 Mar 2014 20:05:16 -0700 Andy Lutomirski l...@amacapital.net wrote: On Wed, Mar 26, 2014 at 7:48 PM, Jeff Layton jlay...@redhat.com wrote: On Wed, 26 Mar 2014 17:23:24 -0700 Andy Lutomirski l...@amacapital.net wrote: Hi various people who care about user-space NFS servers and/or security-relevant APIs. I propose the following set of new syscalls: int credfd_create(unsigned int flags): returns a new credfd that corresponds to current's creds. int credfd_activate(int fd, unsigned int flags): Change current's creds to match the creds stored in fd. To be clear, this changes both the subjective and objective (aka real_cred and cred) because there aren't any real semantics for what happens when userspace code runs with real_cred != cred. Rules: - credfd_activate fails (-EINVAL) if fd is not a credfd. - credfd_activate fails (-EPERM) if the fd's userns doesn't match current's userns. credfd_activate is not intended to be a substitute for setns. - credfd_activate will fail (-EPERM) if LSM does not allow the switch. This probably needs to be a new selinux action -- dyntransition is too restrictive. Optional: - credfd_create always sets cloexec, because the alternative is silly. - credfd_activate fails (-EINVAL) if dumpable. This is because we don't want a privileged daemon to be ptraced while impersonating someone else. - optional: both credfd_create and credfd_activate fail if !ns_capable(CAP_SYS_ADMIN) or perhaps !capable(CAP_SETUID). The first question: does this solve Ganesha's problem? The second question: is this safe? I can see two major concerns. The bigger concern is that having these syscalls available will allow users to exploit things that were previously secure. For example, maybe some configuration assumes that a task running as uid==1 can't switch to uid==2, even with uid 2's consent. Similar issues happen with capabilities. If CAP_SYS_ADMIN is not required, then this is no longer really true. Alternatively, something running as uid == 0 with heavy capability restrictions in a mount namespace (but not a uid namespace) could pass a credfd out of the namespace. This could break things like Docker pretty badly. CAP_SYS_ADMIN guards against this to some extent. But I think that Docker is already totally screwed if a Docker root task can receive an O_DIRECTORY or O_PATH fd out of the container, so it's not entirely clear that the situation is any worse, even without requiring CAP_SYS_ADMIN. The second concern is that it may be difficult to use this correctly. There's a reason that real_cred and cred exist, but it's not really well set up for being used. As a simple way to stay safe, Ganesha could only use credfds that have real_uid == 0. --Andy I still don't quite grok why having this special credfd_create call buys you anything over simply doing what Al had originally suggested -- switch creds using all of the different syscalls and then simply caching that in a normal fd: fd = open(/dev/null, O_PATH...); ...it seems to me that the credfd_activate call will still need to do the same permission checking that all of the individual set*id() calls require (and all of the other stuff like changing selinux contexts, etc). IOW, this fd is just a handle for passing around a struct cred, but I don't see why having access to that handle would allow you to do something you couldn't already do anyway. Am I missing something obvious here? Not really. I think I didn't adequately explain a piece of this. I think that what you're suggesting is for an fd to encode a set of credentials but not to grant permission to use those credentials. So switch_creds(fd) is more or less the same thing as switch_creds(ruid, euid, suid, rgid, egid, sgid, groups, mac label, ...). switch_creds needs to verify that the caller can dyntransition to the label, set all the ids, etc., but it avoids allocating anything and running RCU callbacks. The trouble with this is that the verification needed is complicated and expensive. And I think that my proposal is potentially more useful. Is it really though? My understanding of the problem was that it was the syscall (context switching) overhead + having to do a bunch of RCU critical stuff that was the problem. If we can do all of this in the context of a single RCU critical section, isn't that still a win? As to the complicated part...maybe but it doesn't seem like it would have to be. We could simply return -EINVAL or something if the old struct cred doesn't have fields that match the ones we're replacing and that we don't expect to see changed. A credfd is like a struct cred
Re: [PATCH] locks: rename file-private locks to file-description locks
On Sun, 27 Apr 2014 14:51:25 +1000 NeilBrown ne...@suse.de wrote: On Tue, 22 Apr 2014 06:54:36 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 04/21/2014 11:15 PM, Stefan (metze) Metzmacher wrote: Am 21.04.2014 21:55, schrieb Jeff Layton: On Mon, 21 Apr 2014 21:39:12 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 04/21/2014 08:46 PM, Rich Felker wrote: On Mon, Apr 21, 2014 at 08:32:44PM +0200, Michael Kerrisk (man-pages) wrote: On 04/21/2014 06:10 PM, Rich Felker wrote: I'm well aware of that. The problem is that the proposed API is using the two-letter abbreviation FD, which ALWAYS means file descriptor and NEVER means file description (in existing usage) to mean file description. That's what's wrong. So, can you *please* answer this question: what do you call (i.e., what everyday technical language term do use for) the thing that sits between a file descriptor and an i-node? (Please don't say 'struct file' -- that is not is an implementation detail, and does not qualify as the kind of term that I could use when documenting this feature in man pages.) Open file description. Oh! I didn't realize we agreed :-). POSIX uses (or invented, I am not sure which) the term file description for a good reason: it is unambiguous, and therefore precise. I do agree that there's a risk of confusion between 'open file descriptor and 'and file description'--it's the same kind of risk as between English terms such as 'arbitrator' and 'arbitration' (and any number of other examples), and as language speakers we deal with this every day. There's not a problem when the full word is used. On the other hand, if you use arb as an abbreviation for arbitration in a context where it was already universally understood as meaning arbitrator, that would be a big problem. Likewise the problem here isn't that open file description is a bad term. It's that using FD to mean [open] file description is utterly confusing, even moreso than just making up a new completely random word. Ohh -- I had thought you a problem not just with FD but also (open) file description. 2) The new API constants (F_SETLKP, F_SETLKPW, F_GETLKP) have names that are visually very close to the traditional POSIX lock names (F_SETLK, F_SETLKW, F_GETLK). That's an accident waiting to happen when someone mistypes in code and/or misses such a misttyping when reading code. That really must be fixed. I agree, but I don't think making it worse is a solution. I don't agree that it's making it worse. The real problem here is that people use no good unambiguous term for the thing between a file descriptor and an inode. POSIX provides us with a solution that may not seem perfect, but it is unambiguous, and I think it might feel more comfortable if we used it often enough. I would like to see it used more too, and in particular, I think it belongs in the documentation for these new locking interfaces. But that still doesn't answer the question of what to call them (the macros) unless you want: F_OPEN_FILE_DESCRIPTION_GETLK F_OPEN_FILE_DESCRIPTION_SETLK F_OPEN_FILE_DESCRIPTION_SETLKW Or just 'F_OFD_*'? Perhaps OP (for open-private, i.e. private to the particular open) would be a sensible choice; OTOH people are likely to misread it as OPeration. The general principle I have in mind though is that it might be nice to highlight the word open in open file description (Fair enough.) since it (1) contrasts with file descriptor, despite file descriptors also dealing with open files, and (2) contrasts well with legacy fcntl locks, which are (this is the whole bug) associated with the underlying file and not the open file description. Makes sense to me. (We are in more agreement that I realized.) Cheers, Michael So the motion is to call them open file description locks and change the macros to read *_OFD_*. Does anyone object? Works fine for me... And works for me. I think the word open is important here. I find that description is not a word I would have every thought was relevant here - it is obviously too long since I have read the man pages. I would prefer per-open locks which are contrasted with per-process locks An alternative might be flock-like as locks created with flock have exactly the property we are trying to describe. Reading the man page for flock then suggests open file table entry locks which is even more of a mouthful. oftel is pronounceable though. Then we could talk about oftel locks in the same sentence as pin numbers and RAM memory. But maybe I came too late to this party, and the boat has sailed? Note to Michael: The text flock() does not lock files over
Re: [PATCH -V1 05/22] vfs: Add new file and directory create permission flags
On Sun, 27 Apr 2014 21:44:36 +0530 Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: From: Andreas Gruenbacher agr...@kernel.org Some permission models distinguish between the permission to create a non-directory and a directory. Pass this information down to inode_permission() as mask flags Signed-off-by: Andreas Gruenbacher agr...@kernel.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fs/namei.c | 21 - include/linux/fs.h | 2 ++ 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 7d93d195f0e5..028bc8bcf77c 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -445,7 +445,8 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask) * this, letting us set arbitrary permissions for filesystem access without * changing the normal UIDs which are used for other things. * - * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask. + * When checking for MAY_APPEND, MAY_CREATE_FILE, MAY_CREATE_DIR, + * MAY_WRITE must also be set in @mask. */ int inode_permission(struct inode *inode, int mask) { @@ -2444,14 +2445,16 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) * 3. We should have write and exec permissions on dir * 4. We can't do it if dir is immutable (done in permission()) */ -static inline int may_create(struct inode *dir, struct dentry *child) +static inline int may_create(struct inode *dir, struct dentry *child, bool isdir) { + int mask = isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE; + audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE); if (child-d_inode) return -EEXIST; if (IS_DEADDIR(dir)) return -ENOENT; - return inode_permission(dir, MAY_WRITE | MAY_EXEC); + return inode_permission(dir, MAY_WRITE | MAY_EXEC | mask); } /* @@ -2501,7 +2504,7 @@ EXPORT_SYMBOL(unlock_rename); int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool want_excl) { - int error = may_create(dir, dentry); + int error = may_create(dir, dentry, 0); nit: since the third argument here is a bool, this should be false and not 0. Ditto for all of the other calls of may_create. if (error) return error; @@ -3363,7 +3366,7 @@ EXPORT_SYMBOL(user_path_create); int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev) { - int error = may_create(dir, dentry); + int error = may_create(dir, dentry, 0); if (error) return error; @@ -3455,7 +3458,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) { - int error = may_create(dir, dentry); + int error = may_create(dir, dentry, 1); unsigned max_links = dir-i_sb-s_max_links; if (error) @@ -3785,7 +3788,7 @@ SYSCALL_DEFINE1(unlink, const char __user *, pathname) int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname) { - int error = may_create(dir, dentry); + int error = may_create(dir, dentry, 0); if (error) return error; @@ -3871,7 +3874,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de if (S_ISDIR(inode-i_mode)) return -EPERM; - error = may_create(dir, new_dentry); + error = may_create(dir, new_dentry, 0); if (error) return error; @@ -4062,7 +4065,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, return error; if (!target) { - error = may_create(new_dir, new_dentry); + error = may_create(new_dir, new_dentry, is_dir); } else { new_is_dir = d_is_dir(new_dentry); diff --git a/include/linux/fs.h b/include/linux/fs.h index 1f524b4b652d..da5521de04ab 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -77,6 +77,8 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset, #define MAY_CHDIR0x0040 /* called from RCU mode, don't block */ #define MAY_NOT_BLOCK0x0080 +#define MAY_CREATE_FILE 0x0100 +#define MAY_CREATE_DIR 0x0200 /* * flags in file.f_mode. Note that FMODE_READ and FMODE_WRITE must correspond -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] locks: ensure that fl_owner is always initialized properly in flock and lease codepaths
Currently, the fl_owner isn't set for flock locks. Some filesystems use byte-range locks to simulate flock locks and there is a common idiom in those that does: fl-fl_owner = (fl_owner_t)filp; fl-fl_start = 0; fl-fl_end = OFFSET_MAX; Since flock locks are generally owned by the open file description, move this into the common flock lock setup code. The fl_start and fl_end fields are already set appropriately, so remove the unneeded setting of that in flock ops in those filesystems as well. Finally, the lease code also sets the fl_owner as if they were owned by the process and not the open file description. This is incorrect as leases have the same ownership semantics as flock locks. Set them the same way. The lease code doesn't actually use the fl_owner value for anything, so this is more for consistency's sake than a bugfix. Reported-by: Trond Myklebust trond.mykleb...@primarydata.com Signed-off-by: Jeff Layton jlay...@poochiereds.net --- drivers/staging/lustre/lustre/llite/file.c | 17 ++--- fs/9p/vfs_file.c | 3 --- fs/afs/flock.c | 4 fs/ceph/locks.c| 10 ++ fs/fuse/file.c | 1 - fs/locks.c | 4 +++- fs/nfs/file.c | 4 7 files changed, 11 insertions(+), 32 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c index 8e844a6371e0..760ccd83f1f7 100644 --- a/drivers/staging/lustre/lustre/llite/file.c +++ b/drivers/staging/lustre/lustre/llite/file.c @@ -2691,20 +2691,15 @@ int ll_file_flock(struct file *file, int cmd, struct file_lock *file_lock) ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FLOCK, 1); - if (file_lock-fl_flags FL_FLOCK) { + if (file_lock-fl_flags FL_FLOCK) LASSERT((cmd == F_SETLKW) || (cmd == F_SETLK)); - /* flocks are whole-file locks */ - flock.l_flock.end = OFFSET_MAX; - /* For flocks owner is determined by the local file descriptor*/ - flock.l_flock.owner = (unsigned long)file_lock-fl_file; - } else if (file_lock-fl_flags FL_POSIX) { - flock.l_flock.owner = (unsigned long)file_lock-fl_owner; - flock.l_flock.start = file_lock-fl_start; - flock.l_flock.end = file_lock-fl_end; - } else { + else if (!(file_lock-fl_flags FL_POSIX)) return -EINVAL; - } + + flock.l_flock.owner = (unsigned long)file_lock-fl_owner; flock.l_flock.pid = file_lock-fl_pid; + flock.l_flock.start = file_lock-fl_start; + flock.l_flock.end = file_lock-fl_end; /* Somewhat ugly workaround for svc lockd. * lockd installs custom fl_lmops-lm_compare_owner that checks diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c index d8223209d4b1..59e3fe3d56c0 100644 --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -352,9 +352,6 @@ static int v9fs_file_flock_dotl(struct file *filp, int cmd, invalidate_mapping_pages(inode-i_data, 0, -1); } /* Convert flock to posix lock */ - fl-fl_owner = (fl_owner_t)filp; - fl-fl_start = 0; - fl-fl_end = OFFSET_MAX; fl-fl_flags |= FL_POSIX; fl-fl_flags ^= FL_FLOCK; diff --git a/fs/afs/flock.c b/fs/afs/flock.c index a8cf2cff836c..4baf1d2b39e4 100644 --- a/fs/afs/flock.c +++ b/fs/afs/flock.c @@ -555,10 +555,6 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl) return -ENOLCK; /* we're simulating flock() locks using posix locks on the server */ - fl-fl_owner = (fl_owner_t) file; - fl-fl_start = 0; - fl-fl_end = OFFSET_MAX; - if (fl-fl_type == F_UNLCK) return afs_do_unlk(file, fl); return afs_do_setlk(file, fl); diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c index d94ba0df9f4d..db8c1ca15d72 100644 --- a/fs/ceph/locks.c +++ b/fs/ceph/locks.c @@ -52,10 +52,7 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct file *file, else length = fl-fl_end - fl-fl_start + 1; - if (lock_type == CEPH_LOCK_FCNTL) - owner = secure_addr(fl-fl_owner); - else - owner = secure_addr(fl-fl_file); + owner = secure_addr(fl-fl_owner); dout(ceph_lock_message: rule: %d, op: %d, owner: %llx, pid: %llu, start: %llu, length: %llu, wait: %d, type: %d, (int)lock_type, @@ -313,10 +310,7 @@ int lock_to_ceph_filelock(struct file_lock *lock, cephlock-length = cpu_to_le64(lock-fl_end - lock-fl_start + 1); cephlock-client = cpu_to_le64(0); cephlock-pid = cpu_to_le64((u64)lock-fl_pid); - if (lock-fl_flags FL_POSIX) - cephlock-owner = cpu_to_le64(secure_addr(lock-fl_owner)); - else - cephlock-owner = cpu_to_le64
Re: OFD (file private) locks and NFS
On Tue, 29 Apr 2014 10:47:07 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: [CC+= linux-nfs@] On 04/29/2014 10:38 AM, Michael Kerrisk (man-pages) wrote: Hi Jeff, I've been looking a bit at the fcntl() documentation of traditional (F_SETLK) record locking, and a question just jumped out at me. Is it worth considering some future-proofing in the design of OFD locks (open file description locks, formerly known as file-private locks)? What I am thinking of here is that on some systems, the traditional 'struct flock' has a nonstandard field, l_sysid, that is used on F_GETLK to identify the remote system on which a lock is held. Should the design of OFD locks allow for such a field (now, or in the future), which might be useful in the context of locking on network file systems such as NFS. Put more simply, should the new OFD locking system be using a new structure for describing locks, rather than the traditional 'struct flock'? Defining a new structure, might be useful to allow for future extensions to the API. Just add one further detail here. What I'm thinking is, maybe instead there should be something like: struct flockx { int flags; /* Other fields like 'struct flock' */ char reserved[32];/* Or some suitable value */ } That flags field might always be zero for now, but in the future it could be used on the setlk and getlk operations to indicate the presence of additional fields in the structure. Cheers, Michael I considered that early on when I did this, but I don't think it's really helpful. I'm just not a fan of padding out structs when it's not clear what would eventually go in there. The problem is that once actually go to try to convert those from reserved to something usable, it becomes a nightmare for userland to figure that out. How do I know whether my kernel supports the stuff I put in those fields or will just ignore them? If we really find later that we need to do something like this, I think we'd be better off adding a new set of cmd values along with the extended struct, or possibly a new syscall. Some of the samba folks were interested in an async locking mechanism too, so something like that could be added in conjunction with such an interface. -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]
is. There is a relationship between the two that's illustrated in the comments above write_gracetime: /** * write_gracetime - Set or report current NFSv4 grace period time * * As above, but sets the time of the NFSv4 grace period. * * Note this should never be set to less than the *previous* * lease-period time, but we don't try to enforce this. (In the common * case (a new boot), we don't know what the previous lease time was * anyway.) */ The value you're interested in here is the nfsv4leasetime. If the client doesn't renew its lease within that period, then it's subject to the server giving up on it and dropping any state that it holds on that clients' behalf. Note that this is not a firm timeout. The server runs a job periodically to clean out expired stateful objects, and it's likely that there is some time (maybe even up to another whole lease period) between when the timeout expires and the job actually runs. If the client gets a RENEW in there within that window, its lease will be renewed and its state preserved. Also note that all of the above just applies to the Linux knfsd. There are many other servers in the field and they have different rules for dropping state held by clients that have gone AWOL. -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: OFD (file private) locks and NFS
On Tue, 29 Apr 2014 07:40:08 -0400 (EDT) Matt W. Benjamin m...@linuxbox.com wrote: Hi Jeff, Something which came up on the last Ganesha conn call is that we have a pretty strong need for some ability to wait on a set of locks, and perhaps receive events. Frank Filz believed that you had made a proposal which would cover this. Can you elaborate on that? Thanks, Matt No, there's no mechanism to wait on a set of locks from within the context of a single thread of execution or to receive events. Again, that would be a new API beyond what I've been proposing over the last several months. -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH man-pages v1] fcntl.2: update manpage with verbiage about open file description locks
Signed-off-by: Jeff Layton jlay...@poochiereds.net --- man2/fcntl.2 | 112 +-- 1 file changed, 109 insertions(+), 3 deletions(-) diff --git a/man2/fcntl.2 b/man2/fcntl.2 index d0154a6d9f42..8d119dfec24c 100644 --- a/man2/fcntl.2 +++ b/man2/fcntl.2 @@ -191,6 +191,9 @@ and .BR O_SYNC flags; see BUGS, below. .SS Advisory locking +This section describes traditional POSIX record locks. Also see the section on +open file description locks below. +.PP .BR F_SETLK , .BR F_SETLKW , and @@ -213,7 +216,8 @@ struct flock { off_t l_start; /* Starting offset for lock */ off_t l_len; /* Number of bytes to lock */ pid_t l_pid; /* PID of process blocking our lock -(F_GETLK only) */ +(returned for F_GETLK and F_OFD_GETLK only. Set + to 0 for open file description locks) */ ... }; .fi @@ -349,9 +353,13 @@ returns details about one of these locks in the .IR l_type , l_whence , l_start , and l_len fields of .I lock -and sets +. +If the conflicting lock is a traditional POSIX lock, then the +.I l_pid +to be the PID of the process holding that lock. If the +conflicting lock is an open file description lock, then the .I l_pid -to be the PID of the process holding that lock. +will be set to \-1. Note that the information returned by .BR F_GETLK may already be out of date by the time the caller inspects it. @@ -394,6 +402,104 @@ should be avoided; use and .BR write (2) instead. +.SS Open file description locks (non-POSIX) +.BR F_OFD_GETLK , F_OFD_SETLK and F_OFD_SETLKW +are used to acquire, release and test open file description record locks. +These are byte-range locks that work identically to the traditional advisory +record locks described above, but are associated with the open file description +on which they were acquired rather than the process, much like locks acquired +with +.BR flock (2) +. +.PP +Unlike traditional advisory record locks, these locks are inherited +across +.BR fork (2) +and +.BR clone (2) +with +.BR CLONE_FILES +and are only released on the last close of the open file description instead +of being released on any close of the file. +.PP +Open file description locks always conflict with traditional record locks, +even when they are acquired by the same process on the same file descriptor. +They only conflict with each other when they are acquired on different +open file descriptions. +.PP +Note that in contrast to traditional record locks, the +.I flock +structure passed in as an argument to the open file description lock commands +must have the +.I l_pid +value set to 0. +.TP +.BR F_OFD_SETLK (\fIstruct flock *\fP) +Acquire an open file description lock (when +.I l_type +is +.B F_RDLCK +or +.BR F_WRLCK ) +or release an open file description lock (when +.I l_type +is +.BR F_UNLCK ) +on the bytes specified by the +.IR l_whence , l_start , and l_len +fields of +.IR lock . +If a conflicting lock is held by another process, +this call returns \-1 and sets +.I errno +to +.B EACCES +or +.BR EAGAIN . +.TP +.BR F_OFD_SETLKW (\fIstruct flock *\fP) +As for +.BR F_OFD_SETLK , +but if a conflicting lock is held on the file, then wait for that lock to be +released. If a signal is caught while waiting, then the call is interrupted +and (after the signal handler has returned) returns immediately (with return +value \-1 and +.I errno +set to +.BR EINTR ; +see +.BR signal (7)). +.TP +.BR F_OFD_GETLK (\fIstruct flock *\fP) +On input to this call, +.I lock +describes an open file description lock we would like to place on the file. +If the lock could be placed, +.BR fcntl () +does not actually place it, but returns +.B F_UNLCK +in the +.I l_type +field of +.I lock +and leaves the other fields of the structure unchanged. +If one or more incompatible locks would prevent +this lock being placed, then +.BR fcntl () +returns details about one of these locks in the +.IR l_type , l_whence , l_start , and l_len +fields of +.I lock +. +If the conflicting lock is a process-associated record lock, then the +.I l_pid +will be set to the PID of the process holding that lock. If the +conflicting lock is an open file description lock, then the +.I l_pid +will be set to -1 to indicate that it is not associated with a process. +Note that the information returned by +.BR F_OFD_GETLK +may already be out of date by the time the caller inspects it. .SS Mandatory locking (Non-POSIX.) The above record locks may be either advisory or mandatory, -- 1.9.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH man-pages v1] fcntl.2: update manpage with verbiage about open file description locks
On Wed, 30 Apr 2014 12:50:23 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: [CC += linux-man] Jeff, Thanks very much for writing this patch! I've taken your patch into a branch and add a number of details. I have one or two questions below. On 04/29/2014 08:51 PM, Jeff Layton wrote: Signed-off-by: Jeff Layton jlay...@poochiereds.net --- man2/fcntl.2 | 112 +-- 1 file changed, 109 insertions(+), 3 deletions(-) diff --git a/man2/fcntl.2 b/man2/fcntl.2 index d0154a6d9f42..8d119dfec24c 100644 --- a/man2/fcntl.2 +++ b/man2/fcntl.2 @@ -191,6 +191,9 @@ and .BR O_SYNC flags; see BUGS, below. .SS Advisory locking +This section describes traditional POSIX record locks. Also see the section on +open file description locks below. +.PP .BR F_SETLK , .BR F_SETLKW , and @@ -213,7 +216,8 @@ struct flock { off_t l_start; /* Starting offset for lock */ off_t l_len; /* Number of bytes to lock */ pid_t l_pid; /* PID of process blocking our lock -(F_GETLK only) */ +(returned for F_GETLK and F_OFD_GETLK only. Set + to 0 for open file description locks) */ ... }; .fi @@ -349,9 +353,13 @@ returns details about one of these locks in the .IR l_type , l_whence , l_start , and l_len fields of .I lock -and sets +. +If the conflicting lock is a traditional POSIX lock, then the +.I l_pid +to be the PID of the process holding that lock. If the +conflicting lock is an open file description lock, then the .I l_pid -to be the PID of the process holding that lock. +will be set to \-1. Note that the information returned by .BR F_GETLK may already be out of date by the time the caller inspects it. @@ -394,6 +402,104 @@ should be avoided; use and .BR write (2) instead. +.SS Open file description locks (non-POSIX) +.BR F_OFD_GETLK , F_OFD_SETLK and F_OFD_SETLKW +are used to acquire, release and test open file description record locks. +These are byte-range locks that work identically to the traditional advisory +record locks described above, but are associated with the open file description +on which they were acquired rather than the process, much like locks acquired +with +.BR flock (2) +. +.PP +Unlike traditional advisory record locks, these locks are inherited +across +.BR fork (2) +and +.BR clone (2) +with +.BR CLONE_FILES +and are only released on the last close of the open file description instead +of being released on any close of the file. +.PP +Open file description locks always conflict with traditional record locks, +even when they are acquired by the same process on the same file descriptor. +They only conflict with each other when they are acquired on different +open file descriptions. +.PP +Note that in contrast to traditional record locks, the +.I flock +structure passed in as an argument to the open file description lock commands +must have the +.I l_pid +value set to 0. In ERRORS, I added EINVAL for this case. +.TP +.BR F_OFD_SETLK (\fIstruct flock *\fP) +Acquire an open file description lock (when +.I l_type +is +.B F_RDLCK +or +.BR F_WRLCK ) +or release an open file description lock (when +.I l_type +is +.BR F_UNLCK ) +on the bytes specified by the +.IR l_whence , l_start , and l_len +fields of +.IR lock . +If a conflicting lock is held by another process, +this call returns \-1 and sets +.I errno +to +.B EACCES +or +.BR EAGAIN . The EACCES or EAGAIN thing comes from POSIX, because different implementations of tradition record locks returned one of these errors. So, portable applications using traditional locks must handle either possibility. However, that argument doesn't apply for these new locks. Surely, we just want to say set errno to EAGAIN for this case? +.TP +.BR F_OFD_SETLKW (\fIstruct flock *\fP) +As for +.BR F_OFD_SETLK , +but if a conflicting lock is held on the file, then wait for that lock to be +released. If a signal is caught while waiting, then the call is interrupted +and (after the signal handler has returned) returns immediately (with return +value \-1 and +.I errno +set to +.BR EINTR ; +see +.BR signal (7)). +.TP +.BR F_OFD_GETLK (\fIstruct flock *\fP) +On input to this call, +.I lock +describes an open file description lock we would like to place on the file. +If the lock could be placed, +.BR fcntl () +does not actually place it, but returns +.B F_UNLCK +in the +.I l_type +field of +.I lock +and leaves the other fields of the structure unchanged. +If one or more incompatible locks would prevent +this lock being placed, then +.BR fcntl () +returns details about one
Re: [PATCH 1/2] cifs: Use min_t() when comparing size_t and unsigned long
On Sun, 13 Apr 2014 20:46:21 +0200 Geert Uytterhoeven ge...@linux-m68k.org wrote: On 32 bit, size_t is unsigned int, not unsigned long, causing the following warning when comparing with PAGE_SIZE, which is always unsigned long: fs/cifs/file.c: In function ‘cifs_readdata_to_iov’: fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast Introduced by commit 7f25bba819a38ab7310024a9350655f374707e20 (cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()), which changed the signedness of remaining and the code from min_t() to min(). Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org --- PAGE_SIZE should really be size_t, but that would require lots of changes all over the place. fs/cifs/file.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 8807442c94dd..8add25538a3b 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2754,7 +2754,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter) for (i = 0; i rdata-nr_pages; i++) { struct page *page = rdata-pages[i]; - size_t copy = min(remaining, PAGE_SIZE); + size_t copy = min_t(size_t, remaining, PAGE_SIZE); size_t written = copy_page_to_iter(page, 0, copy, iter); remaining -= written; if (written copy iov_iter_count(iter) 0) Reviewed-by: Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[glibc PATCH v4 resend] fcntl-linux.h: add new definitions and manual updates for open file description locks
Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated with the process. +There is also a different type of record lock that is associated with the +open file description instead of the process
Re: [PATCH 1/1] fs/locks.c: replace seq_printf by seq_puts
On Sat, 3 May 2014 22:28:43 +0200 Fabian Frederick f...@skynet.be wrote: Replace seq_printf where possible Cc: Jeff Layton jlay...@redhat.com Cc: Alexander Viro v...@zeniv.linux.org.uk Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Fabian Frederick f...@skynet.be --- fs/locks.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index e663aea..78ac209 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2411,31 +2411,31 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl, seq_printf(f, %lld:%s , id, pfx); if (IS_POSIX(fl)) { if (fl-fl_flags FL_ACCESS) - seq_printf(f, ACCESS); + seq_puts(f, ACCESS); else if (IS_OFDLCK(fl)) - seq_printf(f, OFDLCK); + seq_puts(f, OFDLCK); else - seq_printf(f, POSIX ); + seq_puts(f, POSIX ); seq_printf(f, %s , (inode == NULL) ? *NOINODE* : mandatory_lock(inode) ? MANDATORY : ADVISORY ); } else if (IS_FLOCK(fl)) { if (fl-fl_type LOCK_MAND) { - seq_printf(f, FLOCK MSNFS ); + seq_puts(f, FLOCK MSNFS ); } else { - seq_printf(f, FLOCK ADVISORY ); + seq_puts(f, FLOCK ADVISORY ); } } else if (IS_LEASE(fl)) { - seq_printf(f, LEASE ); + seq_puts(f, LEASE ); if (lease_breaking(fl)) - seq_printf(f, BREAKING ); + seq_puts(f, BREAKING ); else if (fl-fl_file) - seq_printf(f, ACTIVE); + seq_puts(f, ACTIVE); else - seq_printf(f, BREAKER ); + seq_puts(f, BREAKER ); } else { - seq_printf(f, UNKNOWN UNKNOWN ); + seq_puts(f, UNKNOWN UNKNOWN ); } if (fl-fl_type LOCK_MAND) { seq_printf(f, %s , @@ -2467,7 +2467,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl, else seq_printf(f, %Ld %Ld\n, fl-fl_start, fl-fl_end); } else { - seq_printf(f, 0 EOF\n); + seq_puts(f, 0 EOF\n); } } Looks good to me. I'll plan to merge it for v3.16. Thanks! -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] MAINTAINERS email address change
The following changes since commit 8aa9e85adac609588eeec356e5a85059b3b819ba: ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe (2014-04-30 08:21:43 -0700) are available in the git repository at: git://git.samba.org/jlayton/linux.git tags/locks-v3.15-3 for you to fetch changes up to 8c836fa85bebfea26ceff024b5d1c518bf63f3d8: MAINTAINERS: email address change for Jeff Layton (2014-04-30 13:28:17 -0400) File locking related changes for v3.15 (pile #3) - only an email address change to the MAINTAINERS file Jeff Layton (1): MAINTAINERS: email address change for Jeff Layton MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- Jeff Layton jlay...@poochiereds.net signature.asc Description: PGP signature
Re: [PATCH 1/1] FS/CIFS: remove obsolete __constant
On Sat, May 3, 2014 at 4:15 PM, Fabian Frederick f...@skynet.be wrote: Replacing all __constant_foo to foo() except in smb2status.h (1700 lines to update). Cc: linux-c...@vger.kernel.org Cc: Steve French sfre...@samba.org Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Fabian Frederick f...@skynet.be --- fs/cifs/cifsacl.c | 2 +- fs/cifs/cifssmb.c | 20 ++-- fs/cifs/sess.c | 2 +- fs/cifs/smb2misc.c | 38 +++--- fs/cifs/smb2ops.c | 2 +- fs/cifs/smb2pdu.c | 2 +- fs/cifs/smb2pdu.h | 28 ++-- 7 files changed, 47 insertions(+), 47 deletions(-) diff --git a/fs/cifs/cifsacl.c b/fs/cifs/cifsacl.c index 7ff866d..54ac0e8 100644 --- a/fs/cifs/cifsacl.c +++ b/fs/cifs/cifsacl.c @@ -38,7 +38,7 @@ static const struct cifs_sid sid_everyone = { 1, 1, {0, 0, 0, 0, 0, 1}, {0} }; /* security id for Authenticated Users system group */ static const struct cifs_sid sid_authusers = { - 1, 1, {0, 0, 0, 0, 0, 5}, {__constant_cpu_to_le32(11)} }; + 1, 1, {0, 0, 0, 0, 0, 5}, {cpu_to_le32(11)} }; Does the build still work on BE arches with this? I know at one point the above wouldn't compile on those arches. See commit 536abdb0802f, for an explanation. /* group users */ static const struct cifs_sid sid_user = {1, 2 , {0, 0, 0, 0, 0, 5}, {} }; diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 6ce4e09..c3dc52e 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -2430,14 +2430,14 @@ CIFSSMBPosixLock(const unsigned int xid, struct cifs_tcon *tcon, } parm_data = (struct cifs_posix_lock *) ((char *)pSMBr-hdr.Protocol + data_offset); - if (parm_data-lock_type == __constant_cpu_to_le16(CIFS_UNLCK)) + if (parm_data-lock_type == cpu_to_le16(CIFS_UNLCK)) pLockData-fl_type = F_UNLCK; else { if (parm_data-lock_type == - __constant_cpu_to_le16(CIFS_RDLCK)) + cpu_to_le16(CIFS_RDLCK)) pLockData-fl_type = F_RDLCK; else if (parm_data-lock_type == - __constant_cpu_to_le16(CIFS_WRLCK)) + cpu_to_le16(CIFS_WRLCK)) pLockData-fl_type = F_WRLCK; pLockData-fl_start = le64_to_cpu(parm_data-start); @@ -3232,25 +3232,25 @@ CIFSSMB_set_compression(const unsigned int xid, struct cifs_tcon *tcon, pSMB-compression_state = cpu_to_le16(COMPRESSION_FORMAT_DEFAULT); pSMB-TotalParameterCount = 0; - pSMB-TotalDataCount = __constant_cpu_to_le32(2); + pSMB-TotalDataCount = cpu_to_le32(2); pSMB-MaxParameterCount = 0; pSMB-MaxDataCount = 0; pSMB-MaxSetupCount = 4; pSMB-Reserved = 0; pSMB-ParameterOffset = 0; - pSMB-DataCount = __constant_cpu_to_le32(2); + pSMB-DataCount = cpu_to_le32(2); pSMB-DataOffset = cpu_to_le32(offsetof(struct smb_com_transaction_compr_ioctl_req, compression_state) - 4); /* 84 */ pSMB-SetupCount = 4; - pSMB-SubCommand = __constant_cpu_to_le16(NT_TRANSACT_IOCTL); + pSMB-SubCommand = cpu_to_le16(NT_TRANSACT_IOCTL); pSMB-ParameterCount = 0; - pSMB-FunctionCode = __constant_cpu_to_le32(FSCTL_SET_COMPRESSION); + pSMB-FunctionCode = cpu_to_le32(FSCTL_SET_COMPRESSION); pSMB-IsFsctl = 1; /* FSCTL */ pSMB-IsRootFlag = 0; pSMB-Fid = fid; /* file handle always le */ /* 3 byte pad, followed by 2 byte compress state */ - pSMB-ByteCount = __constant_cpu_to_le16(5); + pSMB-ByteCount = cpu_to_le16(5); inc_rfc1001_len(pSMB, 5); rc = SendReceive(xid, tcon-ses, (struct smb_hdr *) pSMB, @@ -3386,10 +3386,10 @@ static __u16 ACL_to_cifs_posix(char *parm_data, const char *pACL, cifs_acl-version = cpu_to_le16(1); if (acl_type == ACL_TYPE_ACCESS) { cifs_acl-access_entry_count = cpu_to_le16(count); - cifs_acl-default_entry_count = __constant_cpu_to_le16(0x); + cifs_acl-default_entry_count = cpu_to_le16(0x); } else if (acl_type == ACL_TYPE_DEFAULT) { cifs_acl-default_entry_count = cpu_to_le16(count); - cifs_acl-access_entry_count = __constant_cpu_to_le16(0x); + cifs_acl-access_entry_count = cpu_to_le16(0x); } else { cifs_dbg(FYI, unknown ACL type %d\n, acl_type); return 0; diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c index e87387d..27e6175 100644 --- a/fs/cifs/sess.c +++ b/fs/cifs/sess.c @@ -46,7 +46,7 @@ static __u32
Re: [PATCH 1/1] FS/CIFS: remove obsolete __constant
On Mon, 5 May 2014 06:53:01 +0200 Fabian Frederick f...@skynet.be wrote: On Sun, 4 May 2014 18:52:43 -0400 Jeff Layton jlay...@poochiereds.net wrote: On Sat, May 3, 2014 at 4:15 PM, Fabian Frederick f...@skynet.be wrote: Replacing all __constant_foo to foo() except in smb2status.h (1700 lines to update). Cc: linux-c...@vger.kernel.org Cc: Steve French sfre...@samba.org Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Fabian Frederick f...@skynet.be --- fs/cifs/cifsacl.c | 2 +- fs/cifs/cifssmb.c | 20 ++-- fs/cifs/sess.c | 2 +- fs/cifs/smb2misc.c | 38 +++--- fs/cifs/smb2ops.c | 2 +- fs/cifs/smb2pdu.c | 2 +- fs/cifs/smb2pdu.h | 28 ++-- 7 files changed, 47 insertions(+), 47 deletions(-) diff --git a/fs/cifs/cifsacl.c b/fs/cifs/cifsacl.c index 7ff866d..54ac0e8 100644 --- a/fs/cifs/cifsacl.c +++ b/fs/cifs/cifsacl.c @@ -38,7 +38,7 @@ static const struct cifs_sid sid_everyone = { 1, 1, {0, 0, 0, 0, 0, 1}, {0} }; /* security id for Authenticated Users system group */ static const struct cifs_sid sid_authusers = { - 1, 1, {0, 0, 0, 0, 0, 5}, {__constant_cpu_to_le32(11)} }; + 1, 1, {0, 0, 0, 0, 0, 5}, {cpu_to_le32(11)} }; Does the build still work on BE arches with this? I know at one point the above wouldn't compile on those arches. See commit 536abdb0802f, for an explanation. Untested on BE but it would mean checkpatch don't use __constantfoo functions outside of include/uapi/ stuff is wrong ? I'm not sure. Maybe the macros have been fixed so that it's no longer a problem? I think it would be good to test build this on a BE arch before it's merged to be certain though. IIRC, I originally saw the problem on s390 builds. -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nfs oom on 3.15
On Mon, 5 May 2014 02:42:35 -0700 (PDT) David Rientjes rient...@google.com wrote: On Mon, 5 May 2014, Marc Dietrich wrote: Hi, this is spaming my log quite a bit. Any idea? I think rpc_malloc() really wants to be __GFP_NOWARN so that the warnings are suppressed, doing order-1 allocations with GFP_NOWAIT is never guaranteed to succeed and will have difficulty if memory is fragmented. Yes. We made a similar fix to xprt_alloc_slot a while back. The RPC engine has a mechanism to handle retrying these allocations, so this warning isn't particularly helpful. -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] locks: add some tracepoints in the lease handling code
v2: add a __break_lease tracepoint for non-blocking case Recently, I needed these to help track down a softlockup when recalling a delegation, but they might be helpful in other situations as well. Cc: J. Bruce Fields bfie...@fieldses.org Signed-off-by: Jeff Layton jlay...@poochiereds.net --- fs/locks.c | 10 + include/trace/events/filelock.h | 93 + 2 files changed, 103 insertions(+) create mode 100644 include/trace/events/filelock.h diff --git a/fs/locks.c b/fs/locks.c index 6b01d5a1de4a..2367f3118662 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -130,6 +130,9 @@ #include linux/percpu.h #include linux/lglock.h +#define CREATE_TRACE_POINTS +#include trace/events/filelock.h + #include asm/uaccess.h #define IS_POSIX(fl) (fl-fl_flags FL_POSIX) @@ -1386,6 +1389,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) } if (i_have_this_lease || (mode O_NONBLOCK)) { + trace_break_lease_noblock(inode, new_fl); error = -EWOULDBLOCK; goto out; } @@ -1397,10 +1401,12 @@ restart: if (break_time == 0) break_time++; locks_insert_block(flock, new_fl); + trace_break_lease_block(inode, new_fl); spin_unlock(inode-i_lock); error = wait_event_interruptible_timeout(new_fl-fl_wait, !new_fl-fl_next, break_time); spin_lock(inode-i_lock); + trace_break_lease_unblock(inode, new_fl); locks_delete_block(new_fl); if (error = 0) { if (error == 0) @@ -1522,6 +1528,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp int error; lease = *flp; + trace_generic_add_lease(inode, lease); + /* * In the delegation case we need mutual exclusion with * a number of operations that take the i_mutex. We trylock @@ -1611,6 +1619,8 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp) struct dentry *dentry = filp-f_path.dentry; struct inode *inode = dentry-d_inode; + trace_generic_delete_lease(inode, *flp); + for (before = inode-i_flock; ((fl = *before) != NULL) IS_LEASE(fl); before = fl-fl_next) { diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h new file mode 100644 index ..d1e9d3eaa8ce --- /dev/null +++ b/include/trace/events/filelock.h @@ -0,0 +1,93 @@ +/* + * Events for filesystem locks + * + * Copyright 2013 Jeff Layton jlay...@poochiereds.net + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM filelock + +#if !defined(_TRACE_FILELOCK_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_FILELOCK_H + +#include linux/tracepoint.h +#include linux/fs.h +#include linux/device.h +#include linux/kdev_t.h + +#define show_fl_flags(val) \ + __print_flags(val, |, \ + { FL_POSIX, FL_POSIX }, \ + { FL_FLOCK, FL_FLOCK }, \ + { FL_DELEG, FL_DELEG }, \ + { FL_ACCESS,FL_ACCESS }, \ + { FL_EXISTS,FL_EXISTS }, \ + { FL_LEASE, FL_LEASE }, \ + { FL_CLOSE, FL_CLOSE }, \ + { FL_SLEEP, FL_SLEEP }, \ + { FL_DOWNGRADE_PENDING, FL_DOWNGRADE_PENDING }, \ + { FL_UNLOCK_PENDING,FL_UNLOCK_PENDING }, \ + { FL_OFDLCK,FL_OFDLCK }) + +#define show_fl_type(val) \ + __print_symbolic(val, \ + { F_RDLCK, F_RDLCK }, \ + { F_WRLCK, F_WRLCK }, \ + { F_UNLCK, F_UNLCK }) + +DECLARE_EVENT_CLASS(filelock_lease, + + TP_PROTO(struct inode *inode, struct file_lock *fl), + + TP_ARGS(inode, fl), + + TP_STRUCT__entry( + __field(struct file_lock *, fl) + __field(unsigned long, i_ino) + __field(dev_t, s_dev) + __field(struct file_lock *, fl_next) + __field(fl_owner_t, fl_owner) + __field(unsigned int, fl_flags) + __field(unsigned char, fl_type) + __field(unsigned long, fl_break_time) + __field(unsigned long, fl_downgrade_time) + ), + + TP_fast_assign( + __entry-fl = fl; + __entry-s_dev = inode-i_sb-s_dev; + __entry-i_ino = inode-i_ino; + __entry-fl_next = fl-fl_next; + __entry-fl_owner
Re: linux-3.14 nfsd regression
On Thu, 03 Apr 2014 13:51:06 -0400 Mark Lord ml...@pobox.com wrote: On 14-04-03 01:16 PM, J. Bruce Fields wrote: On Thu, Apr 03, 2014 at 12:33:55PM -0400, Mark Lord wrote: This commit from linux-3.14 breaks our NFS-root clients here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5 - *p++ = htonl((u32) stat-mode); + *p++ = htonl((u32) (stat-mode S_IALLUGO)); Reverting the one-liner above (on the server) fixes it for us, as does reverting back to linux-3.13.8 on the server. The NFS-root clients are on PowerPC (big-endian) architecture, running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14. ACL is completely disabled on server and client, and we're using NFSv2/v3. No support for v4. I instrumented the function to see what other bits were being cleared by the (stat-mode S_IALLUGO) masking. The results are attached. Hm, it sounds like a bug in the client if it's depending on those high bits. But only for mounting / starting up from the nfsroot, it seems. I wonder if there's an unusual code path for that in there? The regular stuff looks mostly fine: p = xdr_decode_ftype3(p, fmode); fattr-mode = (be32_to_cpup(p++) ~S_IFMT) | fmode; Except perhaps that second line ought to use the same mask as the server side is using, just in case there are some other stray high (higher than S_IFMT) bits in there now/someday. The original behavior was in practice harmless and changing it broke something, so I think we should definitely just revert this patch. Yup. Who? But the client may need fixing too. Probably a good thing in the longer term, for better compatibility with non-Linux servers. But we'll still have to keep the revert on the server (nfsd) code for backward compatibility, I think. Cheers It would be good to understand where this is broken in the client. It's incorrect for the client to interpret those bits, as I think that there's no guarantee that other OS's implement the type bits in the same way that Linux does. So if you end up mounting a different OS, it's possible that the client will get that wrong... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-3.14 nfsd regression
On Thu, 3 Apr 2014 16:16:27 -0400 J. Bruce Fields bfie...@fieldses.org wrote: On Thu, Apr 03, 2014 at 02:55:04PM -0400, Jeff Layton wrote: On Thu, 03 Apr 2014 13:51:06 -0400 Mark Lord ml...@pobox.com wrote: On 14-04-03 01:16 PM, J. Bruce Fields wrote: On Thu, Apr 03, 2014 at 12:33:55PM -0400, Mark Lord wrote: This commit from linux-3.14 breaks our NFS-root clients here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5 - *p++ = htonl((u32) stat-mode); + *p++ = htonl((u32) (stat-mode S_IALLUGO)); Reverting the one-liner above (on the server) fixes it for us, as does reverting back to linux-3.13.8 on the server. The NFS-root clients are on PowerPC (big-endian) architecture, running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14. ACL is completely disabled on server and client, and we're using NFSv2/v3. No support for v4. I instrumented the function to see what other bits were being cleared by the (stat-mode S_IALLUGO) masking. The results are attached. Hm, it sounds like a bug in the client if it's depending on those high bits. But only for mounting / starting up from the nfsroot, it seems. I wonder if there's an unusual code path for that in there? The regular stuff looks mostly fine: p = xdr_decode_ftype3(p, fmode); fattr-mode = (be32_to_cpup(p++) ~S_IFMT) | fmode; Except perhaps that second line ought to use the same mask as the server side is using, just in case there are some other stray high (higher than S_IFMT) bits in there now/someday. The original behavior was in practice harmless and changing it broke something, so I think we should definitely just revert this patch. Yup. Who? But the client may need fixing too. Probably a good thing in the longer term, for better compatibility with non-Linux servers. But we'll still have to keep the revert on the server (nfsd) code for backward compatibility, I think. Cheers It would be good to understand where this is broken in the client. It's incorrect for the client to interpret those bits, as I think that there's no guarantee that other OS's implement the type bits in the same way that Linux does. So if you end up mounting a different OS, it's possible that the client will get that wrong... It turns out these bits actually are defined in rfc 1094, so this is just an odd NFSv2-specific wart, and the nfsd change was just flat-out wrong. --b. Ahh right -- I remember seeing that long ago. So according to the RFC you have to encode both the mode bits and the ftype for v2. The type bits seem to be removed from the mode in NFSv3 though, so perhaps we should only be doing that masking in versions above v2? With a quick check, it looks like the v3 code doesn't rely on those bits and I imagine v4 doesn't either. It might also be nice to have the client v2 decode_fattr function to throw a warning if the server sends us mismatched type bits and ftype values. That would have helped us catch this sooner... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-3.14 nfsd regression
On Fri, 4 Apr 2014 09:58:43 -0400 J. Bruce Fields bfie...@fieldses.org wrote: On Thu, Apr 03, 2014 at 07:21:46PM -0400, Jeff Layton wrote: So according to the RFC you have to encode both the mode bits and the ftype for v2. The type bits seem to be removed from the mode in NFSv3 though, so perhaps we should only be doing that masking in versions above v2? Right, the problematic patch applied the same mask in both v2 and v3 cases, so I'm reverting just the v2 part (see below). With a quick check, it looks like the v3 code doesn't rely on those bits and I imagine v4 doesn't either. It might also be nice to have the client v2 decode_fattr function to throw a warning if the server sends us mismatched type bits and ftype values. That would have helped us catch this sooner... Yes, that might be a reasonable thing to do, though I don't know if it's worth it. --b. commit 35a8dff14e76c00e5b52140290cfb498dc2454a0 Author: J. Bruce Fields bfie...@redhat.com Date: Thu Apr 3 15:10:35 2014 -0400 nfsd: revert v2 half of nfsd: don't return high mode bits This reverts the part of commit 6e14b46b91fee8a049b0940333ce13a820beaaa5 that changes NFSv2 behavior. Mark Lord found that it broke nfs-root for Linux clients, because it broke NFSv2. In fact, from RFC 1094: Notice that the file type is specified both in the mode bits and in the file type. This is really a bug in the protocol and will be fixed in future versions. So NFSv2 clients really are expected to depend on the high bits of the mode. Cc: sta...@kernel.org Reported-by: Mark Lord ml...@pobox.com Signed-off-by: J. Bruce Fields bfie...@redhat.com diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index b17d932..9c769a4 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -152,7 +152,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, type = (stat-mode S_IFMT); *p++ = htonl(nfs_ftypes[type 12]); - *p++ = htonl((u32) (stat-mode S_IALLUGO)); + *p++ = htonl((u32) stat-mode); *p++ = htonl((u32) stat-nlink); *p++ = htonl((u32) from_kuid(init_user_ns, stat-uid)); *p++ = htonl((u32) from_kgid(init_user_ns, stat-gid)); Looks right... Reviewed-by: Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] please pull file locking changes for 3.15
On Fri, 4 Apr 2014 14:28:16 -0700 Linus Torvalds torva...@linux-foundation.org wrote: On Mon, Mar 31, 2014 at 6:31 AM, Jeff Layton jlay...@redhat.com wrote: The following changes since commit 29723adee11804b548903ddb1db666cf4a60f60e: locks: make locks_mandatory_area check for file-private locks (2014-03-31 08:24:43 -0400) are available in the git repository at: git://git.samba.org/jlayton/linux.git locks-3.15 Please please please use the proper scripts (or improve on whatever local script you use) that show the diffstat (git diff -M --stat --summary) and shortlog for the pull request too. I want to know roughly what I'm pulling *before* I pull it, and while I can do a two-stage thing where I first fetch-and-inspect it, that's annoying. Your human-readable Highlights thing is good, and I want that too (I put commentary like that into the merge commit itself), but I do want that diffstat/shortlog in addition to the human-readable summary. Because having the diffstat in the pull request email not only gets me a heasd-up abotu what to expect, it also confirms that I'm on the same page as you are. It acts as a sanity-check for me when I can compare the diffstat you *claim* I should get with the diffstat I actually get after merging. So I really want to get that diffstat and shortlog as a sanity check. I also would prefer signed tags. It's not a must since it's not like git.samba.org is some random public site (I do _require_ them for pull requests from github etc), but it's definitely a good thing to have. Anyway, I'm going through my pile of filesystem pull requests, and I've pulled this into my tree. I'm just going through allmodconfig builds etc before pushing my merge out, so assuming that all works fine you don't need to resend this one. But for future pull requests, please do try to fix the above up. Linus Thanks for the feedback and for pulling this in anyway. I'll make sure to do all of that on subsequent pull requests. Cheers! -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NFSv2 directory cannot longer be exported in 3.14, 3.13 works fine
On Sat, 05 Apr 2014 14:17:27 +0200 Toralf Förster toralf.foers...@gmx.de wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 04/04/2014 11:57 PM, Toralf Förster wrote: At a user mode linux image (32 bit Gentoo Linux) the recent kernel gives this error while trying to mount a NFSv2 directory: mount.nfs: mount point /mnt/nfsv2 is not a directory A bisect points to : 6e14b46b91fee8a049b0940333ce13a820beaaa5 is the first bad commit commit 6e14b46b91fee8a049b0940333ce13a820beaaa5 Author: Albert Fluegel a...@muc.de Date: Mon Nov 18 12:18:01 2013 -0500 nfsd: don't return high mode bits The Linux NFS server replies among other things to a Check access permission the following: NFS:File type = 2 (Directory) NFS:Mode = 040755 A netapp server replies here: NFS:File type = 2 (Directory) NFS:Mode = 0755 The RFC 1813 i read: fattr3 struct fattr3 { ftype3 type; mode3 mode; uint32 nlink; ... For the mode bits only the lowest 9 are defined in the RFC As far as I can tell, knfsd has always done this, so apparently it's harmless. Nevertheless, it appears to be wrong. Note this is already correct in the NFSv4 case, only v2 and v3 need fixing. Signed-off-by: J. Bruce Fields bfie...@redhat.com :04 04 a60e55f5a9749db3977e12e8baa08d45fadba225 b982c02ec1069c5d969ccb71197e5569fe93b075 M fs - -- MfG/Sincerely Toralf Förster pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iF4EAREIAAYFAlM/9FcACgkQxOrN3gB26U5F1AD+JAP4cDM+rT9nk9Q8q681RywT Ejxq7d0tnVUfWyDFDTgA/1sIUtpv9GSK8SeAJFu7oAhMt4e6eeIdbpMT2qUEQv6q =Sles -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html This was already tracked down earlier this week. Bruce has a patch queued to fix it in his nfsd-next branch. See this thread from earlier this week: Subject: Re: linux-3.14 nfsd regression -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] please pull file locking changes for 3.15
On Mon, 7 Apr 2014 11:37:06 +1000 Stephen Rothwell s...@canb.auug.org.au wrote: Hi Jeff, On Fri, 4 Apr 2014 20:56:24 -0400 Jeff Layton jlay...@redhat.com wrote: Thanks for the feedback and for pulling this in anyway. I'll make sure to do all of that on subsequent pull requests. Also, please don't rebase what you have in linux-next before sending it to Linus (without good reason). Especially if you then leave the linux-next included branch as it was - since that may cause conflicts in my tree (and so I notice the rebase). I don't think I did that though, did I? Both the branches I had in this case were based on 3.13-rc1. Now that Linus has pulled in the changes, am I OK to rebase the branches (or do a pull)? -- Jeff Layton jlay...@redhat.com signature.asc Description: PGP signature
Re: [GIT PULL] please pull file locking changes for 3.15
On Mon, 7 Apr 2014 22:27:02 +1000 Stephen Rothwell s...@canb.auug.org.au wrote: Hi Jeff, On Mon, 7 Apr 2014 07:11:30 -0400 Jeff Layton jlay...@redhat.com wrote: On Mon, 7 Apr 2014 11:37:06 +1000 Stephen Rothwell s...@canb.auug.org.au wrote: On Fri, 4 Apr 2014 20:56:24 -0400 Jeff Layton jlay...@redhat.com wrote: Thanks for the feedback and for pulling this in anyway. I'll make sure to do all of that on subsequent pull requests. Also, please don't rebase what you have in linux-next before sending it to Linus (without good reason). Especially if you then leave the linux-next included branch as it was - since that may cause conflicts in my tree (and so I notice the rebase). I don't think I did that though, did I? Both the branches I had in this case were based on 3.13-rc1. OK, so the base stayed the same, but you recommitted all the same patches (I didn't check to see if the commit messages changed) which due to some other change caused conflicts in linux-next today :-( Not really a biggie, but generally, you really shouldn't rewrite your commits just before sending them to Linus. Now that Linus has pulled in the changes, am I OK to rebase the branches (or do a pull)? At this point, you should just reset your next included branch to be in Linus' tree beyond where Linus merged your tree, then it will be effectively empty (unless you then add some more patches on top - but at this point those should only be fixes). Ahh ok, I guess I *did* alter a commit message in my locks-3.15 branch and didn't propagate that to my linux-next branch. That probably threw off the commit IDs. The patches themselves should have been identical though. Sorry about that! In any case, I've gone ahead and reset my branch as you recommend and linux-next branch and will be more careful about that in the future. -- Jeff Layton jlay...@redhat.com signature.asc Description: PGP signature
Re: [PATCH] NFS/RDMA: Document separate Kconfig symbols
On Wed, 09 Apr 2014 16:05:53 +0200 Paul Bolle pebo...@tiscali.nl wrote: The NFS/RDMA Kconfig symbol was split into separate options for client and server in commit 2e8c12e1b765 (xprtrdma: add separate Kconfig options for NFSoRDMA client and server support). Update the documentation to reflect this split. Signed-off-by: Paul Bolle pebo...@tiscali.nl --- 0) Should Documentation/ describe the current release, or the current and previous releases? For these paragraphs I choose only the current release. I think they're only relevant for the tree they're in. The docs are shipped with the kernel sources, after all, so presumably someone who is looking at these has docs that match their kernel (my omission notwithstanding, of course) 1) Another approach could be to not document the Kconfig setup at all, because the Kconfig system should, in theory, provide all help needed to correctly configure the build for (in this case) NFS/RDMA. But is that true? Yeah, it does seem somewhat redundant. 2) By the way: what's the purpose of INFINIBAND_ADDR_TRANS? Documentation/filesystems/nfs/nfs-rdma.txt | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt index e386f7e4bcee..724043858b08 100644 --- a/Documentation/filesystems/nfs/nfs-rdma.txt +++ b/Documentation/filesystems/nfs/nfs-rdma.txt @@ -138,9 +138,9 @@ Installation - Build, install, reboot The NFS/RDMA code will be enabled automatically if NFS and RDMA -are turned on. The NFS/RDMA client and server are configured via the hidden -SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The -value of SUNRPC_XPRT_RDMA will be: +are turned on. The NFS/RDMA client and server are configured via the +SUNRPC_XPRT_RDMA_CLIENT and SUNRPC_XPRT_RDMA_SERVER config options that both +depend on SUNRPC and INFINIBAND. The default value of both options will be: - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client and server will not be built @@ -235,8 +235,9 @@ NFS/RDMA Setup - Start the NFS server -If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in -kernel config), load the RDMA transport module: +If the NFS/RDMA server was built as a module +(CONFIG_SUNRPC_XPRT_RDMA_SERVER=m in kernel config), load the RDMA +transport module: $ modprobe svcrdma @@ -255,8 +256,9 @@ NFS/RDMA Setup - On the client system -If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in -kernel config), load the RDMA client module: +If the NFS/RDMA client was built as a module +(CONFIG_SUNRPC_XPRT_RDMA_CLIENT=m in kernel config), load the RDMA client +module: $ modprobe xprtrdma.ko Thanks, nice catch, I had forgotten about the docs... Reviewed-by: Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][glibc PATCH] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Wed, 23 Apr 2014 15:00:06 -0400 a...@gnu.org (Alfred M. Szmidt) wrote: Likewise. You infact write that it does get the lock information later in the document wrt. F_OFD_GETLK. Sorry, I disagree here...GETLK is really a misnomer, IMO. TESTLK would have been a better name. GETLK are used is to get the first lock. It's a way to test whether a particular lock can be applied, and to return information about a conflicting lock if it can't. If, for instance there is no conflicting lock, then you don't get any lock information back (l_type just gets reset to F_UNLCK). While I kinda see your point, it isn't what GETLK does; it really does get you information about the first lock -- you're not testing anything. It is also the terminology used in the POSIX standard. You *are* testing a lock. For instance, a process has locked bytes 0-5 for read in the file. I then submit a F_GETLK request from another process and set: l_type = F_WRLCK l_start = 7 l_len = 1 ...this range does not overlap with the original range, and so no lock will be returned even though one is being held on the file. In order to determine whether it should return information about a lock it has to first _test_ whether it conflicts with the information in the struct flock that was passed down. Similarly, if the struct flock I submit to the F_GETLK request has this: l_type = F_RDLCK l_start = 0 l_len = 1 ...then I also will not get any information about a lock back. The information in the lock request does not conflict with the one being held on the file (because they are both read locks). If F_GETLK were just getting a lock, then there would be no test involved, but that's not how this works. F_GETLK has to test and see whether there is a conflicting lock before it can return anything. If all you're objecting to is the change in verbiage on those two pieces, then I'll back that part out in the interest of wrapping this up. I still think I'm correct though ;) -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] please pull file locking changes for v3.15 (pile #2)
The following changes since commit 55101e2d6ce1c780f6ee8fee5f37306971aac6cd: Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2014-04-14 16:21:28 -0700) are available in the git repository at: git://git.samba.org/jlayton/linux.git tags/locks-v3.15-2 for you to fetch changes up to cff2fce58b2b0f59089e7edcdc38803d65057b9f: locks: rename FL_FILE_PVT and IS_FILE_PVT to use *_OFDLCK instead (2014-04-23 16:17:03 -0400) File locking related bugfixes for v3.15 (pile #2) - fix for a long-standing bug in __break_lease that can cause soft lockups - renaming of file-private locks to open file description locks, and the command macros to more visually distinct names. The fix for __break_lease is also in the pile of patches for which Bruce sent a pull request, but I assume that your merge procedure will handle that correctly. For the other patches, I don't like the fact that we need to rename this stuff at this late stage, but it should be settled now (hopefully). Jeff Layton (3): locks: allow __break_lease to sleep even when break_time is 0 locks: rename file-private locks to open file description locks locks: rename FL_FILE_PVT and IS_FILE_PVT to use *_OFDLCK instead arch/arm/kernel/sys_oabi-compat.c | 6 +++--- fs/compat.c | 14 +++--- fs/fcntl.c| 12 ++-- fs/locks.c| 55 +++ include/linux/fs.h| 2 +- include/uapi/asm-generic/fcntl.h | 20 ++-- security/selinux/hooks.c | 6 +++--- 7 files changed, 57 insertions(+), 58 deletions(-) -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[glibc PATCH v2] fcntl-linux.h: add new definitions and manual updates for open file description locks
Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..764f67d469cc 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test a open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear a file lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but wait for completion. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated with the process. +There is also a different type of record lock that is associated with the +open file description instead of the process. @xref{Open File Description Locks
Re: [glibc PATCH v2] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Thu, 24 Apr 2014 14:31:43 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Jeff, Did you receive my mail with comments on the previous patch? (I got no reply.) It looks like some of those comments that needed to be addressed were not. Cheers, Michael I did get the mail and thought I had addressed them all. The only thing I didn't change based on your comments was the one about the return 0 in main() being unnecessary in the example program. That's not true since main is an int return function. I suppose we could turn it into a void return, but does it really matter here? On 04/24/2014 02:15 PM, Jeff Layton wrote: Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..764f67d469cc 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock
[glibc PATCH v3] fcntl-linux.h: add new definitions and manual updates for open file description locks
Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..0e6e387a1bd9 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test a open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear a file lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but wait for completion. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated with the process. +There is also a different type of record lock that is associated with the +open file description instead of the process. @xref{Open File Description Locks
[glibc PATCHi v4] fcntl-linux.h: add new definitions and manual updates for open file description locks
Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated with the process. +There is also a different type of record lock that is associated with the +open file description instead of the process. @xref
Re: [glibc PATCHi v4] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Thu, 24 Apr 2014 17:07:11 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Jeff, Please let us know what changed from one version to the next. I've no idea without looking into the patch itself whether this inorporates my comments on the v2 patch... Cheers, Michael It's a little hard to do that specifically since much of what changed were trivial s/a/an/ type changes. But basically I just incorporated your changes and fixed a few other places. Since this is already so close, what might be best is to just ask the glibc folks to merge what we have, and then if there are other changes that are needed later, we can base patches on top of it? On Thu, Apr 24, 2014 at 4:09 PM, Jeff Layton jlay...@redhat.com wrote: Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item
Re: [glibc PATCHi v4] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Thu, 24 Apr 2014 17:34:35 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On Thu, Apr 24, 2014 at 5:11 PM, Jeff Layton jlay...@redhat.com wrote: On Thu, 24 Apr 2014 17:07:11 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Jeff, Please let us know what changed from one version to the next. I've no idea without looking into the patch itself whether this inorporates my comments on the v2 patch... Cheers, Michael It's a little hard to do that specifically since much of what changed were trivial s/a/an/ type changes. But basically I just incorporated your changes and fixed a few other places. I think you missed my point. At the moment, there is no feedback loop. Either reply to my mail or just add a note such as: Ok, fair enough. I'll do that from here on out if there are other revisions needed. * Incorporated changes after Michael Kerrisk's comments on v2 patch In the next patch revision mail. Since this is already so close, what might be best is to just ask the glibc folks to merge what we have, and then if there are other changes that are needed later, we can base patches on top of it? I don't think I'll have more comments. The text looks pretty close to done to me. Good work! Thanks for your patience in reviewing! Carlos, can you help shepherd this into glibc tree? Are there any other changes that need to be made before it can be merged? Thanks, Jeff Cheers, Michael On Thu, Apr 24, 2014 at 4:09 PM, Jeff Layton jlay...@redhat.com wrote: Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated
Re: [GIT PULL] please pull file locking changes for v3.15 (pile #2)
On Fri, 25 Apr 2014 05:57:39 -0400 J. Bruce Fields bfie...@fieldses.org wrote: On Thu, Apr 24, 2014 at 07:45:57AM -0400, Jeff Layton wrote: The following changes since commit 55101e2d6ce1c780f6ee8fee5f37306971aac6cd: Merge git://git.kernel.org/pub/scm/virt/kvm/kvm (2014-04-14 16:21:28 -0700) are available in the git repository at: git://git.samba.org/jlayton/linux.git tags/locks-v3.15-2 for you to fetch changes up to cff2fce58b2b0f59089e7edcdc38803d65057b9f: locks: rename FL_FILE_PVT and IS_FILE_PVT to use *_OFDLCK instead (2014-04-23 16:17:03 -0400) File locking related bugfixes for v3.15 (pile #2) - fix for a long-standing bug in __break_lease that can cause soft lockups - renaming of file-private locks to open file description locks, and the command macros to more visually distinct names. The fix for __break_lease is also in the pile of patches for which Bruce sent a pull request, but I assume that your merge procedure will handle that correctly. For the other patches, I don't like the fact that we need to rename this stuff at this late stage, but it should be settled now (hopefully). Jeff Layton (3): locks: allow __break_lease to sleep even when break_time is 0 locks: rename file-private locks to open file description locks Nit: they're descriptors, not descriptions. But since that only affects comments and changelogs (not even the macros names, which are using OFD), that's a very nitty nit --b. No, they are _descriptions_. The descriptor is the number you get back from doing something like an open(). The description is the thing that the descriptor points to. This is the terminology that POSIX uses, and is what was recommended by the glibc and manpages maintainers. The glibc doc patches lay this out in detail. I'm not terribly thrilled with the name either, fwiw, but it is better than file-private. locks: rename FL_FILE_PVT and IS_FILE_PVT to use *_OFDLCK instead arch/arm/kernel/sys_oabi-compat.c | 6 +++--- fs/compat.c | 14 +++--- fs/fcntl.c| 12 ++-- fs/locks.c| 55 +++ include/linux/fs.h| 2 +- include/uapi/asm-generic/fcntl.h | 20 ++-- security/selinux/hooks.c | 6 +++--- 7 files changed, 57 insertions(+), 58 deletions(-) -- Jeff Layton jlay...@redhat.com -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] locks: rename file-private locks to file-description locks
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as file-description locks. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type FDLOCK The rest of the renaming can wait until v3.16, since everything else isn't visible outside of the kernel. Cc: Michael Kerrisk mtk.manpa...@gmail.com Signed-off-by: Jeff Layton jlay...@redhat.com --- arch/arm/kernel/sys_oabi-compat.c | 6 +++--- fs/compat.c | 14 +++--- fs/fcntl.c| 12 ++-- fs/locks.c| 14 +++--- include/uapi/asm-generic/fcntl.h | 20 ++-- security/selinux/hooks.c | 6 +++--- 6 files changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c index 702bd329d9d0..d92aa1277e40 100644 --- a/arch/arm/kernel/sys_oabi-compat.c +++ b/arch/arm/kernel/sys_oabi-compat.c @@ -203,9 +203,9 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd, int ret; switch (cmd) { - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_FD_GETLK: + case F_FD_SETLK: + case F_FD_SETLKW: case F_GETLK64: case F_SETLK64: case F_SETLKW64: diff --git a/fs/compat.c b/fs/compat.c index ca926ad0430c..4933d5b32ace 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -457,9 +457,9 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, case F_GETLK64: case F_SETLK64: case F_SETLKW64: - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_FD_GETLK: + case F_FD_SETLK: + case F_FD_SETLKW: ret = get_compat_flock64(f, compat_ptr(arg)); if (ret != 0) break; @@ -468,7 +468,7 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, conv_cmd = convert_fcntl_cmd(cmd); ret = sys_fcntl(fd, conv_cmd, (unsigned long)f); set_fs(old_fs); - if ((conv_cmd == F_GETLK || conv_cmd == F_GETLKP) ret == 0) { + if ((conv_cmd == F_GETLK || conv_cmd == F_FD_GETLK) ret == 0) { /* need to return lock information - see above for commentary */ if (f.l_start COMPAT_LOFF_T_MAX) ret = -EOVERFLOW; @@ -493,9 +493,9 @@ COMPAT_SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, case F_GETLK64: case F_SETLK64: case F_SETLKW64: - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_FD_GETLK: + case F_FD_SETLK: + case F_FD_SETLKW: return -EINVAL; } return compat_sys_fcntl64(fd, cmd, arg); diff --git a/fs/fcntl.c b/fs/fcntl.c index 9ead1596399a..a62d9ba3874b 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -274,15 +274,15 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, break; #if BITS_PER_LONG != 32 /* 32-bit arches must use fcntl64() */ - case F_GETLKP: + case F_FD_GETLK: #endif case F_GETLK: err = fcntl_getlk(filp, cmd, (struct flock __user *) arg); break; #if BITS_PER_LONG != 32 /* 32-bit arches must use fcntl64() */ - case F_SETLKP: - case F_SETLKPW: + case F_FD_SETLK: + case F_FD_SETLKW: #endif /* Fallthrough */ case F_SETLK: @@ -399,13 +399,13 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, switch (cmd) { case F_GETLK64: - case F_GETLKP: + case F_FD_GETLK: err = fcntl_getlk64(f.file, cmd, (struct flock64 __user *) arg); break; case F_SETLK64: case F_SETLKW64: - case F_SETLKP: - case F_SETLKPW: + case F_FD_SETLK: + case F_FD_SETLKW: err = fcntl_setlk64(fd, f.file, cmd, (struct flock64 __user *) arg); break; diff --git a/fs/locks.c b/fs/locks.c index b380f5543614..bfe5df7aa685 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1941,7 +1941,7 @@ int
Re: [PATCH] locks: rename file-private locks to file-description locks
On Mon, 21 Apr 2014 12:10:04 -0400 Rich Felker dal...@libc.org wrote: On Mon, Apr 21, 2014 at 04:23:54PM +0200, Michael Kerrisk (man-pages) wrote: On 04/21/2014 04:02 PM, Rich Felker wrote: On Mon, Apr 21, 2014 at 09:45:35AM -0400, Jeff Layton wrote: File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as file-description locks. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type FDLOCK The rest of the renaming can wait until v3.16, since everything else isn't visible outside of the kernel. I'm sorry I didn't chime in on this earlier, but I really prefer the (somewhat bad) current naming (private) to the ridiculously-confusing use of FD to mean file descriptION when everybody is used to it meaning file descriptOR. The potential for confusion that these are file descriptOR locks (they're not) is much more of a problem, IMO, than the confusion about what private means (since it doesn't have an established meaning in this context. Thus my vote is for leaving things the way the kernel did it already. There's at least two problems to solve here: 1) File private locks is _meaningless_ as a term. Elsewhere That's the benefit of it: it doesn't clash with any already-established meaning. I agree it's less than ideal, but all the alternatives I've seen so far are worse. (http://thread.gmane.org/gmane.network.samba.internals/76414/focus=1685376), I suggested various alternatives. File-handle locks [*] was my This is also bad. Handle also has a defined meaning in POSIX. See XSH 2.5.1: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html Not to mention that filehandle has a different meaning altogether in NFS parlance. I think we should avoid handle altogether in the name. initial preference, and I also suggested file-description locks and noted the drawbacks of that term. I think it's insufficient to say stick with the existing poor name--if you have something better, then please propose it. (Note by the way that for nearly a decade now, the open(2) man page has followed POSIX in using the term open file description. Full disclosure: of course, I'm responsible for that change in the man page.) I'm well aware of that. The problem is that the proposed API is using the two-letter abbreviation FD, which ALWAYS means file descriptor and NEVER means file description (in existing usage) to mean file description. That's what's wrong. Fair enough. Assuming we kept file-description locks as a name, what would you propose as new macro names? 2) The new API constants (F_SETLKP, F_SETLKPW, F_GETLKP) have names that are visually very close to the traditional POSIX lock names (F_SETLK, F_SETLKW, F_GETLK). That's an accident waiting to happen when someone mistypes in code and/or misses such a misttyping when reading code. That really must be fixed. I agree, but I don't think making it worse is a solution. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] locks: rename file-private locks to file-description locks
On Mon, 21 Apr 2014 09:09:27 -0700 Christoph Hellwig h...@infradead.org wrote: On Mon, Apr 21, 2014 at 04:23:54PM +0200, Michael Kerrisk (man-pages) wrote: There's at least two problems to solve here: 1) File private locks is _meaningless_ as a term. Elsewhere (http://thread.gmane.org/gmane.network.samba.internals/76414/focus=1685376), It's indeed not a very good choice, but the new name is even worse. Just call them non-broken locks? :) Or not give them a name an just append a 2 to the fcntls? :) I think we'll need to give them a name, if only to make it possible to document this stuff. I'm in Jeremy's camp on this one. I don't really care what that name *is*. I just need to know what it is so I can finish up the docs and make any changes to the interface that are necessary. 2) The new API constants (F_SETLKP, F_SETLKPW, F_GETLKP) have names that are visually very close to the traditional POSIX lock names (F_SETLK, F_SETLKW, F_GETLK). That's an accident waiting to happen when someone mistypes in code and/or misses such a misttyping when reading code. That really must be fixed. I don't think so. They also should have a name very similar because they have the same semantics with a major bug fixed. In fact I can't think of anyone who would actually want the old behavior. On this point, I agree with Michael. It would be easy to mix up the names when scanning by eye, so I think there is some value in making these more visually distinct. I rather like the idea of changing F_SETLKP to F_*_SETLK. The question is what to put in place of the wildcard there, and that sort of hinges on the name... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] locks: rename file-private locks to file-description locks
On Mon, 21 Apr 2014 20:18:50 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Jeff, On 04/21/2014 06:45 PM, Jeff Layton wrote: On Mon, 21 Apr 2014 12:10:04 -0400 Rich Felker dal...@libc.org wrote: On Mon, Apr 21, 2014 at 04:23:54PM +0200, Michael Kerrisk (man-pages) wrote: On 04/21/2014 04:02 PM, Rich Felker wrote: On Mon, Apr 21, 2014 at 09:45:35AM -0400, Jeff Layton wrote: [...] initial preference, and I also suggested file-description locks and noted the drawbacks of that term. I think it's insufficient to say stick with the existing poor name--if you have something better, then please propose it. (Note by the way that for nearly a decade now, the open(2) man page has followed POSIX in using the term open file description. Full disclosure: of course, I'm responsible for that change in the man page.) I'm well aware of that. The problem is that the proposed API is using the two-letter abbreviation FD, which ALWAYS means file descriptor and NEVER means file description (in existing usage) to mean file description. That's what's wrong. Fair enough. Assuming we kept file-description locks as a name, what would you propose as new macro names? I assume you meant, assume we kept the term 'file-private locks'... In that case, at least make the constants something like F_FP_SETLK F_FP_SETLKW F_FP_GETLK so that they are not confused with the traditional constants. Cheer, Actually no, I was asking how Rich would name the constants if we use the name file-description locks (as per the patch I posted this morning), since his objection was the use if *_FD_* names. I would assume that if we stick with file-private locks as the name, then we'll still change the constants to a form like *_FP_*. Also, to be clear...Frank is correct that the name file-private came from allowing the locks to be private to a particular open file description. Though I agree that it's a crappy name at best... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] locks: rename file-private locks to file-description locks
On Mon, 21 Apr 2014 14:48:29 -0400 Rich Felker dal...@libc.org wrote: On Mon, Apr 21, 2014 at 02:32:38PM -0400, Jeff Layton wrote: Fair enough. Assuming we kept file-description locks as a name, what would you propose as new macro names? I assume you meant, assume we kept the term 'file-private locks'... In that case, at least make the constants something like F_FP_SETLK F_FP_SETLKW F_FP_GETLK so that they are not confused with the traditional constants. Cheer, Actually no, I was asking how Rich would name the constants if we use the name file-description locks (as per the patch I posted this morning), since his objection was the use if *_FD_* names. I would assume that if we stick with file-private locks as the name, then we'll still change the constants to a form like *_FP_*. Also, to be clear...Frank is correct that the name file-private came from allowing the locks to be private to a particular open file description. Though I agree that it's a crappy name at best... As I mentioned in a reply to Michael just now, I think FP is bad because the whole problem is that legacy fcntl locks are associated with the underlying file rather than the open file description (open instance). So open-private (OP) might be a better choice than file-private. Rich Is open-private or open-file-private really any better than file-private ? They're all names that only a mother could love and I'm not sure any of them are really any clearer than the others. Also: pedantic Legacy fcntl locks are associated with the _process_ and not the underlying file, per-se. /pedantic -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] locks: rename file-private locks to file-description locks
On Mon, 21 Apr 2014 21:39:12 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: On 04/21/2014 08:46 PM, Rich Felker wrote: On Mon, Apr 21, 2014 at 08:32:44PM +0200, Michael Kerrisk (man-pages) wrote: On 04/21/2014 06:10 PM, Rich Felker wrote: I'm well aware of that. The problem is that the proposed API is using the two-letter abbreviation FD, which ALWAYS means file descriptor and NEVER means file description (in existing usage) to mean file description. That's what's wrong. So, can you *please* answer this question: what do you call (i.e., what everyday technical language term do use for) the thing that sits between a file descriptor and an i-node? (Please don't say 'struct file' -- that is not is an implementation detail, and does not qualify as the kind of term that I could use when documenting this feature in man pages.) Open file description. Oh! I didn't realize we agreed :-). POSIX uses (or invented, I am not sure which) the term file description for a good reason: it is unambiguous, and therefore precise. I do agree that there's a risk of confusion between 'open file descriptor and 'and file description'--it's the same kind of risk as between English terms such as 'arbitrator' and 'arbitration' (and any number of other examples), and as language speakers we deal with this every day. There's not a problem when the full word is used. On the other hand, if you use arb as an abbreviation for arbitration in a context where it was already universally understood as meaning arbitrator, that would be a big problem. Likewise the problem here isn't that open file description is a bad term. It's that using FD to mean [open] file description is utterly confusing, even moreso than just making up a new completely random word. Ohh -- I had thought you a problem not just with FD but also (open) file description. 2) The new API constants (F_SETLKP, F_SETLKPW, F_GETLKP) have names that are visually very close to the traditional POSIX lock names (F_SETLK, F_SETLKW, F_GETLK). That's an accident waiting to happen when someone mistypes in code and/or misses such a misttyping when reading code. That really must be fixed. I agree, but I don't think making it worse is a solution. I don't agree that it's making it worse. The real problem here is that people use no good unambiguous term for the thing between a file descriptor and an inode. POSIX provides us with a solution that may not seem perfect, but it is unambiguous, and I think it might feel more comfortable if we used it often enough. I would like to see it used more too, and in particular, I think it belongs in the documentation for these new locking interfaces. But that still doesn't answer the question of what to call them (the macros) unless you want: F_OPEN_FILE_DESCRIPTION_GETLK F_OPEN_FILE_DESCRIPTION_SETLK F_OPEN_FILE_DESCRIPTION_SETLKW Or just 'F_OFD_*'? Perhaps OP (for open-private, i.e. private to the particular open) would be a sensible choice; OTOH people are likely to misread it as OPeration. The general principle I have in mind though is that it might be nice to highlight the word open in open file description (Fair enough.) since it (1) contrasts with file descriptor, despite file descriptors also dealing with open files, and (2) contrasts well with legacy fcntl locks, which are (this is the whole bug) associated with the underlying file and not the open file description. Makes sense to me. (We are in more agreement that I realized.) Cheers, Michael So the motion is to call them open file description locks and change the macros to read *_OFD_*. Does anyone object? -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] locks: rename file-private locks to open file description locks
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as open file description locks. The name isn't a big deal for the kernel, but the command macros are not visually distinct enough from the traditional POSIX lock macros. The glibc and documentation folks are recommending that we change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a programmer will typo one of the commands wrong, and also makes it easier to spot this difference when reading code. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type OFDLCK The rest of the renaming can wait until v3.16, since everything else isn't visible outside of the kernel. Cc: Michael Kerrisk mtk.manpa...@gmail.com Cc: Christoph Hellwig h...@infradead.org Cc: Carlos O'Donell car...@redhat.com Cc: Stefan Metzmacher me...@samba.org Cc: Andy Lutomirski l...@amacapital.net Cc: Frank Filz ffilz...@mindspring.com Cc: Theodore Ts'o ty...@mit.edu Signed-off-by: Jeff Layton jlay...@redhat.com --- arch/arm/kernel/sys_oabi-compat.c | 6 +++--- fs/compat.c | 14 +++--- fs/fcntl.c| 12 ++-- fs/locks.c| 14 +++--- include/uapi/asm-generic/fcntl.h | 20 ++-- security/selinux/hooks.c | 6 +++--- 6 files changed, 36 insertions(+), 36 deletions(-) diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c index 702bd329d9d0..e90a3148f385 100644 --- a/arch/arm/kernel/sys_oabi-compat.c +++ b/arch/arm/kernel/sys_oabi-compat.c @@ -203,9 +203,9 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd, int ret; switch (cmd) { - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_OFD_GETLK: + case F_OFD_SETLK: + case F_OFD_SETLKW: case F_GETLK64: case F_SETLK64: case F_SETLKW64: diff --git a/fs/compat.c b/fs/compat.c index ca926ad0430c..66d3d3c6b4b2 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -457,9 +457,9 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, case F_GETLK64: case F_SETLK64: case F_SETLKW64: - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_OFD_GETLK: + case F_OFD_SETLK: + case F_OFD_SETLKW: ret = get_compat_flock64(f, compat_ptr(arg)); if (ret != 0) break; @@ -468,7 +468,7 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, conv_cmd = convert_fcntl_cmd(cmd); ret = sys_fcntl(fd, conv_cmd, (unsigned long)f); set_fs(old_fs); - if ((conv_cmd == F_GETLK || conv_cmd == F_GETLKP) ret == 0) { + if ((conv_cmd == F_GETLK || conv_cmd == F_OFD_GETLK) ret == 0) { /* need to return lock information - see above for commentary */ if (f.l_start COMPAT_LOFF_T_MAX) ret = -EOVERFLOW; @@ -493,9 +493,9 @@ COMPAT_SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, case F_GETLK64: case F_SETLK64: case F_SETLKW64: - case F_GETLKP: - case F_SETLKP: - case F_SETLKPW: + case F_OFD_GETLK: + case F_OFD_SETLK: + case F_OFD_SETLKW: return -EINVAL; } return compat_sys_fcntl64(fd, cmd, arg); diff --git a/fs/fcntl.c b/fs/fcntl.c index 9ead1596399a..72c82f69b01b 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -274,15 +274,15 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, break; #if BITS_PER_LONG != 32 /* 32-bit arches must use fcntl64() */ - case F_GETLKP: + case F_OFD_GETLK: #endif case F_GETLK: err = fcntl_getlk(filp, cmd, (struct flock __user *) arg); break; #if BITS_PER_LONG != 32 /* 32-bit arches must use fcntl64() */ - case F_SETLKP: - case F_SETLKPW: + case F_OFD_SETLK: + case F_OFD_SETLKW: #endif /* Fallthrough */ case F_SETLK: @@ -399,13 +399,13 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int
Re: [PATCH] locks: rename file-private locks to open file description locks
On Tue, 22 Apr 2014 17:45:31 +0300 Boaz Harrosh open...@gmail.com wrote: On 04/22/2014 03:23 PM, Jeff Layton wrote: We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as open file description locks. I completely agree with the rename. (Though could you please post the rest of the rename patches for review) Just a very small nit. My native language is not English but I would rather you use file-descriptor (with an '-' as well) and not use description in the English name of the lock. This is because stated like that, description might refer to the locks and not to the file in the sentence. file-descriptor is more clear I think. (For me it was confusing at first before I realized what you meant) Just my $0.017 Thanks There's a big difference between the descriptor and the description. The numerical value you get back from something like open() is a file descriptor. The thing that that value points to internally in the kernel is the file description. It's very important that we do not conflate the two here as these locks are associated with the file description and not the file descriptor. The best way to illustrate this is the interaction with dup() -- see the LWN article on these for a complete overview. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][glibc PATCH] fcntl-linux.h: add new definitions and manual updates for open file description locks
(I haven't yet sent the patch to rename these to Linus, but here's a new draft version of the glibc header/manual updates for open file description locks. Please speak up if you see any issues or object to the new name.) Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-23 Jeff Layton jlay...@redhat.com [BZ#16839] * manual/llio.texi: add section about open file description locks * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 75 + manual/llio.texi | 237 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 +++ 3 files changed, 326 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..544001769272 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,75 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start(void *arg) +{ + int i, fd, len; + long tid = (long)arg; + char buf[256]; + struct flock lck = { + .l_whence = SEEK_SET, + .l_start= 0, + .l_len = 1, + }; + + fd = open(/tmp/foo, O_RDWR|O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) { + lck.l_type = F_WRLCK; + fcntl(fd, F_OFD_SETLKW, lck); + + len = sprintf(buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek(fd, 0, SEEK_END); + write(fd, buf, len); + fsync(fd); + + lck.l_type = F_UNLCK; + fcntl(fd, F_OFD_SETLK, lck); + + usleep(1); + } + pthread_exit(NULL); +} + +int +main(int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate(FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) + pthread_create(threads[i], NULL, thread_start, (void *)i); + + pthread_exit(NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..b5c2d07ad929 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test a open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear a file lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but wait for completion. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO
Re: [RFC][glibc PATCH] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Wed, 23 Apr 2014 11:09:51 -0400 a...@gnu.org (Alfred M. Szmidt) wrote: @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. F_GETLK does get the (first) lock which blocks; it doesn't test for it. Retrieves information about the first blocking lock ... or something might be better than the original @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test a open file description lock. @xref{Open File Description Locks}. +Specific to Linux. Likewise. You infact write that it does get the lock information later in the document wrt. F_OFD_GETLK. Sorry, I disagree here...GETLK is really a misnomer, IMO. TESTLK would have been a better name. It's a way to test whether a particular lock can be applied, and to return information about a conflicting lock if it can't. If, for instance there is no conflicting lock, then you don't get any lock information back (l_type just gets reset to F_UNLCK). -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] fcntl.2: update manpage with verbiage about file-private locks
On Tue, 15 Apr 2014 22:08:50 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: Hello Jeff, On Thu, Jan 9, 2014 at 3:23 PM, Jeff Layton jlay...@redhat.com wrote: Please don't merge this yet, as the kernel patches are still a work in progress... Now that this has hit mainline, is this man page patch still current? Cheers, Michael No, it needs a bit of a revision. I'm sorting through the glibc patches now, and will plan to send a respin of this once that's complete. Thanks! Signed-off-by: Jeff Layton jlay...@redhat.com --- man2/fcntl.2 | 97 ++-- 1 file changed, 94 insertions(+), 3 deletions(-) diff --git a/man2/fcntl.2 b/man2/fcntl.2 index 72dcd7b..74c67b6 100644 --- a/man2/fcntl.2 +++ b/man2/fcntl.2 @@ -208,7 +208,7 @@ struct flock { off_t l_start; /* Starting offset for lock */ off_t l_len; /* Number of bytes to lock */ pid_t l_pid; /* PID of process blocking our lock -(F_GETLK only) */ +(F_GETLK and F_GETLKP only) */ ... }; .fi @@ -344,9 +344,13 @@ returns details about one of these locks in the .IR l_type , l_whence , l_start , and l_len fields of .I lock -and sets +. +If the conflicting lock is a traditional POSIX lock, then the .I l_pid -to be the PID of the process holding that lock. +will be set to the PID of the process holding that lock. If the +conflicting lock is a file-private lock, then the +.I l_pid +will be set to -1. .P In order to place a read lock, .I fd @@ -386,6 +390,93 @@ should be avoided; use and .BR write (2) instead. +.SS File-private locking +(Currently non-POSIX, but being proposed) +.PP +.BR F_GETLKP , F_SETLKP and F_SETLKPW +are used to acquire, release and test file-private record locks. These +are byte-range locks that work identically to the traditional advisory +record locks described above, but are associated with the open file on +which they were acquired rather than the process, much like locks +acquired with +.BR flock (2) +. +.PP +Unlike traditional advisory record locks, these locks are inherited +across +.BR fork (2) , dup (2) and dup2 (2) +and are only released on the last close of the open file instead of being +released on any close of the file. +.PP +File-private locks always conflict with traditional record locks, even +when they are acquired by the same process on the same file descriptor. +They only conflict with each other when they are acquired on different +open file descriptors. +.TP +.BR F_SETLKP (\fIstruct flock *\fP) +Acquire a lock (when +.I l_type +is +.B F_RDLCK +or +.BR F_WRLCK ) +or release a lock (when +.I l_type +is +.BR F_UNLCK ) +on the bytes specified by the +.IR l_whence , l_start , and l_len +fields of +.IR lock . +If a conflicting lock is held by another process, +this call returns \-1 and sets +.I errno +to +.B EACCES +or +.BR EAGAIN . +.TP +.BR F_SETLKPW (\fIstruct flock *\fP) +As for +.BR F_SETLKP , +but if a conflicting lock is held on the file, then wait for that +lock to be released. +If a signal is caught while waiting, then the call is interrupted +and (after the signal handler has returned) +returns immediately (with return value \-1 and +.I errno +set to +.BR EINTR ; +see +.BR signal (7)). +.TP +.BR F_GETLKP (\fIstruct flock *\fP) +On input to this call, +.I lock +describes a lock we would like to place on the file. +If the lock could be placed, +.BR fcntl () +does not actually place it, but returns +.B F_UNLCK +in the +.I l_type +field of +.I lock +and leaves the other fields of the structure unchanged. +If one or more incompatible locks would prevent +this lock being placed, then +.BR fcntl () +returns details about one of these locks in the +.IR l_type , l_whence , l_start , and l_len +fields of +.I lock +. +If the conflicting lock is a traditional POSIX lock, then the +.I l_pid +will be set to the PID of the process holding that lock. If the +conflicting lock is a file-private lock, then the +.I l_pid +will be set to -1. .SS Mandatory locking (Non-POSIX.) The above record locks may be either advisory or mandatory, -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-man in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFC 00/19] Support loop-back NFS mounts
holding sk_lock FS: set PF_FSTRANS while holding mmap_sem in exec.c NET: set PF_FSTRANS while holding rtnl_lock MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock. driver core: set PF_FSTRANS while holding gdp_mutex nfsd: set PF_FSTRANS when client_mutex is held. VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc. VFS: set PF_FSTRANS while namespace_sem is held. nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc. XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks drivers/base/core.c |3 ++ drivers/base/power/runtime.c|6 ++--- drivers/block/nbd.c |6 ++--- drivers/md/dm-bufio.c |6 ++--- drivers/md/dm-ioctl.c |6 ++--- drivers/mtd/nand/nandsim.c | 28 ++--- drivers/scsi/iscsi_tcp.c|6 ++--- drivers/usb/core/hub.c |6 ++--- fs/dcache.c |4 ++- fs/exec.c |6 + fs/fs-writeback.c |5 ++-- fs/namespace.c |4 +++ fs/nfs/file.c |3 +- fs/nfsd/nfs4callback.c |5 fs/nfsd/nfs4state.c |3 ++ fs/nfsd/nfssvc.c| 24 ++ fs/nfsd/vfs.c |6 + fs/xfs/kmem.h |2 -- fs/xfs/xfs_aops.c |7 - fs/xfs/xfs_bmap_util.c |4 +++ fs/xfs/xfs_file.c | 12 + fs/xfs/xfs_linux.h |7 - include/linux/lockdep.h |8 +++--- include/linux/sched.h | 32 +--- include/linux/sunrpc/svc.h |2 ++ include/linux/sunrpc/svc_xprt.h |1 + include/net/sock.h |1 + kernel/locking/lockdep.c| 51 --- kernel/softirq.c|6 ++--- mm/migrate.c|9 +++ mm/page-writeback.c |3 ++ mm/page_alloc.c | 18 -- mm/percpu.c |4 +++ mm/slab.c |2 ++ mm/slob.c |2 ++ mm/slub.c |1 + mm/vmscan.c | 31 +++- net/core/dev.c |6 ++--- net/core/rtnetlink.c|9 ++- net/core/sock.c |8 -- net/sunrpc/sched.c |5 ++-- net/sunrpc/svc.c|6 + net/sunrpc/svcsock.c| 10 net/sunrpc/xprtrdma/transport.c |5 ++-- net/sunrpc/xprtsock.c | 17 - 45 files changed, 247 insertions(+), 149 deletions(-) -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 05/19] SUNRPC: track whether a request is coming from a loop-back interface.
On Wed, 16 Apr 2014 14:03:36 +1000 NeilBrown ne...@suse.de wrote: If an incoming NFS request is coming from the local host, then nfsd will need to perform some special handling. So detect that possibility and make the source visible in rq_local. Signed-off-by: NeilBrown ne...@suse.de --- include/linux/sunrpc/svc.h |1 + include/linux/sunrpc/svc_xprt.h |1 + net/sunrpc/svcsock.c| 10 ++ 3 files changed, 12 insertions(+) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 04e763221246..a0dbbd1e00e9 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -254,6 +254,7 @@ struct svc_rqst { u32 rq_prot;/* IP protocol */ unsigned short rq_secure : 1; /* secure port */ + unsigned short rq_local : 1; /* local request */ void * rq_argp;/* decoded arguments */ void * rq_resp;/* xdr'd results */ diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index b05963f09ebf..b99bdfb0fcf9 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -63,6 +63,7 @@ struct svc_xprt { #define XPT_DETACHED10 /* detached from tempsocks list */ #define XPT_LISTENER 11 /* listening endpoint */ #define XPT_CACHE_AUTH 12 /* cache auth info */ +#define XPT_LOCAL13 /* connection from loopback interface */ struct svc_serv *xpt_server;/* service for transport */ atomic_txpt_reserved; /* space on outq that is rsvd */ diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index b6e59f0a9475..193115fe968c 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -811,6 +811,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt) struct socket *newsock; struct svc_sock *newsvsk; int err, slen; + struct dst_entry *dst; RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]); dprintk(svc: tcp_accept %p sock %p\n, svsk, sock); @@ -867,6 +868,14 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt) } svc_xprt_set_local(newsvsk-sk_xprt, sin, slen); + clear_bit(XPT_LOCAL, newsvsk-sk_xprt.xpt_flags); + rcu_read_lock(); + dst = rcu_dereference(newsock-sk-sk_dst_cache); + if (dst dst-dev + (dst-dev-features NETIF_F_LOOPBACK)) + set_bit(XPT_LOCAL, newsvsk-sk_xprt.xpt_flags); + rcu_read_unlock(); + In the use-case you describe, the client is unlikely to have mounted localhost, but is more likely to be mounting a floating address in the cluster. Will this flag end up being set in such a situation? It looks like NETIF_F_LOOPBACK isn't set on all adapters -- mostly on lo and a few others that support MAC-LOOPBACK. if (serv-sv_stats) serv-sv_stats-nettcpconn++; @@ -1112,6 +1121,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) rqstp-rq_xprt_ctxt = NULL; rqstp-rq_prot= IPPROTO_TCP; + rqstp-rq_local = !!test_bit(XPT_LOCAL, svsk-sk_xprt.xpt_flags); p = (__be32 *)rqstp-rq_arg.head[0].iov_base; calldir = p[1]; -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
should we change the name/macros of file-private locks?
Sorry to spam so many lists, but I think this needs widespread distribution and consensus. File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. They do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. Michael Kerrisk suggested several names but I think the only one that doesn't have other issues is file-associated locks, which can be distinguished against process-associated locks (aka classic POSIX locks). At the same time, he suggested that we rename the command macros since the 'P' suffix would no longer be relevant. He suggested something like this: F_FA_GETLK F_FA_SETLK F_FA_SETLKW That would also make them more visually distinguishable from the classic F_GETLK/F_SETLK/F_SETLKW commands. I like that change in particular, as the original macros names would be easy to typo. I think we'd also need to rename how these are reported in /proc/locks. Currently they have a type label of FLPVT. I'd suggest that we change that to FASSOC. For v3.15, this is the only part we'd absolutely have to change before it ships. The rest I can fix up in v3.16. Does this sound like a reasonable set of changes to make? Does anyone else have a better set of names they can suggest? Speak now, or forever hold your peace! -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: should we change the name/macros of file-private locks?
On Thu, 17 Apr 2014 00:42:13 +0200 Stefan (metze) Metzmacher me...@samba.org wrote: Am 16.04.2014 22:00, schrieb Michael Kerrisk (man-pages): [CC += Jeremy Allison] On Wed, Apr 16, 2014 at 8:57 PM, Jeff Layton jlay...@redhat.com wrote: Sorry to spam so many lists, but I think this needs widespread distribution and consensus. File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. They do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. So, to add my perspective: The existing byte-range locking system has persisted (despite egregious faults) for well over two decades. One supposes that Jeff's new improved version might be around at least as long. With that in mind, and before setting in stone (and pushing into POSIX) a model of thinking that thousands of programmers will live with for a long time, it's worth thinking about names. Michael Kerrisk suggested several names but I think the only one that doesn't have other issues is file-associated locks, which can be distinguished against process-associated locks (aka classic POSIX locks). The names I have suggested are: file-associated locks or file-handle locks or (using POSIX terminology) file-description locks I'd use file-handle, file-description or at least something that's not associated to the file itself. file-handle-associated or file-description-associated would also work. Yeah, I understand your point. I'm not keen on using file-handle as file handles have a completely different meaning in the context of something like NFS. file-description-associated is rather a mouthful. You Germans might go for that sort of thing, but it feels awkward to us knuckle-draggers that primarily speak English. Maybe we should just go with one of Michael's earlier suggestions -- file-description locks and change the macros to F_FD_*. In the docs we could take pains to point out that these are file-_description_ locks and not file-_descriptor_ locks, and outline why that is so (which is something I'm trying to make crystal clear in the docs anyway). Does anyone object to that? but that last might be a bit confusing to people who are not standards-aware. (The POSIX standard defines the thing that a file descriptor refers to as a file description; other people often call that thing a file handle or an open file table entry or a struct file. The POSIX term is precise and unambiguous, but, unfortunately, the term is not common outside the standard and is also easily mistaken for file descriptor.) At the same time, he suggested that we rename the command macros since the 'P' suffix would no longer be relevant. He suggested something like this: F_FA_GETLK F_FA_SETLK F_FA_SETLKW With file-description-associated this could be F_FDA_* metze -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: should we change the name/macros of file-private locks?
On Thu, 17 Apr 2014 22:08:54 +0200 Michael Kerrisk (man-pages) mtk.manpa...@gmail.com wrote: In the docs we could take pains to point out that these are file-_description_ locks and not file-_descriptor_ locks, and outline why that is so (which is something I'm trying to make crystal clear in the docs anyway). Does anyone object to that? Well, I'd be silly to object, but maybe we should still allow a day for further comment? Jeff, One further point. I know the intent is to get this scheme into POSIX. Have any conversations happened about this so far on the POSIX/Austin lists? If yes, it might be worth also linking those folks into the naming discussion.. Cheers, Michael Yes, good idea. The open bug is here: http://austingroupbugs.net/view.php?id=768 I'll write something up there and see if they want to chime in on the discussion. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Thoughts on credential switching
On Sun, 30 Mar 2014 09:03:29 -0400 Theodore Ts'o ty...@mit.edu wrote: On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: I had some time to think about this last night... While using a fd to pass around credentials is convenient, the danger is that it's pretty opaque. You have a fd that you know has creds attached to it, but it's hard to be certain what is going to change. I don't think that's a particularly tough problem. In general, the fd isn't something that you would want to pass around, and so the process which generated it will know exactly what it contained. I think there's a bit more of a use-case for passing around such an fd via socket... Part of the problem is that the traditional uid/gid switching glibc wrappers are per-process. If we're proposing doing something like: seteuid() setegid() setgroups() fd = open() (...and then revert the creds using same syscalls) ...during the time that you're doing all of that, you can't really allow any thread in the process to be doing something that requires _other_ creds until you've completed the above. So, I could envision a program like ganesha firing up a separate process to handle the credential switching and fd creation and then handing those back to the main process via a unix domain socket. Perhaps we can use the flags field for that. So, assuming we have a fd with the creds attached, we could do something like: err = switch_creds(fd, SC_FSUID|SC_FSGID|SC_GROUPS); ...then the switch_creds syscall could be set up to fail if the new credentials had other fields that didn't match those in the current task credentials. So if (for instance) the cred-euid were different between the two, the above could fail with -EINVAL or something. Huh? The whole *point* is that the creds value will be different, of course they won't match! I would think this would be over complicating the interface. You know that *some* of the credentials won't match, but presumably some of them (e.g. the real uid/gid) should match. Using a flags field could act as a mask of what creds are allowed to change. A couple of other things. What I would suggest is that we create a few new fd flags, to join FD_CLOEXEC: FD_NOPROCFS disallow being able to open the inode via /proc/pid/fd (but in the case of a creds fd, for bonus points, the target of the pseudo-symlink could be something like: uid: 15806 gid: 100: grps: 27, 50 to aid in debugging a userspace file server). This also answers Jeff's concern if for some reason --- I don't know how --- a process doesn't know what the contents of the creds fd that it created itself. FD_NOPASSFD disallow being able to pass the fd via a unix domain socket FD_LOCKFLAGS if this bit is set, disallow any further changes of FD_CLOEXEC, FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags. Some of the functionality requested by the folks suggesting the SEAL API would also be covered by these fd flags. In order to solve some potential race concerns, a credsfd must be created with FD_CLOEXEC and FD_NOPROCFS enabled. Why is this important even if the anon_inode is owned by root with a mode of 0? Because if the system is set up to use SELinux or full Posix capabilities, merely having the a uid of 0 is not special, and we don't want to allow a process with uid of 0 to be able modify the mode with the /proc/pid/fd/FD and then proceed to open the inode using open. This way, instead of adding special case code to prevent this from happening, we can add a more general facility which can be used to solve a few other problems. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] please pull file locking changes for 3.15
The following changes since commit 29723adee11804b548903ddb1db666cf4a60f60e: locks: make locks_mandatory_area check for file-private locks (2014-03-31 08:24:43 -0400) are available in the git repository at: git://git.samba.org/jlayton/linux.git locks-3.15 for you to fetch changes up to 29723adee11804b548903ddb1db666cf4a60f60e: locks: make locks_mandatory_area check for file-private locks (2014-03-31 08:24:43 -0400) (from the branch description for locks-3.15 local branch) file locking patches for 3.15 Highlights: - maintainership change for fs/locks.c. Willy's not interested in maintaining it these days, and is OK with Bruce and I taking it. - fix for open vs. setlease race that Al ID'ed - cleanup and consolidation of file locking code - eliminate unneeded BUG() call - merge of file-private lock implementation -- Jeff Layton jlay...@redhat.com signature.asc Description: PGP signature
Re: Thoughts on credential switching
On Mon, 31 Mar 2014 14:06:01 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Mar 31, 2014, at 7:51, Jeff Layton jlay...@redhat.com wrote: On Sun, 30 Mar 2014 09:03:29 -0400 Theodore Ts'o ty...@mit.edu wrote: On Thu, Mar 27, 2014 at 07:08:02AM -0700, Jeff Layton wrote: I had some time to think about this last night... While using a fd to pass around credentials is convenient, the danger is that it's pretty opaque. You have a fd that you know has creds attached to it, but it's hard to be certain what is going to change. I don't think that's a particularly tough problem. In general, the fd isn't something that you would want to pass around, and so the process which generated it will know exactly what it contained. I think there's a bit more of a use-case for passing around such an fd via socket... Part of the problem is that the traditional uid/gid switching glibc wrappers are per-process. If we're proposing doing something like: seteuid() setegid() setgroups() fd = open() (...and then revert the creds using same syscalls) ...during the time that you're doing all of that, you can't really allow any thread in the process to be doing something that requires _other_ creds until you've completed the above. Umm… open() isn’t the only operation that you want to be able to do with an assumed user identity. You want mknod(), mkdir(), link(), unlink(), … Pretty much any interaction with the underlying filesystem needs to use the right identity. The proposal that Al originally had was to have userland set up the credentials that it wanted to use, and then open(/dev/null, ...) to get a fd that has those creds attached. Then, we'd add a syscall that takes such an fd and switches to the creds attached to it. So the open() in the above example isn't to do an actual operation but rather to create a fd to act as a credential handle. So, I could envision a program like ganesha firing up a separate process to handle the credential switching and fd creation and then handing those back to the main process via a unix domain socket. How about using the keyrings interface to atomically cache and retrieve these user identities? We already have support for different types of keys that store/retrieve different types of structured information. How is this so different? That's an interesting idea. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH resend #4] fcntl-linux.h: add new definitions and manual updates for open file description locks
From: Jeff Layton jlay...@poochiereds.net Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated
[PATCH resend #3] fcntl-linux.h: add new definitions and manual updates for open file description locks
From: Jeff Layton jlay...@poochiereds.net Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated
[GIT PULL] please pull file locking bugfixes for v3.16 (pile #2)
The following changes since commit 64b2d1fbbfda07765dae3f601862796a61b2c451: Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs (2014-06-09 19:11:44 -0700) are available in the git repository at: git://git.samba.org/jlayton/linux.git tags/locks-v3.16-2 for you to fetch changes up to 0c27362998a8357f199501aa401e99c51c2eb46e: locks: set fl_owner for leases back to current-files (2014-06-10 12:29:05 -0400) (from the branch description for locks-3.16 local branch) File locking related bugfixes for v3.16 (pile #2) Nothing too earth-shattering here. A fix for a potential regression due to a patch in pile #1, and the addition of a memory barrier to prevent a race condition between break_deleg and generic_add_lease. Jeff Layton (2): locks: add missing memory barrier in break_deleg locks: set fl_owner for leases back to current-files fs/locks.c | 2 +- include/linux/fs.h | 6 ++ 2 files changed, 7 insertions(+), 1 deletion(-) -- Jeff Layton jlay...@poochiereds.net signature.asc Description: PGP signature
Re: [PATCH v4] fs: umount on symlink leaks mnt count
On Mon, 21 Jul 2014 12:30:23 +0400 Vasily Averin v...@parallels.com wrote: v4: description corrected v3: patch inline Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al | grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink - /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path-mnt Signed-off-by: Vasily Averin v...@openvz.org --- fs/namei.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 985c6f3..9eb787e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2256,9 +2256,10 @@ done: goto out; } path-dentry = dentry; - path-mnt = mntget(nd-path.mnt); + path-mnt = nd-path.mnt; if (should_follow_link(dentry, nd-flags LOOKUP_FOLLOW)) return 1; + mntget(path-mnt); follow_mount(path); error = 0; out: Looks correct, I think... Acked-by: -- Jeff Layton jlay...@primarydata.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] dlm: Remove unused conf from lm_grant
On Tue, 01 Jul 2014 06:20:10 -0700 Joe Perches j...@perches.com wrote: While doing a bit of adding argument names to fs.h, I looked at lm_grant and it seems the 2nd argument is always NULL. How about removing it? This doesn't apply as it depends on some other patches but it should be clear enough... ACK on the general idea from my standpoint. Anything that simplifies the file locking interfaces is a good thing, particularly the deferred locking code. --- fs/dlm/plock.c | 8 fs/lockd/svclock.c | 12 +++- include/linux/fs.h | 2 +- 3 files changed, 8 insertions(+), 14 deletions(-) diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c index e59d332..e0ab3a9 100644 --- a/fs/dlm/plock.c +++ b/fs/dlm/plock.c @@ -30,7 +30,7 @@ struct plock_op { struct plock_xop { struct plock_op xop; - int (*callback)(struct file_lock *, struct file_lock *, int); + int (*callback)(struct file_lock *fl, int result); void *fl; void *file; struct file_lock flc; @@ -190,7 +190,7 @@ static int dlm_plock_callback(struct plock_op *op) struct file *file; struct file_lock *fl; struct file_lock *flc; - int (*notify)(struct file_lock *fl, struct file_lock *cont, int result) = NULL; + int (*notify)(struct file_lock *fl, int result) = NULL; struct plock_xop *xop = (struct plock_xop *)op; int rv = 0; @@ -209,7 +209,7 @@ static int dlm_plock_callback(struct plock_op *op) notify = xop-callback; if (op-info.rv) { - notify(fl, NULL, op-info.rv); + notify(fl, op-info.rv); goto out; } @@ -228,7 +228,7 @@ static int dlm_plock_callback(struct plock_op *op) (unsigned long long)op-info.number, file, fl); } - rv = notify(fl, NULL, 0); + rv = notify(fl, 0); if (rv) { /* XXX: We need to cancel the fs lock here: */ log_print(dlm_plock_callback: lock granted after lock request diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index ab798a8..2a61701 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -667,22 +667,16 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l * deferred rpc for GETLK and SETLK. */ static void -nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf, - int result) +nlmsvc_update_deferred_block(struct nlm_block *block, int result) { block-b_flags |= B_GOT_CALLBACK; if (result == 0) block-b_granted = 1; else block-b_flags |= B_TIMED_OUT; - if (conf) { - if (block-b_fl) - __locks_copy_lock(block-b_fl, conf); - } } -static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf, - int result) +static int nlmsvc_grant_deferred(struct file_lock *fl, int result) { struct nlm_block *block; int rc = -ENOENT; @@ -697,7 +691,7 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf, rc = -ENOLCK; break; } - nlmsvc_update_deferred_block(block, conf, result); + nlmsvc_update_deferred_block(block, result); } else if (result == 0) block-b_granted = 1; diff --git a/include/linux/fs.h b/include/linux/fs.h index 36b8648..6150125 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -842,7 +842,7 @@ struct lock_manager_operations { int (*lm_compare_owner)(struct file_lock *fl1, struct file_lock *fl2); unsigned long (*lm_owner_key)(struct file_lock *fl); void (*lm_notify)(struct file_lock *fl);/* unblock callback */ - int (*lm_grant)(struct file_lock *fl, struct file_lock *conf, int result); + int (*lm_grant)(struct file_lock *fl, int result); void (*lm_break)(struct file_lock *fl); int (*lm_change)(struct file_lock **fl, int type); }; -- Jeff Layton jlay...@poochiereds.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH resend #5] fcntl-linux.h: add new definitions and manual updates for open file description locks
From: Jeff Layton jlay...@poochiereds.net Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated
Re: [PATCH resend #5] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Tue, 8 Jul 2014 08:16:57 -0400 Jeff Layton jlay...@primarydata.com wrote: From: Jeff Layton jlay...@poochiereds.net Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ...and to follow up on this, I'm really getting quite exasperated. This patch has been done for several months now. There have been no review comments on it since Ondřej's comment about the changelog in late May. Michael K. merged the manpage patches months ago. The only remaining piece is for glibc to merge this patch. Why is it taking so long to get a simple header file and documentation patch merged? ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS 3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file
[PATCH resend #6] fcntl-linux.h: add new definitions and manual updates for open file description locks
From: Jeff Layton jlay...@poochiereds.net Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void *) i); + + pthread_exit (NULL); + return 0; +} diff --git a/manual/llio.texi b/manual/llio.texi index 6f8adfc607d7..864060dc7140 100644 --- a/manual/llio.texi +++ b/manual/llio.texi @@ -57,6 +57,10 @@ directly.) flags associated with open files. * File Locks:: Fcntl commands for implementing file locking. +* Open File Description Locks:: Fcntl commands for implementing + open file description locking. +* Open File Description Locks Example:: An example of open file description lock + usage * Interrupt Input:: Getting an asynchronous signal when input arrives. * IOCTLs:: Generic I/O Control operations. @@ -2890,7 +2894,7 @@ Get flags associated with the open file. @xref{File Status Flags}. Set flags associated with the open file. @xref{File Status Flags}. @item F_GETLK -Get a file lock. @xref{File Locks}. +Test a file lock. @xref{File Locks}. @item F_SETLK Set or clear a file lock. @xref{File Locks}. @@ -2898,6 +2902,18 @@ Set or clear a file lock. @xref{File Locks}. @item F_SETLKW Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. +@item F_OFD_GETLK +Test an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLK +Set or clear an open file description lock. @xref{Open File Description Locks}. +Specific to Linux. + +@item F_OFD_SETLKW +Like @code{F_OFD_SETLK}, but block until lock is acquired. +@xref{Open File Description Locks}. Specific to Linux. + @item F_GETOWN Get process or process group ID to receive @code{SIGIO} signals. @xref{Interrupt Input}. @@ -3576,6 +3592,10 @@ set_nonblock_flag (int desc, int value) @cindex file locks @cindex record locking +This section describes record locks that are associated
[PATCH] net: clean up some sparse endianness warnings in ipv6.h
sparse is throwing warnings when building sunrpc modules due to some endianness shenanigans in ipv6.h. Sprinkle some endianness fixups to silence them. These should all get fixed up at compile time, so I don't think this will add any extra work to be done at runtime. Signed-off-by: Jeff Layton jlay...@primarydata.com --- include/net/ipv6.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 574337fe72dd..5ed2c24fe950 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -557,9 +557,9 @@ static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval) static inline bool ipv6_addr_loopback(const struct in6_addr *a) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - const unsigned long *ul = (const unsigned long *)a; + const __be64 *be = (const __be64 *)a; - return (ul[0] | (ul[1] ^ cpu_to_be64(1))) == 0UL; + return (be[0] | (be[1] ^ cpu_to_be64(1))) == cpu_to_be64(0UL); #else return (a-s6_addr32[0] | a-s6_addr32[1] | a-s6_addr32[2] | (a-s6_addr32[3] ^ htonl(1))) == 0; @@ -570,11 +570,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) { return ( #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - *(__be64 *)a | + (*(__be64 *)a != cpu_to_be64(0)) | #else - (a-s6_addr32[0] | a-s6_addr32[1]) | + ((a-s6_addr32[0] | a-s6_addr32[1]) != cpu_to_be32(0)) | #endif - (a-s6_addr32[2] ^ htonl(0x))) == 0UL; + ((a-s6_addr32[2] ^ htonl(0x)) == cpu_to_be32(0))); } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] nfsd: silence sparse warning about accessing credentials
sparse says: fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces) fs/nfsd/auth.c:31:38:expected struct cred const *cred fs/nfsd/auth.c:31:38:got struct cred const [noderef] asn:4*real_cred Add a new accessor for the -real_cred and use that to fetch the pointer. Accessing current-real_cred directly is actually quite safe since we know that they can't go away so this is mostly a cosmetic fixup to silence sparse. Signed-off-by: Jeff Layton jlay...@primarydata.com --- fs/nfsd/auth.c | 2 +- include/linux/cred.h | 9 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c index 72f44823adbb..9d46a0bdd9f9 100644 --- a/fs/nfsd/auth.c +++ b/fs/nfsd/auth.c @@ -28,7 +28,7 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp) validate_process_creds(); /* discard any old override before preparing the new set */ - revert_creds(get_cred(current-real_cred)); + revert_creds(get_cred(current_real_cred())); new = prepare_creds(); if (!new) return -ENOMEM; diff --git a/include/linux/cred.h b/include/linux/cred.h index f61d6c8f5ef3..b2d0820837c4 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -259,6 +259,15 @@ static inline void put_cred(const struct cred *_cred) rcu_dereference_protected(current-cred, 1) /** + * current_real_cred - Access the current task's objective credentials + * + * Access the objective credentials of the current task. RCU-safe, + * since nobody else can modify it. + */ +#define current_real_cred() \ + rcu_dereference_protected(current-real_cred, 1) + +/** * __task_cred - Access a task's objective credentials * @task: The task to query * -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net: clean up some sparse endianness warnings in ipv6.h
On Tue, 15 Jul 2014 10:01:07 -0700 Christoph Hellwig h...@infradead.org wrote: #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - const unsigned long *ul = (const unsigned long *)a; + const __be64 *be = (const __be64 *)a; - return (ul[0] | (ul[1] ^ cpu_to_be64(1))) == 0UL; + return (be[0] | (be[1] ^ cpu_to_be64(1))) == cpu_to_be64(0UL); Do you need the swap for 0UL? I know sparse treats 0 as special, so why wouldn't it treat 0UL special? Or just remove the 0UL postfix, no need for it in a simple comparism. Otherwise looks fine to me. Maybe not, I did it for completeness sake. I'll see if I can remove that. The macros do the conversion at compile time though so it shouldn't hurt anything either way. -- Jeff Layton jlay...@primarydata.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net: clean up some sparse endianness warnings in ipv6.h
On Mon, 14 Jul 2014 08:25:46 -0400 Jeff Layton jlay...@primarydata.com wrote: sparse is throwing warnings when building sunrpc modules due to some endianness shenanigans in ipv6.h. Sprinkle some endianness fixups to silence them. These should all get fixed up at compile time, so I don't think this will add any extra work to be done at runtime. Signed-off-by: Jeff Layton jlay...@primarydata.com --- include/net/ipv6.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 574337fe72dd..5ed2c24fe950 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -557,9 +557,9 @@ static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval) static inline bool ipv6_addr_loopback(const struct in6_addr *a) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - const unsigned long *ul = (const unsigned long *)a; + const __be64 *be = (const __be64 *)a; - return (ul[0] | (ul[1] ^ cpu_to_be64(1))) == 0UL; + return (be[0] | (be[1] ^ cpu_to_be64(1))) == cpu_to_be64(0UL); #else return (a-s6_addr32[0] | a-s6_addr32[1] | a-s6_addr32[2] | (a-s6_addr32[3] ^ htonl(1))) == 0; @@ -570,11 +570,11 @@ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) { return ( #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - *(__be64 *)a | + (*(__be64 *)a != cpu_to_be64(0)) | #else - (a-s6_addr32[0] | a-s6_addr32[1]) | + ((a-s6_addr32[0] | a-s6_addr32[1]) != cpu_to_be32(0)) | #endif - (a-s6_addr32[2] ^ htonl(0x))) == 0UL; + ((a-s6_addr32[2] ^ htonl(0x)) == cpu_to_be32(0))); } /* Oof. I think I had the ipv6_addr_v4mapped changes wrong before due to misreading the code. The existing code basically takes the different values, does a bunch of bitwise operations on them and then compares the result to 0. My patch had it doing more than one comparison. I've got a fixed version, but it does mean that we need to use a __force cast in one place. I'll resend once I've tested it our some more. Sorry for the noise! -- Jeff Layton jlay...@primarydata.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 00/13] locks: implement filp-private (aka UNPOSIX) locks
This patchset is the fourth posting of this set. Behavior between this set and the last should be more or less the same. Here is a summary of changes: v4: - prefixed the set with the rest of the locks.c patches I have. - added patch to get rid of BUG() call in locks_remove_flock. I think a WARN + delete the lock is sufficient there. - added patches from Bruce to consolidate the struct flock/flock64 to file_lock conversion code - fixed locks_remove_file not to assume that filp-private locks won't be added on filesystems that have a -lock method. v3: - more consolidation of common code between flock_to_posix_lock and flock_to_posix_lock64 - better bisectability by reordering changes, such that partial implementation won't be exposed - s/filp/file/ s/FILP/FILE/ in symbol names v2: - inheritance semantics have been change to be more BSD-lock like - patchset size has been reduced by changing how lock ownership is handled - new F_UNLCKP l_type value has been added On the last set, I made note that we could consider implementing this with new cmd values instead of new l_type values. That's still doable but I haven't made that change in this set. I'm still open to that change, but I'd like to hear from others as to which they'd prefer. Note too that I've gone ahead and opened a request for the POSIX folks to consider adding this to the POSIX spec once we have something mergeable. They seem amenable to the idea but don't want to enshrine it into the standard until there's a real implementation of it: http://austingroupbugs.net/view.php?id=768 I also owe them a better writeup of the semantics for these new locks, but have been holding off on doing that until they're a little more settled. Original cover letter from v1 posting follows. Comments and suggestions welcome. ---[snip]-- At LSF this year, there was a discussion about the wishlist for userland file servers. One of the things brought up was the goofy and problematic behavior of POSIX locks when a file is closed. Boaz started a thread on it here: http://permalink.gmane.org/gmane.linux.file-systems/73364 Userland fileservers often need to maintain more than one open file descriptor on a file. The POSIX spec says: All locks associated with a file for a given process shall be removed when a file descriptor for that file is closed by that process or the process holding that file descriptor terminates. This is problematic since you can't close any file descriptor without dropping all your POSIX locks. Most userland file servers therefore end up opening the file with more access than is really necessary, and keeping fd's open for longer than is necessary to work around this. This patchset is a first stab at an approach to address this problem by adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short for private -- I'm open to changing that if you have a better mnemonic). For all intents and purposes these lock types act just like their non-P counterpart. The difference is that they are only implicitly released when the fd against which they were acquired is closed. As a side effect, these locks cannot be merged with non-P locks since they have different semantics on close. I've given this patchset some very basic smoke testing and it seems to do the right thing, but it is still pretty rough. If this looks reasonable I'll plan to do some documentation updates and will take a stab at trying to get these new lock types added to the POSIX spec (as HCH recommended). At this point, my main questions are: 1) does this look useful, particularly for fileserver implementors? 2) does this look OK API-wise? We could consider different cmd values or even different syscalls, but I figured this makes it clearer that P and non-P locks will still conflict with one another. J. Bruce Fields (2): locks: consolidate common code in the flock_to_posix_lock routines locks: simplify overflow checking Jeff Layton (11): locks: close potential race between setlease and open locks: clean up comment typo locks: remove inline qualifier from fl_link manipulation functions locks: add __acquires and __releases annotations to locks_start and locks_stop locks: eliminate BUG() call when there's an unexpected lock on file close locks: consolidate checks for compatible filp-f_mode values in setlk handlers locks: don't reference original flock struct in F_GETLK handlers locks: rename locks_remove_flock to locks_remove_file locks: show private lock types in /proc/locks locks: report l_pid as -1 for FL_FILE_PVT locks locks: add new private lock type that is owned by the filp fs/file_table.c | 2 +- fs/locks.c | 321 ++- include/linux/fs.h | 11 +- include/uapi/asm-generic/fcntl.h | 19 ++- 4 files changed, 204 insertions(+), 149 deletions
[PATCH v4 01/13] locks: close potential race between setlease and open
v2: - fix potential double-free of lease if second check finds conflict - add smp_mb's to ensure that other CPUs see i_flock changes v3: - remove smp_mb calls. Partial ordering is unlikely to help here. v4: - add back smp_mb calls. While we have implicit barriers in place that enforce this today, it's best to be explicit about it as a defensive coding measure. As Al Viro points out, there is an unlikely, but possible race between opening a file and setting a lease on it. generic_add_lease is done with the i_lock held, but the inode-i_flock check in break_lease is lockless. It's possible for another task doing an open to do the entire pathwalk and call break_lease between the point where generic_add_lease checks for a conflicting open and adds the lease to the list. If this occurs, we can end up with a lease set on the file with a conflicting open. To guard against that, check again for a conflicting open after adding the lease to the i_flock list. If the above race occurs, then we can simply unwind the lease setting and return -EAGAIN. Because we take dentry references and acquire write access on the file before calling break_lease, we know that if the i_flock list is empty when the open caller goes to check it then the necessary refcounts have already been incremented. Thus the additional check for a conflicting open will see that there is one and the setlease call will fail. Cc: Bruce Fields bfie...@fieldses.org Cc: David Howells dhowe...@redhat.com Cc: Paul E. McKenney paul...@linux.vnet.ibm.com Reported-by: Al Viro v...@zeniv.linux.org.uk Signed-off-by: Jeff Layton jlay...@redhat.com Signed-off-by: J. Bruce Fields bfie...@redhat.com --- fs/locks.c | 75 -- include/linux/fs.h | 6 + 2 files changed, 68 insertions(+), 13 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 92a0f0a..2cfeea6 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl) locks_insert_global_locks(fl); } -/* - * Delete a lock and then free it. - * Wake up processes that are blocked waiting for this lock, - * notify the FS that the lock has been cleared and - * finally free the lock. +/** + * locks_delete_lock - Delete a lock and then free it. + * @thisfl_p: pointer that points to the fl_next field of the previous + * inode-i_flock list entry + * + * Unlink a lock from all lists and free the namespace reference, but don't + * free it yet. Wake up processes that are blocked waiting for this lock and + * notify the FS that the lock has been cleared. * * Must be called with the i_lock held! */ -static void locks_delete_lock(struct file_lock **thisfl_p) +static void locks_unlink_lock(struct file_lock **thisfl_p) { struct file_lock *fl = *thisfl_p; @@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p) } locks_wake_up_blocks(fl); +} + +/* + * Unlink a lock from all lists and free it. + * + * Must be called with i_lock held! + */ +static void locks_delete_lock(struct file_lock **thisfl_p) +{ + struct file_lock *fl = *thisfl_p; + + locks_unlink_lock(thisfl_p); locks_free_lock(fl); } @@ -1472,6 +1487,32 @@ int fcntl_getlease(struct file *filp) return type; } +/** + * check_conflicting_open - see if the given dentry points to a file that has + * an existing open that would conflict with the + * desired lease. + * @dentry:dentry to check + * @arg: type of lease that we're trying to acquire + * + * Check to see if there's an existing open fd on this file that would + * conflict with the lease we're trying to set. + */ +static int +check_conflicting_open(const struct dentry *dentry, const long arg) +{ + int ret = 0; + struct inode *inode = dentry-d_inode; + + if ((arg == F_RDLCK) (atomic_read(inode-i_writecount) 0)) + return -EAGAIN; + + if ((arg == F_WRLCK) ((d_count(dentry) 1) || + (atomic_read(inode-i_count) 1))) + ret = -EAGAIN; + + return ret; +} + static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp) { struct file_lock *fl, **before, **my_before = NULL, *lease; @@ -1499,12 +1540,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp return -EINVAL; } - error = -EAGAIN; - if ((arg == F_RDLCK) (atomic_read(inode-i_writecount) 0)) - goto out; - if ((arg == F_WRLCK) -((d_count(dentry) 1) - || (atomic_read(inode-i_count) 1))) + error = check_conflicting_open(dentry, arg); + if (error) goto out; /* @@ -1549,7 +1586,19 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp goto out
[PATCH v4 13/13] locks: add new private lock type that is owned by the filp
Due to some unfortunate history, POSIX locks have very strange and unhelpful semantics. The thing that usually catches people by surprise is that they are dropped whenever the process closes any file descriptor associated with the inode. This is extremely problematic for people developing file servers that need to implement byte-range locks. Developers often need a lock management facility to ensure that file descriptors are not closed until all of the locks associated with the inode are finished. Additionally, classic POSIX locks are owned by the process. Locks taken between threads within the same process won't conflict with one another, which renders them useless for synchronization between threads. This patchset adds a new type of lock that attempts to address these issues. These locks conflict with classic POSIX read/write locks, but have semantics that are more like BSD locks with respect to inheritance and behavior on close. This is implemented primarily by changing how fl_owner field is set for these locks. Instead of having them owned by the files_struct of the process, they are instead owned by the filp on which they were acquired. Thus, they are inherited across fork() and are only released when the last reference to a filp is put. These new semantics prevent them from being merged with classic POSIX locks, even if they are acquired by the same process. These locks will also conflict with classic POSIX locks even if they are acquired by the same process or on the same file descriptor. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 34 -- include/uapi/asm-generic/fcntl.h | 16 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 013b177..bd2d824 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -381,7 +381,6 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, } else fl-fl_end = OFFSET_MAX; - fl-fl_owner = current-files; fl-fl_pid = current-tgid; fl-fl_file = filp; fl-fl_flags = FL_POSIX; @@ -391,16 +390,45 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, /* Ensure that fl-fl_filp has compatible f_mode */ switch (l-l_type) { case F_RDLCK: + case F_RDLCKP: if (!(filp-f_mode FMODE_READ)) return -EBADF; break; case F_WRLCK: + case F_WRLCKP: if (!(filp-f_mode FMODE_WRITE)) return -EBADF; break; } - return assign_type(fl, l-l_type); + /* +* FL_FILE_PVT locks are owned by the filp upon which they were +* acquired, regardless of what task is dealing with them. Set the +* fl_owner appropriately and flag them as private. +*/ + switch(l-l_type) { + case F_RDLCKP: + fl-fl_owner = (fl_owner_t)filp; + fl-fl_type = F_RDLCK; + fl-fl_flags |= FL_FILE_PVT; + break; + case F_WRLCKP: + fl-fl_owner = (fl_owner_t)filp; + fl-fl_type = F_WRLCK; + fl-fl_flags |= FL_FILE_PVT; + break; + case F_UNLCKP: + fl-fl_owner = (fl_owner_t)filp; + fl-fl_type = F_UNLCK; + fl-fl_flags |= FL_FILE_PVT; + break; + default: + /* Any other POSIX lock is owned by the file_struct */ + fl-fl_owner = current-files; + return assign_type(fl, l-l_type); + } + + return 0; } /* Verify a struct flock and copy it to a struct file_lock as a POSIX @@ -2207,6 +2235,8 @@ void locks_remove_file(struct file *filp) if (!inode-i_flock) return; + locks_remove_posix(filp, (fl_owner_t)filp); + if (filp-f_op-flock) { struct file_lock fl = { .fl_pid = current-tgid, diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index 36025f7..25eb7be 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -151,6 +151,22 @@ struct f_owner_ex { #define F_UNLCK2 #endif +/* + * fd private POSIX locks. + * + * Usually POSIX locks held by a process are released on *any* close and are + * not inherited across a fork(). + * + * These lock types will conflict with normal POSIX locks, but are owned + * by the opened file, not the process. This means that they are inherited + * across fork() like BSD (flock) locks, and they are only released + * automatically when the last reference to the the open file against which + * they were acquired is put. + */ +#define F_RDLCKP 5 +#define F_WRLCKP 6 +#define F_UNLCKP 7 + /* for old implementation of bsd flock () */ #ifndef F_EXLCK #define F_EXLCK4
[PATCH v4 04/13] locks: add __acquires and __releases annotations to locks_start and locks_stop
...to make sparse happy. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/locks.c b/fs/locks.c index 049a144..6084f5a 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2430,6 +2430,7 @@ static int locks_show(struct seq_file *f, void *v) } static void *locks_start(struct seq_file *f, loff_t *pos) + __acquires(blocked_lock_lock) { struct locks_iterator *iter = f-private; @@ -2448,6 +2449,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos) } static void locks_stop(struct seq_file *f, void *v) + __releases(blocked_lock_lock) { spin_unlock(blocked_lock_lock); lg_global_unlock(file_lock_lglock); -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 08/13] locks: consolidate checks for compatible filp-f_mode values in setlk handlers
Move this check into flock64_to_posix_lock instead of duplicating it in two places. This also fixes a minor wart in the code where we continue referring to the struct flock after converting it to struct file_lock. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 46 -- 1 file changed, 12 insertions(+), 34 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 1555b05..820322d 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -387,6 +387,18 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, fl-fl_ops = NULL; fl-fl_lmops = NULL; + /* Ensure that fl-fl_filp has compatible f_mode */ + switch (l-l_type) { + case F_RDLCK: + if (!(filp-f_mode FMODE_READ)) + return -EBADF; + break; + case F_WRLCK: + if (!(filp-f_mode FMODE_WRITE)) + return -EBADF; + break; + } + return assign_type(fl, l-l_type); } @@ -2024,23 +2036,6 @@ again: file_lock-fl_flags |= FL_SLEEP; } - error = -EBADF; - switch (flock.l_type) { - case F_RDLCK: - if (!(filp-f_mode FMODE_READ)) - goto out; - break; - case F_WRLCK: - if (!(filp-f_mode FMODE_WRITE)) - goto out; - break; - case F_UNLCK: - break; - default: - error = -EINVAL; - goto out; - } - error = do_lock_file_wait(filp, cmd, file_lock); /* @@ -2142,23 +2137,6 @@ again: file_lock-fl_flags |= FL_SLEEP; } - error = -EBADF; - switch (flock.l_type) { - case F_RDLCK: - if (!(filp-f_mode FMODE_READ)) - goto out; - break; - case F_WRLCK: - if (!(filp-f_mode FMODE_WRITE)) - goto out; - break; - case F_UNLCK: - break; - default: - error = -EINVAL; - goto out; - } - error = do_lock_file_wait(filp, cmd, file_lock); /* -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 10/13] locks: rename locks_remove_flock to locks_remove_file
This function currently removes leases in addition to flock locks and in a later patch we'll have it deal with a new type of POSIX lock too. Rename it to locks_remove_file to indicate that it removes locks that are associated with a particular struct file, and not just flock locks. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/file_table.c| 2 +- fs/locks.c | 2 +- include/linux/fs.h | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index 5fff903..468543c 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -234,7 +234,7 @@ static void __fput(struct file *file) * in the file cleanup chain. */ eventpoll_release(file); - locks_remove_flock(file); + locks_remove_file(file); if (unlikely(file-f_flags FASYNC)) { if (file-f_op-fasync) diff --git a/fs/locks.c b/fs/locks.c index 8180141..a217c2c 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2197,7 +2197,7 @@ EXPORT_SYMBOL(locks_remove_posix); /* * This function is called on the last close of an open file. */ -void locks_remove_flock(struct file *filp) +void locks_remove_file(struct file *filp) { struct inode * inode = file_inode(filp); struct file_lock *fl; diff --git a/include/linux/fs.h b/include/linux/fs.h index 04be202..51974e0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1012,7 +1012,7 @@ extern struct file_lock * locks_alloc_lock(void); extern void locks_copy_lock(struct file_lock *, struct file_lock *); extern void __locks_copy_lock(struct file_lock *, const struct file_lock *); extern void locks_remove_posix(struct file *, fl_owner_t); -extern void locks_remove_flock(struct file *); +extern void locks_remove_file(struct file *); extern void locks_release_private(struct file_lock *); extern void posix_test_lock(struct file *, struct file_lock *); extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *); @@ -1083,7 +1083,7 @@ static inline void locks_remove_posix(struct file *filp, fl_owner_t owner) return; } -static inline void locks_remove_flock(struct file *filp) +static inline void locks_remove_file(struct file *filp) { return; } -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 11/13] locks: show private lock types in /proc/locks
In a later patch, we'll be adding a new type of lock that's owned by the struct file instead of the files_struct. Those sorts of locks will be flagged with a new IS_FILE_PVT flag. Show private locks in /proc/locks with a 'P' suffix. This does mean that we need to widen the column by one character. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 9 ++--- include/linux/fs.h | 1 + 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index a217c2c..995583b 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -135,6 +135,7 @@ #define IS_POSIX(fl) (fl-fl_flags FL_POSIX) #define IS_FLOCK(fl) (fl-fl_flags FL_FLOCK) #define IS_LEASE(fl) (fl-fl_flags (FL_LEASE|FL_DELEG)) +#define IS_FILE_PVT(fl)(fl-fl_flags FL_FILE_PVT) static bool lease_breaking(struct file_lock *fl) { @@ -2336,13 +2337,15 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl, seq_printf(f, UNKNOWN UNKNOWN ); } if (fl-fl_type LOCK_MAND) { - seq_printf(f, %s , + seq_printf(f, %s , (fl-fl_type LOCK_READ) ? (fl-fl_type LOCK_WRITE) ? RW: READ : (fl-fl_type LOCK_WRITE) ? WRITE : NONE ); + } else if (IS_FILE_PVT(fl)) { + seq_printf(f, %s , fl-fl_type == F_WRLCK ? WRITEP : READP ); } else { - seq_printf(f, %s , - (lease_breaking(fl)) + seq_printf(f, %s , + lease_breaking(fl) ? (fl-fl_type == F_UNLCK) ? UNLCK : READ : (fl-fl_type == F_WRLCK) ? WRITE : READ ); } diff --git a/include/linux/fs.h b/include/linux/fs.h index 51974e0..639a3b7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -888,6 +888,7 @@ static inline int file_check_writeable(struct file *filp) #define FL_SLEEP 128 /* A blocking lock */ #define FL_DOWNGRADE_PENDING 256 /* Lease is being downgraded */ #define FL_UNLOCK_PENDING 512 /* Lease is being broken */ +#define FL_FILE_PVT1024/* lock is private to the file */ /* * Special return value from posix_lock_file() and vfs_lock_file() for -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 12/13] locks: report l_pid as -1 for FL_FILE_PVT locks
FL_FILE_PVT locks are no longer tied to a particular pid, and are instead inheritable by child processes. Report a l_pid of '-1' for these sorts of locks since the pid is somewhat meaningless for them. This precedent comes from FreeBSD. There, POSIX and flock() locks can conflict with one another. If fcntl(F_GETLK, ...) returns a lock set with flock() then the l_pid member cannot be a process ID because the lock is not held by a process as such. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 995583b..013b177 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1863,7 +1863,7 @@ EXPORT_SYMBOL_GPL(vfs_test_lock); static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl) { - flock-l_pid = fl-fl_pid; + flock-l_pid = IS_FILE_PVT(fl) ? -1 : fl-fl_pid; #if BITS_PER_LONG == 32 /* * Make sure we can represent the posix lock via @@ -1885,7 +1885,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl) #if BITS_PER_LONG == 32 static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl) { - flock-l_pid = fl-fl_pid; + flock-l_pid = IS_FILE_PVT(fl) ? -1 : fl-fl_pid; flock-l_start = fl-fl_start; flock-l_len = fl-fl_end == OFFSET_MAX ? 0 : fl-fl_end - fl-fl_start + 1; -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 09/13] locks: don't reference original flock struct in F_GETLK handlers
The locks code currently sanity checks the type values in the flock struct before doing the flock-file_lock conversion. That will be problematic when new l_type values are introduced in a later patch. Instead, do the flock_to_posix_lock conversion first, and then sanity check the values in the file_lock instead. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 820322d..8180141 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1905,14 +1905,15 @@ int fcntl_getlk(struct file *filp, struct flock __user *l) error = -EFAULT; if (copy_from_user(flock, l, sizeof(flock))) goto out; - error = -EINVAL; - if ((flock.l_type != F_RDLCK) (flock.l_type != F_WRLCK)) - goto out; error = flock_to_posix_lock(filp, file_lock, flock); if (error) goto out; + error = -EINVAL; + if ((file_lock.fl_type != F_RDLCK) (file_lock.fl_type != F_WRLCK)) + goto out; + error = vfs_test_lock(filp, file_lock); if (error) goto out; @@ -2073,14 +2074,15 @@ int fcntl_getlk64(struct file *filp, struct flock64 __user *l) error = -EFAULT; if (copy_from_user(flock, l, sizeof(flock))) goto out; - error = -EINVAL; - if ((flock.l_type != F_RDLCK) (flock.l_type != F_WRLCK)) - goto out; error = flock64_to_posix_lock(filp, file_lock, flock); if (error) goto out; + error = -EINVAL; + if ((file_lock.fl_type != F_RDLCK) (file_lock.fl_type != F_WRLCK)) + goto out; + error = vfs_test_lock(filp, file_lock); if (error) goto out; -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 05/13] locks: eliminate BUG() call when there's an unexpected lock on file close
A leftover lock on the list is surely a sign of a problem of some sort, but it's not necessarily a reason to panic the box. Instead, just log a warning with some info about the lock, and then delete it like we would any other lock. In the event that the filesystem declares a -lock f_op, we may end up leaking something, but that's generally preferable to an immediate panic. Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 6084f5a..dd30933 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2281,16 +2281,28 @@ void locks_remove_flock(struct file *filp) while ((fl = *before) != NULL) { if (fl-fl_file == filp) { - if (IS_FLOCK(fl)) { - locks_delete_lock(before); - continue; - } if (IS_LEASE(fl)) { lease_modify(before, F_UNLCK); continue; } - /* What? */ - BUG(); + + /* +* There's a leftover lock on the list of a type that +* we didn't expect to see. Most likely a classic +* POSIX lock that ended up not getting released +* properly, or that raced onto the list somehow. Log +* some info about it and then just remove it from +* the list. +*/ + WARN(!IS_FLOCK(fl), + leftover lock: dev=%u:%u ino=%lu type=%hhd flags=0x%x start=%lld end=%lld\n, + MAJOR(inode-i_sb-s_dev), + MINOR(inode-i_sb-s_dev), inode-i_ino, + fl-fl_type, fl-fl_flags, + fl-fl_start, fl-fl_end); + + locks_delete_lock(before); + continue; } before = fl-fl_next; } -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 06/13] locks: consolidate common code in the flock_to_posix_lock routines
From: J. Bruce Fields bfie...@fieldses.org Currently, there's a lot of copy and paste between the two. Add some functions to do the initialization of the file_lock from values passed in, and turn the flock/flock64 variants of those functions into wrappers around them. Signed-off-by: Jeff Layton jlay...@redhat.com Signed-off-by: J. Bruce Fields bfie...@fieldses.org --- fs/locks.c | 112 +++ include/uapi/asm-generic/fcntl.h | 3 -- 2 files changed, 44 insertions(+), 71 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index dd30933..863f4df 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -344,48 +344,42 @@ static int assign_type(struct file_lock *fl, long type) return 0; } -/* Verify a struct flock and copy it to a struct file_lock as a POSIX - * style lock. - */ -static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, - struct flock *l) +static int flock_to_posix_lock_common(struct file *filp, struct file_lock *fl, + struct flock64 *l, loff_t offset_max) { - off_t start, end; - switch (l-l_whence) { case SEEK_SET: - start = 0; + fl-fl_start = 0; break; case SEEK_CUR: - start = filp-f_pos; + fl-fl_start = filp-f_pos; break; case SEEK_END: - start = i_size_read(file_inode(filp)); + fl-fl_start = i_size_read(file_inode(filp)); break; default: return -EINVAL; } + if (l-l_start offset_max - fl-fl_start) + return -EOVERFLOW; + fl-fl_start += l-l_start; + if (fl-fl_start 0) + return -EINVAL; + if (l-l_len offset_max - fl-fl_start) + return -EOVERFLOW; + if (fl-fl_start + l-l_len 0) + return -EINVAL; /* POSIX-1996 leaves the case l-l_len 0 undefined; POSIX-2001 defines it. */ - start += l-l_start; - if (start 0) - return -EINVAL; - fl-fl_end = OFFSET_MAX; - if (l-l_len 0) { - end = start + l-l_len - 1; - fl-fl_end = end; - } else if (l-l_len 0) { - end = start - 1; - fl-fl_end = end; - start += l-l_len; - if (start 0) - return -EINVAL; - } - fl-fl_start = start; /* we record the absolute position */ - if (fl-fl_end fl-fl_start) - return -EOVERFLOW; - + if (l-l_len 0) + fl-fl_end = fl-fl_start + l-l_len - 1; + else if (l-l_len 0) { + fl-fl_end = fl-fl_start - 1; + fl-fl_start += l-l_len; + } else + fl-fl_end = OFFSET_MAX; + fl-fl_owner = current-files; fl-fl_pid = current-tgid; fl-fl_file = filp; @@ -396,50 +390,32 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, return assign_type(fl, l-l_type); } +/* Verify a struct flock and copy it to a struct file_lock as a POSIX + * style lock. + */ +static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, + struct flock *l) +{ + struct flock64 ll = { + .l_type = l-l_type, + .l_whence = l-l_whence, + .l_start = l-l_start, + .l_len = l-l_len, + }; + + /* +* The use of OFFT_OFFSET_MAX here ensures we return -EOVERFLOW +* if the start or end of the lock could not be represented as +* an off_t, following SUSv3. +*/ + return flock_to_posix_lock_common(filp, fl, ll, OFFT_OFFSET_MAX); +} + #if BITS_PER_LONG == 32 static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, struct flock64 *l) { - loff_t start; - - switch (l-l_whence) { - case SEEK_SET: - start = 0; - break; - case SEEK_CUR: - start = filp-f_pos; - break; - case SEEK_END: - start = i_size_read(file_inode(filp)); - break; - default: - return -EINVAL; - } - - start += l-l_start; - if (start 0) - return -EINVAL; - fl-fl_end = OFFSET_MAX; - if (l-l_len 0) { - fl-fl_end = start + l-l_len - 1; - } else if (l-l_len 0) { - fl-fl_end = start - 1; - start += l-l_len; - if (start 0) - return -EINVAL; - } - fl-fl_start = start; /* we record the absolute position */ - if (fl-fl_end fl-fl_start) - return -EOVERFLOW; - - fl-fl_owner = current-files; - fl-fl_pid = current-tgid; - fl-fl_file = filp; - fl-fl_flags
[PATCH v4 07/13] locks: simplify overflow checking
From: J. Bruce Fields bfie...@fieldses.org Or maybe we don't actually care about indicating overflow in the 32-bit case: sure we could fail if e.g. f_pos+start or f_pos+start+len would exceed 32-bits, but do we really need to? Signed-off-by: J. Bruce Fields bfie...@redhat.com --- fs/locks.c | 23 +-- 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 863f4df..1555b05 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -344,8 +344,8 @@ static int assign_type(struct file_lock *fl, long type) return 0; } -static int flock_to_posix_lock_common(struct file *filp, struct file_lock *fl, - struct flock64 *l, loff_t offset_max) +static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, +struct flock64 *l) { switch (l-l_whence) { case SEEK_SET: @@ -360,12 +360,12 @@ static int flock_to_posix_lock_common(struct file *filp, struct file_lock *fl, default: return -EINVAL; } - if (l-l_start offset_max - fl-fl_start) + if (l-l_start OFFSET_MAX - fl-fl_start) return -EOVERFLOW; fl-fl_start += l-l_start; if (fl-fl_start 0) return -EINVAL; - if (l-l_len offset_max - fl-fl_start) + if (l-l_len OFFSET_MAX - fl-fl_start) return -EOVERFLOW; if (fl-fl_start + l-l_len 0) return -EINVAL; @@ -403,22 +403,9 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl, .l_len = l-l_len, }; - /* -* The use of OFFT_OFFSET_MAX here ensures we return -EOVERFLOW -* if the start or end of the lock could not be represented as -* an off_t, following SUSv3. -*/ - return flock_to_posix_lock_common(filp, fl, ll, OFFT_OFFSET_MAX); + return flock64_to_posix_lock(filp, fl, ll); } -#if BITS_PER_LONG == 32 -static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl, -struct flock64 *l) -{ - return flock_to_posix_lock_common(filp, fl, l, OFFSET_MAX); -} -#endif - /* default lease lock manager operations */ static void lease_break_callback(struct file_lock *fl) { -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 03/13] locks: remove inline qualifier from fl_link manipulation functions
It's best to let the compiler decide that. Reported-by: Stephen Rothwell s...@canb.auug.org.au Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 5e28612..049a144 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -511,8 +511,7 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2) } /* Must be called with the i_lock held! */ -static inline void -locks_insert_global_locks(struct file_lock *fl) +static void locks_insert_global_locks(struct file_lock *fl) { lg_local_lock(file_lock_lglock); fl-fl_link_cpu = smp_processor_id(); @@ -521,8 +520,7 @@ locks_insert_global_locks(struct file_lock *fl) } /* Must be called with the i_lock held! */ -static inline void -locks_delete_global_locks(struct file_lock *fl) +static void locks_delete_global_locks(struct file_lock *fl) { /* * Avoid taking lock if already unhashed. This is safe since this check @@ -544,14 +542,12 @@ posix_owner_key(struct file_lock *fl) return (unsigned long)fl-fl_owner; } -static inline void -locks_insert_global_blocked(struct file_lock *waiter) +static void locks_insert_global_blocked(struct file_lock *waiter) { hash_add(blocked_hash, waiter-fl_link, posix_owner_key(waiter)); } -static inline void -locks_delete_global_blocked(struct file_lock *waiter) +static void locks_delete_global_blocked(struct file_lock *waiter) { hash_del(waiter-fl_link); } -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 02/13] locks: clean up comment typo
Signed-off-by: Jeff Layton jlay...@redhat.com --- fs/locks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/locks.c b/fs/locks.c index 2cfeea6..5e28612 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -581,7 +581,7 @@ static void locks_delete_block(struct file_lock *waiter) * it seems like the reasonable thing to do. * * Must be called with both the i_lock and blocked_lock_lock held. The fl_block - * list itself is protected by the file_lock_list, but by ensuring that the + * list itself is protected by the blocked_lock_lock, but by ensuring that the * i_lock is also held on insertions we can avoid taking the blocked_lock_lock * in some cases when we see that the fl_block list is empty. */ -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Makefile: fix order of INITRD_COMPRESS-y assignments
Commit 7ac181568342ec (fix build with make 3.80) changed how the Makefile handles the INITRD_COMPRESS-y assignment in order to make it work properly with older versions of make. Unfortunately, it also changes the behavior of how that assignment works when multiple CONFIG_RD_* options are set, such that they are now preferred in the reverse order of the way they were before. Reverse the order of assignments to try and preserve the earlier precedence in this situation. Cc: Jan Beulich jbeul...@suse.com Cc: P J P ppan...@redhat.com Cc: Andrew Morton a...@linux-foundation.org Signed-off-by: Jeff Layton jlay...@redhat.com --- Makefile | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index 89cee86..fb76899 100644 --- a/Makefile +++ b/Makefile @@ -733,11 +733,11 @@ export mod_strip_cmd # This shall be used by the dracut(8) tool while creating an initramfs image. # INITRD_COMPRESS-y := gzip -INITRD_COMPRESS-$(CONFIG_RD_BZIP2) := bzip2 -INITRD_COMPRESS-$(CONFIG_RD_LZMA) := lzma -INITRD_COMPRESS-$(CONFIG_RD_XZ):= xz -INITRD_COMPRESS-$(CONFIG_RD_LZO) := lzo INITRD_COMPRESS-$(CONFIG_RD_LZ4) := lz4 +INITRD_COMPRESS-$(CONFIG_RD_LZO) := lzo +INITRD_COMPRESS-$(CONFIG_RD_XZ):= xz +INITRD_COMPRESS-$(CONFIG_RD_LZMA) := lzma +INITRD_COMPRESS-$(CONFIG_RD_BZIP2) := bzip2 export INITRD_COMPRESS := $(INITRD_COMPRESS-y) ifdef CONFIG_MODULE_SIG_ALL -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Broken initrd compression settings in 3.13
On Fri, 20 Dec 2013 18:15:29 -0800 Linus Torvalds torva...@linux-foundation.org wrote: On Fri, Dec 20, 2013 at 5:39 PM, Andrew Morton a...@linux-foundation.org wrote: On Fri, 20 Dec 2013 16:41:43 -0800 Linus Torvalds torva...@linux-foundation.org wrote: (a) most sane people don't even have lz4 _installed_, so dracut won't actually succeed (b) there's no way to select the compression level (unlike the INITRAMFS_COMPRESSION thing that actually has a choice) (c) even if you *do* have lz4, it doesn't actually work, because while that causes the new F20 dracut to compress the initramfs with lz4, the end result is completely broken, because the F20 lsinitrd scripts don't understand the end result, so now the whole kernel install fails. (a) and (b) are very much kernel bugs. Jeff sent the below this morning. Will that fix (a)? Yes, it fixes (a), at least to some degree, in that at least defaulting to bzip2 is a lot more sane than defaulting to lz4. I suspect most everybody has bzip2 installed. And at least on my current F20 install, it looks like lsinitrd understands to use zcat, bzcat or xzcat on the resulting initrd image (and bzcat does that bzip2 decoding). So I think Jeff's patch at least fixes the symptoms. That said, I think it does nothing *but* fix the symptoms, and we're actually still better off with the 3.12 behavior which was to never set INITRD_COMPRESS at all. Because quite frankly, there's currently no way for the kernel to know what the right compressor is. bz2 may well work, but can you guarantee it? I certainly can't.. Now, if we asked the user, that would be a different thing. But right now we very much don't ask the user, and we just pick one at random. We're better off not picking a compression method at all, at which point the distro installkernel will do whatever the distro does. Linus Perhaps a better solution for this would be to instead export an env var with a list of the compression algorithms that the kernel supports. Then installkernel or dracut could use that info to make a semi-intelligent decision based on that and what tools are installed. ...or maybe a separate env var for each one that it supports: $INITRD_COMPRESS_LZ4 $INITRD_COMPRESS_BZIP2 $INITRD_COMPRESS_GZIP ...etc. -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Broken initrd compression settings in 3.13
On Fri, 20 Dec 2013 18:15:29 -0800 Linus Torvalds torva...@linux-foundation.org wrote: On Fri, Dec 20, 2013 at 5:39 PM, Andrew Morton a...@linux-foundation.org wrote: On Fri, 20 Dec 2013 16:41:43 -0800 Linus Torvalds torva...@linux-foundation.org wrote: (a) most sane people don't even have lz4 _installed_, so dracut won't actually succeed (b) there's no way to select the compression level (unlike the INITRAMFS_COMPRESSION thing that actually has a choice) (c) even if you *do* have lz4, it doesn't actually work, because while that causes the new F20 dracut to compress the initramfs with lz4, the end result is completely broken, because the F20 lsinitrd scripts don't understand the end result, so now the whole kernel install fails. (a) and (b) are very much kernel bugs. Jeff sent the below this morning. Will that fix (a)? Yes, it fixes (a), at least to some degree, in that at least defaulting to bzip2 is a lot more sane than defaulting to lz4. I suspect most everybody has bzip2 installed. And at least on my current F20 install, it looks like lsinitrd understands to use zcat, bzcat or xzcat on the resulting initrd image (and bzcat does that bzip2 decoding). So I think Jeff's patch at least fixes the symptoms. That said, I think it does nothing *but* fix the symptoms, and we're actually still better off with the 3.12 behavior which was to never set INITRD_COMPRESS at all. Because quite frankly, there's currently no way for the kernel to know what the right compressor is. bz2 may well work, but can you guarantee it? I certainly can't.. Now, if we asked the user, that would be a different thing. But right now we very much don't ask the user, and we just pick one at random. We're better off not picking a compression method at all, at which point the distro installkernel will do whatever the distro does. Linus That works for me, you can drop my patch... But, what's the point of setting INITRD_COMPRESS-y at all if you're not going to export INITRD_COMPRESS? Would it be better to just remove that entire block in the Makefile (i.e. just revert PJP's patch?) -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page leak at nfs_symlink()
On Fri, 7 Feb 2014 13:19:54 -0200 Rafael Aquini aqu...@redhat.com wrote: Changes committed by a0b8cab3 mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API have introduced a call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as now the page gets an extra refcount that is not dropped. Jan Stancek observed and reported the leak effect while running test8 from Connectathon Testsuite. After several iterations over the test case, which creates several symlinks on a NFS mountpoint, the test system was quickly getting into an out-of-memory scenario. This patch fixes the page leak by dropping that extra refcount add_to_page_cache_lru() is grabbing. Signed-off-by: Jan Stancek jstan...@redhat.com Signed-off-by: Rafael Aquini aqu...@redhat.com --- fs/nfs/dir.c | 5 + 1 file changed, 5 insertions(+) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index be38b57..4a48fe4 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1846,6 +1846,11 @@ int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname) GFP_KERNEL)) { SetPageUptodate(page); unlock_page(page); + /* + * add_to_page_cache_lru() grabs an extra page refcount. + * Drop it here to avoid leaking this page later. + */ + page_cache_release(page); } else __free_page(page); Looks reasonable as an interim fix and should almost certainly go to stable. Longer term, I think it would be best from an API standpoint to fix add_to_page_cache_lru not to take this extra reference (or to have it drop it itself) and fix up the callers accordingly. That seems like a trap for the unwary... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH resend #7] fcntl-linux.h: add new definitions and manual updates for open file description locks
On Tue, 29 Jul 2014 23:30:36 -0400 Carlos O'Donell car...@redhat.com wrote: On 07/23/2014 02:21 PM, Jeff Layton wrote: From: Jeff Layton jlay...@poochiereds.net Thanks for resending. Sorry for the delay. Your use of 3 different emails caused me to miss the recent resends. That's my fault and tied to the way I'm tracking everything from patchwork using the first email you used. Sorry about that. I was hoping to get this wrapped up before changing jobs, but that obviously didn't happen... I am assuming that this work is under Red Hat's copyright status since you submitted it originally from @redhat.com. Yeah, that should be fine. This work was all done under the aegis of Red Hat. The small tweaks you mention below are also OK. Thanks for getting this merged! I've committed this to trunk with small tweaks and the following NEWS entry: * Support for file description locks is added to systems running the Linux kernel. The standard file locking interfaces are extended to operate on file descriptions, not file descriptors, via the use of F_OFD_GETLK, F_OFD_SETLK, and F_OFD_SETLKW. File description locks are associated with an open file instead of a process. This will be in 2.20 when we cut the branch. Open file description locks have been merged into the Linux kernel for v3.15. Add the appropriate command-value definitions and an update to the manual that describes their usage. ChangeLog: 2014-04-24 Jeff Layton jlay...@poochiereds.net [BZ#16839] * manual/llio.texi: add section about open file description locks * manual/examples/ofdlocks.c: example of open file description lock usage * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: (F_OFD_GETLK, F_OFD_SETLK, F_OFD_SETLKW): New macros. I simplified it a bit. A space in the ChangeLog indicates a distinct commit, which this is not, so I lump them together. This is what went in: 2014-07-29 Jeff Layton jlay...@poochiereds.net [BZ #16839] * manual/llio.texi: Add section about open file description locks. * manual/examples/ofdlocks.c: Example of open file description lock usage. * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: Define F_OFD_GETLK, F_OFD_SETLK, and F_OFD_SETLKW. As the committer I added #16839 to the fixed bug list following: https://sourceware.org/glibc/wiki/Committer%20checklist --- manual/examples/ofdlocks.c | 77 + manual/llio.texi | 241 - sysdeps/unix/sysv/linux/bits/fcntl-linux.h | 17 ++ 3 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 manual/examples/ofdlocks.c diff --git a/manual/examples/ofdlocks.c b/manual/examples/ofdlocks.c new file mode 100644 index ..85e193cdabe6 --- /dev/null +++ b/manual/examples/ofdlocks.c @@ -0,0 +1,77 @@ +/* Open File Description Locks Usage Example + Copyright (C) 1991-2014 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License + as published by the Free Software Foundation; either version 2 + of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, see http://www.gnu.org/licenses/. +*/ + +#define _GNU_SOURCE +#include stdio.h +#include sys/types.h +#include sys/stat.h +#include unistd.h +#include fcntl.h +#include pthread.h + +#define FILENAME /tmp/foo +#define NUM_THREADS3 +#define ITERATIONS 5 + +void * +thread_start (void *arg) +{ + int i, fd, len; + long tid = (long) arg; + char buf[256]; + struct flock lck = { +.l_whence = SEEK_SET, +.l_start = 0, +.l_len = 1, + }; + + fd = open (/tmp/foo, O_RDWR | O_CREAT, 0666); + + for (i = 0; i ITERATIONS; i++) +{ + lck.l_type = F_WRLCK; + fcntl (fd, F_OFD_SETLKW, lck); + + len = sprintf (buf, %d: tid=%ld fd=%d\n, i, tid, fd); + + lseek (fd, 0, SEEK_END); + write (fd, buf, len); + fsync (fd); + + lck.l_type = F_UNLCK; + fcntl (fd, F_OFD_SETLK, lck); + + /* sleep to ensure lock is yielded to another thread */ + usleep (1); +} + pthread_exit (NULL); +} + +int +main (int argc, char **argv) +{ + long i; + pthread_t threads[NUM_THREADS]; + + truncate (FILENAME, 0); + + for (i = 0; i NUM_THREADS; i++) +pthread_create (threads[i], NULL, thread_start, (void
[PATCH v2] net: clean up some sparse endianness warnings in ipv6.h
sparse is throwing warnings when building sunrpc modules due to some endianness shenanigans in ipv6.h. Specifically: CHECK net/sunrpc/addr.c include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer include/net/ipv6.h:573:17: warning: restricted __be64 degrades to integer include/net/ipv6.h:577:34: warning: restricted __be32 degrades to integer Sprinkle some endianness fixups to silence them. These should all get fixed up at compile time, so I don't think this will add any extra work to be done at runtime. Signed-off-by: Jeff Layton jlay...@primarydata.com --- include/net/ipv6.h | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 574337fe72dd..0535da61fe0b 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -557,24 +557,29 @@ static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval) static inline bool ipv6_addr_loopback(const struct in6_addr *a) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - const unsigned long *ul = (const unsigned long *)a; + const __be64 *be = (const __be64 *)a; - return (ul[0] | (ul[1] ^ cpu_to_be64(1))) == 0UL; + return (be[0] | (be[1] ^ cpu_to_be64(1))) == 0UL; #else return (a-s6_addr32[0] | a-s6_addr32[1] | - a-s6_addr32[2] | (a-s6_addr32[3] ^ htonl(1))) == 0; + a-s6_addr32[2] | (a-s6_addr32[3] ^ cpu_to_be32(1))) == 0; #endif } +/* + * Note that we must __force cast these to unsigned long to make sparse happy, + * since all of the endian-annotated types are fixed size regardless of arch. + */ static inline bool ipv6_addr_v4mapped(const struct in6_addr *a) { return ( #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) BITS_PER_LONG == 64 - *(__be64 *)a | + *(unsigned long *)a | #else - (a-s6_addr32[0] | a-s6_addr32[1]) | + (__force unsigned long)(a-s6_addr32[0] | a-s6_addr32[1]) | #endif - (a-s6_addr32[2] ^ htonl(0x))) == 0UL; + (__force unsigned long)(a-s6_addr32[2] ^ + cpu_to_be32(0x))) == 0UL; } /* -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Killing process in D state on mount to dead NFS server. (when process is in fsync)
v1.4.13 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJT2rLiAAoJELbHqkYeJT4OqPgH/0taKW6Be90c1mETZf9yeqZF YMLZk8XC2wloEd9nVz//mXREmiu18Hc+5p7Upd4Os21J2P4PBMGV6P/9DMxxehwH YX1HKha0EoAsbO5ILQhbLf83cRXAPEpvJPgYHrq6xjlKB8Q8OxxND37rY7kl19Zz sdAw6GiqHICF3Hq1ATa/jvixMluDnhER9Dln3wOdAGzmmuFYqpTsV4EwzbKKqInJ 6C15q+cq/9aYh6usN6z2qJhbHgqM9EWcPL6jOrCwX4PbC1XjKHekpFN0t9oKQClx qSPuweMQ7fP4IBd2Ke8L/QlyOVblAKSE7t+NdrjfzLmYPzyHTyfLABR/BI053to= =/9FJ -END PGP SIGNATURE- -- Jeff Layton jlay...@poochiereds.net signature.asc Description: PGP signature
Re: Killing process in D state on mount to dead NFS server. (when process is in fsync)
On Fri, 1 Aug 2014 20:50:13 -0500 Roger Heflin rogerhef...@gmail.com wrote: Doesn't NFS have an intr flag to allow kill -9 to work? Whenever I have had that set it has appeared to work after about 30 seconds or so...without that kill -9 does not work when the nfs server is missing. Not anymore. That mount option has been deprecated (and ignored) for years. The code in the RPC engine will generally give up in the face of fatal signals. In this case though, we're in uninterruptible sleep in the bowels of the writeback code. The problem here is not really specific to NFS, per-se -- it just happens to be the filesystem that most people notice it on. On Fri, Aug 1, 2014 at 8:21 PM, Jeff Layton jlay...@poochiereds.net wrote: On Fri, 1 Aug 2014 07:50:53 +1000 NeilBrown ne...@suse.de wrote: On Thu, 31 Jul 2014 14:20:07 -0700 Ben Greear gree...@candelatech.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/31/2014 01:42 PM, NeilBrown wrote: On Thu, 31 Jul 2014 11:00:35 -0700 Ben Greear gree...@candelatech.com wrote: So, this has been asked all over the interweb for years and years, but the best answer I can find is to reboot the system or create a fake NFS server somewhere with the same IP as the gone-away NFS server. The problem is: I have some mounts to an NFS server that no longer exists (crashed/powered down). I have some processes stuck trying to write to files open on these mounts. I want to kill the process and unmount. umount -l will make the mount go a way, sort of. But process is still hung. umount -f complains: umount2: Device or resource busy umount.nfs: /mnt/foo: device is busy kill -9 does not work on process. Kill -1 should work (since about 2.6.25 or so). That is -[ONE], right? Assuming so, it did not work for me. No, it was -9 sorry, I really shouldn't be let out without my proof reader. However the 'stack' is sufficient to see what is going on. The problem is that it is blocked inside the VM well away from NFS and there is no way for NFS to say give up and go home. I'd suggest that is a bug. I cannot see any justification for fsync to not be killable. It wouldn't be too hard to create a patch to make it so. It would be a little harder to examine all call paths and create a convincing case that the patch was safe. It might be herculean task to convince others that it was the right thing to do so let's start with that one. Hi Linux-mm and fs-devel people. What do people think of making fsync and variants KILLABLE ?? I probably only need a little bit of encouragement to write a patch Thanks, NeilBrown It would be good to fix this in some fashion once and for all, and the wait_on_page_writeback wait is a major source of pain for a lot of people. So to summarize... The problem in a nutshell is that Ben has some cached writes to the NFS server, but the server has gone away (presumably forever). The question is -- how do we communicate to the kernel that that server isn't coming back and that those dirty pages should be invalidated so that we can umount the filesystem? Allowing fsync/close to be killable sounds reasonable to me as at least a partial solution. Both close(2) and fsync(2) are allowed to return EINTR according to the POSIX spec. Allowing a kill -9 there seems like it should be fine, and maybe we ought to even consider letting it be susceptible to lesser signals. That still leaves some open questions though... Is that enough to fix it? You'd still have the dirty pages lingering around, right? Would a umount -f presumably work at that point? Kernel is 3.14.4+, with some of extra patches, but probably nothing that influences this particular behaviour. [root@lf1005-14010010 ~]# cat /proc/3805/stack [811371ba] sleep_on_page+0x9/0xd [8113738e] wait_on_page_bit+0x71/0x78 [8113769a] filemap_fdatawait_range+0xa2/0x16d [8113780e] filemap_write_and_wait_range+0x3b/0x77 [a0f04734] nfs_file_fsync+0x37/0x83 [nfs] [811a8d32] vfs_fsync_range+0x19/0x1b [811a8d4b] vfs_fsync+0x17/0x19 [a0f05305] nfs_file_flush+0x6b/0x6f [nfs] [81183e46] filp_close+0x3f/0x71 [8119c8ae] __close_fd+0x80/0x98 [81183de5] SyS_close+0x1c/0x3e [815c55f9] system_call_fastpath+0x16/0x1b [] 0x [root@lf1005-14010010 ~]# kill -1 3805 [root@lf1005-14010010 ~]# cat /proc/3805/stack [811371ba] sleep_on_page+0x9/0xd [8113738e] wait_on_page_bit+0x71/0x78 [8113769a] filemap_fdatawait_range+0xa2/0x16d [8113780e] filemap_write_and_wait_range+0x3b/0x77 [a0f04734] nfs_file_fsync+0x37/0x83 [nfs] [811a8d32
[PATCH 0/5] locks: move most locks_release_private calls outside of i_lock
In the days of yore, the file locking code was primarily protected by the BKL. That changed in commit 72f98e72551fa (locks: turn lock_flocks into a spinlock), at which point the code was changed to be protected by a conventional spinlock (mostly due to a push to finally eliminate the BKL). Since then, the code has been changed to use the i_lock instead of a global spinlock, but it's still under a spinlock. With that change, several functions now no longer can block when they originally could. This is a particular problem with the fl_release_private operation. In NFSv4, that operation is used to kick off a RELEASE_LOCKOWNER or FREE_STATEID call, and that requires being able to do an allocation. This was reported by Josh Stone here: https://bugzilla.redhat.com/show_bug.cgi?id=1089092 My initial stab at fixing this involved moving this to a workqueue, but Trond pointed out the above change was technically a regression with the way the spinlocking in the file locking code works, and suggested an alternate approach to fixing it. This set focuses on moving most of the locks_release_private calls outside of the inode-i_lock. There are still a few that are done under the i_lock in the lease handling code. Cleaning up the use of the i_lock in the lease code is a larger project which we'll have to tackle at some point, but there are some other cleanups that will need to happen first. Absent any objections, I'll plan to merge these for 3.18. Jeff Layton (5): locks: don't call locks_release_private from locks_copy_lock locks: don't reuse file_lock in __posix_lock_file locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock locks: update Locking documentation to clarify fl_release_private behavior Documentation/filesystems/Locking | 6 ++- fs/locks.c| 80 +-- 2 files changed, 57 insertions(+), 29 deletions(-) -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 5/5] locks: update Locking documentation to clarify fl_release_private behavior
Signed-off-by: Jeff Layton jlay...@primarydata.com --- Documentation/filesystems/Locking | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index b18dd1779029..f1997e9da61f 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -349,7 +349,11 @@ prototypes: locking rules: inode-i_lock may block fl_copy_lock: yes no -fl_release_private:maybe no +fl_release_private:maybe maybe[1] + +[1]: -fl_release_private for flock or POSIX locks is currently allowed +to block. Leases however can still be freed while the i_lock is held and +so fl_release_private called on a lease should not block. --- lock_manager_operations --- prototypes: -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/5] locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock
There's no need to call locks_free_lock here while still holding the i_lock. Defer that until the lock has been dropped. Signed-off-by: Jeff Layton jlay...@primarydata.com --- fs/locks.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 4ce087cca501..cb66fb05ad4a 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1761,13 +1761,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg) ret = fl; spin_lock(inode-i_lock); error = __vfs_setlease(filp, arg, ret); - if (error) { - spin_unlock(inode-i_lock); - locks_free_lock(fl); - goto out_free_fasync; - } - if (ret != fl) - locks_free_lock(fl); + if (error) + goto out_unlock; + if (ret == fl) + fl = NULL; /* * fasync_insert_entry() returns the old entry if any. @@ -1779,9 +1776,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg) new = NULL; error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0); +out_unlock: spin_unlock(inode-i_lock); - -out_free_fasync: + if (fl) + locks_free_lock(fl); if (new) fasync_free(new); return error; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/