[GIT PULL] Please pull NFS client bugfixes
Hi Linus

The following changes since commit fab99ebe39fe7d11fbd9b5fb84f07432af9ba36f:

  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security (2013-11-04 16:42:52 -0500)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.13-2

for you to fetch changes up to 8c2fabc6542d9d0f8b16bd1045c2eda59bdcde13:

  nfs: fix pnfs Kconfig defaults (2013-11-15 13:41:43 -0500)

NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is

Christoph Hellwig (1):
      nfs: fix pnfs Kconfig defaults

Jeff Layton (1):
      nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

NeilBrown (1):
      NFS: correctly report misuse of "migration" mount option.

Trond Myklebust (2):
      SUNRPC: Fix a data corruption issue when retransmitting RPC calls
      SUNRPC: Avoid deep recursion in rpc_release_client

 fs/nfs/Kconfig        |  6 +++---
 fs/nfs/nfs4state.c    |  7 ++-
 fs/nfs/super.c        |  2 +-
 net/sunrpc/clnt.c     | 29 +
 net/sunrpc/xprtsock.c | 28 +---
 5 files changed, 48 insertions(+), 24 deletions(-)

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
[GIT PULL] Please pull NFS client changes for 3.13
Hi Linus,

The following changes since commit f927318840745095cc7003f1564ca4b87655745d:

  Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs (2013-09-30 17:10:26 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.13-1

for you to fetch changes up to fab99ebe39fe7d11fbd9b5fb84f07432af9ba36f:

  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security (2013-11-04 16:42:52 -0500)

NFS client updates for Linux 3.13

Highlights include:
- Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
  - Detect TCP connection breakage through the "keepalive" mechanism
- Add client side support for NFSv4.x migration (Chuck Lever)
- Add support for multiple security flavour arguments to the "sec=" mount option (Dros Adamson)
- fs-cache bugfixes from David Howells:
  - Fix an issue whereby caching can be enabled on a file that is open for writing
- More NFSv4 open code stable bugfixes
- Various Labeled NFS (selinux) bugfixes, including one stable fix
- Fix buffer overflow checking in the RPCSEC_GSS upcall encoding

Andy Adamson (1):
      NFSv4 Remove zeroing state kern warnings

Chuck Lever (20):
      SUNRPC: Modify synopsis of rpc_client_register()
      NFS: Add nfs4_update_server
      NFS: Add functions to swap transports during migration recovery
      NFS: Introduce a vector of migration recovery ops
      NFS: Export _nfs_display_fhandle()
      NFS: Add method to retrieve fs_locations during migration recovery
      NFS: Add a super_block backpointer to the nfs_server struct
      NFS: Add basic migration support to state manager thread
      NFS: Re-use exit code in nfs4_async_handle_error()
      NFS: Rename "stateid_invalid" label
      NFS: Add migration recovery callouts in nfs4proc.c
      NFS: Handle NFS4ERR_MOVED during delegation recall
      NFS: Add method to detect whether an FSID is still on the server
      NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager
      NFS: Implement support for NFS4ERR_LEASE_MOVED
      NFS: Migration support for RELEASE_LOCKOWNER
      NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW
      NFS: Handle SEQ4_STATUS_LEASE_MOVED
      NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR
      NFS: Fix possible endless state recovery wait

David Howells (3):
      FS-Cache: Add use/unuse/wake cookie wrappers
      FS-Cache: Provide the ability to enable/disable cookies
      NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()

Geyslan G. Bem (3):
      nfs: Remove useless 'error' assignment
      nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
      nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'

J. Bruce Fields (2):
      sunrpc: comment typo fix
      nfs: use IS_ROOT not DCACHE_DISCONNECTED

Jeff Layton (5):
      nfs: reject version and minorversion changes on remount attempts
      nfs: fix handling of invalid mount options in nfs_remount
      nfs: fix inverted test for delegation in nfs4_reclaim_open_state
      nfs: fix oops when trying to set SELinux label
      nfs: set security label when revalidating inode

NeilBrown (1):
      SUNRPC: close a rare race in xs_tcp_setup_socket.

Trond Myklebust (24):
      NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()
      SUNRPC: Enable the keepalive option for TCP sockets
      SUNRPC: Only update the TCP connect cookie on a successful connect
      SUNRPC: Don't set the request connect_cookie until a successful transmit
      SUNRPC: Clear the request rq_bytes_sent field in xprt_release_write
      SUNRPC: Clean up - convert xprt_prepare_transmit to return a bool
      SUNRPC: Add RPC task and client level options to disable the resend timeout
      NFSv4: Ensure that we disable the resend timeout for NFSv4
      SUNRPC: Fix RPC call retransmission statistics
      SUNRPC: Remove redundant initialisations of request rq_bytes_sent
      SUNRPC: call_connect_status should recheck bind and connect status on error
      NFSv4.1: Don't change the security label as part of open reclaim.
      NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state
      SUNRPC: Add a helper to switch the transport of an rpc_clnt
      SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
      SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
      SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
      Merge branch 'fscache' of git://git.kernel.org/.../dhowells/linux-fs into linux-next
      SUNRPC: Cleanup xs_destroy()
      NFS: Fix a missing initialisation when reading the SELinux label
      NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
      NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?
On Fri, 2013-10-18 at 22:03 +0200, Helge Deller wrote:
> On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
> > Also, could you please try a sysRQ-t the next time it happens, so that
> > we can get a trace of where the mount program is hanging. Knowing that
> > the mount is stuck in "__schedule()" is not really interesting unless we
> > know from where that was called.
>
> Actually, the machine was still running in this state.
> Here is sysrq-t:
> [112009.084000] mount S 401040c0 0 25331 1 0x0010
> [112009.084000] Backtrace:
> [112009.084000] [<40113a68>] __schedule+0x500/0x810
> [112009.232000]
> [112009.232000] mount.nfs D 401040c0 0 25332 25331 0x0010
> [112009.232000] Backtrace:
> [112009.232000] [<40113a68>] __schedule+0x500/0x810

That makes no sense unless sysrq-t works differently on parisc than on
other platforms. I'd expect the backtrace to at least include a system
call.

Parisc experts?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?
On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote:
> On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
> > On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
> >> I'm seeing a regression with current kernel git head when using NFS-mounts.
> >> Architecture in my case is parisc, although I don't think that this is
> >> relevant.
> >> At least kernel 3.10 (and I think 3.11) didn't show that problem.
> >>
> >> The symptom is that "top" shows high usage of either kswapd0 or kswapd1.
> >> Here is an output with kswapd1:
> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> >>    37 root      20   0     0    0    0 R 91.8  0.0  63:00.40 kswapd1
> >> 28448 root      20   0  3252 1428 1060 R 15.3  0.0   0:00.09 top
> >>     1 root      20   0  2784  988  852 S  0.0  0.0   0:09.95 init
> >>
> >> This is what ps shows:
> >> ls:~# ps -ef | grep mount
> >> root      1181     1  0 14:51 ?  00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
> >> root     25331  1181  0 21:25 ?  00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
> >> root     25332 25331  0 21:25 ?  00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
> >>
> >> And using sysrq to show the blocked tasks I get in syslog:
> >> SysRq : Show Blocked State
> >> mount.nfs D 401040c0 0 25332 25331 0x0010
> >> Backtrace:
> >> [<40113a68>] __schedule+0x500/0x810
> >>
> >> I know it's not a problem of the NFS server, since the same mount is still
> >> ok on other machines.
> >> The NFS directory was already mounted and in use when this mount happened
> >> again (called by cron-job).
> >>
> >> Any ideas?
> >
> > If the NFS directory is already mounted, then why is the automounter
> > trying to mount it a second time?
>
> I was wrong in this.
> The directory wasn't mounted yet (or at least it was unmounted in the
> meantime before the new mount.nfs was called).
>
> I'm now not even sure that the high kswapd is really triggered by the NFS
> problem, because I now have another machine with the blocked NFS-mount, but
> without the high kswapd usage.
>
> Nevertheless, the blocked nfs mount tasks really make me wonder. There is
> clearly some kind of regression since it doesn't happen with older kernels.

Have you ever reproduced it without the automounter?

Also, could you please try a sysRQ-t the next time it happens, so that
we can get a trace of where the mount program is hanging. Knowing that
the mount is stuck in "__schedule()" is not really interesting unless we
know from where that was called.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?
On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
> I'm seeing a regression with current kernel git head when using NFS-mounts.
> Architecture in my case is parisc, although I don't think that this is
> relevant.
> At least kernel 3.10 (and I think 3.11) didn't show that problem.
>
> The symptom is that "top" shows high usage of either kswapd0 or kswapd1.
> Here is an output with kswapd1:
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>    37 root      20   0     0    0    0 R 91.8  0.0  63:00.40 kswapd1
> 28448 root      20   0  3252 1428 1060 R 15.3  0.0   0:00.09 top
>     1 root      20   0  2784  988  852 S  0.0  0.0   0:09.95 init
>
> This is what ps shows:
> ls:~# ps -ef | grep mount
> root      1181     1  0 14:51 ?  00:00:18 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     25331  1181  0 21:25 ?  00:00:00 /bin/mount -n -t nfs -s -o nolock,rw,hard,intr homes:/unixhome1 /net/home1
> root     25332 25331  0 21:25 ?  00:00:00 /sbin/mount.nfs homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
>
> And using sysrq to show the blocked tasks I get in syslog:
> SysRq : Show Blocked State
> mount.nfs D 401040c0 0 25332 25331 0x0010
> Backtrace:
> [<40113a68>] __schedule+0x500/0x810
>
> I know it's not a problem of the NFS server, since the same mount is still ok
> on other machines.
> The NFS directory was already mounted and in use when this mount happened
> again (called by cron-job).
>
> Any ideas?

If the NFS directory is already mounted, then why is the automounter
trying to mount it a second time?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
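For readers triaging similar reports, the backtrace frames quoted in this thread follow the `[<address>] symbol+0xoffset/0xsize` pattern. The sketch below is illustrative only: `parse_frames` and `only_scheduler` are hypothetical helper names, not part of any kernel tooling, and the regular expression assumes the frame layout seen in this report. It flags a trace that contains nothing but `__schedule`, which is the case that gives no hint about who called the scheduler.

```python
import re

# Matches frames such as: [<40113a68>] __schedule+0x500/0x810
# (layout assumed from the report quoted in this thread)
FRAME_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+(\S+?)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)

def parse_frames(text):
    """Return one dict per backtrace frame found in a syslog excerpt."""
    return [
        {
            "addr": int(addr, 16),
            "symbol": symbol,
            "offset": int(offset, 16),
            "size": int(size, 16),
        }
        for addr, symbol, offset, size in FRAME_RE.findall(text)
    ]

def only_scheduler(frames):
    """True when every frame is __schedule, i.e. the caller is unknown."""
    return bool(frames) and all(f["symbol"] == "__schedule" for f in frames)
```

A trace for which `only_scheduler` returns true says only that the task is sleeping; it does not show which code path put it to sleep.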
[GIT PULL] Please pull NFS client bugfixes
Hi Linus,

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-4

for you to fetch changes up to 367156d9a87b21b5232dd93107c5fc61b09ba2ef:

  NFS: Give "flavor" an initial value to fix a compile warning (2013-09-29 16:03:34 -0400)

NFS client bugfixes for 3.12

- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver

Anna Schumaker (1):
      NFS: Give "flavor" an initial value to fix a compile warning

Trond Myklebust (3):
      NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
      NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
      NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds

Weston Andros Adamson (1):
      NFSv4.1: try SECINFO_NO_NAME flavs until one works

 fs/nfs/dir.c               |  2 +-
 fs/nfs/nfs4file.c          |  3 ++-
 fs/nfs/nfs4filelayoutdev.c | 20 +---
 fs/nfs/nfs4proc.c          | 58 +-
 include/linux/nfs_xdr.h    |  3 ++-
 5 files changed, 63 insertions(+), 23 deletions(-)

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 16:08 -0400, Ric Wheeler wrote:
> On 09/30/2013 04:00 PM, Bernd Schubert wrote:
> > pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> > interface? And userspace needs to address all of them differently?
>
> The NFS and SCSI groups have each defined a standard which Zach's proposal
> abstracts into a common user API.
>
> Distributed file systems tend to be rather unique and do not have similar
> standard bodies, but a lot of them could hide server specific implementations
> under the current proposed interfaces.
>
> What is not a good idea is to drag out the core, simple copy offload
> discussion for another 5 years to pull in every odd use case :)

Agreed. The whole idea of a common system call interface should be to
allow us to abstract away the underlying storage and filesystem
architectures. If filesystem developers also want a way to expose that
underlying architecture to applications in order to enable further
optimisations, then that belongs in a separate discussion.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there would be way if the file system would get a
> >>>>>> hint that the target file is supposed to be copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets used by from-file and to-file.
> >
> > So?
>
> mds1: orig-file
> oss1/target1: orig-chunk1
>
> mds1: target-file
> ossN/targetN: target-chunk1
>
> clientN: Performs the copy
>
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same
> target. Copy offload then even could done from the underlying fs,
> similar as local splice.
> If different ossN servers are used copies still have to be done over
> network by these storage servers, although the client only would need to
> initiate the copy. Still faster, but also not ideal.
>
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file system need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
>
> But it _does_ make sense. The file system just needs a hint that a
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
>
> And exactly for that we need a standard - it does not make sense if each
> and every distributed file system implements its own
> ioctl/libattr/libacl interface for that.
>
> >
> >> Anyway, if we could agree on to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
>
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry,
> didn't find a better name yet), which would take in-file-path and
> out-file-path and allow the file system to create out-file-path with the
> same meta-layout as in-file-path. And it would need some flags, such as
> AUTO (file system decides if it makes sense to do a local copy) and
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
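The byte-range point can be made concrete with a small sketch. It is an illustration only, written in Python for brevity: `copy_range` is a hypothetical helper, and it uses `os.pread`/`os.pwrite` as a portable userspace stand-in, so it neither calls splice() nor offloads anything. What it shares with a range-copy interface is the shape of the call: two already-open descriptors, two offsets, and a length.

```python
import os

def copy_range(fd_in, off_in, fd_out, off_out, length, chunk=1 << 16):
    """Copy `length` bytes from fd_in@off_in to fd_out@off_out.

    Userspace stand-in for a byte-range copy; returns the number of
    bytes actually copied (less than `length` if the source is short).
    """
    copied = 0
    while copied < length:
        buf = os.pread(fd_in, min(chunk, length - copied), off_in + copied)
        if not buf:
            break  # hit end of the source file
        written = 0
        while written < len(buf):
            # pwrite may write fewer bytes than requested, so loop
            written += os.pwrite(fd_out, buf[written:], off_out + copied + written)
        copied += len(buf)
    return copied
```

Both descriptors must exist before the call, and nothing in the arguments says whether the range is a whole file or a fragment of one, so a layout hint tied to this kind of call would only be meaningful for the whole-file case.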
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>> It would be nice if there would be way if the file system would get a
> >>>> hint that the target file is supposed to be copy of another file. That
> >>>> way distributed file systems could also create the target-file with the
> >>>> correct meta-information (same storage targets as in-file has).
> >>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>> sure if this would work for pNFS, though.
> >>>
> >>> splice() does not create new files. What you appear to be asking for
> >>> lies way outside the scope of that system call interface.
> >>>
> >>
> >> Sorry I know, definitely outside the scope of splice, but in the context
> >> of offloaded file copies. So the question is, what is the best way to
> >> address/discuss that?
> >
> > Why does it need to be addressed in the first place?
>
> An offloaded copy is still not efficient if different storage
> servers/targets used by from-file and to-file.

So?

> >
> > What is preventing an application from retrieving and setting this
> > information using standard libc functions such as fstat()+open(), and
> > supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> > where appropriate?
> >
>
> At a minimum this requires network and metadata overhead. And while I'm
> working on FhGFS now, I still wonder what other file system need to do -
> for example Lustre pre-allocates storage-target files on creating a
> file, so file layout changes mean even more overhead there.

The problem you are describing is limited to a narrow set of storage
architectures. If copy offload using splice() doesn't make sense for
those architectures, then don't implement it for them.
You might be able to provide ioctls() to do these special hinted file
creations for those filesystems that need it, but the vast majority
don't, and you shouldn't enforce it on them.

> Anyway, if we could agree on to use libattr or libacl to teach the file
> system about the upcoming splice call I would be fine.

libattr and libacl are generic libraries that exist to manipulate xattrs
and acls. They do not need to contain Lustre-specific code.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >> It would be nice if there would be way if the file system would get a
> >> hint that the target file is supposed to be copy of another file. That
> >> way distributed file systems could also create the target-file with the
> >> correct meta-information (same storage targets as in-file has).
> >> Well, if we cannot agree on that, file system with a custom protocol at
> >> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >> sure if this would work for pNFS, though.
> >
> > splice() does not create new files. What you appear to be asking for
> > lies way outside the scope of that system call interface.
> >
>
> Sorry I know, definitely outside the scope of splice, but in the context
> of offloaded file copies. So the question is, what is the best way to
> address/discuss that?

Why does it need to be addressed in the first place?

What is preventing an application from retrieving and setting this
information using standard libc functions such as fstat()+open(), and
supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
where appropriate?

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
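The fstat()+open() route suggested above can be sketched in plain POSIX C. This is a minimal sketch under stated assumptions: create_like() is a hypothetical helper name, it carries over only the permission bits and timestamps, and it leaves out ownership, xattrs and ACLs (the libattr and libacl part of the suggestion) as well as any filesystem-specific layout hints, which is the part Bernd is actually asking about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create dst_path with the permission bits and
 * timestamps of the already-open source file. Returns the new file
 * descriptor, or -1 on failure (errno is set by the failing call).
 */
static int create_like(int src_fd, const char *dst_path)
{
	struct stat st;
	struct timespec times[2];
	int dst_fd;

	assert(dst_path != NULL);

	if (fstat(src_fd, &st) < 0)
		return -1;

	dst_fd = open(dst_path, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 07777);
	if (dst_fd < 0)
		return -1;

	/* open() applies the umask, so set the mode explicitly afterwards. */
	if (fchmod(dst_fd, st.st_mode & 07777) < 0)
		goto fail;

	/* Carry over access and modification times. */
	times[0] = st.st_atim;
	times[1] = st.st_mtim;
	if (futimens(dst_fd, times) < 0)
		goto fail;

	return dst_fd;

fail:
	close(dst_fd);
	unlink(dst_path);
	return -1;
}
```

Nothing here tells the filesystem that a copy is about to follow, so a distributed filesystem would still pick storage targets for the new file on its own.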
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> It would be nice if there would be way if the file system would get a
> hint that the target file is supposed to be copy of another file. That
> way distributed file systems could also create the target-file with the
> correct meta-information (same storage targets as in-file has).
> Well, if we cannot agree on that, file system with a custom protocol at
> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> sure if this would work for pNFS, though.

splice() does not create new files. What you appear to be asking for
lies way outside the scope of that system call interface.

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
RE: [RFC] extending splice for copy offloading
> -----Original Message-----
> From: Ric Wheeler [mailto:rwhee...@redhat.com]
> Sent: Monday, September 30, 2013 10:29 AM
> To: Miklos Szeredi
> Cc: J. Bruce Fields; Myklebust, Trond; Zach Brown; Anna Schumaker; Kernel
> Mailing List; Linux-Fsdevel; linux-...@vger.kernel.org; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
>
> On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
> > On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwhee...@redhat.com> wrote:
> >> On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
> >>> On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields <bfie...@fieldses.org>
> >>> wrote:
> >>>>> My other worry is about interruptibility/restartability. Ideas?
> >>>>>
> >>>>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
> >>>>> Can the page cache copy be made restartable? Or should splice() be
> >>>>> allowed to return a short count? What happens on (non-reflink)
> >>>>> remote copies and huge request sizes?
> >>>> If I were writing an application that required copies to be
> >>>> restartable, I'd probably use the largest possible range in the
> >>>> reflink case but break the copy into smaller chunks in the splice case.
> >>>>
> >>> The app really doesn't want to care about that. And it doesn't want
> >>> to care about restartability, etc.. It's something the *kernel* has
> >>> to care about. You just can't have uninterruptible syscalls that
> >>> sleep for a "long" time, otherwise first you'll just have annoyed
> >>> users pressing ^C in vain; then, if the sleep is even longer,
> >>> warnings about task sleeping too long.
> >>>
> >>> One idea is letting splice() return a short count, and so the app
> >>> can safely issue SIZE_MAX requests and the kernel can decide if it
> >>> can copy the whole file in one go or if it wants to do it in smaller
> >>> chunks.
> >>>
> >> You cannot rely on a short count. That implies that an offloaded copy
> >> starts at byte 0 and the short count first bytes are all valid.
> > Huh?
> >
> > - app calls splice(from, 0, to, 0, SIZE_MAX)
> > 1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX)
> > 1.a) fs reflinks the whole file in a jiffy and returns the size of the
> > file
> > 1 b) fs does copy offload of, say, 64MB and returns 64M
> > 2) VFS does page copy of, say, 1MB and returns 1MB
> > - app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
> > ...
> >
> > The point is: the app is always doing the same (incrementing offset
> > with the return value from splice) and the kernel can decide what is
> > the best size it can service within a single uninterruptible syscall.
> >
> > Wouldn't that work?
> >
> > Thanks,
> > Miklos
>
> No.
>
> Keep in mind that the offload operation in (1) might fail partially. The
> target file (the copy) is allocated, the question is what ranges have
> valid data.
>
> I don't see that (2) is interesting or really needed to be done in the kernel.
> If nothing else, it tends to confuse the discussion
>

Anna's figures, that were presented at Plumber's, show that (2) is still
worth doing on the _server_ for the case of NFS.

Cheers
  Trond
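The application-side loop Miklos describes (issue a huge request, then advance the offset by whatever came back) can be sketched in userspace C. This is a minimal sketch under stated assumptions: copy_some() is a hypothetical stand-in for a single copy call and is implemented here with pread()/pwrite() in 4 KiB chunks, not with splice() or any real offload interface. It also assumes that a short count always describes a valid contiguous prefix, which is precisely the guarantee Ric says an offloaded copy may not provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_MAX 4096	/* the most one call is willing to copy */

/*
 * Hypothetical stand-in for one copy call: copies at most CHUNK_MAX bytes
 * and reports how many it handled, however large the request was.
 */
static ssize_t copy_some(int in_fd, off_t in_off, int out_fd, off_t out_off,
			 size_t len)
{
	char buf[CHUNK_MAX];
	size_t want = len < sizeof(buf) ? len : sizeof(buf);
	ssize_t nread;

	nread = pread(in_fd, buf, want, in_off);
	if (nread <= 0)
		return nread;	/* 0 at end of file, -1 on error */

	/* A short write is fine: the caller resumes from the new offset. */
	return pwrite(out_fd, buf, (size_t)nread, out_off);
}

/* Caller side: always ask for everything, advance by the returned count. */
static off_t copy_all(int in_fd, int out_fd)
{
	off_t off = 0;

	for (;;) {
		ssize_t n = copy_some(in_fd, off, out_fd, off, SIZE_MAX);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return off;	/* reached end of the source file */
		off += n;
	}
}

/* Convenience wrapper: copy src to dst, returning the bytes copied. */
static off_t copy_path(const char *src, const char *dst)
{
	int in_fd, out_fd;
	off_t copied;

	assert(src != NULL && dst != NULL);

	in_fd = open(src, O_RDONLY);
	if (in_fd < 0)
		return -1;
	out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out_fd < 0) {
		close(in_fd);
		return -1;
	}
	copied = copy_all(in_fd, out_fd);
	close(out_fd);
	close(in_fd);
	return copied;
}
```

The caller never needs to know whether one call covered the whole file or a single chunk, which is the property Miklos is arguing for.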
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there would be way if the file system would get a
> >>>>>> hint that the target file is supposed to be copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets used by from-file and to-file.
> >
> > So?
>
> mds1: orig-file
> oss1/target1: orig-chunk1
>
> mds1: target-file
> ossN/targetN: target-chunk1
>
> clientN: Performs the copy
>
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same
> target. Copy offload then even could done from the underlying fs,
> similar as local splice.
> If different ossN servers are used copies still have to be done over
> network by these storage servers, although the client only would need to
> initiate the copy. Still faster, but also not ideal.
>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file system need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
>
> But it _does_ make sense. The file system just needs a hint that a
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
>
> And exactly for that we need a standard - it does not make sense if each
> and every distributed file system implements its own
> ioctl/libattr/libacl interface for that.
>
> >> Anyway, if we could agree on to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
>
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry,
> didn't find a better name yet), which would take in-file-path and
> out-file-path and allow the file system to create out-file-path with the
> same meta-layout as in-file-path. And it would need some flags, such as
> AUTO (file system decides if it makes sense to do a local copy) and
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [RFC] extending splice for copy offloading
On Mon, 2013-09-30 at 16:08 -0400, Ric Wheeler wrote:
> On 09/30/2013 04:00 PM, Bernd Schubert wrote:
> > pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> > interface? And userspace needs to address all of them differently?
>
> The NFS and SCSI groups have each defined a standard which Zach's
> proposal abstracts into a common user API.
>
> Distributed file systems tend to be rather unique and do not have
> similar standard bodies, but a lot of them could hide server specific
> implementations under the current proposed interfaces.
>
> What is not a good idea is to drag out the core, simple copy offload
> discussion for another 5 years to pull in every odd use case :)

Agreed. The whole idea of a common system call interface should be to
allow us to abstract away the underlying storage and filesystem
architectures. If filesystem developers also want a way to expose that
underlying architecture to applications in order to enable further
optimisations, then that belongs in a separate discussion.

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
[GIT PULL] Please pull NFS client bugfixes
Hi Linus,

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-4

for you to fetch changes up to 367156d9a87b21b5232dd93107c5fc61b09ba2ef:

  NFS: Give flavor an initial value to fix a compile warning (2013-09-29 16:03:34 -0400)

NFS client bugfixes for 3.12

- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver

Anna Schumaker (1):
      NFS: Give flavor an initial value to fix a compile warning

Trond Myklebust (3):
      NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
      NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
      NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds

Weston Andros Adamson (1):
      NFSv4.1: try SECINFO_NO_NAME flavs until one works

 fs/nfs/dir.c | 2 +-
 fs/nfs/nfs4file.c | 3 ++-
 fs/nfs/nfs4filelayoutdev.c | 20 +---
 fs/nfs/nfs4proc.c | 58 +-
 include/linux/nfs_xdr.h | 3 ++-
 5 files changed, 63 insertions(+), 23 deletions(-)

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
RE: [RFC] extending splice for copy offloading
> -----Original Message-----
> From: Miklos Szeredi [mailto:mik...@szeredi.hu]
> Sent: Saturday, September 28, 2013 12:50 AM
> To: Zach Brown
> Cc: J. Bruce Fields; Ric Wheeler; Anna Schumaker; Kernel Mailing List; Linux-
> Fsdevel; linux-...@vger.kernel.org; Myklebust, Trond; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
>
> On Fri, Sep 27, 2013 at 10:50 PM, Zach Brown <z...@redhat.com> wrote:
> >> Also, I don't get the first option above at all. The argument is
> >> that it's safer to have more copies? How much safety does another
> >> copy on the same disk really give you? Do systems that do dedup
> >> provide interfaces to turn it off per-file?
>
> I don't see the safety argument very compelling either. There are real
> semantic differences, however: ENOSPC on a write to a
> (apparently) already allocated block. That could be a bit unexpected. Do we
> need a fallocate extension to deal with shared blocks?

The above has been the case for all enterprise storage arrays ever since
the invention of snapshots. The NFSv4.2 spec does allow you to set a
per-file attribute that causes the storage server to always preallocate
enough buffers to guarantee that you can rewrite the entire file, however
the fact that we've lived without it for said 20 years leads me to believe
that demand for it is going to be limited. I haven't put it top of the
list of features we care to implement...

Cheers,
   Trond
RE: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and dfprintk()
> -----Original Message-----
> From: David Howells [mailto:dhowe...@redhat.com]
> Sent: Thursday, September 26, 2013 10:36 AM
> To: Joe Perches
> Cc: dhowe...@redhat.com; bfie...@fieldses.org; Myklebust, Trond;
> o...@lixom.net; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and
> dfprintk()
>
> Joe Perches <j...@perches.com> wrote:
>
> > no_printk doesn't prevent any argument side-effects from being
> > optimized away by the compiler.
> >
> > ie:
> > dprintk("%d", func())
> > func is now always called when before it wasn't.
>
> Yes, I know. There are half a dozen places where this is the case. Those
> I've wrapped in ifdebug(FACILITY) { ... } in the code. It's not the nicest,
> but at least the compiler always gets to see everything, rather than bits
> of it getting hidden by the preprocessor - which means the call points will
> be less likely to bit rot over time.

Your assumption is that RPC_DEBUG is disabled for most compiles. That is
not the case.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
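The side-effect difference Joe points out can be reproduced in a small userspace program. This is a minimal sketch, assuming (as his remark implies) that the stub behaves like an ordinary variadic function whose arguments are evaluated before the call. The names dprintk_empty, dprintk_stub, no_printk_stub and func are hypothetical and are not the SunRPC macros.

```c
#include <assert.h>

static int calls;	/* how many times the helper below has run */

/* Stands in for a function with side effects used as a debug argument. */
static int func(void)
{
	return ++calls;
}

/*
 * Variant 1: the debug macro is compiled out entirely, so its arguments
 * disappear in the preprocessor and are never evaluated.
 */
#define dprintk_empty(...) do { } while (0)

/*
 * Variant 2: the debug macro forwards to a do-nothing function. The
 * compiler still sees (and type-checks) the arguments, and evaluates them.
 */
static inline int no_printk_stub(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}
#define dprintk_stub(...) no_printk_stub(__VA_ARGS__)

/* Returns the number of times func() ran across both variants. */
static int demo(void)
{
	calls = 0;

	dprintk_empty("%d", func());	/* func() is not called */
	assert(calls == 0);

	dprintk_stub("%d", func());	/* func() is called once */
	return calls;
}
```

Wrapping such call sites in a debug-only conditional, as David describes, keeps the compiler's view of the arguments while restoring the old behaviour for the calls that matter.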
RE: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> Sent: Thursday, September 26, 2013 10:21 AM
> To: David Howells
> Cc: Myklebust, Trond; o...@lixom.net; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in
>
> On Thu, Sep 26, 2013 at 03:45:02PM +0100, David Howells wrote:
> >
> >
> > Here's a series of patches to make SunRPC/NFS use no_printk() to
> > implement its null dfprintk() macro (ie. when RPC_DEBUG is disabled).
> > This prevents 'unused variable' errors from occurring when a variable
> > is set only for use in debugging statements and renders RPC/NFS_IFDEBUG
> > unnecessary.
>
> Does this patch series fix any actual warnings? Or does it just change the
> way that we prevent the warnings?
>

Right. If this is just code churn, then let's drop it. Otherwise, please
explain why it is a good idea.

Cheers,
  Trond
[GIT PULL] Please pull an NFS client bugfix
Hi Linus,

The following changes since commit 272b98c6455f00884f0350f775c5342358ebb73f:

  Linux 3.12-rc1 (2013-09-16 16:17:51 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-3

for you to fetch changes up to a0f6ed8ebe4f6d494ef70f67d4c0c153cbf59577:

  RPCSEC_GSS: fix crash on destroying gss auth (2013-09-18 10:18:44 -0500)

NFS client bugfix for 3.12

- Fix a regression due to incorrect sharing of gss auth caches

J. Bruce Fields (1):
      RPCSEC_GSS: fix crash on destroying gss auth

 net/sunrpc/auth_gss/auth_gss.c | 11 +++
 1 file changed, 11 insertions(+)

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)
On Thu, 2013-09-12 at 21:20 +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 12, 2013 at 4:13 PM, Myklebust, Trond wrote:
> >> > --- a/net/sunrpc/auth_gss/auth_gss.c
> >> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> >> > @@ -51,6 +51,7 @@
> >> > #include
> >> > #include
> >> > #include
> >> > +#include
> >> >
> >> > #include "../netns.h"
> >> >
> >> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay =
> >> > GSS_RETRY_EXPIRED;
> >> > * using integrity (two 4-byte integers): */
> >> > #define GSS_VERF_SLACK 100
> >> >
> >> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> >> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
> >>
> >> Today's m68k/atari-defconfig kernel no longer boots, as it became larger
> >> than 4 MiB.
> >>
> >> bloat-o-meter tells me:
> >>
> >> function                 old     new   delta
> >> gss_auth_hash_table        -  262144 +262144
> >>
> >> Woops...
> >
> > Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
> > I fell for Sasha's subtle trap...
> >
> >> Are you trying to game Tim's survey? ;-)
> >> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
> >>
> >> Can this memory be allocated dynamically / only when it's used?
> >
> > :-) It's declared inside a module, so that should already be the case,
>
> Only for the modular case. What about builtin, e.g. for nfsroot?
>
> Or is it better to not build in NFS_V4 support in that case?
>
> config NFS_V4
> If unsure, say Y.
>
> config NFSD_V4
> If unsure, say N.
>
> So that's why my defconfig has NFS_V4 but not NFSD_V4.

It should be possible now to compile in NFSv3 support (and/or NFSv2),
while keeping NFSv4 a module. That will usually result in
CONFIG_SUNRPC_GSS=m...

Of course, if your defconfig doesn't have module support then, yes, your
only option to avoid compiling in rpcsec_gss is to not select NFSv4 at
all.

--
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
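The 262144-byte figure follows from how the second argument of DEFINE_HASHTABLE() is interpreted: it is the number of hash bits, so the table holds 1 << bits bucket heads, each of which is a single pointer. The arithmetic can be checked with a small sketch; hashtable_bytes() is a hypothetical helper written for this illustration, and the 4-byte pointer size is an assumption that fits a 32-bit target such as m68k.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size in bytes of a table of (1 << bits) bucket heads, where each head
 * is one pointer of pointer_size bytes.
 */
static size_t hashtable_bytes(unsigned int bits, size_t pointer_size)
{
	assert(bits < sizeof(size_t) * 8);
	return ((size_t)1 << bits) * pointer_size;
}
```

With bits = 16 and 4-byte pointers this gives 65536 * 4 = 262144 bytes, matching the bloat-o-meter delta. The intended 16 buckets correspond to bits = 4, which gives 16 * 4 = 64 bytes.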
[GIT PULL] Please pull NFS client changes (part 2)
Hi Linus, The following changes since commit b1b3e136948a2bf4915326acb0d825d7d180753f: NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 -0400) are available in the git repository at: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-2 for you to fetch changes up to 23c323af0375a7f63732bed0386aba5935b8de69: SUNRPC: No, I did not intend to create a 256KiB hashtable (2013-09-12 10:16:31 -0400) NFS client bugfixes: - Fix a few credential reference leaks resulting from the SP4_MACH_CRED NFSv4.1 state protection code. - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the intended 64 byte structure. - Fix a long standing XDR issue with FREE_STATEID - Fix a potential WARN_ON spamming issue - Fix a missing dprintk() kuid conversion New features: - Enable the NFSv4.1 state protection support for the WRITE and COMMIT operations. Andy Adamson (1): NFSv4.1 fix decode_free_stateid Geert Uytterhoeven (1): sunrpc: Add missing kuids conversion for printing Trond Myklebust (1): SUNRPC: No, I did not intend to create a 256KiB hashtable Weston Andros Adamson (4): NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT NFSv4.1: fix SECINFO* use of put_rpccred NFSv4.1: sp4_mach_cred: no need to ref count creds NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE fs/nfs/nfs4_fs.h | 10 +- fs/nfs/nfs4proc.c | 22 ++ fs/nfs/nfs4xdr.c | 17 ++--- net/sunrpc/auth_generic.c | 2 +- net/sunrpc/auth_gss/auth_gss.c | 2 +- 5 files changed, 23 insertions(+), 30 deletions(-) -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: [PATCH] sunrpc: Add missing kuids conversion for printing
On Thu, 2013-09-12 at 15:09 +0200, Geert Uytterhoeven wrote:
> m68k/allmodconfig:
>
> net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’:
> net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but
> argument 2 has type ‘kuid_t’
>
> commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and
> kgids to uids and gids for printing") forgot to convert one instance.
>
> Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>
> ---

Thanks! Applied...

-- 
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
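The warning quoted above arises because kuid_t is a wrapper structure rather than a plain integer, so it cannot be passed directly to a "%d" format. In the kernel, the conversion is done with from_kuid(), which also takes a user namespace argument. The following is only a userspace sketch of the idea: the kuid_t definition is a mock, and mock_from_kuid() and format_uid() are hypothetical names that do not exist in the kernel.

```c
#include <stdio.h>

/* Userspace mock of a wrapped uid type: a struct, not an integer. */
typedef struct {
	unsigned int val;
} kuid_t;

/*
 * Hypothetical stand-in for the kernel's from_kuid(): unwrap the value
 * so it can be printed. The real helper additionally takes a
 * struct user_namespace pointer, which this sketch omits.
 */
unsigned int mock_from_kuid(kuid_t kuid)
{
	return kuid.val;
}

/*
 * Format a uid for a debug message. Passing the kuid_t itself to the
 * format string is what produces the type-mismatch warning; converting
 * it first gives the format the integer it expects.
 */
int format_uid(char *buf, size_t len, kuid_t uid)
{
	return snprintf(buf, len, "uid=%u", mock_from_kuid(uid));
}
```

Wrapping the uid in a struct is what lets the compiler catch a missed conversion like this one in the first place.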
Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)
On Thu, 2013-09-12 at 15:24 +0200, Geert Uytterhoeven wrote:
> On Mon, Sep 9, 2013 at 6:57 PM, Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org> wrote:
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index 5ec15bb..dc4b449 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -51,6 +51,7 @@
> >  #include <linux/sunrpc/rpc_pipe_fs.h>
> >  #include <linux/sunrpc/gss_api.h>
> >  #include <asm/uaccess.h>
> > +#include <linux/hashtable.h>
> >
> >  #include "../netns.h"
> >
> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay = GSS_RETRY_EXPIRED;
> >   * using integrity (two 4-byte integers): */
> >  #define GSS_VERF_SLACK 100
> >
> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
>
> Today's m68k/atari-defconfig kernel no longer boots, as it became larger than
> 4 MiB.
>
> bloat-o-meter tells me:
>
> function                                     old     new   delta
> gss_auth_hash_table                            -  262144 +262144
>
> Woops...

Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
I fell for Sasha's subtle trap...

> Are you trying to game Tim's survey? ;-)
> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
>
> Can this memory be allocated dynamically / only when it's used?

:-) It's declared inside a module, so that should already be the case,
however I'll send in a patch to change the above to the intended:

DEFINE_HASHTABLE(gss_auth_hash_table, 4);

Thanks Geert!

-- 
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
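The size reported by bloat-o-meter follows from how DEFINE_HASHTABLE() interprets its second argument: it is the number of hash bits, so the table has 1 << bits buckets, and each bucket head holds a single pointer. The arithmetic can be sketched in userspace as below. hashtable_bytes() is a hypothetical helper for illustration, not a kernel function, and the 4-byte pointer size is an assumption for a 32-bit target such as m68k.

```c
#include <stddef.h>

/*
 * Userspace sketch, not kernel code: a table declared with "bits" hash
 * bits contains (1 << bits) bucket heads, each one pointer wide, so it
 * occupies (1 << bits) * pointer_size bytes of static storage.
 */
size_t hashtable_bytes(unsigned int bits, size_t pointer_size)
{
	return ((size_t)1 << bits) * pointer_size;
}
```

With 4-byte pointers, hashtable_bytes(16, 4) gives 262144, matching the delta above, while hashtable_bytes(4, 4) gives 64 for the intended 16-bucket table.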
[GIT PULL] Please pull NFS client updates for 3.12
Hi Linus,

The following changes since commit 7c6d4dca777d6423cb9ccdc019cad94c75adcbe4:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha (2013-07-23 14:39:57 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-1

for you to fetch changes up to b1b3e136948a2bf4915326acb0d825d7d180753f:

  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 -0400)

NFS client updates for Linux 3.12

Highlights include:

- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.
Andy Adamson (10): NFSv4.1 Use the mount point rpc_clnt for layoutreturn NFS Remove unused authflavour parameter from init_client NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE NFSv4.1 Use clientid management rpc_clnt for secinfo NFSv4.1 Use clientid management rpc_clnt for secinfo_no_name SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresult SUNRPC new rpc_credops to test credential expiry NFS avoid expired credential keys for buffered writes SUNRPC refactor rpcauth_checkverf error returns NFSv4.1 Use MDS auth flavor for data server connection Chuck Lever (20): NFS: Fix return type of nfs4_end_drain_session() stub NFS: Use root's credential for lease management when keytab is missing NFS: Never use user credentials for lease renewal NFS: When displaying session slot numbers, use "%u" consistently NFS: Rename nfs41_call_sync_data as a common data structure NFS: Clean up nfs4_setup_sequence() NFS: Common versions of sequence helper functions NFS: Add RPC callouts to start NFSv4.0 synchronous requests NFS: Remove unused call_sync minor version op NFS: Enable slot table helpers for NFSv4.0 NFS: Add global helper to set up a stand-along nfs4_slot_table NFS: Add global helper for releasing slot table resources NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking NFS: NFSv4.0 transport blocking NFS: Enable nfs4_setup_sequence() for DELEGRETURN NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER NFS: Add nfs4_sequence calls for OPEN_CONFIRM NFS: Update session draining barriers for NFSv4.0 transport blocking When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning: NFS: Fix warning introduced by NFSv4.0 transport blocking patches Jeff Layton (2): rpc_pipe: convert back to simple_dir_inode_operations nfs: verify open flags before allowing an atomic open Nadav Shemer (1): nfs: fix open(O_RDONLY|O_TRUNC) in NFS4.0 NeilBrown (2): NFS: remove incorrect "Lock reclaim failed!" warning. NFSv4: Don't try to recover NFSv4 locks when they are lost. 
Trond Myklebust (63): NFSv4: encode_attrs should not backfill the bitmap and attribute length NFSv4: Fix nfs4_init_uniform_client_string for net namespaces NFSv4: Refuse mount attempts with proto=udp NFS: Remove the NFSv4 "open optimisation" from nfs_permission NFSv3: Deal with a sparse warning in nfs3_proc_create NFSv4: Deal with a sparse warning in nfs4_opendata_alloc NFSv4: Deal with some more sparse warnings NFSv4: Deal with a sparse warning in nfs_idmap_get_key() NFSv4: Fix an incorrect pointer declaration in decode_first_pnfs_layout_type NFS: Clean up nfs_sillyrename() NFS: refactor code for calculating the crc32 hash of a filehandle NFS: Add event tracing for generic NFS events NFS: Pass in lookup flags from nfs_atomic_open to nfs_lookup NFS: Add event tracing for generic NFS lookups NFS: Add tracepoints for debugging generic file create events NFS: Add tracepoints for debugging directory changes NFS: Add tracepoints for debugging NFS rename
[GIT PULL] Please pull one NFS client bugfix
Hi Linus, The following changes since commit fa8218def1b1a16f0a410e2c1c767b4738cc81fa: Merge tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap (2013-08-27 10:10:30 -0700) are available in the git repository at: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-5 for you to fetch changes up to 347e2233b7667e336d9f671f1a52dfa3f0416e2c: SUNRPC: Fix memory corruption issue on 32-bit highmem systems (2013-08-28 15:43:43 -0400) NFS client bugfix for 3.11 - Stable patch to fix a highmem-related data corruption issue on 32-bit ARM platforms Trond Myklebust (1): SUNRPC: Fix memory corruption issue on 32-bit highmem systems net/sunrpc/xdr.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
[GIT PULL] Please pull NFS client bug fixes
Hi Linus, The following changes since commit c095ba7224d8edc71dcef0d655911399a8bd4a3f: Linux 3.11-rc4 (2013-08-04 13:46:46 -0700) are available in the git repository at: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-4 for you to fetch changes up to b72888cb0ba63b2dfc6c8d3cd78a7fea584bebc6: NFSv4: Fix up nfs4_proc_lookup_mountpoint (2013-08-07 20:47:26 -0400) NFS client bugfixes for 3.11 - Stable patch for lockd to fix Oopses due to inappropriate calls to utsname()->nodename - Stable patches for sunrpc to fix Oopses on shutdown when using AF_LOCAL sockets with rpcbind - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint - Fix a regression with the sync mount option failing to work for nfs4 mounts - Fix a writeback performance issue when doing cache invalidation - Remove an incorrect call to nfs_setsecurity in nfs_fhget Scott Mayhew (1): NFSv4: Fix the sync mount option for nfs4 mounts Trond Myklebust (6): LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs SUNRPC: Don't auto-disconnect from the local rpcbind socket SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister NFS: Fix writeback performance issue on cache invalidation NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() NFSv4: Fix up nfs4_proc_lookup_mountpoint fs/lockd/clntlock.c | 13 fs/lockd/clntproc.c | 5 +++-- fs/nfs/inode.c | 11 +++--- fs/nfs/nfs4proc.c| 8 +++- fs/nfs/super.c | 4 include/linux/sunrpc/sched.h | 1 + net/sunrpc/clnt.c| 4 net/sunrpc/netns.h | 1 + net/sunrpc/rpcb_clnt.c | 48 9 files changed, 68 insertions(+), 27 deletions(-) -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
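The first fix listed above ("LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs") takes the node name from the RPC client instead of from the calling task's UTS namespace, which may be the wrong namespace or may already have been torn down when the task is exiting. A rough userspace sketch of that idea follows. All struct and function names here are hypothetical mocks, and the assumption that the client's node name is captured once at creation time is stated for illustration rather than taken from the patch text.

```c
#include <stddef.h>

/* Mock of a UTS namespace carrying the host's node name. */
struct mock_uts {
	const char *nodename;
};

/* Mock task: uts_ns becomes NULL once its namespaces are released. */
struct mock_task {
	struct mock_uts *uts_ns;
};

/* Mock RPC client: keeps its own copy of the node name pointer. */
struct mock_rpc_clnt {
	const char *cl_nodename;
};

/*
 * Old approach: go through the current task's namespace. This sketch
 * returns NULL when the namespace is gone; kernel code that
 * dereferences it unconditionally would oops instead.
 */
const char *nodename_from_task(const struct mock_task *task)
{
	return task->uts_ns ? task->uts_ns->nodename : NULL;
}

/*
 * Patched approach: use the name stored on the client, which does not
 * depend on the state of whichever task happens to be running.
 */
const char *nodename_from_client(const struct mock_rpc_clnt *clnt)
{
	return clnt->cl_nodename;
}
```

The lookup through the client stays valid for as long as the client itself does, regardless of which thread issues the lock request.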
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Wed, 2013-08-07 at 22:01 +0100, Nix wrote:
> On 7 Aug 2013, Trond Myklebust said:
>
> > On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> >> On 6 Aug 2013, Trond Myklebust verbalised:
> >> > True. How about something like the following instead. Note the change to
> >> > the original patch...
> >>
> >> Well, with those applied I could reboot without a panic for the first
> >> time since 3.8.x: looking good. I'll give it a reboot or two with a
> >> system that's not hot from booting though.
> >
> > Could you please also try applying only the 1/2 patch, to see if that
> > suffices to quell the shutdown panic?
>
> It doesn't suffice. I see this severely truncated oops:
>
> [ 115.799092] BUG: unable to handle kernel NULL pointer dereference at 0008
> [ 115.800284] IP: [81165ec6] path_init+0x11c/0x36f
> [ 115.801463] PGD 0
> [ 115.802625] Oops: [#1] PREEMPT SMP
> [ 115.803805] Modules linked in: [last unloaded: microcode]
> [ 115.804995] CPU: 3 PID: 1191 Comm: sleep Not tainted 3.10.5-05317-g3c9f6fa-dirty #2
> [ 115.806207] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [ 115.807453] task: 8804189a ti: 8803f74d6000 task.ti: 8803f74d6000

OK. Then I'll mark them both for stable inclusion in 3.9+.

Thanks for testing!

-- 
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> On 6 Aug 2013, Trond Myklebust verbalised:
> > True. How about something like the following instead. Note the change to
> > the original patch...
>
> Well, with those applied I could reboot without a panic for the first
> time since 3.8.x: looking good. I'll give it a reboot or two with a
> system that's not hot from booting though.

Could you please also try applying only the 1/2 patch, to see if that suffices to quell the shutdown panic?

-- 
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +
> "Myklebust, Trond" wrote:
>
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +
> > > "Myklebust, Trond" wrote:
> > >
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > > nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > >
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > >
> > > > Signed-off-by: Trond Myklebust
> > > > Cc: Toralf Förster
> > > > Cc: Oleg Nesterov
> > > > Cc: Nix
> > > > Cc: Jeff Layton
> > > > ---
> > > >  fs/lockd/clntproc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > > >  {
> > > >      struct nlm_args *argp = &req->a_args;
> > > >      struct nlm_lock *lock = &argp->lock;
> > > > +    char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >
> > > >      nlmclnt_next_cookie(&argp->cookie);
> > > >      memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > > -    lock->caller = utsname()->nodename;
> > > > +    lock->caller = nodename;
> > > >      lock->oh.data = req->a_owner;
> > > >      lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > > >                      (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > > -                    utsname()->nodename);
> > > > +                    nodename);
> > > >      lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > >      lock->fl.fl_start = fl->fl_start;
> > > >      lock->fl.fl_end = fl->fl_end;
> > >
> > > Looks good to me...
> > >
> > > Reviewed-by: Jeff Layton
> > >
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > >
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> >
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> >
> > See attachment.
>
> Yeah, that looks like a reasonable thing to do...
>
> OTOH, Is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind stopped, the socket
> unlinked and recreated and then started again?
>
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to the original patch...

-- 
Trond Myklebust
Linux NFS client maintainer
NetApp
trond.mykleb...@netapp.com
www.netapp.com

From 00326ed6442c66021cd4b5e19e80f3e2027d5d42 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH v2 1/2] SUNRPC: Don't auto-disconnect from the local rpcbind socket

There is no nee
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Mon, 2013-08-05 at 19:33 +0100, Nix wrote:
> On 5 Aug 2013, Trond Myklebust told this:
> > Does the attached patch fix the problem?
>
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <trond.mykleb...@netapp.com>
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
>
> It makes it worse. Much, much worse. From a crash every so often when
> I'm doing compilations over NFS, I get an immediate panic on startx,
> long long before I even try to replicate the earlier panic:
>
> [ 83.432358] task: 88041aaa5ac0 ti: 8804199e2000 task.ti: 8804199e2000
> [ 83.432428] RIP: 0010:[<8124af69>] [<8124af69>] encode_nlm4_lock+0x26/0xbe
> [ 83.432512] RSP: 0018:8804199e3a78 EFLAGS: 00010286
> [ 83.432564] RAX: RBX: 88041a577038 RCX:
> [ 83.432630] RDX: 8804193b3098 RSI: 88041a577038 RDI: 008c
> [ 83.432697] RBP: 8804199e3aa8 R08: 8804193b3098 R09: 0001
> [ 83.432763] R10: 88042fa12980 R11: 88042fa12980 R12: 8804199e3ae8
> [ 83.432830] R13: 008c R14: 8804199e3fd8 R15: 815de80e
> [ 83.432898] FS: 7f594b40c740() GS:88042fa0() knlGS:
> [ 83.432974] CS: 0010 DS: ES: CR0: 80050033
> [ 83.433028] CR2: 008c CR3: 00041ab3d000 CR4: 001407f0
> [ 83.433095] DR0: DR1: DR2:
> [ 83.433176] DR3: DR6: 0ff0 DR7: 0400
> [ 83.433255] Stack:
> [ 83.433276] 88041a44fb70 88040004 8804199e3ae8 88041a577010
> [ 83.433360] 8804188e0e00 8804199e3fd8 8804199e3ac8 8124b0d7
> [ 83.433443] 8804188e0e00 8124b086 8804199e3b38 815e6032
> [ 83.433616] Call Trace:
> [ 83.433646] [<8124b0d7>] nlm4_xdr_enc_lockargs+0x51/0x76
> [ 83.433707] [<8124b086>] ? nlm4_xdr_enc_cancargs+0x56/0x56
> [ 83.433769] [<815e6032>] rpcauth_wrap_req+0x57/0x62
> [ 83.433826] [<815de98a>] call_transmit+0x17c/0x1f9
> [ 83.433880] [<815e4e58>] __rpc_execute+0xe8/0x2ca
> [ 83.433935] [<815e50f9>] rpc_execute+0x76/0x9d
> [ 83.433986] [<815debc1>] rpc_run_task+0x78/0x80
> [ 83.434039] [<815decff>] rpc_call_sync+0x88/0x9e
> [ 83.434092] [<81244b3c>] nlmclnt_call+0xb5/0x240
> [ 83.434146] [<812454f0>] nlmclnt_proc+0x226/0x5fb
> [ 83.434226] [<812209a2>] nfs3_proc_lock+0x21/0x23
> [ 83.434280] [<81214a5e>] do_setlk+0x65/0xee
> [ 83.434329] [<81214ca6>] nfs_lock+0x14e/0x162
> [ 83.434382] [<81199661>] vfs_lock_file+0x29/0x35
> [ 83.434435] [<8119a51d>] fcntl_setlk+0x139/0x2c5
> [ 83.434490] [<81169621>] SyS_fcntl+0x2b6/0x47d
> [ 83.434543] [<81613e92>] system_call_fastpath+0x16/0x1b
> [ 83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89
> e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef
> <f2> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00
> [ 83.435077] RIP [<8124af69>] encode_nlm4_lock+0x26/0xbe
> [ 83.435140] RSP <8804199e3a78>
> [ 83.435197] CR2: 008c
>
> That's here:
>
> (gdb) list *(encode_nlm4_lock+0x26)
> 0x8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
> 324      *      string caller_name<LM_MAXSTRLEN>;
> 325      */
> 326     static void encode_caller_name(struct xdr_stream *xdr, const char *name)
> 327     {
> 328             /* NB: client-side does not set lock->len */
> 329             u32 length = strlen(name);
> 330             __be32 *p;
> 331
> 332             p = xdr_reserve_space(xdr, 4 + length);
> 333             xdr_encode_opaque(p, name, length);
>
>    0x8124af69 <+38>:    repnz scas %es:(%rdi),%al
>
> Pretty clearly, "name" can be NULL after this patch...
>

Yes. This scheme will only work if we make sure that host->h_rpcclnt is
initialised at mount time. Here is a v2 patch that should do the right
thing.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

From 9a1b6bf818e74bb7aabaecb59492b739f2f4d742 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.mykleb...@netapp.com>
Date: Mon, 5 Aug 2013 12:06:12 -0400
Subject: [PATCH v2] LOCKD: Don't call utsname()->nodename from
 nlmclnt_setlockargs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
which case we're in entirely the wrong namespace.
Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
exit_task_namespaces() outside of exit_notify()) now means that
exit_task_work() is called after exit_task_namespaces(), which
triggers an Oops when we're freeing up the locks.

Fix this by ensuring that we initialise the nlm_host's rpc_client at mount
time, so that the cl_nodename field is initialised to the value of
utsname()->nodename that the net namespace uses.
Then replace the lockd callers of
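The idea in the v2 changelog, capturing the node name once when the client is set up rather than reading it from the calling task at lock time, can be modelled in userspace C. This is only a sketch under stated assumptions: `demo_client`, `demo_client_init` and `demo_format_owner` are hypothetical names and are not lockd or SUNRPC interfaces. Only the "%u@%s" owner format is taken from the quoted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NODENAME_MAX 65	/* hypothetical buffer size for this sketch */

/* Hypothetical stand-in for a client object that caches its node name. */
struct demo_client {
	char nodename[DEMO_NODENAME_MAX];
};

/*
 * Copy the node name into the client once, at creation time.
 * Returns 0 on success, -1 if either argument is NULL.
 */
int demo_client_init(struct demo_client *clnt, const char *nodename)
{
	if (clnt == NULL || nodename == NULL)
		return -1;
	snprintf(clnt->nodename, sizeof(clnt->nodename), "%s", nodename);
	return 0;
}

/*
 * Build a "pid@nodename" owner string from the cached name, so the
 * caller's current namespace is never consulted at lock time.
 * Returns the snprintf() result, or -1 if no client was supplied.
 */
int demo_format_owner(const struct demo_client *clnt, unsigned int pid,
		      char *buf, size_t buflen)
{
	if (clnt == NULL)
		return -1;
	return snprintf(buf, buflen, "%u@%s", pid, clnt->nodename);
}
```

The explicit NULL check in `demo_format_owner` mirrors the point made in the reply: the cached name is only usable if the object holding it has been initialised before the first lock request.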
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 16:15:01 +0000
> "Myklebust, Trond" <trond.mykleb...@netapp.com> wrote:
>
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <trond.mykleb...@netapp.com>
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > which case we're in entirely the wrong namespace.
> > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > exit_task_namespaces() outside of exit_notify()) now means that
> > exit_task_work() is called after exit_task_namespaces(), which
> > triggers an Oops when we're freeing up the locks.
> >
> > Signed-off-by: Trond Myklebust <trond.mykleb...@netapp.com>
> > Cc: Toralf Förster <toralf.foers...@gmx.de>
> > Cc: Oleg Nesterov <o...@redhat.com>
> > Cc: Nix <n...@esperi.org.uk>
> > Cc: Jeff Layton <jlay...@redhat.com>
> > ---
> >  fs/lockd/clntproc.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index 9760ecb..acd3947 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> >  {
> >  	struct nlm_args *argp = &req->a_args;
> >  	struct nlm_lock *lock = &argp->lock;
> > +	char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> >
> >  	nlmclnt_next_cookie(&argp->cookie);
> >  	memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > -	lock->caller = utsname()->nodename;
> > +	lock->caller = nodename;
> >  	lock->oh.data = req->a_owner;
> >  	lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> >  				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > -				utsname()->nodename);
> > +				nodename);
> >  	lock->svid = fl->fl_u.nfs_fl.owner->pid;
> >  	lock->fl.fl_start = fl->fl_start;
> >  	lock->fl.fl_end = fl->fl_end;
>
> Looks good to me...
>
> Reviewed-by: Jeff Layton <jlay...@redhat.com>
>
> Trond, any thoughts on the other oops that Nix posted? The issue there
> seems to be that we're trying to do the pathwalk to the rpcbind unix
> socket from exit_task_work(), but that's happening after we've already
> called exit_fs().
>
> The trivial answer seems to be to simply call exit_task_work() before
> exit_fs() there, but it seems like we ought to be doing the upcall to
> rpcbind in a mount namespace from which we know we can reach the
> socket...

Isn't it enough to just do the same thing as we did for gss proxy? i.e.
set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.

See attachment.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

From ab56d77893815b1b9f0aaa7a89cee7c832a31cff Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.mykleb...@netapp.com>
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH] SUNRPC: Don't auto-disconnect from the local rpcbind socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix <n...@esperi.org.uk>
Signed-off-by: Trond Myklebust <trond.mykleb...@netapp.com>
Cc: Jeff Layton <jlay...@redhat.com>
Cc: sta...@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..4b00555 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,15 @@ static int rpcb_create_local_unix(struct net *net)
 		.program = &rpcb_program,
 		.version = RPCBVERS_2,
 		.authflavor = RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags = RPC_CLNT_CREATE_NOPING |
+			 RPC_CLNT_CREATE_NO_IDLE_TIMEOUT,
 	};
 	struct rpc_clnt *clnt, *clnt4;
 	int result = 0;
--
1.8.3.1
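The effect of a "no idle timeout" flag as described in the patch comment can be illustrated with a small userspace model. Everything here is an assumption made for illustration: the `DEMO_*` flag bits, `struct demo_xprt` and `demo_should_autoclose` are hypothetical and do not reflect the kernel's flag values or transport code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define DEMO_CREATE_NOPING		(1u << 0)
#define DEMO_CREATE_NO_IDLE_TIMEOUT	(1u << 1)

/* Hypothetical transport state: creation flags plus an idle limit. */
struct demo_xprt {
	unsigned int flags;
	unsigned int idle_timeout;	/* seconds; ignored when disabled */
};

/*
 * Decide whether a connection that has been idle for idle_for seconds
 * should be closed. With the no-idle-timeout flag set it never is, so
 * no later reconnect (and no later path lookup) is triggered by idling.
 */
bool demo_should_autoclose(const struct demo_xprt *x, unsigned int idle_for)
{
	if (x->flags & DEMO_CREATE_NO_IDLE_TIMEOUT)
		return false;
	return idle_for >= x->idle_timeout;
}
```

As the follow-up discussion in the thread notes, a model like this only covers disconnects caused by the idle timer, not a socket that goes away for other reasons.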
Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*
On Mon, 2013-08-05 at 16:50 +0100, Nix wrote:
> On 5 Aug 2013, Jeff Layton said:
>
> > On Mon, 5 Aug 2013 11:04:27 -0400
> > Jeff Layton <jlay...@redhat.com> wrote:
> >
> >> On Mon, 05 Aug 2013 15:48:01 +0100
> >> Nix <n...@esperi.org.uk> wrote:
> >>
> >> > On 5 Aug 2013, Jeff Layton stated:
> >> >
> >> > > On Sun, 04 Aug 2013 16:40:58 +0100
> >> > > Nix <n...@esperi.org.uk> wrote:
> >> > >
> >> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> > >> compilation (of Chromium, as it happens) over NFSv3:
> >> > >>
> >> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0008
> >> > >> [16364.527571] IP: [<81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527611] PGD 0
> >> > >> [16364.527626] Oops: [#1] PREEMPT SMP
> >> > >> [16364.527656] Modules linked in: [last unloaded: microcode]
> >> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> >> > >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> > >> [16364.527775] task: 88041a97ad60 ti: 8803501d4000 task.ti: 8803501d4000
> >> > >> [16364.527813] RIP: 0010:[<81245157>] [<81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527860] RSP: 0018:8803501d5c58 EFLAGS: 00010282
> >> > >> [16364.527889] RAX: 88041a97ad60 RBX: 8803e49c8800 RCX:
> >> > >> [16364.527926] RDX: RSI: 004a RDI: 8803e49c8b54
> >> > >> [16364.527962] RBP: 8803501d5c68 R08: 00015720 R09:
> >> > >> [16364.527998] R10: 7000 R11: 8803501d5d58 R12: 8803501d5d58
> >> > >> [16364.528034] R13: 88041bd2bc00 R14: R15: 8803fc9e2900
> >> > >> [16364.528070] FS: () GS:88042fa0() knlGS:
> >> > >> [16364.528111] CS: 0010 DS: ES: CR0: 80050033
> >> > >> [16364.528142] CR2: 0008 CR3: 01c0b000 CR4: 001407f0
> >> > >> [16364.528177] DR0: DR1: DR2:
> >> > >> [16364.528214] DR3: DR6: 0ff0 DR7: 0400
> >> > >> [16364.528303] Stack:
> >> > >> [16364.528316] 8803501d5d58 8803e49c8800 8803501d5cd8 81245418
> >> > >> [16364.528369] 8803516f0bc0 8803d7b7b6c0 81215c81
> >> > >> [16364.528418] 88030007 88041bd2bdc8 8801aabe9650 8803fc9e2900
> >> > >> [16364.528467] Call Trace:
> >> > >> [16364.528485] [<81245418>] nlmclnt_proc+0x148/0x5fb
> >> > >> [16364.528516] [<81215c81>] ? nfs_put_lock_context+0x69/0x6e
> >> > >> [16364.528550] [<812209a2>] nfs3_proc_lock+0x21/0x23
> >> > >> [16364.528581] [<812149dd>] do_unlk+0x96/0xb2
> >> > >> [16364.528608] [<81214b41>] nfs_flock+0x5a/0x71
> >> > >> [16364.528637] [<8119a747>] locks_remove_flock+0x9e/0x113
> >> > >> [16364.528668] [<8115cc68>] __fput+0xb6/0x1e6
> >> > >> [16364.528695] [<8115cda6>] fput+0xe/0x10
> >> > >> [16364.528724] [<810998da>] task_work_run+0x7e/0x98
> >> > >> [16364.528754] [<81082bc5>] do_exit+0x3cc/0x8fa
> >> > >> [16364.528782] [<81083501>] ? SyS_wait4+0xa5/0xc2
> >> > >> [16364.528811] [<8108328d>] do_group_exit+0x6f/0xa2
> >> > >> [16364.528843] [<810832d7>] SyS_exit_group+0x17/0x17
> >> > >> [16364.528876] [<81613e92>] system_call_fastpath+0x16/0x1b
> >> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48
> >> > >> 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90
> >> > >> 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53
> >> > >> 38 48 8b
> >> > >> [16364.529176] RIP [<81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.529264] RSP <8803501d5c58>
> >> > >> [16364.529283] CR2: 0008
> >> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> [...]
> > The listing and disassembly from nlmclnt_proc is not terribly
> > interesting unfortunately. You really want to do the listing and
> > disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).
>
> Oh, sorry! Wrong end of the oops :)
>
> 0x81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
> 126             struct nlm_args *argp = &req->a_args;
> 127             struct nlm_lock *lock = &argp->lock;
> 128
> 129             nlmclnt_next_cookie(&argp->cookie);
> 130             memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> 131             lock->caller = utsname()->nodename;
> 132             lock->oh.data = req->a_owner;
> 133             lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> 134                                     (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> 135                                     utsname()->nodename);
>
>    0x81245102 <+0>:     callq  0x81613b00 <__fentry__>
>    0x81245107 <+5>:     push   %rbp
>    0x81245108 <+6>:     mov    %rsp,%rbp
>    0x8124510b <+9>:     push   %r12
>    0x8124510d <+11>:    mov    %rsi,%r12
>    0x81245110 <+14>:
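The oops quoted above reports a NULL pointer dereference at an address ending in 0008, on source line 131, which reads through `utsname()`. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is a load of a member at a small offset from a NULL base pointer. The sketch below shows how such a fault address relates to a member offset; `struct demo_nsproxy` is a hypothetical layout, not the kernel's structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not the kernel's real structure. */
struct demo_nsproxy {
	int count;
	void *uts_ns;
};

/*
 * Address that a load of base->uts_ns would touch, computed with plain
 * integer arithmetic so that nothing is actually dereferenced.
 */
uintptr_t demo_fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_nsproxy, uts_ns);
}
```

With a NULL base the computed address equals the member offset itself, which on common 64-bit ABIs would be a small value such as 8 for the layout assumed here. That is why a fault address in the first page usually points at a member access through a NULL structure pointer.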
Re: [PATCH] fs/nfs/inode.c: adjust code alignment
On Mon, 2013-08-05 at 16:47 +0200, Julia Lawall wrote:
> From: Julia Lawall <julia.law...@lip6.fr>
>
> Signed-off-by: Julia Lawall <julia.law...@lip6.fr>
>
> ---
>
> This patch adjusts the code so that the alignment matches the current
> semantics. I have no idea if it is the intended semantics, though. Should
> the call to nfs_setsecurity also be under the else?
>
>  fs/nfs/inode.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index af6e806..d8ad685 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -463,7 +463,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
>  		unlock_new_inode(inode);
>  	} else
>  		nfs_refresh_inode(inode, fattr);
> -		nfs_setsecurity(inode, fattr, label);
> +	nfs_setsecurity(inode, fattr, label);
>  	dprintk("NFS: nfs_fhget(%s/%Ld fh_crc=0x%08x ct=%d)\n",
>  		inode->i_sb->s_id,
>  		(long long)NFS_FILEID(inode),

Hi Julia,

Thanks for pointing this out!

Given that the 'then' clause of the if statement already calls
nfs_setsecurity before unlocking the inode, I suspect that the above
_should_ really be part of the 'else' clause.
That said, I can't see that calling nfs_setsecurity twice on the inode
can cause any unintended side-effects, so I suggest that we rather queue
the patch up for inclusion in 3.12.

Steve and Dave, any comments?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
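The control flow under discussion, a brace-less `else` followed by a statement that looks as if it belonged to it, can be reproduced in a few lines of userspace C. This is a sketch under assumptions: `demo_fhget` and `struct demo_calls` are hypothetical, and the counters merely stand in for the `nfs_refresh_inode` and `nfs_setsecurity` calls named in the thread.

```c
#include <assert.h>

/* Counters standing in for the two calls; the names are hypothetical. */
struct demo_calls {
	int refresh;
	int setsecurity;
};

/*
 * Mirrors the shape of the quoted hunk: the else has no braces, so only
 * the single statement after it is conditional.
 */
void demo_fhget(int is_new, struct demo_calls *c)
{
	if (is_new) {
		c->setsecurity++;	/* done once while the inode is new */
	} else
		c->refresh++;
	c->setsecurity++;		/* not part of the else: always runs */
}
```

In this model a new inode gets the security call twice and an existing one gets it once, which matches the behaviour described in the reply: harmless as far as the reply can tell, but probably not what was intended.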
Re: [ 068/103] SUNRPC: fix races on PipeFS UMOUNT notifications
On Tue, 2013-07-23 at 15:26 -0700, Greg Kroah-Hartman wrote: > 3.10-stable review patch. If anyone has any objections, please let me know. > Again, please drop this patch and 67/103 for now. We'll get back to whether or not this should be stable material later. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: Linux 3.11-rc2
On Tue, 2013-07-23 at 13:42 -0700, Linus Torvalds wrote: > On Tue, Jul 23, 2013 at 12:08 PM, wrote: > > Hi Trond, > > > >> OK. With Andre's help, I think we've root caused the problem. Can you > >> please confirm that the attached patch also solves the issue for you? > > > > Seems to work fine, > > > >Reported-and-tested-by: Henrik Rydberg > > Trond, do you have anything else pending and are planning a git pull, > or should I just take this patch directly? Nothing else queued for now, so if you could take it directly, then that would save the trouble of setting up an extra pull. Thanks! -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: Linux 3.11-rc2
On Mon, 2013-07-22 at 21:17 -0400, Trond Myklebust wrote:
> On Tue, 2013-07-23 at 03:04 +0200, rydb...@euromail.se wrote:
> > Hi Trond, Linus,
> >
> > On Sun, Jul 21, 2013 at 12:53:10PM -0700, Linus Torvalds wrote:
> > > So it's been another week, and -rc2 is out there.
> >
> > This one happens to break nfs in a rather blunt-instrument fashion -
> > creating files on a nfs4 partition [1] no longer works. Bisection
> > yields this commit as the culprit:
> >
> > commit b4a2cf76ab7c08628c62b2062dacefa496b59dfd
> > Author: Trond Myklebust
> > Date: Wed Jul 17 16:43:16 2013 -0400
> >
> > NFSv4: Fix a regression against the FreeBSD server
> >
> > Technically, the Linux client is allowed by the NFSv4 spec to send
> > 3 word bitmaps as part of an OPEN request. However, this causes the
> > current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors.
> >
> > Fix the regression by making the Linux client use a 2 word bitmap unless
> > doing NFSv4.2 with labeled NFS.
> >
> > Signed-off-by: Trond Myklebust
> >
> > Reverting the patch returns things to normal.
>
> - Can you please provide me with a binary tcpdump or wireshark dump that
> demonstrates the problem?
>
> - What server is this?

OK. With Andre's help, I think we've root caused the problem. Can you
please confirm that the attached patch also solves the issue for you?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

From 1f84e4f9ef9fc4ff502c112df049dfa6f656f4e0 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Tue, 23 Jul 2013 12:53:39 -0400
Subject: [PATCH] NFSv4: Fix brainfart in attribute length calculation

The calculation of the attribute length was 4 bytes off.

Signed-off-by: Trond Myklebust
Tested-by: Andre Heider
---
 fs/nfs/nfs4xdr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c74d616..3850b01 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1118,11 +1118,11 @@ static void encode_attrs(struct xdr_stream *xdr, const struct iattr *iap,
                                 len, ((char *)p - (char *)q) + 4);
                 BUG();
         }
-        len = (char *)p - (char *)q - (bmval_len << 2);
         *q++ = htonl(bmval0);
         *q++ = htonl(bmval1);
         if (bmval_len == 3)
                 *q++ = htonl(bmval2);
+        len = (char *)p - (char *)(q + 1);
         *q = htonl(len);
 
 /* out: */
-- 
1.8.3.1
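Editor's note: the "4 bytes off" in the patch above can be reproduced outside the kernel. The following is a minimal stand-alone sketch, not the kernel's encode_attrs(); the function name attr_len_demo, the struct attr_lens and the buffer size are made up for illustration. It assumes the layout implied by the diff: q points at the first bitmap word, the bitmap is followed by one 32-bit attribute-length word, and p points just past the encoded attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result pair: the length as computed before and after the fix. */
struct attr_lens {
    size_t old_len;
    size_t new_len;
};

/*
 * Model of the backfill step. Word layout starting at q:
 *   [bitmap word 0 .. bmval_len-1][attrlen][attribute data ...]
 */
static struct attr_lens attr_len_demo(unsigned int bmval_len, size_t attr_words)
{
    uint32_t buf[16] = {0};
    uint32_t *q = buf;                              /* first bitmap word */
    uint32_t *p = buf + bmval_len + 1 + attr_words; /* one past the attributes */
    struct attr_lens r;

    assert(bmval_len + 1 + attr_words <= 16);

    /* Before the fix: only the bitmap is subtracted, so attrlen counts itself. */
    r.old_len = (size_t)((char *)p - (char *)q) - ((size_t)bmval_len << 2);

    /* After the fix: skip the bitmap, then measure from the word after attrlen. */
    q += bmval_len;
    r.new_len = (size_t)((char *)p - (char *)(q + 1));
    return r;
}
```

With a 2-word bitmap and 12 bytes of attributes, attr_len_demo(2, 3) gives old_len 16 and new_len 12: the old expression counted the attribute-length word as part of the attributes.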
Re: [Ksummit-2013-discuss] KS Topic request: Handling the Stable kernel, let's dump the cc: stable tag
On Mon, 2013-07-22 at 19:47 -0700, James Bottomley wrote: > On Tue, 2013-07-23 at 02:40 +0000, Myklebust, Trond wrote: > > On Mon, 2013-07-15 at 23:27 +0400, James Bottomley wrote: > > > The solution, to me, looks simple: Let's co-opt a process we already > > > know how to do: mailing list review and tree handling. So the proposal > > > is simple: > > > > > > 1. Drop the cc: stable@ tag: it makes it way too easy to add an ill > > > reviewed patch to stable > > > 2. All patches to stable should follow current review rules: They > > > should go to the mailing list the original patch was sent to > > > once the original is upstream as a request for stable. > > > 3. Following debate on the list, the original maintainer would be > > > responsible for collecting the patches (including the upstream > > > commit) adjudicating on them and passing them on to stable after > > > list review (either by git tree pull or email to stable@). > > > > > > I contend this raises the bar for adding patches to stable much higher, > > > which seems to be needed, and adds a review stage which involves all the > > > original reviewers. > > > > Could we keep the Cc: stable tag itself, since the dependency > > information ("Cc: sta...@vger.kernel.org # 3.3.x: a1f84a3: sched: > > Check for idle") is actually very useful? If we discard that, then we > > really should revise the whole stable system, since it would mean that > > we are in effect discarding the 'upstream first' rule. > > The two don't follow. No-one's proposing to dump the must be upstream > rule. The proposal is to modify the automatic behaviour that leads to > over tagging for stable and consequently too many "stable" patches that > aren't really. My point was that the _tag_ is useful as a list of dependencies for something that we think might need to be backported to older kernels. I'd like to see us keep that information somehow. Whether or not we interpret it as being an automatic "for stable" request is a different matter. 
I'd be quite happy to see the "propose for stable" step as reverting to being a manual step that occurs only after we've upstreamed the fix. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: [Ksummit-2013-discuss] KS Topic request: Handling the Stable kernel, let's dump the cc: stable tag
On Mon, 2013-07-15 at 23:27 +0400, James Bottomley wrote: > Before the "3.10.1-stable review" thread degenerated into a disagreement > about habits of politeness, there were some solid points being made > which, I think, bear consideration and which may now be lost. > > The problem, as Jiří Kosina put it succinctly, is that the distributions > are finding stable less useful because it contains too much stuff they'd > classify as not stable material. > > The question that arises from this is who is stable aiming at ... > because if it's the distributions (and that's what people seem to be > using it for) then we need to take this feedback seriously. > > The next question is how should we, the maintainers, be policing commits > to stable. As I think has been demonstrated in the discussion the > "stable rules" are more sort of guidelines (apologies for the pirates > reference). In many ways, this is as it should be, because people > should have enough taste to know what constitutes a stable fix. The > real root cause of the problem is that the cc: stable tag can't be > stripped once it's in the tree, so maintainers only get to police things > they put in the tree. Stuff they pull from others is already tagged and > that tag can't be changed. This effectively pushes the problem out to > the lowest (and possibly more inexperienced) leaves of the Maintainer > tree. In theory we have a review stage for stable, but the review > patches don't automatically get routed to the right mailing list and the > first round usually comes out in the merge window when Maintainers' > attention is elsewhere. > > The solution, to me, looks simple: Let's co-opt a process we already > know how to do: mailing list review and tree handling. So the proposal > is simple: > > 1. Drop the cc: stable@ tag: it makes it way too easy to add an ill > reviewed patch to stable > 2. 
All patches to stable should follow current review rules: They > should go to the mailing list the original patch was sent to > once the original is upstream as a request for stable. > 3. Following debate on the list, the original maintainer would be > responsible for collecting the patches (including the upstream > commit) adjudicating on them and passing them on to stable after > list review (either by git tree pull or email to stable@). > > I contend this raises the bar for adding patches to stable much higher, > which seems to be needed, and adds a review stage which involves all the > original reviewers. Could we keep the Cc: stable tag itself, since the dependency information ("Cc: sta...@vger.kernel.org # 3.3.x: a1f84a3: sched: Check for idle") is actually very useful? If we discard that, then we really should revise the whole stable system, since it would mean that we are in effect discarding the 'upstream first' rule. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: Linux 3.11-rc2
On Tue, 2013-07-23 at 03:04 +0200, rydb...@euromail.se wrote: > Hi Trond, Linus, > > On Sun, Jul 21, 2013 at 12:53:10PM -0700, Linus Torvalds wrote: > > So it's been another week, and -rc2 is out there. > > This one happens to break nfs in a rather blunt-instrument fashion - > creating files on a nfs4 partition [1] no longer works. Bisection > yields this commit as the culprit: > > commit b4a2cf76ab7c08628c62b2062dacefa496b59dfd > Author: Trond Myklebust > Date: Wed Jul 17 16:43:16 2013 -0400 > > NFSv4: Fix a regression against the FreeBSD server > > Technically, the Linux client is allowed by the NFSv4 spec to send > 3 word bitmaps as part of an OPEN request. However, this causes the > current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors. > > Fix the regression by making the Linux client use a 2 word bitmap unless > doing NFSv4.2 with labeled NFS. > > Signed-off-by: Trond Myklebust > > Reverting the patch returns things to normal. - Can you please provide me with a binary tcpdump or wireshark dump that demonstrates the problem? - What server is this? -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
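Editor's note: the 2-word versus 3-word choice described in the quoted commit message can be sketched as follows. This is an illustration only, not the kernel's encoder; encode_bitmap_demo is a hypothetical name, and the words are written in host byte order rather than with htonl() to keep the sketch short. It assumes the on-the-wire shape of an NFSv4 attribute bitmap: a word count followed by that many 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: emit a word count followed by the bitmap words
 * and return how many 32-bit words were written to out[].
 */
static size_t encode_bitmap_demo(uint32_t out[4], uint32_t bm0,
                                 uint32_t bm1, uint32_t bm2)
{
    /* Send the third word only when it actually carries a bit. */
    uint32_t nwords = bm2 ? 3 : 2;
    size_t i = 0;

    out[i++] = nwords;
    out[i++] = bm0;
    out[i++] = bm1;
    if (nwords == 3)
        out[i++] = bm2;
    return i;
}
```

A request with nothing set in the third word then goes out as a 2-word bitmap, which is the form the commit message says the FreeBSD server accepts.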
[GIT PULL] Please pull NFS client fixes
Hi Linus, The following changes since commit ad81f0545ef01ea651886dddac4bef6cec930092: Linux 3.11-rc1 (2013-07-14 15:18:27 -0700) are available in the git repository at: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-3 for you to fetch changes up to b4a2cf76ab7c08628c62b2062dacefa496b59dfd: NFSv4: Fix a regression against the FreeBSD server (2013-07-17 16:54:46 -0400) NFS client bugfixes for 3.11 - Fix a regression against NFSv4 FreeBSD servers when creating a new file - Fix another regression in rpc_client_register() Trond Myklebust (2): SUNRPC: Fix another issue with rpc_client_register() NFSv4: Fix a regression against the FreeBSD server fs/nfs/nfs4xdr.c | 21 ++--- net/sunrpc/clnt.c | 1 + 2 files changed, 15 insertions(+), 7 deletions(-) -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: [GIT PULL] x86 fixes for 3.11-rc2
On Thu, 2013-07-18 at 17:46 -0700, Linus Torvalds wrote: > Finnish is hard. But good for swearing. Only because the ratio of vowels to consonants causes an immediate outbreak of swearing among those who try... Trond -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML
On Tue, 2013-07-16 at 19:31 -0400, Ric Wheeler wrote: > On 07/16/2013 07:12 PM, Sarah Sharp wrote: > > On Tue, Jul 16, 2013 at 06:54:59PM -0400, Steven Rostedt wrote: > >> On Tue, 2013-07-16 at 15:43 -0700, Sarah Sharp wrote: > >> > >>> Yes, that's true. Some kernel developers are better at moderating their > >>> comments and tone towards individuals who are "sensitive". Others > >>> simply don't give a shit. So we need to figure out how to meet > >>> somewhere in the middle, in order to establish a baseline of civility. > >> I have to ask this because I'm thick, and don't really understand, > >> but ... > >> > >> What problem exactly are we trying to solve here? > > Personal attacks are not cool Steve. Some people simply don't care if a > > verbal tirade is directed at them. Others do not want anyone to attack > > them personally, but they're fine with people attacking their code. > > > > Bystanders that don't understand the kernel community structure are > > discouraged from contributing because they don't want to be verbally > > abused, and they really don't want to see either personal attacks or > > intense belittling, demeaning comments about code. > > > > In order to make our community better, we need to figure out where the > > baseline of "good" behavior is. We need to define what behavior we want > > from both maintainers and patch submitters. E.g. "No regressions" and > > "don't break userspace" and "no personal attacks". That needs to be > > written down somewhere, and it isn't. If it's documented somewhere, > > point me to the file in Documentation. Hint: it's not there. > > > > That is the problem. > > > > Sarah Sharp > > The problem you are pointing out - and it is a problem - makes us less > effective > as a community. Not really. Most of the people who already work as part of this community are completely used to it. We've created the environment, and have no problems with it. 
Where it could possibly be a problem is when it comes to recruiting _new_ members to our community. Particularly so given that some journalists take a special pleasure in reporting particularly juicy comments and antics. That would tend to scare off a lot of gun-shy newbies. On the other hand, it might tend to bias our recruitment toward people of a more "special" disposition. Perhaps we finally need the services of a social scientist to help us find out... -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: sunrpc/clnt.c: BUG kmalloc-256 (Not tainted): Poison overwritten
On Sun, 2013-07-14 at 10:02 +0200, Toralf Förster wrote: > This bisected commit produces at a 32 bit user mode linux guest the attached > BUG : > > commit 245268c951262b861bc1be4e9dc812352499 > Author: Trond Myklebust > Date: Wed Jul 10 15:33:01 2013 -0400 > > SUNRPC: Fix a deadlock in rpc_client_register() > > Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on > PipeFS MOUNT notifications) introduces a regression when we call > rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour. > > By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we > end up deadlocking in gss_pipes_dentries_create_net(). > Fix is to register the client and release the mutex before calling > rpcauth_create(). > > Reported-by: Weston Andros Adamson > Tested-by: Weston Andros Adamson > Cc: Stanislav Kinsbursky > Cc: # : 3848160: SUNRPC: fix races on PipeFS > MOUNT > Cc: # : e73f4cc: SUNRPC: split client creation > Signed-off-by: Trond Myklebust > > > > > > > 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Version 1.2.6 starting > 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Backgrounding to > notify hosts... > 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1043]: Running as root. 
> chown /var/lib/nfs to choose different user > 2013-07-13T22:09:10.000+02:00 trinity mount[1047]: mount to NFS server > 'n22stab4' failed: No route to host, retrying > 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: sending IPv6 Router > Solicitation > 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: no IPv6 Routers > available > 2013-07-13T22:09:13.000+02:00 trinity mount[1048]: mount to NFS server > 'n22stab4' failed: No route to host, retrying > 2013-07-13T22:09:13.647+02:00 trinity kernel: > = > 2013-07-13T22:09:13.647+02:00 trinity kernel: BUG kmalloc-256 (Not tainted): > Poison overwritten > 2013-07-13T22:09:13.647+02:00 trinity kernel: > - > 2013-07-13T22:09:13.647+02:00 trinity kernel: > 2013-07-13T22:09:13.647+02:00 trinity kernel: Disabling lock debugging due to > kernel taint > 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: 0x49b1ed18-0x49b1ed1b. > First byte 0x74 instead of 0x6b > 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Allocated in > rpc_new_client+0x81/0x3a0 age=300 cpu=0 pid=1049 > 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Freed in > rpc_new_client+0x35a/0x3a0 age=300 cpu=0 pid=1049 > 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Slab 0x0b87bac0 > objects=13 used=13 fp=0x (null) flags=0x0080 > 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Object 0x49b1ed10 > @offset=3344 fp=0x49b1ee40 > 2013-07-13T22:09:13.653+02:00 trinity kernel: > 2013-07-13T22:09:13.653+02:00 trinity kernel: Bytes b4 49b1ed00: f6 03 00 00 > bd 94 ff ff 5a 5a 5a 5a 5a 5a 5a 5a > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed10: 6b 6b 6b 6b 6b > 6b 6b 6b 74 3a 85 49 6b 6b 6b 6b t:.I > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed20: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed30: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed40: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 
6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed50: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed60: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed70: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed80: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ed90: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1eda0: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edb0: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edc0: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edd0: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ede0: 6b 6b 6b 6b 6b > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edf0: 6b 6b 6b 6b 6b > 6b
[GIT PULL] Please pull NFS client updates
Hi Linus,

The following pull request mainly contains some small readdir optimisations
that had dependencies on Al Viro's readdir rewrite. There is also a fix for a
nasty deadlock which surfaced earlier in this merge window.

The following changes since commit a82a729f04232ccd0b59406574ba4cf20027a49d:

  Merge branch 'akpm' (updates from Andrew Morton) (2013-07-09 13:33:36 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-2

for you to fetch changes up to 245268c951262b861bc1be4e9dc812352499:

  SUNRPC: Fix a deadlock in rpc_client_register() (2013-07-10 15:58:55 -0400)

NFS client updates for Linux 3.11 (part 2)

Highlights include:
- Fix an rpc_pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup

Fengguang Wu (1):
      rpc_pipe: rpc_dir_inode_operations can be static

Jeff Layton (2):
      nfs: set verifier on existing dentries in nfs_prime_dcache
      rpc_pipe: set dentry operations at d_alloc time

Scott Mayhew (3):
      NFS: Make nfs_attribute_cache_expired() non-static
      NFS: Make nfs_readdir revalidate less often
      NFS: Allow nfs_updatepage to extend a write under additional circumstances

Trond Myklebust (1):
      SUNRPC: Fix a deadlock in rpc_client_register()

 fs/nfs/dir.c           |  6 --
 fs/nfs/inode.c         |  2 +-
 fs/nfs/write.c         | 31 +++
 include/linux/nfs_fs.h |  1 +
 net/sunrpc/clnt.c      | 16 +---
 net/sunrpc/rpc_pipe.c  | 25 -
 6 files changed, 58 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
[GIT PULL] Please pull NFS client updates
Hi Linus,

The following changes since commit f722406faae2d073cc1d01063d1123c35425939e:

  Linux 3.10-rc1 (2013-05-11 17:14:08 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-1

for you to fetch changes up to 959d921f5eb8878ea16049a7f6e9bcbb6dfbcb88:

  Merge branch 'labeled-nfs' into linux-next (2013-06-28 16:29:51 -0400)

NFS client updates for Linux 3.11

Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
  add support for NFSv4.1 state protection.

Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation

Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been reviewed
and acked by James Morris.

Andy Adamson (6):
      NFSv4.1 Fix a pNFS session draining deadlock
      NFSv4.1 end back channel session draining
      NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
      NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
      NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
      NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs

Bryan Schumaker (4):
      NFS: Make callbacks minor version generic
      NFS: Add in v4.2 callback operation
      NFS: Apply v4.1 capabilities to v4.2
      NFS: Improve legacy idmapping fallback

Chuck Lever (3):
      NFS: Fix SETCLIENTID fallback if GSS is not available
      NFS: Fix security flavor negotiation with legacy binary mounts
      NFS: Set NFS_CS_MIGRATION for NFSv4 mounts

David Quigley (10):
      Security: Add hook to calculate context based on a negative dentry.
      Security: Add Hook to test if the particular xattr is part of a MAC model.
      LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
      SELinux: Add new labeling type native labels
      NFSv4: Add label recommended attribute and NFSv4 flags
      NFSv4: Extend fattr bitmaps to support all 3 words
      NFS:Add labels to client function prototypes
      NFS: Add label lifecycle management
      NFS: Client implementation of Labeled-NFS
      NFS: Extend NFS xattr handlers to accept the security namespace

Djalal Harouni (1):
      NFSv4: SETCLIENTID add the format string for the NETID

Jeff Layton (5):
      rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
      nfs: refactor "need_mount" code out of nfs_try_mount
      nfs: move server_authlist into nfs_try_mount_request
      nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
      nfs: have NFSv3 try server-specified auth flavors in turn

Stanislav Kinsbursky (4):
      SUNRPC: fix races on PipeFS MOUNT notifications
      SUNRPC: fix races on PipeFS UMOUNT notifications
      SUNRPC: split client creation routine into setup and registration
      SUNRPC: PipeFS MOUNT notification optimization for dying clients

Steve Dickson (4):
      NFS: Add NFSv4.2 protocol constants
      NFSv4.2: Added NFS v4.2 support to the NFS client
      NFSv4: Introduce new label structure
      Kconfig: Add Kconfig entry for Labeled NFS V4 client

Trond Myklebust (26):
      SUNRPC: Fix a bug in gss_create_upcall
      SUNRPC: Faster detection if gssd is actually running
      SUNRPC: Convert auth_gss pipe detection to work in namespaces
      SUNRPC: Prevent an rpc_task wakeup race
      NFSv4: Fix a thinko in nfs4_try_open_cached
      NFSv4.1: Ensure that layoutget is called using the layout credential
      NFSv4.1: Ensure that layoutreturn uses the correct credential
      NFSv4.1: Ensure that reclaim_complete uses the right credential
      NFSv4.1: Ensure that test_stateid and free_stateid use correct credentials
      NFSv4.1: Use layout credentials for get_deviceinfo calls
      NFSv4.1: Enable state protection
      NFSv4.1: Simplify setting the layout header credential
      SUNRPC: Fix a potential race in rpc_execute
      SUNRPC: Remove unused function rpc_queue_empty
      SUNRPC: Remove the unused helpers task_for_each() and task_for_first()
      SUNRPC: Remove unused functions rpc_task_set/has_priority
      SUNRPC: Remove redundant call to rpc_set_running() in __rpc_execute()
      NFSv4: Remove redundant check for FMODE_EXEC in nfs_finish_open
      NFSv4: Cleanup: pass the nfs_open_context to nfs4_do_open
      NFSv4: Refactor _nfs4_open_and_get_state to set ctx->state
      NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code
      NFSv4: Close another NFSv4 recovery race
      NFSv4: Move the DNS resolver into the NFSv4 module
Re: [PATCH v3 24/25] sunrpc: Change how dentry's d_lock field is accessed
On Thu, 2013-07-04 at 05:20 +0100, Al Viro wrote:
> On Wed, Jul 03, 2013 at 04:25:32PM -0400, Waiman Long wrote:
> > There is no change in logic and everything should just work.
>
> > -   spin_lock(&file->f_path.dentry->d_lock);
> > +   d_lock(file->f_path.dentry);
> >     if (!d_unhashed(file->f_path.dentry))
> >         clnt = RPC_I(inode)->private;
> >     if (clnt != NULL && atomic_inc_not_zero(&clnt->cl_count)) {
> > -       spin_unlock(&file->f_path.dentry->d_lock);
> > +       d_unlock(file->f_path.dentry);
>
> Could somebody explain WTF is being protected here? It's not ->private -
> that gets set (and, more importantly, cleared) without ->d_lock in sight.
> Trond, that seems to be your code from about three years ago (introduced
> in "SUNRPC: Fix a race in rpc_info_open"). What's going on there?

AFAICR we're using the fact that the dentry will remain hashed until
we're in the process of freeing up the rpc_client. By testing that the
dentry is hashed under the dentry->d_lock, we are assured that the
non-NULL 'clnt' is still pointing to a valid rpc_client, and that it is
safe to access clnt->cl_count.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
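The atomic_inc_not_zero() half of the explanation above can be sketched in
user space. This is only a minimal illustration using C11 atomics, not the
kernel implementation: struct client, client_get_not_zero() and client_put()
are hypothetical names standing in for struct rpc_clnt and its cl_count
handling. The sketch does not model the d_lock/d_unhashed() test, which is
what guarantees that the pointer itself is still valid when the count is
examined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rpc_clnt: only the refcount matters here. */
struct client {
    atomic_int count;
};

/*
 * Take a reference only if the count is still non-zero, which is the
 * behaviour the thread relies on from atomic_inc_not_zero().
 * Returns true if a reference was taken.
 */
static bool client_get_not_zero(struct client *c)
{
    int old = atomic_load(&c->count);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&c->count, &old, old + 1))
            return true;
    }
    /* Count already hit zero: the object is being torn down. */
    return false;
}

/* Drop a reference; returns true if the caller released the last one. */
static bool client_put(struct client *c)
{
    int old = atomic_fetch_sub(&c->count, 1);

    assert(old > 0);
    return old == 1;
}
```

A caller that sees client_get_not_zero() fail has to treat the client as
already on its way to being freed and must not touch it any further.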
Re: [PATCH v3 2/4] SUNRPC: fix races on PipeFS UMOUNT notifications
On Mon, 2013-06-24 at 11:52 +0400, Stanislav Kinsbursky wrote:
> CPU#0                                   CPU#1
> -----------------------------           -----------------------------
> rpc_kill_sb
> sn->pipefs_sb = NULL                    rpc_release_client
> (UMOUNT_EVENT)                          rpc_free_auth
> rpc_pipefs_event
> rpc_get_client_for_event
> !atomic_inc_not_zero(cl_count)
> <skip the client>
>                                         atomic_inc(cl_count)
>                                         rpc_free_client
>                                         rpc_clnt_remove_pipedir
>                                         <skip client dir removing>
>
> To fix this, this patch does the following:
>
> 1) Calls RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock being held.
> 2) Removes SUNRPC client from the list AFTER pipes destroying.
> 3) Doesn't hold RPC client on notification: if client in the list, then it
> can't be destroyed while sn->pipefs_sb_lock in hold by notification caller.
>
> Signed-off-by: Stanislav Kinsbursky <skinsbur...@parallels.com>
> Cc: sta...@vger.kernel.org
> ---
>  net/sunrpc/clnt.c     |    5 +
>  net/sunrpc/rpc_pipe.c |    2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)
>
> <snip>
>
> diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
> index c512448..efca2f7 100644
> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -1165,7 +1165,6 @@ static void rpc_kill_sb(struct super_block *sb)
>          goto out;
>      }
>      sn->pipefs_sb = NULL;
> -    mutex_unlock(&sn->pipefs_sb_lock);
>      dprintk("RPC: sending pipefs UMOUNT notification for net %p%s\n",
>          net, NET_NAME(net));
>      blocking_notifier_call_chain(&rpc_pipefs_notifier_list,
> @@ -1173,6 +1172,7 @@ static void rpc_kill_sb(struct super_block *sb)
>                         sb);
>      put_net(net);
>  out:
> +    mutex_unlock(&sn->pipefs_sb_lock);

Is this safe to do after the put_net()?

>      kill_litter_super(sb);
>  }
>
>

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [PATCH v2 2/4] SUNRPC: fix races on PipeFS MOUNT notifications
On Tue, 2013-06-11 at 18:39 +0400, Stanislav Kinsbursky wrote:
> Below are races, when RPC client can be created without PiepFS dentries
>
> CPU#0                                   CPU#1
> -----------------------------           -----------------------------
> rpc_new_client                          rpc_fill_super
> rpc_setup_pipedir
> mutex_lock(&sn->pipefs_sb_lock)
> rpc_get_sb_net == NULL
> (no per-net PipeFS superblock)
>                                         sn->pipefs_sb = sb;
>                                         notifier_call_chain(MOUNT)
>                                         (client is not in the list)
> rpc_register_client
> (client without pipes dentries)
>
> To fix this patch:
> 1) makes PipeFS mount notification call with pipefs_sb_lock being held.
> 2) releases pipefs_sb_lock on new SUNRPC client creation only after
> registration.
>
> Signed-off-by: Stanislav Kinsbursky <skinsbur...@parallels.com>
> Cc: sta...@vger.kernel.org

Hi Stanislav,

This isn't going to apply to the stable kernels without the cleanup
patch. Could you please reorganise this patch series so that the cleanup
comes last.

Thanks,
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
[GIT PULL] Please pull 2 NFS client bugfixes
Hi Linus,

The following changes since commit 83c168bf8017212a9d502536f9dcd0b54d24e330:

  NFS: Fix SETCLIENTID fallback if GSS is not available (2013-05-23 18:50:40 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.10-4

for you to fetch changes up to eb54d43707c69340581940e1fcaecb4d7d17b814:

  NFS: Fix security flavor negotiation with legacy binary mounts (2013-05-30 16:31:34 -0400)

NFS client fixes:
- Fix a regression that broke NFS mounting using klibc and busybox
- Stable fix to check access modes correctly on NFSv4 delegated open()

Chuck Lever (1):
      NFS: Fix security flavor negotiation with legacy binary mounts

Trond Myklebust (1):
      NFSv4: Fix a thinko in nfs4_try_open_cached

 fs/nfs/nfs4proc.c | 2 +-
 fs/nfs/super.c    | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: 3.10-rc3 NFSv3 mount issues
On Thu, 2013-05-30 at 16:26 -0400, Chuck Lever wrote:
> On May 30, 2013, at 4:19 PM, Jim Schutt <jasc...@sandia.gov> wrote:
>
> > Hi,
> >
> > I've been trying to test 3.10-rc3 on some diskless clients, and found
> > that I can no longer mount my root file system via NFSv3.
> >
> > I poked around looking at NFS changes for 3.10, and found these two
> > commits:
> >
> > d497ab9751 "NFSv3: match sec= flavor against server list"
> > 4580a92d44 "NFS: Use server-recommended security flavor by default (NFSv3)"
> >
> > If I revert both of these commits from 3.10-rc3, then my diskless
> > client can mount its root file system.
> >
> > The busybox mount command fails like this, when using 3.10-rc3:
> >
> > / # mount -t nfs -o ro,nolock,vers=3,proto=tcp 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: mounting 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x on /mnt failed: Invalid argument
> >
> > The commit messages for both these commits seem to say that mounting
> > with the "sys=sec" option should work, but unfortunately, my busybox doesn't
> > seem to understand the "sec=" mount option:
> >
> > / # mount -t nfs -o ro,nolock,vers=3,proto=tcp,sec=sys 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: invalid number 'sys'
> >
> > My NFS server is based on RHEL6, and is not using any "sec=" option
> > in its export for this file system. I did try exporting with "sec=sys",
> > but it didn't seem to make any difference either.
> >
> > So far, this seems like a regression to me
> >
> > Any ideas what I might be doing wrong? How can I
> > help make this work again?
>
> 3.10-rc3 appears to be missing the fix for this. See:
>
> http://marc.info/?l=linux-nfs&m=136855668104598&w=2
>
> Trond, can we get this applied?
>

For some reason it got lost in the mail heap. I've applied it now to the
'bugfixes' branch. Will push upstream in the next few days...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [PATCH 3.9-stable] NFSv4.1 Fix a pNFS session draining deadlock
On Mon, 2013-05-27 at 09:23 +0900, Jonghwan Choi wrote:
> This patch looks like it should be in the 3.9-stable tree, should we apply
> it?

It's a condition which appears to be extremely rare: so far, we've only seen it during extreme stress testing at NetApp. For that reason, and because it is NFSv4.1 only, I'm inclined to wait until we see real-world cases before making it a stable patch.

Cheers
  Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
[GIT PULL] Please pull NFS client bugfixes
Hi Linus,

The following changes since commit f722406faae2d073cc1d01063d1123c35425939e:

  Linux 3.10-rc1 (2013-05-11 17:14:08 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.10-3

for you to fetch changes up to 83c168bf8017212a9d502536f9dcd0b54d24e330:

  NFS: Fix SETCLIENTID fallback if GSS is not available (2013-05-23 18:50:40 -0400)

NFS client bugfixes for 3.10

- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available

Andy Adamson (1):
      NFSv4.1 Fix a pNFS session draining deadlock

Chuck Lever (1):
      NFS: Fix SETCLIENTID fallback if GSS is not available

Trond Myklebust (4):
      SUNRPC: Fix a bug in gss_create_upcall
      SUNRPC: Faster detection if gssd is actually running
      SUNRPC: Convert auth_gss pipe detection to work in namespaces
      SUNRPC: Prevent an rpc_task wakeup race

 fs/nfs/callback_proc.c         |  2 +-
 fs/nfs/callback_xdr.c          |  2 +-
 fs/nfs/nfs4client.c            |  2 +-
 fs/nfs/nfs4proc.c              |  2 +-
 fs/nfs/nfs4session.c           |  4 +--
 fs/nfs/nfs4session.h           | 13 +
 fs/nfs/nfs4state.c             | 15 +-
 net/sunrpc/auth_gss/auth_gss.c | 62 --
 net/sunrpc/netns.h             |  4 +++
 net/sunrpc/rpc_pipe.c          |  5
 net/sunrpc/sched.c             |  8 +-
 11 files changed, 78 insertions(+), 41 deletions(-)

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support
On Wed, 2013-05-15 at 16:19 -0400, J. Bruce Fields wrote:
> On Tue, May 14, 2013 at 02:15:26PM -0700, Zach Brown wrote:
> > This crude patch illustrates the simplest plumbing involved in
> > supporting sys_copy_range with the NFS COPY operation that's pending in
> > the 4.2 draft spec.
> >
> > The patch is based on a previous prototype that used the COPY op to
> > implement sys_copyfileat which created a new file (based on the ocfs2
> > reflink ioctl). By contrast, this copies file contents between existing
> > files.
> >
> > There's still a lot of implementation and testing to do, but this can
> > get discussion going.
>
> I'm using:
>
> git://github.com/loghyr/NFSv4.2
>
> as my reference for the draft protocol.
>
> On a quick skim, one thing this is missing before it complies is a
> client implementation of CB_OFFLOAD: "If a client desires an
> intra-server file copy, then it MUST support the COPY and CB_OFFLOAD
> operations."

Note that Bryan is currently working on updating the NFS implementation to match the draft protocol.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com