from:"Myklebust, Trond"

[GIT PULL] Please pull NFS client bugfixes

2013-11-16 Thread Myklebust, Trond

Hi Linus

The following changes since commit fab99ebe39fe7d11fbd9b5fb84f07432af9ba36f:

  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security 
(2013-11-04 16:42:52 -0500)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.13-2

for you to fetch changes up to 8c2fabc6542d9d0f8b16bd1045c2eda59bdcde13:

  nfs: fix pnfs Kconfig defaults (2013-11-15 13:41:43 -0500)


NFS client bugfixes:

- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is


Christoph Hellwig (1):
  nfs: fix pnfs Kconfig defaults

Jeff Layton (1):
  nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once

NeilBrown (1):
  NFS: correctly report misuse of "migration" mount option.

Trond Myklebust (2):
  SUNRPC: Fix a data corruption issue when retransmitting RPC calls
  SUNRPC: Avoid deep recursion in rpc_release_client

 fs/nfs/Kconfig        |  6 +++---
 fs/nfs/nfs4state.c    |  7 ++-
 fs/nfs/super.c        |  2 +-
 net/sunrpc/clnt.c     | 29 +
 net/sunrpc/xprtsock.c | 28 +---
 5 files changed, 48 insertions(+), 24 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com


[GIT PULL] Please pull NFS client changes for 3.13

2013-11-07 Thread Myklebust, Trond

Hi Linus,

The following changes since commit f927318840745095cc7003f1564ca4b87655745d:

  Merge tag 'nfs-for-3.12-4' of 
git://git.linux-nfs.org/projects/trondmy/linux-nfs (2013-09-30 17:10:26 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.13-1

for you to fetch changes up to fab99ebe39fe7d11fbd9b5fb84f07432af9ba36f:

  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security 
(2013-11-04 16:42:52 -0500)


NFS client updates for Linux 3.13

Highlights include:

- Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
  - Detect TCP connection breakage through the "keepalive" mechanism
- Add client side support for NFSv4.x migration (Chuck Lever)
- Add support for multiple security flavour arguments to the "sec=" mount
  option (Dros Adamson)
- fs-cache bugfixes from David Howells:
  - Fix an issue whereby caching can be enabled on a file that is open for
writing
- More NFSv4 open code stable bugfixes
- Various Labeled NFS (selinux) bugfixes, including one stable fix
- Fix buffer overflow checking in the RPCSEC_GSS upcall encoding
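The keepalive item above refers to kernel-side changes in the SUNRPC socket code, which are not shown in this message. As a rough userspace analogue only, the following sketch shows how TCP keepalive probing is switched on for an ordinary socket on Linux. The function name and the timer values (60 s idle, 10 s interval, 3 probes) are illustrative assumptions, not the values the NFS client uses.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Turn on TCP keepalive probing for a connected or unconnected TCP socket.

    The timer values are illustrative assumptions, not kernel defaults:
      idle     - seconds of silence before the first probe is sent
      interval - seconds between unanswered probes
      count    - unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three options below are Linux-specific socket options.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With probing enabled, a peer that silently disappears is noticed after roughly idle + interval * count seconds, so breakage can be detected without relying on a request-level timeout and retry.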


Andy Adamson (1):
  NFSv4 Remove zeroing state kern warnings

Chuck Lever (20):
  SUNRPC: Modify synopsis of rpc_client_register()
  NFS: Add nfs4_update_server
  NFS: Add functions to swap transports during migration recovery
  NFS: Introduce a vector of migration recovery ops
  NFS: Export _nfs_display_fhandle()
  NFS: Add method to retrieve fs_locations during migration recovery
  NFS: Add a super_block backpointer to the nfs_server struct
  NFS: Add basic migration support to state manager thread
  NFS: Re-use exit code in nfs4_async_handle_error()
  NFS: Rename "stateid_invalid" label
  NFS: Add migration recovery callouts in nfs4proc.c
  NFS: Handle NFS4ERR_MOVED during delegation recall
  NFS: Add method to detect whether an FSID is still on the server
  NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager
  NFS: Implement support for NFS4ERR_LEASE_MOVED
  NFS: Migration support for RELEASE_LOCKOWNER
  NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW
  NFS: Handle SEQ4_STATUS_LEASE_MOVED
  NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR
  NFS: Fix possible endless state recovery wait

David Howells (3):
  FS-Cache: Add use/unuse/wake cookie wrappers
  FS-Cache: Provide the ability to enable/disable cookies
  NFS: Use i_writecount to control whether to get an fscache cookie in 
nfs_open()

Geyslan G. Bem (3):
  nfs: Remove useless 'error' assignment
  nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
  nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'

J. Bruce Fields (2):
  sunrpc: comment typo fix
  nfs: use IS_ROOT not DCACHE_DISCONNECTED

Jeff Layton (5):
  nfs: reject version and minorversion changes on remount attempts
  nfs: fix handling of invalid mount options in nfs_remount
  nfs: fix inverted test for delegation in nfs4_reclaim_open_state
  nfs: fix oops when trying to set SELinux label
  nfs: set security label when revalidating inode

NeilBrown (1):
  SUNRPC: close a rare race in xs_tcp_setup_socket.

Trond Myklebust (24):
  NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()
  SUNRPC: Enable the keepalive option for TCP sockets
  SUNRPC: Only update the TCP connect cookie on a successful connect
  SUNRPC: Don't set the request connect_cookie until a successful transmit
  SUNRPC: Clear the request rq_bytes_sent field in xprt_release_write
  SUNRPC: Clean up - convert xprt_prepare_transmit to return a bool
  SUNRPC: Add RPC task and client level options to disable the resend 
timeout
  NFSv4: Ensure that we disable the resend timeout for NFSv4
  SUNRPC: Fix RPC call retransmission statistics
  SUNRPC: Remove redundant initialisations of request rq_bytes_sent
  SUNRPC: call_connect_status should recheck bind and connect status on 
error
  NFSv4.1: Don't change the security label as part of open reclaim.
  NFSv4: Fix state reference counting in 
_nfs4_opendata_reclaim_to_nfs4_state
  SUNRPC: Add a helper to switch the transport of an rpc_clnt
  SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
  SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
  SUNRPC: Fix buffer overflow checking in 
gss_encode_v0_msg/gss_encode_v1_msg
  Merge branch 'fscache' of git://git.kernel.org/.../dhowells/linux-fs into 
linux-next
  SUNRPC: Cleanup xs_destroy()
  NFS: Fix a missing initialisation when reading the SELinux label
  NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
  NFSv4.2: encode_readdir - only ask for labels when doing readdirplus

Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

2013-10-18 Thread Myklebust, Trond

On Fri, 2013-10-18 at 22:03 +0200, Helge Deller wrote:
> On 10/18/2013 09:36 PM, Myklebust, Trond wrote:
> > Also, could you please try a sysRQ-t the next time it happens, so that
> > we can get a trace of where the mount program is hanging. Knowing that
> > the mount is stuck in "__schedule()" is not really interesting unless we
> > know from where that was called.
> 
> Actually, the machine was still running in this state.
> Here is sysrq-t:
> [112009.084000] mount   S 401040c0 0 25331  1 0x0010
> [112009.084000] Backtrace:
> [112009.084000]  [<40113a68>] __schedule+0x500/0x810
> [112009.232000]
> [112009.232000] mount.nfs   D 401040c0 0 25332  25331 0x0010
> [112009.232000] Backtrace:
> [112009.232000]  [<40113a68>] __schedule+0x500/0x810

That makes no sense unless sysrq-t works differently on parisc than on
other platforms. I'd expect the backtrace to at least include a system
call. Parisc experts?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

2013-10-18 Thread Myklebust, Trond

On Fri, 2013-10-18 at 21:26 +0200, Helge Deller wrote:
> On 10/17/2013 11:07 PM, Myklebust, Trond wrote:
> > > On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
> >> I'm seeing a regression with current kernel git head when using NFS-mounts.
> >> Architecture in my case is parisc, although I don't think that this is 
> >> relevant.
> >> At least kernel 3.10 (and I think 3.11) didn't show that problem.
> >>
> >> The symptom is that "top" shows high usage of either kswapd0 or kswapd1.
> >> Here is an output with kswapd1:
> >>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> >>    37 root  20   0     0    0    0 R  91.8  0.0  63:00.40 kswapd1
> >> 28448 root  20   0  3252 1428 1060 R  15.3  0.0   0:00.09 top
> >>     1 root  20   0  2784  988  852 S   0.0  0.0   0:09.95 init
> >>
> >> This is what ps shows:
> >> ls:~# ps -ef |  grep mount
> >> root  1181     1  0 14:51 ?        00:00:18 /usr/sbin/automount 
> >> --pid-file /var/run/autofs.pid
> >> root 25331  1181  0 21:25 ?        00:00:00 /bin/mount -n -t nfs -s -o 
> >> nolock,rw,hard,intr homes:/unixhome1 /net/home1
> >> root 25332 25331  0 21:25 ?        00:00:00 /sbin/mount.nfs 
> >> homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
> >>
> >> And using sysrq to show the blocked tasks I get in syslog:
> >> SysRq : Show Blocked State
> >> mount.nfs   D 401040c0 0 25332  25331 0x0010
> >> Backtrace:
> >> [<40113a68>] __schedule+0x500/0x810
> >>
> >> I know it's not a problem of the NFS server, since the same mount is still 
> >> ok on other machines.
> >> The NFS directory was already mounted and in use when this mount happened 
> >> again (called by cron-job). 
> >>  
> >> Any ideas?
> > 
> > If the NFS directory is already mounted, then why is the automounter
> > trying to mount it a second time?
> 
> I was wrong in this.
> The directory wasn't mounted yet (or at least it was unmounted in the 
> meantime before the new
> mount.nfs was called).
> 
> I'm now not even sure, that the high kswapd is really triggered by the NFS 
> problem,
> because I now have another machine with the blocked NFS-mount, but without
> the high kswapd usage.
> 
> Nevertheless, the blocked nfs mount tasks really make me wonder. There is 
> clearly
> some kind of regression since it doesn't happen with older kernels.

Have you ever reproduced it without the automounter?

Also, could you please try a sysRQ-t the next time it happens, so that
we can get a trace of where the mount program is hanging. Knowing that
the mount is stuck in "__schedule()" is not really interesting unless we
know from where that was called.
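A sysrq-t dump can be requested as root with "echo t > /proc/sysrq-trigger"; the task list then appears in the kernel log. As a minimal sketch, assuming the task header lines look like the ones quoted in this thread (command, state letter, PC, free stack, PID, parent PID, flags), the helper below picks out the tasks in uninterruptible sleep (state D). The function name and the regular expression are hypothetical, and the header layout differs between kernel versions and architectures.

```python
import re

# Matches a task header such as:
#   [112009.232000] mount.nfs   D 401040c0 0 25332  25331 0x0010
# The leading dmesg timestamp is optional. The layout is an assumption
# based on the log lines quoted in this thread.
TASK_LINE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s+)?"
    r"(?P<comm>\S+)\s+(?P<state>[RSDTZ])\s+[0-9a-f]+\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-f]+\s*$"
)


def blocked_tasks(log_text):
    """Return (command, pid) for every task header reporting state D."""
    tasks = []
    for line in log_text.splitlines():
        match = TASK_LINE.match(line.strip())
        if match and match.group("state") == "D":
            tasks.append((match.group("comm"), int(match.group("pid"))))
    return tasks


if __name__ == "__main__":
    sample = (
        "[112009.084000] mount   S 401040c0 0 25331  1 0x0010\n"
        "[112009.084000] Backtrace:\n"
        "[112009.232000] mount.nfs   D 401040c0 0 25332  25331 0x0010\n"
    )
    for comm, pid in blocked_tasks(sample):
        print(comm, pid)
```

The interesting part of the dump is still the backtrace that follows each header; the helper only narrows down which tasks to look at.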

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: 3.12-rcX - NFS regression - kswapd0 / kswapd1 stays using 100% CPU?

2013-10-17 Thread Myklebust, Trond

On Thu, 2013-10-17 at 22:42 +0200, Helge Deller wrote:
> I'm seeing a regression with current kernel git head when using NFS-mounts.
> Architecture in my case is parisc, although I don't think that this is 
> relevant.
> At least kernel 3.10 (and I think 3.11) didn't show that problem.
> 
> The symptom is that "top" shows high usage of either kswapd0 or kswapd1.
> Here is an output with kswapd1:
>   PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>    37 root  20   0     0    0    0 R  91.8  0.0  63:00.40 kswapd1
> 28448 root  20   0  3252 1428 1060 R  15.3  0.0   0:00.09 top
>     1 root  20   0  2784  988  852 S   0.0  0.0   0:09.95 init
> 
> This is what ps shows:
> ls:~# ps -ef |  grep mount
> root  1181     1  0 14:51 ?        00:00:18 /usr/sbin/automount 
> --pid-file /var/run/autofs.pid
> root 25331  1181  0 21:25 ?        00:00:00 /bin/mount -n -t nfs -s -o 
> nolock,rw,hard,intr homes:/unixhome1 /net/home1
> root 25332 25331  0 21:25 ?        00:00:00 /sbin/mount.nfs 
> homes:/unixhome1 /net/home1 -s -n -o rw,nolock,hard,intr
> 
> And using sysrq to show the blocked tasks I get in syslog:
> SysRq : Show Blocked State
> mount.nfs   D 401040c0 0 25332  25331 0x0010
> Backtrace:
> [<40113a68>] __schedule+0x500/0x810
> 
> I know it's not a problem of the NFS server, since the same mount is still ok 
> on other machines.
> The NFS directory was already mounted and in use when this mount happened 
> again (called by cron-job). 
>  
> Any ideas?

If the NFS directory is already mounted, then why is the automounter
trying to mount it a second time?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes

2013-09-30 Thread Myklebust, Trond


Hi Linus,

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-4

for you to fetch changes up to 367156d9a87b21b5232dd93107c5fc61b09ba2ef:

  NFS: Give "flavor" an initial value to fix a compile warning (2013-09-29 
16:03:34 -0400)


NFS client bugfixes for 3.12

- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver


Anna Schumaker (1):
  NFS: Give "flavor" an initial value to fix a compile warning

Trond Myklebust (3):
  NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem 
method
  NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
  NFSv4.1: Ensure memory ordering between nfs4_ds_connect and 
nfs4_fl_prepare_ds

Weston Andros Adamson (1):
  NFSv4.1: try SECINFO_NO_NAME flavs until one works

 fs/nfs/dir.c   |  2 +-
 fs/nfs/nfs4file.c  |  3 ++-
 fs/nfs/nfs4filelayoutdev.c | 20 +---
 fs/nfs/nfs4proc.c  | 58 +-
 include/linux/nfs_xdr.h|  3 ++-
 5 files changed, 63 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 16:08 -0400, Ric Wheeler wrote:
> On 09/30/2013 04:00 PM, Bernd Schubert wrote:
> > pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own 
> > interface? And userspace needs to address all of them differently? 
> 
> The NFS and SCSI groups have each defined a standard which Zach's proposal 
> abstracts into a common user API.
> 
> Distributed file systems tend to be rather unique and do not have similar 
> standard bodies, but a lot of them could hide server specific implementations 
> under the current proposed interfaces.
> 
> What is not a good idea is to drag out the core, simple copy offload 
> discussion 
> for another 5 years to pull in every odd use case :)

Agreed. The whole idea of a common system call interface should be to
allow us to abstract away the underlying storage and filesystem
architectures. If filesystem developers also want a way to expose that
underlying architecture to applications in order to enable further
optimisations, then that belongs in a separate discussion.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there would be way if the file system would get a
> >>>>>> hint that the target file is supposed to be copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets used by from-file and to-file.
> >
> > So?
> 
> mds1: orig-file
> oss1/target1: orig-chunk1
> 
> mds1: target-file
> ossN/targetN: target-chunk1
> 
> clientN: Performs the copy
> 
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same 
> target. Copy offload could then even be done from the underlying fs, 
> similar to local splice.
> If different ossN servers are used, copies still have to be done over the 
> network by these storage servers, although the client would only need to 
> initiate the copy. Still faster, but also not ideal.
> 
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file system need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
> 
> But it _does_ make sense. The file system just needs a hint that a 
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
> 
> And exactly for that we need a standard - it does not make sense if each 
> and every distributed file system implements its own 
> ioctl/libattr/libacl interface for that.
> 
> >
> >> Anyway, if we could agree on to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
> 
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own 
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry, 
> didn't find a better name yet), which would take in-file-path and 
> out-file-path and allow the file system to create out-file-path with the 
> same meta-layout as in-file-path. And it would need some flags, such as 
> AUTO (file system decides if it makes sense to do a local copy) and 
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>> It would be nice if there would be way if the file system would get a
> >>>> hint that the target file is supposed to be copy of another file. That
> >>>> way distributed file systems could also create the target-file with the
> >>>> correct meta-information (same storage targets as in-file has).
> >>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>> sure if this would work for pNFS, though.
> >>>
> >>> splice() does not create new files. What you appear to be asking for
> >>> lies way outside the scope of that system call interface.
> >>>
> >>
> >> Sorry I know, definitely outside the scope of splice, but in the context
> >> of offloaded file copies. So the question is, what is the best way to
> >> address/discuss that?
> >
> > Why does it need to be addressed in the first place?
> 
> An offloaded copy is still not efficient if different storage 
> servers/targets used by from-file and to-file.

So? 

> >
> > What is preventing an application from retrieving and setting this
> > information using standard libc functions such as fstat()+open(), and
> > supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> > where appropriate?
> >
> 
> At a minimum this requires network and metadata overhead. And while I'm 
> working on FhGFS now, I still wonder what other file system need to do - 
> for example Lustre pre-allocates storage-target files on creating a 
> file, so file layout changes mean even more overhead there.

The problem you are describing is limited to a narrow set of storage
architectures. If copy offload using splice() doesn't make sense for
those architectures, then don't implement it for them.
You might be able to provide ioctls() to do these special hinted file
creations for those filesystems that need it, but the vast majority
don't, and you shouldn't enforce it on them.

> Anyway, if we could agree on to use libattr or libacl to teach the file 
> system about the upcoming splice call I would be fine.

libattr and libacl are generic libraries that exist to manipulate xattrs
and acls. They do not need to contain Lustre-specific code.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >> It would be nice if there would be way if the file system would get a
> >> hint that the target file is supposed to be copy of another file. That
> >> way distributed file systems could also create the target-file with the
> >> correct meta-information (same storage targets as in-file has).
> >> Well, if we cannot agree on that, file system with a custom protocol at
> >> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >> sure if this would work for pNFS, though.
> >
> > splice() does not create new files. What you appear to be asking for
> > lies way outside the scope of that system call interface.
> >
> 
> Sorry I know, definitely outside the scope of splice, but in the context 
> of offloaded file copies. So the question is, what is the best way to 
> address/discuss that?

Why does it need to be addressed in the first place?

What is preventing an application from retrieving and setting this
information using standard libc functions such as fstat()+open(), and
supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
where appropriate?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
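
As an illustration of the userspace approach suggested above, here is a minimal C sketch that creates a destination file with the source's permission bits via fstat()+open() and then copies extended attributes on a best-effort basis. It uses the Linux <sys/xattr.h> calls in place of libattr's attr_getf/attr_setf, and the helper names create_like and copy_xattrs are hypothetical. Whether a given distributed filesystem exposes its storage layout through xattrs is filesystem-specific; that is an assumption here, not something the thread establishes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Best effort: copy every extended attribute from in_fd to out_fd.
 * Failures (unsupported filesystem, permission denied) are ignored. */
void copy_xattrs(int in_fd, int out_fd)
{
    ssize_t list_len = flistxattr(in_fd, NULL, 0);   /* query list size */
    if (list_len <= 0)
        return;                                      /* none, or unsupported */

    char *names = malloc((size_t)list_len);
    if (!names)
        return;

    list_len = flistxattr(in_fd, names, (size_t)list_len);
    /* names holds NUL-terminated attribute names packed back to back */
    for (char *name = names; list_len > 0 && name < names + list_len;
         name += strlen(name) + 1) {
        ssize_t val_len = fgetxattr(in_fd, name, NULL, 0);
        if (val_len < 0)
            continue;
        char *val = malloc(val_len ? (size_t)val_len : 1);
        if (!val)
            break;
        val_len = fgetxattr(in_fd, name, val, (size_t)val_len);
        if (val_len >= 0)
            (void)fsetxattr(out_fd, name, val, (size_t)val_len, 0);
        free(val);
    }
    free(names);
}

/* Create (or truncate) out_path with the same permission bits as in_fd.
 * Returns the new file descriptor, or -1 on error. */
int create_like(int in_fd, const char *out_path)
{
    struct stat st;

    assert(out_path != NULL);
    if (fstat(in_fd, &st) < 0)
        return -1;

    int out_fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC,
                      st.st_mode & 07777);
    if (out_fd < 0)
        return -1;

    /* open() is subject to the umask, so set the mode explicitly */
    if (fchmod(out_fd, st.st_mode & 07777) < 0) {
        close(out_fd);
        return -1;
    }
    copy_xattrs(in_fd, out_fd);
    return out_fd;
}
```

The sketch also shows the cost Bernd raises later in the thread: each step is a separate system call, and on a network filesystem each may be a separate round trip.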

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> It would be nice if there would be way if the file system would get a 
> hint that the target file is supposed to be copy of another file. That 
> way distributed file systems could also create the target-file with the 
> correct meta-information (same storage targets as in-file has).
> Well, if we cannot agree on that, file system with a custom protocol at 
> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not 
> sure if this would work for pNFS, though.

splice() does not create new files. What you appear to be asking for
lies way outside the scope of that system call interface.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

> -----Original Message-----
> From: Ric Wheeler [mailto:rwhee...@redhat.com]
> Sent: Monday, September 30, 2013 10:29 AM
> To: Miklos Szeredi
> Cc: J. Bruce Fields; Myklebust, Trond; Zach Brown; Anna Schumaker; Kernel
> Mailing List; Linux-Fsdevel; linux-...@vger.kernel.org; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
> 
> On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
> > On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwhee...@redhat.com> wrote:
> >> On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
> >>> On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields <bfie...@fieldses.org> wrote:
> >>>>> My other worry is about interruptibility/restartability.  Ideas?
> >>>>>
> >>>>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
> >>>>> Can the page cache copy be made restartable?   Or should splice() be
> >>>>> allowed to return a short count?  What happens on (non-reflink)
> >>>>> remote copies and huge request sizes?
> >>>> If I were writing an application that required copies to be
> >>>> restartable, I'd probably use the largest possible range in the
> >>>> reflink case but break the copy into smaller chunks in the splice case.
> >>>>
> >>> The app really doesn't want to care about that.  And it doesn't want
> >>> to care about restartability, etc..  It's something the *kernel* has
> >>> to care about.   You just can't have uninterruptible syscalls that
> >>> sleep for a "long" time, otherwise first you'll just have annoyed
> >>> users pressing ^C in vain; then, if the sleep is even longer,
> >>> warnings about task sleeping too long.
> >>>
> >>> One idea is letting splice() return a short count, and so the app
> >>> can safely issue SIZE_MAX requests and the kernel can decide if it
> >>> can copy the whole file in one go or if it wants to do it in smaller
> >>> chunks.
> >>>
> >> You cannot rely on a short count. That implies that an offloaded copy
> >> starts at byte 0 and the short count first bytes are all valid.
> > Huh?
> >
> > - app calls splice(from, 0, to, 0, SIZE_MAX)
> >   1) VFS calls ->direct_splice(from, 0,  to, 0, SIZE_MAX)
> >      1.a) fs reflinks the whole file in a jiffy and returns the size of the file
> >      1.b) fs does copy offload of, say, 64MB and returns 64MB
> >   2) VFS does page copy of, say, 1MB and returns 1MB
> > - app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
> > ...
> >
> > The point is: the app is always doing the same (incrementing offset
> > with the return value from splice) and the kernel can decide what is
> > the best size it can service within a single uninterruptible syscall.
> >
> > Wouldn't that work?
> >
> > Thanks,
> > Miklos
> 
> No.
> 
> Keep in mind that the offload operation in (1) might fail partially. The target
> file (the copy) is allocated, the question is what ranges have valid data.
> 
> I don't see that (2) is interesting or really needed to be done in the kernel.
> If nothing else, it tends to confuse the discussion
> 

Anna's figures, that were presented at Plumber's, show that (2) is still worth 
doing on the _server_ for the case of NFS.

Cheers
  Trond
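
The loop Miklos describes in the quoted text (always advance the offset by the count the call returned) might look like the following C sketch. range_copy_fn is a hypothetical stand-in for the proposed splice()/->direct_splice() path, not an existing kernel or libc interface, and chunked_copy is a toy in-memory implementation that deliberately returns short counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical range-copy primitive: copy up to len bytes starting at
 * offset off. Returns bytes copied (possibly fewer than len), 0 at end
 * of source, or -1 on error. */
typedef ssize_t (*range_copy_fn)(const char *src, size_t src_size,
                                 char *dst, size_t off, size_t len);

/* Toy implementation: never copies more than 4 bytes per call, to mimic
 * a kernel that limits how much work one syscall does. */
ssize_t chunked_copy(const char *src, size_t src_size,
                     char *dst, size_t off, size_t len)
{
    const size_t max_chunk = 4;
    size_t n;

    if (off >= src_size)
        return 0;                       /* nothing left to copy */
    n = src_size - off;
    if (n > len)
        n = len;
    if (n > max_chunk)
        n = max_chunk;
    memcpy(dst + off, src + off, n);
    return (ssize_t)n;
}

/* Caller side: always ask for "everything" and advance by the result. */
ssize_t copy_all(range_copy_fn copy, const char *src, size_t src_size,
                 char *dst)
{
    size_t off = 0;

    assert(copy != NULL);
    for (;;) {
        ssize_t n = copy(src, src_size, dst, off, SIZE_MAX);
        if (n < 0)
            return -1;                  /* error: give up */
        if (n == 0)
            break;                      /* end of source reached */
        off += (size_t)n;               /* short count: just continue */
    }
    return (ssize_t)off;
}
```

This loop treats a returned count n as meaning that the first n bytes from the offset are valid in the destination. That is exactly the assumption Ric questions for offloaded copies, which may fail partway and leave valid data in ranges that are not a simple prefix.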

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 16:08 -0400, Ric Wheeler wrote:
> On 09/30/2013 04:00 PM, Bernd Schubert wrote:
> > pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> > interface? And userspace needs to address all of them differently?
>
> The NFS and SCSI groups have each defined a standard which Zach's proposal
> abstracts into a common user API.
>
> Distributed file systems tend to be rather unique and do not have similar
> standard bodies, but a lot of them could hide server specific implementations
> under the current proposed interfaces.
>
> What is not a good idea is to drag out the core, simple copy offload discussion
> for another 5 years to pull in every odd use case :)

Agreed. The whole idea of a common system call interface should be to
allow us to abstract away the underlying storage and filesystem
architectures. If filesystem developers also want a way to expose that
underlying architecture to applications in order to enable further
optimisations, then that belongs in a separate discussion.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes

2013-09-30 Thread Myklebust, Trond


Hi Linus,

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-4

for you to fetch changes up to 367156d9a87b21b5232dd93107c5fc61b09ba2ef:

  NFS: Give flavor an initial value to fix a compile warning (2013-09-29 16:03:34 -0400)


NFS client bugfixes for 3.12

- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver


Anna Schumaker (1):
  NFS: Give flavor an initial value to fix a compile warning

Trond Myklebust (3):
  NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
  NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
  NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds

Weston Andros Adamson (1):
  NFSv4.1: try SECINFO_NO_NAME flavs until one works

 fs/nfs/dir.c               |  2 +-
 fs/nfs/nfs4file.c          |  3 ++-
 fs/nfs/nfs4filelayoutdev.c | 20 +---
 fs/nfs/nfs4proc.c          | 58 +-
 include/linux/nfs_xdr.h    |  3 ++-
 5 files changed, 63 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com


RE: [RFC] extending splice for copy offloading

2013-09-28 Thread Myklebust, Trond

> -----Original Message-----
> From: Miklos Szeredi [mailto:mik...@szeredi.hu]
> Sent: Saturday, September 28, 2013 12:50 AM
> To: Zach Brown
> Cc: J. Bruce Fields; Ric Wheeler; Anna Schumaker; Kernel Mailing List; Linux-
> Fsdevel; linux-...@vger.kernel.org; Myklebust, Trond; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
> 
> On Fri, Sep 27, 2013 at 10:50 PM, Zach Brown <z...@redhat.com> wrote:
> >> Also, I don't get the first option above at all.  The argument is
> >> that it's safer to have more copies?  How much safety does another
> >> copy on the same disk really give you?  Do systems that do dedup
> >> provide interfaces to turn it off per-file?
> 
> I don't see the safety argument very compelling either.  There are real
> semantic differences, however: ENOSPC on a write to a
> (apparently) already allocated block.  That could be a bit unexpected.  Do we
> need a fallocate extension to deal with shared blocks?

The above has been the case for all enterprise storage arrays ever since the 
invention of snapshots. The NFSv4.2 spec does allow you to set a per-file 
attribute that causes the storage server to always preallocate enough buffers 
to guarantee that you can rewrite the entire file, however the fact that we've 
lived without it for said 20 years leads me to believe that demand for it is 
going to be limited. I haven't put it top of the list of features we care to 
implement...

Cheers,
   Trond

RE: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and dfprintk()

2013-09-26 Thread Myklebust, Trond

> -----Original Message-----
> From: David Howells [mailto:dhowe...@redhat.com]
> Sent: Thursday, September 26, 2013 10:36 AM
> To: Joe Perches
> Cc: dhowe...@redhat.com; bfie...@fieldses.org; Myklebust, Trond;
> o...@lixom.net; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and
> dfprintk()
> 
> Joe Perches <j...@perches.com> wrote:
> 
> > no_printk doesn't prevent any argument side-effects from being
> > optimized away by the compiler.
> >
> > ie:
> > dprintk("%d", func())
> > func is now always called when before it wasn't.
> 
> Yes, I know.  There are half a dozen places where this is the case.  Those I've
> wrapped in ifdebug(FACILITY) { ... } in the code.  It's not the nicest, but at
> least the compiler always gets to see everything, rather than bits of it getting
> hidden by the preprocessor - which means the call points will be less likely to
> bit rot over time.

Your assumption is that RPC_DEBUG is disabled for most compiles. That is not 
the case.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
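
Joe's point about argument side effects can be illustrated outside the kernel. The sketch below assumes a no_printk()-style helper implemented as a variadic function that ignores its arguments; noop_printf is a hypothetical userspace stand-in, not the kernel's definition. With an empty macro the argument expression disappears in the preprocessor, whereas with the function form it is still evaluated, so func() runs.

```c
#include <assert.h>

int func_calls;                 /* counts how often func() has run */

int func(void)
{
    return ++func_calls;        /* visible side effect */
}

/* Hypothetical stand-in for a no_printk()-style helper: accepts
 * printf-like arguments and discards them. */
int noop_printf(const char *fmt, ...)
{
    (void)fmt;
    return 0;
}

/* Old style: the whole call, arguments included, is preprocessed away. */
#define dprintk_empty(...) do { } while (0)
/* New style: arguments are type-checked by the compiler and evaluated. */
#define dprintk_noop(...)  noop_printf(__VA_ARGS__)

int calls_with_empty_macro(void)
{
    func_calls = 0;
    dprintk_empty("%d", func());    /* func() never runs */
    return func_calls;
}

int calls_with_noop_function(void)
{
    func_calls = 0;
    dprintk_noop("%d", func());     /* func() runs once */
    return func_calls;
}

void check_side_effects(void)
{
    assert(calls_with_empty_macro() == 0);
    assert(calls_with_noop_function() == 1);
}
```

This is the trade-off under discussion: the function form keeps debug call sites visible to the compiler, at the price of evaluating their arguments unless each such site is guarded explicitly.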

RE: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in

2013-09-26 Thread Myklebust, Trond

> -----Original Message-----
> From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> Sent: Thursday, September 26, 2013 10:21 AM
> To: David Howells
> Cc: Myklebust, Trond; o...@lixom.net; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in
> 
> On Thu, Sep 26, 2013 at 03:45:02PM +0100, David Howells wrote:
> >
> >
> > Here's a series of patches to make SunRPC/NFS use no_printk() to
> > implement its null dfprintk() macro (ie. when RPC_DEBUG is disabled).
> > This prevents 'unused variable' errors from occurring when a variable
> > is set only for use in debugging statements and renders RPC/NFS_IFDEBUG
> > unnecessary.
> 
> Does this patch series fix any actual warnings?  Or does it just change the way
> that we prevent the warnings?
> 

Right. If this is just code churn, then let's drop it. Otherwise, please 
explain why it is a good idea.

Cheers,
  Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and dfprintk()

2013-09-26 Thread Myklebust, Trond

> -----Original Message-----
> From: David Howells [mailto:dhowe...@redhat.com]
> Sent: Thursday, September 26, 2013 10:36 AM
> To: Joe Perches
> Cc: dhowe...@redhat.com; bfie...@fieldses.org; Myklebust, Trond;
> o...@lixom.net; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and
> dfprintk()
>
> Joe Perches <j...@perches.com> wrote:
>
> > no_printk doesn't prevent any argument side-effects from being
> > optimized away by the compiler.
> >
> > ie:
> >     dprintk("%d", func())
> > func is now always called when before it wasn't.
>
> Yes, I know.  There are half a dozen places where this is the case.  Those
> I've wrapped in ifdebug(FACILITY) { ... } in the code.  It's not the nicest,
> but at least the compiler always gets to see everything, rather than bits of
> it getting hidden by the preprocessor - which means the call points will be
> less likely to bit rot over time.

Your assumption is that RPC_DEBUG is disabled for most compiles. That is not 
the case.

Trond
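
The side-effect concern in the exchange above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
kernel's code: dprintk_empty, dprintk_checked and no_printk_sketch are
hypothetical names, and no_printk_sketch assumes a function-style no-op of
the kind Joe describes, where the compiler still sees the arguments. With an
empty macro the argument expression disappears in the preprocessor; with a
no-op function it is still evaluated.

```c
/* Counter bumped by the side effect of the argument expression. */
static int calls;

static int func(void)
{
	calls++;
	return 42;
}

/* Variant A: the null macro expands to nothing, so its arguments are
 * dropped by the preprocessor and func() is never evaluated. */
#define dprintk_empty(...) do { } while (0)

/* Variant B (hypothetical stand-in for a function-style no-op): the
 * compiler sees the arguments, and any side effects must still run. */
static inline int no_printk_sketch(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}
#define dprintk_checked(...) no_printk_sketch(__VA_ARGS__)

/* Returns how many times func() ran with the empty macro. */
static int count_calls_empty(void)
{
	calls = 0;
	dprintk_empty("%d\n", func());
	return calls;
}

/* Returns how many times func() ran with the no-op function. */
static int count_calls_checked(void)
{
	calls = 0;
	dprintk_checked("%d\n", func());
	return calls;
}
```

In this sketch count_calls_empty() leaves the counter at 0, while
count_calls_checked() leaves it at 1; that extra call is what wrapping such
call sites in ifdebug(FACILITY) { ... } is meant to avoid.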

[GIT PULL] Please pull an NFS client bugfix

2013-09-21 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 272b98c6455f00884f0350f775c5342358ebb73f:

  Linux 3.12-rc1 (2013-09-16 16:17:51 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-3

for you to fetch changes up to a0f6ed8ebe4f6d494ef70f67d4c0c153cbf59577:

  RPCSEC_GSS: fix crash on destroying gss auth (2013-09-18 10:18:44 -0500)


NFS client bugfix for 3.12

- Fix a regression due to incorrect sharing of gss auth caches


J. Bruce Fields (1):
  RPCSEC_GSS: fix crash on destroying gss auth

 net/sunrpc/auth_gss/auth_gss.c | 11 +++
 1 file changed, 11 insertions(+)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 21:20 +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 12, 2013 at 4:13 PM, Myklebust, Trond
> <trond.mykleb...@netapp.com> wrote:
> >> > --- a/net/sunrpc/auth_gss/auth_gss.c
> >> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> >> > @@ -51,6 +51,7 @@
> >> >  #include <linux/sunrpc/rpc_pipe_fs.h>
> >> >  #include <linux/sunrpc/gss_api.h>
> >> >  #include <asm/uaccess.h>
> >> > +#include <linux/hashtable.h>
> >> >
> >> >  #include "../netns.h"
> >> >
> >> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay = GSS_RETRY_EXPIRED;
> >> >   * using integrity (two 4-byte integers): */
> >> >  #define GSS_VERF_SLACK 100
> >> >
> >> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> >> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
> >>
> >> Today's m68k/atari-defconfig kernel no longer boots, as it became larger than
> >> 4 MiB.
> >>
> >> bloat-o-meter tells me:
> >>
> >> function old new   delta
> >> gss_auth_hash_table-  262144 +262144
> >>
> >> Woops...
> >
> > Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
> > I fell for Sasha's subtle trap...
> >
> >> Are you trying to game Tim's survey? ;-)
> >> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
> >>
> >> Can this memory be allocated dynamically / only when it's used?
> >
> > :-) It's declared inside a module, so that should already be the case,
> 
> Only for the modular case. What about builtin, e.g. for nfsroot?
> 
> Or is it better to not build in NFS_V4 support in that case?
> 
> config NFS_V4
>   If unsure, say Y.
> 
> config NFSD_V4
>   If unsure, say N.
> 
> So that's why my defconfig has NFS_V4 but not NFSD_V4.

It should be possible now to compile in NFSv3 support (and/or NFSv2),
while keeping NFSv4 a module. That will usually result in
CONFIG_SUNRPC_GSS=m...

Of course, if your defconfig doesn't have module support then, yes, your
only option to avoid compiling in rpcsec_gss is to not select NFSv4 at
all.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client changes (part 2)

2013-09-12 Thread Myklebust, Trond

Hi Linus,

The following changes since commit b1b3e136948a2bf4915326acb0d825d7d180753f:

  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 
-0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-2

for you to fetch changes up to 23c323af0375a7f63732bed0386aba5935b8de69:

  SUNRPC: No, I did not intend to create a 256KiB hashtable (2013-09-12 
10:16:31 -0400)


NFS client bugfixes:

- Fix a few credential reference leaks resulting from the SP4_MACH_CRED
  NFSv4.1 state protection code.
- Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the
  intended 64 byte structure.
- Fix a long standing XDR issue with FREE_STATEID
- Fix a potential WARN_ON spamming issue
- Fix a missing dprintk() kuid conversion

New features:
- Enable the NFSv4.1 state protection support for the WRITE and COMMIT
  operations.


Andy Adamson (1):
  NFSv4.1 fix decode_free_stateid

Geert Uytterhoeven (1):
  sunrpc: Add missing kuids conversion for printing

Trond Myklebust (1):
  SUNRPC: No, I did not intend to create a 256KiB hashtable

Weston Andros Adamson (4):
  NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
  NFSv4.1: fix SECINFO* use of put_rpccred
  NFSv4.1: sp4_mach_cred: no need to ref count creds
  NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE

 fs/nfs/nfs4_fs.h   | 10 +-
 fs/nfs/nfs4proc.c  | 22 ++
 fs/nfs/nfs4xdr.c   | 17 ++---
 net/sunrpc/auth_generic.c  |  2 +-
 net/sunrpc/auth_gss/auth_gss.c |  2 +-
 5 files changed, 23 insertions(+), 30 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH] sunrpc: Add missing kuids conversion for printing

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 15:09 +0200, Geert Uytterhoeven wrote:
> m68k/allmodconfig:
> 
> net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’:
> net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but
> argument 2 has type ‘kuid_t’
> 
> commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and
> kgids to uids and gids for printing") forgot to convert one instance.
> 
> Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>
> ---

Thanks! Applied...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
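
The warning quoted above arises because a kuid_t cannot be handed straight
to a %d conversion. A minimal userspace sketch of the pattern follows,
assuming kuid_t is a single-member wrapper struct (as it is when strict
uid/gid type checking is enabled); kuid_sketch_t, from_kuid_sketch() and
format_uid() are hypothetical stand-ins, not the kernel's types or helpers.

```c
#include <stdio.h>

/* Userspace stand-in for a wrapped uid type. */
typedef struct {
	unsigned int val;
} kuid_sketch_t;

/* Hypothetical helper playing the role of a kuid-to-uid conversion. */
static unsigned int from_kuid_sketch(kuid_sketch_t kuid)
{
	return kuid.val;
}

/* Formats "uid=<n>" into buf and returns what snprintf() returns. */
static int format_uid(char *buf, size_t len, kuid_sketch_t kuid)
{
	/* Passing kuid itself to the format would not match any integer
	 * conversion; unwrap it to a plain integer first. */
	return snprintf(buf, len, "uid=%u", from_kuid_sketch(kuid));
}
```

The point of the wrapper is that the compiler rejects accidental mixing of
kernel-internal and user-visible ids, so every print site has to convert
explicitly.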

Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 15:24 +0200, Geert Uytterhoeven wrote:
> On Mon, Sep 9, 2013 at 6:57 PM, Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org> wrote:
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index 5ec15bb..dc4b449 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -51,6 +51,7 @@
> >  #include <linux/sunrpc/rpc_pipe_fs.h>
> >  #include <linux/sunrpc/gss_api.h>
> >  #include <asm/uaccess.h>
> > +#include <linux/hashtable.h>
> >
> >  #include "../netns.h"
> >
> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay = GSS_RETRY_EXPIRED;
> >   * using integrity (two 4-byte integers): */
> >  #define GSS_VERF_SLACK 100
> >
> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
> 
> Today's m68k/atari-defconfig kernel no longer boots, as it became larger than
> 4 MiB.
> 
> bloat-o-meter tells me:
> 
> function old new   delta
> gss_auth_hash_table-  262144 +262144
> 
> Woops...

Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
I fell for Sasha's subtle trap...

> Are you trying to game Tim's survey? ;-)
> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
> 
> Can this memory be allocated dynamically / only when it's used?

:-) It's declared inside a module, so that should already be the case,
however I'll send in a patch to change the above to the intended:

DEFINE_HASHTABLE(gss_auth_hash_table, 4);

Thanks Geert!

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
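
The arithmetic behind the 262144-byte figure can be sketched in userspace C.
The second argument of DEFINE_HASHTABLE() is the number of hash bits, so the
table holds 1 << bits buckets; each bucket is a single pointer, which is
4 bytes on a 32-bit target such as m68k. The macro and struct below are
simplified stand-ins with hypothetical names, not the kernel definitions.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct hlist_head: a single pointer. */
struct hlist_head_sketch {
	void *first;
};

/* Same shape as DEFINE_HASHTABLE(name, bits): "bits" is log2 of the
 * bucket count, so the array has 1 << bits entries. */
#define DEFINE_HASHTABLE_SKETCH(name, bits) \
	struct hlist_head_sketch name[1 << (bits)]

static DEFINE_HASHTABLE_SKETCH(table_16_bits, 16);	/* 65536 buckets */
static DEFINE_HASHTABLE_SKETCH(table_4_bits, 4);	/* 16 buckets */

/* Number of buckets in a table declared with the macro above. */
#define BUCKETS(table) (sizeof(table) / sizeof((table)[0]))

/* Table footprint in bytes for a given pointer size. */
static size_t table_bytes(unsigned int bits, size_t pointer_size)
{
	return ((size_t)1 << bits) * pointer_size;
}
```

With 4-byte pointers, table_bytes(16, 4) gives 262144 and table_bytes(4, 4)
gives 64, matching the bloat-o-meter delta and the 64 byte structure that
was intended.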

[GIT PULL] Please pull NFS client updates for 3.12

2013-09-09 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 7c6d4dca777d6423cb9ccdc019cad94c75adcbe4:

  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha (2013-07-23 
14:39:57 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-1

for you to fetch changes up to b1b3e136948a2bf4915326acb0d825d7d180753f:

  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 
-0400)


NFS client updates for Linux 3.12

Highlights include:

- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
  lease loss due to a network partition, where doing so may result in data
  corruption. Add a kernel parameter to control choice of legacy behaviour
  or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
  NFS clients from being able to manipulate our lease and file locking
  state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.


Andy Adamson (10):
  NFSv4.1 Use the mount point rpc_clnt for layoutreturn
  NFS Remove unused authflavour parameter from init_client
  NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE
  NFSv4.1 Use clientid management rpc_clnt for secinfo
  NFSv4.1 Use clientid management rpc_clnt for secinfo_no_name
  SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresult
  SUNRPC new rpc_credops to test credential expiry
  NFS avoid expired credential keys for buffered writes
  SUNRPC refactor rpcauth_checkverf error returns
  NFSv4.1 Use MDS auth flavor for data server connection

Chuck Lever (20):
  NFS: Fix return type of nfs4_end_drain_session() stub
  NFS: Use root's credential for lease management when keytab is missing
  NFS: Never use user credentials for lease renewal
  NFS: When displaying session slot numbers, use "%u" consistently
  NFS: Rename nfs41_call_sync_data as a common data structure
  NFS: Clean up nfs4_setup_sequence()
  NFS: Common versions of sequence helper functions
  NFS: Add RPC callouts to start NFSv4.0 synchronous requests
  NFS: Remove unused call_sync minor version op
  NFS: Enable slot table helpers for NFSv4.0
  NFS: Add global helper to set up a stand-along nfs4_slot_table
  NFS: Add global helper for releasing slot table resources
  NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking
  NFS: NFSv4.0 transport blocking
  NFS: Enable nfs4_setup_sequence() for DELEGRETURN
  NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER
  NFS: Add nfs4_sequence calls for OPEN_CONFIRM
  NFS: Update session draining barriers for NFSv4.0 transport blocking
  When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning:
  NFS: Fix warning introduced by NFSv4.0 transport blocking patches

Jeff Layton (2):
  rpc_pipe: convert back to simple_dir_inode_operations
  nfs: verify open flags before allowing an atomic open

Nadav Shemer (1):
  nfs: fix open(O_RDONLY|O_TRUNC) in NFS4.0

NeilBrown (2):
  NFS: remove incorrect "Lock reclaim failed!" warning.
  NFSv4: Don't try to recover NFSv4 locks when they are lost.

Trond Myklebust (63):
  NFSv4: encode_attrs should not backfill the bitmap and attribute length
  NFSv4: Fix nfs4_init_uniform_client_string for net namespaces
  NFSv4: Refuse mount attempts with proto=udp
  NFS: Remove the NFSv4 "open optimisation" from nfs_permission
  NFSv3: Deal with a sparse warning in nfs3_proc_create
  NFSv4: Deal with a sparse warning in nfs4_opendata_alloc
  NFSv4: Deal with some more sparse warnings
  NFSv4: Deal with a sparse warning in nfs_idmap_get_key()
  NFSv4: Fix an incorrect pointer declaration in decode_first_pnfs_layout_type
  NFS: Clean up nfs_sillyrename()
  NFS: refactor code for calculating the crc32 hash of a filehandle
  NFS: Add event tracing for generic NFS events
  NFS: Pass in lookup flags from nfs_atomic_open to nfs_lookup
  NFS: Add event tracing for generic NFS lookups
  NFS: Add tracepoints for debugging generic file create events
  NFS: Add tracepoints for debugging directory changes
  NFS: Add tracepoints for debugging NFS rename

[GIT PULL] Please pull one NFS client bugfix

2013-08-29 Thread Myklebust, Trond

Hi Linus,

The following changes since commit fa8218def1b1a16f0a410e2c1c767b4738cc81fa:

  Merge tag 'regmap-v3.11-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap (2013-08-27 
10:10:30 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-5

for you to fetch changes up to 347e2233b7667e336d9f671f1a52dfa3f0416e2c:

  SUNRPC: Fix memory corruption issue on 32-bit highmem systems (2013-08-28 
15:43:43 -0400)


NFS client bugfix for 3.11

- Stable patch to fix a highmem-related data corruption issue on 32-bit
  ARM platforms


Trond Myklebust (1):
  SUNRPC: Fix memory corruption issue on 32-bit highmem systems

 net/sunrpc/xdr.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bug fixes

2013-08-09 Thread Myklebust, Trond

Hi Linus,

The following changes since commit c095ba7224d8edc71dcef0d655911399a8bd4a3f:

  Linux 3.11-rc4 (2013-08-04 13:46:46 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-4

for you to fetch changes up to b72888cb0ba63b2dfc6c8d3cd78a7fea584bebc6:

  NFSv4: Fix up nfs4_proc_lookup_mountpoint (2013-08-07 20:47:26 -0400)


NFS client bugfixes for 3.11

- Stable patch for lockd to fix Oopses due to inappropriate calls to
  utsname()->nodename
- Stable patches for sunrpc to fix Oopses on shutdown when using
  AF_LOCAL sockets with rpcbind
- Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint
- Fix a regression with the sync mount option failing to work for nfs4 mounts
- Fix a writeback performance issue when doing cache invalidation
- Remove an incorrect call to nfs_setsecurity in nfs_fhget


Scott Mayhew (1):
  NFSv4: Fix the sync mount option for nfs4 mounts

Trond Myklebust (6):
  LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs
  SUNRPC: Don't auto-disconnect from the local rpcbind socket
  SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
  NFS: Fix writeback performance issue on cache invalidation
  NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
  NFSv4: Fix up nfs4_proc_lookup_mountpoint

 fs/lockd/clntlock.c  | 13 
 fs/lockd/clntproc.c  |  5 +++--
 fs/nfs/inode.c   | 11 +++---
 fs/nfs/nfs4proc.c|  8 +++-
 fs/nfs/super.c   |  4 
 include/linux/sunrpc/sched.h |  1 +
 net/sunrpc/clnt.c|  4 
 net/sunrpc/netns.h   |  1 +
 net/sunrpc/rpcb_clnt.c   | 48 
 9 files changed, 68 insertions(+), 27 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Myklebust, Trond

On Wed, 2013-08-07 at 22:01 +0100, Nix wrote:
> On 7 Aug 2013, Trond Myklebust said:
> 
> > On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> >> On 6 Aug 2013, Trond Myklebust verbalised:
> >> > True. How about something like the following instead. Note the change to
> >> > the original patch...
> >> 
> >> Well, with those applied I could reboot without a panic for the first
> >> time since 3.8.x: looking good. I'll give it a reboot or two with a
> >> system that's not hot from booting though.
> >
> > Could you please also try applying only the 1/2 patch, to see if that
> > suffices to quell the shutdown panic?
> 
> It doesn't suffice. I see this severely truncated oops:
> 
> [  115.799092] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [  115.800284] IP: [] path_init+0x11c/0x36f
> [  115.801463] PGD 0 
> [  115.802625] Oops:  [#1] PREEMPT SMP 
> [  115.803805] Modules linked in: [last unloaded: microcode] 
> [  115.804995] CPU: 3 PID: 1191 Comm: sleep Not tainted 
> 3.10.5-05317-g3c9f6fa-dirty #2
> [  115.806207] Hardware name: System manufacturer System Product 
> Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [  115.807453] task: 8804189a ti: 8803f74d6000 task.ti: 
> 8803f74d6000
> 
OK. Then I'll mark them both for stable inclusion in 3.9+.

Thanks for testing!
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Myklebust, Trond

On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> On 6 Aug 2013, Trond Myklebust verbalised:
> > True. How about something like the following instead. Note the change to
> > the original patch...
> 
> Well, with those applied I could reboot without a panic for the first
> time since 3.8.x: looking good. I'll give it a reboot or two with a
> system that's not hot from booting though.
> 

Could you please also try applying only the 1/2 patch, to see if that
suffices to quell the shutdown panic?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +
> "Myklebust, Trond"  wrote:
> 
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +
> > > "Myklebust, Trond"  wrote:
> > > 
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust 
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > >  nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > > 
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > > 
> > > > Signed-off-by: Trond Myklebust 
> > > > Cc: Toralf Förster 
> > > > Cc: Oleg Nesterov 
> > > > Cc: Nix 
> > > > Cc: Jeff Layton 
> > > > ---
> > > >  fs/lockd/clntproc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst 
> > > > *req, struct file_lock *fl)
> > > >  {
> > > > struct nlm_args *argp = &req->a_args;
> > > > struct nlm_lock *lock = &argp->lock;
> > > > +   char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >  
> > > > nlmclnt_next_cookie(&argp->cookie);
> > > > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), 
> > > > sizeof(struct nfs_fh));
> > > > -   lock->caller  = utsname()->nodename;
> > > > +   lock->caller  = nodename;
> > > > lock->oh.data = req->a_owner;
> > > > lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), 
> > > > "%u@%s",
> > > > (unsigned 
> > > > int)fl->fl_u.nfs_fl.owner->pid,
> > > > -   utsname()->nodename);
> > > > +   nodename);
> > > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > > lock->fl.fl_start = fl->fl_start;
> > > > lock->fl.fl_end = fl->fl_end;
> > > 
> > > Looks good to me...
> > > 
> > > Reviewed-by: Jeff Layton 
> > > 
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > > 
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> > 
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> > 
> > See attachment.
> 
> Yeah, that looks like a reasonable thing to do...
> 
> OTOH, Is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind stopped, the socket
> unlinked and recreated and then started again?
> 
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to
the original patch...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 00326ed6442c66021cd4b5e19e80f3e2027d5d42 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH v2 1/2] SUNRPC: Don't auto-disconnect from the local rpcbind
 socket

There is no nee

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 19:33 +0100, Nix wrote:
> On 5 Aug 2013, Trond Myklebust told this:
> > Does the attached patch fix the problem?
> 
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust 
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> 
> It makes it worse. Much, much worse. From a crash every so often when
> I'm doing compilations over NFS, I get an immediate panic on startx,
> long long before I even try to replicate the earlier panic:
> 
> [   83.432358] task: 88041aaa5ac0 ti: 8804199e2000 task.ti: 
> 8804199e2000
> [   83.432428] RIP: 0010:[] [] 
> encode_nlm4_lock+0x26/0xbe
> [   83.432512] RSP: 0018:8804199e3a78  EFLAGS: 00010286
> [   83.432564] RAX:  RBX: 88041a577038 RCX: 
> 
> [   83.432630] RDX: 8804193b3098 RSI: 88041a577038 RDI: 
> 008c
> [   83.432697] RBP: 8804199e3aa8 R08: 8804193b3098 R09: 
> 0001
> [   83.432763] R10: 88042fa12980 R11: 88042fa12980 R12: 
> 8804199e3ae8
> [   83.432830] R13: 008c R14: 8804199e3fd8 R15: 
> 815de80e
> [   83.432898] FS:  7f594b40c740() GS:88042fa0() 
> knlGS:
> [   83.432974] CS:  0010 DS:  ES:  CR0: 80050033
> [   83.433028] CR2: 008c CR3: 00041ab3d000 CR4: 
> 001407f0
> [   83.433095] DR0:  DR1:  DR2: 
> 
> [   83.433176] DR3:  DR6: 0ff0 DR7: 
> 0400
> [   83.433255] Stack:
> [   83.433276]  88041a44fb70 88040004 8804199e3ae8 
> 88041a577010 
> [   83.433360]  8804188e0e00 8804199e3fd8 8804199e3ac8 
> 8124b0d7 
> [   83.433443]  8804188e0e00 8124b086 8804199e3b38 
> 815e6032 
> [   83.433616] Call Trace:
> [   83.433646]  [] nlm4_xdr_enc_lockargs+0x51/0x76
> [   83.433707]  [] ? nlm4_xdr_enc_cancargs+0x56/0x56
> [   83.433769]  [] rpcauth_wrap_req+0x57/0x62
> [   83.433826]  [] call_transmit+0x17c/0x1f9
> [   83.433880]  [] __rpc_execute+0xe8/0x2ca
> [   83.433935]  [] rpc_execute+0x76/0x9d
> [   83.433986]  [] rpc_run_task+0x78/0x80
> [   83.434039]  [] rpc_call_sync+0x88/0x9e
> [   83.434092]  [] nlmclnt_call+0xb5/0x240
> [   83.434146]  [] nlmclnt_proc+0x226/0x5fb
> [   83.434226]  [] nfs3_proc_lock+0x21/0x23
> [   83.434280]  [] do_setlk+0x65/0xee
> [   83.434329]  [] nfs_lock+0x14e/0x162
> [   83.434382]  [] vfs_lock_file+0x29/0x35
> [   83.434435]  [] fcntl_setlk+0x139/0x2c5
> [   83.434490]  [] SyS_fcntl+0x2b6/0x47d
> [   83.434543]  [] system_call_fastpath+0x16/0x1b
> [   83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89 
> e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef  
> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00 
> [   83.435077] RIP [] encode_nlm4_lock+0x26/0xbe
> [   83.435140]  RSP 
> [   83.435197] CR2: 008c
> 
> That's here:
> 
> (gdb) list *(encode_nlm4_lock+0x26)
> 0x8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
> 324  *  string caller_name;
> 325  */
> 326 static void encode_caller_name(struct xdr_stream *xdr, const char 
> *name)
> 327 {
> 328 /* NB: client-side does not set lock->len */
> 329 u32 length = strlen(name);
> 330 __be32 *p;
> 331
> 332 p = xdr_reserve_space(xdr, 4 + length);
> 333 xdr_encode_opaque(p, name, length);
> 
>0x8124af69 <+38>:repnz scas %es:(%rdi),%al
> 
> Pretty clearly, "name" can be NULL after this patch...
> 
Yes. This scheme will only work if we make sure that host->h_rpcclnt is
initialised at mount time. Here is a v2 patch that should do the right
thing.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 9a1b6bf818e74bb7aabaecb59492b739f2f4d742 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 12:06:12 -0400
Subject: [PATCH v2] LOCKD: Don't call utsname()->nodename from
 nlmclnt_setlockargs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
which case we're in entirely the wrong namespace.

Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
exit_task_namespaces() outside of exit_notify()) now means that
exit_task_work() is called after exit_task_namespaces(), which
triggers an Oops when we're freeing up the locks.

Fix this by ensuring that we initialise the nlm_host's rpc_client at mount
time, so that the cl_nodename field is initialised to the value of
utsname()->nodename that the net namespace uses. Then replace the
lockd callers of

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 16:15:01 +
> "Myklebust, Trond"  wrote:
> 
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust 
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > which case we're in entirely the wrong namespace.
> > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > exit_task_namespaces() outside of exit_notify()) now means that
> > exit_task_work() is called after exit_task_namespaces(), which
> > triggers an Oops when we're freeing up the locks.
> > 
> > Signed-off-by: Trond Myklebust 
> > Cc: Toralf Förster 
> > Cc: Oleg Nesterov 
> > Cc: Nix 
> > Cc: Jeff Layton 
> > ---
> >  fs/lockd/clntproc.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index 9760ecb..acd3947 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, 
> > struct file_lock *fl)
> >  {
> > struct nlm_args *argp = &req->a_args;
> > struct nlm_lock *lock = &argp->lock;
> > +   char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> >  
> > nlmclnt_next_cookie(&argp->cookie);
> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct 
> > nfs_fh));
> > -   lock->caller  = utsname()->nodename;
> > +   lock->caller  = nodename;
> > lock->oh.data = req->a_owner;
> > lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > -   utsname()->nodename);
> > +   nodename);
> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > lock->fl.fl_start = fl->fl_start;
> > lock->fl.fl_end = fl->fl_end;
> 
> Looks good to me...
> 
> Reviewed-by: Jeff Layton 
> 
> Trond, any thoughts on the other oops that Nix posted? The issue there
> seems to be that we're trying to do the pathwalk to the rpcbind unix
> socket from exit_task_work(), but that's happening after we've already
> called exit_fs().
> 
> The trivial answer seems to be to simply call exit_task_work() before
> exit_fs() there, but it seems like we ought to be doing the upcall to
> rpcbind in a mount namespace from which we know we can reach the
> socket...

Isn't it enough to just do the same thing as we did for gss proxy? i.e.
set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.

See attachment.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From ab56d77893815b1b9f0aaa7a89cee7c832a31cff Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH] SUNRPC: Don't auto-disconnect from the local rpcbind socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix 
Signed-off-by: Trond Myklebust 
Cc: Jeff Layton 
Cc: sta...@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..4b00555 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,15 @@ static int rpcb_create_local_unix(struct net *net)
 		.program	= &rpcb_program,
 		.version	= RPCBVERS_2,
 		.authflavor	= RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags		= RPC_CLNT_CREATE_NOPING |
+  RPC_CLNT_CREATE_NO_IDLE_TIMEOUT,
 	};
 	struct rpc_clnt *clnt, *clnt4;
 	int result = 0;
-- 
1.8.3.1

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 16:50 +0100, Nix wrote:
> On 5 Aug 2013, Jeff Layton said:
> 
> > On Mon, 5 Aug 2013 11:04:27 -0400
> > Jeff Layton  wrote:
> >
> >> On Mon, 05 Aug 2013 15:48:01 +0100
> >> Nix  wrote:
> >> 
> >> > On 5 Aug 2013, Jeff Layton stated:
> >> > 
> >> > > On Sun, 04 Aug 2013 16:40:58 +0100
> >> > > Nix  wrote:
> >> > >
> >> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> > >> compilation (of Chromium, as it happens) over NFSv3:
> >> > >> 
> >> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference 
> >> > >> at 0008
> >> > >> [16364.527571] IP: [81245157] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527611] PGD 0 
> >> > >> [16364.527626] Oops:  [#1] PREEMPT SMP 
> >> > >> [16364.527656] Modules linked in: [last unloaded: microcode] 
> >> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 
> >> > >> 3.10.4-05315-gf4ce424-dirty #1
> >> > >> [16364.527730] Hardware name: System manufacturer System Product 
> >> > >> Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> > >> [16364.527775] task: 88041a97ad60 ti: 8803501d4000 task.ti: 
> >> > >> 8803501d4000
> >> > >> [16364.527813] RIP: 0010:[81245157] [81245157] 
> >> > >> nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527860] RSP: 0018:8803501d5c58  EFLAGS: 00010282
> >> > >> [16364.527889] RAX: 88041a97ad60 RBX: 8803e49c8800 RCX: 
> >> > >> 
> >> > >> [16364.527926] RDX:  RSI: 004a RDI: 
> >> > >> 8803e49c8b54
> >> > >> [16364.527962] RBP: 8803501d5c68 R08: 00015720 R09: 
> >> > >> 
> >> > >> [16364.527998] R10: 7000 R11: 8803501d5d58 R12: 
> >> > >> 8803501d5d58
> >> > >> [16364.528034] R13: 88041bd2bc00 R14:  R15: 
> >> > >> 8803fc9e2900
> >> > >> [16364.528070] FS:  () GS:88042fa0() 
> >> > >> knlGS:
> >> > >> [16364.528111] CS:  0010 DS:  ES:  CR0: 80050033
> >> > >> [16364.528142] CR2: 0008 CR3: 01c0b000 CR4: 
> >> > >> 001407f0
> >> > >> [16364.528177] DR0:  DR1:  DR2: 
> >> > >> 
> >> > >> [16364.528214] DR3:  DR6: 0ff0 DR7: 
> >> > >> 0400
> >> > >> [16364.528303] Stack:
> >> > >> [16364.528316]  8803501d5d58 8803e49c8800 8803501d5cd8 
> >> > >> 81245418 
> >> > >> [16364.528369]   8803516f0bc0 8803d7b7b6c0 
> >> > >> 81215c81 
> >> > >> [16364.528418]  88030007 88041bd2bdc8 8801aabe9650 
> >> > >> 8803fc9e2900 
> >> > >> [16364.528467] Call Trace:
> >> > >> [16364.528485]  [81245418] nlmclnt_proc+0x148/0x5fb
> >> > >> [16364.528516]  [81215c81] ? nfs_put_lock_context+0x69/0x6e
> >> > >> [16364.528550]  [812209a2] nfs3_proc_lock+0x21/0x23
> >> > >> [16364.528581]  [812149dd] do_unlk+0x96/0xb2
> >> > >> [16364.528608]  [81214b41] nfs_flock+0x5a/0x71
> >> > >> [16364.528637]  [8119a747] locks_remove_flock+0x9e/0x113
> >> > >> [16364.528668]  [8115cc68] __fput+0xb6/0x1e6
> >> > >> [16364.528695]  [8115cda6] fput+0xe/0x10
> >> > >> [16364.528724]  [810998da] task_work_run+0x7e/0x98
> >> > >> [16364.528754]  [81082bc5] do_exit+0x3cc/0x8fa
> >> > >> [16364.528782]  [81083501] ? SyS_wait4+0xa5/0xc2
> >> > >> [16364.528811]  [8108328d] do_group_exit+0x6f/0xa2
> >> > >> [16364.528843]  [810832d7] SyS_exit_group+0x17/0x17
> >> > >> [16364.528876]  [81613e92] system_call_fastpath+0x16/0x1b
> >> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 
> >> > >> 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 
> >> > >> 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 
> >> > >> 38 48 8b 
> >> > >> [16364.529176] RIP [81245157] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.529264]  RSP 8803501d5c58
> >> > >> [16364.529283] CR2: 0008
> >> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> [...]
> > The listing and disassembly from nlmclnt_proc is not terribly
> > interesting unfortunately. You really want to do the listing and
> > disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).
> 
> Oh, sorry! Wrong end of the oops :)
> 
> 0x81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
> 126 struct nlm_args *argp = &req->a_args;
> 127 struct nlm_lock *lock = &argp->lock;
> 128
> 129 nlmclnt_next_cookie(&argp->cookie);
> 130 memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), 
> sizeof(struct nfs_fh));
> 131 lock->caller  = utsname()->nodename;
> 132 lock->oh.data = req->a_owner;
> 133 lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), 
> "%u@%s",
> 134 (unsigned 
> int)fl->fl_u.nfs_fl.owner->pid,
> 135 utsname()->nodename);
> 
>0x81245102 <+0>: callq  0x81613b00 <__fentry__>
>0x81245107 <+5>: push   %rbp
>0x81245108 <+6>: mov%rsp,%rbp
>

Re: [PATCH] fs/nfs/inode.c: adjust code alignment

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 16:47 +0200, Julia Lawall wrote:
> From: Julia Lawall 
> 
> Signed-off-by: Julia Lawall 
> 
> ---
> 
> This patch adjusts the code so that the alignment matches the current
> semantics.  I have no idea if it is the intended semantics, though.  Should
> the call to nfs_setsecurity also be under the else?
> 

>  fs/nfs/inode.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index af6e806..d8ad685 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -463,7 +463,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
>  		unlock_new_inode(inode);
>  	} else
>  		nfs_refresh_inode(inode, fattr);
> -		nfs_setsecurity(inode, fattr, label);
> +	nfs_setsecurity(inode, fattr, label);
>  	dprintk("NFS: nfs_fhget(%s/%Ld fh_crc=0x%08x ct=%d)\n",
>  		inode->i_sb->s_id,
>  		(long long)NFS_FILEID(inode),

Hi Julia,

Thanks for pointing this out! Given that the 'then' clause of the if
statement already calls nfs_setsecurity before unlocking the inode, I
suspect that the above _should_ really be part of the 'else' clause. 

That said, I can't see that calling nfs_setsecurity twice on the inode
can cause any unintended side-effects, so I suggest that we rather queue
the patch up for inclusion in 3.12.
Steve and Dave, any comments?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 19:33 +0100, Nix wrote:
> On 5 Aug 2013, Trond Myklebust told this:
> > Does the attached patch fix the problem?
>
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust trond.mykleb...@netapp.com
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
>
> It makes it worse. Much, much worse. From a crash every so often when
> I'm doing compilations over NFS, I get an immediate panic on startx,
> long long before I even try to replicate the earlier panic:
>
> [   83.432358] task: 88041aaa5ac0 ti: 8804199e2000 task.ti: 
> 8804199e2000
> [   83.432428] RIP: 0010:[8124af69] [8124af69] 
> encode_nlm4_lock+0x26/0xbe
> [   83.432512] RSP: 0018:8804199e3a78  EFLAGS: 00010286
> [   83.432564] RAX:  RBX: 88041a577038 RCX: 
>
> [   83.432630] RDX: 8804193b3098 RSI: 88041a577038 RDI: 
> 008c
> [   83.432697] RBP: 8804199e3aa8 R08: 8804193b3098 R09: 
> 0001
> [   83.432763] R10: 88042fa12980 R11: 88042fa12980 R12: 
> 8804199e3ae8
> [   83.432830] R13: 008c R14: 8804199e3fd8 R15: 
> 815de80e
> [   83.432898] FS:  7f594b40c740() GS:88042fa0() 
> knlGS:
> [   83.432974] CS:  0010 DS:  ES:  CR0: 80050033
> [   83.433028] CR2: 008c CR3: 00041ab3d000 CR4: 
> 001407f0
> [   83.433095] DR0:  DR1:  DR2: 
>
> [   83.433176] DR3:  DR6: 0ff0 DR7: 
> 0400
> [   83.433255] Stack:
> [   83.433276]  88041a44fb70 88040004 8804199e3ae8 
> 88041a577010 
> [   83.433360]  8804188e0e00 8804199e3fd8 8804199e3ac8 
> 8124b0d7 
> [   83.433443]  8804188e0e00 8124b086 8804199e3b38 
> 815e6032 
> [   83.433616] Call Trace:
> [   83.433646]  [8124b0d7] nlm4_xdr_enc_lockargs+0x51/0x76
> [   83.433707]  [8124b086] ? nlm4_xdr_enc_cancargs+0x56/0x56
> [   83.433769]  [815e6032] rpcauth_wrap_req+0x57/0x62
> [   83.433826]  [815de98a] call_transmit+0x17c/0x1f9
> [   83.433880]  [815e4e58] __rpc_execute+0xe8/0x2ca
> [   83.433935]  [815e50f9] rpc_execute+0x76/0x9d
> [   83.433986]  [815debc1] rpc_run_task+0x78/0x80
> [   83.434039]  [815decff] rpc_call_sync+0x88/0x9e
> [   83.434092]  [81244b3c] nlmclnt_call+0xb5/0x240
> [   83.434146]  [812454f0] nlmclnt_proc+0x226/0x5fb
> [   83.434226]  [812209a2] nfs3_proc_lock+0x21/0x23
> [   83.434280]  [81214a5e] do_setlk+0x65/0xee
> [   83.434329]  [81214ca6] nfs_lock+0x14e/0x162
> [   83.434382]  [81199661] vfs_lock_file+0x29/0x35
> [   83.434435]  [8119a51d] fcntl_setlk+0x139/0x2c5
> [   83.434490]  [81169621] SyS_fcntl+0x2b6/0x47d
> [   83.434543]  [81613e92] system_call_fastpath+0x16/0x1b
> [   83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89 
> e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef f2 
> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00 
> [   83.435077] RIP [8124af69] encode_nlm4_lock+0x26/0xbe
> [   83.435140]  RSP 8804199e3a78
> [   83.435197] CR2: 008c
>
> That's here:
>
> (gdb) list *(encode_nlm4_lock+0x26)
> 0x8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
> 324  *  string caller_name<LM_MAXSTRLEN>;
> 325  */
> 326 static void encode_caller_name(struct xdr_stream *xdr, const char *name)
> 327 {
> 328 /* NB: client-side does not set lock->len */
> 329 u32 length = strlen(name);
> 330 __be32 *p;
> 331
> 332 p = xdr_reserve_space(xdr, 4 + length);
> 333 xdr_encode_opaque(p, name, length);
>
> 0x8124af69 <+38>:	repnz scas %es:(%rdi),%al
>
> Pretty clearly, name can be NULL after this patch...
>
 
Yes. This scheme will only work if we make sure that host->h_rpcclnt is
initialised at mount time. Here is a v2 patch that should do the right
thing.
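
The failure mode in the trace above comes down to strlen() being handed a NULL pointer. The following user-space sketch illustrates only that point; sketch_caller_name_size() is a hypothetical helper, and the NULL check is an assumption added for illustration. It is not the v2 fix, which instead makes sure the client is initialised before the name is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space stand-in for the size computation in encode_caller_name():
 * a 4-byte length word followed by the name bytes.  Returns 0 when no
 * name is available.  The NULL check is the point being illustrated;
 * the function shown in the gdb listing has no such check.
 */
static size_t sketch_caller_name_size(const char *name)
{
	if (name == NULL)
		return 0; /* strlen(NULL) would fault, as in the oops above */
	return 4 + strlen(name);
}
```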
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 9a1b6bf818e74bb7aabaecb59492b739f2f4d742 Mon Sep 17 00:00:00 2001
From: Trond Myklebust trond.mykleb...@netapp.com
Date: Mon, 5 Aug 2013 12:06:12 -0400
Subject: [PATCH v2] LOCKD: Don't call utsname()->nodename from
 nlmclnt_setlockargs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
which case we're in entirely the wrong namespace.

Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
exit_task_namespaces() outside of exit_notify()) now means that
exit_task_work() is called after

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +0000
> Myklebust, Trond trond.mykleb...@netapp.com wrote:
>
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +0000
> > > Myklebust, Trond trond.mykleb...@netapp.com wrote:
> > >
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust trond.mykleb...@netapp.com
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > >  nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > >
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > >
> > > > Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com
> > > > Cc: Toralf Förster toralf.foers...@gmx.de
> > > > Cc: Oleg Nesterov o...@redhat.com
> > > > Cc: Nix n...@esperi.org.uk
> > > > Cc: Jeff Layton jlay...@redhat.com
> > > > ---
> > > >  fs/lockd/clntproc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > > >  {
> > > >  	struct nlm_args	*argp = &req->a_args;
> > > >  	struct nlm_lock	*lock = &argp->lock;
> > > > +	char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >  
> > > >  	nlmclnt_next_cookie(&argp->cookie);
> > > >  	memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > > -	lock->caller  = utsname()->nodename;
> > > > +	lock->caller  = nodename;
> > > >  	lock->oh.data = req->a_owner;
> > > >  	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > > >  				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > > -				utsname()->nodename);
> > > > +				nodename);
> > > >  	lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > >  	lock->fl.fl_start = fl->fl_start;
> > > >  	lock->fl.fl_end = fl->fl_end;
> > >
> > > Looks good to me...
> > >
> > > Reviewed-by: Jeff Layton jlay...@redhat.com
> > >
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > >
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> >
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> >
> > See attachment.
>
> Yeah, that looks like a reasonable thing to do...
>
> OTOH, Is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind stopped, the socket
> unlinked and recreated and then started again?
>
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to
the original patch...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 00326ed6442c66021cd4b5e19e80f3e2027d5d42 Mon Sep 17 00:00:00 2001
From: Trond Myklebust trond.mykleb...@netapp.com
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH v2 1/2] SUNRPC: Don't auto-disconnect from the local rpcbind
 socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix n...@esperi.org.uk
Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com
Cc: Jeff Layton jlay...@redhat.com
Cc: sta...@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..b0f7232 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,14 @@ static int rpcb_create_local_unix(struct net *net)
 		.program	= rpcb_program,
 		.version	= RPCBVERS_2,
 		.authflavor	= RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags

Re: [ 068/103] SUNRPC: fix races on PipeFS UMOUNT notifications

2013-07-23 Thread Myklebust, Trond

On Tue, 2013-07-23 at 15:26 -0700, Greg Kroah-Hartman wrote:
> 3.10-stable review patch.  If anyone has any objections, please let me know.
> 

Again, please drop this patch and 67/103 for now. We'll get back to
whether or not this should be stable material later.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Linux 3.11-rc2

2013-07-23 Thread Myklebust, Trond

On Tue, 2013-07-23 at 13:42 -0700, Linus Torvalds wrote:
> On Tue, Jul 23, 2013 at 12:08 PM, rydb...@euromail.se wrote:
> > Hi Trond,
> >
> >> OK. With Andre's help, I think we've root caused the problem. Can you
> >> please confirm that the attached patch also solves the issue for you?
> >
> > Seems to work fine,
> >
> >Reported-and-tested-by: Henrik Rydberg rydb...@euromail.se
> 
> Trond, do you have anything else pending and are planning a git pull,
> or should I just take this patch directly?

Nothing else queued for now, so if you could take it directly, then that
would save the trouble of setting up an extra pull.

Thanks!
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Linux 3.11-rc2

2013-07-23 Thread Myklebust, Trond

On Mon, 2013-07-22 at 21:17 -0400, Trond Myklebust wrote:
> On Tue, 2013-07-23 at 03:04 +0200, rydb...@euromail.se wrote:
> > Hi Trond, Linus,
> > 
> > On Sun, Jul 21, 2013 at 12:53:10PM -0700, Linus Torvalds wrote:
> > > So it's been another week, and -rc2 is out there.
> > 
> > This one happens to break nfs in a rather blunt-instrument fashion -
> > creating files on a nfs4 partition [1] no longer works. Bisection
> > yields this commit as the culprit:
> > 
> > commit b4a2cf76ab7c08628c62b2062dacefa496b59dfd
> > Author: Trond Myklebust trond.mykleb...@netapp.com
> > Date:   Wed Jul 17 16:43:16 2013 -0400
> > 
> > NFSv4: Fix a regression against the FreeBSD server
> > 
> > Technically, the Linux client is allowed by the NFSv4 spec to send
> > 3 word bitmaps as part of an OPEN request. However, this causes the
> > current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors.
> > 
> > Fix the regression by making the Linux client use a 2 word bitmap unless
> > doing NFSv4.2 with labeled NFS.
> > 
> > Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com
> > 
> > Reverting the patch returns things to normal.
> 
> - Can you please provide me with a binary tcpdump or wireshark dump that
> demonstrates the problem?
> 
> - What server is this?

OK. With Andre's help, I think we've root caused the problem. Can you
please confirm that the attached patch also solves the issue for you?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 1f84e4f9ef9fc4ff502c112df049dfa6f656f4e0 Mon Sep 17 00:00:00 2001
From: Trond Myklebust trond.mykleb...@netapp.com
Date: Tue, 23 Jul 2013 12:53:39 -0400
Subject: [PATCH] NFSv4: Fix brainfart in attribute length calculation

The calculation of the attribute length was 4 bytes off.

Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com
Tested-by: Andre Heider a.hei...@gmail.com
---
 fs/nfs/nfs4xdr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c74d616..3850b01 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1118,11 +1118,11 @@ static void encode_attrs(struct xdr_stream *xdr, const struct iattr *iap,
 len, ((char *)p - (char *)q) + 4);
 		BUG();
 	}
-	len = (char *)p - (char *)q - (bmval_len << 2);
 	*q++ = htonl(bmval0);
 	*q++ = htonl(bmval1);
 	if (bmval_len == 3)
 		*q++ = htonl(bmval2);
+	len = (char *)p - (char *)(q + 1);
 	*q = htonl(len);
 
 /* out: */
-- 
1.8.3.1
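
The 4-byte discrepancy fixed by the patch above can be checked with a small user-space model. This is a sketch under assumptions: attrlen_buggy() and attrlen_fixed() are hypothetical names, and the buffer layout (bitmap words, then one attribute-length word, then the attribute data) is inferred from the pointer arithmetic in the diff rather than copied from the kernel source. The pre-fix expression subtracts only the bitmap words and so counts the length word itself as attribute data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Modelled layout, starting at q (just after the bitmap count word):
 *   q -> [bitmap word 0] ... [bitmap word n-1] [attrlen] [attr data ...] <- p
 */

/* Pre-fix calculation: forgets the 4-byte attrlen word itself. */
static size_t attrlen_buggy(const uint32_t *q, const uint32_t *p,
			    unsigned int bmval_len)
{
	assert(q != NULL && p != NULL);
	return (size_t)((const char *)p - (const char *)q) -
	       ((size_t)bmval_len << 2);
}

/* Post-fix calculation: advance q past the bitmap words first. */
static size_t attrlen_fixed(const uint32_t *q, const uint32_t *p,
			    unsigned int bmval_len)
{
	assert(q != NULL && p != NULL);
	q += bmval_len; /* q now points at the attrlen word */
	return (size_t)((const char *)p - (const char *)(q + 1));
}
```

With two bitmap words and five 32-bit words of attribute data, the fixed form yields 20 bytes while the pre-fix form yields 24, which matches the "4 bytes off" description in the commit message.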

Re: [Ksummit-2013-discuss] KS Topic request: Handling the Stable kernel, let's dump the cc: stable tag

2013-07-22 Thread Myklebust, Trond

On Mon, 2013-07-22 at 19:47 -0700, James Bottomley wrote:
> On Tue, 2013-07-23 at 02:40 +0000, Myklebust, Trond wrote:
> > On Mon, 2013-07-15 at 23:27 +0400, James Bottomley wrote:
> > > The solution, to me, looks simple:  Let's co-opt a process we already
> > > know how to do: mailing list review and tree handling.  So the proposal
> > > is simple:
> > > 
> > >  1. Drop the cc: stable@ tag: it makes it way too easy to add an ill
> > > reviewed patch to stable
> > >  2. All patches to stable should follow current review rules: They
> > > should go to the mailing list the original patch was sent to
> > > once the original is upstream as a request for stable.
> > >  3. Following debate on the list, the original maintainer would be
> > > responsible for collecting the patches (including the upstream
> > > commit) adjudicating on them and passing them on to stable after
> > > list review (either by git tree pull or email to stable@).
> > > 
> > > I contend this raises the bar for adding patches to stable much higher,
> > > which seems to be needed, and adds a review stage which involves all the
> > > original reviewers.
> > 
> > Could we keep the Cc: stable tag itself, since the dependency
> > information ("Cc: sta...@vger.kernel.org # 3.3.x: a1f84a3: sched:
> > Check for idle") is actually very useful? If we discard that, then we
> > really should revise the whole stable system, since it would mean that
> > we are in effect discarding the 'upstream first' rule.
> 
> The two don't follow.  No-one's proposing to dump the must be upstream
> rule.  The proposal is to modify the automatic behaviour that leads to
> over tagging for stable and consequently too many "stable" patches that
> aren't really.

My point was that the _tag_ is useful as a list of dependencies for
something that we think might need to be backported to older kernels.
I'd like to see us keep that information somehow.

Whether or not we interpret it as being an automatic "for stable"
request is a different matter. I'd be quite happy to see the "propose
for stable" step as reverting to being a manual step that occurs only
after we've upstreamed the fix.
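
To make the "list of dependencies" point concrete, here is a rough sketch of how a tool might pull the prerequisite out of such a tag. It assumes the annotation after '#' has the form "<version>: <commit>: <subject>", as in the example quoted above; struct stable_dep and parse_stable_dep() are hypothetical names, not part of any existing stable tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one prerequisite named in a stable tag. */
struct stable_dep {
	char version[16];  /* e.g. "3.3.x" */
	char commit[41];   /* abbreviated or full commit id */
	char subject[128]; /* one-line summary of the prerequisite */
};

/*
 * Parse the part of a "Cc: stable" line that follows '#', in the form
 *   "<version>: <commit>: <subject>".
 * Returns 1 if all three fields were found, 0 otherwise (for example
 * when the line names only a version, or has no '#' at all).
 */
static int parse_stable_dep(const char *line, struct stable_dep *dep)
{
	const char *hash;

	assert(line != NULL && dep != NULL);
	hash = strchr(line, '#');
	if (hash == NULL)
		return 0;
	return sscanf(hash + 1, " %15[^:]: %40[^:]: %127[^\n]",
		      dep->version, dep->commit, dep->subject) == 3;
}
```

Keeping the tag in this machine-readable shape is what lets the dependency survive independently of whether the tag is treated as an automatic request for stable.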

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [Ksummit-2013-discuss] KS Topic request: Handling the Stable kernel, let's dump the cc: stable tag

2013-07-22 Thread Myklebust, Trond

On Mon, 2013-07-15 at 23:27 +0400, James Bottomley wrote:
> Before the "3.10.1-stable review" thread degenerated into a disagreement
> about habits of politeness, there were some solid points being made
> which, I think, bear consideration and which may now be lost.
> 
> The problem, as Jiří Kosina put it succinctly, is that the distributions
> are finding stable less useful because it contains too much stuff they'd
> classify as not stable material.
> 
> The question that arises from this is who is stable aiming at ...
> because if it's the distributions (and that's what people seem to be
> using it for) then we need to take this feedback seriously.
> 
> The next question is how should we, the maintainers, be policing commits
> to stable.  As I think has been demonstrated in the discussion the
> "stable rules" are more sort of guidelines (apologies for the pirates
> reference).  In many ways, this is as it should be, because people
> should have enough taste to know what constitutes a stable fix.  The
> real root cause of the problem is that the cc: stable tag can't be
> stripped once it's in the tree, so maintainers only get to police things
> they put in the tree.  Stuff they pull from others is already tagged and
> that tag can't be changed.  This effectively pushes the problem out to
> the lowest (and possibly more inexperienced) leaves of the Maintainer
> tree.  In theory we have a review stage for stable, but the review
> patches don't automatically get routed to the right mailing list and the
> first round usually comes out in the merge window when Maintainers'
> attention is elsewhere.
> 
> The solution, to me, looks simple:  Let's co-opt a process we already
> know how to do: mailing list review and tree handling.  So the proposal
> is simple:
> 
>  1. Drop the cc: stable@ tag: it makes it way too easy to add an ill
> reviewed patch to stable
>  2. All patches to stable should follow current review rules: They
> should go to the mailing list the original patch was sent to
> once the original is upstream as a request for stable.
>  3. Following debate on the list, the original maintainer would be
> responsible for collecting the patches (including the upstream
> commit) adjudicating on them and passing them on to stable after
> list review (either by git tree pull or email to stable@).
> 
> I contend this raises the bar for adding patches to stable much higher,
> which seems to be needed, and adds a review stage which involves all the
> original reviewers.

Could we keep the Cc: stable tag itself, since the dependency
information ("Cc: sta...@vger.kernel.org # 3.3.x: a1f84a3: sched:
Check for idle") is actually very useful? If we discard that, then we
really should revise the whole stable system, since it would mean that
we are in effect discarding the 'upstream first' rule.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Linux 3.11-rc2

2013-07-22 Thread Myklebust, Trond

On Tue, 2013-07-23 at 03:04 +0200, rydb...@euromail.se wrote:
> Hi Trond, Linus,
> 
> On Sun, Jul 21, 2013 at 12:53:10PM -0700, Linus Torvalds wrote:
> > So it's been another week, and -rc2 is out there.
> 
> This one happens to break nfs in a rather blunt-instrument fashion -
> creating files on a nfs4 partition [1] no longer works. Bisection
> yields this commit as the culprit:
> 
> commit b4a2cf76ab7c08628c62b2062dacefa496b59dfd
> Author: Trond Myklebust trond.mykleb...@netapp.com
> Date:   Wed Jul 17 16:43:16 2013 -0400
> 
> NFSv4: Fix a regression against the FreeBSD server
> 
> Technically, the Linux client is allowed by the NFSv4 spec to send
> 3 word bitmaps as part of an OPEN request. However, this causes the
> current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors.
> 
> Fix the regression by making the Linux client use a 2 word bitmap unless
> doing NFSv4.2 with labeled NFS.
> 
> Signed-off-by: Trond Myklebust trond.mykleb...@netapp.com
> 
> Reverting the patch returns things to normal.

- Can you please provide me with a binary tcpdump or wireshark dump that
demonstrates the problem?

- What server is this?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client fixes

2013-07-19 Thread Myklebust, Trond

Hi Linus,

The following changes since commit ad81f0545ef01ea651886dddac4bef6cec930092:

  Linux 3.11-rc1 (2013-07-14 15:18:27 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-3

for you to fetch changes up to b4a2cf76ab7c08628c62b2062dacefa496b59dfd:

  NFSv4: Fix a regression against the FreeBSD server (2013-07-17 16:54:46 -0400)


NFS client bugfixes for 3.11

- Fix a regression against NFSv4 FreeBSD servers when creating a new file
- Fix another regression in rpc_client_register()


Trond Myklebust (2):
  SUNRPC: Fix another issue with rpc_client_register()
  NFSv4: Fix a regression against the FreeBSD server

 fs/nfs/nfs4xdr.c  | 21 ++---
 net/sunrpc/clnt.c |  1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [GIT PULL] x86 fixes for 3.11-rc2

2013-07-18 Thread Myklebust, Trond

On Thu, 2013-07-18 at 17:46 -0700, Linus Torvalds wrote:

> Finnish is hard. But good for swearing.

Only because the ratio of vowels to consonants causes an immediate
outbreak of swearing among those who try...

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML

2013-07-16 Thread Myklebust, Trond

On Tue, 2013-07-16 at 19:31 -0400, Ric Wheeler wrote:
> On 07/16/2013 07:12 PM, Sarah Sharp wrote:
> > On Tue, Jul 16, 2013 at 06:54:59PM -0400, Steven Rostedt wrote:
> >> On Tue, 2013-07-16 at 15:43 -0700, Sarah Sharp wrote:
> >>
> >>> Yes, that's true.  Some kernel developers are better at moderating their
> >>> comments and tone towards individuals who are "sensitive".  Others
> >>> simply don't give a shit.  So we need to figure out how to meet
> >>> somewhere in the middle, in order to establish a baseline of civility.
> >> I have to ask this because I'm thick, and don't really understand,
> >> but ...
> >>
> >> What problem exactly are we trying to solve here?
> > Personal attacks are not cool Steve.  Some people simply don't care if a
> > verbal tirade is directed at them.  Others do not want anyone to attack
> > them personally, but they're fine with people attacking their code.
> >
> > Bystanders that don't understand the kernel community structure are
> > discouraged from contributing because they don't want to be verbally
> > abused, and they really don't want to see either personal attacks or
> > intense belittling, demeaning comments about code.
> >
> > In order to make our community better, we need to figure out where the
> > baseline of "good" behavior is.  We need to define what behavior we want
> > from both maintainers and patch submitters.  E.g. "No regressions" and
> > "don't break userspace" and "no personal attacks".  That needs to be
> > written down somewhere, and it isn't.  If it's documented somewhere,
> > point me to the file in Documentation.  Hint: it's not there.
> >
> > That is the problem.
> >
> > Sarah Sharp
> 
> The problem you are pointing out - and it is a problem - makes us less
> effective as a community.

Not really. Most of the people who already work as part of this
community are completely used to it. We've created the environment, and
have no problems with it.

Where it could possibly be a problem is when it comes to recruiting
_new_ members to our community. Particularly so given that some
journalists take a special pleasure in reporting particularly juicy
comments and antics. That would tend to scare off a lot of gun-shy
newbies.

On the other hand, it might tend to bias our recruitment toward people
of a more "special" disposition. Perhaps we finally need the services of
a social scientist to help us find out...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: sunrpc/clnt.c: BUG kmalloc-256 (Not tainted): Poison overwritten

2013-07-14 Thread Myklebust, Trond

On Sun, 2013-07-14 at 10:02 +0200, Toralf Förster wrote:
> This bisected commit produces at a 32 bit user mode linux guest the attached 
> BUG :
> 
> commit 245268c951262b861bc1be4e9dc812352499
> Author: Trond Myklebust <trond.mykleb...@netapp.com>
> Date:   Wed Jul 10 15:33:01 2013 -0400
> 
> SUNRPC: Fix a deadlock in rpc_client_register()
> 
> Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on
> PipeFS MOUNT notifications) introduces a regression when we call
> rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.
> 
> By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
> end up deadlocking in gss_pipes_dentries_create_net().
> Fix is to register the client and release the mutex before calling
> rpcauth_create().
> 
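> Reported-by: Weston Andros Adamson <d...@netapp.com>
> Tested-by: Weston Andros Adamson <d...@netapp.com>
> Cc: Stanislav Kinsbursky <skinsbur...@parallels.com>
> Cc: <sta...@vger.kernel.org> # : 3848160: SUNRPC: fix races on PipeFS MOUNT
> Cc: <sta...@vger.kernel.org> # : e73f4cc: SUNRPC: split client creation
> Signed-off-by: Trond Myklebust <trond.mykleb...@netapp.com>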
> 
> 
> 
> 
> 
> 
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Version 1.2.6 starting
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Backgrounding to 
> notify hosts...
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1043]: Running as root.  
> chown /var/lib/nfs to choose different user
> 2013-07-13T22:09:10.000+02:00 trinity mount[1047]: mount to NFS server 
> 'n22stab4' failed: No route to host, retrying
> 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: sending IPv6 Router 
> Solicitation
> 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: no IPv6 Routers 
> available
> 2013-07-13T22:09:13.000+02:00 trinity mount[1048]: mount to NFS server 
> 'n22stab4' failed: No route to host, retrying
> 2013-07-13T22:09:13.647+02:00 trinity kernel: 
> =
> 2013-07-13T22:09:13.647+02:00 trinity kernel: BUG kmalloc-256 (Not tainted): 
> Poison overwritten
> 2013-07-13T22:09:13.647+02:00 trinity kernel: 
> -
> 2013-07-13T22:09:13.647+02:00 trinity kernel:
> 2013-07-13T22:09:13.647+02:00 trinity kernel: Disabling lock debugging due to 
> kernel taint
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: 0x49b1ed18-0x49b1ed1b. 
> First byte 0x74 instead of 0x6b
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Allocated in 
> rpc_new_client+0x81/0x3a0 age=300 cpu=0 pid=1049
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Freed in 
> rpc_new_client+0x35a/0x3a0 age=300 cpu=0 pid=1049
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Slab 0x0b87bac0 
> objects=13 used=13 fp=0x  (null) flags=0x0080
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Object 0x49b1ed10 
> @offset=3344 fp=0x49b1ee40
> 2013-07-13T22:09:13.653+02:00 trinity kernel:
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Bytes b4 49b1ed00: f6 03 00 00 
> bd 94 ff ff 5a 5a 5a 5a 5a 5a 5a 5a  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed10: 6b 6b 6b 6b 6b 
> 6b 6b 6b 74 3a 85 49 6b 6b 6b 6b  t:.I
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed20: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed30: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed40: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed50: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed60: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed70: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed80: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ed90: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1eda0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edb0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edc0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edd0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ede0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edf0: 6b 6b 6b 6b 6b 
> 6b
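
For readers decoding the report quoted above: 0x6b is the POISON_FREE byte that the slab allocator writes into freed objects, so any other value inside a freed object means something wrote to it after the free. The following is a minimal sketch that scans the first 16-byte row of the dump (the object at 0x49b1ed10) for the overwritten range. The helper names find_overwrite() and le32() are hypothetical, and reading the four bytes as a little-endian word assumes the guest is little-endian.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREE 0x6b /* byte pattern written into freed slab objects */

/* Find the first and last byte in buf[0..len) that differ from POISON_FREE.
 * Returns 1 and fills *first and *last if any byte differs, 0 otherwise. */
static int find_overwrite(const uint8_t *buf, size_t len,
                          size_t *first, size_t *last)
{
    int found = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != POISON_FREE) {
            if (!found)
                *first = i;
            *last = i;
            found = 1;
        }
    }
    return found;
}

/* Assemble four bytes into a 32-bit word, least significant byte first. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First row of the quoted dump: "Object 49b1ed10: 6b ... 74 3a 85 49 ...". */
static const uint8_t row[16] = {
    0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b,
    0x74, 0x3a, 0x85, 0x49, 0x6b, 0x6b, 0x6b, 0x6b,
};
```

Applied to `row`, the scan yields offsets 8 through 11, which matches the range 0x49b1ed18-0x49b1ed1b in the INFO line. Read as a little-endian word the four bytes are 0x49853a74, which resembles a pointer-sized store into the freed object, though the log alone does not establish what was stored.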

[GIT PULL] Please pull NFS client updates

2013-07-11 Thread Myklebust, Trond

Hi Linus,

The following pull request mainly contains some small readdir
optimisations that had dependencies on Al Viro's readdir rewrite. There
is also a fix for a nasty deadlock which surfaced earlier in this merge
window.

The following changes since commit a82a729f04232ccd0b59406574ba4cf20027a49d:

  Merge branch 'akpm' (updates from Andrew Morton) (2013-07-09 13:33:36 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-2

for you to fetch changes up to 245268c951262b861bc1be4e9dc812352499:

  SUNRPC: Fix a deadlock in rpc_client_register() (2013-07-10 15:58:55 -0400)


NFS client updates for Linux 3.11 (part 2)

Highlights include:
- Fix an rpc_pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup


Fengguang Wu (1):
  rpc_pipe: rpc_dir_inode_operations can be static

Jeff Layton (2):
  nfs: set verifier on existing dentries in nfs_prime_dcache
  rpc_pipe: set dentry operations at d_alloc time

Scott Mayhew (3):
  NFS: Make nfs_attribute_cache_expired() non-static
  NFS: Make nfs_readdir revalidate less often
  NFS: Allow nfs_updatepage to extend a write under additional circumstances

Trond Myklebust (1):
  SUNRPC: Fix a deadlock in rpc_client_register()

 fs/nfs/dir.c   |  6 --
 fs/nfs/inode.c |  2 +-
 fs/nfs/write.c | 31 +++
 include/linux/nfs_fs.h |  1 +
 net/sunrpc/clnt.c  | 16 +---
 net/sunrpc/rpc_pipe.c  | 25 -
 6 files changed, 58 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client updates

2013-07-08 Thread Myklebust, Trond

Hi Linus,

The following changes since commit f722406faae2d073cc1d01063d1123c35425939e:

  Linux 3.10-rc1 (2013-05-11 17:14:08 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-1

for you to fetch changes up to 959d921f5eb8878ea16049a7f6e9bcbb6dfbcb88:

  Merge branch 'labeled-nfs' into linux-next (2013-06-28 16:29:51 -0400)



NFS client updates for Linux 3.11

Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
  add support for NFSv4.1 state protection.

Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation


Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been reviewed
and acked by James Morris.


Andy Adamson (6):
  NFSv4.1 Fix a pNFS session draining deadlock
  NFSv4.1 end back channel session draining
  NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
  NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
  NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
  NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs

Bryan Schumaker (4):
  NFS: Make callbacks minor version generic
  NFS: Add in v4.2 callback operation
  NFS: Apply v4.1 capabilities to v4.2
  NFS: Improve legacy idmapping fallback

Chuck Lever (3):
  NFS: Fix SETCLIENTID fallback if GSS is not available
  NFS: Fix security flavor negotiation with legacy binary mounts
  NFS: Set NFS_CS_MIGRATION for NFSv4 mounts

David Quigley (10):
  Security: Add hook to calculate context based on a negative dentry.
  Security: Add Hook to test if the particular xattr is part of a MAC model.
  LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
  SELinux: Add new labeling type native labels
  NFSv4: Add label recommended attribute and NFSv4 flags
  NFSv4: Extend fattr bitmaps to support all 3 words
  NFS:Add labels to client function prototypes
  NFS: Add label lifecycle management
  NFS: Client implementation of Labeled-NFS
  NFS: Extend NFS xattr handlers to accept the security namespace

Djalal Harouni (1):
  NFSv4: SETCLIENTID add the format string for the NETID

Jeff Layton (5):
  rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
  nfs: refactor "need_mount" code out of nfs_try_mount
  nfs: move server_authlist into nfs_try_mount_request
  nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
  nfs: have NFSv3 try server-specified auth flavors in turn

Stanislav Kinsbursky (4):
  SUNRPC: fix races on PipeFS MOUNT notifications
  SUNRPC: fix races on PipeFS UMOUNT notifications
  SUNRPC: split client creation routine into setup and registration
  SUNRPC: PipeFS MOUNT notification optimization for dying clients

Steve Dickson (4):
  NFS: Add NFSv4.2 protocol constants
  NFSv4.2: Added NFS v4.2 support to the NFS client
  NFSv4: Introduce new label structure
  Kconfig: Add Kconfig entry for Labeled NFS V4 client

Trond Myklebust (26):
  SUNRPC: Fix a bug in gss_create_upcall
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Prevent an rpc_task wakeup race
  NFSv4: Fix a thinko in nfs4_try_open_cached
  NFSv4.1: Ensure that layoutget is called using the layout credential
  NFSv4.1: Ensure that layoutreturn uses the correct credential
  NFSv4.1: Ensure that reclaim_complete uses the right credential
  NFSv4.1: Ensure that test_stateid and free_stateid use correct credentials
  NFSv4.1: Use layout credentials for get_deviceinfo calls
  NFSv4.1: Enable state protection
  NFSv4.1: Simplify setting the layout header credential
  SUNRPC: Fix a potential race in rpc_execute
  SUNRPC: Remove unused function rpc_queue_empty
  SUNRPC: Remove the unused helpers task_for_each() and task_for_first()
  SUNRPC: Remove unused functions rpc_task_set/has_priority
  SUNRPC: Remove redundant call to rpc_set_running() in __rpc_execute()
  NFSv4: Remove redundant check for FMODE_EXEC in nfs_finish_open
  NFSv4: Cleanup: pass the nfs_open_context to nfs4_do_open
  NFSv4: Refactor _nfs4_open_and_get_state to set ctx->state
  NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code
  NFSv4: Close another NFSv4 recovery race
  NFSv4: Move the DNS resolver into the NFSv4 module

Re: [PATCH v3 24/25] sunrpc: Change how dentry's d_lock field is accessed

2013-07-08 Thread Myklebust, Trond

On Thu, 2013-07-04 at 05:20 +0100, Al Viro wrote:
> On Wed, Jul 03, 2013 at 04:25:32PM -0400, Waiman Long wrote:
> > There is no change in logic and everything should just work.
> 
> > -   spin_lock(&file->f_path.dentry->d_lock);
> > +   d_lock(file->f_path.dentry);
> > if (!d_unhashed(file->f_path.dentry))
> > clnt = RPC_I(inode)->private;
> > if (clnt != NULL && atomic_inc_not_zero(&clnt->cl_count)) {
> > -   spin_unlock(&file->f_path.dentry->d_lock);
> > +   d_unlock(file->f_path.dentry);
> 
> Could somebody explain WTF is being protected here?  It's not ->private -
> that gets set (and, more importantly, cleared) without ->d_lock in sight.
> Trond, that seems to be your code from about three years ago (introduced
> in "SUNRPC: Fix a race in rpc_info_open").  What's going on there?

AFAICR we're using the fact that the dentry will remain hashed until
we're in the process of freeing up the rpc_client. By testing that the
dentry is hashed under the dentry->d_lock, we are assured that the
non-NULL 'clnt' is still pointing to a valid rpc_client, and that it is
safe to access clnt->cl_count.
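
The pattern described above can be sketched in userspace C roughly as follows. This is only an analogy under stated assumptions, not the SUNRPC code: struct client, struct entry, the hashed flag, entry_get_client(), entry_unhash() and inc_not_zero() are all hypothetical names, a pthread mutex stands in for d_lock, and a C11 atomic stands in for cl_count.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client: only the refcount matters here. */
struct client {
    atomic_int count;
};

/* Hypothetical stand-in for the dentry: a lock, a "hashed" flag, a pointer. */
struct entry {
    pthread_mutex_t lock;
    bool hashed;
    struct client *private;
};

/* Take a reference only if the count is still non-zero
 * (a userspace analogue of atomic_inc_not_zero()). */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* The pointer is only dereferenced while the entry is still hashed and the
 * lock is held. The assumption is that teardown unhashes the entry under
 * the same lock before the client is freed. */
static struct client *entry_get_client(struct entry *e)
{
    struct client *clnt = NULL;

    pthread_mutex_lock(&e->lock);
    if (e->hashed)
        clnt = e->private;
    if (clnt != NULL && !inc_not_zero(&clnt->count))
        clnt = NULL;
    pthread_mutex_unlock(&e->lock);
    return clnt;
}

/* Teardown side of the assumption: unhash under the lock first. */
static void entry_unhash(struct entry *e)
{
    pthread_mutex_lock(&e->lock);
    e->hashed = false;
    pthread_mutex_unlock(&e->lock);
}
```

The lock does not protect the pointer itself. It only orders the hashed check against the unhash, which is what makes the later refcount access safe in this sketch.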

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v3 2/4] SUNRPC: fix races on PipeFS UMOUNT notifications

2013-06-25 Thread Myklebust, Trond

On Mon, 2013-06-24 at 11:52 +0400, Stanislav Kinsbursky wrote:
> CPU#0                                   CPU#1
> -----------------------------           -----------------------------
> rpc_kill_sb
> sn->pipefs_sb = NULL                    rpc_release_client
> (UMOUNT_EVENT)                          rpc_free_auth
> rpc_pipefs_event
> rpc_get_client_for_event
> !atomic_inc_not_zero(cl_count)
> <skip the client>
>                                         atomic_inc(cl_count)
>                                         rpc_free_client
>                                         rpc_clnt_remove_pipedir
>                                         <skip client dir removing>
> 
> To fix this, this patch does the following:
> 
> 1) Calls RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock being held.
> 2) Removes SUNRPC client from the list AFTER pipes destroying.
> 3) Doesn't hold RPC client on notification: if client in the list, then it
> can't be destroyed while sn->pipefs_sb_lock in hold by notification caller.
> 
> Signed-off-by: Stanislav Kinsbursky <skinsbur...@parallels.com>
> Cc: sta...@vger.kernel.org
> ---
>  net/sunrpc/clnt.c |5 +
>  net/sunrpc/rpc_pipe.c |2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)

<snip>

> diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
> index c512448..efca2f7 100644
> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -1165,7 +1165,6 @@ static void rpc_kill_sb(struct super_block *sb)
>   goto out;
>   }
>   sn->pipefs_sb = NULL;
> - mutex_unlock(&sn->pipefs_sb_lock);
>   dprintk("RPC:   sending pipefs UMOUNT notification for net %p%s\n",
>   net, NET_NAME(net));
>   blocking_notifier_call_chain(&rpc_pipefs_notifier_list,
> @@ -1173,6 +1172,7 @@ static void rpc_kill_sb(struct super_block *sb)
>  sb);
>   put_net(net);
>  out:
> + mutex_unlock(&sn->pipefs_sb_lock);

Is this safe to do after the put_net()?

>   kill_litter_super(sb);
>  }
>  
> 
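
To illustrate why the ordering is worth asking about, here is a minimal userspace sketch of a lock embedded in a refcounted object. It is an analogy only: struct ns, ns_new(), ns_put() and ns_teardown() are hypothetical names, and whether the kernel path is actually unsafe depends on whether something else still holds a reference to the namespace at that point, which this sketch does not model.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object with an embedded lock, standing in for
 * per-namespace data whose lifetime is tied to a reference count. */
struct ns {
    atomic_int refs;
    pthread_mutex_t lock;
    void *sb;
};

static struct ns *ns_new(void)
{
    struct ns *n = malloc(sizeof(*n));

    if (n == NULL)
        return NULL;
    atomic_init(&n->refs, 1);
    pthread_mutex_init(&n->lock, NULL);
    n->sb = n; /* any non-NULL placeholder */
    return n;
}

/* Drop one reference and free the object on the last one.
 * Returns true if the object was freed. */
static bool ns_put(struct ns *n)
{
    if (atomic_fetch_sub(&n->refs, 1) == 1) {
        pthread_mutex_destroy(&n->lock);
        free(n);
        return true;
    }
    return false;
}

/* Safe ordering: finish every access to n, including the unlock, before
 * dropping the reference. Unlocking after ns_put() would touch freed
 * memory whenever that put happened to be the last one. */
static bool ns_teardown(struct ns *n)
{
    pthread_mutex_lock(&n->lock);
    n->sb = NULL;
    pthread_mutex_unlock(&n->lock); /* last touch of n->lock */
    return ns_put(n);               /* n may be gone after this */
}
```

If the caller is guaranteed to hold a second reference across the whole function, unlocking after the put is harmless. The sketch only shows the case where it is not.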


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
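
Editorial note: the "<skip the client>" step in the race quoted above depends on the semantics of atomic_inc_not_zero(), which takes a reference only while the count is still nonzero. The following is a minimal user-space sketch of that pattern, assuming C11 atomics in place of the kernel's atomic_t. The struct and function names (client, client_get_unless_zero, client_put) are hypothetical and are not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RPC client; only the refcount matters here. */
struct client {
    atomic_int count;
};

/*
 * Take a reference only if the object is still live (count != 0).
 * This models the behaviour of the kernel's atomic_inc_not_zero().
 */
static bool client_get_unless_zero(struct client *c)
{
    int old = atomic_load(&c->count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&c->count, &old, old + 1))
            return true;
    }
    return false; /* count already hit zero: the caller must skip the client */
}

/* Drop a reference; returns true if the caller released the last one. */
static bool client_put(struct client *c)
{
    return atomic_fetch_sub(&c->count, 1) == 1;
}
```

In this model, a notifier that calls client_get_unless_zero() on a client whose last reference has just been dropped gets false and skips it, which is the window the quoted diagram describes: the client is skipped by the notifier while its own teardown path no longer finds a superblock to clean up against.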

Re: [PATCH v2 2/4] SUNRPC: fix races on PipeFS MOUNT notifications

2013-06-17 Thread Myklebust, Trond

On Tue, 2013-06-11 at 18:39 +0400, Stanislav Kinsbursky wrote:
> Below is a race in which an RPC client can be created without PipeFS dentries:
> 
> CPU#0                                   CPU#1
> -----------------------------           -----------------------------
> rpc_new_client                          rpc_fill_super
> rpc_setup_pipedir
> mutex_lock(&sn->pipefs_sb_lock)
> rpc_get_sb_net == NULL
> (no per-net PipeFS superblock)
>                                         sn->pipefs_sb = sb;
>                                         notifier_call_chain(MOUNT)
>                                         (client is not in the list)
> rpc_register_client
> (client without pipes dentries)
> 
> To fix this, the patch:
> 1) makes the PipeFS mount notification call with pipefs_sb_lock held.
> 2) releases pipefs_sb_lock on new SUNRPC client creation only after
> registration.
> 
> Signed-off-by: Stanislav Kinsbursky <skinsbur...@parallels.com>
> Cc: sta...@vger.kernel.org

Hi Stanislav,

This isn't going to apply to the stable kernels without the cleanup
patch. Could you please reorganise this patch series so that the cleanup
comes last.

Thanks,
  Trond


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
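
Editorial note: the two-part fix described in the quoted patch (send the mount notification with the lock held, and release the lock on client creation only after registration) can be modelled in user space. This is a rough sketch under stated assumptions, not the kernel code: it uses a pthread mutex in place of pipefs_sb_lock, and every name in it (sb_lock, sb_mounted, client_create, pipefs_mount) is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical client: has_pipedir stands in for "pipe dentries exist". */
struct client {
    bool has_pipedir;
};

static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sb_mounted;                      /* models sn->pipefs_sb != NULL */
static struct client *clients[MAX_CLIENTS];  /* models the per-net client list */
static size_t nclients;

/*
 * Client creation: the "is the superblock there?" check and the list
 * registration happen under one lock hold, so a mount cannot slip in
 * between them. Returns 0 on success, -1 if the list is full.
 */
static int client_create(struct client *c)
{
    int ret = -1;

    pthread_mutex_lock(&sb_lock);
    if (nclients < MAX_CLIENTS) {
        c->has_pipedir = sb_mounted;
        clients[nclients++] = c;
        ret = 0;
    }
    pthread_mutex_unlock(&sb_lock);
    return ret;
}

/*
 * Mount: publish the superblock and notify every registered client while
 * still holding the same lock.
 */
static void pipefs_mount(void)
{
    pthread_mutex_lock(&sb_lock);
    sb_mounted = true;
    for (size_t i = 0; i < nclients; i++)
        clients[i]->has_pipedir = true;
    pthread_mutex_unlock(&sb_lock);
}
```

With both paths serialized on one lock, a client is either registered before the mount (and picked up by the notification loop) or created after it (and sees sb_mounted set); the intermediate state from the diagram, a registered client without pipe dentries, cannot occur in this model.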

[GIT PULL] Please pull 2 NFS client bugfixes

2013-05-31 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 83c168bf8017212a9d502536f9dcd0b54d24e330:

  NFS: Fix SETCLIENTID fallback if GSS is not available (2013-05-23 18:50:40 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.10-4

for you to fetch changes up to eb54d43707c69340581940e1fcaecb4d7d17b814:

  NFS: Fix security flavor negotiation with legacy binary mounts (2013-05-30 16:31:34 -0400)


NFS client fixes:

- Fix a regression that broke NFS mounting using klibc and busybox
- Stable fix to check access modes correctly on NFSv4 delegated open()


Chuck Lever (1):
  NFS: Fix security flavor negotiation with legacy binary mounts

Trond Myklebust (1):
  NFSv4: Fix a thinko in nfs4_try_open_cached

 fs/nfs/nfs4proc.c | 2 +-
 fs/nfs/super.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com



Re: 3.10-rc3 NFSv3 mount issues

2013-05-30 Thread Myklebust, Trond

On Thu, 2013-05-30 at 16:26 -0400, Chuck Lever wrote:
> On May 30, 2013, at 4:19 PM, Jim Schutt <jasc...@sandia.gov> wrote:
> 
> > Hi,
> > 
> > I've been trying to test 3.10-rc3 on some diskless clients, and found
> > that I can no longer mount my root file system via NFSv3.
> > 
> > I poked around looking at NFS changes for 3.10, and found these two
> > commits:
> > 
> > d497ab9751 "NFSv3: match sec= flavor against server list"
> > 4580a92d44 "NFS: Use server-recommended security flavor by default (NFSv3)"
> > 
> > If I revert both of these commits from 3.10-rc3, then my diskless
> > client can mount its root file system.
> > 
> > The busybox mount command fails like this, when using 3.10-rc3:
> > 
> > / # mount  -t nfs -o ro,nolock,vers=3,proto=tcp 
> > 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: mounting 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x on /mnt 
> > failed: Invalid argument
> > 
> > The commit messages for both these commits seem to say that mounting
> > with the "sec=sys" option should work, but unfortunately, my busybox doesn't
> > seem to understand the "sec=" mount option:
> > 
> > / # mount  -t nfs -o ro,nolock,vers=3,proto=tcp,sec=sys 
> > 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: invalid number 'sys'
> > 
> > My NFS server is based on RHEL6, and is not using any "sec=" option
> > in its export for this file system.  I did try exporting with "sec=sys",
> > but it didn't seem to make any difference either.
> > 
> > So far, this seems like a regression to me 
> > Any ideas what I might be doing wrong?  How can I
> > help make this work again?
> 
> 3.10-rc3 appears to be missing the fix for this.  See:
> 
>   http://marc.info/?l=linux-nfs&m=136855668104598&w=2
> 
> Trond, can we get this applied?
> 

For some reason it got lost in the mail heap. I've applied it now to the
'bugfixes' branch. Will push upstream in the next few days...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH 3.9-stable] NFSv4.1 Fix a pNFS session draining deadlock

2013-05-26 Thread Myklebust, Trond

On Mon, 2013-05-27 at 09:23 +0900, Jonghwan Choi wrote:
> This patch looks like it should be in the 3.9-stable tree, should we apply
> it?

It's a condition which appears to be extremely rare: so far, we've only
seen it during extreme stress testing at NetApp. For that reason, and
because it is NFSv4.1 only, I'm inclined to wait until we see real-world
cases before making it a stable patch.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes

2013-05-26 Thread Myklebust, Trond

Hi Linus,

The following changes since commit f722406faae2d073cc1d01063d1123c35425939e:

  Linux 3.10-rc1 (2013-05-11 17:14:08 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.10-3

for you to fetch changes up to 83c168bf8017212a9d502536f9dcd0b54d24e330:

  NFS: Fix SETCLIENTID fallback if GSS is not available (2013-05-23 18:50:40 -0400)


NFS client bugfixes for 3.10

- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available


Andy Adamson (1):
  NFSv4.1 Fix a pNFS session draining deadlock

Chuck Lever (1):
  NFS: Fix SETCLIENTID fallback if GSS is not available

Trond Myklebust (4):
  SUNRPC: Fix a bug in gss_create_upcall
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Prevent an rpc_task wakeup race

 fs/nfs/callback_proc.c |  2 +-
 fs/nfs/callback_xdr.c  |  2 +-
 fs/nfs/nfs4client.c|  2 +-
 fs/nfs/nfs4proc.c  |  2 +-
 fs/nfs/nfs4session.c   |  4 +--
 fs/nfs/nfs4session.h   | 13 +
 fs/nfs/nfs4state.c | 15 +-
 net/sunrpc/auth_gss/auth_gss.c | 62 --
 net/sunrpc/netns.h |  4 +++
 net/sunrpc/rpc_pipe.c  |  5 
 net/sunrpc/sched.c |  8 +-
 11 files changed, 78 insertions(+), 41 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-15 Thread Myklebust, Trond

On Wed, 2013-05-15 at 16:19 -0400, J. Bruce Fields wrote:
> On Tue, May 14, 2013 at 02:15:26PM -0700, Zach Brown wrote:
> > This crude patch illustrates the simplest plumbing involved in
> > supporting sys_copy_range with the NFS COPY operation that's pending in
> > the 4.2 draft spec.
> > 
> > The patch is based on a previous prototype that used the COPY op to
> > implement sys_copyfileat which created a new file (based on the ocfs2
> > reflink ioctl).  By contrast, this copies file contents between existing
> > files.
> > 
> > There's still a lot of implementation and testing to do, but this can
> > get discussion going.
> 
> I'm using:
> 
>   git://github.com/loghyr/NFSv4.2
> 
> as my reference for the draft protocol.
> 
> On a quick skim, one thing this is missing before it complies is a
> client implementation of CB_OFFLOAD: "If a client desires an
> intra-server file copy, then it MUST support the COPY and CB_OFFLOAD
> operations."

Note that Bryan is currently working on updating the NFS implementation
to match the draft protocol.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
