Re: [PATCH] Add support for getting and setting SACLs

2020-12-07 Thread Pavel Shilovsky
Hi Boris,

Are you talking about this patch "[PATCH] Extend cifs acl utilities to
handle SACLs"?

Just for the future, I am trying to monitor the samba-dev mailing list
but if you would like to get the fastest response to your patches then
please include me directly or at least the linux-cifs mailing list.

I stage pending patches in the "next" branch on my github tree, so,
will include the one above.

https://github.com/piastry/cifs-utils/commits/next

--
Best regards,
Pavel Shilovsky

пн, 7 дек. 2020 г. в 07:28, Boris Protopopov :
>
> Hello, Shyam,
>
> sorry for the delayed reply and thanks for looking at this patch. Yes,
> the testing was done using the extended versions of
> getcifsacl/setcifsacl (added setting owner and SACL support), the
> patch for that posted recently via samba-technical (message ID
> <20201120214918.12517-1-pbo...@amazon.com>). I have tested
> setting/getting the owner, DACL, and SACL, for all the DACL/SACL flags
> (-a, -D, -M, -S), SACL type SYSTEM_AUDIT. This testing was done
> against 5.10.0-rc1 and 4.14.203 (the latter required porting the
> user-space patch). I believe this testing has fully exercised the code
> changes in question.
>
> I will look at contributing to the fsxtesting-cifs code, but I think
> the setcifsacl/getcifsact patch that enables easy access to the
> descriptor components is a pre-requisite for such contributions.
>
> Thanks!
>
>
> On Wed, Dec 2, 2020 at 5:43 AM Shyam Prasad N  wrote:
> >
> > Hi Boris,
> >
> > At a high level, the changes look good to me. Will go through the
> > changes in more detail tomorrow morning.
> >
> > On a related note, we may need more test coverage in this area.
> > Can you please share your testing output against your changes?
> > Or even better, if you can contribute some xfstests for this use case:
> > https://wiki.samba.org/index.php/Xfstesting-cifs
> >
> > Regards,
> > Shyam
> >
> > On Wed, Dec 2, 2020 at 12:24 AM Boris Protopopov
> >  wrote:
> > >
> > > Hello,
> > > I am checking in to see if anyone had a chance to take a look at this
> > > patch. I would appreciate any feedback.
> > > Thanks!
> > >
> > > On Tue, Oct 27, 2020 at 5:01 PM Boris Protopopov via samba-technical
> > >  wrote:
> > > >
> > > > Add SYSTEM_SECURITY access flag and use with smb2 when opening
> > > > files for getting/setting SACLs. Add "system.cifs_ntsd_full"
> > > > extended attribute to allow user-space access to the functionality.
> > > > Avoid multiple server calls when setting owner, DACL, and SACL.
> > > >
> > > > Signed-off-by: Boris Protopopov 
> > > > ---
> > > ...
> >
> >
> >
> > --
> > -Shyam


Re: [PATCH AUTOSEL 5.7 37/53] cifs: Fix double add page to memcg when cifs_readpages

2020-07-02 Thread Pavel Shilovsky
ср, 1 июл. 2020 г. в 18:35, Sasha Levin :
>
> From: Zhang Xiaoxu 
>
> [ Upstream commit 95a3d8f3af9b0d63b43f221b630beaab9739d13a ]
>
> When xfstests generic/451, there is an BUG at mm/memcontrol.c:
>   page:ea000560f2c0 refcount:2 mapcount:0 mapping:8544e0ea
>index:0xf
>   mapping->aops:cifs_addr_ops dentry name:"tst-aio-dio-cycle-write.451"
>   flags: 0x2f8001(locked)
>   raw: 002f8001 c90002023c50 ea0005280088 88815cda0210
>   raw: 000f  0002 88817287d000
>   page dumped because: VM_BUG_ON_PAGE(page->mem_cgroup)
>   page->mem_cgroup:88817287d000
>   [ cut here ]
>   kernel BUG at mm/memcontrol.c:2659!
>   invalid opcode:  [#1] SMP
>   CPU: 2 PID: 2038 Comm: xfs_io Not tainted 5.8.0-rc1 #44
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_
> 073836-buildvm-ppc64le-16.ppc.4
>   RIP: 0010:commit_charge+0x35/0x50
>   Code: 0d 48 83 05 54 b2 02 05 01 48 89 77 38 c3 48 c7
> c6 78 4a ea ba 48 83 05 38 b2 02 05 01 e8 63 0d9
>   RSP: 0018:c90002023a50 EFLAGS: 00010202
>   RAX:  RBX: 88817287d000 RCX: 
>   RDX:  RSI: 88817ac97ea0 RDI: 88817ac97ea0
>   RBP: ea000560f2c0 R08: 0203 R09: 0005
>   R10: 0030 R11: c900020237a8 R12: 
>   R13: 0001 R14: 0001 R15: 88815a1272c0
>   FS:  7f5071ab0800() GS:88817ac8() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: 55efcd5ca000 CR3: 00015d312000 CR4: 06e0
>   DR0:  DR1:  DR2: 
>   DR3:  DR6: fffe0ff0 DR7: 0400
>   Call Trace:
>mem_cgroup_charge+0x166/0x4f0
>__add_to_page_cache_locked+0x4a9/0x710
>add_to_page_cache_locked+0x15/0x20
>cifs_readpages+0x217/0x1270
>read_pages+0x29a/0x670
>page_cache_readahead_unbounded+0x24f/0x390
>__do_page_cache_readahead+0x3f/0x60
>ondemand_readahead+0x1f1/0x470
>page_cache_async_readahead+0x14c/0x170
>generic_file_buffered_read+0x5df/0x1100
>generic_file_read_iter+0x10c/0x1d0
>cifs_strict_readv+0x139/0x170
>new_sync_read+0x164/0x250
>__vfs_read+0x39/0x60
>vfs_read+0xb5/0x1e0
>ksys_pread64+0x85/0xf0
>__x64_sys_pread64+0x22/0x30
>do_syscall_64+0x69/0x150
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   RIP: 0033:0x7f5071fcb1af
>   Code: Bad RIP value.
>   RSP: 002b:7ffde2cdb8e0 EFLAGS: 0293 ORIG_RAX: 0011
>   RAX: ffda RBX: 7ffde2cdb990 RCX: 7f5071fcb1af
>   RDX: 1000 RSI: 55efcd5ca000 RDI: 0003
>   RBP: 0003 R08:  R09: 
>   R10: 1000 R11: 0293 R12: 0001
>   R13: 0009f000 R14:  R15: 1000
>   Modules linked in:
>   ---[ end trace 725fa14a3e1af65c ]---
>
> Since commit 3fea5a499d57 ("mm: memcontrol: convert page cache to a new
> mem_cgroup_charge() API") not cancel the page charge, the pages maybe
> double add to pagecache:
> thread1   | thread2
> cifs_readpages
> readpages_get_pages
>  add_to_page_cache_locked(head,index=n)=0
>   | readpages_get_pages
>   | add_to_page_cache_locked(head,index=n+1)=0
>  add_to_page_cache_locked(head, index=n+1)=-EEXIST
>  then, will next loop with list head page's
>  index=n+1 and the page->mapping not NULL
> readpages_get_pages
> add_to_page_cache_locked(head, index=n+1)
>  commit_charge
>   VM_BUG_ON_PAGE
>
> So, we should not do the next loop when any page add to page cache
> failed.
>
> Reported-by: Hulk Robot 
> Signed-off-by: Zhang Xiaoxu 
> Signed-off-by: Steve French 
> Acked-by: Ronnie Sahlberg 
> Signed-off-by: Sasha Levin 

Hi Sasha,

The patch description mentions the commit 3fea5a499d57 that changed
the behavior and was introduced in v5.8-rc1. I noticed that you are
targeting this patch for 4.9, 4.14, 4.19, 5.4 and 5.7 stable branches.
Are you going to backport the commit 3fea5a499d57 as well?

--
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: Fix missed free operations

2019-10-14 Thread Pavel Shilovsky
пн, 14 окт. 2019 г. в 00:18, Chuhong Yuan :
>
> cifs_setattr_nounix has two paths which miss free operations
> for xid and fullpath.
> Use goto cifs_setattr_exit like other paths to fix them.
>
> Fixes: aa081859b10c ("cifs: flush before set-info if we have writeable 
> handles")
> Signed-off-by: Chuhong Yuan 
> ---
>  fs/cifs/inode.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 5dcc95b38310..df9377828e2f 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -2475,9 +2475,9 @@ cifs_setattr_nounix(struct dentry *direntry, struct 
> iattr *attrs)
> rc = tcon->ses->server->ops->flush(xid, tcon, 
> >fid);
> cifsFileInfo_put(wfile);
> if (rc)
> -   return rc;
> +   goto cifs_setattr_exit;
> } else if (rc != -EBADF)
> -   return rc;
> +   goto cifs_setattr_exit;
>     else
> rc = 0;
> }
> --
> 2.20.1
>

Looks good, thanks.

Reviewed-by: Pavel Shilovsky 

The original patch was tagged for stable, so, it seems that this one
should be tagged too.

--
Best regards,
Pavel Shilovsky


Re: build_path_from_dentry_optional_prefix() may schedule from invalid context

2019-09-30 Thread Pavel Shilovsky
сб, 21 сент. 2019 г. в 15:38, Al Viro :
>
> On Thu, Sep 19, 2019 at 05:11:54PM -0700, Pavel Shilovsky wrote:
>
> > Good catch. I think we should have another version of
> > build_path_from_dentry() which takes pre-allocated (probably on stack)
> > full_path as an argument. This would allow us to avoid allocations
> > under the spin lock.
>
> On _stack_?  For relative pathname?  Er...  You do realize that
> kernel stack is small, right?  And said relative pathname can
> bloody well be up to 4Kb (i.e. the half of said stack already,
> on top of whatever the call chain has already eaten up)...

My idea was to use a small stack-allocated array which satisfies most
cases (say 100-200 bytes) and fallback to dynamic a heap allocation
for longer path names.

>
> BTW, looking at build_path_from_dentry()...  WTF is this?
> temp = temp->d_parent;
> if (temp == NULL) {
> cifs_dbg(VFS, "corrupt dentry\n");
> rcu_read_unlock();
> return NULL;
> }
> Why not check for any number of other forms of memory corruption?
> Like, say it, if (temp == (void *)0xf0adf0adf0adf0ad)?
>
> IOW, kindly lose that nonsense.  More importantly, why bother
> with that kmalloc()?  Just __getname() in the very beginning
> and __putname() on failure (and for freeing the result afterwards).
>
> What's more, you are open-coding dentry_path_raw(), badly.
> The only differences are
> * use of dirsep instead of '/' and
> * a prefix slapped in the beginning.
>
> I'm fairly sure that
> char *buf = __getname();
> char *s;
>
> *to_free = NULL;
> if (unlikely(!buf))
> return NULL;
>
> s = dentry_path_raw(dentry, buf, PATH_MAX);
> if (IS_ERR(s) || s < buf + prefix_len)
> __putname(buf);
> return NULL; // assuming that you don't care about details
> }
>
> if (dirsep != '/') {
> char *p = s;
> while ((p = strchr(p, '/')) != NULL)
> *p++ = dirsep;
> }
>
> s -= prefix_len;
> memcpy(s, prefix, prefix_len);
>
> *to_free = buf;
> return s;
>
> would end up being faster, not to mention much easier to understand.
> With the caller expected to pass _free among the arguments and
> __putname() it once it's done.
>
> Or just do __getname() in the caller and pass it to the function -
> in that case freeing (in all cases) would be up to the caller.

Thanks for pointing this out. Someone should look at this closely and
clean it up.

--
Best regards,
Pavel Shilovsky


Re: build_path_from_dentry_optional_prefix() may schedule from invalid context

2019-09-19 Thread Pavel Shilovsky
ср, 28 авг. 2019 г. в 22:02, Sergey Senozhatsky
:
>
> Hello,
>
> Looking at commit "cifs: create a helper to find a writeable handle
> by path name":
>
> ->open_file_lock scope is atomic context, while build_path_from_dentry()
> can schedule - kmalloc(GFP_KERNEL)
>
>spin_lock(>open_file_lock);
>list_for_each(tmp, >openFileList) {
>cfile = list_entry(tmp, struct cifsFileInfo,
> tlist);
>full_path = build_path_from_dentry(cfile->dentry);
>if (full_path == NULL) {
>spin_unlock(>open_file_lock);
>return -ENOMEM;
>}
>if (strcmp(full_path, name)) {
>kfree(full_path);
>continue;
>}
>kfree(full_path);
>
>cinode = CIFS_I(d_inode(cfile->dentry));
>spin_unlock(>open_file_lock);
>return cifs_get_writable_file(cinode, 0, ret_file);
>}
>
>spin_unlock(>open_file_lock);
>
> Additionally, kfree() can (and should) be done outside of
> ->open_file_lock scope.
>
> -ss

Good catch. I think we should have another version of
build_path_from_dentry() which takes pre-allocated (probably on stack)
full_path as an argument. This would allow us to avoid allocations
under the spin lock.
--
Best regards,
Pavel Shilovsky


Re: [PATCH 2/2] cifs: Allocate memory for all iovs in smb2_ioctl

2019-05-15 Thread Pavel Shilovsky
ср, 15 мая 2019 г. в 14:10, :
>
> From: Long Li 
>
> An IOCTL uses up to 2 iovs. The 1st iov is the command itself, the 2nd iov is
> optional data for that command. The 1st iov is always allocated on the heap
> but the 2nd iov may point to a variable on the stack. This will trigger an
> error when passing the 2nd iov for RDMA I/O.
>
> Fix this by allocating a buffer for the 2nd iov.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/smb2pdu.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index 29f011d8d8e2..710ceb875161 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2538,11 +2538,25 @@ SMB2_ioctl_init(struct cifs_tcon *tcon, struct 
> smb_rqst *rqst,
> struct kvec *iov = rqst->rq_iov;
> unsigned int total_len;
> int rc;
> +   char *in_data_buf;
>
> rc = smb2_plain_req_init(SMB2_IOCTL, tcon, (void **) , 
> _len);
> if (rc)
> return rc;
>
> +   if (indatalen) {
> +   /*
> +* indatalen is usually small at a couple of bytes max, so
> +* just allocate through generic pool
> +*/
> +   in_data_buf = kmalloc(indatalen, GFP_NOFS);
> +   if (!in_data_buf) {
> +   cifs_small_buf_release(req);
> +   return -ENOMEM;
> +   }
> +   memcpy(in_data_buf, in_data, indatalen);
> +   }
> +
> req->CtlCode = cpu_to_le32(opcode);
> req->PersistentFileId = persistent_fid;
> req->VolatileFileId = volatile_fid;
> @@ -2563,7 +2577,7 @@ SMB2_ioctl_init(struct cifs_tcon *tcon, struct smb_rqst 
> *rqst,
>cpu_to_le32(offsetof(struct smb2_ioctl_req, Buffer));
> rqst->rq_nvec = 2;
> iov[0].iov_len = total_len - 1;
> -   iov[1].iov_base = in_data;
> +   iov[1].iov_base = in_data_buf;
> iov[1].iov_len = indatalen;
> } else {
> rqst->rq_nvec = 1;
> @@ -2605,8 +2619,11 @@ SMB2_ioctl_init(struct cifs_tcon *tcon, struct 
> smb_rqst *rqst,
>  void
>  SMB2_ioctl_free(struct smb_rqst *rqst)
>  {
> -   if (rqst && rqst->rq_iov)
> +   if (rqst && rqst->rq_iov) {
> cifs_small_buf_release(rqst->rq_iov[0].iov_base); /* request 
> */
> +   if (rqst->rq_iov[1].iov_len)
> +   kfree(rqst->rq_iov[1].iov_base);
> +   }
>  }
>
>
> --
> 2.17.1
>

Looks correct.

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH 2/2] cifs:smbd Use the correct DMA direction when sending data

2019-05-14 Thread Pavel Shilovsky
пн, 13 мая 2019 г. в 21:02, :
>
> From: Long Li 
>
> When sending data, use the DMA_TO_DEVICE to map buffers. Also log the number
> of requests in a compounding request from upper layer.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/smbdirect.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index 251ef1223206..caac37b1de8c 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -903,7 +903,7 @@ static int smbd_create_header(struct smbd_connection 
> *info,
> request->sge[0].addr = ib_dma_map_single(info->id->device,
>  (void *)packet,
>  header_length,
> -DMA_BIDIRECTIONAL);
> +DMA_TO_DEVICE);
> if (ib_dma_mapping_error(info->id->device, request->sge[0].addr)) {
> mempool_free(request, info->request_mempool);
> rc = -EIO;
> @@ -1005,7 +1005,7 @@ static int smbd_post_send_sgl(struct smbd_connection 
> *info,
> for_each_sg(sgl, sg, num_sgs, i) {
> request->sge[i+1].addr =
> ib_dma_map_page(info->id->device, sg_page(sg),
> -  sg->offset, sg->length, DMA_BIDIRECTIONAL);
> +  sg->offset, sg->length, DMA_TO_DEVICE);
> if (ib_dma_mapping_error(
> info->id->device, request->sge[i+1].addr)) {
> rc = -EIO;
> @@ -2110,8 +2110,10 @@ int smbd_send(struct TCP_Server_Info *server,
> goto done;
> }
>
> -   rqst_idx = 0;
> +   log_write(INFO, "num_rqst=%d total length=%u\n",
> +       num_rqst, remaining_data_length);
>
> +   rqst_idx = 0;
>  next_rqst:
> rqst = _array[rqst_idx];
> iov = rqst->rq_iov;
> --
> 2.17.1
>

Acked-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH 1/2] cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called

2019-05-14 Thread Pavel Shilovsky
пн, 13 мая 2019 г. в 21:02, :
>
> From: Long Li 
>
> commit 214bab448476 ("cifs: Call MID callback before destroying transport")
> assumes that the MID callback should not take srv_mutex, this may not always
> be true. SMB Direct requires the MID callback completed before calling
> transport so all pending memory registration can be freed. So restore the
> orignal calling sequence so TCP transport will use the same code, but moving
> smbd_destroy() after all MID has been called.
>
> fixes: 214bab448476 ("cifs: Call MID callback before destroying transport")
> Signed-off-by: Long Li 
> ---
>  fs/cifs/connect.c | 37 -
>  1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 084756cfdaee..0b3ac8b76d18 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -528,6 +528,21 @@ cifs_reconnect(struct TCP_Server_Info *server)
> /* do not want to be sending data on a socket we are freeing */
> cifs_dbg(FYI, "%s: tearing down socket\n", __func__);
> mutex_lock(>srv_mutex);
> +   if (server->ssocket) {
> +   cifs_dbg(FYI, "State: 0x%x Flags: 0x%lx\n",
> +server->ssocket->state, server->ssocket->flags);
> +   kernel_sock_shutdown(server->ssocket, SHUT_WR);
> +   cifs_dbg(FYI, "Post shutdown state: 0x%x Flags: 0x%lx\n",
> +server->ssocket->state, server->ssocket->flags);
> +   sock_release(server->ssocket);
> +   server->ssocket = NULL;
> +   }
> +   server->sequence_number = 0;
> +   server->session_estab = false;
> +   kfree(server->session_key.response);
> +   server->session_key.response = NULL;
> +   server->session_key.len = 0;
> +   server->lstrp = jiffies;
>
> /* mark submitted MIDs for retry and issue callback */
> INIT_LIST_HEAD(_list);
> @@ -540,6 +555,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> list_move(_entry->qhead, _list);
> }
> spin_unlock(_Lock);
> +   mutex_unlock(>srv_mutex);
>
> cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> list_for_each_safe(tmp, tmp2, _list) {
> @@ -548,24 +564,11 @@ cifs_reconnect(struct TCP_Server_Info *server)
> mid_entry->callback(mid_entry);
> }
>
> -   if (server->ssocket) {
> -   cifs_dbg(FYI, "State: 0x%x Flags: 0x%lx\n",
> -server->ssocket->state, server->ssocket->flags);
> -   kernel_sock_shutdown(server->ssocket, SHUT_WR);
> -   cifs_dbg(FYI, "Post shutdown state: 0x%x Flags: 0x%lx\n",
> -server->ssocket->state, server->ssocket->flags);
> -   sock_release(server->ssocket);
> -   server->ssocket = NULL;
> -   } else if (cifs_rdma_enabled(server))
> +   if (cifs_rdma_enabled(server)) {
> +   mutex_lock(>srv_mutex);
> smbd_destroy(server);
> -   server->sequence_number = 0;
> -   server->session_estab = false;
> -   kfree(server->session_key.response);
> -   server->session_key.response = NULL;
> -   server->session_key.len = 0;
> -   server->lstrp = jiffies;
> -
> -   mutex_unlock(>srv_mutex);
> +   mutex_unlock(>srv_mutex);
> +   }
>
> do {
> try_to_freeze();
> --
> 2.17.1
>

Thanks for quickly fixing it!

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [Patch (resend) 5/5] cifs: Call MID callback before destroying transport

2019-05-09 Thread Pavel Shilovsky
пт, 5 апр. 2019 г. в 14:39, Long Li :
>
> From: Long Li 
>
> When transport is being destroyed, it's possible that some processes may
> hold memory registrations that need to be deregistred.
>
> Call them first so nobody is using transport resources, and it can be
> destroyed.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/connect.c | 36 +++-
>  1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 33e4d98..084756cf 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -528,22 +528,6 @@ cifs_reconnect(struct TCP_Server_Info *server)
> /* do not want to be sending data on a socket we are freeing */
> cifs_dbg(FYI, "%s: tearing down socket\n", __func__);
> mutex_lock(>srv_mutex);
> -   if (server->ssocket) {
> -   cifs_dbg(FYI, "State: 0x%x Flags: 0x%lx\n",
> -server->ssocket->state, server->ssocket->flags);
> -   kernel_sock_shutdown(server->ssocket, SHUT_WR);
> -   cifs_dbg(FYI, "Post shutdown state: 0x%x Flags: 0x%lx\n",
> -server->ssocket->state, server->ssocket->flags);
> -   sock_release(server->ssocket);
> -   server->ssocket = NULL;
> -   } else if (cifs_rdma_enabled(server))
> -   smbd_destroy(server);
> -   server->sequence_number = 0;
> -   server->session_estab = false;
> -   kfree(server->session_key.response);
> -   server->session_key.response = NULL;
> -   server->session_key.len = 0;
> -   server->lstrp = jiffies;
>
> /* mark submitted MIDs for retry and issue callback */
> INIT_LIST_HEAD(_list);
> @@ -556,7 +540,6 @@ cifs_reconnect(struct TCP_Server_Info *server)
> list_move(_entry->qhead, _list);
> }
> spin_unlock(_Lock);
> -   mutex_unlock(>srv_mutex);
>
> cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
> list_for_each_safe(tmp, tmp2, _list) {
> @@ -565,6 +548,25 @@ cifs_reconnect(struct TCP_Server_Info *server)
> mid_entry->callback(mid_entry);
> }

The original call was issuing callbacks without holding srv_mutex -
callbacks may take this mutex for its internal needs. With the
proposed patch the code will deadlock.

Also the idea of destroying the socket first is to allow possible
retries (from callbacks) to return a proper error instead of trying to
send anything through the reconnecting socket.

>
> +   if (server->ssocket) {
> +   cifs_dbg(FYI, "State: 0x%x Flags: 0x%lx\n",
> +server->ssocket->state, server->ssocket->flags);
> +   kernel_sock_shutdown(server->ssocket, SHUT_WR);
> +   cifs_dbg(FYI, "Post shutdown state: 0x%x Flags: 0x%lx\n",
> +server->ssocket->state, server->ssocket->flags);
> +   sock_release(server->ssocket);
> +   server->ssocket = NULL;
> +   } else if (cifs_rdma_enabled(server))
> +   smbd_destroy(server);

If we need to call smbd_destroy() *after* callbacks, let's just move
it alone without the rest of the code.


> +   server->sequence_number = 0;
> +   server->session_estab = false;
> +   kfree(server->session_key.response);
> +   server->session_key.response = NULL;
> +   server->session_key.len = 0;
> +   server->lstrp = jiffies;
> +
> +   mutex_unlock(>srv_mutex);
> +
> do {
> try_to_freeze();
>
> --
> 2.7.4
>


--
Best regards,
Pavel Shilovsky


Re: [PATCH v2] cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()

2019-05-08 Thread Pavel Shilovsky
ср, 8 мая 2019 г. в 01:23, Kai-Heng Feng :
>
> at 02:28, Pavel Shilovsky  wrote:
>
> > вт, 7 мая 2019 г. в 09:13, Steve French via samba-technical
> > :
> >> merged into cifs-2.6.git for-next
> >>
> >> On Tue, May 7, 2019 at 10:17 AM Christoph Probst via samba-technical
> >>  wrote:
> >>> Change strcat to strncpy in the "None" case to fix a buffer overflow
> >>> when cinode->oplock is reset to 0 by another thread accessing the same
> >>> cinode. It is never valid to append "None" to any other message.
> >>>
> >>> Consolidate multiple writes to cinode->oplock to reduce raciness.
> >>>
> >>> Signed-off-by: Christoph Probst 
> >>> ---
> >>>  fs/cifs/smb2ops.c | 14 --
> >>>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> >>> index c36ff0d..aa61dcf 100644
> >>> --- a/fs/cifs/smb2ops.c
> >>> +++ b/fs/cifs/smb2ops.c
> >>> @@ -2917,26 +2917,28 @@ smb21_set_oplock_level(struct cifsInodeInfo
> >>> *cinode, __u32 oplock,
> >>>unsigned int epoch, bool *purge_cache)
> >>>  {
> >>> char message[5] = {0};
> >>> +   unsigned int new_oplock = 0;
> >>>
> >>> oplock &= 0xFF;
> >>> if (oplock == SMB2_OPLOCK_LEVEL_NOCHANGE)
> >>> return;
> >>>
> >>> -   cinode->oplock = 0;
> >>> if (oplock & SMB2_LEASE_READ_CACHING_HE) {
> >>> -   cinode->oplock |= CIFS_CACHE_READ_FLG;
> >>> +   new_oplock |= CIFS_CACHE_READ_FLG;
> >>> strcat(message, "R");
> >>> }
> >>> if (oplock & SMB2_LEASE_HANDLE_CACHING_HE) {
> >>> -   cinode->oplock |= CIFS_CACHE_HANDLE_FLG;
> >>> +   new_oplock |= CIFS_CACHE_HANDLE_FLG;
> >>> strcat(message, "H");
> >>> }
> >>> if (oplock & SMB2_LEASE_WRITE_CACHING_HE) {
> >>> -   cinode->oplock |= CIFS_CACHE_WRITE_FLG;
> >>> +   new_oplock |= CIFS_CACHE_WRITE_FLG;
> >>> strcat(message, "W");
> >>> }
> >>> -   if (!cinode->oplock)
> >>> -   strcat(message, "None");
> >>> +   if (!new_oplock)
> >>> +   strncpy(message, "None", sizeof(message));
> >>> +
> >>> +   cinode->oplock = new_oplock;
> >>> cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
> >>>  >vfs_inode);
> >>>  }
> >>> --
> >>> 2.1.4
> >
>
> Doesn’t the race still happen, but implicitly here?
> cinode->oplock = new_oplock;
>
> Is it possible to just introduce a lock to force its proper ordering?
> e.g.
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index bf5b8264e119..a3c3c6156d17 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -267,6 +267,7 @@ cifs_alloc_inode(struct super_block *sb)
>   * server, can not assume caching of file data or metadata.
>   */
>  cifs_set_oplock_level(cifs_inode, 0);
> +   mutex_init(_inode->oplock_mutex);
>  cifs_inode->flags = 0;
>  spin_lock_init(_inode->writers_lock);
>  cifs_inode->writers = 0;
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 37b5ddf27ff1..6dfd4ab16c4f 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1214,6 +1214,7 @@ struct cifsInodeInfo {
>  struct list_head openFileList;
>  __u32 cifsAttrs; /* e.g. DOS archive bit, sparse, compressed, system 
> */
>  unsigned int oplock;/* oplock/lease level we have */
> +   struct mutex oplock_mutex;
>  unsigned int epoch; /* used to track lease state changes 
> */
>   #define CIFS_INODE_PENDING_OPLOCK_BREAK   (0) /* oplock break in progress */
>   #define CIFS_INODE_PENDING_WRITERS   (1) /* Writes in progress */
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index b20063cf774f..796b23712e71 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -1901,6 +1901,7 @@ smb21_set_oplock_level(struct cifsInodeInfo *cinode,
> __u32 oplock,
>  if (oplock == SMB2_OPLO

Re: [PATCH v2] cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()

2019-05-07 Thread Pavel Shilovsky
вт, 7 мая 2019 г. в 09:13, Steve French via samba-technical
:
>
> merged into cifs-2.6.git for-next
>
> On Tue, May 7, 2019 at 10:17 AM Christoph Probst via samba-technical
>  wrote:
> >
> > Change strcat to strncpy in the "None" case to fix a buffer overflow
> > when cinode->oplock is reset to 0 by another thread accessing the same
> > cinode. It is never valid to append "None" to any other message.
> >
> > Consolidate multiple writes to cinode->oplock to reduce raciness.
> >
> > Signed-off-by: Christoph Probst 
> > ---
> >  fs/cifs/smb2ops.c | 14 --
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> > index c36ff0d..aa61dcf 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -2917,26 +2917,28 @@ smb21_set_oplock_level(struct cifsInodeInfo 
> > *cinode, __u32 oplock,
> >unsigned int epoch, bool *purge_cache)
> >  {
> > char message[5] = {0};
> > +   unsigned int new_oplock = 0;
> >
> > oplock &= 0xFF;
> > if (oplock == SMB2_OPLOCK_LEVEL_NOCHANGE)
> > return;
> >
> > -   cinode->oplock = 0;
> > if (oplock & SMB2_LEASE_READ_CACHING_HE) {
> > -   cinode->oplock |= CIFS_CACHE_READ_FLG;
> > +   new_oplock |= CIFS_CACHE_READ_FLG;
> > strcat(message, "R");
> > }
> > if (oplock & SMB2_LEASE_HANDLE_CACHING_HE) {
> > -   cinode->oplock |= CIFS_CACHE_HANDLE_FLG;
> > +   new_oplock |= CIFS_CACHE_HANDLE_FLG;
> > strcat(message, "H");
> > }
> > if (oplock & SMB2_LEASE_WRITE_CACHING_HE) {
> > -   cinode->oplock |= CIFS_CACHE_WRITE_FLG;
> > +   new_oplock |= CIFS_CACHE_WRITE_FLG;
> > strcat(message, "W");
> > }
> > -   if (!cinode->oplock)
> > -   strcat(message, "None");
> > +   if (!new_oplock)
> > +   strncpy(message, "None", sizeof(message));
> > +
> > +   cinode->oplock = new_oplock;
> > cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
> >  >vfs_inode);
> >  }
> > --
> > 2.1.4
> >
> >
>

Thanks for cleaning it up!

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: fix strcat buffer overflow in smb21_set_oplock_level()

2019-05-06 Thread Pavel Shilovsky
The patch itself is fine but I think we have a bigger problem here:

3052 >---cinode->oplock = 0;

here we reset oplock level of the inode, so forcing all the IO go to the server.

3053 >---if (oplock & SMB2_LEASE_READ_CACHING_HE) {
3054 >--->---cinode->oplock |= CIFS_CACHE_READ_FLG;
3055 >--->---strcat(message, "R");
3056 >---}
3057 >---if (oplock & SMB2_LEASE_HANDLE_CACHING_HE) {
3058 >--->---cinode->oplock |= CIFS_CACHE_HANDLE_FLG;
3059 >--->---strcat(message, "H");
3060 >---}
3061 >---if (oplock & SMB2_LEASE_WRITE_CACHING_HE) {
3062 >--->---cinode->oplock |= CIFS_CACHE_WRITE_FLG;

this and 3 above cases are all racy with other threads opening the
same file and the oplock break thread.

3063 >--->---strcat(message, "W");
3064 >---}
3065 >---if (!cinode->oplock)
3066 >--->---strcat(message, "None");
3067 >---cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
3068 >--->--- >vfs_inode);

Besides the fix in the patch I would temporarily suggest to not touch
cinode->oplock more than once in this function - have a local variable
cifs_oplock which accumulates flags and set cinode->oplock at the very
end. It reduce raciness a little bit but this code requires much more
thinking about proper locking I guess.

Best regards,
Pavel Shilovskiy

пн, 6 мая 2019 г. в 10:02, Steve French via samba-technical
:
>
> We could always switch it to strncpy :)
>
> In any case - he is correct, it is better than what was there because
> we should not strcat unless the array were locked across the whole
> function
>
> On Mon, May 6, 2019 at 11:57 AM Jeremy Allison  wrote:
> >
> > On Mon, May 06, 2019 at 11:53:44AM -0500, Steve French via samba-technical 
> > wrote:
> > > I think strcpy is clearer - but I don't think it can overflow since if
> > > R, W or W were written to "message" then cinode->oplock would be
> > > non-zero so we would never strcap "None"
> >
> > Ahem. In Samba we have :
> >
> > lib/util/safe_string.h:#define strcpy(dest,src) 
> > __ERROR__XX__NEVER_USE_STRCPY___;
> >
> > Maybe you should do likewise :-).
> >
> > > On Mon, May 6, 2019 at 10:26 AM Christoph Probst  wrote:
> > > >
> > > > Change strcat to strcpy in the "None" case as it is never valid to 
> > > > append
> > > > "None" to any other message. It may also overflow char message[5], in a
> > > > race condition on cinode if cinode->oplock is unset by another thread
> > > > after "RHW" or "RH" had been written to message.
> > > >
> > > > Signed-off-by: Christoph Probst 
> > > > ---
> > > >  fs/cifs/smb2ops.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> > > > index c36ff0d..5fd5567 100644
> > > > --- a/fs/cifs/smb2ops.c
> > > > +++ b/fs/cifs/smb2ops.c
> > > > @@ -2936,7 +2936,7 @@ smb21_set_oplock_level(struct cifsInodeInfo 
> > > > *cinode, __u32 oplock,
> > > > strcat(message, "W");
> > > > }
> > > > if (!cinode->oplock)
> > > > -   strcat(message, "None");
> > > > +   strcpy(message, "None");
> > > > cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
> > > >  >vfs_inode);
> > > >  }
> > > > --
> > > > 2.1.4
> > > >
> > >
> > >
> > > --
> > > Thanks,
> > >
> > > Steve
> > >
>
>
>
> --
> Thanks,
>
> Steve
>


Re: [PATCH] cifs: fix page reference leak with readv/writev

2019-04-18 Thread Pavel Shilovsky
lize ctx->bv to NULL and 
> > > ctx->direct_io
> > > +* to false so that we know when we have to unreference pages 
> > > within
> > > +* cifs_aio_ctx_release()
> > > +*/
> > > ctx = kzalloc(sizeof(struct cifs_aio_ctx), GFP_KERNEL);
> > > if (!ctx)
> > > return NULL;
> > > @@ -786,7 +791,23 @@ cifs_aio_ctx_release(struct kref *refcount)
> > > struct cifs_aio_ctx, refcount);
> > >
> > > cifsFileInfo_put(ctx->cfile);
> > > -   kvfree(ctx->bv);
> > > +
> > > +   /*
> > > +* ctx->bv is only set if setup_aio_ctx_iter() was call 
> > > successfuly
> > > +* which means that iov_iter_get_pages() was a success and thus 
> > > that
> > > +* we have taken reference on pages.
> > > +*/
> > > +   if (ctx->bv) {
> > > +   unsigned i;
> > > +
> > > +   for (i = 0; i < ctx->npages; i++) {
> > > +   if (ctx->should_dirty)
> > > +   set_page_dirty(ctx->bv[i].bv_page);
> > > +   put_page(ctx->bv[i].bv_page);
> > > +   }
> > > +   kvfree(ctx->bv);
> > > +   }
> > > +
> > > kfree(ctx);
> > >  }
> > >
> > > --
> > > 2.20.1
> > >

Good catch, thanks!

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH] secdesc-ui.py: a UI to view the security descriptors on SMB2+ shares

2019-04-01 Thread Pavel Shilovsky
This is really cool! Thanks Ronnie. I will be targeting this for the
next release of cifs-utils (not the one that I am about to cut off),
so we will have time to stabilize it.

Best regards,
Pavel Shilovsky

вс, 31 мар. 2019 г. в 21:51, Steve French via samba-technical
:

>
> The tool that Ronnie proposed below looks useful (see below) and
> attached screenshot.  With this as a sample (along with 'smbinfo' tool
> in cifs-utils) and a starting point, those with python/GUI interest
> should be able to extend it in very interesting ways now that we have
> the ability to query server information much more broadly.  Managing
> ACLs, quotas, snapshots, alerts and many other fun features across
> such a broad set of servers (from Samba, to Windows, Azure and the
> Cloud, Macs, NetApp and various filers).
>
> Ronnie,
> Great idea.
>
> -- Forwarded message -
> From: Ronnie Sahlberg 
> Date: Sun, Mar 31, 2019 at 10:54 PM
> Subject: [PATCH] secdesc-ui.py: a UI to view the security descriptors
> on SMB2+ shares
> To: linux-cifs 
> Cc: Steve French , Pavel Shilovsky
> , Ronnie Sahlberg 
>
> a simple python program with a basic UI to view the security descriptor
> for SMB2+ resources.
>
> With a basic starting point like this my hope is we can get some interest
> from people with python skills that may want to make it better until
> it becomes a full-fledged utility.
>
> Signed-off-by: Ronnie Sahlberg 
> ---
>  secdesc-ui.py | 421 
> ++
>  1 file changed, 421 insertions(+)
>  create mode 100755 secdesc-ui.py
>
> diff --git a/secdesc-ui.py b/secdesc-ui.py
> new file mode 100755
> index 000..dcb9dbf
> --- /dev/null
> +++ b/secdesc-ui.py
> @@ -0,0 +1,421 @@
> +#!/usr/bin/env python
> +# coding: utf-8
> +
> +import array
> +import enum
> +import fcntl
> +import os
> +import struct
> +import stat
> +import sys
> +from Tkinter import *
> +
> +FULL_CONTROL = 0x001f01ff
> +EWRITE = 0x0116
> +ALL_READ_BITS = 0x00020089
> +EREAD = 0x001200a9
> +CHANGE = 0x001301bf
> +
> +TRAV_EXEC = 0x00100020
> +LIST_READ = 0x0011
> +READ_ATTR = 0x00100080
> +READ_XATT = 0x0018
> +CREA_WRIT = 0x0012
> +CREA_APPE = 0x0014
> +WRIT_ATTR = 0x00100100
> +WRIT_XATT = 0x00100010
> +DELE = 0x0011
> +READ_PERM = 0x0012
> +CHAN_PERM = 0x0014
> +TAKE_OWNR = 0x0018
> +
> +class App:
> +   def __init__(self, root, sd, is_dir):
> +   self.sd = sd
> +self.is_dir = is_dir
> +   self.tf = Frame(bd=1)
> +   self.tf.grid(columnspan=5, rowspan=5, padx=5, pady=5)
> +
> +   # Owner
> +   Label(self.tf, text='Owner: %s' %
> (self.sd.owner)).grid(row=0, column=0, columnspan=6, sticky='W')
> +
> +   # Group
> +   Label(self.tf, text='Group: %s' %
> (self.sd.group)).grid(row=1, column=0, columnspan=6, sticky='W')
> +
> +   self.sb = Scrollbar(self.tf, orient=VERTICAL)
> +   self.lb = Listbox(self.tf, height=5, selectmode=SINGLE,
> +   yscrollcommand=self.sb.set)
> +   self.sb.config(command=self.lb.yview)
> +   self.sb.grid(row=2, column=1, sticky='NS')
> +   self.lb.grid(row=2, column=0, sticky='W')
> +
> +max = 0
> +   for idx, item in enumerate(self.sd.dacl.ace):
> +   if item.type != 0 and item.type != 1:
> +   continue
> +sid = '%s %s' % ("ALLOW" if item.type == 0
> else "DENY", item.sid)
> +if max > len(sid):
> +max = len(sid)
> +   self.lb.insert(idx, sid)
> +   if not self.lb.curselection():
> +   self.lb.selection_set(idx)
> +self.lb.config(width=max)
> +self.lb.bind("", self.select_sid)
> +
> +   self.bas = Button(self.tf, text='Basic', relief=SUNKEN,
> +   command=self.click_bas)
> +   self.bas.grid(row=2, column=2, sticky='NW')
> +
> +   self.adv = Button(self.tf, text='Advanced',
> +   command=self.click_adv)
> +   self.adv.grid(row=2, column=3, sticky='NW')
> +
> +   # Basic Panel
> +   self.bf_bas = Frame(master=self.tf, bd=1)
> +   self.bf_bas.grid(row=3, column=0, columnspan=4, padx=5, 
> pady=5)
> +   self.bf_bas_name = Label(self.bf_bas, text='')
> +   sel

Re: [Patch v2 2/2] CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN

2019-03-16 Thread Pavel Shilovsky
пт, 15 мар. 2019 г. в 00:56, Long Li :
>
> From: Long Li 
>
> When sending a rdata, transport may return -EAGAIN. In this case
> we should re-obtain credits because the session may have been
> reconnected.
>
> Change in v2: adjust_credits before re-sending
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/file.c | 71 
> +-
>  1 file changed, 41 insertions(+), 30 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 321df1d27422..9d90cc07e38b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3347,44 +3347,55 @@ static int cifs_resend_rdata(struct cifs_readdata 
> *rdata,
> struct TCP_Server_Info *server =
> tlink_tcon(rdata->cfile->tlink)->ses->server;
>
> -   /*
> -* Wait for credits to resend this rdata.
> -* Note: we are attempting to resend the whole rdata not in segments
> -*/
> do {
> -   rc = server->ops->wait_mtu_credits(server, rdata->bytes,
> +   if (rdata->cfile->invalidHandle) {
> +   rc = cifs_reopen_file(rdata->cfile, true);
> +   if (rc == -EAGAIN)
> +   continue;
> +   else if (rc)
> +   break;
> +   }
> +
> +   /*
> +* Wait for credits to resend this rdata.
> +* Note: we are attempting to resend the whole rdata not in
> +* segments
> +*/
> +   do {
> +   rc = server->ops->wait_mtu_credits(server, 
> rdata->bytes,
> , );
>
> -   if (rc)
> -   goto out;
> +   if (rc)
> +   goto fail;
>
> -   if (rsize < rdata->bytes) {
> -   add_credits_and_wake_if(server, , 0);
> -   msleep(1000);
> -   }
> -   } while (rsize < rdata->bytes);
> +   if (rsize < rdata->bytes) {
> +   add_credits_and_wake_if(server, , 0);
> +   msleep(1000);
> +   }
> +   } while (rsize < rdata->bytes);
> +   rdata->credits = credits;
>
> -   rdata->credits = credits;
> -   rc = -EAGAIN;
> -   while (rc == -EAGAIN) {
> -   rc = 0;
> -   if (rdata->cfile->invalidHandle)
> -   rc = cifs_reopen_file(rdata->cfile, true);
> -   if (!rc)
> -   rc = server->ops->async_readv(rdata);
> -   }
> +   rc = adjust_credits(server, >credits, rdata->bytes);
> +   if (!rc) {
> +   if (rdata->cfile->invalidHandle)
> +   rc = -EAGAIN;
> +   else
> +   rc = server->ops->async_readv(rdata);
> +   }
>
> -   if (!rc) {
> -   /* Add to aio pending list */
> -   list_add_tail(>list, rdata_list);
> -   return 0;
> -   }
> +   /* If the read was successfully sent, we are done */
> +   if (!rc) {
> +   /* Add to aio pending list */
> +   list_add_tail(>list, rdata_list);
> +   return 0;
> +   }
>
> -   add_credits_and_wake_if(server, >credits, 0);
> -out:
> -   kref_put(>refcount,
> -   cifs_uncached_readdata_release);
> +   /* Roll back credits and retry if needed */
> +   add_credits_and_wake_if(server, >credits, 0);
> +   } while (rc == -EAGAIN);
>
> +fail:
> +   kref_put(>refcount, cifs_uncached_readdata_release);
> return rc;
>  }
>
> --
> 2.14.1
>

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [Patch v2 1/2] CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN

2019-03-16 Thread Pavel Shilovsky
пт, 15 мар. 2019 г. в 00:56, Long Li :
>
> From: Long Li 
>
> When sending a wdata, transport may return -EAGAIN. In this case
> we should re-obtain credits because the session may have been
> reconnected.
>
> Change in v2: adjust_credits before re-sending
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/file.c | 77 
> ++
>  1 file changed, 45 insertions(+), 32 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 9b53f33137b3..321df1d27422 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2620,43 +2620,56 @@ cifs_resend_wdata(struct cifs_writedata *wdata, 
> struct list_head *wdata_list,
> struct TCP_Server_Info *server =
> tlink_tcon(wdata->cfile->tlink)->ses->server;
>
> -   /*
> -* Wait for credits to resend this wdata.
> -* Note: we are attempting to resend the whole wdata not in segments
> -*/
> do {
> -   rc = server->ops->wait_mtu_credits(server, wdata->bytes, 
> ,
> -  );
> +   if (wdata->cfile->invalidHandle) {
> +   rc = cifs_reopen_file(wdata->cfile, false);
> +   if (rc == -EAGAIN)
> +   continue;
> +   else if (rc)
> +   break;
> +   }
>
> -   if (rc)
> -   goto out;
>
> -   if (wsize < wdata->bytes) {
> -   add_credits_and_wake_if(server, , 0);
> -   msleep(1000);
> -   }
> -   } while (wsize < wdata->bytes);
> +   /*
> +* Wait for credits to resend this wdata.
> +* Note: we are attempting to resend the whole wdata not in
> +* segments
> +*/
> +   do {
> +   rc = server->ops->wait_mtu_credits(server, 
> wdata->bytes,
> +   , );
> +   if (rc)
> +   goto fail;
> +
> +   if (wsize < wdata->bytes) {
> +   add_credits_and_wake_if(server, , 0);
> +   msleep(1000);
> +   }
> +   } while (wsize < wdata->bytes);
> +   wdata->credits = credits;
>
> -   wdata->credits = credits;
> -   rc = -EAGAIN;
> -   while (rc == -EAGAIN) {
> -   rc = 0;
> -   if (wdata->cfile->invalidHandle)
> -   rc = cifs_reopen_file(wdata->cfile, false);
> -   if (!rc)
> -   rc = server->ops->async_writev(wdata,
> +   rc = adjust_credits(server, >credits, wdata->bytes);
> +
> +   if (!rc) {
> +   if (wdata->cfile->invalidHandle)
> +   rc = -EAGAIN;
> +   else
> +   rc = server->ops->async_writev(wdata,
> cifs_uncached_writedata_release);
> -   }
> +   }
>
> -   if (!rc) {
> -   list_add_tail(>list, wdata_list);
> -   return 0;
> -   }
> +   /* If the write was successfully sent, we are done */
> +   if (!rc) {
> +   list_add_tail(>list, wdata_list);
> +   return 0;
> +   }
>
> -   add_credits_and_wake_if(server, >credits, 0);
> -out:
> -   kref_put(>refcount, cifs_uncached_writedata_release);
> +   /* Roll back credits and retry if needed */
> +   add_credits_and_wake_if(server, >credits, 0);
> +   } while (rc == -EAGAIN);
>
> +fail:
> +   kref_put(>refcount, cifs_uncached_writedata_release);
> return rc;
>  }
>
> @@ -2884,12 +2897,12 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> wdata->bytes, _from,
> ctx->cfile, cifs_sb, 
> _list,
> ctx);
> +
> +   kref_put(>refcount,
> +   
> cifs_uncached_writedata_release);
> }
>
> list_splice(_list, >list);
> -
> -   kref_put(>refcount,
> -cifs_uncached_writedata_release);
> goto restart_loop;
> }
> }
> --
> 2.14.1
>

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH 1/2] CIFS: Fix a bug with re-sending wdata when transport returning -EAGAIN

2019-03-05 Thread Pavel Shilovsky
 
> cifs_aio_ctx *ctx)
> wdata->bytes, _from,
> ctx->cfile, cifs_sb, 
> _list,
> ctx);
> +
> +   kref_put(>refcount,
> +   
> cifs_uncached_writedata_release);
>         }
>
> list_splice(_list, >list);
> -
> -   kref_put(>refcount,
> -cifs_uncached_writedata_release);
> goto restart_loop;
> }
> }
> --
> 2.17.1
>

The similar concepts applies for async reads as well.

--
Best regards,
Pavel Shilovsky


Re: [PATCH AUTOSEL 4.20 65/77] CIFS: Do not assume one credit for async responses

2019-02-27 Thread Pavel Shilovsky
ср, 27 февр. 2019 г. в 09:54, Sasha Levin :
>
> On Fri, Feb 15, 2019 at 08:10:47PM +, Pavel Shilovskiy wrote:
> >чт, 14 февр. 2019 г. в 18:40, Sasha Levin :
> >>
> >> From: Pavel Shilovsky 
> >>
> >> [ Upstream commit 0fd1d37b0501efc6e295f56ab55cdaff784aa50c ]
> >>
> >> If we don't receive a response we can't assume that the server
> >> granted one credit. Assume zero credits in such cases.
> >>
> >> Signed-off-by: Pavel Shilovsky 
> >> Reviewed-by: Ronnie Sahlberg 
> >> Signed-off-by: Steve French 
> >> Signed-off-by: Sasha Levin 
> >> ---
> >>  fs/cifs/smb2pdu.c | 15 +++
> >>  1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> >> index d1ae7cdb236d..3c44c51310c4 100644
> >> --- a/fs/cifs/smb2pdu.c
> >> +++ b/fs/cifs/smb2pdu.c
> >> @@ -2834,9 +2834,10 @@ smb2_echo_callback(struct mid_q_entry *mid)
> >>  {
> >> struct TCP_Server_Info *server = mid->callback_data;
> >> struct smb2_echo_rsp *rsp = (struct smb2_echo_rsp *)mid->resp_buf;
> >> -   unsigned int credits_received = 1;
> >> +   unsigned int credits_received = 0;
> >>
> >> -   if (mid->mid_state == MID_RESPONSE_RECEIVED)
> >> +   if (mid->mid_state == MID_RESPONSE_RECEIVED
> >> +   || mid->mid_state == MID_RESPONSE_MALFORMED)
> >> credits_received = 
> >> le16_to_cpu(rsp->sync_hdr.CreditRequest);
> >>
> >> DeleteMidQEntry(mid);
> >> @@ -3093,7 +3094,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
> >> struct TCP_Server_Info *server = tcon->ses->server;
> >> struct smb2_sync_hdr *shdr =
> >> (struct smb2_sync_hdr 
> >> *)rdata->iov[0].iov_base;
> >> -   unsigned int credits_received = 1;
> >> +   unsigned int credits_received = 0;
> >> struct smb_rqst rqst = { .rq_iov = rdata->iov,
> >>  .rq_nvec = 2,
> >>  .rq_pages = rdata->pages,
> >> @@ -3132,6 +3133,9 @@ smb2_readv_callback(struct mid_q_entry *mid)
> >> task_io_account_read(rdata->got_bytes);
> >> cifs_stats_bytes_read(tcon, rdata->got_bytes);
> >> break;
> >> +   case MID_RESPONSE_MALFORMED:
> >> +   credits_received = le16_to_cpu(shdr->CreditRequest);
> >> +   /* fall through */
> >> default:
> >> if (rdata->result != -ENODATA)
> >> rdata->result = -EIO;
> >> @@ -3325,7 +3329,7 @@ smb2_writev_callback(struct mid_q_entry *mid)
> >> struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> >> unsigned int written;
> >> struct smb2_write_rsp *rsp = (struct smb2_write_rsp 
> >> *)mid->resp_buf;
> >> -   unsigned int credits_received = 1;
> >> +   unsigned int credits_received = 0;
> >>
> >> switch (mid->mid_state) {
> >> case MID_RESPONSE_RECEIVED:
> >> @@ -3353,6 +3357,9 @@ smb2_writev_callback(struct mid_q_entry *mid)
> >> case MID_RETRY_NEEDED:
> >> wdata->result = -EAGAIN;
> >> break;
> >> +   case MID_RESPONSE_MALFORMED:
> >> +   credits_received = 
> >> le16_to_cpu(rsp->sync_hdr.CreditRequest);
> >> +       /* fall through */
> >> default:
> >> wdata->result = -EIO;
> >> break;
> >> --
> >> 2.19.1
> >>
> >
> >Can you also apply the following patch to 4.20.y and 4.19.y, please?
> >
> >https://patchwork.ozlabs.org/patch/1030180/
>
> I'll queue it up for the next cycle as it has dependencies on stable@
> commits that didn't make it in yet and I don't want to mess up with
> Greg's flow. Thanks you!

It seems that this one has already been merged to 4.20.11:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.20.y=7cb453e5a88169b2a5cec1e00778fc969efd1a77

So, between those two stable kernels only 4.19 doesn't have the patch.

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: Avoid returning EBUSY to upper layer VFS

2018-12-06 Thread Pavel Shilovsky
чт, 6 дек. 2018 г. в 00:29, Steve French via samba-technical
:
>
> Tentatively pushed to cifs-2.6.git for-next
> On Wed, Dec 5, 2018 at 10:52 PM Long Li  wrote:
> >
> > From: Long Li 
> >
> > EBUSY is not handled by VFS, and will be passed to user-mode. This is not
> > correct as we need to wait for more credits.
> >
> > This patch also fixes a bug where rsize or wsize is used uninitialized when
> > the call to server->ops->wait_mtu_credits() fails.
> >
> > Reported-by: Dan Carpenter 
> > Signed-off-by: Long Li 
> > ---
> >  fs/cifs/file.c | 31 ++-
> >  1 file changed, 6 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 74c33d5..c9bc56b 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2541,14 +2541,13 @@ static int
> >  cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head 
> > *wdata_list,
> > struct cifs_aio_ctx *ctx)
> >  {
> > -   int wait_retry = 0;
> > unsigned int wsize, credits;
> > int rc;
> > struct TCP_Server_Info *server =
> > tlink_tcon(wdata->cfile->tlink)->ses->server;
> >
> > /*
> > -* Try to resend this wdata, waiting for credits up to 3 seconds.
> > +* Wait for credits to resend this wdata.
> >  * Note: we are attempting to resend the whole wdata not in segments
> >  */
> > do {
> > @@ -2556,19 +2555,13 @@ cifs_resend_wdata(struct cifs_writedata *wdata, 
> > struct list_head *wdata_list,
> > server, wdata->bytes, , );
> >
> > if (rc)
> > -   break;
> > +   goto out;
> >
> > if (wsize < wdata->bytes) {
> > add_credits_and_wake_if(server, credits, 0);
> > msleep(1000);
> > -   wait_retry++;
> > }
> > -   } while (wsize < wdata->bytes && wait_retry < 3);
> > -
> > -   if (wsize < wdata->bytes) {
> > -   rc = -EBUSY;
> > -   goto out;
> > -   }
> > +   } while (wsize < wdata->bytes);
> >
> > rc = -EAGAIN;
> > while (rc == -EAGAIN) {
> > @@ -3234,14 +3227,13 @@ static int cifs_resend_rdata(struct cifs_readdata 
> > *rdata,
> > struct list_head *rdata_list,
> > struct cifs_aio_ctx *ctx)
> >  {
> > -   int wait_retry = 0;
> > unsigned int rsize, credits;
> > int rc;
> > struct TCP_Server_Info *server =
> > tlink_tcon(rdata->cfile->tlink)->ses->server;
> >
> > /*
> > -* Try to resend this rdata, waiting for credits up to 3 seconds.
> > +* Wait for credits to resend this rdata.
> >  * Note: we are attempting to resend the whole rdata not in segments
> >  */
> > do {
> > @@ -3249,24 +3241,13 @@ static int cifs_resend_rdata(struct cifs_readdata 
> > *rdata,
> > , );
> >
> > if (rc)
> > -   break;
> > +   goto out;
> >
> > if (rsize < rdata->bytes) {
> > add_credits_and_wake_if(server, credits, 0);
> > msleep(1000);
> > -   wait_retry++;
> >     }
> > -   } while (rsize < rdata->bytes && wait_retry < 3);
> > -
> > -   /*
> > -* If we can't find enough credits to send this rdata
> > -* release the rdata and return failure, this will pass
> > -* whatever I/O amount we have finished to VFS.
> > -*/
> > -   if (rsize < rdata->bytes) {
> > -   rc = -EBUSY;
> > -   goto out;
> > -   }
> > +   } while (rsize < rdata->bytes);
> >
> > rc = -EAGAIN;
> > while (rc == -EAGAIN) {
> > --
> > 2.7.4
> >
>
>
> --
> Thanks,
>
> Steve
>

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: Avoid returning EBUSY to upper layer VFS

2018-12-06 Thread Pavel Shilovsky
чт, 6 дек. 2018 г. в 00:29, Steve French via samba-technical
:
>
> Tentatively pushed to cifs-2.6.git for-next
> On Wed, Dec 5, 2018 at 10:52 PM Long Li  wrote:
> >
> > From: Long Li 
> >
> > EBUSY is not handled by VFS, and will be passed to user-mode. This is not
> > correct as we need to wait for more credits.
> >
> > This patch also fixes a bug where rsize or wsize is used uninitialized when
> > the call to server->ops->wait_mtu_credits() fails.
> >
> > Reported-by: Dan Carpenter 
> > Signed-off-by: Long Li 
> > ---
> >  fs/cifs/file.c | 31 ++-
> >  1 file changed, 6 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 74c33d5..c9bc56b 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2541,14 +2541,13 @@ static int
> >  cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head 
> > *wdata_list,
> > struct cifs_aio_ctx *ctx)
> >  {
> > -   int wait_retry = 0;
> > unsigned int wsize, credits;
> > int rc;
> > struct TCP_Server_Info *server =
> > tlink_tcon(wdata->cfile->tlink)->ses->server;
> >
> > /*
> > -* Try to resend this wdata, waiting for credits up to 3 seconds.
> > +* Wait for credits to resend this wdata.
> >  * Note: we are attempting to resend the whole wdata not in segments
> >  */
> > do {
> > @@ -2556,19 +2555,13 @@ cifs_resend_wdata(struct cifs_writedata *wdata, 
> > struct list_head *wdata_list,
> > server, wdata->bytes, , );
> >
> > if (rc)
> > -   break;
> > +   goto out;
> >
> > if (wsize < wdata->bytes) {
> > add_credits_and_wake_if(server, credits, 0);
> > msleep(1000);
> > -   wait_retry++;
> > }
> > -   } while (wsize < wdata->bytes && wait_retry < 3);
> > -
> > -   if (wsize < wdata->bytes) {
> > -   rc = -EBUSY;
> > -   goto out;
> > -   }
> > +   } while (wsize < wdata->bytes);
> >
> > rc = -EAGAIN;
> > while (rc == -EAGAIN) {
> > @@ -3234,14 +3227,13 @@ static int cifs_resend_rdata(struct cifs_readdata 
> > *rdata,
> > struct list_head *rdata_list,
> > struct cifs_aio_ctx *ctx)
> >  {
> > -   int wait_retry = 0;
> > unsigned int rsize, credits;
> > int rc;
> > struct TCP_Server_Info *server =
> > tlink_tcon(rdata->cfile->tlink)->ses->server;
> >
> > /*
> > -* Try to resend this rdata, waiting for credits up to 3 seconds.
> > +* Wait for credits to resend this rdata.
> >  * Note: we are attempting to resend the whole rdata not in segments
> >  */
> > do {
> > @@ -3249,24 +3241,13 @@ static int cifs_resend_rdata(struct cifs_readdata 
> > *rdata,
> > , );
> >
> > if (rc)
> > -   break;
> > +   goto out;
> >
> > if (rsize < rdata->bytes) {
> > add_credits_and_wake_if(server, credits, 0);
> > msleep(1000);
> > -   wait_retry++;
> >     }
> > -   } while (rsize < rdata->bytes && wait_retry < 3);
> > -
> > -   /*
> > -* If we can't find enough credits to send this rdata
> > -* release the rdata and return failure, this will pass
> > -* whatever I/O amount we have finished to VFS.
> > -*/
> > -   if (rsize < rdata->bytes) {
> > -   rc = -EBUSY;
> > -   goto out;
> > -   }
> > +   } while (rsize < rdata->bytes);
> >
> > rc = -EAGAIN;
> > while (rc == -EAGAIN) {
> > --
> > 2.7.4
> >
>
>
> --
> Thanks,
>
> Steve
>

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-29 Thread Pavel Shilovsky
   
> > > > > cifs_uncached_readdata_release);
> > > > > goto again;
> > > > > } else if (rdata->result)
> > > > > rc = rdata->result;
> > > > > -   else
> > > > > +   else if (!ctx->direct_io)
> > > > > rc = cifs_readdata_to_iov(rdata,
> > > > > to);
> > > > >
> > > > > /* if there was a short read -- discard 
> > > > > anything left */
> > > > > if (rdata->got_bytes && rdata->got_bytes < 
> > > > > rdata->bytes)
> > > > > rc = -ENODATA;
> > > > > +
> > > > > +   ctx->total_len += rdata->got_bytes;
> > > > > }
> > > > > list_del_init(>list);
> > > > > kref_put(>refcount, 
> > > > > cifs_uncached_readdata_release);
> > > > > }
> > > > >
> > > > > -   for (i = 0; i < ctx->npages; i++) {
> > > > > -   if (ctx->should_dirty)
> > > > > -   set_page_dirty(ctx->bv[i].bv_page);
> > > > > -   put_page(ctx->bv[i].bv_page);
> > > > > -   }
> > > > > +   if (!ctx->direct_io) {
> > > > > +   for (i = 0; i < ctx->npages; i++) {
> > > > > +   if (ctx->should_dirty)
> > > > > +   set_page_dirty(ctx->bv[i].bv_page);
> > > > > +   put_page(ctx->bv[i].bv_page);
> > > > > +   }
> > > > >
> > > > > -   ctx->total_len = ctx->len - iov_iter_count(to);
> > > > > +   ctx->total_len = ctx->len - iov_iter_count(to);
> > > > > +   }
> > > > >
> > > > > cifs_stats_bytes_read(tcon, ctx->total_len);
> > > > >
> > > > > @@ -3267,18 +3383,27 @@ collect_uncached_read_data(struct
> > > > cifs_aio_ctx *ctx)
> > > > > complete(>done);
> > > >
> > > > and then ctx->rc is set to -EBUSY too.
> > > >
> > > > >  }
> > > > >
> > > > > -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > > > > +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to,
> > > > > +bool
> > > > > +direct)
> > > > >  {
> > > > > -   struct file *file = iocb->ki_filp;
> > > > > -   ssize_t rc;
> > > > > size_t len;
> > > > > -   ssize_t total_read = 0;
> > > > > -   loff_t offset = iocb->ki_pos;
> > > > > +   struct file *file = iocb->ki_filp;
> > > > > struct cifs_sb_info *cifs_sb;
> > > > > -   struct cifs_tcon *tcon;
> > > > > struct cifsFileInfo *cfile;
> > > > > +   struct cifs_tcon *tcon;
> > > > > +   ssize_t rc, total_read = 0;
> > > > > +   loff_t offset = iocb->ki_pos;
> > > > > struct cifs_aio_ctx *ctx;
> > > > >
> > > > > +   /*
> > > > > +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > > > > +* fall back to data copy read path
> > > > > +* this could be improved by getting pages directly in 
> > > > > ITER_KVEC
> > > > > +*/
> > > > > +   if (direct && to->type & ITER_KVEC) {
> > > > > +   cifs_dbg(FYI, "use non-direct cifs_user_readv for 
> > > > > kvec I/O\n");
> > > > > +   direct = false;
> > > > > +   }
> > > > > +
> > > > > len = iov_iter_count(to);
> > > > > if (!len)
> > > > > return 0;
> > > > > @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb
> > > > > *iocb,
> > > > struct iov_iter *to)
> > > > > if (to->type == ITER_IOVEC)
> > > > > ctx->should_dirty = true;
> > > > >
> > > > > -   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > > > -   if (rc) {
> > > > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > > > -   return rc;
> > > > > +   if (direct) {
> > > > > +   ctx->pos = offset;
> > > > > +   ctx->direct_io = true;
> > > > > +   ctx->iter = *to;
> > > > > +   ctx->len = len;
> > > > > +   } else {
> > > > > +   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > > > +   if (rc) {
> > > > > +   kref_put(>refcount, 
> > > > > cifs_aio_ctx_release);
> > > > > +   return rc;
> > > > > +   }
> > > > > +   len = ctx->len;
> > > > > }
> > > > >
> > > > > -   len = ctx->len;
> > > > > -
> > > > > /* grab a lock here due to read response handlers can access 
> > > > > ctx */
> > > > > mutex_lock(>aio_mutex);
> > > > >
> > > > > @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > > > struct iov_iter *to)
> > > > > return rc;
> > > >
> > > > In case of a blocking read we will return -EBUSY to the caller but
> > > > read man page doesn't say anything about a possibility to return
> > > > this error code. Is it being mapped somehow by VFS or is there
> > > > another consideration? The same question applies to the write patch as
> > well.
> > >
> > > Thanks Pavel,  Yes this is a bug.
> > >
> > > In practice, the retry logic will rarely get hit. If getting hit, 
> > > normally the
> > server will give us far more credits for resending this whole packet.
> > >
> > > How about just retrying forever (instead of trying 3 times) on this resend
> > packet? This will change the code to the same behavior before direct I/O
> > patches.
> > >
> > > e.g.:
> > >
> > > /*
> > >  * Try to resend this wdata, waiting for credits before sending
> > >  * Note: we are attempting to resend the whole wdata not in
> > segments
> > >  */
> > > do {
> > > rc = server->ops->wait_mtu_credits(
> > > server, wdata->bytes, , );
> > >
> > > if (rc)
> > > break;
> > >
> > > if (wsize < wdata->bytes) {
> > > add_credits_and_wake_if(server, credits, 0);
> > > msleep(1000);
> > > }
> > > } while (wsize < wdata->bytes);
> > >
> >
> > We can do that but it won't be the same behavior because we were using a
> > partial send:
>
> Yes it's a little different.
>
> I don't see any ill effects of doing this, other than we may wait for a 
> little longer on a full credit to send the whole packet. In practice the 
> server always offers a full credit on the first call to wait_mtu_credits(), I 
> am yet to see the while loop actually get looped.
>
> The resend path is rarely getting hit in stress tests on TCP, I estimate 
> maybe less than 0.001% on my test setup, the while loop for waiting credits 
> is ever harder to hit that I never see one.
>
> On RDMA, the resend path is never used.

Ok. Another way to fix this is to break the request into several
smaller ones consuming 1 credit per request - see
cifs_writev_requeue() function that does the same thing for writes.

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-29 Thread Pavel Shilovsky
   
> > > > > cifs_uncached_readdata_release);
> > > > > goto again;
> > > > > } else if (rdata->result)
> > > > > rc = rdata->result;
> > > > > -   else
> > > > > +   else if (!ctx->direct_io)
> > > > > rc = cifs_readdata_to_iov(rdata,
> > > > > to);
> > > > >
> > > > > /* if there was a short read -- discard 
> > > > > anything left */
> > > > > if (rdata->got_bytes && rdata->got_bytes < 
> > > > > rdata->bytes)
> > > > > rc = -ENODATA;
> > > > > +
> > > > > +   ctx->total_len += rdata->got_bytes;
> > > > > }
> > > > > list_del_init(>list);
> > > > > kref_put(>refcount, 
> > > > > cifs_uncached_readdata_release);
> > > > > }
> > > > >
> > > > > -   for (i = 0; i < ctx->npages; i++) {
> > > > > -   if (ctx->should_dirty)
> > > > > -   set_page_dirty(ctx->bv[i].bv_page);
> > > > > -   put_page(ctx->bv[i].bv_page);
> > > > > -   }
> > > > > +   if (!ctx->direct_io) {
> > > > > +   for (i = 0; i < ctx->npages; i++) {
> > > > > +   if (ctx->should_dirty)
> > > > > +   set_page_dirty(ctx->bv[i].bv_page);
> > > > > +   put_page(ctx->bv[i].bv_page);
> > > > > +   }
> > > > >
> > > > > -   ctx->total_len = ctx->len - iov_iter_count(to);
> > > > > +   ctx->total_len = ctx->len - iov_iter_count(to);
> > > > > +   }
> > > > >
> > > > > cifs_stats_bytes_read(tcon, ctx->total_len);
> > > > >
> > > > > @@ -3267,18 +3383,27 @@ collect_uncached_read_data(struct
> > > > cifs_aio_ctx *ctx)
> > > > > complete(>done);
> > > >
> > > > and then ctx->rc is set to -EBUSY too.
> > > >
> > > > >  }
> > > > >
> > > > > -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > > > > +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to,
> > > > > +bool
> > > > > +direct)
> > > > >  {
> > > > > -   struct file *file = iocb->ki_filp;
> > > > > -   ssize_t rc;
> > > > > size_t len;
> > > > > -   ssize_t total_read = 0;
> > > > > -   loff_t offset = iocb->ki_pos;
> > > > > +   struct file *file = iocb->ki_filp;
> > > > > struct cifs_sb_info *cifs_sb;
> > > > > -   struct cifs_tcon *tcon;
> > > > > struct cifsFileInfo *cfile;
> > > > > +   struct cifs_tcon *tcon;
> > > > > +   ssize_t rc, total_read = 0;
> > > > > +   loff_t offset = iocb->ki_pos;
> > > > > struct cifs_aio_ctx *ctx;
> > > > >
> > > > > +   /*
> > > > > +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > > > > +* fall back to data copy read path
> > > > > +* this could be improved by getting pages directly in 
> > > > > ITER_KVEC
> > > > > +*/
> > > > > +   if (direct && to->type & ITER_KVEC) {
> > > > > +   cifs_dbg(FYI, "use non-direct cifs_user_readv for 
> > > > > kvec I/O\n");
> > > > > +   direct = false;
> > > > > +   }
> > > > > +
> > > > > len = iov_iter_count(to);
> > > > > if (!len)
> > > > > return 0;
> > > > > @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb
> > > > > *iocb,
> > > > struct iov_iter *to)
> > > > > if (to->type == ITER_IOVEC)
> > > > > ctx->should_dirty = true;
> > > > >
> > > > > -   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > > > -   if (rc) {
> > > > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > > > -   return rc;
> > > > > +   if (direct) {
> > > > > +   ctx->pos = offset;
> > > > > +   ctx->direct_io = true;
> > > > > +   ctx->iter = *to;
> > > > > +   ctx->len = len;
> > > > > +   } else {
> > > > > +   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > > > +   if (rc) {
> > > > > +   kref_put(>refcount, 
> > > > > cifs_aio_ctx_release);
> > > > > +   return rc;
> > > > > +   }
> > > > > +   len = ctx->len;
> > > > > }
> > > > >
> > > > > -   len = ctx->len;
> > > > > -
> > > > > /* grab a lock here due to read response handlers can access 
> > > > > ctx */
> > > > > mutex_lock(>aio_mutex);
> > > > >
> > > > > @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > > > struct iov_iter *to)
> > > > > return rc;
> > > >
> > > > In case of a blocking read we will return -EBUSY to the caller but
> > > > read man page doesn't say anything about a possibility to return
> > > > this error code. Is it being mapped somehow by VFS or is there
> > > > another consideration? The same question applies to the write patch as
> > well.
> > >
> > > Thanks Pavel,  Yes this is a bug.
> > >
> > > In practice, the retry logic will rarely get hit. If getting hit, 
> > > normally the
> > server will give us far more credits for resending this whole packet.
> > >
> > > How about just retrying forever (instead of trying 3 times) on this resend
> > packet? This will change the code to the same behavior before direct I/O
> > patches.
> > >
> > > e.g.:
> > >
> > > /*
> > >  * Try to resend this wdata, waiting for credits before sending
> > >  * Note: we are attempting to resend the whole wdata not in
> > segments
> > >  */
> > > do {
> > > rc = server->ops->wait_mtu_credits(
> > > server, wdata->bytes, , );
> > >
> > > if (rc)
> > > break;
> > >
> > > if (wsize < wdata->bytes) {
> > > add_credits_and_wake_if(server, credits, 0);
> > > msleep(1000);
> > > }
> > > } while (wsize < wdata->bytes);
> > >
> >
> > We can do that but it won't be the same behavior because we were using a
> > partial send:
>
> Yes it's a little different.
>
> I don't see any ill effects of doing this, other than we may wait for a 
> little longer on a full credit to send the whole packet. In practice the 
> server always offers a full credit on the first call to wait_mtu_credits(), I 
> am yet to see the while loop actually get looped.
>
> The resend path is rarely getting hit in stress tests on TCP, I estimate 
> maybe less than 0.001% on my test setup, the while loop for waiting credits 
> is ever harder to hit that I never see one.
>
> On RDMA, the resend path is never used.

Ok. Another way to fix this is to break the request into several
smaller ones consuming 1 credit per request - see
cifs_writev_requeue() function that does the same thing for writes.

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-29 Thread Pavel Shilovsky
  * direct I/O
> > > +*/
> > > +   rc = cifs_resend_rdata(
> > > +   rdata,
> > > +   _list, ctx);
> >
> > ...so we set rc to -EBUSY...
> >
> >
> > > +   } else {
> > > +   rc = cifs_send_async_read(
> > > rdata->offset + got_bytes,
> > > rdata->bytes - got_bytes,
> > > rdata->cfile, cifs_sb,
> > > _list, ctx);
> > >
> > > +   kref_put(>refcount,
> > > +   
> > > cifs_uncached_readdata_release);
> > > +   }
> > > +
> > > list_splice(_list, >list);
> > >
> > > -   kref_put(>refcount,
> > > -cifs_uncached_readdata_release);
> > > goto again;
> > > } else if (rdata->result)
> > > rc = rdata->result;
> > > -   else
> > > +   else if (!ctx->direct_io)
> > > rc = cifs_readdata_to_iov(rdata, to);
> > >
> > > /* if there was a short read -- discard anything 
> > > left */
> > > if (rdata->got_bytes && rdata->got_bytes < 
> > > rdata->bytes)
> > > rc = -ENODATA;
> > > +
> > > +   ctx->total_len += rdata->got_bytes;
> > > }
> > > list_del_init(>list);
> > > kref_put(>refcount, 
> > > cifs_uncached_readdata_release);
> > > }
> > >
> > > -   for (i = 0; i < ctx->npages; i++) {
> > > -   if (ctx->should_dirty)
> > > -   set_page_dirty(ctx->bv[i].bv_page);
> > > -   put_page(ctx->bv[i].bv_page);
> > > -   }
> > > +   if (!ctx->direct_io) {
> > > +   for (i = 0; i < ctx->npages; i++) {
> > > +   if (ctx->should_dirty)
> > > +   set_page_dirty(ctx->bv[i].bv_page);
> > > +   put_page(ctx->bv[i].bv_page);
> > > +   }
> > >
> > > -   ctx->total_len = ctx->len - iov_iter_count(to);
> > > +   ctx->total_len = ctx->len - iov_iter_count(to);
> > > +   }
> > >
> > > cifs_stats_bytes_read(tcon, ctx->total_len);
> > >
> > > @@ -3267,18 +3383,27 @@ collect_uncached_read_data(struct
> > cifs_aio_ctx *ctx)
> > > complete(>done);
> >
> > and then ctx->rc is set to -EBUSY too.
> >
> > >  }
> > >
> > > -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > > +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to, bool
> > > +direct)
> > >  {
> > > -   struct file *file = iocb->ki_filp;
> > > -   ssize_t rc;
> > > size_t len;
> > > -   ssize_t total_read = 0;
> > > -   loff_t offset = iocb->ki_pos;
> > > +   struct file *file = iocb->ki_filp;
> > > struct cifs_sb_info *cifs_sb;
> > > -   struct cifs_tcon *tcon;
> > > struct cifsFileInfo *cfile;
> > > +   struct cifs_tcon *tcon;
> > > +   ssize_t rc, total_read = 0;
> > > +   loff_t offset = iocb->ki_pos;
> > > struct cifs_aio_ctx *ctx;
> > >
> > > +   /*
> > > +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > > +* fall back to data copy read path
> > > +* this could be improved by getting pages directly in ITER_KVEC
> > > +*/
> > > +   if (direct && to->type & ITER_KVEC) {
> > > +   cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec 
> > > I/O\n");
> > > +   direct = false;
> > > +   }
> > > +
> > > len = iov_iter_count(to);
> > > if (!len)
> > > return 0;
> > > @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > struct iov_iter *to)
> > > if (to->type == ITER_IOVEC)
> > > ctx->should_dirty = true;
> > >
> > > -   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > -   if (rc) {
> > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > -   return rc;
> > > +   if (direct) {
> > > +   ctx->pos = offset;
> > > +   ctx->direct_io = true;
> > > +   ctx->iter = *to;
> > > +   ctx->len = len;
> > > +   } else {
> > > +   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > +   if (rc) {
> > > +   kref_put(>refcount, cifs_aio_ctx_release);
> > > +   return rc;
> > > +   }
> > > +   len = ctx->len;
> > > }
> > >
> > > -   len = ctx->len;
> > > -
> > > /* grab a lock here due to read response handlers can access ctx 
> > > */
> > > mutex_lock(>aio_mutex);
> > >
> > > @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > struct iov_iter *to)
> > > return rc;
> >
> > In case of a blocking read we will return -EBUSY to the caller but read man
> > page doesn't say anything about a possibility to return this error code. Is 
> > it
> > being mapped somehow by VFS or is there another consideration? The same
> > question applies to the write patch as well.
>
> Thanks Pavel,  Yes this is a bug.
>
> In practice, the retry logic will rarely get hit. If getting hit, normally 
> the server will give us far more credits for resending this whole packet.
>
> How about just retrying forever (instead of trying 3 times) on this resend 
> packet? This will change the code to the same behavior before direct I/O 
> patches.
>
> e.g.:
>
> /*
>  * Try to resend this wdata, waiting for credits before sending
>  * Note: we are attempting to resend the whole wdata not in segments
>  */
> do {
> rc = server->ops->wait_mtu_credits(
> server, wdata->bytes, , );
>
> if (rc)
> break;
>
> if (wsize < wdata->bytes) {
> add_credits_and_wake_if(server, credits, 0);
> msleep(1000);
> }
> } while (wsize < wdata->bytes);
>

We can do that but it won't be the same behavior because we were using
a partial send:

+   if (ctx->direct_io) {
+   /*
+* Re-use rdata as this is a
+* direct I/O
+*/
+   rc = cifs_resend_rdata(
+   rdata,
+   _list, ctx);
+   } else {
+   rc = cifs_send_async_read(
rdata->offset + got_bytes,
rdata->bytes - got_bytes,
rdata->cfile, cifs_sb,
_list, ctx);

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-29 Thread Pavel Shilovsky
  * direct I/O
> > > +*/
> > > +   rc = cifs_resend_rdata(
> > > +   rdata,
> > > +   _list, ctx);
> >
> > ...so we set rc to -EBUSY...
> >
> >
> > > +   } else {
> > > +   rc = cifs_send_async_read(
> > > rdata->offset + got_bytes,
> > > rdata->bytes - got_bytes,
> > > rdata->cfile, cifs_sb,
> > > _list, ctx);
> > >
> > > +   kref_put(>refcount,
> > > +   
> > > cifs_uncached_readdata_release);
> > > +   }
> > > +
> > > list_splice(_list, >list);
> > >
> > > -   kref_put(>refcount,
> > > -cifs_uncached_readdata_release);
> > > goto again;
> > > } else if (rdata->result)
> > > rc = rdata->result;
> > > -   else
> > > +   else if (!ctx->direct_io)
> > > rc = cifs_readdata_to_iov(rdata, to);
> > >
> > > /* if there was a short read -- discard anything 
> > > left */
> > > if (rdata->got_bytes && rdata->got_bytes < 
> > > rdata->bytes)
> > > rc = -ENODATA;
> > > +
> > > +   ctx->total_len += rdata->got_bytes;
> > > }
> > > list_del_init(>list);
> > > kref_put(>refcount, 
> > > cifs_uncached_readdata_release);
> > > }
> > >
> > > -   for (i = 0; i < ctx->npages; i++) {
> > > -   if (ctx->should_dirty)
> > > -   set_page_dirty(ctx->bv[i].bv_page);
> > > -   put_page(ctx->bv[i].bv_page);
> > > -   }
> > > +   if (!ctx->direct_io) {
> > > +   for (i = 0; i < ctx->npages; i++) {
> > > +   if (ctx->should_dirty)
> > > +   set_page_dirty(ctx->bv[i].bv_page);
> > > +   put_page(ctx->bv[i].bv_page);
> > > +   }
> > >
> > > -   ctx->total_len = ctx->len - iov_iter_count(to);
> > > +   ctx->total_len = ctx->len - iov_iter_count(to);
> > > +   }
> > >
> > > cifs_stats_bytes_read(tcon, ctx->total_len);
> > >
> > > @@ -3267,18 +3383,27 @@ collect_uncached_read_data(struct
> > cifs_aio_ctx *ctx)
> > > complete(>done);
> >
> > and then ctx->rc is set to -EBUSY too.
> >
> > >  }
> > >
> > > -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> > > +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to, bool
> > > +direct)
> > >  {
> > > -   struct file *file = iocb->ki_filp;
> > > -   ssize_t rc;
> > > size_t len;
> > > -   ssize_t total_read = 0;
> > > -   loff_t offset = iocb->ki_pos;
> > > +   struct file *file = iocb->ki_filp;
> > > struct cifs_sb_info *cifs_sb;
> > > -   struct cifs_tcon *tcon;
> > > struct cifsFileInfo *cfile;
> > > +   struct cifs_tcon *tcon;
> > > +   ssize_t rc, total_read = 0;
> > > +   loff_t offset = iocb->ki_pos;
> > > struct cifs_aio_ctx *ctx;
> > >
> > > +   /*
> > > +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > > +* fall back to data copy read path
> > > +* this could be improved by getting pages directly in ITER_KVEC
> > > +*/
> > > +   if (direct && to->type & ITER_KVEC) {
> > > +   cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec 
> > > I/O\n");
> > > +   direct = false;
> > > +   }
> > > +
> > > len = iov_iter_count(to);
> > > if (!len)
> > > return 0;
> > > @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > struct iov_iter *to)
> > > if (to->type == ITER_IOVEC)
> > > ctx->should_dirty = true;
> > >
> > > -   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > -   if (rc) {
> > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > -   return rc;
> > > +   if (direct) {
> > > +   ctx->pos = offset;
> > > +   ctx->direct_io = true;
> > > +   ctx->iter = *to;
> > > +   ctx->len = len;
> > > +   } else {
> > > +   rc = setup_aio_ctx_iter(ctx, to, READ);
> > > +   if (rc) {
> > > +   kref_put(>refcount, cifs_aio_ctx_release);
> > > +   return rc;
> > > +   }
> > > +   len = ctx->len;
> > > }
> > >
> > > -   len = ctx->len;
> > > -
> > > /* grab a lock here due to read response handlers can access ctx 
> > > */
> > > mutex_lock(>aio_mutex);
> > >
> > > @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb,
> > struct iov_iter *to)
> > > return rc;
> >
> > In case of a blocking read we will return -EBUSY to the caller but read man
> > page doesn't say anything about a possibility to return this error code. Is 
> > it
> > being mapped somehow by VFS or is there another consideration? The same
> > question applies to the write patch as well.
>
> Thanks Pavel,  Yes this is a bug.
>
> In practice, the retry logic will rarely get hit. If getting hit, normally 
> the server will give us far more credits for resending this whole packet.
>
> How about just retrying forever (instead of trying 3 times) on this resend 
> packet? This will change the code to the same behavior before direct I/O 
> patches.
>
> e.g.:
>
> /*
>  * Try to resend this wdata, waiting for credits before sending
>  * Note: we are attempting to resend the whole wdata not in segments
>  */
> do {
> rc = server->ops->wait_mtu_credits(
> server, wdata->bytes, , );
>
> if (rc)
> break;
>
> if (wsize < wdata->bytes) {
> add_credits_and_wake_if(server, credits, 0);
> msleep(1000);
> }
> } while (wsize < wdata->bytes);
>

We can do that but it won't be the same behavior because we were using
a partial send:

+   if (ctx->direct_io) {
+   /*
+* Re-use rdata as this is a
+* direct I/O
+*/
+   rc = cifs_resend_rdata(
+   rdata,
+   _list, ctx);
+   } else {
+   rc = cifs_send_async_read(
rdata->offset + got_bytes,
rdata->bytes - got_bytes,
rdata->cfile, cifs_sb,
_list, ctx);

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 2/3] CIFS: Add support for direct I/O write

2018-11-29 Thread Pavel Shilovsky
t; > +static ssize_t __cifs_writev(struct kiocb *iocb, struct iov_iter
> > > +*from, bool direct)
> > >  {
> > > struct file *file = iocb->ki_filp;
> > > ssize_t total_written = 0;
> > > @@ -2712,13 +2803,18 @@ ssize_t cifs_user_writev(struct kiocb *iocb,
> > struct iov_iter *from)
> > > struct cifs_sb_info *cifs_sb;
> > > struct cifs_aio_ctx *ctx;
> > > struct iov_iter saved_from = *from;
> > > +   size_t len = iov_iter_count(from);
> > > int rc;
> > >
> > > /*
> > > -* BB - optimize the way when signing is disabled. We can drop 
> > > this
> > > -* extra memory-to-memory copying and use iovec buffers for
> > constructing
> > > -* write request.
> > > +* iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> > > +* In this case, fall back to non-direct write function.
> > > +* this could be improved by getting pages directly in
> > > + ITER_KVEC
> > >  */
> > > +   if (direct && from->type & ITER_KVEC) {
> > > +   cifs_dbg(FYI, "use non-direct cifs_writev for kvec 
> > > I/O\n");
> > > +   direct = false;
> > > +   }
> > >
> > > rc = generic_write_checks(iocb, from);
> > > if (rc <= 0)
> > > @@ -2742,10 +2838,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb,
> > > struct iov_iter *from)
> > >
> > > ctx->pos = iocb->ki_pos;
> > >
> > > -   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> > > -   if (rc) {
> > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > -   return rc;
> > > +   if (direct) {
> > > +   ctx->direct_io = true;
> > > +   ctx->iter = *from;
> > > +   ctx->len = len;
> > > +   } else {
> > > +   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> > > +   if (rc) {
> > > +   kref_put(>refcount, cifs_aio_ctx_release);
> > > +   return rc;
> > > +   }
> > > }
> > >
> > > /* grab a lock here due to read response handlers can access
> > > ctx */ @@ -2795,6 +2897,16 @@ ssize_t cifs_user_writev(struct kiocb
> > *iocb, struct iov_iter *from)
> > > return total_written;
> > >  }
> > >
> > > +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> > > +{
> > > +   return __cifs_writev(iocb, from, true); }
> > > +
> > > +ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from) {
> > > +   return __cifs_writev(iocb, from, false); }
> > > +
> > >  static ssize_t
> > >  cifs_writev(struct kiocb *iocb, struct iov_iter *from)  {
> > > --
> > > 2.7.4
> > >
> >
> > Did you measure the performance benefit of using O_DIRECT with your
> > patches for non-RDMA mode?
>
> Here are some of the results measured on TCP/IP over IPoIB 40G Infiniband. 
> Please note that IPoIB is done in software so it's slower than a 40G Ethernet 
> NIC.
>
> All servers are running 2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. Tested 
> on FIO 64k read or write with queue depths 1 or 16. All the tests are running 
> with 1 FIO job, and with "direct=1".
>
> Without direct I/O:
> read 64k d1 113669KB/s
> read 64k d16618428KB/s
> write 64k d1134911KB/s
> write 64k d16   457005KB/s
>
> With direct I/O:
> read 64k d1 127134KB/s
> read 64k d16714629KB/s
> write 64k d1129268KB/s
> write 64k d16   461054KB/s
>
> Direct I/O improvement:
> read 64k d1 11%
> read 64k d1615%
> write 64k d1-5%
  ^^^
This is strange. Is it consistent across multiple runs?

> write 64k d16   1%

Good read performance improvements overall and it seems write
performance results need more investigations.

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 2/3] CIFS: Add support for direct I/O write

2018-11-29 Thread Pavel Shilovsky
t; > +static ssize_t __cifs_writev(struct kiocb *iocb, struct iov_iter
> > > +*from, bool direct)
> > >  {
> > > struct file *file = iocb->ki_filp;
> > > ssize_t total_written = 0;
> > > @@ -2712,13 +2803,18 @@ ssize_t cifs_user_writev(struct kiocb *iocb,
> > struct iov_iter *from)
> > > struct cifs_sb_info *cifs_sb;
> > > struct cifs_aio_ctx *ctx;
> > > struct iov_iter saved_from = *from;
> > > +   size_t len = iov_iter_count(from);
> > > int rc;
> > >
> > > /*
> > > -* BB - optimize the way when signing is disabled. We can drop 
> > > this
> > > -* extra memory-to-memory copying and use iovec buffers for
> > constructing
> > > -* write request.
> > > +* iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> > > +* In this case, fall back to non-direct write function.
> > > +* this could be improved by getting pages directly in
> > > + ITER_KVEC
> > >  */
> > > +   if (direct && from->type & ITER_KVEC) {
> > > +   cifs_dbg(FYI, "use non-direct cifs_writev for kvec 
> > > I/O\n");
> > > +   direct = false;
> > > +   }
> > >
> > > rc = generic_write_checks(iocb, from);
> > > if (rc <= 0)
> > > @@ -2742,10 +2838,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb,
> > > struct iov_iter *from)
> > >
> > > ctx->pos = iocb->ki_pos;
> > >
> > > -   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> > > -   if (rc) {
> > > -   kref_put(>refcount, cifs_aio_ctx_release);
> > > -   return rc;
> > > +   if (direct) {
> > > +   ctx->direct_io = true;
> > > +   ctx->iter = *from;
> > > +   ctx->len = len;
> > > +   } else {
> > > +   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> > > +   if (rc) {
> > > +   kref_put(>refcount, cifs_aio_ctx_release);
> > > +   return rc;
> > > +   }
> > > }
> > >
> > > /* grab a lock here due to read response handlers can access
> > > ctx */ @@ -2795,6 +2897,16 @@ ssize_t cifs_user_writev(struct kiocb
> > *iocb, struct iov_iter *from)
> > > return total_written;
> > >  }
> > >
> > > +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> > > +{
> > > +   return __cifs_writev(iocb, from, true); }
> > > +
> > > +ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from) {
> > > +   return __cifs_writev(iocb, from, false); }
> > > +
> > >  static ssize_t
> > >  cifs_writev(struct kiocb *iocb, struct iov_iter *from)  {
> > > --
> > > 2.7.4
> > >
> >
> > Did you measure the performance benefit of using O_DIRECT with your
> > patches for non-RDMA mode?
>
> Here are some of the results measured on TCP/IP over IPoIB 40G Infiniband. 
> Please note that IPoIB is done in software so it's slower than a 40G Ethernet 
> NIC.
>
> All servers are running 2 X Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. Tested 
> on FIO 64k read or write with queue depths 1 or 16. All the tests are running 
> with 1 FIO job, and with "direct=1".
>
> Without direct I/O:
> read 64k d1 113669KB/s
> read 64k d16618428KB/s
> write 64k d1134911KB/s
> write 64k d16   457005KB/s
>
> With direct I/O:
> read 64k d1 127134KB/s
> read 64k d16714629KB/s
> write 64k d1129268KB/s
> write 64k d16   461054KB/s
>
> Direct I/O improvement:
> read 64k d1 11%
> read 64k d1615%
> write 64k d1-5%
  ^^^
This is strange. Is it consistent across multiple runs?

> write 64k d16   1%

Good read performance improvements overall and it seems write
performance results need more investigations.

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 2/3] CIFS: Add support for direct I/O write

2018-11-16 Thread Pavel Shilovsky
  wdata->credits = credits;
> wdata->ctx = ctx;
> kref_get(>refcount);
> @@ -2668,13 +2754,17 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> INIT_LIST_HEAD(_list);
> list_del_init(>list);
>
> -   iov_iter_advance(_from,
> +   if (ctx->direct_io)
> +   rc = cifs_resend_wdata(wdata, 
> _list, ctx);
> +   else {
> +   iov_iter_advance(_from,
>  wdata->offset - ctx->pos);
>
> -   rc = cifs_write_from_iter(wdata->offset,
> +   rc = 
> cifs_write_from_iter(wdata->offset,
> wdata->bytes, _from,
> ctx->cfile, cifs_sb, 
> _list,
> ctx);
> +   }
>
> list_splice(_list, >list);
>
> @@ -2687,8 +2777,9 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> kref_put(>refcount, cifs_uncached_writedata_release);
> }
>
> -   for (i = 0; i < ctx->npages; i++)
> -   put_page(ctx->bv[i].bv_page);
> +   if (!ctx->direct_io)
> +   for (i = 0; i < ctx->npages; i++)
> +   put_page(ctx->bv[i].bv_page);
>
> cifs_stats_bytes_written(tcon, ctx->total_len);
> set_bit(CIFS_INO_INVALID_MAPPING, _I(dentry->d_inode)->flags);
> @@ -2703,7 +2794,7 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> complete(>done);
>  }
>
> -ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t __cifs_writev(struct kiocb *iocb, struct iov_iter *from, bool 
> direct)
>  {
> struct file *file = iocb->ki_filp;
> ssize_t total_written = 0;
> @@ -2712,13 +2803,18 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
> struct cifs_sb_info *cifs_sb;
> struct cifs_aio_ctx *ctx;
> struct iov_iter saved_from = *from;
> +   size_t len = iov_iter_count(from);
> int rc;
>
> /*
> -* BB - optimize the way when signing is disabled. We can drop this
> -* extra memory-to-memory copying and use iovec buffers for 
> constructing
> -* write request.
> +* iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> +* In this case, fall back to non-direct write function.
> +* this could be improved by getting pages directly in ITER_KVEC
>  */
> +   if (direct && from->type & ITER_KVEC) {
> +   cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
> +   direct = false;
> +   }
>
> rc = generic_write_checks(iocb, from);
> if (rc <= 0)
> @@ -2742,10 +2838,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
>
> ctx->pos = iocb->ki_pos;
>
> -   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> -   if (rc) {
> -   kref_put(>refcount, cifs_aio_ctx_release);
> -   return rc;
> +   if (direct) {
> +   ctx->direct_io = true;
> +   ctx->iter = *from;
> +   ctx->len = len;
> +   } else {
> +   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> }
>
> /* grab a lock here due to read response handlers can access ctx */
> @@ -2795,6 +2897,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
> return total_written;
>  }
>
> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> +   return __cifs_writev(iocb, from, true);
> +}
> +
> +ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> +   return __cifs_writev(iocb, from, false);
> +}
> +
>  static ssize_t
>  cifs_writev(struct kiocb *iocb, struct iov_iter *from)
>  {
> --
> 2.7.4
>

Did you measure the performance benefit of using O_DIRECT with your
patches for non-RDMA mode?

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 2/3] CIFS: Add support for direct I/O write

2018-11-16 Thread Pavel Shilovsky
  wdata->credits = credits;
> wdata->ctx = ctx;
> kref_get(>refcount);
> @@ -2668,13 +2754,17 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> INIT_LIST_HEAD(_list);
> list_del_init(>list);
>
> -   iov_iter_advance(_from,
> +   if (ctx->direct_io)
> +   rc = cifs_resend_wdata(wdata, 
> _list, ctx);
> +   else {
> +   iov_iter_advance(_from,
>  wdata->offset - ctx->pos);
>
> -   rc = cifs_write_from_iter(wdata->offset,
> +   rc = 
> cifs_write_from_iter(wdata->offset,
> wdata->bytes, _from,
> ctx->cfile, cifs_sb, 
> _list,
> ctx);
> +   }
>
> list_splice(_list, >list);
>
> @@ -2687,8 +2777,9 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> kref_put(>refcount, cifs_uncached_writedata_release);
> }
>
> -   for (i = 0; i < ctx->npages; i++)
> -   put_page(ctx->bv[i].bv_page);
> +   if (!ctx->direct_io)
> +   for (i = 0; i < ctx->npages; i++)
> +   put_page(ctx->bv[i].bv_page);
>
> cifs_stats_bytes_written(tcon, ctx->total_len);
> set_bit(CIFS_INO_INVALID_MAPPING, _I(dentry->d_inode)->flags);
> @@ -2703,7 +2794,7 @@ static void collect_uncached_write_data(struct 
> cifs_aio_ctx *ctx)
> complete(>done);
>  }
>
> -ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t __cifs_writev(struct kiocb *iocb, struct iov_iter *from, bool 
> direct)
>  {
> struct file *file = iocb->ki_filp;
> ssize_t total_written = 0;
> @@ -2712,13 +2803,18 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
> struct cifs_sb_info *cifs_sb;
> struct cifs_aio_ctx *ctx;
> struct iov_iter saved_from = *from;
> +   size_t len = iov_iter_count(from);
> int rc;
>
> /*
> -* BB - optimize the way when signing is disabled. We can drop this
> -* extra memory-to-memory copying and use iovec buffers for 
> constructing
> -* write request.
> +* iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> +* In this case, fall back to non-direct write function.
> +* this could be improved by getting pages directly in ITER_KVEC
>  */
> +   if (direct && from->type & ITER_KVEC) {
> +   cifs_dbg(FYI, "use non-direct cifs_writev for kvec I/O\n");
> +   direct = false;
> +   }
>
> rc = generic_write_checks(iocb, from);
> if (rc <= 0)
> @@ -2742,10 +2838,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
>
> ctx->pos = iocb->ki_pos;
>
> -   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> -   if (rc) {
> -   kref_put(>refcount, cifs_aio_ctx_release);
> -   return rc;
> +   if (direct) {
> +   ctx->direct_io = true;
> +   ctx->iter = *from;
> +   ctx->len = len;
> +   } else {
> +   rc = setup_aio_ctx_iter(ctx, from, WRITE);
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> }
>
> /* grab a lock here due to read response handlers can access ctx */
> @@ -2795,6 +2897,16 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct 
> iov_iter *from)
> return total_written;
>  }
>
> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> +   return __cifs_writev(iocb, from, true);
> +}
> +
> +ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> +   return __cifs_writev(iocb, from, false);
> +}
> +
>  static ssize_t
>  cifs_writev(struct kiocb *iocb, struct iov_iter *from)
>  {
> --
> 2.7.4
>

Did you measure the performance benefit of using O_DIRECT with your
patches for non-RDMA mode?

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-16 Thread Pavel Shilovsky
Y too.

>  }
>
> -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to, bool direct)
>  {
> -   struct file *file = iocb->ki_filp;
> -   ssize_t rc;
> size_t len;
> -   ssize_t total_read = 0;
> -   loff_t offset = iocb->ki_pos;
> +   struct file *file = iocb->ki_filp;
> struct cifs_sb_info *cifs_sb;
> -   struct cifs_tcon *tcon;
> struct cifsFileInfo *cfile;
> +   struct cifs_tcon *tcon;
> +   ssize_t rc, total_read = 0;
> +   loff_t offset = iocb->ki_pos;
> struct cifs_aio_ctx *ctx;
>
> +   /*
> +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> +* fall back to data copy read path
> +* this could be improved by getting pages directly in ITER_KVEC
> +*/
> +   if (direct && to->type & ITER_KVEC) {
> +   cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec 
> I/O\n");
> +   direct = false;
> +   }
> +
> len = iov_iter_count(to);
> if (!len)
> return 0;
> @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct 
> iov_iter *to)
> if (to->type == ITER_IOVEC)
> ctx->should_dirty = true;
>
> -   rc = setup_aio_ctx_iter(ctx, to, READ);
> -   if (rc) {
> -   kref_put(>refcount, cifs_aio_ctx_release);
> -   return rc;
> +   if (direct) {
> +   ctx->pos = offset;
> +   ctx->direct_io = true;
> +   ctx->iter = *to;
> +   ctx->len = len;
> +   } else {
> +   rc = setup_aio_ctx_iter(ctx, to, READ);
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> +   len = ctx->len;
> }
>
> -   len = ctx->len;
> -
> /* grab a lock here due to read response handlers can access ctx */
> mutex_lock(>aio_mutex);
>
> @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct 
> iov_iter *to)
> return rc;

In case of a blocking read we will return -EBUSY to the caller but
read man page doesn't say anything about a possibility to return this
error code. Is it being mapped somehow by VFS or is there another
consideration? The same question applies to the write patch as well.

>  }
>
> +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> +   return __cifs_readv(iocb, to, true);
> +}
> +
> +ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> +   return __cifs_readv(iocb, to, false);
> +}
> +
>  ssize_t
>  cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
>  {
> --
> 2.7.4
>

--
Best regards,
Pavel Shilovsky


Re: [Patch v4 1/3] CIFS: Add support for direct I/O read

2018-11-16 Thread Pavel Shilovsky
Y too.

>  }
>
> -ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> +ssize_t __cifs_readv(struct kiocb *iocb, struct iov_iter *to, bool direct)
>  {
> -   struct file *file = iocb->ki_filp;
> -   ssize_t rc;
> size_t len;
> -   ssize_t total_read = 0;
> -   loff_t offset = iocb->ki_pos;
> +   struct file *file = iocb->ki_filp;
> struct cifs_sb_info *cifs_sb;
> -   struct cifs_tcon *tcon;
> struct cifsFileInfo *cfile;
> +   struct cifs_tcon *tcon;
> +   ssize_t rc, total_read = 0;
> +   loff_t offset = iocb->ki_pos;
> struct cifs_aio_ctx *ctx;
>
> +   /*
> +* iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> +* fall back to data copy read path
> +* this could be improved by getting pages directly in ITER_KVEC
> +*/
> +   if (direct && to->type & ITER_KVEC) {
> +   cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec 
> I/O\n");
> +   direct = false;
> +   }
> +
> len = iov_iter_count(to);
> if (!len)
> return 0;
> @@ -3305,14 +3430,20 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct 
> iov_iter *to)
> if (to->type == ITER_IOVEC)
> ctx->should_dirty = true;
>
> -   rc = setup_aio_ctx_iter(ctx, to, READ);
> -   if (rc) {
> -   kref_put(>refcount, cifs_aio_ctx_release);
> -   return rc;
> +   if (direct) {
> +   ctx->pos = offset;
> +   ctx->direct_io = true;
> +   ctx->iter = *to;
> +   ctx->len = len;
> +   } else {
> +   rc = setup_aio_ctx_iter(ctx, to, READ);
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> +   len = ctx->len;
> }
>
> -   len = ctx->len;
> -
> /* grab a lock here due to read response handlers can access ctx */
> mutex_lock(>aio_mutex);
>
> @@ -3354,6 +3485,16 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct 
> iov_iter *to)
> return rc;

In case of a blocking read we will return -EBUSY to the caller but
read man page doesn't say anything about a possibility to return this
error code. Is it being mapped somehow by VFS or is there another
consideration? The same question applies to the write patch as well.

>  }
>
> +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> +   return __cifs_readv(iocb, to, true);
> +}
> +
> +ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> +   return __cifs_readv(iocb, to, false);
> +}
> +
>  ssize_t
>  cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
>  {
> --
> 2.7.4
>

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: Print message when attempting mount

2018-10-01 Thread Pavel Shilovsky
пн, 1 окт. 2018 г. в 17:59, Rodrigo Freire :
>
> By default, no messages are printed when mounting a CIFS filesystem.
> This information is valuable when troubleshooting and/or forensic
> analyzing a system and finding out if was a CIFS endpoint mount
> attempted.
> Other filesystems such as XFS, EXT* does issue a printk() when mounting
> their filesystems.
>
> Sample output:
>
> [root@ll-rhel7 ~]# mount -o user=administrator //172.25.250.18/c$ /mnt
> (non-existent system)
>
> [root@ll-rhel7 ~]# mount -o user=administrator //172.25.250.19/c$ /mnt
> (Valid system)
>
> Kernel message log:
>
> [  450.464543] CIFS VFS: Attempting to mount //172.25.250.18/c$
> [  456.478186] CIFS VFS: Error connecting to socket. Aborting operation.
> [  456.478381] CIFS VFS: cifs_mount failed w/return code = -113
> [  467.688866] CIFS VFS: Attempting to mount //172.25.250.19/c$
>
> Signed-off-by: Rodrigo Freire 

Hi Rodrigo,

> ---
>  fs/cifs/cifsfs.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 7065426..3f5a31e 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -707,6 +707,7 @@ static int cifs_set_super(struct super_block *sb, void 
> *data)
> struct cifs_mnt_data mnt_data;
> struct dentry *root;
>
> +   cifs_dbg(VFS, "Attempting to mount %s\n", dev_name);
> cifs_dbg(FYI, "Devname: %s flags: %d\n", dev_name, flags);

Let's change the next FYI line to not contain devname thus avoiding
duplicate logging when FYI is enabled.

>
> volume_info = cifs_get_volume_info((char *)data, dev_name, is_smb3);
> --
> 1.8.3.1
>

Other than the comment above I agree with the change.

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: Print message when attempting mount

2018-10-01 Thread Pavel Shilovsky
пн, 1 окт. 2018 г. в 17:59, Rodrigo Freire :
>
> By default, no messages are printed when mounting a CIFS filesystem.
> This information is valuable when troubleshooting and/or forensic
> analyzing a system and finding out if was a CIFS endpoint mount
> attempted.
> Other filesystems such as XFS, EXT* does issue a printk() when mounting
> their filesystems.
>
> Sample output:
>
> [root@ll-rhel7 ~]# mount -o user=administrator //172.25.250.18/c$ /mnt
> (non-existent system)
>
> [root@ll-rhel7 ~]# mount -o user=administrator //172.25.250.19/c$ /mnt
> (Valid system)
>
> Kernel message log:
>
> [  450.464543] CIFS VFS: Attempting to mount //172.25.250.18/c$
> [  456.478186] CIFS VFS: Error connecting to socket. Aborting operation.
> [  456.478381] CIFS VFS: cifs_mount failed w/return code = -113
> [  467.688866] CIFS VFS: Attempting to mount //172.25.250.19/c$
>
> Signed-off-by: Rodrigo Freire 

Hi Rodrigo,

> ---
>  fs/cifs/cifsfs.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 7065426..3f5a31e 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -707,6 +707,7 @@ static int cifs_set_super(struct super_block *sb, void 
> *data)
> struct cifs_mnt_data mnt_data;
> struct dentry *root;
>
> +   cifs_dbg(VFS, "Attempting to mount %s\n", dev_name);
> cifs_dbg(FYI, "Devname: %s flags: %d\n", dev_name, flags);

Let's change the next FYI line to not contain devname thus avoiding
duplicate logging when FYI is enabled.

>
> volume_info = cifs_get_volume_info((char *)data, dev_name, is_smb3);
> --
> 1.8.3.1
>

Other than the comment above I agree with the change.

--
Best regards,
Pavel Shilovsky


Re: [PATCH V3 (resend) 4/7] CIFS: Add support for direct I/O write

2018-09-21 Thread Pavel Shilovsky
gt; +   if (!ctx)
> +   return -ENOMEM;
> +
> +   ctx->cfile = cifsFileInfo_get(cfile);
> +
> +   if (!is_sync_kiocb(iocb))
> +   ctx->iocb = iocb;
> +
> +   ctx->pos = iocb->ki_pos;
> +
> +   ctx->direct_io = true;
> +   ctx->iter = *from;
> +   ctx->len = len;
> +
> +   /* grab a lock here due to read response handlers can access ctx */
> +   mutex_lock(>aio_mutex);
> +
> +   rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from,
> + cfile, cifs_sb, >list, ctx);
> +
> +   /*
> +* If at least one write was successfully sent, then discard any rc
> +* value from the later writes. If the other write succeeds, then
> +* we'll end up returning whatever was written. If it fails, then
> +* we'll get a new rc value from that.
> +*/
> +   if (!list_empty(>list))
> +   rc = 0;
> +
> +   mutex_unlock(>aio_mutex);
> +
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> +
> +   if (!is_sync_kiocb(iocb)) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return -EIOCBQUEUED;
> +   }
> +
> +   rc = wait_for_completion_killable(>done);
> +   if (rc) {
> +   mutex_lock(>aio_mutex);
> +   ctx->rc = rc = -EINTR;
> +   total_written = ctx->total_len;
> +   mutex_unlock(>aio_mutex);
> +   } else {
> +   rc = ctx->rc;
> +   total_written = ctx->total_len;
> +   }
> +
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +
> +   if (unlikely(!total_written))
> +   return rc;
> +
> +   iocb->ki_pos += total_written;
> +   return total_written;
> +}
> +
>  ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
>  {
> struct file *file = iocb->ki_filp;
> --
> 2.7.4
>


--
Best regards,
Pavel Shilovsky


Re: [PATCH V3 (resend) 4/7] CIFS: Add support for direct I/O write

2018-09-21 Thread Pavel Shilovsky
gt; +   if (!ctx)
> +   return -ENOMEM;
> +
> +   ctx->cfile = cifsFileInfo_get(cfile);
> +
> +   if (!is_sync_kiocb(iocb))
> +   ctx->iocb = iocb;
> +
> +   ctx->pos = iocb->ki_pos;
> +
> +   ctx->direct_io = true;
> +   ctx->iter = *from;
> +   ctx->len = len;
> +
> +   /* grab a lock here due to read response handlers can access ctx */
> +   mutex_lock(>aio_mutex);
> +
> +   rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from,
> + cfile, cifs_sb, >list, ctx);
> +
> +   /*
> +* If at least one write was successfully sent, then discard any rc
> +* value from the later writes. If the other write succeeds, then
> +* we'll end up returning whatever was written. If it fails, then
> +* we'll get a new rc value from that.
> +*/
> +   if (!list_empty(>list))
> +   rc = 0;
> +
> +   mutex_unlock(>aio_mutex);
> +
> +   if (rc) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return rc;
> +   }
> +
> +   if (!is_sync_kiocb(iocb)) {
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +   return -EIOCBQUEUED;
> +   }
> +
> +   rc = wait_for_completion_killable(>done);
> +   if (rc) {
> +   mutex_lock(>aio_mutex);
> +   ctx->rc = rc = -EINTR;
> +   total_written = ctx->total_len;
> +   mutex_unlock(>aio_mutex);
> +   } else {
> +   rc = ctx->rc;
> +   total_written = ctx->total_len;
> +   }
> +
> +   kref_put(>refcount, cifs_aio_ctx_release);
> +
> +   if (unlikely(!total_written))
> +   return rc;
> +
> +   iocb->ki_pos += total_written;
> +   return total_written;
> +}
> +
>  ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
>  {
> struct file *file = iocb->ki_filp;
> --
> 2.7.4
>


--
Best regards,
Pavel Shilovsky


Re: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read

2018-09-21 Thread Pavel Shilovsky
чт, 20 сент. 2018 г. в 14:22, Long Li :
>
> From: Long Li 
>
> With direct I/O read, we transfer the data directly from transport layer to
> the user data buffer.
>
> Change in v3: add support for kernel AIO
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsfs.h   |   1 +
>  fs/cifs/cifsglob.h |   5 ++
>  fs/cifs/file.c | 210 
> +
>  3 files changed, 187 insertions(+), 29 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index f047e87..ed5479c 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -101,6 +101,7 @@ extern int cifs_open(struct inode *inode, struct file 
> *file);
>  extern int cifs_close(struct inode *inode, struct file *file);
>  extern int cifs_closedir(struct inode *inode, struct file *file);
>  extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
>  extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 9dcaed0..2131fec 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1172,6 +1172,11 @@ struct cifs_aio_ctx {
> unsigned intlen;
> unsigned inttotal_len;
> boolshould_dirty;
> +   /*
> +* Indicates if this aio_ctx is for direct_io,
> +* If yes, iter is a copy of the user passed iov_iter
> +*/
> +   booldirect_io;
>  };
>
>  struct cifs_readdata;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 8d41ca7..6a939fa 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
> kref_put(>ctx->refcount, cifs_aio_ctx_release);
> for (i = 0; i < rdata->nr_pages; i++) {
> put_page(rdata->pages[i]);
> -   rdata->pages[i] = NULL;

why is this needed?

> }
> cifs_readdata_release(refcount);
>  }
> @@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, 
> struct iov_iter *iter)
> return remaining ? -EFAULT : 0;
>  }
>
> -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
> +static void collect_uncached_read_data(struct cifs_readdata *rdata, struct 
> cifs_aio_ctx *ctx);
>
>  static void
>  cifs_uncached_readv_complete(struct work_struct *work)
> @@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
> struct cifs_readdata, work);
>
> complete(>done);
> -   collect_uncached_read_data(rdata->ctx);
> +   collect_uncached_read_data(rdata, rdata->ctx);
> /* the below call can possibly free the last ref to aio ctx */
> kref_put(>refcount, cifs_uncached_readdata_release);
>  }
> @@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
> cifsFileInfo *open_file,
> int rc;
> pid_t pid;
> struct TCP_Server_Info *server;
> +   struct page **pagevec;
> +   size_t start;
> +   struct iov_iter direct_iov = ctx->iter;
>
> server = tlink_tcon(open_file->tlink)->ses->server;
>
> @@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
> cifsFileInfo *open_file,
> else
> pid = current->tgid;
>
> +   if (ctx->direct_io)
> +   iov_iter_advance(_iov, offset - ctx->pos);
> +
> do {
> rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
>, );
> @@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, 
> struct cifsFileInfo *open_file,
> break;
>
> cur_len = min_t(const size_t, len, rsize);
> -   npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
>
> -   /* allocate a readdata struct */
> -   rdata = cifs_readdata_alloc(npages,
> +   if (ctx->direct_io) {
> +
> +   cur_len = iov_iter_get_pages_alloc(
> +   _iov, ,
> +   cur_len, );
> +   if (cur_len < 0) {
> +   cifs_dbg(VFS,
> +   "couldn't get user pages 
> (cur_len=%zd)"
> +   " iter type %d"
> +   " iov_offset %zd count %zd\n",
> +   cur_len, direct_iov.type, 
> direct_iov.iov_offset,
> +   direct_iov.count);
> +   dump_stack();
> +   break;
> +   }
> +

Re: [PATCH V3 (resend) 3/7] CIFS: Add support for direct I/O read

2018-09-21 Thread Pavel Shilovsky
чт, 20 сент. 2018 г. в 14:22, Long Li :
>
> From: Long Li 
>
> With direct I/O read, we transfer the data directly from transport layer to
> the user data buffer.
>
> Change in v3: add support for kernel AIO
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsfs.h   |   1 +
>  fs/cifs/cifsglob.h |   5 ++
>  fs/cifs/file.c | 210 
> +
>  3 files changed, 187 insertions(+), 29 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index f047e87..ed5479c 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -101,6 +101,7 @@ extern int cifs_open(struct inode *inode, struct file 
> *file);
>  extern int cifs_close(struct inode *inode, struct file *file);
>  extern int cifs_closedir(struct inode *inode, struct file *file);
>  extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>  extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
>  extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 9dcaed0..2131fec 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1172,6 +1172,11 @@ struct cifs_aio_ctx {
> unsigned intlen;
> unsigned inttotal_len;
> boolshould_dirty;
> +   /*
> +* Indicates if this aio_ctx is for direct_io,
> +* If yes, iter is a copy of the user passed iov_iter
> +*/
> +   booldirect_io;
>  };
>
>  struct cifs_readdata;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 8d41ca7..6a939fa 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
> kref_put(>ctx->refcount, cifs_aio_ctx_release);
> for (i = 0; i < rdata->nr_pages; i++) {
> put_page(rdata->pages[i]);
> -   rdata->pages[i] = NULL;

why is this needed?

> }
> cifs_readdata_release(refcount);
>  }
> @@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, 
> struct iov_iter *iter)
> return remaining ? -EFAULT : 0;
>  }
>
> -static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
> +static void collect_uncached_read_data(struct cifs_readdata *rdata, struct 
> cifs_aio_ctx *ctx);
>
>  static void
>  cifs_uncached_readv_complete(struct work_struct *work)
> @@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
> struct cifs_readdata, work);
>
> complete(>done);
> -   collect_uncached_read_data(rdata->ctx);
> +   collect_uncached_read_data(rdata, rdata->ctx);
> /* the below call can possibly free the last ref to aio ctx */
> kref_put(>refcount, cifs_uncached_readdata_release);
>  }
> @@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
> cifsFileInfo *open_file,
> int rc;
> pid_t pid;
> struct TCP_Server_Info *server;
> +   struct page **pagevec;
> +   size_t start;
> +   struct iov_iter direct_iov = ctx->iter;
>
> server = tlink_tcon(open_file->tlink)->ses->server;
>
> @@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct 
> cifsFileInfo *open_file,
> else
> pid = current->tgid;
>
> +   if (ctx->direct_io)
> +   iov_iter_advance(_iov, offset - ctx->pos);
> +
> do {
> rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
>, );
> @@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, 
> struct cifsFileInfo *open_file,
> break;
>
> cur_len = min_t(const size_t, len, rsize);
> -   npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
>
> -   /* allocate a readdata struct */
> -   rdata = cifs_readdata_alloc(npages,
> +   if (ctx->direct_io) {
> +
> +   cur_len = iov_iter_get_pages_alloc(
> +   _iov, ,
> +   cur_len, );
> +   if (cur_len < 0) {
> +   cifs_dbg(VFS,
> +   "couldn't get user pages 
> (cur_len=%zd)"
> +   " iter type %d"
> +   " iov_offset %zd count %zd\n",
> +   cur_len, direct_iov.type, 
> direct_iov.iov_offset,
> +   direct_iov.count);
> +   dump_stack();
> +   break;
> +   }
> +

Re: [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations

2018-06-07 Thread Pavel Shilovsky
2018-05-30 12:48 GMT-07:00 Long Li :
> From: Long Li 
>
> With direct read/write functions implemented, add them to file_operations.
> When mounting with "cache=none", CIFS uses direct I/O.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsfs.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 62f1662..e84f8c2 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
>  };
>
>  const struct file_operations cifs_file_direct_ops = {
> -   /* BB reevaluate whether they can be done with directio, no cache */
> -   .read_iter = cifs_user_readv,
> -   .write_iter = cifs_user_writev,
> +   .read_iter = cifs_direct_readv,
> +   .write_iter = cifs_direct_writev,

I would postpone making this change until we have asynchronous I/O
support for direct mode.

--
Best regards,
Pavel Shilovsky


Re: [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations

2018-06-07 Thread Pavel Shilovsky
2018-05-30 12:48 GMT-07:00 Long Li :
> From: Long Li 
>
> With direct read/write functions implemented, add them to file_operations.
> When mounting with "cache=none", CIFS uses direct I/O.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsfs.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 62f1662..e84f8c2 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
>  };
>
>  const struct file_operations cifs_file_direct_ops = {
> -   /* BB reevaluate whether they can be done with directio, no cache */
> -   .read_iter = cifs_user_readv,
> -   .write_iter = cifs_user_writev,
> +   .read_iter = cifs_direct_readv,
> +   .write_iter = cifs_direct_writev,

I would postpone making this change until we have asynchronous I/O
support for direct mode.

--
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: dir: fix memory leak in cifs_mknod

2018-04-20 Thread Pavel Shilovsky
2018-04-20 10:37 GMT-07:00 Steve French <smfre...@gmail.com>:
> I noticed a similar problem with the tcon link leak on that (which
> Colin and Gustavo pointed out - thank you!) but also in another return
> statement, so updated the original patch of Ronnie's merging the fixes
>
> https://git.samba.org/sfrench/cifs-2.6.git/?p=sfrench/cifs-2.6.git;a=commit;h=167bc5de08dc97695f9d5c7069c3e69f409ff80b
>
>
> Let me know if you see any problems with it.
>

Looks good.

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

--
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: dir: fix memory leak in cifs_mknod

2018-04-20 Thread Pavel Shilovsky
2018-04-20 10:37 GMT-07:00 Steve French :
> I noticed a similar problem with the tcon link leak on that (which
> Colin and Gustavo pointed out - thank you!) but also in another return
> statement, so updated the original patch of Ronnie's merging the fixes
>
> https://git.samba.org/sfrench/cifs-2.6.git/?p=sfrench/cifs-2.6.git;a=commit;h=167bc5de08dc97695f9d5c7069c3e69f409ff80b
>
>
> Let me know if you see any problems with it.
>

Looks good.

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [Patch v4] cifs: Allocate validate negotiation request through kmalloc

2018-04-20 Thread Pavel Shilovsky
2018-04-20 7:55 GMT-07:00 Tom Talpey :
> Looks good, but I have two possibly style-related comments.
>
>
> On 4/19/2018 5:38 PM, Long Li wrote:
>>
>> From: Long Li 
>>
>> The data buffer allocated on the stack can't be DMA'ed, ib_dma_map_page
>> will
>> return an invalid DMA address for a buffer on stack. Even worse, this
>> incorrect address can't be detected by ib_dma_mapping_error. Sending data
>> from this address to hardware will not fail, but the remote peer will get
>> junk data.
>>
>> Fix this by allocating the request on the heap in smb3_validate_negotiate.
>>
>> Changes in v2:
>> Removed duplicated code on freeing buffers on function exit.
>> (Thanks to Parav Pandit )
>> Fixed typo in the patch title.
>>
>> Changes in v3:
>> Added "Fixes" to the patch.
>> Changed several sizeof() to use *pointer in place of struct.
>>
>> Changes in v4:
>> Added detailed comments on the failure through RDMA.
>> Allocate request buffer using GPF_NOFS.
>> Fixed possible memory leak.
>>
>> Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks")
>> Signed-off-by: Long Li 
>> ---
>>   fs/cifs/smb2pdu.c | 61
>> ++-
>>   1 file changed, 33 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
>> index 0f044c4..caa2a1e 100644
>> --- a/fs/cifs/smb2pdu.c
>> +++ b/fs/cifs/smb2pdu.c
>> @@ -729,8 +729,8 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses
>> *ses)
>> int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon
>> *tcon)
>>   {
>> -   int rc = 0;
>> -   struct validate_negotiate_info_req vneg_inbuf;
>> +   int ret, rc = -EIO;
>
>
> Seems awkward to have "rc" and "ret", and based on the code below I
> don't see why two variables are needed. Simplify? (see later comment)
>
>
>> +   struct validate_negotiate_info_req *pneg_inbuf;
>> struct validate_negotiate_info_rsp *pneg_rsp = NULL;
>> u32 rsplen;
>> u32 inbuflen; /* max of 4 dialects */
>> @@ -741,7 +741,6 @@ int smb3_validate_negotiate(const unsigned int xid,
>> struct cifs_tcon *tcon)
>> if (tcon->ses->server->rdma)
>> return 0;
>>   #endif
>> -
>> /* In SMB3.11 preauth integrity supersedes validate negotiate */
>> if (tcon->ses->server->dialect == SMB311_PROT_ID)
>> return 0;
>> @@ -764,63 +763,67 @@ int smb3_validate_negotiate(const unsigned int xid,
>> struct cifs_tcon *tcon)
>> if (tcon->ses->session_flags & SMB2_SESSION_FLAG_IS_NULL)
>> cifs_dbg(VFS, "Unexpected null user (anonymous) auth flag
>> sent by server\n");
>>   - vneg_inbuf.Capabilities =
>> +   pneg_inbuf = kmalloc(sizeof(*pneg_inbuf), GFP_NOFS);
>> +   if (!pneg_inbuf)
>> +   return -ENOMEM;
>> +
>> +   pneg_inbuf->Capabilities =
>>
>> cpu_to_le32(tcon->ses->server->vals->req_capabilities);
>> -   memcpy(vneg_inbuf.Guid, tcon->ses->server->client_guid,
>> +   memcpy(pneg_inbuf->Guid, tcon->ses->server->client_guid,
>> SMB2_CLIENT_GUID_SIZE);
>> if (tcon->ses->sign)
>> -   vneg_inbuf.SecurityMode =
>> +   pneg_inbuf->SecurityMode =
>> cpu_to_le16(SMB2_NEGOTIATE_SIGNING_REQUIRED);
>> else if (global_secflags & CIFSSEC_MAY_SIGN)
>> -   vneg_inbuf.SecurityMode =
>> +   pneg_inbuf->SecurityMode =
>> cpu_to_le16(SMB2_NEGOTIATE_SIGNING_ENABLED);
>> else
>> -   vneg_inbuf.SecurityMode = 0;
>> +   pneg_inbuf->SecurityMode = 0;
>> if (strcmp(tcon->ses->server->vals->version_string,
>> SMB3ANY_VERSION_STRING) == 0) {
>> -   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
>> -   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
>> -   vneg_inbuf.DialectCount = cpu_to_le16(2);
>> +   pneg_inbuf->Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
>> +   pneg_inbuf->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
>> +   pneg_inbuf->DialectCount = cpu_to_le16(2);
>> /* structure is big enough for 3 dialects, sending only 2
>> */
>> inbuflen = sizeof(struct validate_negotiate_info_req) - 2;
>
>
> The "- 2" is a little confusing here. This was existing code, but I
> suggest you change this to a sizeof (u16) construct for consistency.
> It's reducing by the length of a single Dialects[n] entry.

I would suggest to make it "- sizeof(pneg_inbuf->Dialects[0])"...

>
>> } else if (strcmp(tcon->ses->server->vals->version_string,
>> SMBDEFAULT_VERSION_STRING) == 0) {
>> -   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB21_PROT_ID);
>> -   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
>> -   

Re: [Patch v4] cifs: Allocate validate negotiation request through kmalloc

2018-04-20 Thread Pavel Shilovsky
2018-04-20 7:55 GMT-07:00 Tom Talpey :
> Looks good, but I have two possibly style-related comments.
>
>
> On 4/19/2018 5:38 PM, Long Li wrote:
>>
>> From: Long Li 
>>
>> The data buffer allocated on the stack can't be DMA'ed, ib_dma_map_page
>> will
>> return an invalid DMA address for a buffer on stack. Even worse, this
>> incorrect address can't be detected by ib_dma_mapping_error. Sending data
>> from this address to hardware will not fail, but the remote peer will get
>> junk data.
>>
>> Fix this by allocating the request on the heap in smb3_validate_negotiate.
>>
>> Changes in v2:
>> Removed duplicated code on freeing buffers on function exit.
>> (Thanks to Parav Pandit )
>> Fixed typo in the patch title.
>>
>> Changes in v3:
>> Added "Fixes" to the patch.
>> Changed several sizeof() to use *pointer in place of struct.
>>
>> Changes in v4:
>> Added detailed comments on the failure through RDMA.
>> Allocate request buffer using GPF_NOFS.
>> Fixed possible memory leak.
>>
>> Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks")
>> Signed-off-by: Long Li 
>> ---
>>   fs/cifs/smb2pdu.c | 61
>> ++-
>>   1 file changed, 33 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
>> index 0f044c4..caa2a1e 100644
>> --- a/fs/cifs/smb2pdu.c
>> +++ b/fs/cifs/smb2pdu.c
>> @@ -729,8 +729,8 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses
>> *ses)
>> int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon
>> *tcon)
>>   {
>> -   int rc = 0;
>> -   struct validate_negotiate_info_req vneg_inbuf;
>> +   int ret, rc = -EIO;
>
>
> Seems awkward to have "rc" and "ret", and based on the code below I
> don't see why two variables are needed. Simplify? (see later comment)
>
>
>> +   struct validate_negotiate_info_req *pneg_inbuf;
>> struct validate_negotiate_info_rsp *pneg_rsp = NULL;
>> u32 rsplen;
>> u32 inbuflen; /* max of 4 dialects */
>> @@ -741,7 +741,6 @@ int smb3_validate_negotiate(const unsigned int xid,
>> struct cifs_tcon *tcon)
>> if (tcon->ses->server->rdma)
>> return 0;
>>   #endif
>> -
>> /* In SMB3.11 preauth integrity supersedes validate negotiate */
>> if (tcon->ses->server->dialect == SMB311_PROT_ID)
>> return 0;
>> @@ -764,63 +763,67 @@ int smb3_validate_negotiate(const unsigned int xid,
>> struct cifs_tcon *tcon)
>> if (tcon->ses->session_flags & SMB2_SESSION_FLAG_IS_NULL)
>> cifs_dbg(VFS, "Unexpected null user (anonymous) auth flag
>> sent by server\n");
>>   - vneg_inbuf.Capabilities =
>> +   pneg_inbuf = kmalloc(sizeof(*pneg_inbuf), GFP_NOFS);
>> +   if (!pneg_inbuf)
>> +   return -ENOMEM;
>> +
>> +   pneg_inbuf->Capabilities =
>>
>> cpu_to_le32(tcon->ses->server->vals->req_capabilities);
>> -   memcpy(vneg_inbuf.Guid, tcon->ses->server->client_guid,
>> +   memcpy(pneg_inbuf->Guid, tcon->ses->server->client_guid,
>> SMB2_CLIENT_GUID_SIZE);
>> if (tcon->ses->sign)
>> -   vneg_inbuf.SecurityMode =
>> +   pneg_inbuf->SecurityMode =
>> cpu_to_le16(SMB2_NEGOTIATE_SIGNING_REQUIRED);
>> else if (global_secflags & CIFSSEC_MAY_SIGN)
>> -   vneg_inbuf.SecurityMode =
>> +   pneg_inbuf->SecurityMode =
>> cpu_to_le16(SMB2_NEGOTIATE_SIGNING_ENABLED);
>> else
>> -   vneg_inbuf.SecurityMode = 0;
>> +   pneg_inbuf->SecurityMode = 0;
>> if (strcmp(tcon->ses->server->vals->version_string,
>> SMB3ANY_VERSION_STRING) == 0) {
>> -   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
>> -   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
>> -   vneg_inbuf.DialectCount = cpu_to_le16(2);
>> +   pneg_inbuf->Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
>> +   pneg_inbuf->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
>> +   pneg_inbuf->DialectCount = cpu_to_le16(2);
>> /* structure is big enough for 3 dialects, sending only 2
>> */
>> inbuflen = sizeof(struct validate_negotiate_info_req) - 2;
>
>
> The "- 2" is a little confusing here. This was existing code, but I
> suggest you change this to a sizeof (u16) construct for consistency.
> It's reducing by the length of a single Dialects[n] entry.

I would suggest to make it "- sizeof(pneg_inbuf->Dialects[0])"...

>
>> } else if (strcmp(tcon->ses->server->vals->version_string,
>> SMBDEFAULT_VERSION_STRING) == 0) {
>> -   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB21_PROT_ID);
>> -   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
>> -   vneg_inbuf.Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
>> -   vneg_inbuf.DialectCount = 

Re: [PATCH] cifs: smb2ops: Fix NULL check in smb2_query_symlink

2018-04-13 Thread Pavel Shilovsky
2018-04-13 8:13 GMT-07:00 Gustavo A. R. Silva <gust...@embeddedor.com>:
> The current code null checks variable err_buf, which is always null
> when it is checked, hence utf16_path is free'd and the function
> returns -ENOENT everytime it is called, making it impossible for the
> execution path to reach the following code:
>
> err_buf = err_iov.iov_base;
>
> Fix this by null checking err_iov.iov_base instead of err_buf. Also,
> notice that err_buf no longer needs to be initialized to NULL.
>
> Addresses-Coverity-ID: 1467876 ("Logically dead code")
> Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error 
> parameter")
> Signed-off-by: Gustavo A. R. Silva <gust...@embeddedor.com>
> ---
>  fs/cifs/smb2ops.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index b4ae932..38ebf3f 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -1452,7 +1452,7 @@ smb2_query_symlink(const unsigned int xid, struct 
> cifs_tcon *tcon,
> struct cifs_open_parms oparms;
> struct cifs_fid fid;
> struct kvec err_iov = {NULL, 0};
> -   struct smb2_err_rsp *err_buf = NULL;
> +   struct smb2_err_rsp *err_buf;
> struct smb2_symlink_err_rsp *symlink;
> unsigned int sub_len;
> unsigned int sub_offset;
> @@ -1476,7 +1476,7 @@ smb2_query_symlink(const unsigned int xid, struct 
> cifs_tcon *tcon,
>
> rc = SMB2_open(xid, , utf16_path, , NULL, _iov);
>
> -   if (!rc || !err_buf) {
> +   if (!rc || !err_iov.iov_base) {
> kfree(utf16_path);
> return -ENOENT;
> }
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks!

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

--
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: smb2ops: Fix NULL check in smb2_query_symlink

2018-04-13 Thread Pavel Shilovsky
2018-04-13 8:13 GMT-07:00 Gustavo A. R. Silva :
> The current code null checks variable err_buf, which is always null
> when it is checked, hence utf16_path is free'd and the function
> returns -ENOENT everytime it is called, making it impossible for the
> execution path to reach the following code:
>
> err_buf = err_iov.iov_base;
>
> Fix this by null checking err_iov.iov_base instead of err_buf. Also,
> notice that err_buf no longer needs to be initialized to NULL.
>
> Addresses-Coverity-ID: 1467876 ("Logically dead code")
> Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error 
> parameter")
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  fs/cifs/smb2ops.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index b4ae932..38ebf3f 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -1452,7 +1452,7 @@ smb2_query_symlink(const unsigned int xid, struct 
> cifs_tcon *tcon,
> struct cifs_open_parms oparms;
> struct cifs_fid fid;
> struct kvec err_iov = {NULL, 0};
> -   struct smb2_err_rsp *err_buf = NULL;
> +   struct smb2_err_rsp *err_buf;
> struct smb2_symlink_err_rsp *symlink;
> unsigned int sub_len;
> unsigned int sub_offset;
> @@ -1476,7 +1476,7 @@ smb2_query_symlink(const unsigned int xid, struct 
> cifs_tcon *tcon,
>
> rc = SMB2_open(xid, , utf16_path, , NULL, _iov);
>
> -   if (!rc || !err_buf) {
> +   if (!rc || !err_iov.iov_base) {
> kfree(utf16_path);
> return -ENOENT;
> }
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks!

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-03-22 Thread Pavel Shilovsky
2018-03-21 22:12 GMT-07:00 Srivatsa S. Bhat <sriva...@csail.mit.edu>:
> On 3/21/18 7:02 PM, Steve French wrote:
>> Found a patch which solves the dependency issue.  In my testing (on
>> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this
>> appears to fix the problem, but I will let Srivatsa confirm that it
>> also fixes it for him.  The two attached patches for 4.9 should work.
>>
>
> Indeed, those two patches fix the problem for me on 4.9. Thanks a lot
> Steve, Pavel and Aurelien for all your efforts in fixing this!
>
> I was also interested in getting this fixed on 4.4, so I modified the
> patches to apply on 4.4.88 and verified that they fix the mount
> failure. I have attached my patches for 4.4 with this mail.
>
> Steve, Pavel, could you kindly double-check the second patch for 4.4,
> especially around the keygen_exit error path?

The patch looks good. Thanks for the backport.

--
Best regards,
Pavel Shilovsky


Re: [PATCH 4.13 28/43] SMB3: Validate negotiate request must always be signed

2018-03-22 Thread Pavel Shilovsky
2018-03-21 22:12 GMT-07:00 Srivatsa S. Bhat :
> On 3/21/18 7:02 PM, Steve French wrote:
>> Found a patch which solves the dependency issue.  In my testing (on
>> 4.9, with Windows 2016, and also to Samba) as Pavel suggested this
>> appears to fix the problem, but I will let Srivatsa confirm that it
>> also fixes it for him.  The two attached patches for 4.9 should work.
>>
>
> Indeed, those two patches fix the problem for me on 4.9. Thanks a lot
> Steve, Pavel and Aurelien for all your efforts in fixing this!
>
> I was also interested in getting this fixed on 4.4, so I modified the
> patches to apply on 4.4.88 and verified that they fix the mount
> failure. I have attached my patches for 4.4 with this mail.
>
> Steve, Pavel, could you kindly double-check the second patch for 4.4,
> especially around the keygen_exit error path?

The patch looks good. Thanks for the backport.

--
Best regards,
Pavel Shilovsky


Re: [Patch v5 08/21] CIFS: SMBD: Upper layer reconnects to SMB Direct session

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> Do a reconnect on SMB Direct when it is used as the connection. Reconnect can
> happen for many reasons and it's mostly the decision of SMB2 upper layer.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/connect.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 2c0b34a..8ca3c13 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -406,7 +406,14 @@ cifs_reconnect(struct TCP_Server_Info *server)
>
> /* we should try only the port we connected to before */
> mutex_lock(>srv_mutex);
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   if (server->rdma)
> +   rc = smbd_reconnect(server);
> +   else
> +   rc = generic_ip_connect(server);

Minor: here and in other similar places we can remove #else part below
and put #endif before else block above.

> +#else
> rc = generic_ip_connect(server);
> +#endif
> if (rc) {
> cifs_dbg(FYI, "reconnect error %d\n", rc);
> mutex_unlock(>srv_mutex);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Best regards,
Pavel Shilovsky


Re: [Patch v5 08/21] CIFS: SMBD: Upper layer reconnects to SMB Direct session

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li :
> From: Long Li 
>
> Do a reconnect on SMB Direct when it is used as the connection. Reconnect can
> happen for many reasons and it's mostly the decision of SMB2 upper layer.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/connect.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 2c0b34a..8ca3c13 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -406,7 +406,14 @@ cifs_reconnect(struct TCP_Server_Info *server)
>
> /* we should try only the port we connected to before */
> mutex_lock(>srv_mutex);
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   if (server->rdma)
> +   rc = smbd_reconnect(server);
> +   else
> +   rc = generic_ip_connect(server);

Minor: here and in other similar places we can remove #else part below
and put #endif before else block above.

> +#else
> rc = generic_ip_connect(server);
> +#endif
> if (rc) {
> cifs_dbg(FYI, "reconnect error %d\n", rc);
> mutex_unlock(>srv_mutex);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Best regards,
Pavel Shilovsky


Re: [Patch v5 02/21] CIFS: SMBD: Establish SMB Direct connection

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> Add code to implement the core functions to establish a SMB Direct connection.
>
> 1. Establish an RDMA connection to SMB server.
> 2. Negotiate and setup SMB Direct protocol.
> 3. Implement idle connection timer and credit management.
>
> SMB Direct is enabled by setting CONFIG_CIFS_SMB_DIRECT.
>
> Add to Kconfig and Makefile to enable building SMB Direct.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/Kconfig |7 +
>  fs/cifs/Makefile|2 +
>  fs/cifs/smbdirect.c | 1576 
> +++
>  fs/cifs/smbdirect.h |  265 +
>  4 files changed, 1850 insertions(+)
>
> diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
> index f724361..cb7db8d 100644
> --- a/fs/cifs/Kconfig
> +++ b/fs/cifs/Kconfig
> @@ -191,6 +191,13 @@ config CIFS_SMB311
>   This dialect includes improved security negotiation features.
>   If unsure, say N
>
> +config CIFS_SMB_DIRECT
> +   bool "SMB Direct support (Experimental)"
> +   depends on CIFS && INFINIBAND
> +   help
> + Enables SMB Direct experimental support for SMB 3.0, 3.02 and 3.1.1.
> + SMB Direct allows transferring SMB packets over RDMA.
> +

Please add "If unsure, say N."

--
Best regards,
Pavel Shilovsky


Re: [Patch v5 02/21] CIFS: SMBD: Establish SMB Direct connection

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li :
> From: Long Li 
>
> Add code to implement the core functions to establish a SMB Direct connection.
>
> 1. Establish an RDMA connection to SMB server.
> 2. Negotiate and setup SMB Direct protocol.
> 3. Implement idle connection timer and credit management.
>
> SMB Direct is enabled by setting CONFIG_CIFS_SMB_DIRECT.
>
> Add to Kconfig and Makefile to enable building SMB Direct.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/Kconfig |7 +
>  fs/cifs/Makefile|2 +
>  fs/cifs/smbdirect.c | 1576 
> +++
>  fs/cifs/smbdirect.h |  265 +
>  4 files changed, 1850 insertions(+)
>
> diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
> index f724361..cb7db8d 100644
> --- a/fs/cifs/Kconfig
> +++ b/fs/cifs/Kconfig
> @@ -191,6 +191,13 @@ config CIFS_SMB311
>   This dialect includes improved security negotiation features.
>   If unsure, say N
>
> +config CIFS_SMB_DIRECT
> +   bool "SMB Direct support (Experimental)"
> +   depends on CIFS && INFINIBAND
> +   help
> + Enables SMB Direct experimental support for SMB 3.0, 3.02 and 3.1.1.
> + SMB Direct allows transferring SMB packets over RDMA.
> +

Please add "If unsure, say N."

--
Best regards,
Pavel Shilovsky


Re: [Patch v5 19/21] CIFS: SMBD: Read correct returned data length for RDMA write (SMB read) I/O

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> This patch is for preparing upper layer doing SMB read via RDMA write.
>
> When RDMA write is used for SMB read, the returned data length is in
> DataRemaining in the response packet. Reading it properly by adding a
> parameter to specifiy where the returned data length is.
>
> Add the defition for memory registration to wdata and return the correct
> length based on if RDMA write is used.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/cifsglob.h | 13 +++--
>  fs/cifs/cifssmb.c  |  7 ++-
>  fs/cifs/smb1ops.c  |  6 +-
>  fs/cifs/smb2ops.c  | 12 ++--
>  4 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 2ae7d02..ddf83d8 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -228,8 +228,14 @@ struct smb_version_operations {
> __u64 (*get_next_mid)(struct TCP_Server_Info *);
> /* data offset from read response message */
> unsigned int (*read_data_offset)(char *);
> -   /* data length from read response message */
> -   unsigned int (*read_data_length)(char *);
> +   /*
> +* Data length from read response message
> +* When in_remaining is true, the returned data length is in
> +* message field DataRemaining for out-of-band data read (e.g through
> +* Memory Registration RDMA write in SMBD).
> +* Otherwise, the returned data length is in message field DataLength.
> +*/
> +   unsigned int (*read_data_length)(char *, bool in_remaining);
> /* map smb to linux error */
> int (*map_error)(char *, bool);
> /* find mid corresponding to the response message */
> @@ -1148,6 +1154,9 @@ struct cifs_readdata {
> struct cifs_readdata *rdata,
> struct iov_iter *iter);
> struct kvec iov[2];
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   struct smbd_mr  *mr;
> +#endif
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 08ff56a..a343db4 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1533,8 +1533,13 @@ cifs_readv_receive(struct TCP_Server_Info *server, 
> struct mid_q_entry *mid)
>  rdata->iov[0].iov_base, server->total_read);
>
> /* how much data is in the response? */
> -   data_len = server->ops->read_data_length(buf);
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   data_len = server->ops->read_data_length(buf, rdata->mr);
> +   if (!rdata->mr && (data_offset + data_len > buflen)) {
> +#else
> +   data_len = server->ops->read_data_length(buf, false);
> if (data_offset + data_len > buflen) {
> +#endif
> /* data_len is corrupt -- discard frame */
> rdata->result = -EIO;
> return cifs_readv_discard(server, mid);
> diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
> index a723df3..b718bb8 100644
> --- a/fs/cifs/smb1ops.c
> +++ b/fs/cifs/smb1ops.c
> @@ -87,9 +87,13 @@ cifs_read_data_offset(char *buf)
>  }
>
>  static unsigned int
> -cifs_read_data_length(char *buf)
> +cifs_read_data_length(char *buf, bool in_remaining)
>  {
> READ_RSP *rsp = (READ_RSP *)buf;
> +   if (in_remaining) {
> +   cifs_dbg(VFS, "Invalid SMB1 calling path, check code\n");

I would suggest to do WARN_ON(1) instead.

> +   return 0;
> +   }
> return (le16_to_cpu(rsp->DataLengthHigh) << 16) +
>le16_to_cpu(rsp->DataLength);
>  }


--
Best regards,
Pavel Shilovsky


Re: [Patch v5 19/21] CIFS: SMBD: Read correct returned data length for RDMA write (SMB read) I/O

2017-11-01 Thread Pavel Shilovsky
2017-10-18 16:09 GMT-07:00 Long Li :
> From: Long Li 
>
> This patch is for preparing upper layer doing SMB read via RDMA write.
>
> When RDMA write is used for SMB read, the returned data length is in
> DataRemaining in the response packet. Reading it properly by adding a
> parameter to specifiy where the returned data length is.
>
> Add the defition for memory registration to wdata and return the correct
> length based on if RDMA write is used.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsglob.h | 13 +++--
>  fs/cifs/cifssmb.c  |  7 ++-
>  fs/cifs/smb1ops.c  |  6 +-
>  fs/cifs/smb2ops.c  | 12 ++--
>  4 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 2ae7d02..ddf83d8 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -228,8 +228,14 @@ struct smb_version_operations {
> __u64 (*get_next_mid)(struct TCP_Server_Info *);
> /* data offset from read response message */
> unsigned int (*read_data_offset)(char *);
> -   /* data length from read response message */
> -   unsigned int (*read_data_length)(char *);
> +   /*
> +* Data length from read response message
> +* When in_remaining is true, the returned data length is in
> +* message field DataRemaining for out-of-band data read (e.g through
> +* Memory Registration RDMA write in SMBD).
> +* Otherwise, the returned data length is in message field DataLength.
> +*/
> +   unsigned int (*read_data_length)(char *, bool in_remaining);
> /* map smb to linux error */
> int (*map_error)(char *, bool);
> /* find mid corresponding to the response message */
> @@ -1148,6 +1154,9 @@ struct cifs_readdata {
> struct cifs_readdata *rdata,
> struct iov_iter *iter);
> struct kvec iov[2];
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   struct smbd_mr  *mr;
> +#endif
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 08ff56a..a343db4 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1533,8 +1533,13 @@ cifs_readv_receive(struct TCP_Server_Info *server, 
> struct mid_q_entry *mid)
>  rdata->iov[0].iov_base, server->total_read);
>
> /* how much data is in the response? */
> -   data_len = server->ops->read_data_length(buf);
> +#ifdef CONFIG_CIFS_SMB_DIRECT
> +   data_len = server->ops->read_data_length(buf, rdata->mr);
> +   if (!rdata->mr && (data_offset + data_len > buflen)) {
> +#else
> +   data_len = server->ops->read_data_length(buf, false);
> if (data_offset + data_len > buflen) {
> +#endif
> /* data_len is corrupt -- discard frame */
> rdata->result = -EIO;
> return cifs_readv_discard(server, mid);
> diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
> index a723df3..b718bb8 100644
> --- a/fs/cifs/smb1ops.c
> +++ b/fs/cifs/smb1ops.c
> @@ -87,9 +87,13 @@ cifs_read_data_offset(char *buf)
>  }
>
>  static unsigned int
> -cifs_read_data_length(char *buf)
> +cifs_read_data_length(char *buf, bool in_remaining)
>  {
> READ_RSP *rsp = (READ_RSP *)buf;
> +   if (in_remaining) {
> +   cifs_dbg(VFS, "Invalid SMB1 calling path, check code\n");

I would suggest to do WARN_ON(1) instead.

> +   return 0;
> +   }
> return (le16_to_cpu(rsp->DataLengthHigh) << 16) +
>le16_to_cpu(rsp->DataLength);
>  }


--
Best regards,
Pavel Shilovsky


Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-30 Thread Pavel Shilovsky
2017-08-30 11:24 GMT-07:00 Long Li <lon...@microsoft.com>:
>> -Original Message-----
>> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> Sent: Wednesday, August 30, 2017 11:19 AM
>> To: Long Li <lon...@microsoft.com>
>> Cc: Steve French <sfre...@samba.org>; linux-cifs > c...@vger.kernel.org>; samba-technical <samba-techni...@lists.samba.org>;
>> Kernel Mailing List <linux-kernel@vger.kernel.org>; linux-
>> r...@vger.kernel.org; Christoph Hellwig <h...@infradead.org>; Tom Talpey
>> <ttal...@microsoft.com>; Matthew Wilcox <mawil...@microsoft.com>
>> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> transport constants
>>
>> 2017-08-29 16:00 GMT-07:00 Long Li <lon...@microsoft.com>:
>> >> -Original Message-
>> >> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> >> Sent: Tuesday, August 29, 2017 3:45 PM
>> >> To: Long Li <lon...@microsoft.com>
>> >> Cc: Steve French <sfre...@samba.org>; linux-cifs > >> c...@vger.kernel.org>; samba-technical
>> >> <samba-techni...@lists.samba.org>;
>> >> Kernel Mailing List <linux-kernel@vger.kernel.org>; linux-
>> >> r...@vger.kernel.org; Christoph Hellwig <h...@infradead.org>; Tom
>> >> Talpey <ttal...@microsoft.com>; Matthew Wilcox
>> >> <mawil...@microsoft.com>; Long Li <lon...@microsoft.com>
>> >> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> >> transport constants
>> >>
>> >> 2017-08-29 12:28 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
>> >> > From: Long Li <lon...@microsoft.com>
>> >> >
>> >> > To prepare for protocol implementation, add constants and
>> >> > user-configurable values in the SMBDirect protocol.
>> >> >
>> >> > Signed-off-by: Long Li <lon...@microsoft.com>
>> >> > ---
>> >> >  fs/cifs/smbdirect.c | 78
>> >> > +
>> >> >  fs/cifs/smbdirect.h | 20 ++
>> >> >  2 files changed, 98 insertions(+)
>> >> >  create mode 100644 fs/cifs/smbdirect.c  create mode 100644
>> >> > fs/cifs/smbdirect.h
>> >> >
>> >> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c new file
>> >> > mode
>> >> > 100644 index 000..d785bc1
>> >> > --- /dev/null
>> >> > +++ b/fs/cifs/smbdirect.c
>> >> > @@ -0,0 +1,78 @@
>> >> > +/*
>> >> > + *   Copyright (C) 2017, Microsoft Corporation.
>> >> > + *
>> >> > + *   Author(s): Long Li <lon...@microsoft.com>
>> >> > + *
>> >> > + *   This program is free software;  you can redistribute it and/or 
>> >> > modify
>> >> > + *   it under the terms of the GNU General Public License as published
>> by
>> >> > + *   the Free Software Foundation; either version 2 of the License, or
>> >> > + *   (at your option) any later version.
>> >> > + *
>> >> > + *   This program is distributed in the hope that it will be useful,
>> >> > + *   but WITHOUT ANY WARRANTY;  without even the implied warranty
>> of
>> >> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
>> >> > + *   the GNU General Public License for more details.
>> >> > + */
>> >> > +#include 
>> >> > +#include "smbdirect.h"
>> >> > +#include "cifs_debug.h"
>> >> > +
>> >> > +/* SMBD version number */
>> >> > +#define SMBD_V10x0100
>> >> > +
>> >> > +/* Port numbers for SMBD transport */
>> >> > +#define SMB_PORT   445
>> >> > +#define SMBD_PORT  5445
>> >> > +
>> >> > +/* Address lookup and resolve timeout in ms */
>> >> > +#define RDMA_RESOLVE_TIMEOUT   5000
>> >> > +
>> >> > +/* SMBD negotiation timeout in seconds */ #define
>> >> > +SMBD_NEGOTIATE_TIMEOUT 120
>> >> > +
>> >> > +/* SMBD minimum receive size and fragmented sized defined in [MS-
>> >> SMBD] */
>> >> > +#define SMBD_MIN_RECEIVE_SIZE  128
>> >> > +#define SMBD_MIN_FRAGMENTED_SIZE   131072
>> >

Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-30 Thread Pavel Shilovsky
2017-08-30 11:24 GMT-07:00 Long Li :
>> -Original Message-
>> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> Sent: Wednesday, August 30, 2017 11:19 AM
>> To: Long Li 
>> Cc: Steve French ; linux-cifs > c...@vger.kernel.org>; samba-technical ;
>> Kernel Mailing List ; linux-
>> r...@vger.kernel.org; Christoph Hellwig ; Tom Talpey
>> ; Matthew Wilcox 
>> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> transport constants
>>
>> 2017-08-29 16:00 GMT-07:00 Long Li :
>> >> -Original Message-
>> >> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> >> Sent: Tuesday, August 29, 2017 3:45 PM
>> >> To: Long Li 
>> >> Cc: Steve French ; linux-cifs > >> c...@vger.kernel.org>; samba-technical
>> >> ;
>> >> Kernel Mailing List ; linux-
>> >> r...@vger.kernel.org; Christoph Hellwig ; Tom
>> >> Talpey ; Matthew Wilcox
>> >> ; Long Li 
>> >> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> >> transport constants
>> >>
>> >> 2017-08-29 12:28 GMT-07:00 Long Li :
>> >> > From: Long Li 
>> >> >
>> >> > To prepare for protocol implementation, add constants and
>> >> > user-configurable values in the SMBDirect protocol.
>> >> >
>> >> > Signed-off-by: Long Li 
>> >> > ---
>> >> >  fs/cifs/smbdirect.c | 78
>> >> > +
>> >> >  fs/cifs/smbdirect.h | 20 ++
>> >> >  2 files changed, 98 insertions(+)
>> >> >  create mode 100644 fs/cifs/smbdirect.c  create mode 100644
>> >> > fs/cifs/smbdirect.h
>> >> >
>> >> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c new file
>> >> > mode
>> >> > 100644 index 000..d785bc1
>> >> > --- /dev/null
>> >> > +++ b/fs/cifs/smbdirect.c
>> >> > @@ -0,0 +1,78 @@
>> >> > +/*
>> >> > + *   Copyright (C) 2017, Microsoft Corporation.
>> >> > + *
>> >> > + *   Author(s): Long Li 
>> >> > + *
>> >> > + *   This program is free software;  you can redistribute it and/or 
>> >> > modify
>> >> > + *   it under the terms of the GNU General Public License as published
>> by
>> >> > + *   the Free Software Foundation; either version 2 of the License, or
>> >> > + *   (at your option) any later version.
>> >> > + *
>> >> > + *   This program is distributed in the hope that it will be useful,
>> >> > + *   but WITHOUT ANY WARRANTY;  without even the implied warranty
>> of
>> >> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
>> >> > + *   the GNU General Public License for more details.
>> >> > + */
>> >> > +#include 
>> >> > +#include "smbdirect.h"
>> >> > +#include "cifs_debug.h"
>> >> > +
>> >> > +/* SMBD version number */
>> >> > +#define SMBD_V10x0100
>> >> > +
>> >> > +/* Port numbers for SMBD transport */
>> >> > +#define SMB_PORT   445
>> >> > +#define SMBD_PORT  5445
>> >> > +
>> >> > +/* Address lookup and resolve timeout in ms */
>> >> > +#define RDMA_RESOLVE_TIMEOUT   5000
>> >> > +
>> >> > +/* SMBD negotiation timeout in seconds */ #define
>> >> > +SMBD_NEGOTIATE_TIMEOUT 120
>> >> > +
>> >> > +/* SMBD minimum receive size and fragmented sized defined in [MS-
>> >> SMBD] */
>> >> > +#define SMBD_MIN_RECEIVE_SIZE  128
>> >> > +#define SMBD_MIN_FRAGMENTED_SIZE   131072
>> >> > +
>> >> > +/*
>> >> > + * Default maximum number of RDMA read/write outstanding on this
>> >> > +connection
>> >> > + * This value is possibly decreased during QP creation on hardware
>> >> > +limit  */
>> >> > +#define SMBD_CM_RESPONDER_RESOURCES32
>> >> > +
>> >> > +/* Maximum number of retries on data transfer operations */
>> >> > +#define SMBD_CM_RETRY  6
>> >> > +/* No need to retry on Receiver Not Ready since SMBD manages
>> >> > +credits
>> >> */
>> >> > +#define SMBD_CM_RNR_RETRY  0
>> >> > +
>> >> > +/*
>> >> > + * User configurable initial values per SMBD transport connection
>> >> > + * as defined in [MS-SMBD] 3.1.1.1
>> >> > + * Those may change after a SMBD negotiation  */
>> >>
>> >> Since these value are per transport connection, why they are global?
>> >> Shouldn't they be inside a some structure that is created for a
>> >> particular connection? Also the constants below should be defines.
>> >
>> > Those are configurable initial values (default values) for all connections.
>> >
>> > Each connection has its own values based on those initial values. But
>> connection-based values can change after negotiation is done, or after
>> RDMA hardware capabilities are probed.
>>
>> If these are configurable values, let's add a way to actually configure them.
>> Through /proc?
>
> You mean putting them in /proc/fs/cifs?
>
> Or how about module_param?

I am ok with either way. If it is safe to change them without
reloading the module let's do /proc/fs/cifs which would make it easier
to use. Otherwise - module_param is also good.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-30 Thread Pavel Shilovsky
2017-08-29 16:00 GMT-07:00 Long Li <lon...@microsoft.com>:
>> -Original Message-----
>> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> Sent: Tuesday, August 29, 2017 3:45 PM
>> To: Long Li <lon...@microsoft.com>
>> Cc: Steve French <sfre...@samba.org>; linux-cifs > c...@vger.kernel.org>; samba-technical <samba-techni...@lists.samba.org>;
>> Kernel Mailing List <linux-kernel@vger.kernel.org>; linux-
>> r...@vger.kernel.org; Christoph Hellwig <h...@infradead.org>; Tom Talpey
>> <ttal...@microsoft.com>; Matthew Wilcox <mawil...@microsoft.com>;
>> Long Li <lon...@microsoft.com>
>> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> transport constants
>>
>> 2017-08-29 12:28 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
>> > From: Long Li <lon...@microsoft.com>
>> >
>> > To prepare for protocol implementation, add constants and
>> > user-configurable values in the SMBDirect protocol.
>> >
>> > Signed-off-by: Long Li <lon...@microsoft.com>
>> > ---
>> >  fs/cifs/smbdirect.c | 78
>> > +
>> >  fs/cifs/smbdirect.h | 20 ++
>> >  2 files changed, 98 insertions(+)
>> >  create mode 100644 fs/cifs/smbdirect.c  create mode 100644
>> > fs/cifs/smbdirect.h
>> >
>> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c new file mode
>> > 100644 index 000..d785bc1
>> > --- /dev/null
>> > +++ b/fs/cifs/smbdirect.c
>> > @@ -0,0 +1,78 @@
>> > +/*
>> > + *   Copyright (C) 2017, Microsoft Corporation.
>> > + *
>> > + *   Author(s): Long Li <lon...@microsoft.com>
>> > + *
>> > + *   This program is free software;  you can redistribute it and/or modify
>> > + *   it under the terms of the GNU General Public License as published by
>> > + *   the Free Software Foundation; either version 2 of the License, or
>> > + *   (at your option) any later version.
>> > + *
>> > + *   This program is distributed in the hope that it will be useful,
>> > + *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
>> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
>> > + *   the GNU General Public License for more details.
>> > + */
>> > +#include 
>> > +#include "smbdirect.h"
>> > +#include "cifs_debug.h"
>> > +
>> > +/* SMBD version number */
>> > +#define SMBD_V10x0100
>> > +
>> > +/* Port numbers for SMBD transport */
>> > +#define SMB_PORT   445
>> > +#define SMBD_PORT  5445
>> > +
>> > +/* Address lookup and resolve timeout in ms */
>> > +#define RDMA_RESOLVE_TIMEOUT   5000
>> > +
>> > +/* SMBD negotiation timeout in seconds */ #define
>> > +SMBD_NEGOTIATE_TIMEOUT 120
>> > +
>> > +/* SMBD minimum receive size and fragmented sized defined in [MS-
>> SMBD] */
>> > +#define SMBD_MIN_RECEIVE_SIZE  128
>> > +#define SMBD_MIN_FRAGMENTED_SIZE   131072
>> > +
>> > +/*
>> > + * Default maximum number of RDMA read/write outstanding on this
>> > +connection
>> > + * This value is possibly decreased during QP creation on hardware
>> > +limit  */
>> > +#define SMBD_CM_RESPONDER_RESOURCES32
>> > +
>> > +/* Maximum number of retries on data transfer operations */
>> > +#define SMBD_CM_RETRY  6
>> > +/* No need to retry on Receiver Not Ready since SMBD manages credits
>> */
>> > +#define SMBD_CM_RNR_RETRY      0
>> > +
>> > +/*
>> > + * User configurable initial values per SMBD transport connection
>> > + * as defined in [MS-SMBD] 3.1.1.1
>> > + * Those may change after a SMBD negotiation  */
>>
>> Since these value are per transport connection, why they are global?
>> Shouldn't they be inside a some structure that is created for a particular
>> connection? Also the constants below should be defines.
>
> Those are configurable initial values (default values) for all connections.
>
> Each connection has its own values based on those initial values. But 
> connection-based values can change after negotiation is done, or after RDMA 
> hardware capabilities are probed.

If these are configurable values, let's add a way to actually
configure them. Through /proc?

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-30 Thread Pavel Shilovsky
2017-08-29 16:00 GMT-07:00 Long Li :
>> -Original Message-
>> From: Pavel Shilovsky [mailto:piastr...@gmail.com]
>> Sent: Tuesday, August 29, 2017 3:45 PM
>> To: Long Li 
>> Cc: Steve French ; linux-cifs > c...@vger.kernel.org>; samba-technical ;
>> Kernel Mailing List ; linux-
>> r...@vger.kernel.org; Christoph Hellwig ; Tom Talpey
>> ; Matthew Wilcox ;
>> Long Li 
>> Subject: Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and
>> transport constants
>>
>> 2017-08-29 12:28 GMT-07:00 Long Li :
>> > From: Long Li 
>> >
>> > To prepare for protocol implementation, add constants and
>> > user-configurable values in the SMBDirect protocol.
>> >
>> > Signed-off-by: Long Li 
>> > ---
>> >  fs/cifs/smbdirect.c | 78
>> > +
>> >  fs/cifs/smbdirect.h | 20 ++
>> >  2 files changed, 98 insertions(+)
>> >  create mode 100644 fs/cifs/smbdirect.c  create mode 100644
>> > fs/cifs/smbdirect.h
>> >
>> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c new file mode
>> > 100644 index 000..d785bc1
>> > --- /dev/null
>> > +++ b/fs/cifs/smbdirect.c
>> > @@ -0,0 +1,78 @@
>> > +/*
>> > + *   Copyright (C) 2017, Microsoft Corporation.
>> > + *
>> > + *   Author(s): Long Li 
>> > + *
>> > + *   This program is free software;  you can redistribute it and/or modify
>> > + *   it under the terms of the GNU General Public License as published by
>> > + *   the Free Software Foundation; either version 2 of the License, or
>> > + *   (at your option) any later version.
>> > + *
>> > + *   This program is distributed in the hope that it will be useful,
>> > + *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
>> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
>> > + *   the GNU General Public License for more details.
>> > + */
>> > +#include 
>> > +#include "smbdirect.h"
>> > +#include "cifs_debug.h"
>> > +
>> > +/* SMBD version number */
>> > +#define SMBD_V10x0100
>> > +
>> > +/* Port numbers for SMBD transport */
>> > +#define SMB_PORT   445
>> > +#define SMBD_PORT  5445
>> > +
>> > +/* Address lookup and resolve timeout in ms */
>> > +#define RDMA_RESOLVE_TIMEOUT   5000
>> > +
>> > +/* SMBD negotiation timeout in seconds */ #define
>> > +SMBD_NEGOTIATE_TIMEOUT 120
>> > +
>> > +/* SMBD minimum receive size and fragmented sized defined in [MS-
>> SMBD] */
>> > +#define SMBD_MIN_RECEIVE_SIZE  128
>> > +#define SMBD_MIN_FRAGMENTED_SIZE   131072
>> > +
>> > +/*
>> > + * Default maximum number of RDMA read/write outstanding on this
>> > +connection
>> > + * This value is possibly decreased during QP creation on hardware
>> > +limit  */
>> > +#define SMBD_CM_RESPONDER_RESOURCES32
>> > +
>> > +/* Maximum number of retries on data transfer operations */
>> > +#define SMBD_CM_RETRY  6
>> > +/* No need to retry on Receiver Not Ready since SMBD manages credits
>> */
>> > +#define SMBD_CM_RNR_RETRY  0
>> > +
>> > +/*
>> > + * User configurable initial values per SMBD transport connection
>> > + * as defined in [MS-SMBD] 3.1.1.1
>> > + * Those may change after a SMBD negotiation  */
>>
>> Since these value are per transport connection, why they are global?
>> Shouldn't they be inside a some structure that is created for a particular
>> connection? Also the constants below should be defines.
>
> Those are configurable initial values (default values) for all connections.
>
> Each connection has its own values based on those initial values. But 
> connection-based values can change after negotiation is done, or after RDMA 
> hardware capabilities are probed.

If these are configurable values, let's add a way to actually
configure them. Through /proc?

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 12/19] CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> The channel value for requesting server remote invalidating local memory
> registration should be 0x0002
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/smb2pdu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/smb2pdu.h b/fs/cifs/smb2pdu.h
> index 18700fd..0417a36 100644
> --- a/fs/cifs/smb2pdu.h
> +++ b/fs/cifs/smb2pdu.h
> @@ -832,7 +832,7 @@ struct smb2_flush_rsp {
>  /* Channel field for read and write: exactly one of following flags can be 
> set*/
>  #define SMB2_CHANNEL_NONE  0x
>  #define SMB2_CHANNEL_RDMA_V1   0x0001 /* SMB3 or later */
> -#define SMB2_CHANNEL_RDMA_V1_INVALIDATE 0x0001 /* SMB3.02 or later */
> +#define SMB2_CHANNEL_RDMA_V1_INVALIDATE 0x0002 /* SMB3.02 or later */
>
>  /* SMB2 read request without RFC1001 length at the beginning */
>  struct smb2_read_plain_req {
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

If this is a bug in the existing code, this patch should go at the
very beginning of the series.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 12/19] CIFS: SMBD: Fix the definition for SMB2_CHANNEL_RDMA_V1_INVALIDATE

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li :
> From: Long Li 
>
> The channel value for requesting server remote invalidating local memory
> registration should be 0x0002
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/smb2pdu.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/smb2pdu.h b/fs/cifs/smb2pdu.h
> index 18700fd..0417a36 100644
> --- a/fs/cifs/smb2pdu.h
> +++ b/fs/cifs/smb2pdu.h
> @@ -832,7 +832,7 @@ struct smb2_flush_rsp {
>  /* Channel field for read and write: exactly one of following flags can be 
> set*/
>  #define SMB2_CHANNEL_NONE  0x
>  #define SMB2_CHANNEL_RDMA_V1   0x0001 /* SMB3 or later */
> -#define SMB2_CHANNEL_RDMA_V1_INVALIDATE 0x0001 /* SMB3.02 or later */
> +#define SMB2_CHANNEL_RDMA_V1_INVALIDATE 0x0002 /* SMB3.02 or later */
>
>  /* SMB2 read request without RFC1001 length at the beginning */
>  struct smb2_read_plain_req {
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

If this is a bug in the existing code, this patch should go at the
very beginning of the series.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 11/19] CIFS: SMBD: Define memory registration for I/O data

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> To prepare for RDMA read/write using memory registration, add memory
> registartion pointers to upper layer data I/O context.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/cifsglob.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index dc5404d..dcd2b63 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1166,6 +1166,7 @@ struct cifs_readdata {
> struct cifs_readdata *rdata,
> struct iov_iter *iter);
> struct kvec iov[2];
> +   struct smbd_mr  *mr;
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> @@ -1188,6 +1189,7 @@ struct cifs_writedata {
> pid_t   pid;
> unsigned intbytes;
> int result;
> +   struct smbd_mr  *mr;
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

The same thing: let's not add structure fields that the code doesn't
do anything with.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 11/19] CIFS: SMBD: Define memory registration for I/O data

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li :
> From: Long Li 
>
> To prepare for RDMA read/write using memory registration, add memory
> registartion pointers to upper layer data I/O context.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/cifsglob.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index dc5404d..dcd2b63 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1166,6 +1166,7 @@ struct cifs_readdata {
> struct cifs_readdata *rdata,
> struct iov_iter *iter);
> struct kvec iov[2];
> +   struct smbd_mr  *mr;
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> @@ -1188,6 +1189,7 @@ struct cifs_writedata {
> pid_t   pid;
> unsigned intbytes;
> int result;
> +   struct smbd_mr  *mr;
> unsigned intpagesz;
> unsigned inttailsz;
> unsigned intcredits;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

The same thing: let's not add structure fields that the code doesn't
do anything with.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 06/19] CIFS: SMBD: Reconnect to SMBDirect session

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> Do a reconnect on SMBDirect when it is used as the connection. Reconnect can
> happen for many reasons and it's mostly the decision of upper layer SMB2 not
> SMBDirect.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/connect.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 43b4d54..341a3fd 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -409,7 +409,11 @@ cifs_reconnect(struct TCP_Server_Info *server)
>
> /* we should try only the port we connected to before */
> mutex_lock(>srv_mutex);
> -   rc = generic_ip_connect(server);
> +   if (server->rdma)
> +   rc = smbd_reconnect(server);
> +   else
> +   rc = generic_ip_connect(server);
> +
> if (rc) {
> cifs_dbg(FYI, "reconnect error %d\n", rc);
> mutex_unlock(>srv_mutex);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is a easier to follow the logic if the definition is introduced
together with using of this function in the existing code. I suggest
to add smbd_reconnect() definition in this patch.
--
Best regards,
Pavel Shilovsky


Re: [Patch v3 06/19] CIFS: SMBD: Reconnect to SMBDirect session

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li :
> From: Long Li 
>
> Do a reconnect on SMBDirect when it is used as the connection. Reconnect can
> happen for many reasons and it's mostly the decision of upper layer SMB2 not
> SMBDirect.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/connect.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 43b4d54..341a3fd 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -409,7 +409,11 @@ cifs_reconnect(struct TCP_Server_Info *server)
>
> /* we should try only the port we connected to before */
> mutex_lock(>srv_mutex);
> -   rc = generic_ip_connect(server);
> +   if (server->rdma)
> +   rc = smbd_reconnect(server);
> +   else
> +   rc = generic_ip_connect(server);
> +
> if (rc) {
> cifs_dbg(FYI, "reconnect error %d\n", rc);
> mutex_unlock(>srv_mutex);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is a easier to follow the logic if the definition is introduced
together with using of this function in the existing code. I suggest
to add smbd_reconnect() definition in this patch.
--
Best regards,
Pavel Shilovsky


Re: [Patch v3 04/19] CIFS: SMBD: Add SMBDirect transport to SMB connection and Makefile

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> Add SMBDirect as an optional connection to the SMB session structure in CIFS.
> When SMB session is connected through SMBDirect, upper layer uses this
> connection to carry payloads.
>
> With the transport code hooked up to upper layer, add SMBDirect code to
> Makefile.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/Makefile   | 2 +-
>  fs/cifs/cifsglob.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/cifs/Makefile b/fs/cifs/Makefile
> index eed7eb0..6bb9863 100644
> --- a/fs/cifs/Makefile
> +++ b/fs/cifs/Makefile
> @@ -18,4 +18,4 @@ cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o 
> cifs_dfs_ref.o
>  cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
>
>  cifs-$(CONFIG_CIFS_SMB2) += smb2ops.o smb2maperror.o smb2transport.o \
> -   smb2misc.o smb2pdu.o smb2inode.o smb2file.o
> +   smb2misc.o smb2pdu.o smb2inode.o smb2file.o 
> smbdirect.o
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 703c2fb..dc5404d 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -652,6 +652,8 @@ struct TCP_Server_Info {
> boollarge_buf;  /* is current buffer large? */
> /* use SMBD connection instead of socket */
> boolrdma;
> +   /* point to the SMBD connection if RDMA is used instead of socket */
> +   struct smbd_connection *smbd_conn;
> struct delayed_work echo; /* echo ping workqueue job */
> char*smallbuf;  /* pointer to current "small" buffer */
> char*bigbuf;/* pointer to current "big" buffer */
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is better to split the current patch into 2 and merge the Makefile
part to #3 and cifsglob.h part to #5.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 04/19] CIFS: SMBD: Add SMBDirect transport to SMB connection and Makefile

2017-08-29 Thread Pavel Shilovsky
2017-08-29 12:29 GMT-07:00 Long Li :
> From: Long Li 
>
> Add SMBDirect as an optional connection to the SMB session structure in CIFS.
> When SMB session is connected through SMBDirect, upper layer uses this
> connection to carry payloads.
>
> With the transport code hooked up to upper layer, add SMBDirect code to
> Makefile.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/Makefile   | 2 +-
>  fs/cifs/cifsglob.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/cifs/Makefile b/fs/cifs/Makefile
> index eed7eb0..6bb9863 100644
> --- a/fs/cifs/Makefile
> +++ b/fs/cifs/Makefile
> @@ -18,4 +18,4 @@ cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o 
> cifs_dfs_ref.o
>  cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
>
>  cifs-$(CONFIG_CIFS_SMB2) += smb2ops.o smb2maperror.o smb2transport.o \
> -   smb2misc.o smb2pdu.o smb2inode.o smb2file.o
> +   smb2misc.o smb2pdu.o smb2inode.o smb2file.o 
> smbdirect.o
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 703c2fb..dc5404d 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -652,6 +652,8 @@ struct TCP_Server_Info {
> boollarge_buf;  /* is current buffer large? */
> /* use SMBD connection instead of socket */
> boolrdma;
> +   /* point to the SMBD connection if RDMA is used instead of socket */
> +   struct smbd_connection *smbd_conn;
> struct delayed_work echo; /* echo ping workqueue job */
> char*smallbuf;  /* pointer to current "small" buffer */
> char*bigbuf;/* pointer to current "big" buffer */
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is better to split the current patch into 2 and merge the Makefile
part to #3 and cifsglob.h part to #5.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-29 Thread Pavel Shilovsky
e hope that it will be useful,
> + *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
> + *   the GNU General Public License for more details.
> + */
> +#ifndef _SMBDIRECT_H
> +#define _SMBDIRECT_H
> +
> +#define SMBDIRECT_MAX_SGE  16
> +#endif
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Best regards,
Pavel Shilovsky


Re: [Patch v3 02/19] CIFS: SMBD: Add SMBDirect protocol and transport constants

2017-08-29 Thread Pavel Shilovsky
S FOR A PARTICULAR PURPOSE.  See
> + *   the GNU General Public License for more details.
> + */
> +#ifndef _SMBDIRECT_H
> +#define _SMBDIRECT_H
> +
> +#define SMBDIRECT_MAX_SGE  16
> +#endif
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Best regards,
Pavel Shilovsky


Re: [Patch v3 01/19] CIFS: Add rdma mount option

2017-08-29 Thread Pavel Shilovsky
/
> if (vol->multiuser) {
> @@ -2134,6 +2143,9 @@ static int match_server(struct TCP_Server_Info *server, 
> struct smb_vol *vol)
> if (server->echo_interval != vol->echo_interval * HZ)
> return 0;
>
> +   if (server->rdma != vol->rdma)
> +   return 0;
> +
> return 1;
>  }
>
> @@ -2234,6 +2246,7 @@ cifs_get_tcp_session(struct smb_vol *volume_info)
> tcp_ses->noblocksnd = volume_info->noblocksnd;
> tcp_ses->noautotune = volume_info->noautotune;
> tcp_ses->tcp_nodelay = volume_info->sockopt_tcp_nodelay;
> +   tcp_ses->rdma = volume_info->rdma;
> tcp_ses->in_flight = 0;
> tcp_ses->credits = 1;
> init_waitqueue_head(_ses->response_q);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is better to introduce a mount option after the functional changes
not before them. In this case we do not result in having the mount
option that does nothing if we break the patchset.

--
Best regards,
Pavel Shilovsky


Re: [Patch v3 01/19] CIFS: Add rdma mount option

2017-08-29 Thread Pavel Shilovsky
erver_Info *server, 
> struct smb_vol *vol)
> if (server->echo_interval != vol->echo_interval * HZ)
> return 0;
>
> +   if (server->rdma != vol->rdma)
> +   return 0;
> +
> return 1;
>  }
>
> @@ -2234,6 +2246,7 @@ cifs_get_tcp_session(struct smb_vol *volume_info)
> tcp_ses->noblocksnd = volume_info->noblocksnd;
> tcp_ses->noautotune = volume_info->noautotune;
> tcp_ses->tcp_nodelay = volume_info->sockopt_tcp_nodelay;
> +   tcp_ses->rdma = volume_info->rdma;
> tcp_ses->in_flight = 0;
> tcp_ses->credits = 1;
> init_waitqueue_head(_ses->response_q);
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is better to introduce a mount option after the functional changes
not before them. In this case we do not result in having the mount
option that does nothing if we break the patchset.

--
Best regards,
Pavel Shilovsky


Re: [PATCH v3] cifs: Do not modify mid entry after submitting I/O in cifs_call_async

2017-06-28 Thread Pavel Shilovsky
2017-06-28 15:02 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> In cifs_call_async, server may respond as soon as I/O is submitted. Because
> mid entry is freed on the return path, it should not be modified after I/O
> is submitted.
>
> cifs_save_when_sent modifies the sent timestamp in mid entry, and should not
> be called after I/O. Call it before I/O.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/transport.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 47a125e..f49b73f 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -536,11 +536,13 @@ cifs_call_async(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst,
> list_add_tail(>qhead, >pending_mid_q);
> spin_unlock(_Lock);
>
> -
> +   /* Need to store the time in mid before calling I/O. For call_async,
> +* I/O response can come back and free the mid entry on another 
> thread.
> +*/
> +   cifs_save_when_sent(mid);
> cifs_in_send_inc(server);
> rc = smb_send_rqst(server, rqst, flags);
> cifs_in_send_dec(server);
> -   cifs_save_when_sent(mid);
>
> if (rc < 0) {
> server->sequence_number -= 2;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks! Please fix the comment style to

/*
 * 
 */

Other than that -

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

--
Best regards,
Pavel Shilovsky


Re: [PATCH v3] cifs: Do not modify mid entry after submitting I/O in cifs_call_async

2017-06-28 Thread Pavel Shilovsky
2017-06-28 15:02 GMT-07:00 Long Li :
> From: Long Li 
>
> In cifs_call_async, server may respond as soon as I/O is submitted. Because
> mid entry is freed on the return path, it should not be modified after I/O
> is submitted.
>
> cifs_save_when_sent modifies the sent timestamp in mid entry, and should not
> be called after I/O. Call it before I/O.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/transport.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 47a125e..f49b73f 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -536,11 +536,13 @@ cifs_call_async(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst,
> list_add_tail(>qhead, >pending_mid_q);
> spin_unlock(_Lock);
>
> -
> +   /* Need to store the time in mid before calling I/O. For call_async,
> +* I/O response can come back and free the mid entry on another 
> thread.
> +*/
> +   cifs_save_when_sent(mid);
> cifs_in_send_inc(server);
> rc = smb_send_rqst(server, rqst, flags);
> cifs_in_send_dec(server);
> -   cifs_save_when_sent(mid);
>
> if (rc < 0) {
> server->sequence_number -= 2;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks! Please fix the comment style to

/*
 * 
 */

Other than that -

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH resend] cifs: Do not modify mid entry after submitting I/O in cifs_call_async

2017-06-28 Thread Pavel Shilovsky
2017-06-28 11:05 GMT-07:00 Long Li <lon...@exchange.microsoft.com>:
> From: Long Li <lon...@microsoft.com>
>
> In cifs_call_async, server response may return as soon as I/O is submitted.
> Because mid entry is freed on the return path, do not modify it after I/O is
> submitted.
>
> Signed-off-by: Long Li <lon...@microsoft.com>
> ---
>  fs/cifs/transport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 47a125e..ba62aaf 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -537,10 +537,10 @@ cifs_call_async(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst,
> spin_unlock(_Lock);
>
>
> +   cifs_save_when_sent(mid);
> cifs_in_send_inc(server);
> rc = smb_send_rqst(server, rqst, flags);
> cifs_in_send_dec(server);
> -   cifs_save_when_sent(mid);
>
> if (rc < 0) {
> server->sequence_number -= 2;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks good. Could you please add an additional comment describing the
behavior in the code?

--
Best regards,
Pavel Shilovsky


Re: [PATCH resend] cifs: Do not modify mid entry after submitting I/O in cifs_call_async

2017-06-28 Thread Pavel Shilovsky
2017-06-28 11:05 GMT-07:00 Long Li :
> From: Long Li 
>
> In cifs_call_async, server response may return as soon as I/O is submitted.
> Because mid entry is freed on the return path, do not modify it after I/O is
> submitted.
>
> Signed-off-by: Long Li 
> ---
>  fs/cifs/transport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 47a125e..ba62aaf 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -537,10 +537,10 @@ cifs_call_async(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst,
> spin_unlock(_Lock);
>
>
> +   cifs_save_when_sent(mid);
> cifs_in_send_inc(server);
> rc = smb_send_rqst(server, rqst, flags);
> cifs_in_send_dec(server);
> -   cifs_save_when_sent(mid);
>
> if (rc < 0) {
> server->sequence_number -= 2;
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks good. Could you please add an additional comment describing the
behavior in the code?

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: check if pages is null rather than bv for a failed allocation

2017-05-17 Thread Pavel Shilovsky
2017-05-17 11:24 GMT-07:00 Colin King <colin.k...@canonical.com>:
> From: Colin Ian King <colin.k...@canonical.com>
>
> pages is being allocated however a null check on bv is being used
> to see if the allocation failed. Fix this by checking if pages is
> null.
>
> Detected by CoverityScan, CID#1432974 ("Logically dead code")
>
> Fixes: ccf7f4088af2dd ("CIFS: Add asynchronous context to support kernel AIO")
> Signed-off-by: Colin Ian King <colin.k...@canonical.com>
> ---
>  fs/cifs/misc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index b08531977daa..3b147dc6af63 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -810,7 +810,7 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct 
> iov_iter *iter, int rw)
>
> if (!pages) {
> pages = vmalloc(max_pages * sizeof(struct page *));
> -   if (!bv) {
> +   if (!pages) {
> kvfree(bv);
> return -ENOMEM;
> }
> --
> 2.11.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks for catching this!

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: check if pages is null rather than bv for a failed allocation

2017-05-17 Thread Pavel Shilovsky
2017-05-17 11:24 GMT-07:00 Colin King :
> From: Colin Ian King 
>
> pages is being allocated however a null check on bv is being used
> to see if the allocation failed. Fix this by checking if pages is
> null.
>
> Detected by CoverityScan, CID#1432974 ("Logically dead code")
>
> Fixes: ccf7f4088af2dd ("CIFS: Add asynchronous context to support kernel AIO")
> Signed-off-by: Colin Ian King 
> ---
>  fs/cifs/misc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index b08531977daa..3b147dc6af63 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -810,7 +810,7 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct 
> iov_iter *iter, int rw)
>
> if (!pages) {
> pages = vmalloc(max_pages * sizeof(struct page *));
> -   if (!bv) {
> +   if (!pages) {
> kvfree(bv);
> return -ENOMEM;
> }
> --
> 2.11.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks for catching this!

Reviewed-by: Pavel Shilovsky 

--
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: silence lockdep splat in cifs_relock_file()

2017-05-09 Thread Pavel Shilovsky
2017-05-03 8:17 GMT-07:00 Rabin Vincent <rabin.vinc...@axis.com>:
> From: Rabin Vincent <rab...@axis.com>
>
> cifs_relock_file() can perform a down_write() on the inode's lock_sem even
> though it was already performed in cifs_strict_readv().  Lockdep complains
> about this.  AFAICS, there is no problem here, and lockdep just needs to be
> told that this nesting is OK.
>
>  =
>  [ INFO: possible recursive locking detected ]
>  4.11.0+ #20 Not tainted
>  -
>  cat/701 is trying to acquire lock:
>   (>lock_sem){.+}, at: cifs_reopen_file+0x7a7/0xc00
>
>  but task is already holding lock:
>   (>lock_sem){.+}, at: cifs_strict_readv+0x177/0x310
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
> CPU0
> 
>lock(>lock_sem);
>lock(>lock_sem);
>
>   *** DEADLOCK ***
>
>   May be due to missing lock nesting notation
>
>  1 lock held by cat/701:
>   #0:  (>lock_sem){.+}, at: cifs_strict_readv+0x177/0x310
>
>  stack backtrace:
>  CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
>  Call Trace:
>   dump_stack+0x85/0xc2
>   __lock_acquire+0x17dd/0x2260
>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>   ? preempt_schedule_irq+0x6b/0x80
>   lock_acquire+0xcc/0x260
>   ? lock_acquire+0xcc/0x260
>   ? cifs_reopen_file+0x7a7/0xc00
>   down_read+0x2d/0x70
>   ? cifs_reopen_file+0x7a7/0xc00
>   cifs_reopen_file+0x7a7/0xc00
>   ? printk+0x43/0x4b
>   cifs_readpage_worker+0x327/0x8a0
>   cifs_readpage+0x8c/0x2a0
>   generic_file_read_iter+0x692/0xd00
>   cifs_strict_readv+0x29f/0x310
>   generic_file_splice_read+0x11c/0x1c0
>   do_splice_to+0xa5/0xc0
>   splice_direct_to_actor+0xfa/0x350
>   ? generic_pipe_buf_nosteal+0x10/0x10
>   do_splice_direct+0xb5/0xe0
>   do_sendfile+0x278/0x3a0
>   SyS_sendfile64+0xc4/0xe0
>   entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Signed-off-by: Rabin Vincent <rab...@axis.com>
> ---
>  fs/cifs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 21d4045..64b590b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -582,7 +582,7 @@ cifs_relock_file(struct cifsFileInfo *cfile)
> struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
> int rc = 0;
>
> -   down_read(>lock_sem);
> +   down_read_nested(>lock_sem, SINGLE_DEPTH_NESTING);
> if (cinode->can_cache_brlcks) {
> /* can cache locks - no need to relock */
> up_read(>lock_sem);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Acked-by: Pavel Shilovsky <pshi...@microsoft.com>

Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: silence lockdep splat in cifs_relock_file()

2017-05-09 Thread Pavel Shilovsky
2017-05-03 8:17 GMT-07:00 Rabin Vincent :
> From: Rabin Vincent 
>
> cifs_relock_file() can perform a down_write() on the inode's lock_sem even
> though it was already performed in cifs_strict_readv().  Lockdep complains
> about this.  AFAICS, there is no problem here, and lockdep just needs to be
> told that this nesting is OK.
>
>  =
>  [ INFO: possible recursive locking detected ]
>  4.11.0+ #20 Not tainted
>  -
>  cat/701 is trying to acquire lock:
>   (>lock_sem){.+}, at: cifs_reopen_file+0x7a7/0xc00
>
>  but task is already holding lock:
>   (>lock_sem){.+}, at: cifs_strict_readv+0x177/0x310
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
> CPU0
> 
>lock(>lock_sem);
>lock(>lock_sem);
>
>   *** DEADLOCK ***
>
>   May be due to missing lock nesting notation
>
>  1 lock held by cat/701:
>   #0:  (>lock_sem){.+}, at: cifs_strict_readv+0x177/0x310
>
>  stack backtrace:
>  CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
>  Call Trace:
>   dump_stack+0x85/0xc2
>   __lock_acquire+0x17dd/0x2260
>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>   ? preempt_schedule_irq+0x6b/0x80
>   lock_acquire+0xcc/0x260
>   ? lock_acquire+0xcc/0x260
>   ? cifs_reopen_file+0x7a7/0xc00
>   down_read+0x2d/0x70
>   ? cifs_reopen_file+0x7a7/0xc00
>   cifs_reopen_file+0x7a7/0xc00
>   ? printk+0x43/0x4b
>   cifs_readpage_worker+0x327/0x8a0
>   cifs_readpage+0x8c/0x2a0
>   generic_file_read_iter+0x692/0xd00
>   cifs_strict_readv+0x29f/0x310
>   generic_file_splice_read+0x11c/0x1c0
>   do_splice_to+0xa5/0xc0
>   splice_direct_to_actor+0xfa/0x350
>   ? generic_pipe_buf_nosteal+0x10/0x10
>   do_splice_direct+0xb5/0xe0
>   do_sendfile+0x278/0x3a0
>   SyS_sendfile64+0xc4/0xe0
>   entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Signed-off-by: Rabin Vincent 
> ---
>  fs/cifs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 21d4045..64b590b 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -582,7 +582,7 @@ cifs_relock_file(struct cifsFileInfo *cfile)
> struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
> int rc = 0;
>
> -   down_read(>lock_sem);
> +   down_read_nested(>lock_sem, SINGLE_DEPTH_NESTING);
> if (cinode->can_cache_brlcks) {
> /* can cache locks - no need to relock */
>     up_read(>lock_sem);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Acked-by: Pavel Shilovsky 

Best regards,
Pavel Shilovsky


Re: [RFC 08/10] cifs: move to generic async completion

2017-05-08 Thread Pavel Shilovsky
2017-05-06 5:59 GMT-07:00 Gilad Ben-Yossef <gi...@benyossef.com>:
> cifs starts an async. crypto op and waits for their completion.
> Move it over to generic code doing the same.
>
> Signed-off-by: Gilad Ben-Yossef <gi...@benyossef.com>
> ---
>  fs/cifs/smb2ops.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 152e37f..cee7bb3 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -1706,22 +1706,6 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
> return sg;
>  }
>
> -struct cifs_crypt_result {
> -   int err;
> -   struct completion completion;
> -};
> -
> -static void cifs_crypt_complete(struct crypto_async_request *req, int err)
> -{
> -   struct cifs_crypt_result *res = req->data;
> -
> -   if (err == -EINPROGRESS)
> -   return;
> -
> -   res->err = err;
> -   complete(>completion);
> -}
> -
>  static int
>  smb2_get_enc_key(struct TCP_Server_Info *server, __u64 ses_id, int enc, u8 
> *key)
>  {
> @@ -1762,12 +1746,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst, int enc)
> struct aead_request *req;
> char *iv;
> unsigned int iv_len;
> -   struct cifs_crypt_result result = {0, };
> +   DECLARE_CRYPTO_WAIT(wait);
> struct crypto_aead *tfm;
> unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
>
> -   init_completion();
> -
> rc = smb2_get_enc_key(server, tr_hdr->SessionId, enc, key);
> if (rc) {
> cifs_dbg(VFS, "%s: Could not get %scryption key\n", __func__,
> @@ -1825,14 +1807,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst, int enc)
> aead_request_set_ad(req, assoc_data_len);
>
> aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> - cifs_crypt_complete, );
> + crypto_req_done, );
>
> -   rc = enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req);
> -
> -   if (rc == -EINPROGRESS || rc == -EBUSY) {
> -   wait_for_completion();
> -   rc = result.err;
> -   }
> +   rc = crypto_wait_req(enc ? crypto_aead_encrypt(req)
> +   : crypto_aead_decrypt(req), );
>
> if (!rc && enc)
> memcpy(_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Acked-by: Pavel Shilovsky <pshi...@microsoft.com>

Best regards,
Pavel Shilovsky


Re: [RFC 08/10] cifs: move to generic async completion

2017-05-08 Thread Pavel Shilovsky
2017-05-06 5:59 GMT-07:00 Gilad Ben-Yossef :
> cifs starts an async. crypto op and waits for their completion.
> Move it over to generic code doing the same.
>
> Signed-off-by: Gilad Ben-Yossef 
> ---
>  fs/cifs/smb2ops.c | 30 --
>  1 file changed, 4 insertions(+), 26 deletions(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 152e37f..cee7bb3 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -1706,22 +1706,6 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
> return sg;
>  }
>
> -struct cifs_crypt_result {
> -   int err;
> -   struct completion completion;
> -};
> -
> -static void cifs_crypt_complete(struct crypto_async_request *req, int err)
> -{
> -   struct cifs_crypt_result *res = req->data;
> -
> -   if (err == -EINPROGRESS)
> -   return;
> -
> -   res->err = err;
> -   complete(>completion);
> -}
> -
>  static int
>  smb2_get_enc_key(struct TCP_Server_Info *server, __u64 ses_id, int enc, u8 
> *key)
>  {
> @@ -1762,12 +1746,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst, int enc)
> struct aead_request *req;
> char *iv;
> unsigned int iv_len;
> -   struct cifs_crypt_result result = {0, };
> +   DECLARE_CRYPTO_WAIT(wait);
> struct crypto_aead *tfm;
> unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
>
> -   init_completion();
> -
> rc = smb2_get_enc_key(server, tr_hdr->SessionId, enc, key);
> if (rc) {
> cifs_dbg(VFS, "%s: Could not get %scryption key\n", __func__,
> @@ -1825,14 +1807,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
> smb_rqst *rqst, int enc)
> aead_request_set_ad(req, assoc_data_len);
>
> aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> - cifs_crypt_complete, );
> + crypto_req_done, );
>
> -   rc = enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req);
> -
> -   if (rc == -EINPROGRESS || rc == -EBUSY) {
> -   wait_for_completion();
> -   rc = result.err;
> -   }
> +   rc = crypto_wait_req(enc ? crypto_aead_encrypt(req)
> +   : crypto_aead_decrypt(req), );
>
> if (!rc && enc)
> memcpy(_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Acked-by: Pavel Shilovsky 

Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: use correct format string for size_t

2017-02-02 Thread Pavel Shilovsky
2017-02-02 4:21 GMT-08:00 Arnd Bergmann <a...@arndb.de>:
> This warning is harmless as size_t is always as wide as 'unsigned long':
>
> fs/cifs/smb2ops.c:2008:245: error: format '%lu' expects argument of type 
> 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' 
> [-Werror=format=]
>
> Using %zu however is the correct format string that we don't get a warning 
> for.
>
> Fixes: c9d651a5a2fe ("CIFS: Decrypt and process small encrypted packets")
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> ---
>  fs/cifs/smb2ops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 0764c32754b0..ac6e10adf000 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2005,7 +2005,7 @@ handle_read_data(struct TCP_Server_Info *server, struct 
> mid_q_entry *mid,
> rdata->iov[0].iov_len = 4;
> rdata->iov[1].iov_base = buf + 4;
> rdata->iov[1].iov_len = server->vals->read_rsp_size - 4;
> -   cifs_dbg(FYI, "0: iov_base=%p iov_len=%lu\n",
> +   cifs_dbg(FYI, "0: iov_base=%p iov_len=%zu\n",
>  rdata->iov[0].iov_base, server->vals->read_rsp_size);
>
> length = rdata->copy_into_pages(server, rdata, );
> --
> 2.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

This issue have been already fixed in the Steve's for-next branch. Thanks.

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH] CIFS: use correct format string for size_t

2017-02-02 Thread Pavel Shilovsky
2017-02-02 4:21 GMT-08:00 Arnd Bergmann :
> This warning is harmless as size_t is always as wide as 'unsigned long':
>
> fs/cifs/smb2ops.c:2008:245: error: format '%lu' expects argument of type 
> 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' 
> [-Werror=format=]
>
> Using %zu however is the correct format string that we don't get a warning 
> for.
>
> Fixes: c9d651a5a2fe ("CIFS: Decrypt and process small encrypted packets")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/cifs/smb2ops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 0764c32754b0..ac6e10adf000 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2005,7 +2005,7 @@ handle_read_data(struct TCP_Server_Info *server, struct 
> mid_q_entry *mid,
> rdata->iov[0].iov_len = 4;
> rdata->iov[1].iov_base = buf + 4;
> rdata->iov[1].iov_len = server->vals->read_rsp_size - 4;
> -   cifs_dbg(FYI, "0: iov_base=%p iov_len=%lu\n",
> +   cifs_dbg(FYI, "0: iov_base=%p iov_len=%zu\n",
>  rdata->iov[0].iov_base, server->vals->read_rsp_size);
>
> length = rdata->copy_into_pages(server, rdata, );
> --
> 2.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

This issue have been already fixed in the Steve's for-next branch. Thanks.

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH 0/3] cifs: Better dependencies

2017-01-25 Thread Pavel Shilovsky
2017-01-25 7:06 GMT-08:00 Jean Delvare <jdelv...@suse.de>:
> Hi all,
>
> This is my attempt to fix and improve the dependencies of cifs.
>
> [PATCH 1/3] cifs: Simplify SMB2 and SMB311 dependencies
> [PATCH 2/3] cifs: Only select the required crypto modules
> [PATCH 3/3] cifs: Add soft dependencies
>
> I already sent this patch series one year ago, but did not get any
> reply, so I am trying again.
>
> Thanks,
> --
> Jean Delvare
> SUSE L3 Support
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks good.

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH 0/3] cifs: Better dependencies

2017-01-25 Thread Pavel Shilovsky
2017-01-25 7:06 GMT-08:00 Jean Delvare :
> Hi all,
>
> This is my attempt to fix and improve the dependencies of cifs.
>
> [PATCH 1/3] cifs: Simplify SMB2 and SMB311 dependencies
> [PATCH 2/3] cifs: Only select the required crypto modules
> [PATCH 3/3] cifs: Add soft dependencies
>
> I already sent this patch series one year ago, but did not get any
> reply, so I am trying again.
>
> Thanks,
> --
> Jean Delvare
> SUSE L3 Support
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks good.

Reviewed-by: Pavel Shilovsky 

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: use %16phN for formatting md5 sum

2016-12-14 Thread Pavel Shilovsky
2016-11-30 14:40 GMT-08:00 Rasmus Villemoes <li...@rasmusvillemoes.dk>:
> Passing a gazillion arguments takes a lot of code:
>
> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-253 (-253)
>
> Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
> ---
>  fs/cifs/link.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/link.c b/fs/cifs/link.c
> index d031af8d3d4d..c4d996f78e1c 100644
> --- a/fs/cifs/link.c
> +++ b/fs/cifs/link.c
> @@ -45,13 +45,8 @@
> (CIFS_MF_SYMLINK_LINK_OFFSET + CIFS_MF_SYMLINK_LINK_MAXLEN)
>
>  #define CIFS_MF_SYMLINK_LEN_FORMAT "XSym\n%04u\n"
> -#define CIFS_MF_SYMLINK_MD5_FORMAT \
> -   "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x\n"
> -#define CIFS_MF_SYMLINK_MD5_ARGS(md5_hash) \
> -   md5_hash[0],  md5_hash[1],  md5_hash[2],  md5_hash[3], \
> -   md5_hash[4],  md5_hash[5],  md5_hash[6],  md5_hash[7], \
> -   md5_hash[8],  md5_hash[9],  md5_hash[10], md5_hash[11],\
> -   md5_hash[12], md5_hash[13], md5_hash[14], md5_hash[15]
> +#define CIFS_MF_SYMLINK_MD5_FORMAT "%16phN\n"
> +#define CIFS_MF_SYMLINK_MD5_ARGS(md5_hash) md5_hash
>
>  static int
>  symlink_hash(unsigned int link_len, const char *link_str, u8 *md5_hash)
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by: Pavel Shilovsky <pshi...@microsoft.com>

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH] cifs: use %16phN for formatting md5 sum

2016-12-14 Thread Pavel Shilovsky
2016-11-30 14:40 GMT-08:00 Rasmus Villemoes :
> Passing a gazillion arguments takes a lot of code:
>
> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-253 (-253)
>
> Signed-off-by: Rasmus Villemoes 
> ---
>  fs/cifs/link.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/link.c b/fs/cifs/link.c
> index d031af8d3d4d..c4d996f78e1c 100644
> --- a/fs/cifs/link.c
> +++ b/fs/cifs/link.c
> @@ -45,13 +45,8 @@
> (CIFS_MF_SYMLINK_LINK_OFFSET + CIFS_MF_SYMLINK_LINK_MAXLEN)
>
>  #define CIFS_MF_SYMLINK_LEN_FORMAT "XSym\n%04u\n"
> -#define CIFS_MF_SYMLINK_MD5_FORMAT \
> -   "%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x\n"
> -#define CIFS_MF_SYMLINK_MD5_ARGS(md5_hash) \
> -   md5_hash[0],  md5_hash[1],  md5_hash[2],  md5_hash[3], \
> -   md5_hash[4],  md5_hash[5],  md5_hash[6],  md5_hash[7], \
> -   md5_hash[8],  md5_hash[9],  md5_hash[10], md5_hash[11],\
> -   md5_hash[12], md5_hash[13], md5_hash[14], md5_hash[15]
> +#define CIFS_MF_SYMLINK_MD5_FORMAT "%16phN\n"
> +#define CIFS_MF_SYMLINK_MD5_ARGS(md5_hash) md5_hash
>
>  static int
>  symlink_hash(unsigned int link_len, const char *link_str, u8 *md5_hash)
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by: Pavel Shilovsky 

-- 
Best regards,
Pavel Shilovsky


Re: [PATCH 3.14 103/114] CIFS: Fix directory rename error

2014-09-15 Thread Pavel Shilovsky
2014-09-15 23:26 GMT+04:00 Greg Kroah-Hartman :
> 3.14-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Pavel Shilovsky 
>
> commit a07d322059db66b84c9eb4f98959df468e88b34b upstream.
>
> CIFS servers process nlink counts differently for files and directories.
> In cifs_rename() if we the request fails on the existing target, we
> try to remove it through cifs_unlink() but this is not what we want
> to do for directories. As the result the following sequence of commands
>
> mkdir {1,2}; mv -T 1 2; rmdir {1,2}; mkdir {1,2}; echo foo > 2/bar
>
> and XFS test generic/023 fail with -ENOENT error. That's why the second
> mkdir reuses the existing inode (target inode of the mv -T command) with
> S_DEAD flag.
>
> Fix this by checking whether the target is directory or not and
> calling cifs_rmdir() rather than cifs_unlink() for directories.
>
> Signed-off-by: Pavel Shilovsky 
> Signed-off-by: Steve French 
> Signed-off-by: Greg Kroah-Hartman 
>
> ---
>  fs/cifs/inode.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -1706,7 +1706,10 @@ cifs_rename(struct inode *source_dir, st
>  unlink_target:
> /* Try unlinking the target dentry if it's not negative */
> if (target_dentry->d_inode && (rc == -EACCES || rc == -EEXIST)) {
> -   tmprc = cifs_unlink(target_dir, target_dentry);
> +   if (d_is_dir(target_dentry))
> +   tmprc = cifs_rmdir(target_dir, target_dentry);
> +   else
> +   tmprc = cifs_unlink(target_dir, target_dentry);
> if (tmprc)
> goto cifs_rename_exit;
> rc = cifs_do_rename(xid, source_dentry, from_name,
>
>

Does the next stable-3.14.y have d_is_dir()? v3.14.18 doesn't compile this one.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.14 103/114] CIFS: Fix directory rename error

2014-09-15 Thread Pavel Shilovsky
2014-09-15 23:26 GMT+04:00 Greg Kroah-Hartman gre...@linuxfoundation.org:
 3.14-stable review patch.  If anyone has any objections, please let me know.

 --

 From: Pavel Shilovsky pshilov...@samba.org

 commit a07d322059db66b84c9eb4f98959df468e88b34b upstream.

 CIFS servers process nlink counts differently for files and directories.
 In cifs_rename() if we the request fails on the existing target, we
 try to remove it through cifs_unlink() but this is not what we want
 to do for directories. As the result the following sequence of commands

 mkdir {1,2}; mv -T 1 2; rmdir {1,2}; mkdir {1,2}; echo foo  2/bar

 and XFS test generic/023 fail with -ENOENT error. That's why the second
 mkdir reuses the existing inode (target inode of the mv -T command) with
 S_DEAD flag.

 Fix this by checking whether the target is directory or not and
 calling cifs_rmdir() rather than cifs_unlink() for directories.

 Signed-off-by: Pavel Shilovsky pshilov...@samba.org
 Signed-off-by: Steve French smfre...@gmail.com
 Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org

 ---
  fs/cifs/inode.c |5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 --- a/fs/cifs/inode.c
 +++ b/fs/cifs/inode.c
 @@ -1706,7 +1706,10 @@ cifs_rename(struct inode *source_dir, st
  unlink_target:
 /* Try unlinking the target dentry if it's not negative */
 if (target_dentry-d_inode  (rc == -EACCES || rc == -EEXIST)) {
 -   tmprc = cifs_unlink(target_dir, target_dentry);
 +   if (d_is_dir(target_dentry))
 +   tmprc = cifs_rmdir(target_dir, target_dentry);
 +   else
 +   tmprc = cifs_unlink(target_dir, target_dentry);
 if (tmprc)
 goto cifs_rename_exit;
 rc = cifs_do_rename(xid, source_dentry, from_name,



Does the next stable-3.14.y have d_is_dir()? v3.14.18 doesn't compile this one.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Pavel Shilovsky
2014-03-21 6:23 GMT+04:00 Steven Rostedt :
> On Thu, 20 Mar 2014 17:02:39 -0400
> Jeff Layton  wrote:
>
>> Eventually the server should just allow the read to complete even if
>> the client doesn't respond to the oplock break. It has to since clients
>> can suddenly drop off the net while holding an oplock. That should
>> allow everything to unwedge eventually (though it may take a while).
>>
>> If that's not happening then I'd be curious as to why...
>
> The problem is that the data is being filled in the page and the reader
> is waiting for the page lock to be released. The kworker for the reader
> will issue the complete() and unlock the page to wake up the reader.
>
> But because the other workqueue callback calls down_read(), and there
> can be a down_write() waiting for the reader to finish, this
> down_read() will block on the lock as well (rwsems are fair locks).
> This blocks the other workqueue callback from issuing the complete and
> page_unlock() that will wake up the reader that is holding the rwsem
> with down_read().
>
> DEADLOCK.

Thank you for reporting and clarifying the issue!

Read and write codepaths both obtain lock_sem for read and then wait
for cifsiod_wq to complete and release lock_sem. They don't do any
lock_sem operations inside their work task queued to cifsiod_wq. But
oplock code can obtain/release lock_sem in its work task. So, that's
why I agree with Jeff and suggest to move the oplock code to a
different work queue (cifsioopd_wq?) but leave read and write
codepaths use cifsiod_wq.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

2014-03-21 Thread Pavel Shilovsky
2014-03-21 6:23 GMT+04:00 Steven Rostedt rost...@goodmis.org:
 On Thu, 20 Mar 2014 17:02:39 -0400
 Jeff Layton jlay...@redhat.com wrote:

 Eventually the server should just allow the read to complete even if
 the client doesn't respond to the oplock break. It has to since clients
 can suddenly drop off the net while holding an oplock. That should
 allow everything to unwedge eventually (though it may take a while).

 If that's not happening then I'd be curious as to why...

 The problem is that the data is being filled in the page and the reader
 is waiting for the page lock to be released. The kworker for the reader
 will issue the complete() and unlock the page to wake up the reader.

 But because the other workqueue callback calls down_read(), and there
 can be a down_write() waiting for the reader to finish, this
 down_read() will block on the lock as well (rwsems are fair locks).
 This blocks the other workqueue callback from issuing the complete and
 page_unlock() that will wake up the reader that is holding the rwsem
 with down_read().

 DEADLOCK.

Thank you for reporting and clarifying the issue!

Read and write codepaths both obtain lock_sem for read and then wait
for cifsiod_wq to complete and release lock_sem. They don't do any
lock_sem operations inside their work task queued to cifsiod_wq. But
oplock code can obtain/release lock_sem in its work task. So, that's
why I agree with Jeff and suggest to move the oplock code to a
different work queue (cifsioopd_wq?) but leave read and write
codepaths use cifsiod_wq.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-02-04 Thread Pavel Shilovsky
2014-02-01 Jeff Layton :
> On Fri, 17 Jan 2014 14:07:06 +0400
> Pavel Shilovsky  wrote:
>
>> This patch adds 3 flags:
>> 1) O_DENYREAD that doesn't permit read access,
>> 2) O_DENYWRITE that doesn't permit write access,
>> 3) O_DENYDELETE that doesn't permit delete or rename.
>>
>> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
>> this change can benefit cifs and nfs modules as well as Samba and
>> NFS file servers that export the same directory for Windows clients,
>> or Wine applications that access the same files simultaneously.
>>
>> These flags are only take affect for opens on mounts with new sharelock
>> option. They are translated to flock's flags:
>>
>> !O_DENYREAD  -> LOCK_READ  | LOCK_MAND
>> !O_DENYWRITE -> LOCK_WRITE | LOCK_MAND
>>
>> and set through flock_lock_file on a file. If the file can't be locked
>> due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
>> error code is returned.
>>
>> Create codepath is slightly changed to prevent data races on newly
>> created files: when open with O_CREAT can return -ESHAREDENIED error
>> for successfully created files due to a sharelock set by another task.
>>
>> Temporary disable O_DENYDELETE support - will enable it in further
>> patches.
>>
>> Signed-off-by: Pavel Shilovsky 
>> ---
>>  arch/alpha/include/uapi/asm/errno.h  |2 +
>>  arch/alpha/include/uapi/asm/fcntl.h  |3 ++
>>  arch/mips/include/uapi/asm/errno.h   |2 +
>>  arch/parisc/include/uapi/asm/errno.h |2 +
>>  arch/parisc/include/uapi/asm/fcntl.h |3 ++
>>  arch/sparc/include/uapi/asm/errno.h  |2 +
>>  arch/sparc/include/uapi/asm/fcntl.h  |3 ++
>>  fs/fcntl.c   |5 +-
>>  fs/locks.c   |   97 
>> +++---
>>  fs/namei.c   |   53 ++-
>>  fs/proc_namespace.c  |1 +
>>  include/linux/fs.h   |8 +++
>>  include/uapi/asm-generic/errno.h |2 +
>>  include/uapi/asm-generic/fcntl.h |   11 
>>  include/uapi/linux/fs.h  |1 +
>>  15 files changed, 185 insertions(+), 10 deletions(-)
>>
>
> You might consider breaking this patch into two. One patch that makes
> LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
> that hooks that up to open(). Given the locking involved with the
> i_mutex it would be best to present this as a series of small,
> incremental changes.

Good point. So, we can break it into 2:
1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts,
2) replace flock+LOCK_MAND with open+O_DENY* flags.


>> diff --git a/arch/alpha/include/uapi/asm/errno.h 
>> b/arch/alpha/include/uapi/asm/errno.h
>> index 17f92aa..953a6d6 100644
>> --- a/arch/alpha/include/uapi/asm/errno.h
>> +++ b/arch/alpha/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>>  #define EHWPOISON139 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 140 /* File is locked with a sharelock */
>> +
>>  #endif
>> diff --git a/arch/alpha/include/uapi/asm/fcntl.h 
>> b/arch/alpha/include/uapi/asm/fcntl.h
>> index 09f49a6..265344b 100644
>> --- a/arch/alpha/include/uapi/asm/fcntl.h
>> +++ b/arch/alpha/include/uapi/asm/fcntl.h
>> @@ -33,6 +33,9 @@
>>
>>  #define O_PATH   04000
>>  #define __O_TMPFILE  01
>> +#define O_DENYREAD   02  /* Do not permit read access */
>> +#define O_DENYWRITE  04  /* Do not permit write access */
>> +#define O_DENYDELETE 010 /* Do not permit delete or rename */
>>
>>  #define F_GETLK  7
>>  #define F_SETLK  8
>> diff --git a/arch/mips/include/uapi/asm/errno.h 
>> b/arch/mips/include/uapi/asm/errno.h
>> index 02d645d..f1a4068 100644
>> --- a/arch/mips/include/uapi/asm/errno.h
>> +++ b/arch/mips/include/uapi/asm/errno.h
>> @@ -123,6 +123,8 @@
>>
>>  #define EHWPOISON168 /* Memory page has hardware error */
>>
>> +#define ESHAREDENIED 169 /* File is locked with a sharelock */
>> +
>>  #define EDQUOT   1133/* Quota exceeded */
>>
>>
>> diff --git a/arch/parisc/include/uapi/asm/errno.h 
>> b/arch/parisc/include/uapi/asm/errno.h
>> index f3a8aa5..654c232 100644
>> --- a/arch/parisc/include/uapi/asm/errno.h
>> +++ b/arch/parisc/include/uapi/asm/errno.h
>> @@ -124,4 +124,6 @@
>>
>>  #define EHWPO

Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-02-04 Thread Pavel Shilovsky
2014-02-01 Jeff Layton jlay...@redhat.com:
 On Fri, 17 Jan 2014 14:07:06 +0400
 Pavel Shilovsky pias...@etersoft.ru wrote:

 This patch adds 3 flags:
 1) O_DENYREAD that doesn't permit read access,
 2) O_DENYWRITE that doesn't permit write access,
 3) O_DENYDELETE that doesn't permit delete or rename.

 Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
 this change can benefit cifs and nfs modules as well as Samba and
 NFS file servers that export the same directory for Windows clients,
 or Wine applications that access the same files simultaneously.

 These flags are only take affect for opens on mounts with new sharelock
 option. They are translated to flock's flags:

 !O_DENYREAD  - LOCK_READ  | LOCK_MAND
 !O_DENYWRITE - LOCK_WRITE | LOCK_MAND

 and set through flock_lock_file on a file. If the file can't be locked
 due conflicts with another open with O_DENY* flags, a new -ESHAREDENIED
 error code is returned.

 Create codepath is slightly changed to prevent data races on newly
 created files: when open with O_CREAT can return -ESHAREDENIED error
 for successfully created files due to a sharelock set by another task.

 Temporary disable O_DENYDELETE support - will enable it in further
 patches.

 Signed-off-by: Pavel Shilovsky pias...@etersoft.ru
 ---
  arch/alpha/include/uapi/asm/errno.h  |2 +
  arch/alpha/include/uapi/asm/fcntl.h  |3 ++
  arch/mips/include/uapi/asm/errno.h   |2 +
  arch/parisc/include/uapi/asm/errno.h |2 +
  arch/parisc/include/uapi/asm/fcntl.h |3 ++
  arch/sparc/include/uapi/asm/errno.h  |2 +
  arch/sparc/include/uapi/asm/fcntl.h  |3 ++
  fs/fcntl.c   |5 +-
  fs/locks.c   |   97 
 +++---
  fs/namei.c   |   53 ++-
  fs/proc_namespace.c  |1 +
  include/linux/fs.h   |8 +++
  include/uapi/asm-generic/errno.h |2 +
  include/uapi/asm-generic/fcntl.h |   11 
  include/uapi/linux/fs.h  |1 +
  15 files changed, 185 insertions(+), 10 deletions(-)


 You might consider breaking this patch into two. One patch that makes
 LOCK_MAND locks actually work and that adds MS_SHARELOCK, and one patch
 that hooks that up to open(). Given the locking involved with the
 i_mutex it would be best to present this as a series of small,
 incremental changes.

Good point. So, we can break it into 2:
1) make flock actually work with LOCK_MAND on MS_SHARELOCK mounts,
2) replace flock+LOCK_MAND with open+O_DENY* flags.


 diff --git a/arch/alpha/include/uapi/asm/errno.h 
 b/arch/alpha/include/uapi/asm/errno.h
 index 17f92aa..953a6d6 100644
 --- a/arch/alpha/include/uapi/asm/errno.h
 +++ b/arch/alpha/include/uapi/asm/errno.h
 @@ -124,4 +124,6 @@

  #define EHWPOISON139 /* Memory page has hardware error */

 +#define ESHAREDENIED 140 /* File is locked with a sharelock */
 +
  #endif
 diff --git a/arch/alpha/include/uapi/asm/fcntl.h 
 b/arch/alpha/include/uapi/asm/fcntl.h
 index 09f49a6..265344b 100644
 --- a/arch/alpha/include/uapi/asm/fcntl.h
 +++ b/arch/alpha/include/uapi/asm/fcntl.h
 @@ -33,6 +33,9 @@

  #define O_PATH   04000
  #define __O_TMPFILE  01
 +#define O_DENYREAD   02  /* Do not permit read access */
 +#define O_DENYWRITE  04  /* Do not permit write access */
 +#define O_DENYDELETE 010 /* Do not permit delete or rename */

  #define F_GETLK  7
  #define F_SETLK  8
 diff --git a/arch/mips/include/uapi/asm/errno.h 
 b/arch/mips/include/uapi/asm/errno.h
 index 02d645d..f1a4068 100644
 --- a/arch/mips/include/uapi/asm/errno.h
 +++ b/arch/mips/include/uapi/asm/errno.h
 @@ -123,6 +123,8 @@

  #define EHWPOISON168 /* Memory page has hardware error */

 +#define ESHAREDENIED 169 /* File is locked with a sharelock */
 +
  #define EDQUOT   1133/* Quota exceeded */


 diff --git a/arch/parisc/include/uapi/asm/errno.h 
 b/arch/parisc/include/uapi/asm/errno.h
 index f3a8aa5..654c232 100644
 --- a/arch/parisc/include/uapi/asm/errno.h
 +++ b/arch/parisc/include/uapi/asm/errno.h
 @@ -124,4 +124,6 @@

  #define EHWPOISON257 /* Memory page has hardware error */

 +#define ESHAREDENIED 258 /* File is locked with a sharelock */
 +
  #endif
 diff --git a/arch/parisc/include/uapi/asm/fcntl.h 
 b/arch/parisc/include/uapi/asm/fcntl.h
 index 34a46cb..5865964 100644
 --- a/arch/parisc/include/uapi/asm/fcntl.h
 +++ b/arch/parisc/include/uapi/asm/fcntl.h
 @@ -21,6 +21,9 @@

  #define O_PATH   02000
  #define __O_TMPFILE  04000
 +#define O_DENYREAD   02  /* Do not permit read access */
 +#define O_DENYWRITE  04  /* Do not permit write access */
 +#define O_DENYDELETE 010 /* Do not permit delete or rename */

  #define F_GETLK648
  #define F_SETLK649
 diff --git a/arch/sparc/include/uapi/asm/errno.h

Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-31 Thread Pavel Shilovsky
2014-01-17 Pavel Shilovsky :
> This patchset adds support of O_DENY* flags for Linux fs layer. These flags 
> can be used by any application that needs share reservations to organize a 
> file access. VFS already has some sort of this capability - now it's done 
> through flock/LOCK_MAND mechanis, but that approach is non-atomic. This 
> patchset build new capabilities on top of the existing one but doesn't bring 
> any changes into the flock call semantic.
>
> These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers and 
> Wine applications through VFS (for local filesystems) or CIFS/NFS modules. 
> This will help when e.g. Samba and NFS server share the same directory for 
> Windows and Linux users or Wine applications use Samba/NFS share to access 
> the same data from different clients.
>
> According to the previous discussions the most problematic question is how to 
> prevent situations like DoS attacks where e.g /lib/liba.so file can be open 
> with DENYREAD, or smth like this. That's why extra mount option 'sharelock' 
> is added for switching on/off O_DENY* flags processing. It allows to avoid 
> use of these flags on critical places (like /, /lib) and turn them on if we 
> really want it proccessed that way.
>
> So, we have 3 new flags:
> O_DENYREAD - to prevent other opens with read access,
> O_DENYWRITE - to prevent other opens with write access,
> O_DENYDELETE - to prevent delete operations
>
> The patchset avoids data races problem on newely created files: open with 
> O_CREAT can return the -ESHAREDENIED error for a successfully created file if 
> this files was locked with a sharelock by another task. Also, it turns 
> flock/LOCK_MAND capability off for mounts with 'sharelock' option. This lets 
> not mix one share reservation approach with another.
>
> Patches #1, #2 and #3 add O_DENY* flags to fcntl and implement VFS part. A 
> patch #4 is related to CIFS client changes, #5 -- NFSD, #6 and #7 describe 
> NFSv4 client parts.
>
> The preliminary patch for Samba that replaces the existing use of 
> flock/LOCK_MAND mechanism with O_DENY* flags:
> http://git.etersoft.ru/people/piastry/packages/?p=samba.git;a=commitdiff;h=173070262b7f29134ad6b2e49a760017c11aec4a
>
> The future part of open man page patch:
>
> -
> O_DENYREAD, O_DENYWRIYE, O_DENYDELETE - used to inforce a mandatory share 
> reservation scheme of the file access. If these flag is passed, the open 
> fails with -ESHAREDENIED in following cases:
> 1) if O_DENYREAD flag is specified and there is another open with READ access 
> to the file;
> 2) if O_DENYWRITE flag is specified and there is another open with WRITE 
> access to the file;
> 3) if READ access is requested and there is another open with O_DENYREAD 
> flags;
> 4) if WRITE access is requested and there is another open with O_DENYWRITE 
> flags.
>
> If O_DENYDELETE flag is specified and the open succeded, any further unlink 
> operation will fail with -ESHAREDENIED untill this open is closed. Now this 
> flag is processed by VFS and CIFS filesystem. NFS returns -EINVAL for opens 
> with this flag.
>
> This mechanism is done through flocks. If O_DENY* flags are specified, flock 
> syscall is processed after the open. The share modes are translated into 
> flock according the following rules:
> 1) !O_DENYREAD   -> LOCK_READ   | LOCK_MAND
> 2) !O_DENYWRITE  -> LOCK_WRITE  | LOCK_MAND
> 3) !O_DENYDELETE -> LOCK_DELETE | LOCK_MAND
> -
>
> Pavel Shilovsky (7):
>   VFS: Introduce new O_DENY* open flags
>   VFS: Add O_DENYDELETE support for VFS
>   locks: Disable LOCK_MAND support for MS_SHARELOCK mounts
>   CIFS: Add O_DENY* open flags support
>   NFSD: Pass share reservations flags to VFS
>   NFSv4: Add deny state handling for nfs4_state struct
>   NFSv4: Add O_DENY* open flags support
>
>  arch/alpha/include/uapi/asm/errno.h  |2 +
>  arch/alpha/include/uapi/asm/fcntl.h  |3 +
>  arch/mips/include/uapi/asm/errno.h   |2 +
>  arch/parisc/include/uapi/asm/errno.h |2 +
>  arch/parisc/include/uapi/asm/fcntl.h |3 +
>  arch/sparc/include/uapi/asm/errno.h  |2 +
>  arch/sparc/include/uapi/asm/fcntl.h  |3 +
>  fs/cifs/cifsacl.c|2 +
>  fs/cifs/cifsfs.c |2 +-
>  fs/cifs/cifsglob.h   |   17 +-
>  fs/cifs/cifssmb.c|2 +-
>  fs/cifs/dir.c|6 +
>  fs/cifs/file.c   |   20 ++-
>  fs/cifs/inode.c  |   12 +-
>  fs/cifs/link.c   |2 +
>  fs/cifs/netmisc.c|2 +-
>  fs/cifs/smb1ops.c|3 +
>  fs/cifs/smb2inode.c  

Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-31 Thread Pavel Shilovsky
2014-01-17 Pavel Shilovsky pias...@etersoft.ru:
 This patchset adds support of O_DENY* flags for Linux fs layer. These flags 
 can be used by any application that needs share reservations to organize a 
 file access. VFS already has some sort of this capability - now it's done 
 through flock/LOCK_MAND mechanis, but that approach is non-atomic. This 
 patchset build new capabilities on top of the existing one but doesn't bring 
 any changes into the flock call semantic.

 These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers and 
 Wine applications through VFS (for local filesystems) or CIFS/NFS modules. 
 This will help when e.g. Samba and NFS server share the same directory for 
 Windows and Linux users or Wine applications use Samba/NFS share to access 
 the same data from different clients.

 According to the previous discussions the most problematic question is how to 
 prevent situations like DoS attacks where e.g /lib/liba.so file can be open 
 with DENYREAD, or smth like this. That's why extra mount option 'sharelock' 
 is added for switching on/off O_DENY* flags processing. It allows to avoid 
 use of these flags on critical places (like /, /lib) and turn them on if we 
 really want it proccessed that way.

 So, we have 3 new flags:
 O_DENYREAD - to prevent other opens with read access,
 O_DENYWRITE - to prevent other opens with write access,
 O_DENYDELETE - to prevent delete operations

 The patchset avoids data races problem on newely created files: open with 
 O_CREAT can return the -ESHAREDENIED error for a successfully created file if 
 this files was locked with a sharelock by another task. Also, it turns 
 flock/LOCK_MAND capability off for mounts with 'sharelock' option. This lets 
 not mix one share reservation approach with another.

 Patches #1, #2 and #3 add O_DENY* flags to fcntl and implement VFS part. A 
 patch #4 is related to CIFS client changes, #5 -- NFSD, #6 and #7 describe 
 NFSv4 client parts.

 The preliminary patch for Samba that replaces the existing use of 
 flock/LOCK_MAND mechanism with O_DENY* flags:
 http://git.etersoft.ru/people/piastry/packages/?p=samba.git;a=commitdiff;h=173070262b7f29134ad6b2e49a760017c11aec4a

 The future part of open man page patch:

 -
 O_DENYREAD, O_DENYWRIYE, O_DENYDELETE - used to inforce a mandatory share 
 reservation scheme of the file access. If these flag is passed, the open 
 fails with -ESHAREDENIED in following cases:
 1) if O_DENYREAD flag is specified and there is another open with READ access 
 to the file;
 2) if O_DENYWRITE flag is specified and there is another open with WRITE 
 access to the file;
 3) if READ access is requested and there is another open with O_DENYREAD 
 flags;
 4) if WRITE access is requested and there is another open with O_DENYWRITE 
 flags.

 If O_DENYDELETE flag is specified and the open succeded, any further unlink 
 operation will fail with -ESHAREDENIED untill this open is closed. Now this 
 flag is processed by VFS and CIFS filesystem. NFS returns -EINVAL for opens 
 with this flag.

 This mechanism is done through flocks. If O_DENY* flags are specified, flock 
 syscall is processed after the open. The share modes are translated into 
 flock according the following rules:
 1) !O_DENYREAD   - LOCK_READ   | LOCK_MAND
 2) !O_DENYWRITE  - LOCK_WRITE  | LOCK_MAND
 3) !O_DENYDELETE - LOCK_DELETE | LOCK_MAND
 -

 Pavel Shilovsky (7):
   VFS: Introduce new O_DENY* open flags
   VFS: Add O_DENYDELETE support for VFS
   locks: Disable LOCK_MAND support for MS_SHARELOCK mounts
   CIFS: Add O_DENY* open flags support
   NFSD: Pass share reservations flags to VFS
   NFSv4: Add deny state handling for nfs4_state struct
   NFSv4: Add O_DENY* open flags support

  arch/alpha/include/uapi/asm/errno.h  |2 +
  arch/alpha/include/uapi/asm/fcntl.h  |3 +
  arch/mips/include/uapi/asm/errno.h   |2 +
  arch/parisc/include/uapi/asm/errno.h |2 +
  arch/parisc/include/uapi/asm/fcntl.h |3 +
  arch/sparc/include/uapi/asm/errno.h  |2 +
  arch/sparc/include/uapi/asm/fcntl.h  |3 +
  fs/cifs/cifsacl.c|2 +
  fs/cifs/cifsfs.c |2 +-
  fs/cifs/cifsglob.h   |   17 +-
  fs/cifs/cifssmb.c|2 +-
  fs/cifs/dir.c|6 +
  fs/cifs/file.c   |   20 ++-
  fs/cifs/inode.c  |   12 +-
  fs/cifs/link.c   |2 +
  fs/cifs/netmisc.c|2 +-
  fs/cifs/smb1ops.c|3 +
  fs/cifs/smb2inode.c  |1 +
  fs/cifs/smb2maperror.c   |2 +-
  fs/cifs/smb2ops.c|6 +
  fs/cifs/smb2pdu.c|2 +-
  fs/fcntl.c   |5 +-
  fs/locks.c   |  161 --
  fs/namei.c   |   56 +-
  fs/nfs/dir.c |   39 -
  fs/nfs

Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-21 Thread Pavel Shilovsky
2014/1/20 Volker Lendecke :
>> You can try to use O_PATH flag. It doesn't give you a delete access
>> but should be ok because further deleting will be done without file
>> descriptor -- through unlink syscall.
>
> Ok, I did not know about O_PATH. Thanks for that!
>
> So I do an open with O_PATH. How do I then make sure that
> nobody else has a O_DENYDELETE set without doing the unlink
> itself?

It's not possible with the current API to do it through open syscall.
Another possibility is to look at /proc/locks. But I think we really
need O_DELETE flag that will force a file to be removed on close - we
will be able to do O_DENYDELETE checks atomically.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-01-21 Thread Pavel Shilovsky
2014/1/20 One Thousand Gnomes :
>> > Shouldn't this also check for CAP_SYS_DAC or some similar permission so
>> > that root can override such a mess (eg to fix full disks in an
>> > emergency) ?
>>
>> May be it's better to let root an ability to remount the system
>> without sharelock mount option and then fix an emergency?
>
> Doesn't that involve breaking the service for all users of the system
> relying upon those locks, while root being allowed to ignore the locks
> does not ?

If we allow root to remount without "sharelock" (or bypass conflict
checks), it will definitely break the service for all users. Probably
it's better to stop the service (that cause all sharelocks to be
unlocked), fix the emergency and start the service again.

In the current state, the patchset doesn't allow any sort of ignoring
those locks for mounts with "sharelock" option (either remount without
"sharelock" or set special capabilities). It was done to make sure
nothing breaks applications relying upon sharelock behavior. Also,
that's why "sharelock" mount option was added: this behavior is
dangerous to be on critical system paths like "/" or "/lib" and not
suitable for global use.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-01-21 Thread Pavel Shilovsky
2014/1/20 One Thousand Gnomes gno...@lxorguk.ukuu.org.uk:
  Shouldn't this also check for CAP_SYS_DAC or some similar permission so
  that root can override such a mess (eg to fix full disks in an
  emergency) ?

 May be it's better to let root an ability to remount the system
 without sharelock mount option and then fix an emergency?

 Doesn't that involve breaking the service for all users of the system
 relying upon those locks, while root being allowed to ignore the locks
 does not ?

If we allow root to remount without sharelock (or bypass conflict
checks), it will definitely break the service for all users. Probably
it's better to stop the service (that cause all sharelocks to be
unlocked), fix the emergency and start the service again.

In the current state, the patchset doesn't allow any sort of ignoring
those locks for mounts with sharelock option (either remount without
sharelock or set special capabilities). It was done to make sure
nothing breaks applications relying upon sharelock behavior. Also,
that's why sharelock mount option was added: this behavior is
dangerous to be on critical system paths like / or /lib and not
suitable for global use.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-21 Thread Pavel Shilovsky
2014/1/20 Volker Lendecke volker.lende...@sernet.de:
 You can try to use O_PATH flag. It doesn't give you a delete access
 but should be ok because further deleting will be done without file
 descriptor -- through unlink syscall.

 Ok, I did not know about O_PATH. Thanks for that!

 So I do an open with O_PATH. How do I then make sure that
 nobody else has a O_DENYDELETE set without doing the unlink
 itself?

It's not possible with the current API to do it through open syscall.
Another possibility is to look at /proc/locks. But I think we really
need O_DELETE flag that will force a file to be removed on close - we
will be able to do O_DENYDELETE checks atomically.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 1/7] VFS: Introduce new O_DENY* open flags

2014-01-20 Thread Pavel Shilovsky
2014/1/17 One Thousand Gnomes :
>> +#define ESHAREDENIED 258 /* File is locked with a sharelock */
>
> Have you prepared C library patches to match this ?

I don't have it for now.

>
> (and why not just use EPERM, it has the meaning you want already)

We need to determine if we have a share reservation error to map it
accurately in file servers and wine.

>
>
>> + * Check to see if there's a share_reservation conflict. 
>> LOCK_READ/LOCK_WRITE
>> + * tell us whether the reservation allows other readers and writers.
>> + */
>> +static int
>> +locks_mand_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
>> +{
>
> Shouldn't this also check for CAP_SYS_DAC or some similar permission so
> that root can override such a mess (eg to fix full disks in an
> emergency) ?

May be it's better to let root an ability to remount the system
without sharelock mount option and then fix an emergency?

>
>
>> +
>> + /*
>> +  * For sharelock mounts if a file was created but not opened, we need
>> +  * to keep parent i_mutex until we finish the open to prevent races 
>> when
>> +  * somebody opens newly created by us file and locks it with a 
>> sharelock
>> +  * before we open it.
>> +  */
>> + if (IS_SHARELOCK(dir->d_inode) && error > 0 && *opened & FILE_CREATED) 
>> {
>> + /* Don't check for write permission, don't truncate */
>> + open_flag &= ~O_TRUNC;
>> + will_truncate = false;
>> + acc_mode = MAY_OPEN;
>> + path_to_nameidata(path, nd);
>> +
>> + error = may_open(>path, acc_mode, open_flag);
>> + if (error) {
>> + mutex_unlock(>d_inode->i_mutex);
>> + goto out;
>> + }
>> + file->f_path.mnt = nd->path.mnt;
>> + error = finish_open(file, nd->path.dentry, NULL, opened);
>> + if (error) {
>> + mutex_unlock(>d_inode->i_mutex);
>> + if (error == -EOPENSTALE)
>> + goto stale_open;
>> + goto out;
>> + }
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(>d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + goto opened;
>> + }
>> +
>>   mutex_unlock(>d_inode->i_mutex);
>
> What stops the file system changing mount flags via a remount between
> these two ?

Nothing, but it doesn't bring problems here: if the first IS_SHARELOCK
condition is true, we don't get into the second (see 'goto opened').

>
>>
>>   if (error <= 0) {
>> @@ -3034,6 +3073,18 @@ finish_open_created:
>>   goto stale_open;
>>   goto out;
>>   }
>> +
>> + if (IS_SHARELOCK(dir->d_inode)) {
>> + /*
>> +  * Lock parent i_mutex to prevent races with sharelocks on
>> +  * newly created files.
>> +  */
>> +     mutex_lock(>d_inode->i_mutex);
>> + error = sharelock_lock_file(file);
>> + mutex_unlock(>d_inode->i_mutex);
>> + if (error)
>> + goto exit_fput;
>> + }
>>  opened:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-20 Thread Pavel Shilovsky
2014/1/20 Volker Lendecke :
> Hi!
>
> On Fri, Jan 17, 2014 at 02:07:05PM +0400, Pavel Shilovsky wrote:
>> If O_DENYDELETE flag is specified and the open succeded,
>> any further unlink operation will fail with -ESHAREDENIED
>> untill this open is closed. Now this flag is processed by
>> VFS and CIFS filesystem. NFS returns -EINVAL for opens
>> with this flag.
>
> This looks really, really good, thanks!
>
> One question: If Samba wants to open a file for delete
> access, there's no corresponding flag in the open
> permissions. There can be the case where Samba wants to open
> *just* for future unlink, no read or write access required.
> Is there a way to achieve this atomically correct?

You can try to use O_PATH flag. It doesn't give you a delete access
but should be ok because further deleting will be done without file
descriptor -- through unlink syscall.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-20 Thread Pavel Shilovsky
2014/1/17 Frank Filz :
> This looks wonderful and will be useful to the Ganesha user space NFS server
> also.
>
> I do have a couple questions.
>
> 1. How will this interact with the idea of private locks from the patch set
> Jeff Layton has been pushing?

They don't touch each other.

>
> 2. If a process opens multiple file descriptors with deny modes, will they
> conflict with each other (which is the behavior we will want for Ganesha).

Yes, a deny mode is associated with file descriptor - so, it will
conflict with any other access/deny modes of file descriptors from any
process.

>
> 3. Is there any functionality to upgrade or downgrade the access and deny
> modes (thinking in terms of NFS v4 support of OPEN upgrade and
> OPEN_DOWNGRADE operations).

The proposed patchset doesn't allow to change deny modes after an open
is done. But we can add a functionality to let flock syscall change
deny modes as on option.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 0/7] Add O_DENY* support for VFS and CIFS/NFS

2014-01-20 Thread Pavel Shilovsky
2014/1/17 Frank Filz ffilz...@mindspring.com:
 This looks wonderful and will be useful to the Ganesha user space NFS server
 also.

 I do have a couple questions.

 1. How will this interact with the idea of private locks from the patch set
 Jeff Layton has been pushing?

They don't touch each other.


 2. If a process opens multiple file descriptors with deny modes, will they
 conflict with each other (which is the behavior we will want for Ganesha).

Yes, a deny mode is associated with file descriptor - so, it will
conflict with any other access/deny modes of file descriptors from any
process.


 3. Is there any functionality to upgrade or downgrade the access and deny
 modes (thinking in terms of NFS v4 support of OPEN upgrade and
 OPEN_DOWNGRADE operations).

The proposed patchset doesn't allow to change deny modes after an open
is done. But we can add a functionality to let flock syscall change
deny modes as on option.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >