Re: [PATCH 34/33] netfs: Pass flag rather than use in_softirq()

2021-02-19 Thread Sebastian Andrzej Siewior
On 2021-02-18 14:02:36 [+], David Howells wrote:
> How about the attached instead?

Thank you for that flag.

> David

Sebastian


Re: [PATCH 34/33] netfs: Pass flag rather than use in_softirq()

2021-02-18 Thread Marc Dionne
On Thu, Feb 18, 2021 at 10:03 AM David Howells  wrote:
>
> Christoph Hellwig  wrote:
>
> > On Tue, Feb 16, 2021 at 09:29:31AM +, David Howells wrote:
> > > Is there a better way to do it?  The intent is to process the assessment
> > > phase in the calling thread's context if possible rather than bumping over
> > > to a worker thread.  For synchronous I/O, for example, that's done in the
> > > caller's thread.  Maybe that's the answer - if it's known to be
> > > asynchronous, I have to punt, but otherwise don't have to.
> >
> > Yes, I think you want an explicit flag instead.
>
> How about the attached instead?
>
> David
> ---
> commit 29b3e9eed616db01f15c7998c062b4e501ea6582
> Author: David Howells 
> Date:   Mon Feb 15 21:56:43 2021 +
>
> netfs: Pass flag rather than use in_softirq()
>
> The in_softirq() in netfs_rreq_terminated() works fine for the cache being
> on a normal disk, as the completion handlers may get called in softirq
> context, but for an NVMe drive, the completion handler may get called in
> IRQ context.
>
> Fix to pass a flag to netfs_subreq_terminated() to indicate whether we
> think the function isn't being called from a context in which we can do
> allocations, waits and I/O submissions (such as softirq or interrupt
> context).  If this flag is set, netfs lib has to punt to a worker thread to
> handle anything like that.
>
> The symptom involves warnings like the following appearing and the kernel
> hanging:
>
>  WARNING: CPU: 0 PID: 0 at kernel/softirq.c:175 __local_bh_enable_ip+0x35/0x50
>  ...
>  RIP: 0010:__local_bh_enable_ip+0x35/0x50
>  ...
>  Call Trace:
>   
>   rxrpc_kernel_begin_call+0x7d/0x1b0 [rxrpc]
>   ? afs_rx_new_call+0x40/0x40 [kafs]
>   ? afs_alloc_call+0x28/0x120 [kafs]
>   afs_make_call+0x120/0x510 [kafs]
>   ? afs_rx_new_call+0x40/0x40 [kafs]
>   ? afs_alloc_flat_call+0xba/0x100 [kafs]
>   ? __kmalloc+0x167/0x2f0
>   ? afs_alloc_flat_call+0x9b/0x100 [kafs]
>   afs_wait_for_operation+0x2d/0x200 [kafs]
>   afs_do_sync_operation+0x16/0x20 [kafs]
>   afs_req_issue_op+0x8c/0xb0 [kafs]
>   netfs_rreq_assess+0x125/0x7d0 [netfs]
>   ? cachefiles_end_operation+0x40/0x40 [cachefiles]
>   netfs_subreq_terminated+0x117/0x220 [netfs]
>   cachefiles_read_complete+0x21/0x60 [cachefiles]
>   iomap_dio_bio_end_io+0xdd/0x110
>   blk_update_request+0x20a/0x380
>   blk_mq_end_request+0x1c/0x120
>   nvme_process_cq+0x159/0x1f0 [nvme]
>   nvme_irq+0x10/0x20 [nvme]
>   __handle_irq_event_percpu+0x37/0x150
>   handle_irq_event+0x49/0xb0
>   handle_edge_irq+0x7c/0x200
>   asm_call_irq_on_stack+0xf/0x20
>   
>   common_interrupt+0xad/0x120
>   asm_common_interrupt+0x1e/0x40
>  ...
>
> Reported-by: Marc Dionne 
> Signed-off-by: David Howells 
> cc: Matthew Wilcox 
> cc: linux...@kvack.org
> cc: linux-cach...@redhat.com
> cc: linux-...@lists.infradead.org
> cc: linux-...@vger.kernel.org
> cc: linux-c...@vger.kernel.org
> cc: ceph-de...@vger.kernel.org
> cc: v9fs-develo...@lists.sourceforge.net
> cc: linux-fsde...@vger.kernel.org
>
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index 8f28d4f4cfd7..6dcdbbfb48e2 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -223,7 +223,7 @@ static void afs_fetch_data_notify(struct afs_operation *op)
>
> if (subreq) {
> __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
> -   netfs_subreq_terminated(subreq, error ?: req->actual_len);
> +   netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
> req->subreq = NULL;
> } else if (req->done) {
> req->done(req);
> @@ -289,7 +289,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
>
> fsreq = afs_alloc_read(GFP_NOFS);
> if (!fsreq)
> -   return netfs_subreq_terminated(subreq, -ENOMEM);
> +   return netfs_subreq_terminated(subreq, -ENOMEM, false);
>
> fsreq->subreq   = subreq;
> fsreq->pos  = subreq->start + subreq->transferred;
> @@ -304,7 +304,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
>
> ret = afs_fetch_data(fsreq->vnode, fsreq);
> if (ret < 0)
> -   return netfs_subreq_terminated(subreq, ret);
> +   return netfs_subreq_terminated(subreq, ret, false);
>  }
>
>  static int afs_symlink_readpage(struct page *page)
> diff --git a/fs/cachefiles/rdwr2.c b/fs/cachefiles/rdwr2.c
> index 4cea5a2a2d6e..40668bfe6688 100644
> --- a/fs/cachefiles/rdwr2.c
> +++ b/fs/cachefiles/rdwr2.c
> @@ -23,6 +23,7 @@ struct cachefiles_kiocb {
> };
> netfs_io_terminated_t   term_func;
> void    *term_func_priv;
> +   bool    was_async;
>  };
>
>  static inline void 


[PATCH 34/33] netfs: Pass flag rather than use in_softirq()

2021-02-18 Thread David Howells
Christoph Hellwig  wrote:

> On Tue, Feb 16, 2021 at 09:29:31AM +, David Howells wrote:
> > Is there a better way to do it?  The intent is to process the assessment
> > phase in the calling thread's context if possible rather than bumping over
> > to a worker thread.  For synchronous I/O, for example, that's done in the
> > caller's thread.  Maybe that's the answer - if it's known to be
> > asynchronous, I have to punt, but otherwise don't have to.
> 
> Yes, I think you want an explicit flag instead.

How about the attached instead?

David
---
commit 29b3e9eed616db01f15c7998c062b4e501ea6582
Author: David Howells 
Date:   Mon Feb 15 21:56:43 2021 +

netfs: Pass flag rather than use in_softirq()

The in_softirq() in netfs_rreq_terminated() works fine for the cache being
on a normal disk, as the completion handlers may get called in softirq
context, but for an NVMe drive, the completion handler may get called in
IRQ context.

Fix to pass a flag to netfs_subreq_terminated() to indicate whether we
think the function isn't being called from a context in which we can do
allocations, waits and I/O submissions (such as softirq or interrupt
context).  If this flag is set, netfs lib has to punt to a worker thread to
handle anything like that.
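
To illustrate the behaviour described above, here is a minimal, hypothetical
sketch (an editorial illustration, not the netfs code; my_request, my_assess()
and my_subreq_terminated() are made-up names).  If the flag says the caller may
be running in softirq or hard-IRQ context, the assessment is queued to a worker
thread; otherwise it runs inline in the caller's thread:

#include <linux/kernel.h>
#include <linux/workqueue.h>

struct my_request {
        struct work_struct work;        /* assumed to be set up elsewhere with
                                         * INIT_WORK(&rreq->work, my_assess_work) */
        /* ... per-request state ... */
};

/* May allocate memory, wait and submit further I/O, so it must only run
 * in process context. */
static void my_assess(struct my_request *rreq)
{
        /* ... reissue short reads, clear tails, unlock pages, etc ... */
}

static void my_assess_work(struct work_struct *work)
{
        struct my_request *rreq = container_of(work, struct my_request, work);

        my_assess(rreq);
}

/* was_async is true when the caller may be in softirq/IRQ context (for
 * example a block-layer completion handler); punt in that case. */
static void my_subreq_terminated(struct my_request *rreq, bool was_async)
{
        if (was_async)
                queue_work(system_unbound_wq, &rreq->work);
        else
                my_assess(rreq);
}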

The symptom involves warnings like the following appearing and the kernel
hanging:

 WARNING: CPU: 0 PID: 0 at kernel/softirq.c:175 __local_bh_enable_ip+0x35/0x50
 ...
 RIP: 0010:__local_bh_enable_ip+0x35/0x50
 ...
 Call Trace:
  
  rxrpc_kernel_begin_call+0x7d/0x1b0 [rxrpc]
  ? afs_rx_new_call+0x40/0x40 [kafs]
  ? afs_alloc_call+0x28/0x120 [kafs]
  afs_make_call+0x120/0x510 [kafs]
  ? afs_rx_new_call+0x40/0x40 [kafs]
  ? afs_alloc_flat_call+0xba/0x100 [kafs]
  ? __kmalloc+0x167/0x2f0
  ? afs_alloc_flat_call+0x9b/0x100 [kafs]
  afs_wait_for_operation+0x2d/0x200 [kafs]
  afs_do_sync_operation+0x16/0x20 [kafs]
  afs_req_issue_op+0x8c/0xb0 [kafs]
  netfs_rreq_assess+0x125/0x7d0 [netfs]
  ? cachefiles_end_operation+0x40/0x40 [cachefiles]
  netfs_subreq_terminated+0x117/0x220 [netfs]
  cachefiles_read_complete+0x21/0x60 [cachefiles]
  iomap_dio_bio_end_io+0xdd/0x110
  blk_update_request+0x20a/0x380
  blk_mq_end_request+0x1c/0x120
  nvme_process_cq+0x159/0x1f0 [nvme]
  nvme_irq+0x10/0x20 [nvme]
  __handle_irq_event_percpu+0x37/0x150
  handle_irq_event+0x49/0xb0
  handle_edge_irq+0x7c/0x200
  asm_call_irq_on_stack+0xf/0x20
  
  common_interrupt+0xad/0x120
  asm_common_interrupt+0x1e/0x40
 ...

Reported-by: Marc Dionne 
Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f28d4f4cfd7..6dcdbbfb48e2 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -223,7 +223,7 @@ static void afs_fetch_data_notify(struct afs_operation *op)
 
if (subreq) {
__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
-   netfs_subreq_terminated(subreq, error ?: req->actual_len);
+   netfs_subreq_terminated(subreq, error ?: req->actual_len, false);
req->subreq = NULL;
} else if (req->done) {
req->done(req);
@@ -289,7 +289,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 
fsreq = afs_alloc_read(GFP_NOFS);
if (!fsreq)
-   return netfs_subreq_terminated(subreq, -ENOMEM);
+   return netfs_subreq_terminated(subreq, -ENOMEM, false);
 
fsreq->subreq   = subreq;
fsreq->pos  = subreq->start + subreq->transferred;
@@ -304,7 +304,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 
ret = afs_fetch_data(fsreq->vnode, fsreq);
if (ret < 0)
-   return netfs_subreq_terminated(subreq, ret);
+   return netfs_subreq_terminated(subreq, ret, false);
 }
 
 static int afs_symlink_readpage(struct page *page)
diff --git a/fs/cachefiles/rdwr2.c b/fs/cachefiles/rdwr2.c
index 4cea5a2a2d6e..40668bfe6688 100644
--- a/fs/cachefiles/rdwr2.c
+++ b/fs/cachefiles/rdwr2.c
@@ -23,6 +23,7 @@ struct cachefiles_kiocb {
};
netfs_io_terminated_t   term_func;
void    *term_func_priv;
+   bool    was_async;
 };
 
 static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
@@ -43,10 +44,9 @@ static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
_enter("%ld,%ld", ret, ret2);
 
if (ki->term_func) {
-   if (ret < 0)
-
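
The cachefiles_read_complete() hunk above is truncated in the archive.  As a
purely hypothetical caller-side sketch (made-up names my_term_func_t, my_kiocb
and my_read_complete(); not the missing cachefiles code), the idea is that the
per-I/O wrapper records whether its completion path may run in IRQ context and
simply forwards that flag to the termination callback, which then decides
whether to punt to a worker thread; a synchronous issuer would pass false,
while an IRQ-driven completion (such as the NVMe case in the changelog) would
pass true:

#include <linux/types.h>

typedef void (*my_term_func_t)(void *priv, ssize_t transferred_or_error,
                               bool was_async);

struct my_kiocb {
        my_term_func_t  term_func;
        void            *term_func_priv;
        bool            was_async;      /* true if completion may run in
                                         * softirq/IRQ context */
};

/* Completion wrapper: forward the recorded flag; the callee decides
 * whether to defer the heavy lifting to a worker thread. */
static void my_read_complete(struct my_kiocb *ki, long ret)
{
        if (ki->term_func)
                ki->term_func(ki->term_func_priv, ret, ki->was_async);
}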