Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic

2018-05-17 Thread Bart Van Assche
On Thu, 2018-05-17 at 14:41 -0400, Josef Bacik wrote:
> Yup I can tell you why, on 4.11 where I originally did this work
> __blk_mq_requeue_request() did this
> 
> static void __blk_mq_requeue_request(struct request *rq)
> {
> struct request_queue *q = rq->q;
> 
> trace_block_rq_requeue(q, rq);
> rq_qos_requeue(q, &rq->issue_stat);
> blk_mq_sched_requeue_request(rq);
> 
> if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
> if (q->dma_drain_size && blk_rq_bytes(rq))
> rq->nr_phys_segments--;
> }
> }
> 
> So it was clearing the started part when it did the requeue.  If that's not 
> what
> I'm supposed to be doing anymore then I can send a patch to fix it.  What is
> supposed to be done if I did already do blk_mq_start_request, because I can
> avoid doing the start until after that chunk of code, but there's a part 
> further
> down that needs to have start done before we reach it, so I'll have to do
> whatever the special thing is now there.  Thanks,

Hello Josef,

After having had a closer look I think calling blk_mq_start_request() before
blk_mq_requeue_request() is fine anyway. The v4.16 block layer core namely 
defers
timeout processing until after .queue_rq() has returned. So the timeout code
shouldn't see the requests for which both blk_mq_start_request() and
blk_mq_requeue_request() are called from inside .queue_rq(). I will make sure 
this
behavior is preserved.

Bart.





Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic

2018-05-17 Thread Josef Bacik
On Thu, May 17, 2018 at 06:21:40PM +, Bart Van Assche wrote:
> On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> > +   blk_mq_start_request(req);
> > if (unlikely(nsock->pending && nsock->pending != req)) {
> > blk_mq_requeue_request(req, true);
> > ret = 0;
> 
> (replying to an e-mail from seven months ago)
> 
> Hello Josef,
> 
> Are you aware that the nbd driver is one of the very few block drivers that
> calls blk_mq_requeue_request() after a request has been started? I think that
> can lead to the block layer core to undesired behavior, e.g. that the timeout
> handler fires concurrently with a request being reinstered. Can you or a
> colleague have a look at this? I would like to add the following code to the
> block layer core and I think that the nbd driver would trigger this warning:
> 
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
>  {
> +   WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
> +
> __blk_mq_requeue_request(rq);
> 

Yup I can tell you why, on 4.11 where I originally did this work
__blk_mq_requeue_request() did this

static void __blk_mq_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;

trace_block_rq_requeue(q, rq);
rq_qos_requeue(q, &rq->issue_stat);
blk_mq_sched_requeue_request(rq);

if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
if (q->dma_drain_size && blk_rq_bytes(rq))
rq->nr_phys_segments--;
}
}

So it was clearing the started part when it did the requeue.  If that's not what
I'm supposed to be doing anymore then I can send a patch to fix it.  What is
supposed to be done if I did already do blk_mq_start_request, because I can
avoid doing the start until after that chunk of code, but there's a part further
down that needs to have start done before we reach it, so I'll have to do
whatever the special thing is now there.  Thanks,

Josef


Re: [PATCH 2/2] nbd: don't start req until after the dead connection logic

2018-05-17 Thread Bart Van Assche
On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> + blk_mq_start_request(req);
>   if (unlikely(nsock->pending && nsock->pending != req)) {
>   blk_mq_requeue_request(req, true);
>   ret = 0;

(replying to an e-mail from seven months ago)

Hello Josef,

Are you aware that the nbd driver is one of the very few block drivers that
calls blk_mq_requeue_request() after a request has been started? I think that
can lead to the block layer core to undesired behavior, e.g. that the timeout
handler fires concurrently with a request being reinstered. Can you or a
colleague have a look at this? I would like to add the following code to the
block layer core and I think that the nbd driver would trigger this warning:

 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
+   WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
__blk_mq_requeue_request(rq);

Thanks,

Bart.