On Wed, 2018-02-21 at 11:21 -0800, t...@kernel.org wrote:
> Hello, Bart.
>
> On Wed, Feb 21, 2018 at 06:53:05PM +, Bart Van Assche wrote:
> > On Sun, 2018-02-18 at 05:11 -0800, t...@kernel.org wrote:
> > > On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> > > > With this
On Sun, 2018-02-18 at 05:11 -0800, t...@kernel.org wrote:
> On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> > With this patch applied the tests I ran so far pass.
>
> Ah, great to hear. Thanks a lot for testing. Can you please verify
> the following? It's the same approach
Hello, Bart.
On Wed, Feb 21, 2018 at 06:53:05PM +, Bart Van Assche wrote:
> On Sun, 2018-02-18 at 05:11 -0800, t...@kernel.org wrote:
> > On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> > > With this patch applied the tests I ran so far pass.
> >
> > Ah, great to hear.
Hello, Bart.
On Wed, Feb 14, 2018 at 04:58:56PM +, Bart Van Assche wrote:
> With this patch applied the tests I ran so far pass.
Ah, great to hear. Thanks a lot for testing. Can you please verify
the following? It's the same approach but with RCU sync batching.
Thanks.
Index:
On Tue, 2018-02-13 at 13:20 -0800, t...@kernel.org wrote:
> On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> > The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23.
> > The
> > instruction at that address tries to dereference scsi_cmnd.device (%rax).
> >
Hello, Bart.
Sorry about the delay.
On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The
> instruction at that address tries to dereference scsi_cmnd.device (%rax). The
> register dump shows that that
On Thu, Feb 08, 2018 at 05:48:32PM +, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 09:40 -0800, t...@kernel.org wrote:
> > Heh, sorry about not being clear. What I'm trying to say is that
> > scmd->device != NULL && device->host == NULL. Or was this what you
> > were saying all along?
>
>
On Thu, 2018-02-08 at 09:40 -0800, t...@kernel.org wrote:
> Heh, sorry about not being clear. What I'm trying to say is that
> scmd->device != NULL && device->host == NULL. Or was this what you
> were saying all along?
What I agree with is that the request pointer (req argument) is stored in
On Thu, Feb 08, 2018 at 05:37:46PM +, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 09:19 -0800, t...@kernel.org wrote:
> > Hello, Bart.
> >
> > On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote:
> > > I think "dereferencing a pointer" means reading the memory location that
>
On Thu, 2018-02-08 at 09:19 -0800, t...@kernel.org wrote:
> Hello, Bart.
>
> On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote:
> > I think "dereferencing a pointer" means reading the memory location that
> > pointer points
> > at? Anyway, I think we both interpret the crash report
Hello, Bart.
On Thu, Feb 08, 2018 at 05:10:45PM +, Bart Van Assche wrote:
> I think "dereferencing a pointer" means reading the memory location that
> pointer points
> at? Anyway, I think we both interpret the crash report in the same way,
> namely that it
> means that scmd->device == NULL.
On Thu, 2018-02-08 at 09:00 -0800, t...@kernel.org wrote:
> On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> > The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23.
> > The
> > instruction at that address tries to dereference scsi_cmnd.device (%rax).
> >
Hello, Bart.
On Thu, Feb 08, 2018 at 04:31:43PM +, Bart Van Assche wrote:
> > That sounds more like a scsi hotplug bug than an issue in the timeout
> > code unless we messed up @req pointer to begin with.
>
> I don't think that this is related to SCSI hotplugging: this crash does not
> occur
On Thu, 2018-02-08 at 07:39 -0800, t...@kernel.org wrote:
> On Thu, Feb 08, 2018 at 01:09:57AM +, Bart Van Assche wrote:
> > On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote:
> > > With this patch applied I see requests for which it seems like the
> > > timeout handler
> > > did not
On Thu, Feb 08, 2018 at 07:39:40AM -0800, t...@kernel.org wrote:
> That sounds more like a scsi hotplug but than an issue in the timeout
^bug
> code unless we messed up @req pointer to begin with.
--
tejun
On Thu, Feb 08, 2018 at 01:09:57AM +, Bart Van Assche wrote:
> On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote:
> > With this patch applied I see requests for which it seems like the timeout
> > handler
> > did not get invoked: [ ... ]
>
> I just noticed the following in the system
On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote:
> With this patch applied I see requests for which it seems like the timeout
> handler
> did not get invoked: [ ... ]
I just noticed the following in the system log, which is probably the reason
why some
requests got stuck:
Feb 7
On Wed, 2018-02-07 at 12:07 -0800, t...@kernel.org wrote:
> Ah, you're right. u64_stat_sync doesn't imply barriers, so we want
> something like the following.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102..d6edf3b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@
Hello,
On Wed, Feb 07, 2018 at 09:02:21PM +, Bart Van Assche wrote:
> The patch that I used in my test had an smp_wmb() call (see also below).
> Anyway,
> I will see whether I can extract more state information through debugfs.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index
On Wed, 2018-02-07 at 12:09 -0800, t...@kernel.org wrote:
> Hello,
>
> On Wed, Feb 07, 2018 at 07:03:56PM +, Bart Van Assche wrote:
> > I tried the above patch but already during the first iteration of the test I
> > noticed that the test hung, probably due to the following request that got
Hello,
On Wed, Feb 07, 2018 at 07:03:56PM +, Bart Van Assche wrote:
> I tried the above patch but already during the first iteration of the test I
> noticed that the test hung, probably due to the following request that got
> stuck:
>
> $ (cd /sys/kernel/debug/block && grep -aH .
Hello, Bart.
On Wed, Feb 07, 2018 at 06:14:13PM +, Bart Van Assche wrote:
> When I wrote my comment I was not sure whether or not non-reentrancy is
> guaranteed for work queue items. However, according to what I found in the
> workqueue implementation I think that is guaranteed. So it
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> Can you see whether by any chance the following patch fixes the issue?
> If not, can you share the repro case?
>
> Thanks.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102..651d18c 100644
> --- a/block/blk-mq.c
> +++
On Wed, 2018-02-07 at 09:35 -0800, t...@kernel.org wrote:
> On Wed, Feb 07, 2018 at 05:27:10PM +, Bart Van Assche wrote:
> > Even with the above change I think that there is still a race between the
> > code that handles timer resets and the completion handler.
>
> Can you elaborate the
Hello, Bart.
On Wed, Feb 07, 2018 at 05:27:10PM +, Bart Van Assche wrote:
> Even with the above change I think that there is still a race between the
> code that handles timer resets and the completion handler. Anyway, the test
Can you elaborate the scenario a bit further? If you're
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:
> > The following race can occur between the code that resets the timer
> > and completion handling:
> > - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
> > - A
Hello, Bart.
On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:
> The following race can occur between the code that resets the timer
> and completion handling:
> - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
> - A completion occurs and blk_mq_complete_request()
The following race can occur between the code that resets the timer
and completion handling:
- The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
- A completion occurs and blk_mq_complete_request() calls
__blk_mq_complete_request().
- The timeout code calls blk_add_timer() and that
28 matches
Mail list logo