Re: [dm-devel] Questions around multipath failover and no_path_retry

2018-03-19 Thread Martin Wilck
On Sun, 2018-03-11 at 16:47, Karan Vohra wrote:
> Hi Folks,
> 
> Let us assume there are 2 paths within the path group, to which dm-
> multipath is sending I/Os in round-robin fashion. Each of these
> paths is identified as a unique block device, such as /dev/sdb and
> /dev/sdc.
> 
> Let us say some I/Os are sent over the path /dev/sdb, and either
> the requests time out or there is a failure on that path. What
> happens to those I/Os? Are they sent over the other path,
> /dev/sdc, or does dm-multipath wait for /dev/sdb to come back
> online and send I/O only to /dev/sdb?

The I/O is sent to other paths (sdc in your example) when the lower
layer (e.g. SCSI) indicates path failure for sdb. That's the very point
of multipathing.

>  One of the reasons we are concerned about the above scenario is
> this: let us say there is a write I/O W1 which is routed to
> /dev/sdb, and then there is a failure. There is a write I/O W2
> which writes to the same block via /dev/sdc. Now, if multipath
> resends W1 through /dev/sdc, W2 gets overwritten by W1. The
> expectation was that W2 happens after W1 and should overwrite W1,
> but the result is the opposite.

If you send two write IOs to the same sector at the same time, you
can't be sure which one arrives first. That's not specific to
multipath. If you want to guarantee ordering, you have to flush W1 
using e.g. fdatasync() before sending W2. The flush command won't
return before W1 is written to disk.
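
To illustrate, a minimal sketch (not from the original mail;
ordered_overwrite(), fd, the buffers and the offset are all
hypothetical, and error handling is abbreviated):

#include <sys/types.h>
#include <unistd.h>

/* Write W1, flush it to stable storage, then write W2 to the same
 * block. fdatasync() does not return before W1 has reached the disk,
 * so a retried W1 can no longer overtake W2. */
int ordered_overwrite(int fd, const void *w1, const void *w2,
                      size_t len, off_t block_off)
{
        if (pwrite(fd, w1, len, block_off) != (ssize_t)len)  /* W1 */
                return -1;
        if (fdatasync(fd) != 0)         /* barrier between W1 and W2 */
                return -1;
        if (pwrite(fd, w2, len, block_off) != (ssize_t)len)  /* W2 */
                return -1;
        return 0;
}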

> Situations like these can cause data inconsistency and corruption.
> We were thinking of setting the no_path_retry configuration option
> to "queue" to make sure that the I/Os supposed to be going to
> path1 never make it to path2.

That won't work. As the name of the option suggests, "no_path_retry"
only affects the behavior if there's _no_ healthy path left.
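
For reference, a sketch of where the option lives in
/etc/multipath.conf (the values are illustrative; see
multipath.conf(5) for the details on your distribution):

defaults {
        # queue I/O only while *no* healthy path remains:
        no_path_retry   queue
        # alternatively, retry for 5 checker intervals, then fail:
        # no_path_retry 5
}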

>  But the question is: would that not cause unexpected behavior in
> the application layer? Let us say there are I/O requests R1, R2,
> R3, and so on: R1 goes to Path1, R2 goes to Path2, and so on. If
> Path1 dies for some reason, with no_path_retry set to queue,
> queueing will not stop until the path is fixed. Does that not mean
> that R1, R3, R5, ... will not make it to the block device until
> the path is fixed? Would it not cause failures if the issue
> persists for seconds?

As I said, that's not how it works. R1->P1, R2->P2, ... only holds as
long as all paths are up (and no I/O scheduler is active that might
reorder your I/O requests).

> What about the size of the queue? Is there any danger of the
> queue getting overloaded? Any pointers or references would be of
> great help.

Theoretically, the queue is only limited by memory size.
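
If a queueing map ever threatens to eat too much memory, queueing can
be switched off at runtime, e.g. (assuming an example map name
"mpatha"; look up yours with "multipath -ll"):

        # fail the queued I/O back to the caller instead of holding it
        dmsetup message mpatha 0 "fail_if_no_path"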

Martin

> 
> Thanks!
> Karan

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

