Re: iscsi HA
Generally speaking, you want to avoid using things like cluster VIPs that float between iSCSI portals. A more elegant solution is to use clustered storage management (e.g. Red Hat CLVM), have the initiator log in to both nodes, and then create multipath maps. The key is that the iSCSI target must access its storage with synchronous direct I/O, so that if an iSCSI portal goes down while I/O is in flight over the wire, you don't lose data. This setup scales quite well and allows you to run active/active.

We implemented this solution using stgt/open-iscsi and it works really well, if you can excuse the unexplained poor read performance of open-iscsi. We were able to write to our storage at around 900 MB/s over 10 Gbit Ethernet but only read at around 300 MB/s.

--
Greg Procunier, RHCSA, RHCE
UNIX Administrator III - Enterprise Servers and Storage
1 Robert Speck Parkway, Suite 400, Mississauga, Ontario L4Z 4E7
Office: 416-673-3320
Mobile: 647-465-9752
Email: gprocun...@symcor.com

From: joby xavier joby...@gmail.com
To: Mike Christie micha...@cs.wisc.edu
Cc: open-iscsi@googlegroups.com
Date: 04/18/2012 12:18 AM
Subject: Re: iscsi HA
Sent by: open-iscsi@googlegroups.com

Mike,

We really appreciate your help on this issue. We will definitely contact the sheepdog team and will let you know the results.

Many thanks,
Joby Xavier

On Tue, Apr 17, 2012 at 10:26 PM, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/16/2012 10:44 PM, Mike Christie wrote:
>> On 04/16/2012 12:42 AM, joby xavier wrote:
>>> Sorry for the delayed response. Here is my /var/log/messages from when the virtual IP moves to the other server during a failover.
>>
>> Could you send all of the /var/log/messages?
>
> The log seems to be missing the iscsid output, but it looks like the initiator detects the failover: we drop the connection, then relogin. When we relogin, though, the target is either failing I/O with that MEDIUM_ERROR or just dropping I/O (we see the 1021 errors, which mean an I/O timed out and we had to run the SCSI error handler).
>
> I think you need to contact the sheepdog developers or the people who made your target to make sure your config is supported, because it looks like there is nothing more we can do on the initiator side. The device is just failing the I/O we send it. You need to ask the target people why it is doing that.

--
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
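A rough sketch of the initiator side of the setup described in the message above: log in to the same target through both portals so that multipath can group the resulting paths. The portal addresses, the IQN, and the helper name `login_commands` are hypothetical placeholders, not values from this thread. The script only prints the `iscsiadm` and `multipath` invocations; it does not run them, and it assumes both target nodes present the LUN with the same identity so the paths end up in one map.

```python
import shlex

# Hypothetical values for illustration; substitute the real portals and IQN.
PORTALS = ["192.0.2.11:3260", "192.0.2.12:3260"]
TARGET_IQN = "iqn.2012-04.com.example:storage.lun1"


def login_commands(target_iqn, portals):
    """Return the argv lists that log one initiator in to every portal."""
    commands = []
    for portal in portals:
        # SendTargets discovery creates the node record for this portal.
        commands.append(
            ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal]
        )
        # One session per portal, so each portal shows up as a separate path.
        commands.append(
            ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"]
        )
    # List the multipath maps built from those paths.
    commands.append(["multipath", "-ll"])
    return commands


if __name__ == "__main__":
    for argv in login_commands(TARGET_IQN, PORTALS):
        print(shlex.join(argv))
```

With two portals this prints five commands: a discovery and a login per portal, then the multipath listing. No virtual IP is involved, so a portal failure only removes one path.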
Re: iscsi HA
On 04/16/2012 10:44 PM, Mike Christie wrote:
> On 04/16/2012 12:42 AM, joby xavier wrote:
>> Sorry for the delayed response. Here is my /var/log/messages from when the virtual IP moves to the other server during a failover.
>
> Could you send all of the /var/log/messages?

The log seems to be missing the iscsid output, but it looks like the initiator detects the failover: we drop the connection, then relogin. When we relogin, though, the target is either failing I/O with that MEDIUM_ERROR or just dropping I/O (we see the 1021 errors, which mean an I/O timed out and we had to run the SCSI error handler).

I think you need to contact the sheepdog developers or the people who made your target to make sure your config is supported, because it looks like there is nothing more we can do on the initiator side. The device is just failing the I/O we send it. You need to ask the target people why it is doing that.
Re: iscsi HA
Mike,

We really appreciate your help on this issue. We will definitely contact the sheepdog team and will let you know the results.

Many thanks,
Joby Xavier

On Tue, Apr 17, 2012 at 10:26 PM, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/16/2012 10:44 PM, Mike Christie wrote:
>> On 04/16/2012 12:42 AM, joby xavier wrote:
>>> Sorry for the delayed response. Here is my /var/log/messages from when the virtual IP moves to the other server during a failover.
>>
>> Could you send all of the /var/log/messages?
>
> The log seems to be missing the iscsid output, but it looks like the initiator detects the failover: we drop the connection, then relogin. When we relogin, though, the target is either failing I/O with that MEDIUM_ERROR or just dropping I/O (we see the 1021 errors, which mean an I/O timed out and we had to run the SCSI error handler).
>
> I think you need to contact the sheepdog developers or the people who made your target to make sure your config is supported, because it looks like there is nothing more we can do on the initiator side. The device is just failing the I/O we send it. You need to ask the target people why it is doing that.
Re: iscsi HA
On 04/16/2012 12:42 AM, joby xavier wrote:
> Sorry for the delayed response. Here is my /var/log/messages from when the virtual IP moves to the other server during a failover.

Could you send all of the /var/log/messages?
Re: iscsi HA
Sorry for the delayed response. Here is my /var/log/messages from when the virtual IP moves to the other server during a failover:

Apr 16 10:57:14 prox1 kernel: scsi7 : iSCSI Initiator over TCP/IP
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] 2252800 512-byte logical blocks: (1.15 GB/1.07 GiB)
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write Protect is off
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 16 10:57:14 prox1 kernel: sdc: unknown partition table
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Attached SCSI disk
Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]

This pattern continues...

root@prox1:~# pvdisplay
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 4096: Input/output error
--- Physical volume ---
PV Name        /dev/sda2
VG Name        pve
PV Size        232.39 GiB / not usable 3.00 MiB
Allocatable    yes
PE Size        4.00 MiB
Total PE       59490
Free PE        4095
Allocated PE   55395
PV UUID        qr1b2t-zLXv-WhWh-ZKm2-2dKX-dmtO-BaADAw

On Apr 12, 9:46 pm, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/11/2012 09:15 PM, joby xavier wrote:
>> I am using tgt (https://github.com/collie/sheepdog/wiki/General-protocol-support) and open-iscsi on my Ubuntu boxes.
>
> When the failover happens, do you see the iSCSI initiator drop one connection and reconnect in /var/log/messages? You should see something like conn error 1011, then a message about being reconnected in N retries.
>
>> On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:
>>> On 04/11/2012 07:52 AM, joby xavier wrote:
>>>> No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?
>>>
>>> I am not sure multipath will help, because you are getting Medium Errors. What target are you using?
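When a log like the one above runs to thousands of repeated lines, a quick tally of connection error codes and sense keys shows whether the initiator saw a connection drop, a string of target-side failures, or both. The following is only a sketch: the function name `summarize` and the two regular expressions are ours, written against the exact line shapes quoted in this thread, and the code descriptions are limited to what the thread itself says.

```python
import re
from collections import Counter

# Patterns written against the kernel lines quoted in this thread.
CONN_ERR = re.compile(r"detected conn error \((\d+)\)")
SENSE_KEY = re.compile(r"Sense Key : ([A-Za-z ]+?) \[")

# Descriptions taken from the replies in this thread, nothing more.
KNOWN_CODES = {
    1011: "connection error, normally followed by a reconnect message",
    1021: "an I/O timed out and the SCSI error handler ran",
}


def summarize(lines):
    """Count iSCSI connection error codes and SCSI sense keys in log lines."""
    conn_errors = Counter()
    sense_keys = Counter()
    for line in lines:
        match = CONN_ERR.search(line)
        if match:
            conn_errors[int(match.group(1))] += 1
        match = SENSE_KEY.search(line)
        if match:
            sense_keys[match.group(1)] += 1
    return conn_errors, sense_keys


if __name__ == "__main__":
    sample = [
        "Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)",
        "Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]",
        "Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error",
        "Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]",
    ]
    conn_errors, sense_keys = summarize(sample)
    for code, count in conn_errors.items():
        note = KNOWN_CODES.get(code, "not explained in this thread")
        print(f"conn error {code} x{count}: {note}")
    for key, count in sense_keys.items():
        print(f"sense key {key!r} x{count}")
```

To run it over a real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/messages"))`.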
Re: iscsi HA
I am using tgt (https://github.com/collie/sheepdog/wiki/General-protocol-support) and open-iscsi on my Ubuntu boxes.

On Apr 11, 10:03 pm, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/11/2012 07:52 AM, joby xavier wrote:
>> No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?
>
> I am not sure multipath will help, because you are getting Medium Errors. What target are you using?
Re: iscsi HA
On 04/11/2012 09:15 PM, joby xavier wrote:
> I am using tgt (https://github.com/collie/sheepdog/wiki/General-protocol-support) and open-iscsi on my Ubuntu boxes.

When the failover happens, do you see the iSCSI initiator drop one connection and reconnect in /var/log/messages? You should see something like conn error 1011, then a message about being reconnected in N retries.

> On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:
>> On 04/11/2012 07:52 AM, joby xavier wrote:
>>> No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?
>>
>> I am not sure multipath will help, because you are getting Medium Errors. What target are you using?
Re: iscsi HA
No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?

On Apr 11, 12:43 am, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/10/2012 05:21 AM, joby xavier wrote:
>> Hi,
>>
>> I want to set up iSCSI high availability with sheepdog distributed storage. Here is my system setup.
>>
>> OS: Ubuntu. Four nodes with sheepdog distributed storage. I am sharing this storage through iSCSI from two of the nodes, using a virtual IP set up with ucarp. The two nodes use the same IQN. The iSCSI storage is mounted as an LVM partition (sdc).
>>
>> node a
>> node b
>> node c
>> node d
>> node x is the initiator
>>
>> Nodes a and b have a common virtual IP because if 'node a' fails, 'node b' should serve as the iSCSI target; both have the same IQN.
>>
>> Problem: when a failover happens, i.e. iSCSI switching from node one to node two, the iSCSI disk fails on initiator 'node x'.
>>
>> Code:
>> root@prox1:~# pvdisplay
>> /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
>> /dev/sdc: read failed after 0 of 4096 at 104792064: Input/output error
>>
>> And here are my /var/log/messages errors.
>>
>> Code:
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f 80 00 00 08 00
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Sense Key : Medium Error [current]
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f f0 00 00 08 00
>> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
>>
>> Can anyone give me some idea on this? Should I change anything in lvm.conf? Should I use multipath-tools? Is this the right procedure?
>
> I/O is making it to the target/device OK, but the target/device is returning a failure. Look at the box running the target. Is there some more info in those logs?
Re: iscsi HA
On 04/11/2012 07:52 AM, joby xavier wrote:
> No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?

I am not sure multipath will help, because you are getting Medium Errors. What target are you using?
Re: iscsi HA
I am using tgt (https://github.com/collie/sheepdog/wiki/General-protocol-support) and open-iscsi on my Ubuntu boxes.

On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:
> On 04/11/2012 07:52 AM, joby xavier wrote:
>> No more info in the logs; the same lines keep repeating in /var/log/messages. Should I use multipathing for this?
>
> I am not sure multipath will help, because you are getting Medium Errors. What target are you using?
Re: iscsi HA
On 04/10/2012 05:21 AM, joby xavier wrote:
> Hi,
>
> I want to set up iSCSI high availability with sheepdog distributed storage. Here is my system setup.
>
> OS: Ubuntu. Four nodes with sheepdog distributed storage. I am sharing this storage through iSCSI from two of the nodes, using a virtual IP set up with ucarp. The two nodes use the same IQN. The iSCSI storage is mounted as an LVM partition (sdc).
>
> node a
> node b
> node c
> node d
> node x is the initiator
>
> Nodes a and b have a common virtual IP because if 'node a' fails, 'node b' should serve as the iSCSI target; both have the same IQN.
>
> Problem: when a failover happens, i.e. iSCSI switching from node one to node two, the iSCSI disk fails on initiator 'node x'.
>
> Code:
> root@prox1:~# pvdisplay
> /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
> /dev/sdc: read failed after 0 of 4096 at 104792064: Input/output error
>
> And here are my /var/log/messages errors.
>
> Code:
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f 80 00 00 08 00
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Sense Key : Medium Error [current]
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f f0 00 00 08 00
> Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
>
> Can anyone give me some idea on this? Should I change anything in lvm.conf? Should I use multipath-tools? Is this the right procedure?

I/O is making it to the target/device OK, but the target/device is returning a failure. Look at the box running the target. Is there some more info in those logs?
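The READ(10) CDBs in the log above can be decoded to see which blocks the failing reads were aimed at. In a READ(10) CDB, byte 0 is the opcode (0x28), bytes 2 to 5 hold the big-endian logical block address, and bytes 7 to 8 hold the transfer length in blocks. A minimal sketch, assuming 512-byte logical blocks (the size the kernel reports for sdc in a later log in this thread; whether the device in this earlier log used the same size is an assumption). The function name `decode_read10` is ours.

```python
def decode_read10(cdb_hex, block_size=512):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks, byte_offset, byte_length).
    block_size defaults to 512, which is an assumption about the device.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks, lba * block_size, blocks * block_size


if __name__ == "__main__":
    # First CDB from the log quoted above.
    print(decode_read10("28 00 00 03 1f 80 00 00 08 00"))
    # prints (204672, 8, 104792064, 4096)
```

The decoded request is a 4096-byte read at byte offset 104792064, which lines up with the pvdisplay message "read failed after 0 of 4096 at 104792064". So the kernel errors and the LVM errors describe the same failed reads.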