Re: iscsi HA

2012-04-23 Thread GProcunier
Generally speaking, you want to avoid using things like cluster VIPs that 
float between iSCSI portals.  A more elegant solution is to use clustered 
storage management, e.g. Red Hat CLVM, have the initiator log into both 
nodes, and then create multipath maps.

The key here is that the iSCSI target must access its storage with 
synchronous direct IO, so that if an iSCSI portal goes down while IO is 
being issued over the wire you don't lose data.
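
To illustrate the synchronous half of that point, here is a minimal Python 
sketch (not stgt's actual code) that opens a backing file with O_SYNC, so 
each write returns only after the kernel reports the data as flushed to the 
device. The function name write_durably and the scratch file are made up 
for the example. O_DIRECT, which additionally bypasses the page cache, is 
left out because it needs block-aligned buffers and is not supported on 
every filesystem.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Write payload with O_SYNC and return the number of bytes written."""
    # O_SYNC (POSIX): each write() completes only once the data has been
    # handed to stable storage, not merely copied into the page cache.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(payload)
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, view[written:])
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Hypothetical backing file standing in for a LUN image.
        image = os.path.join(scratch, "lun0.img")
        print(write_durably(image, b"\0" * 4096), "bytes written synchronously")
```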

This setup scales quite well and allows you to run active/active.

We implemented this solution using stgt/open-iscsi, and it works really 
well if you can excuse the unexplained poor read performance of 
open-iscsi.  We were able to write to our storage at around 900 MB/s over 
10 Gbit Ethernet but only read at around 300 MB/s.
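
For scale, a quick back-of-the-envelope calculation in Python, assuming MB 
means 10^6 bytes and ignoring TCP/iSCSI overhead: 10 Gbit/s is 1250 MB/s 
raw, so the write figure is about 72% of line rate and the read figure 
about 24%.

```python
# Assumption: "MB/s" is 10**6 bytes per second; protocol overhead is ignored.
LINK_BITS_PER_SECOND = 10 * 10**9  # 10 Gbit Ethernet


def link_utilisation(throughput_mb_s: float) -> float:
    """Return the fraction of raw line rate used by a payload rate in MB/s."""
    line_rate_mb_s = LINK_BITS_PER_SECOND / 8 / 10**6  # 1250.0 MB/s
    return throughput_mb_s / line_rate_mb_s


if __name__ == "__main__":
    for label, rate in (("write", 900), ("read", 300)):
        print(f"{label}: {rate} MB/s = {link_utilisation(rate):.0%} of line rate")
```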

--

Greg Procunier, RHCSA, RHCE
UNIX Administrator III - Enterprise Servers and Storage
1 Robert Speck Parkway, Suite 400, Mississauga, Ontario L4Z 4E7
Office: 416-673-3320 
Mobile: 647-465-9752
Email: gprocun...@symcor.com



From: joby xavier joby...@gmail.com
To: Mike Christie micha...@cs.wisc.edu
Cc: open-iscsi@googlegroups.com
Date: 04/18/2012 12:18 AM
Subject: Re: iscsi HA
Sent by: open-iscsi@googlegroups.com



Mike,

We really appreciate your help on this issue. We will definitely contact 
the sheepdog team and will let you know the results.

Many Thanks,
Joby Xavier

On Tue, Apr 17, 2012 at 10:26 PM, Mike Christie micha...@cs.wisc.edu 
wrote:
On 04/16/2012 10:44 PM, Mike Christie wrote:
 On 04/16/2012 12:42 AM, joby xavier wrote:
 sorry for the delayed response...

 here is my /var/log/messages when Virtual IP points to other server
 when a failover happens


 Could you send all of the /var/log/messages?


The log seems to be missing the iscsid output, but it looks like the
initiator detects the failover, drops the connection, and then logs back in.
After the relogin, though, the target is either failing IO with that
MEDIUM_ERROR or just dropping IO (we see the 1021 errors, which
mean an IO timed out and we had to run the SCSI error handler).

I think you need to contact the sheepdog developers or the people that
made your target to make sure your config is supported, because it looks
like on the initiator side there is not anything more we can do. The
device is just failing IO we send it. You need to ask the target people
why it is doing that.



-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.






Re: iscsi HA

2012-04-17 Thread Mike Christie
On 04/16/2012 10:44 PM, Mike Christie wrote:
 On 04/16/2012 12:42 AM, joby xavier wrote:
 sorry for the delayed response...

 here is my /var/log/messages when Virtual IP points to other server
 when a failover happens

 
 Could you send all of the /var/log/messages?
 

The log seems to be missing the iscsid output, but it looks like the
initiator detects the failover, drops the connection, and then logs back in.
After the relogin, though, the target is either failing IO with that
MEDIUM_ERROR or just dropping IO (we see the 1021 errors, which
mean an IO timed out and we had to run the SCSI error handler).

I think you need to contact the sheepdog developers or the people that
made your target to make sure your config is supported, because it looks
like on the initiator side there is not anything more we can do. The
device is just failing IO we send it. You need to ask the target people
why it is doing that.
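
For reference, the numeric codes in the "detected conn error (N)" kernel
messages are values of the kernel's iscsi_err enum. A small Python sketch
that pulls the code out of a log line and names it follows. The table is a
partial one written from memory of include/scsi/iscsi_if.h, so treat the
names as an assumption and check them against your kernel headers; the
helper name explain_conn_error is made up.

```python
import re

# Partial table of enum iscsi_err values (ISCSI_ERR_BASE is 1000).
# Assumption: names recalled from the Linux kernel's include/scsi/iscsi_if.h.
ISCSI_ERR_NAMES = {
    1011: "ISCSI_ERR_CONN_FAILED",
    1020: "ISCSI_ERR_TCP_CONN_CLOSE",
    1021: "ISCSI_ERR_SCSI_EH_SESSION_RST",
    1022: "ISCSI_ERR_NOP_TIMEDOUT",
}

CONN_ERROR = re.compile(r"(connection\d+:\d+): detected conn error \((\d+)\)")


def explain_conn_error(line: str):
    """Return (connection, code, name) for a conn error line, else None."""
    match = CONN_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), code, ISCSI_ERR_NAMES.get(code, "unknown")


if __name__ == "__main__":
    sample = "Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)"
    print(explain_conn_error(sample))
```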




Re: iscsi HA

2012-04-17 Thread joby xavier
Mike,

We really appreciate your help on this issue. We will definitely contact
the sheepdog team and will let you know the results.

Many Thanks,
Joby Xavier

On Tue, Apr 17, 2012 at 10:26 PM, Mike Christie micha...@cs.wisc.edu wrote:

 On 04/16/2012 10:44 PM, Mike Christie wrote:
  On 04/16/2012 12:42 AM, joby xavier wrote:
  sorry for the delayed response...
 
  here is my /var/log/messages when Virtual IP points to other server
  when a failover happens
 
 
  Could you send all of the /var/log/messages?
 

 The log seems to be missing the iscsid output, but it looks like the
 initiator detects the failover, drops the connection, and then logs back in.
 After the relogin, though, the target is either failing IO with that
 MEDIUM_ERROR or just dropping IO (we see the 1021 errors, which
 mean an IO timed out and we had to run the SCSI error handler).

 I think you need to contact the sheepdog developers or the people that
 made your target to make sure your config is supported, because it looks
 like on the initiator side there is not anything more we can do. The
 device is just failing IO we send it. You need to ask the target people
 why it is doing that.





Re: iscsi HA

2012-04-16 Thread Mike Christie
On 04/16/2012 12:42 AM, joby xavier wrote:
 sorry for the delayed response...
 
 here is my /var/log/messages when Virtual IP points to other server
 when a failover happens
 

Could you send all of the /var/log/messages?




Re: iscsi HA

2012-04-15 Thread joby xavier
Sorry for the delayed response.

Here is my /var/log/messages from when the virtual IP moves to the other
server during a failover:

Apr 16 10:57:14 prox1 kernel: scsi7 : iSCSI Initiator over TCP/IP
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:0: RAID  IET Controller   0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] 2252800 512-byte logical blocks: (1.15 GB/1.07 GiB)
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write Protect is off
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 16 10:57:14 prox1 kernel: sdc: unknown partition table
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Attached SCSI disk

Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]

This pattern keeps repeating.

root@prox1:~# pvdisplay
  /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdc: read failed after 0 of 4096 at 4096: Input/output error
  --- Physical volume ---
  PV Name   /dev/sda2
  VG Name   pve
  PV Size   232.39 GiB / not usable 3.00 MiB
  Allocatable   yes
  PE Size   4.00 MiB
  Total PE  59490
  Free PE   4095
  Allocated PE  55395
  PV UUID   qr1b2t-zLXv-WhWh-ZKm2-2dKX-dmtO-BaADAw
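
As a cross-check, the CDB bytes in the log can be decoded by hand: in a
READ(10) command, bytes 2-5 are the big-endian logical block address and
bytes 7-8 the transfer length in blocks. The Python sketch below
(decode_read10 is a made-up helper, and the 512-byte block size is taken
from the "sd" line in the log) shows that the failing reads line up with
the offsets pvdisplay complains about.

```python
def decode_read10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return {
        "lba": lba,
        "blocks": blocks,
        "byte_offset": lba * block_size,
        "byte_length": blocks * block_size,
    }


if __name__ == "__main__":
    # CDB copied from the log above: 8 blocks starting at LBA 8.
    print(decode_read10("28 00 00 00 00 08 00 00 08 00"))
    # {'lba': 8, 'blocks': 8, 'byte_offset': 4096, 'byte_length': 4096}
```

The CDB from the first report, 28 00 00 03 1f 80 00 00 08 00, decodes the
same way to byte offset 104792064, which is the offset in the earlier
pvdisplay error.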



On Apr 12, 9:46 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 04/11/2012 09:15 PM, joby xavier wrote:

  I am using tgt (
 https://github.com/collie/sheepdog/wiki/General-protocol-support) and
  open-iscsi on my Ubuntu boxes.

 When the failover happens do you see the iscsi initiator drop one
 connection and reconnect in /var/log/messages? You should see something
 like conn error 1011 then a msg about being reconnected in N retries.


  On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:

  On 04/11/2012 07:52 AM, joby xavier wrote:
  no more info on logs,same lines are repeating on var/log/messages.
  should i use multipathing for this?

  I am not sure multipath will help because you are getting Medium Errors.
  What target are you using?




Re: iscsi HA

2012-04-12 Thread joby xavier
I am using tgt
(https://github.com/collie/sheepdog/wiki/General-protocol-support)
and open-iscsi on my Ubuntu boxes.


On Apr 11, 10:03 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 04/11/2012 07:52 AM, joby xavier wrote:

  no more info on logs,same lines are repeating on var/log/messages.
  should i use multipathing for this?

 I am not sure multipath will help because you are getting Medium Errors.
 What target are you using?




Re: iscsi HA

2012-04-12 Thread Mike Christie
On 04/11/2012 09:15 PM, joby xavier wrote:
 I am using tgt (
 https://github.com/collie/sheepdog/wiki/General-protocol-support) and
 open-iscsi on my Ubuntu boxes.

When the failover happens, do you see the iSCSI initiator drop one
connection and reconnect in /var/log/messages? You should see something
like conn error 1011, then a message about being reconnected in N retries.




 
 On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:
 
 On 04/11/2012 07:52 AM, joby xavier wrote:
 no more info on logs,same lines are repeating on var/log/messages.
 should i use multipathing for this?


 I am not sure multipath will help because you are getting Medium Errors.
 What target are you using?

 




Re: iscsi HA

2012-04-11 Thread joby xavier
There is no more info in the logs; the same lines keep repeating in
/var/log/messages. Should I use multipathing for this?


On Apr 11, 12:43 am, Mike Christie micha...@cs.wisc.edu wrote:
 On 04/10/2012 05:21 AM, joby xavier wrote:

  Hi,

  I want to set up iSCSI high availability on top of sheepdog distributed
  storage.

  Here is my setup. OS: Ubuntu. Four nodes run sheepdog distributed
  storage, and I am sharing this storage through iSCSI from two of the
  nodes, behind a virtual IP set up using ucarp. The two nodes use the
  same IQN. The iSCSI storage is attached on the initiator as an LVM disk (sdc).

  node a
  node b
  node c
  node d
  node x is the initiator
  Nodes a and b share a common virtual IP so that if 'node a' fails, 'node
  b' takes over as the iSCSI target; both have the same IQN.

  Problem: when a failover happens, i.e. the iSCSI target moves from node a
  to node b, the iSCSI disk fails on the initiator, 'node x'.

  Code:
  root@prox1:~# pvdisplay
    /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdc: read failed after 0 of 4096 at 104792064: Input/output error

  And here is my /var/log/messages errors

  Code:
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f 80 00 00 08 00
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Sense Key : Medium Error [current]
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f f0 00 00 08 00
  Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code

  Can anyone give me some ideas on this? Should I change anything in lvm.conf?
  Should I use multipath-tools? Is this the right procedure?

 IO is making it to the target/device ok, but the target/device is
 returning a failure. Look at the box running the target. Is there some
 more info in those logs?




Re: iscsi HA

2012-04-11 Thread Mike Christie
On 04/11/2012 07:52 AM, joby xavier wrote:
 no more info on logs,same lines are repeating on var/log/messages.
 should i use multipathing for this?
 

I am not sure multipath will help because you are getting Medium Errors.
What target are you using?




Re: iscsi HA

2012-04-11 Thread joby xavier
I am using tgt (
https://github.com/collie/sheepdog/wiki/General-protocol-support) and
open-iscsi on my Ubuntu boxes.

On Wed, Apr 11, 2012 at 10:33 PM, Mike Christie micha...@cs.wisc.edu wrote:

 On 04/11/2012 07:52 AM, joby xavier wrote:
  no more info on logs,same lines are repeating on var/log/messages.
  should i use multipathing for this?
 

 I am not sure multipath will help because you are getting Medium Errors.
 What target are you using?





Re: iscsi HA

2012-04-10 Thread Mike Christie
On 04/10/2012 05:21 AM, joby xavier wrote:
 Hi,
 
 I want to set up iSCSI high availability on top of sheepdog distributed
 storage.
 
 Here is my setup. OS: Ubuntu. Four nodes run sheepdog distributed
 storage, and I am sharing this storage through iSCSI from two of the
 nodes, behind a virtual IP set up using ucarp. The two nodes use the
 same IQN. The iSCSI storage is attached on the initiator as an LVM disk (sdc).
 
 node a
 node b
 node c
 node d
 node x is the initiator
 Nodes a and b share a common virtual IP so that if 'node a' fails, 'node
 b' takes over as the iSCSI target; both have the same IQN.
 
 Problem: when a failover happens, i.e. the iSCSI target moves from node a
 to node b, the iSCSI disk fails on the initiator, 'node x'.
 
 Code:
 root@prox1:~# pvdisplay
   /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
   /dev/sdc: read failed after 0 of 4096 at 104792064: Input/output error
 
 And here is my /var/log/messages errors
 
 
 Code:
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f 80 00 00 08 00
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Sense Key : Medium Error [current]
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] CDB: Read(10): 28 00 00 03 1f f0 00 00 08 00
 Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code
 
 Can anyone give me some ideas on this? Should I change anything in lvm.conf?
 Should I use multipath-tools? Is this the right procedure?
 

IO is making it to the target/device ok, but the target/device is
returning a failure. Look at the box running the target. Is there some
more info in those logs?
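
One quick way to see this in a log is to tally the sense keys: sense data
is produced by the target, so a line carrying a sense key means the command
reached the target and was answered with an error. A rough Python sketch,
assuming the kernel log has been unwrapped to one message per line (the
function name sense_key_counts is made up):

```python
import re
from collections import Counter

# Matches e.g. "[sdc] Sense Key : Medium Error [current]"
SENSE_KEY = re.compile(r"Sense Key : (.+?) \[current\]")


def sense_key_counts(lines) -> Counter:
    """Count how often each SCSI sense key appears in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = SENSE_KEY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Unhandled sense code",
        "Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Sense Key : Medium Error [current]",
        "Apr 10 13:08:39 prox1 kernel: sd 30:0:0:1: [sdc] Add. Sense: Unrecovered read error",
    ]
    print(sense_key_counts(sample))
```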
