Re: Disk Disappears From System When open-iscsi Re-connects To COMSTAR HA Target

2010-05-21 Thread Preston Connors
Running udevadm trigger (udevadm version 147) does not repopulate the
/dev disk entries. During the failover, IO is being sent to the disk,
and there are IO errors relating to this disk during this time. I will
report back with more specific details and error messages when this
development environment is accessible again, and I will also try
dd if=/dev/sdd of=/dev/null on the disk while the failover is occurring.
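
A minimal sketch of the direct-read test mentioned above, assuming Python 3 is available on the initiator. It reads a device (or any file) sequentially, much like dd if=/dev/sdd of=/dev/null, but records the offset of each failed read instead of stopping at the first one. The function name read_through, the chunk size, and the error limit are illustrative choices, not part of open-iscsi.

```python
import os


def read_through(path, chunk_size=1024 * 1024, max_errors=10):
    """Read `path` sequentially and discard the data, like dd to /dev/null.

    Returns (bytes_read, errors), where errors is a list of
    (offset, message) tuples for reads that raised OSError.
    """
    bytes_read = 0
    errors = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        # Stop after max_errors failures so a vanished device
        # does not make the loop spin forever.
        while len(errors) < max_errors:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                # Record the failing offset and skip past this chunk.
                errors.append((offset, str(exc)))
                offset += chunk_size
                continue
            if not data:
                break  # end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, errors
```

Calling read_through("/dev/sdd") before, during, and after the failover would give a list of failing offsets to compare against the kernel messages. If the /dev/sdd node is already gone, os.open raises FileNotFoundError before any read is attempted, which is itself a useful data point.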

On May 5, 11:51 am, Mike Christie micha...@cs.wisc.edu wrote:
 On 04/30/2010 02:07 PM, Preston Connors wrote:

  FDISK OUTPUT AFTER FAIL OVER:
  r...@kvm-host-3:~# fdisk -l /dev/sdd
  no output

 It is weird that the /dev/sdd link is now gone, but iscsiadm can see the
 disk below. iscsiadm looks in /sys/block. It does not look at the /dev dir.

 The iscsi layer does not really do anything wrt /dev population except
 transport requests. I wonder what is removing the /dev links during this
 time. It is not the iscsi layer or tools (we do not touch that stuff).
 Are you using udev? Is it possible to rerun udev to create the /dev
 links? Do you see anything in /dev/disk?
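
One way to check for the mismatch described here (the kernel still lists the disk under /sys/block while the /dev node is missing) is sketched below, assuming Python 3. The function name and its two directory parameters are hypothetical; the only kernel convention relied on is that sysfs replaces "/" in a device name with "!".

```python
import os


def missing_dev_nodes(sys_block="/sys/block", dev="/dev"):
    """Return kernel block device names that have no matching node under dev."""
    missing = []
    for name in sorted(os.listdir(sys_block)):
        # sysfs stores names such as cciss/c0d0 as cciss!c0d0.
        node = os.path.join(dev, name.replace("!", "/"))
        if not os.path.exists(node):
            missing.append(name)
    return missing
```

If sdd shows up in the returned list after the failover, the kernel still has the disk and only the udev-managed node was removed.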

 While the failover is occurring are you running IO through the FS? Do
 you see IO errors in /var/log/messages during this time?

 After the failover, if you try to send IO to the FS what IO errors do
 you see in /var/log/messages?

 If you just do IO directly to the disk, before and after the failover
 (so just leave a dd if=/dev/sdd of=/dev/null running during the test) do
 you see any IO errors in /var/log/messages?



  ISCSIADM OUTPUT AFTER FAIL OVER:
  r...@host:~# iscsiadm -m session -P 3
  iSCSI Transport Class version 2.0-870
  iscsiadm version 2.0-870

  Target: iqn.1986-03.com.sun:mirror:iscsi-failover-test
     Current Portal: 192.168.1.1:3260,2
     Persistent Portal: 192.168.1.1:3260,2
             **
             Interface:
             **
             Iface Name: default
             Iface Transport: tcp
             Iface Initiatorname: iqn.1993-08.org.debian:01:bb82d0f5e87f
             Iface IPaddress: 192.168.1.2
             Iface HWaddress: default
             Iface Netdev: default
             SID: 3
             iSCSI Connection State: LOGGED IN
             iSCSI Session State: LOGGED_IN
             Internal iscsid Session State: NO CHANGE
             
             Negotiated iSCSI params:
             
             HeaderDigest: None
             DataDigest: None
             MaxRecvDataSegmentLength: 131072
             MaxXmitDataSegmentLength: 32768
             FirstBurstLength: 65536
             MaxBurstLength: 524288
             ImmediateData: Yes
             InitialR2T: Yes
             MaxOutstandingR2T: 1
             
             Attached SCSI devices:
             
             Host Number: 6  State: running
             scsi6 Channel 00 Id 0 Lun: 0
                     Attached scsi disk sdd          State: running


-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)

2010-05-21 Thread 立凡 王
Hello, Mike
Thank you for all your help.

The timeout is 30.
# cat /sys/block/sdb/device/timeout
30

I was afraid that the problem was the storage itself,
so I called HP customer service to ask about the normal performance
of the storage.
HP customer service showed me this:
http://people.chu.edu.tw/~b8902110/temp/Outlook.jpg

The differences are that our storage uses SATA disks, 100M Ethernet, and
just a single controller module.
(100M is used for testing.)
The performance should be 10 to 20 MB/sec, but the real performance on
Fedora is terrible, as I posted (1 MB/sec).
So I tried it on Win 7 to make sure the storage is fine.
I copied a 3 GB ISO file to the iSCSI disk.
The average speed was 11 MB/sec, which is within the 10 to 20 range.
So the storage is fine.

On May 19, 7:52 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 05/17/2010 07:39 PM, 立凡 王 wrote:

  The version of the kernel is '2.6.32.12-114.fc12.i686.PAE'.
  The log file after debugging turned on is -  
  http://people.chu.edu.tw/~b8902110/temp/messages0517

  There are lots of messages like the ones below in the log.
  kernel: session1: iscsi_eh_cmd_timed_out scsi cmd f5eb8480 timedout
  kernel: session1: iscsi_eh_cmd_timed_out return timer reset

  And I didn't see messages like
  aborting sc 0x1234432

 Ah, I guess that is good news and bad news. It is good news that we are
 avoiding starting the scsi eh when we do not have to. However, it is bad
 news because the iscsi_eh_cmd_timed_out messages mean it is taking the
 target a long time to process the IO.

 What is the value of your scsi cmd timeout?

 cat /sys/block/sdX/device/timeout

 And this is connected to that HP target, right?





Open-iscsi with bnx2i driver, current status ?

2010-05-21 Thread MrJacK
Hi all,

With the latest release of open-iscsi, is there a way to use the
offload capabilities of Broadcom cards like NetXtreme II BCM5709
(bnx2i driver) ?

For me, the only result I get on a 2.6.32 kernel, for example, is the
following:

kernel: bnx2i [01:00.01]: ISCSI_INIT passed
kernel: bnx2i [02:00.00]: ISCSI_INIT passed
kernel: bnx2i [02:00.01]: ISCSI_INIT passed
kernel: bnx2i [01:00.00]: ISCSI_INIT passed
iscsid: Received iferror -19: No such device.
iscsid: cannot make a connection to 192.168.131.101:3260 (-19,2)
iscsid: Received iferror -19: No such device.
iscsid: cannot make a connection to 192.168.130.102:3260 (-19,2)
iscsid: Received iferror -19: No such device.
iscsid: cannot make a connection to 192.168.131.102:3260 (-19,2)
iscsid: Received iferror -19: No such device.
iscsid: cannot make a connection to 192.168.130.101:3260 (-19,2)
kernel: bnx2i [01:00.00]: ISCSI_INIT passed
iscsid: Received iferror -19: No such device.

Thanks for your help !




Re: Open-iscsi with bnx2i driver, current status ?

2010-05-21 Thread Mike Christie

On 05/20/2010 04:13 AM, MrJacK wrote:

Hi all,

With the latest release of open-iscsi, is there a way to use the
offload capabilities of Broadcom cards like NetXtreme II BCM5709
(bnx2i driver) ?



It is not supported in the upstream releases of open-iscsi, because 
Broadcom is slacking :) CC'd the Broadcom userspace maintainer to shame him :)


You have to use a distro version like from SUSE or Red Hat and maybe 
ubuntu, which stupidly (I only say stupidly to shame myself because I 
added it for Red Hat) added the non upstream bits.





Re: iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)

2010-05-21 Thread Mike Christie

On 05/13/2010 03:46 AM, 立凡 王 wrote:

# hdparm -tT /dev/sdb

/dev/sdb:
  Timing cached reads:   6894 MB in  2.00 seconds = 3450.44 MB/sec
  Timing buffered disk reads:   30 MB in  3.14 seconds =   9.55 MB/sec
(sdb is iscsi disk, the speed(9.55 MB/sec) is limited by network
speed(100 Mb/sec).)

#  dd if=/dev/zero of=/test.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.8259 s, 90.8 MB/s
(on local disk)

# dd if=/dev/zero of=/backup2/test.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1152.39 s, 932 kB/s
(on iscsi disk)

# dd if=/dev/zero of=/backup2/test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 480.972 s, 1.1 MB/s



Hey so with Win7 you get around 10 MB/s and when you do reads to the 
iscsi disk /dev/sdb directly you got 9.5 MB/s, but when writing to the 
disk you get around 1 MB/s, right?
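
For reference, the arithmetic behind these figures can be checked with a short script. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes, which is what dd prints) and ignores TCP and iSCSI protocol overhead when computing the link limit. The byte counts and timings are copied from the dd runs quoted above; the helper name is made up for illustration.

```python
def rate_mb_per_s(num_bytes, seconds):
    """Throughput in MB/s, with 1 MB = 10**6 bytes (the unit dd reports)."""
    return num_bytes / seconds / 1e6


# Upper bound for a 100 Mb/s link, ignoring TCP and iSCSI overhead.
link_limit = 100e6 / 8 / 1e6  # 12.5 MB/s

# Byte counts and elapsed times from the quoted dd output.
local_write = rate_mb_per_s(1073741824, 11.8259)
iscsi_write_1g = rate_mb_per_s(1073741824, 1152.39)
iscsi_write_512m = rate_mb_per_s(536870912, 480.972)

print(f"link limit:         {link_limit:.2f} MB/s")
print(f"local write:        {local_write:.2f} MB/s")
print(f"iscsi write, 1 GB:  {iscsi_write_1g:.2f} MB/s")
print(f"iscsi write, 512MB: {iscsi_write_512m:.2f} MB/s")
print(f"write rate as share of link limit: {iscsi_write_1g / link_limit:.0%}")
```

The point of the comparison is that reads and the Win7 copy sit close to what a 100 Mb/s link can carry, while the Linux writes through the filesystem are roughly an order of magnitude below it, so the network alone does not explain the gap.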


For your dd test could you do

# THIS WILL ERASE ANYTHING ON THAT DISK, SO DO NOT
# DO IT ON A DISK THAT IS USED IN PRODUCTION!
dd if=/dev/zero of=/dev/sdb bs=1M count=1024

dd if=/dev/sdb of=/dev/null bs=1M count=1024

and

dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct


And what FS are you using?




iscsi timeouts and connection errors

2010-05-21 Thread Taylor
We have a SLES 11 server connected to an Equallogic 10 Gig disk array.

Initially everything seemed to work just fine.  When doing some IO
testing against the mounted volumes, in which we cause very high IO
loads, i.e. 95 to 100% as reported by iostat, we started seeing the
following messages in /var/log/messages:

May 21 14:26:20 hostA kernel:  connection27:0: ping timeout of 5 secs
expired, last rx 4426024362, last ping 4426025612, now 4426026862
May 21 14:26:20 hostA kernel:  connection27:0: detected conn error
(1011)
May 21 14:26:21 hostA iscsid: Kernel reported iSCSI connection 27:0
error (1011) state (3)
May 21 14:27:03 hostA kernel:  connection30:0: detected conn error
(1011)
May 21 14:27:26 hostA iscsid: Target requests logout within 3 seconds
for connection
May 21 14:27:26 hostA iscsid: Target dropping connection 0, reconnect
min 2 max 0
May 21 14:27:26 hostA iscsid: Kernel reported iSCSI connection 30:0
error (1011) state (4)
May 21 14:27:38 hostA kernel:  connection25:0: detected conn error
(1011)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 21:0
error (1011) state (3)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 25:0
error (1011) state (3)
May 21 14:28:16 hostA iscsid: connection27:0 is operational after
recovery (3 attempts)
May 21 14:28:20 hostA iscsid: connection21:0 is operational after
recovery (2 attempts)


So far we've turned on flow control on the network switches, tried
adjusting multipath.conf, turned off offload parameters on the iscsi
NICs, and adjusted iscsid.conf timeouts.

We are running a 2.6.27 kernel with open-iscsi 2.0.870-26.5.

Does anyone have any suggestions on how we can avoid these timeouts
and connection errors?




Re: Open-iscsi with bnx2i driver, current status ?

2010-05-21 Thread Benjamin Li
Hi All,

 /me hangs head in shame =)   This is correct: the upstream release
of open-iscsi has no Broadcom offload support for your device, the
BCM5709.

Last year there was some effort to merge the uIP daemon into
iscsid in order to provide an upstream solution, but this has been
pushed back multiple times due to a lack of time and resources. This
idea is being looked at.

For now, like Mike said, you will have to use a distro's iSCSI
initiator utilities.  If you would like, I can help you port some of the
bits/patches from the distro's iscsid to the upstream iscsid and provide
you a version of uIP that you could use.  =)  [Note this step hasn't
been attempted by Broadcom QA yet.]

Thanks again.

-Ben

On Fri, 2010-05-21 at 11:25 -0700, Mike Christie wrote:
 On 05/20/2010 04:13 AM, MrJacK wrote:
  Hi all,
 
  With the latest release of open-iscsi, is there a way to use the
  offload capabilities of Broadcom cards like NetXtreme II BCM5709
  (bnx2i driver) ?
 
 
 It is not supported in the upstream releases of open-iscsi, because 
 Broadcom is slacking :) CC'd the Broadcom userspace maintainer to shame him :)
 
 You have to use a distro version like from SUSE or Red Hat and maybe 
 ubuntu, which stupidly (I only say stupidly to shame myself because I 
 added it for Red Hat) added the non upstream bits.
 





Re: iscsi timeouts and connection errors

2010-05-21 Thread Mike Christie

On 05/21/2010 03:43 PM, Taylor wrote:

We have a SLES 11 server connected to an Equallogic 10 Gig disk array.

Initially everything seemed to work just fine.  When doing some IO
testing against the mounted volumes, in which we cause very high IO
loads, i.e. 95 to 100% as reported by iostat, we started seeing the
following messages in /var/log/messages:

May 21 14:26:20 hostA kernel:  connection27:0: ping timeout of 5 secs
expired, last rx 4426024362, last ping 4426025612, now 4426026862
May 21 14:26:20 hostA kernel:  connection27:0: detected conn error
(1011)
May 21 14:26:21 hostA iscsid: Kernel reported iSCSI connection 27:0
error (1011) state (3)
May 21 14:27:03 hostA kernel:  connection30:0: detected conn error
(1011)
May 21 14:27:26 hostA iscsid: Target requests logout within 3 seconds
for connection
May 21 14:27:26 hostA iscsid: Target dropping connection 0, reconnect
min 2 max 0
May 21 14:27:26 hostA iscsid: Kernel reported iSCSI connection 30:0
error (1011) state (4)
May 21 14:27:38 hostA kernel:  connection25:0: detected conn error
(1011)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 21:0
error (1011) state (3)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 25:0
error (1011) state (3)
May 21 14:28:16 hostA iscsid: connection27:0 is operational after
recovery (3 attempts)
May 21 14:28:20 hostA iscsid: connection21:0 is operational after
recovery (2 attempts)



Is there more to the log? I want to see if the target requests a logout 
first or if we get a ping timeout first.
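
The ordering question can be answered mechanically once more of the log is available. The sketch below, assuming Python 3, classifies syslog lines by the message text seen in the excerpt above and reports which event appears first. The patterns are derived only from the lines quoted in this thread, the sample lines are the wrapped log entries rejoined, and all function and variable names are illustrative.

```python
import re

# Message fragments taken from the log excerpt quoted in this thread.
EVENT_PATTERNS = [
    ("ping_timeout", re.compile(r"connection(\d+):\d+: ping timeout")),
    ("conn_error", re.compile(r"connection(\d+):\d+: detected conn error")),
    ("logout_request", re.compile(r"Target requests logout")),
    ("recovered", re.compile(r"connection(\d+):\d+ is operational")),
]
TIMESTAMP = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})")


def classify(lines):
    """Yield (time, event, session id or None) for each recognised line."""
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue
        for name, pattern in EVENT_PATTERNS:
            found = pattern.search(line)
            if found:
                session = found.group(1) if pattern.groups else None
                yield stamp.group(1), name, session
                break


def first_event(events, names):
    """Return the first event whose type is in names, or None."""
    for event in events:
        if event[1] in names:
            return event
    return None


SAMPLE = [
    "May 21 14:26:20 hostA kernel:  connection27:0: ping timeout of 5 secs "
    "expired, last rx 4426024362, last ping 4426025612, now 4426026862",
    "May 21 14:26:20 hostA kernel:  connection27:0: detected conn error (1011)",
    "May 21 14:27:03 hostA kernel:  connection30:0: detected conn error (1011)",
    "May 21 14:27:26 hostA iscsid: Target requests logout within 3 seconds "
    "for connection",
]

events = list(classify(SAMPLE))
print(first_event(events, {"ping_timeout", "logout_request"}))
```

Run against the full /var/log/messages rather than the four sample lines, the same two functions would show whether any logout request precedes the first ping timeout.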




So far we've turned on flow control on the network switches, tried
adjusting multipath.conf, turned off offload parameters on the iscsi
NICs, and adjusted iscsid.conf timeouts.


For the ping timeouts you can set node.conn[0].timeo.noop_out_interval 
and node.conn[0].timeo.noop_out_timeout to 0. If you are using 
dm-multipath though you might want them on, but maybe a little longer.
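
As a rough illustration of changing these two settings on an existing node record, the sketch below (Python 3.8 or later for shlex.join) only builds and prints the iscsiadm commands; it does not run them. The target name and portal are hypothetical placeholders, and the helper name is made up. The defaults of 0 disable the pings as described above; with dm-multipath, larger non-zero values could be passed instead.

```python
import shlex


def noop_update_commands(target, portal, interval=0, timeout=0):
    """Build iscsiadm argument lists that update the NOP-Out settings
    in the node record for one target/portal pair."""
    settings = {
        "node.conn[0].timeo.noop_out_interval": interval,
        "node.conn[0].timeo.noop_out_timeout": timeout,
    }
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal,
         "-o", "update", "-n", name, "-v", str(value)]
        for name, value in settings.items()
    ]


# Hypothetical target and portal, for illustration only.
for argv in noop_update_commands("iqn.2001-05.com.example:volume1",
                                 "192.168.0.10:3260"):
    print(shlex.join(argv))
```

Values updated in the node record are picked up the next time the session logs in, so an already established session would need a logout and login before the new timeouts apply.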


Also on the target side you can turn off their load balancing, which 
would remove the target logout request related disruptions, but that of 
course messes with load balancing. If you are using dm-multipath you 
probably do not need the target load balancing on though (not 100% sure 
what EqualLogic recommends, but it seems like each side's algorithms could 
end up working against each other).





We are running a 2.6.27 kernel with open-iscsi 2.0.870-26.5.



Is that a SLES 2.6.27 kernel or kernel.org? If a SLES kernel then make 
sure you have the newest one, because SUSE has added fixes for when we 
thought a nop/ping timed out but really it was stuck behind a large 
transfer and that transfer was executing ok. If you are using a 2.6.27 
kernel.org kernel I would upgrade to the newest upstream one.





Re: Centos 5.4 bnx2i iscsi woes

2010-05-21 Thread Tarun Reddy
Mike,

So is there any update to this?

Thank you,
Tarun

On Wed, May 5, 2010 at 9:42 AM, Mike Christie micha...@cs.wisc.edu wrote:

 On 04/29/2010 04:44 AM, Oliver Hookins wrote:

 I believe I'm hitting the exact same error... pity.

 Any word on the fix?


 I believe they have a fix. It is being tested now.








Re: iscsi timeouts and connection errors

2010-05-21 Thread Taylor
Mike, thanks for the reply.  The first line in the log I posted shows
the ping timeout.  Or were you wanting to see something else?  Let me
know what specifically you are looking for.

Yes, we are using multipath; I can post multipath.conf if that helps.
I tried tweaking various settings in iscsid.conf, including increasing
node.session.cmds_max and node.session.queue_depth, and in
multipath.conf, including setting features 1 queue_if_no_path and
disabling no_path_retry.

When you say turn off target load balancing, do you mean on the
EqualLogic side, i.e. a per-volume setting?  Understanding that you
aren't familiar with the EqualLogic, would that be how you would do it
on a generic device, or say a CLARiiON?

It is a stock SLES kernel. We are checking whether openSUSE has a
fix in it; they run virtually the same open-iscsi versions, with
openSUSE being slightly newer.

I'll play with the timeout settings. I think they were both set to 5
by default; I tried increasing them, but it didn't seem to have any
impact.

I think I'm close, but would appreciate further help.


