Re: Open iSCSI Performance on IBM

2009-04-13 Thread Gonçalo Borges


Hi...

 Is /apoio04/b1 a scsi/iscsi disk or is it LVM/DM/RAID on top of an
 iscsi/scsi disk?


/apoio04/ is a RAID1 of two disks accessible via iscsi (in the
following tests, I changed the mount point from /apoio04/ to /iscsi04-
lun0/ but they are exactly the same).



 Could you set the IO scheduler to noop
 echo noop > /sys/block/sdX/queue/scheduler and see if that makes a difference.

I checked the current setting and I have

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
noop anticipatory deadline [cfq]

Now I've changed it to

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
[noop] anticipatory deadline cfq

and I've run the tests again. This is what I got:


[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b1 bs=64k
count=125000
125000+0 records in
125000+0 records out
8192000000 bytes (8.2 GB) copied, 470.332 seconds, 17.4 MB/s

[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b2 bs=128k
count=62500
62500+0 records in
62500+0 records out
8192000000 bytes (8.2 GB) copied, 470.973 seconds, 17.4 MB/s

Basically, the performance didn't increase :(


 And then also run
 iscsiadm -m session -P 3


[r...@core06 ~]# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-724
iscsiadm version 2.0-868
Target: iqn.1992-01.com.lsi:1535.600a0b80003ad11c490ade2d
Current Portal: 10.131.2.14:3260,1
Persistent Portal: 10.131.2.14:3260,1
**
Interface:
**
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294
Iface IPaddress: 10.131.4.6
Iface HWaddress: default
Iface Netdev: default
SID: 37
iSCSI Connection State: LOGGED IN
iSCSI Session State: Unknown
Internal iscsid Session State: NO CHANGE

Negotiated iSCSI params:

HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 131072
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 8192
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1

Attached SCSI devices:

Host Number: 38 State: running
scsi38 Channel 00 Id 0 Lun: 0
scsi38 Channel 00 Id 0 Lun: 1
scsi38 Channel 00 Id 0 Lun: 2
scsi38 Channel 00 Id 0 Lun: 3
scsi38 Channel 00 Id 0 Lun: 4
scsi38 Channel 00 Id 0 Lun: 5
scsi38 Channel 00 Id 0 Lun: 31
Current Portal: 10.131.2.13:3260,1
Persistent Portal: 10.131.2.13:3260,1
**
Interface:
**
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294
Iface IPaddress: 10.131.4.6
Iface HWaddress: default
Iface Netdev: default
SID: 38
iSCSI Connection State: LOGGED IN
iSCSI Session State: Unknown
Internal iscsid Session State: NO CHANGE

Negotiated iSCSI params:

HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 131072
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 8192
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: Yes
MaxOutstandingR2T: 1

Attached SCSI devices:

Host Number: 39 State: running
scsi39 Channel 00 Id 0 Lun: 0
scsi39 Channel 00 Id 0 Lun: 1
scsi39 Channel 00 Id 0 Lun: 2
scsi39 Channel 00 Id 0 Lun: 3
scsi39 Channel 00 Id 0 Lun: 4
scsi39 Channel 00 Id 0 Lun: 5
scsi39 Channel 00 Id 0 Lun: 31
Current Portal: 10.131.2.16:3260,2
Persistent Portal: 10.131.2.16:3260,2
**
Interface:
**
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294
Iface IPaddress: 10.131.4.6
Iface HWaddress: default
Iface Netdev: default
SID: 39
iSCSI Connection State: LOGGED IN
iSCSI Session State: Unknown

Re: Open iSCSI Performance on IBM

2009-04-13 Thread jnantel

Have you made any headway with this issue? I'm having a write issue
that seems to share some similarities with yours.


Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3

2009-04-13 Thread jnantel



I am having a major issue with multipath + iscsi write performance
with anything random, or any sequential write with data sizes smaller
than 4meg (128k, 64k, 32k, 16k, 8k). With a 32k block size, I am able
to get a maximum throughput of 33meg/s write. My performance gets cut
by a third with each smaller size, with 4k blocks giving me a whopping
4meg/s combined throughput. Now bumping the data size up to 32meg gets
me 160meg/s throughput, 64meg gives me 190meg/s, and to top it out
128meg gives me 210 megabytes/sec. My question is: what factors would
limit my performance in the 4k-128k range?


Some basics about my performance lab:

2 identical 1 gigabit paths (2 dual-port Intel Pro/1000 MTs) in
separate PCIe slots.

Hardware:
2 x Dell R900 6 quad core, 128gig ram, 2 x Dual port Intel Pro MT
Cisco 3750s with 32gigabit stackwise interconnect
2 x Dell Equallogic PS5000XV arrays
1 x Dell Equallogic PS5000E array

Operating systems
SLES 10 SP2 , RHEL5 Update 3, Oracle Linux 5 update 3


/etc/multipath.conf

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        features                "1 queue_if_no_path"
        rr_min_io               10
        max_fds                 8192
#       rr_weight               priorities
        failback                immediate
#       no_path_retry           fail
#       user_friendly_names     yes
}

/etc/iscsi/iscsi.conf   (non default values)

node.session.timeo.replacement_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 30
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144

discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 65536

Scheduler:

cat /sys/block/sdb/queue/scheduler
[noop] anticipatory deadline cfq
cat /sys/block/sdc/queue/scheduler
[noop] anticipatory deadline cfq


Command outputs:

iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-724
iscsiadm version 2.0-868
Target: iqn.2001-05.com.equallogic:0-8a0906-2c82dfd03-64c000cfe2249e37-
dc1stgdb15-sas-raid6
Current Portal: 10.1.253.13:3260,1
Persistent Portal: 10.1.253.10:3260,1
**
Interface:
**
Iface Name: ieth1
Iface Transport: tcp
Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15
Iface IPaddress: 10.1.253.148
Iface HWaddress: default
Iface Netdev: eth1
SID: 3
iSCSI Connection State: LOGGED IN
iSCSI Session State: Unknown
Internal iscsid Session State: NO CHANGE

Negotiated iSCSI params:

HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: No
MaxOutstandingR2T: 1

Attached SCSI devices:

Host Number: 5  State: running
scsi5 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb  State: running
Current Portal: 10.1.253.12:3260,1
Persistent Portal: 10.1.253.10:3260,1
**
Interface:
**
Iface Name: ieth2
Iface Transport: tcp
Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15
Iface IPaddress: 10.1.253.48
Iface HWaddress: default
Iface Netdev: eth2
SID: 4
iSCSI Connection State: LOGGED IN
iSCSI Session State: Unknown
Internal iscsid Session State: NO CHANGE

Negotiated iSCSI params:

HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: No
MaxOutstandingR2T: 1

Attached SCSI devices:

equallogic - load balancing and xfs

2009-04-13 Thread Matthew Kent

Can anyone suggest a timeout I might be hitting or a setting I'm
missing?

The run down:

- EqualLogic target
- CentOS 5.2 client
- xfs > lvm > iscsi

During a period of high load the EqualLogic decides to load balance:

 INFO  4/13/09  12:08:29 AM  eql3  iSCSI session to target
'20.20.20.31:3260,
iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
closed.   Load balancing request was received on the array.  

 INFO  4/13/09  12:08:31 AM  eql3  iSCSI login to target
'20.20.20.32:3260,
iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
successful, using standard frame length.  

on the client side I get:

Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
return code = 0x0002

Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
sdc, sector 113287552

Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
(dm-10) meta-data dev dm-10 block 0x6c0a000
(xfs_trans_read_buf) error 5 buf count 4096

Apr 13 00:08:32 moo kernel: [4576852.471845]
xfs_force_shutdown(dm-10,0x1) called from line 415 of
file /builddir/build/BUILD/xfs-kmod-0.5/_kmod_build_/xfs_trans_buf.c.
Return address = 0x884420b5

Apr 13 00:08:32 moo kernel: [4576852.475055] Filesystem dm-10: I/O
Error Detected.  Shutting down filesystem: dm-10

Apr 13 00:08:32 moo kernel: [4576852.475688] Please umount the
filesystem, and rectify the problem(s)

Check out the timestamps, they sync up quite nicely.

The funny thing is that, judging by the array logs, this load balancing
seems to happen every couple of days without a peep in the client logs.
Then twice in the last couple of nights, during a period of high load,
it triggered an instant error that makes xfs want to bail out.

Suggestions?
-- 
Matthew Kent \ SA \ bravenet.com





Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3

2009-04-13 Thread Mike Christie

jnantel wrote:
 
 
 I am having a major issue with multipath + iscsi write performance
 with anything random or any sequential write with data sizes smaller
 than 4meg  (128k 64k 32k 16k 8k).  With 32k block size, I am able to
 get a maximum throughput of 33meg/s write.  My performance gets cut by
 a third with each smaller size, with 4k blocks giving me a whopping
 4meg/s combined throughput.  Now bumping the data size up to 32meg
 gets me 160meg/sec throughput, and 64 gives me 190meg/s and finally to
 top it out 128meg gives me 210megabytes/sec.  My question is what
 factors would limit my performance in the 4-128k range?

I think Linux is just not so good with smaller IO sizes like 4K. I do 
not see good performance with Fibre Channel or iscsi either.

64K+ should be fine, but you want to get lots of 64K+ IOs in flight. If 
you run iostat or blktrace you should see more than 1 IO in flight. 
While the test is running, if you
cat /sys/class/scsi_host/hostX/host_busy
you should also see lots of IO running.
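
For example, something along these lines while the test runs (host
number 5 and sdb are just taken from the iscsiadm output in your post;
substitute your own values):

watch -n 1 cat /sys/class/scsi_host/host5/host_busy
iostat -x 1 sdb        # avgqu-sz shows the average number of queued requests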

What limits the number of IOs? On the iscsi initiator side, it could be 
params like node.session.cmds_max or node.session.queue_depth. For a 
decent target like the ones you have, I would increase 
node.session.cmds_max to 1024 and node.session.queue_depth to 128.
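
For example, a rough sketch using the target name from your output (the
session has to be logged out and back in for the node settings to take
effect):

TGT=iqn.2001-05.com.equallogic:0-8a0906-2c82dfd03-64c000cfe2249e37-dc1stgdb15-sas-raid6
iscsiadm -m node -T $TGT -o update -n node.session.cmds_max -v 1024
iscsiadm -m node -T $TGT -o update -n node.session.queue_depth -v 128
iscsiadm -m node -T $TGT -u && iscsiadm -m node -T $TGT -l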

What IO tool are you using? Are you doing direct IO or are you doing 
file system IO? If you just use something like dd with bs=64K then you 
are not going to get lots of IO running. I think you will get 1 64K IO 
in flight, so throughput is not going to be high. If you use something 
like disktest
disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb

you should see a lot of IOs (depends on merging).

If you were using dd with bs=128m then that IO is going to get broken 
down into lots of smaller IOs (probably around 256K), and so the pipe is 
nice and full.
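
If you do want to stay with dd, one rough way to get more than one IO in
flight is to run several direct-IO dd processes in parallel at different
offsets, for example (this writes straight to the disk and destroys
whatever is on it, so lab use only; the counts are arbitrary):

for i in 0 1 2 3; do
    dd if=/dev/zero of=/dev/sdb bs=64k count=25000 seek=$((i * 25000)) oflag=direct &
done
wait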

Another thing I noticed in RHEL is that if you raise the priority of the 
iscsi threads it will sometimes increase write performance. So for RHEL 
or Oracle do

ps -u root | grep scsi_wq

Then match the scsi_wq_%HOST_ID against the Host Number from iscsiadm -m 
session -P 3, and renice that thread to -20.
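
For example, if the Host Number is 5, something like:

PID=$(ps -u root -o pid,comm | awk '/scsi_wq_5/ {print $1}')
renice -20 -p $PID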


Also check the logs and make sure you do not see any conn error messages.

And then what do you get when running the IO test to the individual 
iscsi disks instead of the dm one? Is there any difference? You might 
want to change the rr_min_io. If you are sending smaller IOs then 
rr_min_io of 10 is probably too small. The path is not going to get lots 
of nice large IOs like you would want.
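
For example, to check for connection errors and then try a larger
round-robin stride (the value 100 is just an illustration):

grep -i "conn error" /var/log/messages

# in /etc/multipath.conf:
#     rr_min_io   100
# then reload the maps:
multipath -r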



 
 

Re: equallogic - load balancing and xfs

2009-04-13 Thread Mike Christie

Matthew Kent wrote:
 Can anyone suggest a timeout I might be hitting or a setting I'm
 missing?
 
 The run down:
 
 - EqualLogic target
 - CentOS 5.2 client

You will want to upgrade that to 5.3 when you can. The iscsi code in 
there fixes a bug where the initiator dropped the session when it should 
not.

 - xfs  lvm  iscsi
 
 During a period of high load the EqualLogic decides to load balance:
 
  INFO  4/13/09  12:08:29 AM  eql3iSCSI session to target
 '20.20.20.31:3260,
 iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
 initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
 closed.   Load balancing request was received on the array.  


So is this what you get in the EQL log when it decides to load balance 
the initiator and send us to a different portal?


 
  INFO  4/13/09  12:08:31 AM  eql3iSCSI login to target
 '20.20.20.32:3260,
 iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
 initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
 successful, using standard frame length.  
 
 on the client see I get:
 
 Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
 return code = 0x0002
 
 Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
 sdc, sector 113287552
 
 Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
 (dm-10) meta-data dev dm-10 block 0x6c0a000

Are you using dm-multipath over iscsi? Does this load balance issue 
affect all the paths at the same time? What is your multipath 
no_path_retry value? You might want to set that higher so the FS does 
not get IO errors when all paths are affected at the same time.
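
For example, something like this in the defaults section of
multipath.conf (60 is just an illustration, meaning IO is queued and
retried for 60 path-checker intervals before being failed up to the
filesystem):

defaults {
        no_path_retry   60
}

or no_path_retry queue to queue forever until a path comes back.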

I am not sure how to config the EQL box to not load balance or load 
balance at different thresholds.




Re: equallogic - load balancing and xfs

2009-04-13 Thread Matthew Kent

On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:
 Matthew Kent wrote:
  Can anyone suggest a timeout I might be hitting or a setting I'm
  missing?
  
  The run down:
  
  - EqualLogic target
  - CentOS 5.2 client
 
 You will want to upgrade that to 5.3 when you can. The iscsi code in 
 there fixes a bug where the initiator dropped the session when it should 
 not.
 

Will do, probably Wednesday night and we'll see if this goes away. I'll
be sure to follow up for the archives.

  - xfs  lvm  iscsi
  
  During a period of high load the EqualLogic decides to load balance:
  
   INFO  4/13/09  12:08:29 AM  eql3iSCSI session to target
  '20.20.20.31:3260,
  iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
  initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
  closed.   Load balancing request was received on the array.  
 
 
 So is this what you get in the EQL log when it decides to load balance 
 the initiator and send us to a different portal?
 

Yes, a straight copy from the event log in the Java web interface.

 
  
   INFO  4/13/09  12:08:31 AM  eql3iSCSI login to target
  '20.20.20.32:3260,
  iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
  initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
  successful, using standard frame length.  
  
  on the client see I get:
  
  Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
  return code = 0x0002
  
  Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
  sdc, sector 113287552
  
  Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
  (dm-10) meta-data dev dm-10 block 0x6c0a000
 
 Are you using dm-multipath over iscsi? Does this load balance issue 
 affect all the paths at the same time? What is your multipath 
 no_path_retry value? I think you might want to set that higher to avoid 
 the FS from getting IO errors at this time if all paths are affected at 
 the same time.
 

Not using multipath on this one.

 I am not sure how to config the EQL box to not load balance or load 
 balance at different thresholds.
 

Yeah, I haven't ever seen anything in the manual or GUI related to
configuring the load balancing; it seems to just do it whenever it
wants. Though I imagine if I pulled all but one network line it would
stop ;)

Thanks for the quick reply.
-- 
Matthew Kent \ SA \ bravenet.com





Re: equallogic - load balancing and xfs

2009-04-13 Thread Konrad Rzeszutek

 
 I am not sure how to config the EQL box to not load balance or load 

At the array CLI prompt type:

grpparams conn-balancing disable




Re: equallogic - load balancing and xfs

2009-04-13 Thread Donald Williams
You don't want to disable connection load balancing (CLB) in the long run.
CLB will balance out IO across the available ports as servers need IO,
i.e. during the day your file server or SQL server will be busy, then at
night other servers or backups are running. Without CLB you could end up
stacking connections onto a single interface while other ports are idle.
Upgrading to 5.3 and enabling MPIO is the best solution.
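
On the Linux side that is roughly (a sketch for CentOS/RHEL 5.3, assuming
device-mapper-multipath is installed and the volume is reachable through
more than one iSCSI session):

chkconfig multipathd on
service multipathd start
multipath -ll    # should show one mpath device with a path per session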

Don



On Mon, Apr 13, 2009 at 4:56 PM, Konrad Rzeszutek kon...@virtualiron.com wrote:


 
  I am not sure how to config the EQL box to not load balance or load

 At the array CLI prompt type:

 grpparams conn-balancing disable

 





RE: Open iSCSI Performance on IBM

2009-04-13 Thread Simone Morellato

Hi Mike,

Is bs=128K a linux, iscsi or IBM parameter?

Thanks,
Simone
 

 -Original Message-
 From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com] On
 Behalf Of Mike Christie
 Sent: Thursday, April 09, 2009 10:55 AM
 To: open-iscsi@googlegroups.com
 Subject: Re: Open iSCSI Performance on IBM
 
 
 Gonçalo Borges wrote:
  Hi All...
 
  Sorry, the following could be a little bit off topic...
 
  Does anyone have an idea of the expected performance for an IBM
  DS 3300 system connected via Open-iSCSI? Using a RAID 1 with 2 disks,
  I got the following numbers:
 
 
 Is /apoio04/b1 a scsi/iscsi disk or is it LVM/DM/RAID on top of an
 iscsi/scsi disk?

 Could you set the IO scheduler to noop
 echo noop > /sys/block/sdX/queue/scheduler
 
 and see if that makes a difference.
 
 Also try
 
 bs=128k
 
 And then also run
 iscsiadm -m session -P 3
 
 
 
  Sequential Write:
 
   [r...@core12 ~]# dd if=/dev/zero of=/apoio04/b1 bs=64k count=125000
  125000+0 records in
  125000+0 records out
  8192000000 bytes (8.2 GB) copied, 454.522 seconds, 18.0 MB/s
 
  Sequential Read:
  [r...@core12 ~]# dd if=/apoio04/b1 of=/dev/null bs=64k count=125000
  125000+0 records in
  125000+0 records out
  8192000000 bytes (8.2 GB) copied, 94.9401 seconds, 86.3 MB/s
 
  I restricted the RAM to be only 1GB, therefore there are no cache
  effects in these numbers. Because the read stats are good, we exclude
  network bottlenecks. Nevertheless, we were expecting more or less the
  performance of a single disk (~50MB/s) for the write tests and we are
  getting less than half. I do not know if this is really the physical
  limit of the system or if there is a problem somewhere...
 
  I could not find any IBM official numbers, therefore, I thought that
  someone over here could give me a hint about the numbers they are
  getting...
 
  Thanks in Advance
  Cheers
  Goncalo Borges
  
 
 
 




RE: Open iSCSI Performance on IBM

2009-04-13 Thread Simone Morellato

Disregard, I just saw it was a dd option.

Simone





Re: equallogic - load balancing and xfs

2009-04-13 Thread Mike Christie

Matthew Kent wrote:
 On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:
 Matthew Kent wrote:
 Can anyone suggest a timeout I might be hitting or a setting I'm
 missing?

 The run down:

 - EqualLogic target
 - CentOS 5.2 client
 You will want to upgrade that to 5.3 when you can. The iscsi code in 
 there fixes a bug where the initiator dropped the session when it should 
 not.

 
 Will do, probably Wednesday night and we'll see if this goes away. I'll
 be sure to follow up for the archives.
 
 - xfs  lvm  iscsi

 During a period of high load the EqualLogic decides to load balance:

  INFO  4/13/09  12:08:29 AM  eql3iSCSI session to target
 '20.20.20.31:3260,
 iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
 initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
 closed.   Load balancing request was received on the array.  

 So is this what you get in the EQL log when it decides to load balance 
 the initiator and send us to a different portal?

 
 Yes, a straight copy from event log in the java web interface.
 
  INFO  4/13/09  12:08:31 AM  eql3iSCSI login to target
 '20.20.20.32:3260,
 iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
 initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
 successful, using standard frame length.  

 on the client see I get:

 Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
 return code = 0x0002

 Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
 sdc, sector 113287552

 Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
 (dm-10) meta-data dev dm-10 block 0x6c0a000
 Are you using dm-multipath over iscsi? Does this load balance issue 
 affect all the paths at the same time? What is your multipath 
 no_path_retry value? I think you might want to set that higher to avoid 
 the FS from getting IO errors at this time if all paths are affected at 
 the same time.

 
 Not using multipath on this one.
 

Do you have xfs on sdc or is there something like LVM or RAID on top of sdc?

That is really strange then. 0x0002 is DID_BUS_BUSY. The iscsi 
initiator layer would return this when the target does its load 
balancing. The initiator does this to ask the scsi layer to retry the 
IO. If dm-multipath was used then the IO would be failed up to the 
multipath layer right away. If dm-multipath is not used then we get 5 
retries, so we should not see the error if there was only the one 
rebalancing at the time. If there was a bunch of load rebalancing 
within a couple of minutes then it makes sense.
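
(If you want to double-check those codes against the source: DID_BUS_BUSY
is host byte 0x02 and the 5 retries come from SD_MAX_RETRIES; the paths
below assume an unpacked kernel source tree.)

grep -n DID_BUS_BUSY include/scsi/scsi.h
grep -rn SD_MAX_RETRIES drivers/scsi/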




Re: equallogic - load balancing and xfs

2009-04-13 Thread Matthew Kent

On Mon, 2009-04-13 at 17:28 -0500, Mike Christie wrote:
 Matthew Kent wrote:
  On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:
  Matthew Kent wrote:
  Can anyone suggest a timeout I might be hitting or a setting I'm
  missing?
 
  The run down:
 
  - EqualLogic target
  - CentOS 5.2 client
  You will want to upgrade that to 5.3 when you can. The iscsi code in 
  there fixes a bug where the initiator dropped the session when it should 
  not.
 
  
  Will do, probably Wednesday night and we'll see if this goes away. I'll
  be sure to follow up for the archives.
  
  - xfs  lvm  iscsi
 
  During a period of high load the EqualLogic decides to load balance:
 
   INFO  4/13/09  12:08:29 AM  eql3iSCSI session to target
  '20.20.20.31:3260,
  iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
  initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
  closed.   Load balancing request was received on the array.  
 
  So is this what you get in the EQL log when it decides to load balance 
  the initiator and send us to a different portal?
 
  
  Yes, a straight copy from event log in the java web interface.
  
   INFO  4/13/09  12:08:31 AM  eql3iSCSI login to target
  '20.20.20.32:3260,
  iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
  initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
  successful, using standard frame length.  
 
  on the client see I get:
 
  Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
  return code = 0x0002
 
  Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
  sdc, sector 113287552
 
  Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
  (dm-10) meta-data dev dm-10 block 0x6c0a000
  Are you using dm-multipath over iscsi? Does this load balance issue 
  affect all the paths at the same time? What is your multipath 
  no_path_retry value? I think you might want to set that higher to avoid 
  the FS from getting IO errors at this time if all paths are affected at 
  the same time.
 
  
  Not using multipath on this one.
  
 
 Do you have xfs on sdc or is there something like LVM or RAID on top of sdc?
 
 That is really strange then. 0x0002 is DID_BUS_BUSY. The iscsi 
 initiator layer would return this when the target does its load 
  balancing. The initiator does this to ask the scsi layer to retry the IO. 
 If dm-multipath was used then it is failed to the multipath layer right 
 away. If dm-multipath is not used then we get 5 retries so we should not 
 see the error if there was only the one rebalancing at the time. If 
 there was a bunch of load rebalancing within a couple minutes then it 
 makes sense.
 

Yeah xfs on top of lvm, no multipath.

Logs only show the one load balancing request around that time.

Funny thing is this system, and the load balancing etc., has been going
error-free for months now, but in the last couple of days it's flared up
right around the time of some log rotation and heavy I/O.

We'll see what happens after the CentOS 5.3 upgrade. We'll also be
upgrading the firmware on all the EqualLogics to the latest version.
-- 
Matthew Kent \ SA \ bravenet.com

