Re: Open iSCSI Performance on IBM
Hi...

Is /apoio04/b1 a scsi/iscsi disk or is it LVM/DM/RAID on top of a iscsi/scsi disk?

/apoio04/ is a RAID1 of two disks accessible via iscsi (in the following tests, I changed the mount point from /apoio04/ to /iscsi04-lun0/, but they are exactly the same).

Could you set the IO scheduler to noop
echo noop > /sys/block/sdX/queue/scheduler
and see if that makes a difference.

I checked the definition and I have

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
noop anticipatory deadline [cfq]

Now I've changed it to

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
[noop] anticipatory deadline cfq

and I've run the tests again. This is what I got:

[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b1 bs=64k count=125000
125000+0 records in
125000+0 records out
8192000000 bytes (8.2 GB) copied, 470.332 seconds, 17.4 MB/s

[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b2 bs=128k count=62500
62500+0 records in
62500+0 records out
8192000000 bytes (8.2 GB) copied, 470.973 seconds, 17.4 MB/s

Basically, the performance didn't increase :(

And then also run iscsiadm -m session -P 3

[r...@core06 ~]# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-724
iscsiadm version 2.0-868
Target: iqn.1992-01.com.lsi:1535.600a0b80003ad11c490ade2d

Current Portal: 10.131.2.14:3260,1  Persistent Portal: 10.131.2.14:3260,1
Interface: Iface Name: default  Iface Transport: tcp  Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294  Iface IPaddress: 10.131.4.6  Iface HWaddress: default  Iface Netdev: default  SID: 37
iSCSI Connection State: LOGGED IN  iSCSI Session State: Unknown  Internal iscsid Session State: NO CHANGE
Negotiated iSCSI params: HeaderDigest: None  DataDigest: None  MaxRecvDataSegmentLength: 131072  MaxXmitDataSegmentLength: 65536  FirstBurstLength: 8192  MaxBurstLength: 262144  ImmediateData: Yes  InitialR2T: Yes  MaxOutstandingR2T: 1
Attached SCSI devices: Host Number: 38  State: running
scsi38 Channel 00 Id 0 Lun: 0  scsi38 Channel 00 Id 0 Lun: 1  scsi38 Channel 00 Id 0 Lun: 2  scsi38 Channel 00 Id 0 Lun: 3  scsi38 Channel 00 Id 0 Lun: 4  scsi38 Channel 00 Id 0 Lun: 5  scsi38 Channel 00 Id 0 Lun: 31

Current Portal: 10.131.2.13:3260,1  Persistent Portal: 10.131.2.13:3260,1
Interface: Iface Name: default  Iface Transport: tcp  Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294  Iface IPaddress: 10.131.4.6  Iface HWaddress: default  Iface Netdev: default  SID: 38
iSCSI Connection State: LOGGED IN  iSCSI Session State: Unknown  Internal iscsid Session State: NO CHANGE
Negotiated iSCSI params: HeaderDigest: None  DataDigest: None  MaxRecvDataSegmentLength: 131072  MaxXmitDataSegmentLength: 65536  FirstBurstLength: 8192  MaxBurstLength: 262144  ImmediateData: Yes  InitialR2T: Yes  MaxOutstandingR2T: 1
Attached SCSI devices: Host Number: 39  State: running
scsi39 Channel 00 Id 0 Lun: 0  scsi39 Channel 00 Id 0 Lun: 1  scsi39 Channel 00 Id 0 Lun: 2  scsi39 Channel 00 Id 0 Lun: 3  scsi39 Channel 00 Id 0 Lun: 4  scsi39 Channel 00 Id 0 Lun: 5  scsi39 Channel 00 Id 0 Lun: 31

Current Portal: 10.131.2.16:3260,2  Persistent Portal: 10.131.2.16:3260,2
Interface: Iface Name: default  Iface Transport: tcp  Iface Initiatorname: iqn.1994-05.com.redhat:8c56e324f294  Iface IPaddress: 10.131.4.6  Iface HWaddress: default  Iface Netdev: default  SID: 39
iSCSI Connection State: LOGGED IN  iSCSI Session State: Unknown
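For completeness, a minimal sketch of applying and verifying the noop scheduler on every iSCSI disk backing the RAID1 rather than just one of them (sdh is the disk shown above, sdi is only a placeholder for the other RAID1 member; the setting does not survive a reboot unless re-applied from something like rc.local):

# sdh is the disk shown above; sdi is a placeholder for the other RAID1 member
for dev in sdh sdi; do
    echo noop > /sys/block/$dev/queue/scheduler   # switch the elevator
    cat /sys/block/$dev/queue/scheduler           # confirm [noop] is selected
done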
Re: Open iSCSI Performance on IBM
Have you made any headway with this issue? I'm having a write issue that seems to share some similarities with yours.

On Apr 13, 8:14 am, Gonçalo Borges borges.gonc...@gmail.com wrote:

/apoio04/ is a RAID1 of two disks accessible via iscsi (in the following tests, I changed the mount point from /apoio04/ to /iscsi04-lun0/, but they are exactly the same).

I checked the definition and I have

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
noop anticipatory deadline [cfq]

Now I've changed it to

[r...@core06 ~]# cat /sys/block/sdh/queue/scheduler
[noop] anticipatory deadline cfq

and I've run the tests again. This is what I got:

[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b1 bs=64k count=125000
125000+0 records in
125000+0 records out
8192000000 bytes (8.2 GB) copied, 470.332 seconds, 17.4 MB/s

[r...@core06 ~]# dd if=/dev/zero of=/iscsi04-lun0/b2 bs=128k count=62500
62500+0 records in
62500+0 records out
8192000000 bytes (8.2 GB) copied, 470.973 seconds, 17.4 MB/s

Basically, the performance didn't increase :(
Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
I am having a major issue with multipath + iscsi write performance with anything random, or any sequential write with data sizes smaller than 4meg (128k 64k 32k 16k 8k). With 32k block size, I am able to get a maximum throughput of 33meg/s write. My performance gets cut by a third with each smaller size, with 4k blocks giving me a whopping 4meg/s combined throughput. Now bumping the data size up to 32meg gets me 160meg/sec throughput, 64meg gives me 190meg/s, and finally, to top it out, 128meg gives me 210 megabytes/sec. My question is: what factors would limit my performance in the 4k-128k range?

Some basics about my performance lab: 2 identical 1 gigabit paths (2 dual port Intel Pro 1000 MTs) in separate pcie slots.

Hardware:
2 x Dell R900, 6 quad core, 128gig ram, 2 x dual port Intel Pro MT
Cisco 3750s with 32gigabit stackwise interconnect
2 x Dell Equallogic PS5000XV arrays
1 x Dell Equallogic PS5000E array

Operating systems: SLES 10 SP2, RHEL 5 Update 3, Oracle Linux 5 Update 3

/etc/multipath.conf

defaults {
    udev_dir                /dev
    polling_interval        10
    selector                "round-robin 0"
    path_grouping_policy    multibus
    getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout            /bin/true
    path_checker            readsector0
    features                "1 queue_if_no_path"
    rr_min_io               10
    max_fds                 8192
    # rr_weight             priorities
    failback                immediate
    # no_path_retry         fail
    # user_friendly_names   yes
}

/etc/iscsi/iscsi.conf (non default values)

node.session.timeo.replacement_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 30
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 65536

Scheduler:

cat /sys/block/sdb/queue/scheduler
[noop] anticipatory deadline cfq
cat /sys/block/sdc/queue/scheduler
[noop] anticipatory deadline cfq

Command outputs:

iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-724
iscsiadm version 2.0-868
Target: iqn.2001-05.com.equallogic:0-8a0906-2c82dfd03-64c000cfe2249e37-dc1stgdb15-sas-raid6

Current Portal: 10.1.253.13:3260,1  Persistent Portal: 10.1.253.10:3260,1
Interface: Iface Name: ieth1  Iface Transport: tcp  Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15  Iface IPaddress: 10.1.253.148  Iface HWaddress: default  Iface Netdev: eth1  SID: 3
iSCSI Connection State: LOGGED IN  iSCSI Session State: Unknown  Internal iscsid Session State: NO CHANGE
Negotiated iSCSI params: HeaderDigest: None  DataDigest: None  MaxRecvDataSegmentLength: 262144  MaxXmitDataSegmentLength: 65536  FirstBurstLength: 65536  MaxBurstLength: 262144  ImmediateData: Yes  InitialR2T: No  MaxOutstandingR2T: 1
Attached SCSI devices: Host Number: 5  State: running
scsi5 Channel 00 Id 0 Lun: 0  Attached scsi disk sdb  State: running

Current Portal: 10.1.253.12:3260,1  Persistent Portal: 10.1.253.10:3260,1
Interface: Iface Name: ieth2  Iface Transport: tcp  Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15  Iface IPaddress: 10.1.253.48  Iface HWaddress: default  Iface Netdev: eth2  SID: 4
iSCSI Connection State: LOGGED IN  iSCSI Session State: Unknown  Internal iscsid Session State: NO CHANGE
Negotiated iSCSI params: HeaderDigest: None  DataDigest: None  MaxRecvDataSegmentLength: 262144  MaxXmitDataSegmentLength: 65536  FirstBurstLength: 65536  MaxBurstLength: 262144  ImmediateData: Yes  InitialR2T: No  MaxOutstandingR2T: 1
Attached SCSI devices:
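One way to tell whether the 4k-32k numbers are limited by per-command round-trip latency rather than by the paths themselves is to keep more writes in flight at once; a single dd stream issues roughly one I/O at a time. A rough sketch, assuming the LUN is mounted at /mnt/eqltest (the mount point, file names and sizes are placeholders; oflag=direct keeps the page cache out of the measurement):

# four concurrent 1 GiB direct-I/O write streams at 32k block size
for i in 1 2 3 4; do
    dd if=/dev/zero of=/mnt/eqltest/stream$i bs=32k count=32768 oflag=direct &
done
wait    # when all four finish, compare the aggregate MB/s to the single-stream number

If the aggregate scales with the number of streams, the limit is the amount of outstanding I/O, not the wire.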
equallogic - load balancing and xfs
Can anyone suggest a timeout I might be hitting or a setting I'm missing?

The run down:
- EqualLogic target
- CentOS 5.2 client
- xfs + lvm + iscsi

During a period of high load the EqualLogic decides to load balance:

INFO 4/13/09 12:08:29 AM eql3 iSCSI session to target '20.20.20.31:3260, iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was closed. Load balancing request was received on the array.
INFO 4/13/09 12:08:31 AM eql3 iSCSI login to target '20.20.20.32:3260, iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72' successful, using standard frame length.

and on the client I see:

Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error: return code = 0x0002
Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev sdc, sector 113287552
Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem (dm-10) meta-data dev dm-10 block 0x6c0a000 (xfs_trans_read_buf) error 5 buf count 4096
Apr 13 00:08:32 moo kernel: [4576852.471845] xfs_force_shutdown(dm-10,0x1) called from line 415 of file /builddir/build/BUILD/xfs-kmod-0.5/_kmod_build_/xfs_trans_buf.c. Return address = 0x884420b5
Apr 13 00:08:32 moo kernel: [4576852.475055] Filesystem dm-10: I/O Error Detected. Shutting down filesystem: dm-10
Apr 13 00:08:32 moo kernel: [4576852.475688] Please umount the filesystem, and rectify the problem(s)

Check out the timestamps; it syncs up quite nicely.

The funny thing is that, going by the logs, this load balancing seems to happen every couple of days without a peep. Then twice in the last couple of nights, during a period of high load, it seems to trigger an instant error that makes xfs want to bail out.

Suggestions?
--
Matthew Kent \ SA \ bravenet.com
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
jnantel wrote:
I am having a major issue with multipath + iscsi write performance with anything random, or any sequential write with data sizes smaller than 4meg (128k 64k 32k 16k 8k). With 32k block size, I am able to get a maximum throughput of 33meg/s write. My performance gets cut by a third with each smaller size, with 4k blocks giving me a whopping 4meg/s combined throughput. Now bumping the data size up to 32meg gets me 160meg/sec throughput, 64meg gives me 190meg/s, and finally, to top it out, 128meg gives me 210 megabytes/sec. My question is: what factors would limit my performance in the 4k-128k range?

I think linux is just not so good with smaller IO sizes like 4K. I do not see good performance with Fibre Channel or iscsi. 64K+ should be fine, but you want to get lots of 64K+ IOs in flight.

If you run iostat or blktrace you should see more than 1 IO in flight. While the test is running, if you cat /sys/class/scsi_host/hostX/host_busy you should also see lots of IO running.

What limits the number of IO? On the iscsi initiator side, it could be params like node.session.cmds_max or node.session.queue_depth. For a decent target like the ones you have I would increase node.session.cmds_max to 1024 and increase node.session.queue_depth to 128.

What IO tool are you using? Are you doing direct IO or are you doing file system IO? If you just use something like dd with bs=64K then you are not going to get lots of IO running. I think you will get 1 64K IO in flight, so throughput is not going to be high. If you use something like disktest

disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb

you should see a lot of IOs (depends on merging). If you were using dd with bs=128m then that IO is going to get broken down into lots of smaller IOs (probably around 256K), and so the pipe is nice and full.

Another thing I noticed in RHEL is that if you increase the nice value of the iscsi threads it will sometimes increase write performance. So for RHEL or Oracle do

ps -u root | grep scsi_wq

Then match the scsi_wq_%HOST_ID with the iscsiadm -m session -P 3 Host Number, and renice the thread to -20.

Also check the logs and make sure you do not see any conn error messages.

And then what do you get when running the IO test to the individual iscsi disks instead of the dm one? Is there any difference?

You might want to change the rr_min_io. If you are sending smaller IOs then rr_min_io of 10 is probably too small. The path is not going to get lots of nice large IOs like you would want.
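A rough sketch of the two changes suggested above (the target name, portal and PID are placeholders; the node settings only take effect after the session is logged out and back in):

# bump the per-session command and LUN queue depths on the node record
iscsiadm -m node -T iqn.2001-05.com.equallogic:EXAMPLE-TARGET -p 10.1.253.10 \
    -o update -n node.session.cmds_max -v 1024
iscsiadm -m node -T iqn.2001-05.com.equallogic:EXAMPLE-TARGET -p 10.1.253.10 \
    -o update -n node.session.queue_depth -v 128
iscsiadm -m node -T iqn.2001-05.com.equallogic:EXAMPLE-TARGET -p 10.1.253.10 -u   # logout
iscsiadm -m node -T iqn.2001-05.com.equallogic:EXAMPLE-TARGET -p 10.1.253.10 -l   # login again

# find the iscsi work queue thread matching the session's Host Number and renice it
ps -u root | grep scsi_wq      # suppose this shows scsi_wq_5 with PID 4321 (hypothetical)
renice -20 -p 4321

Alternatively, the same node.session.* lines can be set in /etc/iscsi/iscsid.conf so that newly discovered node records start with those values.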
Re: equallogic - load balancing and xfs
Matthew Kent wrote:
Can anyone suggest a timeout I might be hitting or a setting I'm missing? The run down:
- EqualLogic target
- CentOS 5.2 client

You will want to upgrade that to 5.3 when you can. The iscsi code in there fixes a bug where the initiator dropped the session when it should not.

- xfs + lvm + iscsi
During a period of high load the EqualLogic decides to load balance:
INFO 4/13/09 12:08:29 AM eql3 iSCSI session to target '20.20.20.31:3260, iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was closed. Load balancing request was received on the array.

So is this what you get in the EQL log when it decides to load balance the initiator and send us to a different portal?

INFO 4/13/09 12:08:31 AM eql3 iSCSI login to target '20.20.20.32:3260, iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72' successful, using standard frame length.
and on the client I see:
Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error: return code = 0x0002
Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev sdc, sector 113287552
Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem (dm-10) meta-data dev dm-10 block 0x6c0a000

Are you using dm-multipath over iscsi? Does this load balance issue affect all the paths at the same time? What is your multipath no_path_retry value? I think you might want to set that higher to avoid the FS getting IO errors at this time if all paths are affected at the same time.

I am not sure how to config the EQL box to not load balance, or to load balance at different thresholds.
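For reference, a sketch of what raising no_path_retry could look like in a multipath.conf like the one posted earlier in the thread (the value 12 is only an example: retries are checked once per polling_interval, so with polling_interval 10 this queues I/O for roughly two minutes before failing it up to the filesystem):

defaults {
    # ... existing options from the posted defaults section ...
    polling_interval    10
    no_path_retry       12
    # or: no_path_retry queue   (queue forever instead of ever returning an error)
}

After editing, the running daemon can usually be told to re-read the file with:

multipathd -k"reconfigure"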
Re: equallogic - load balancing and xfs
On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:

You will want to upgrade that to 5.3 when you can. The iscsi code in there fixes a bug where the initiator dropped the session when it should not.

Will do, probably Wednesday night, and we'll see if this goes away. I'll be sure to follow up for the archives.

So is this what you get in the EQL log when it decides to load balance the initiator and send us to a different portal?

Yes, a straight copy from the event log in the java web interface.

Are you using dm-multipath over iscsi? Does this load balance issue affect all the paths at the same time? What is your multipath no_path_retry value? I think you might want to set that higher to avoid the FS getting IO errors at this time if all paths are affected at the same time.

Not using multipath on this one.

I am not sure how to config the EQL box to not load balance, or to load balance at different thresholds.

Yeah, I haven't ever seen anything in the manual or gui related to configuring the load balancing; it seems to just do it whenever it wants. Though I imagine if I pulled all but one network line it would stop ;)

Thanks for the quick reply.
--
Matthew Kent \ SA \ bravenet.com
Re: equallogic - load balancing and xfs
I am not sure how to config the EQL box to not load balance or load

At the array CLI prompt type:

grpparams conn-balancing disable
Re: equallogic - load balancing and xfs
You don't want to disable connection load balancing (CLB) in the long run. CLB will balance out IO across the available ports as servers need IO, i.e. during the day your file server or SQL server will be busy, then at night other servers or backups are running. Without CLB you could end up stacking connections onto a single interface while other ports are idle.

Upgrading to 5.3 and enabling MPIO is the best solution.

Don

On Mon, Apr 13, 2009 at 4:56 PM, Konrad Rzeszutek kon...@virtualiron.com wrote:

I am not sure how to config the EQL box to not load balance or load

At the array CLI prompt type: grpparams conn-balancing disable
RE: Open iSCSI Performance on IBM
Hi Mike,

Is bs=128K a linux, iscsi or IBM parameter?

Thanks,
Simone

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com] On Behalf Of Mike Christie
Sent: Thursday, April 09, 2009 10:55 AM
To: open-iscsi@googlegroups.com
Subject: Re: Open iSCSI Performance on IBM

Gonçalo Borges wrote:
Hi All... Sorry, the following could be a little bit off topic... Does anyone have an idea of the expected performance for an IBM DS 3300 system connected via open iSCSI? Using a RAID 1 with 2 disks, I got the following numbers:

Is /apoio04/b1 a scsi/iscsi disk or is it LVM/DM/RAID on top of a iscsi/scsi disk?

Could you set the IO scheduler to noop

echo noop > /sys/block/sdX/queue/scheduler

and see if that makes a difference. Also try bs=128k.

And then also run iscsiadm -m session -P 3

Sequential Write:
[r...@core12 ~]# dd if=/dev/zero of=/apoio04/b1 bs=64k count=125000
125000+0 records in
125000+0 records out
8192000000 bytes (8.2 GB) copied, 454.522 seconds, 18.0 MB/s

Sequential Read:
[r...@core12 ~]# dd if=/apoio04/b1 of=/dev/null bs=64k count=125000
125000+0 records in
125000+0 records out
8192000000 bytes (8.2 GB) copied, 94.9401 seconds, 86.3 MB/s

I restricted the RAM to be only 1GB, therefore there are no cache effects in these numbers. Because the read stats are good, we exclude network bottlenecks. Nevertheless, we were expecting more or less the performance of a single disk (~50MB/s) for the write tests, and we are getting less than half. I do not know if this is really the physical limit of the system or if there is a problem somewhere... I could not find any official IBM numbers, therefore I thought that someone over here could give me a hint about the numbers they are getting...

Thanks in Advance
Cheers
Goncalo Borges
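Since cache effects are discussed in the quoted message, a small sketch of how the same write test can be repeated with the page cache taken out of the measurement (the output file names are just examples; oflag=direct and conv=fdatasync are standard GNU dd options):

# write bypassing the page cache entirely
dd if=/dev/zero of=/apoio04/direct-test bs=128k count=62500 oflag=direct
# or buffered writing, with the reported time including a final fdatasync
dd if=/dev/zero of=/apoio04/sync-test bs=128k count=62500 conv=fdatasync

If these report roughly the same 17-18 MB/s, the number really reflects the array and not client-side caching.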
RE: Open iSCSI Performance on IBM
Disregard; I just saw it was a dd option.

Simone

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com] On Behalf Of Simone Morellato
Sent: Monday, April 13, 2009 3:04 PM
To: open-iscsi@googlegroups.com
Subject: RE: Open iSCSI Performance on IBM

Hi Mike,

Is bs=128K a linux, iscsi or IBM parameter?

Thanks,
Simone
Re: equallogic - load balancing and xfs
Matthew Kent wrote:
On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:
Are you using dm-multipath over iscsi? Does this load balance issue affect all the paths at the same time? What is your multipath no_path_retry value? I think you might want to set that higher to avoid the FS getting IO errors at this time if all paths are affected at the same time.

Not using multipath on this one.

Do you have xfs on sdc, or is there something like LVM or RAID on top of sdc?

That is really strange then. 0x0002 is DID_BUS_BUSY. The iscsi initiator layer would return this when the target does its load balancing. The initiator does this to ask the scsi layer to retry the IO. If dm-multipath was used then it is failed to the multipath layer right away. If dm-multipath is not used then we get 5 retries, so we should not see the error if there was only the one rebalancing at the time. If there was a bunch of load rebalancing within a couple of minutes then it makes sense.
Re: equallogic - load balancing and xfs
On Mon, 2009-04-13 at 17:28 -0500, Mike Christie wrote:

Do you have xfs on sdc, or is there something like LVM or RAID on top of sdc? That is really strange then. 0x0002 is DID_BUS_BUSY. The iscsi initiator layer would return this when the target does its load balancing. The initiator does this to ask the scsi layer to retry the IO. If dm-multipath was used then it is failed to the multipath layer right away. If dm-multipath is not used then we get 5 retries, so we should not see the error if there was only the one rebalancing at the time. If there was a bunch of load rebalancing within a couple of minutes then it makes sense.

Yeah, xfs on top of lvm, no multipath. The logs only show the one load balancing request around that time.

Funny thing is this system, and the load balancing etc., has been going error-free for months now, but the last couple of days it's flared up right around the time of some log rotation and heavy i/o.

We'll see what happens after the centos 5.3 upgrade. We'll also be upgrading the firmware on all the equallogics to the latest version.
--
Matthew Kent \ SA \ bravenet.com