Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
Across my SAN, tuned system:

ping -I eth2 -s 9000 10.1.253.48
PING 10.1.253.48 (10.1.253.48) from 10.1.253.48 eth2: 9000(9028) bytes of data.
9008 bytes from 10.1.253.48: icmp_seq=1 ttl=64 time=0.074 ms
9008 bytes from 10.1.253.48: icmp_seq=2 ttl=64 time=0.013 ms
9008 bytes from 10.1.253.48: icmp_seq=3 ttl=64 time=0.012 ms
9008 bytes from 10.1.253.48: icmp_seq=4 ttl=64 time=0.011 ms
9008 bytes from 10.1.253.48: icmp_seq=5 ttl=64 time=0.012 ms
9008 bytes from 10.1.253.48: icmp_seq=6 ttl=64 time=0.012 ms
9008 bytes from 10.1.253.48: icmp_seq=7 ttl=64 time=0.012 ms
9008 bytes from 10.1.253.48: icmp_seq=8 ttl=64 time=0.011 ms
9008 bytes from 10.1.253.48: icmp_seq=9 ttl=64 time=0.012 ms

TSO, TCP checksum offload and things like that seem to have a big effect on latency. If you look at how things like TSO work, their intention is to save you CPU overhead; in my case I don't care about overhead, I've got 24 cores.

On Apr 17, 3:25 am, "Ulrich Windl" wrote:
> On 16 Apr 2009 at 13:59, jnantel wrote:
>
> > FINAL RESULTS *
> > First of all I'd like to thank Mike Christie for all his help. Mike, I'll
> > be tapping your brain again for some read performance help.
>
> > This is for the benefit of anyone using the Dell Equallogic PS5000XV /
> > PS5000E with SLES10 SP2 / Redhat 5.3 / Centos 5.3 / Oracle Linux +
> > Multipath ( MPIO ) and open-iscsi ( iscsi ). Sorry about the weird
> > formatting; I'm making sure this gets found by people who are in
> > my predicament.
>
> When seeing your settings, I wonder what your network latency for jumbo
> frames is (e.g. using ping). The timing is dependent on packet sizes.
> Here is what I have if everything is connected to one switch (and both
> ends are handling normal iSCSI traffic at the same time), started from
> Domain-0 of a XEN-virtualized machine that has 77 users logged on:
>
> # ping -s 9000 172.20.76.1
> PING 172.20.76.1 (172.20.76.1) 9000(9028) bytes of data.
> 9008 bytes from 172.20.76.1: icmp_seq=1 ttl=64 time=1.90 ms
> 9008 bytes from 172.20.76.1: icmp_seq=2 ttl=64 time=1.38 ms
> 9008 bytes from 172.20.76.1: icmp_seq=3 ttl=64 time=1.39 ms
> 9008 bytes from 172.20.76.1: icmp_seq=4 ttl=64 time=1.40 ms
> 9008 bytes from 172.20.76.1: icmp_seq=5 ttl=64 time=1.56 ms
> 9008 bytes from 172.20.76.1: icmp_seq=6 ttl=64 time=1.52 ms
> 9008 bytes from 172.20.76.1: icmp_seq=7 ttl=64 time=1.39 ms
> 9008 bytes from 172.20.76.1: icmp_seq=8 ttl=64 time=1.40 ms
> 9008 bytes from 172.20.76.1: icmp_seq=9 ttl=64 time=1.55 ms
> 9008 bytes from 172.20.76.1: icmp_seq=10 ttl=64 time=1.38 ms
>
> --- 172.20.76.1 ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9000ms
> rtt min/avg/max/mdev = 1.384/1.491/1.900/0.154 ms
>
> # ping 172.20.76.1
> PING 172.20.76.1 (172.20.76.1) 56(84) bytes of data.
> 64 bytes from 172.20.76.1: icmp_seq=1 ttl=64 time=0.253 ms
> 64 bytes from 172.20.76.1: icmp_seq=2 ttl=64 time=0.214 ms
> 64 bytes from 172.20.76.1: icmp_seq=3 ttl=64 time=0.223 ms
> 64 bytes from 172.20.76.1: icmp_seq=4 ttl=64 time=0.214 ms
> 64 bytes from 172.20.76.1: icmp_seq=5 ttl=64 time=0.215 ms
> 64 bytes from 172.20.76.1: icmp_seq=6 ttl=64 time=0.208 ms
> 64 bytes from 172.20.76.1: icmp_seq=7 ttl=64 time=0.270 ms
> 64 bytes from 172.20.76.1: icmp_seq=8 ttl=64 time=0.313 ms
>
> --- 172.20.76.1 ping statistics ---
> 8 packets transmitted, 8 received, 0% packet loss, time 6996ms
> rtt min/avg/max/mdev = 0.208/0.238/0.313/0.039 ms
>
> I think large queues are more important if the roundtrip delay is high. And
> don't forget that queue sizes are per device or session, so it uses some RAM.
>
> Regards,
> Ulrich
>
> > As from this thread, my issue was amazingly slow performance with
> > sequential writes with my multipath configuration, around 35 meg/s,
> > when measured with IOMETER. First things first... THROW OUT IOMETER
> > FOR LINUX; it has problems with queue depth.
> > With that said, with the default iscsi and multipath setup we saw between
> > 60-80 meg/sec with multipath. In essence it was slower than a single
> > interface at certain block sizes. When I was done, my write performance
> > was pushing 180-190 meg/sec with blocks as small as 4k (sequential
> > write test using "dt").
>
> > Here are my tweaks:
>
> > After making any multipath changes do "multipath -F" then "multipath",
> > otherwise your changes won't take effect.
>
> > /etc/multipath.conf
>
> > device {
> >     vendor                "EQLOGIC"
> >     product               "100E-00"
> >     path_grouping_policy  multibus
> >     getuid_callout        "/sbin/scsi_id -g -u -s /block/%n"
> >     features              "1 queue_if_no_path"   <--- important
> >     path_checker          readsector0
> >     failback              immediate
> >     path_selector         "round-robin 0"
> >     rr_min_io             512   <--- important; only works with a large
> >                                      queue depth and cmds_max in iscsi.conf
> >     rr_weight             priorities
> > }
>
> > /etc/iscsi/iscsi.conf (restarting iscsi seems to apply the configs fine)
>
> > # To control
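Comparing runs like the two ping traces above line-by-line gets tedious; the per-packet `time=` fields can be averaged with awk instead. A minimal sketch, using two sample lines copied from the output above (piping a saved ping log through the same awk works identically):

```shell
# Average the per-packet RTTs from ping output (sample lines inlined;
# in practice: ping -s 9000 <host> | tee log, then run awk over the log).
printf '%s\n' \
  '9008 bytes from 10.1.253.48: icmp_seq=1 ttl=64 time=0.074 ms' \
  '9008 bytes from 10.1.253.48: icmp_seq=2 ttl=64 time=0.012 ms' |
awk -F'time=' '/time=/ { sub(/ ms/, "", $2); sum += $2; n++ }
               END { printf "avg %.3f ms\n", sum / n }'
# → avg 0.043 ms
```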
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
On 16 Apr 2009 at 13:59, jnantel wrote:

> FINAL RESULTS *
> First of all I'd like to thank Mike Christie for all his help. Mike, I'll
> be tapping your brain again for some read performance help.
>
> This is for the benefit of anyone using the Dell Equallogic PS5000XV /
> PS5000E with SLES10 SP2 / Redhat 5.3 / Centos 5.3 / Oracle Linux +
> Multipath ( MPIO ) and open-iscsi ( iscsi ). Sorry about the weird
> formatting; I'm making sure this gets found by people who are in
> my predicament.

When seeing your settings, I wonder what your network latency for jumbo frames is (e.g. using ping). The timing is dependent on packet sizes. Here is what I have if everything is connected to one switch (and both ends are handling normal iSCSI traffic at the same time), started from Domain-0 of a XEN-virtualized machine that has 77 users logged on:

# ping -s 9000 172.20.76.1
PING 172.20.76.1 (172.20.76.1) 9000(9028) bytes of data.
9008 bytes from 172.20.76.1: icmp_seq=1 ttl=64 time=1.90 ms
9008 bytes from 172.20.76.1: icmp_seq=2 ttl=64 time=1.38 ms
9008 bytes from 172.20.76.1: icmp_seq=3 ttl=64 time=1.39 ms
9008 bytes from 172.20.76.1: icmp_seq=4 ttl=64 time=1.40 ms
9008 bytes from 172.20.76.1: icmp_seq=5 ttl=64 time=1.56 ms
9008 bytes from 172.20.76.1: icmp_seq=6 ttl=64 time=1.52 ms
9008 bytes from 172.20.76.1: icmp_seq=7 ttl=64 time=1.39 ms
9008 bytes from 172.20.76.1: icmp_seq=8 ttl=64 time=1.40 ms
9008 bytes from 172.20.76.1: icmp_seq=9 ttl=64 time=1.55 ms
9008 bytes from 172.20.76.1: icmp_seq=10 ttl=64 time=1.38 ms

--- 172.20.76.1 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9000ms
rtt min/avg/max/mdev = 1.384/1.491/1.900/0.154 ms

# ping 172.20.76.1
PING 172.20.76.1 (172.20.76.1) 56(84) bytes of data.
64 bytes from 172.20.76.1: icmp_seq=1 ttl=64 time=0.253 ms
64 bytes from 172.20.76.1: icmp_seq=2 ttl=64 time=0.214 ms
64 bytes from 172.20.76.1: icmp_seq=3 ttl=64 time=0.223 ms
64 bytes from 172.20.76.1: icmp_seq=4 ttl=64 time=0.214 ms
64 bytes from 172.20.76.1: icmp_seq=5 ttl=64 time=0.215 ms
64 bytes from 172.20.76.1: icmp_seq=6 ttl=64 time=0.208 ms
64 bytes from 172.20.76.1: icmp_seq=7 ttl=64 time=0.270 ms
64 bytes from 172.20.76.1: icmp_seq=8 ttl=64 time=0.313 ms

--- 172.20.76.1 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 6996ms
rtt min/avg/max/mdev = 0.208/0.238/0.313/0.039 ms

I think large queues are more important if the roundtrip delay is high. And don't forget that queue sizes are per device or session, so it uses some RAM.

Regards,
Ulrich

> As from this thread, my issue was amazingly slow performance with
> sequential writes with my multipath configuration, around 35 meg/s,
> when measured with IOMETER. First things first... THROW OUT IOMETER
> FOR LINUX; it has problems with queue depth. With that said, with
> the default iscsi and multipath setup we saw between 60-80 meg/sec
> with multipath. In essence it was slower than a single
> interface at certain block sizes. When I was done, my write performance
> was pushing 180-190 meg/sec with blocks as small as 4k (sequential
> write test using "dt").
>
> Here are my tweaks:
>
> After making any multipath changes do "multipath -F" then "multipath",
> otherwise your changes won't take effect.
>
> /etc/multipath.conf
>
> device {
>     vendor                "EQLOGIC"
>     product               "100E-00"
>     path_grouping_policy  multibus
>     getuid_callout        "/sbin/scsi_id -g -u -s /block/%n"
>     features              "1 queue_if_no_path"   <--- important
>     path_checker          readsector0
>     failback              immediate
>     path_selector         "round-robin 0"
>     rr_min_io             512   <--- important; only works with a large
>                                      queue depth and cmds_max in iscsi.conf
>     rr_weight             priorities
> }
>
> /etc/iscsi/iscsi.conf (restarting iscsi seems to apply the configs fine)
>
> # To control how many commands the session will queue set
> # node.session.cmds_max to an integer between 2 and 2048 that is also
> # a power of 2. The default is 128.
> node.session.cmds_max = 1024
>
> # To control the device's queue depth set node.session.queue_depth
> # to a value between 1 and 128. The default is 32.
> node.session.queue_depth = 128
>
> Other changes I've made are basic gigabit network tuning for large
> transfers and turning off some congestion functions, plus some scheduler
> changes (noop is amazing for sub-4k blocks but awful for 4 meg chunks
> or higher). I've turned off TSO on the network cards; apparently it's
> not supported with jumbo frames and actually slows down performance.
>
> dc1stgdb14:~ # ethtool -k eth7
> Offload parameters for eth7:
> rx-checksumming: off
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> dc1stgdb14:~ # ethtool -k eth10
> Offload parameters for eth10:
> rx-checksumming: off
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> dc1stgdb14:~ #
>
> On Apr 13, 4:36 pm, jnantel wrote:
> > I am having a major
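The note that rr_min_io 512 "only works with a large queue depth" follows from simple arithmetic: round-robin only rotates paths after rr_min_io requests, so unless enough commands can be queued, one path is never kept busy while the other drains. A back-of-envelope sketch (the 4k I/O size is an assumed value matching the tests in the thread):

```shell
# How much data streams down one path before round-robin rotates:
# rr_min_io requests of io_size_kb each (io size assumed, not measured).
rr_min_io=512
io_size_kb=4
echo "path switch every $((rr_min_io * io_size_kb)) KB at ${io_size_kb}k I/O"
# → path switch every 2048 KB at 4k I/O
```

With only the default 32-deep queue, nowhere near 512 small requests can be outstanding, which is why the deeper cmds_max/queue_depth settings below are a prerequisite.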
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
FINAL RESULTS *

First of all I'd like to thank Mike Christie for all his help. Mike, I'll be tapping your brain again for some read performance help.

This is for the benefit of anyone using the Dell Equallogic PS5000XV / PS5000E with SLES10 SP2 / Redhat 5.3 / Centos 5.3 / Oracle Linux + Multipath ( MPIO ) and open-iscsi ( iscsi ). Sorry about the weird formatting; I'm making sure this gets found by people who are in my predicament.

As from this thread, my issue was amazingly slow performance with sequential writes with my multipath configuration, around 35 meg/s, when measured with IOMETER. First things first... THROW OUT IOMETER FOR LINUX; it has problems with queue depth. With that said, with the default iscsi and multipath setup we saw between 60-80 meg/sec with multipath. In essence it was slower than a single interface at certain block sizes. When I was done, my write performance was pushing 180-190 meg/sec with blocks as small as 4k (sequential write test using "dt").

Here are my tweaks:

After making any multipath changes do "multipath -F" then "multipath", otherwise your changes won't take effect.

/etc/multipath.conf

device {
    vendor                "EQLOGIC"
    product               "100E-00"
    path_grouping_policy  multibus
    getuid_callout        "/sbin/scsi_id -g -u -s /block/%n"
    features              "1 queue_if_no_path"   <--- important
    path_checker          readsector0
    failback              immediate
    path_selector         "round-robin 0"
    rr_min_io             512   <--- important; only works with a large
                                     queue depth and cmds_max in iscsi.conf
    rr_weight             priorities
}

/etc/iscsi/iscsi.conf (restarting iscsi seems to apply the configs fine)

# To control how many commands the session will queue set
# node.session.cmds_max to an integer between 2 and 2048 that is also
# a power of 2. The default is 128.
node.session.cmds_max = 1024

# To control the device's queue depth set node.session.queue_depth
# to a value between 1 and 128. The default is 32.
node.session.queue_depth = 128

Other changes I've made are basic gigabit network tuning for large transfers and turning off some congestion functions, plus some scheduler changes (noop is amazing for sub-4k blocks but awful for 4 meg chunks or higher). I've turned off TSO on the network cards; apparently it's not supported with jumbo frames and actually slows down performance.

dc1stgdb14:~ # ethtool -k eth7
Offload parameters for eth7:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
dc1stgdb14:~ # ethtool -k eth10
Offload parameters for eth10:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
dc1stgdb14:~ #

On Apr 13, 4:36 pm, jnantel wrote:
> I am having a major issue with multipath + iscsi write performance
> with anything random, or any sequential write with data sizes smaller
> than 4 meg (128k 64k 32k 16k 8k). With 32k block size, I am able to
> get a maximum throughput of 33 meg/s write. My performance gets cut by
> a third with each smaller size, with 4k blocks giving me a whopping
> 4 meg/s combined throughput. Now bumping the data size up to 32 meg
> gets me 160 meg/sec throughput, 64 gives me 190 meg/s, and finally, to
> top it out, 128 meg gives me 210 megabytes/sec. My question is: what
> factors would limit my performance in the 4-128k range?
>
> Some basics about my performance lab:
>
> 2 identical 1 gigabit paths (2 dual port Intel Pro 1000 MTs) in
> separate PCIe slots.
>
> Hardware:
> 2 x Dell R900, 6 quad core, 128 gig RAM, 2 x dual port Intel Pro MT
> Cisco 3750s with 32 gigabit StackWise interconnect
> 2 x Dell Equallogic PS5000XV arrays
> 1 x Dell Equallogic PS5000E array
>
> Operating systems:
> SLES 10 SP2, RHEL5 Update 3, Oracle Linux 5 update 3
>
> /etc/multipath.conf
>
> defaults {
>     udev_dir              /dev
>     polling_interval      10
>     selector              "round-robin 0"
>     path_grouping_policy  multibus
>     getuid_callout        "/sbin/scsi_id -g -u -s /block/%n"
>     prio_callout          /bin/true
>     path_checker          readsector0
>     features              "1 queue_if_no_path"
>     rr_min_io             10
>     max_fds               8192
>     # rr_weight           priorities
>     failback              immediate
>     # no_path_retry       fail
>     # user_friendly_names yes
>
> /etc/iscsi/iscsi.conf (non-default values)
>
> node.session.timeo.replacement_timeout = 15
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 30
> node.session.cmds_max = 128
> node.session.queue_depth = 32
> node.session.iscsi.FirstBurstLength = 262144
> node.session.iscsi.MaxBurstLength = 16776192
> node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
> node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144
>
> discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 65536
>
> Scheduler:
>
> cat /sys/block/sdb/queue/scheduler
> [noop] anticipatory deadline cfq
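The cmds_max constraint quoted above (an integer between 2 and 2048 that is also a power of 2) is easy to get wrong while experimenting. A small sketch of that stated rule (the `check_cmds_max` helper name is made up here):

```shell
# Validate a candidate node.session.cmds_max: must be a power of 2
# in [2, 2048], per the comment in iscsi.conf quoted above.
check_cmds_max() {
  v=$1
  # v & (v - 1) is zero exactly when v is a power of two
  [ "$v" -ge 2 ] && [ "$v" -le 2048 ] && [ $(( v & (v - 1) )) -eq 0 ]
}

check_cmds_max 1024 && echo "1024 ok"        # the value used in the thread
check_cmds_max 1000 || echo "1000 rejected"  # not a power of 2
# → 1024 ok
# → 1000 rejected
```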
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
On Mon, Apr 13, 2009 at 10:33 PM, Mike Christie wrote:
> I think linux is just not so good with smaller IO sizes like 4K. I do
> not see good performance with Fibre Channel or iscsi.

Most people run a filesystem on top of a block device imported via open-iscsi. It is well known that a filesystem performs I/O to the underlying block device using block sizes between 4 KB and 64 KB, with a significant fraction being 4 KB I/Os. If there were a performance problem in Linux with regard to small block sizes, filesystem performance in Linux would suffer. I have not yet seen statistics showing that Linux's filesystem performance is worse than that of other operating systems, but I have seen measurements that show the contrary.

Bart.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
iometer: 32k write, 0 read, 0 randoms (Equallogic is using this in their lab)
iozone with the -I option and various settings
dd + iostat

On Apr 14, 1:57 pm, Mike Christie wrote:
> Bart Van Assche wrote:
> > On Mon, Apr 13, 2009 at 10:33 PM, Mike Christie wrote:
> >> I think linux is just not so good with smaller IO sizes like 4K. I do
> >> not see good performance with Fibre Channel or iscsi.
> >
> > Can you elaborate on the above? I have already measured a throughput
> > of more than 60 MB/s when using the SRP protocol over an InfiniBand
> > network with a block size of 4 KB, which is definitely not bad.
>
> How does that compare to Windows or Solaris?
>
> Is that a 10 gig link?
>
> What tool were you using and what command did you run? I will try to
> replicate it here and see what I get.
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
jnantel wrote:
> Well I've got some disconcerting news on this issue. No changes at
> any level alter the 34 meg throughput I get. I flushed multipath, blew
> away /var/lib/iscsi just in case. I also verified in /var/lib/iscsi
> that the options got set. RHEL53 took my renice no problem.

What were you using for the IO test tool, and how did you run it?

> Some observations:
> Single interface iscsi gives me the exact same 34 meg/sec
> Going with 2 interfaces gives me 17 meg/sec per interface
> Going with 4 interfaces gives me 8 meg/sec... etc.
> I can't seem to set node.conn[0].iscsi.MaxXmitDataSegmentLength =
> 262144 in a way that actually gets used.

We will always take what the target wants to use, so you have to increase it there.

> node.session.iscsi.MaxConnections = 1 -- can't find any docs on this;
> doubtful it is relevant.
>
> iscsiadm -m session -P 3 still gives me the default 65536 for the xmit
> segment.
>
> The Equallogic has all its interfaces on the same SAN network; this is
> contrary to most implementations of multipath I've done. This is the
> vendor-recommended deployment.
>
> Whatever is choking performance, it's consistently choking it down to
> the same level.
>
> On Apr 13, 5:33 pm, Mike Christie wrote:
>> jnantel wrote:
>>> I am having a major issue with multipath + iscsi write performance
>>> with anything random, or any sequential write with data sizes smaller
>>> than 4 meg (128k 64k 32k 16k 8k). With 32k block size, I am able to
>>> get a maximum throughput of 33 meg/s write. My performance gets cut by
>>> a third with each smaller size, with 4k blocks giving me a whopping
>>> 4 meg/s combined throughput. Now bumping the data size up to 32 meg
>>> gets me 160 meg/sec throughput, 64 gives me 190 meg/s, and finally, to
>>> top it out, 128 meg gives me 210 megabytes/sec. My question is: what
>>> factors would limit my performance in the 4-128k range?
>>
>> I think linux is just not so good with smaller IO sizes like 4K. I do
>> not see good performance with Fibre Channel or iscsi.
>>
>> 64K+ should be fine, but you want to get lots of 64K+ IOs in flight. If
>> you run iostat or blktrace you should see more than 1 IO in flight. If,
>> while the test is running, you
>> cat /sys/class/scsi_host/hostX/host_busy
>> you should also see lots of IO running.
>>
>> What limits the number of IOs? On the iscsi initiator side, it could be
>> params like node.session.cmds_max or node.session.queue_depth. For a
>> decent target like the ones you have I would increase
>> node.session.cmds_max to 1024 and increase node.session.queue_depth to 128.
>>
>> What IO tool are you using? Are you doing direct IO or are you doing
>> file system IO? If you just use something like dd with bs=64K then you
>> are not going to get lots of IO running. I think you will get one 64K IO
>> in flight, so throughput is not going to be high. If you use something
>> like disktest
>> disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb
>> you should see a lot of IOs (depends on merging).
>>
>> If you were using dd with bs=128m then that IO is going to get broken
>> down into lots of smaller IOs (probably around 256K), and so the pipe is
>> nice and full.
>>
>> Another thing I noticed in RHEL is that if you increase the nice value of
>> the iscsi threads it will increase write performance sometimes. So for
>> RHEL or Oracle do
>> ps -u root | grep scsi_wq
>> Then match the scsi_wq_%HOST_ID with the "iscsiadm -m session -P 3" Host
>> Number, and then renice the thread to -20.
>>
>> Also check the logs and make sure you do not see any conn error messages.
>>
>> And then what do you get when running the IO test to the individual
>> iscsi disks instead of the dm one? Is there any difference? You might
>> want to change the rr_min_io. If you are sending smaller IOs then
>> rr_min_io of 10 is probably too small. The path is not going to get lots
>> of nice large IOs like you would want.
>>
>>> Some basics about my performance lab:
>>> 2 identical 1 gigabit paths (2 dual port Intel Pro 1000 MTs) in
>>> separate PCIe slots.
>>>
>>> Hardware:
>>> 2 x Dell R900, 6 quad core, 128 gig RAM, 2 x dual port Intel Pro MT
>>> Cisco 3750s with 32 gigabit StackWise interconnect
>>> 2 x Dell Equallogic PS5000XV arrays
>>> 1 x Dell Equallogic PS5000E array
>>>
>>> Operating system:
>>> SLES 10 SP2, RHEL5 Update 3, Oracle Linux 5 update 3
>>>
>>> /etc/multipath.conf
>>>
>>> defaults {
>>>     udev_dir              /dev
>>>     polling_interval      10
>>>     selector              "round-robin 0"
>>>     path_grouping_policy  multibus
>>>     getuid_callout        "/sbin/scsi_id -g -u -s /block/%n"
>>>     prio_callout          /bin/true
>>>     path_checker          readsector0
>>>     features              "1 queue_if_no_path"
>>>     rr_min_io             10
>>>     max_fds               8192
>>>     # rr_weight           priorities
>>>     failback              immediate
>>>     #
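Mike's point that the initiator "will always take what the target wants to use" means the only number that matters is the negotiated value reported by `iscsiadm -m session -P 3`, not what iscsi.conf requests. A sketch of pulling that value out of saved output (the sample line is inlined here, matching the default 65536 reported in the thread):

```shell
# Extract the negotiated MaxRecvDataSegmentLength from saved
# "iscsiadm -m session -P 3" output (sample line inlined for illustration;
# in practice: iscsiadm -m session -P 3 > session.txt, then awk the file).
echo 'MaxRecvDataSegmentLength: 65536' |
awk -F': ' '/MaxRecvDataSegmentLength/ { print $2 }'
# → 65536
```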
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
Mike Christie wrote:
> Bart Van Assche wrote:
>> On Mon, Apr 13, 2009 at 10:33 PM, Mike Christie wrote:
>>> I think linux is just not so good with smaller IO sizes like 4K. I do
>>> not see good performance with Fibre Channel or iscsi.
>>
>> Can you elaborate on the above? I have already measured a throughput
>> of more than 60 MB/s when using the SRP protocol over an InfiniBand
>> network with a block size of 4 KB, which is definitely not bad.
>
> How does that compare to Windows or Solaris?
>
> Is that a 10 gig link?
>
> What tool were you using and what command did you run? I will try to
> replicate it here and see what I get.

Oh yeah, how many IOPs can you get with that setup?
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
Bart Van Assche wrote:
> On Mon, Apr 13, 2009 at 10:33 PM, Mike Christie wrote:
>> I think linux is just not so good with smaller IO sizes like 4K. I do
>> not see good performance with Fibre Channel or iscsi.
>
> Can you elaborate on the above? I have already measured a throughput
> of more than 60 MB/s when using the SRP protocol over an InfiniBand
> network with a block size of 4 KB, which is definitely not bad.

How does that compare to Windows or Solaris?

Is that a 10 gig link?

What tool were you using and what command did you run? I will try to replicate it here and see what I get.
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
An addition to this: I seem to be getting the following error when I log in:

Apr 14 08:24:54 dc1stgdb15 iscsid: received iferror -38
Apr 14 08:24:54 dc1stgdb15 last message repeated 2 times
Apr 14 08:24:54 dc1stgdb15 iscsid: connection2:0 is operational now
Apr 14 08:24:54 dc1stgdb15 iscsid: received iferror -38
Apr 14 08:24:54 dc1stgdb15 last message repeated 2 times
Apr 14 08:24:54 dc1stgdb15 iscsid: connection1:0 is operational now

On Apr 14, 1:12 pm, jnantel wrote:
> Well I've got some disconcerting news on this issue. No changes at
> any level alter the 34 meg throughput I get. I flushed multipath, blew
> away /var/lib/iscsi just in case. I also verified in /var/lib/iscsi
> that the options got set. RHEL53 took my renice no problem.
>
> Some observations:
> Single interface iscsi gives me the exact same 34 meg/sec
> Going with 2 interfaces gives me 17 meg/sec per interface
> Going with 4 interfaces gives me 8 meg/sec... etc.
> I can't seem to set node.conn[0].iscsi.MaxXmitDataSegmentLength =
> 262144 in a way that actually gets used.
> node.session.iscsi.MaxConnections = 1 -- can't find any docs on this;
> doubtful it is relevant.
>
> iscsiadm -m session -P 3 still gives me the default 65536 for the xmit
> segment.
>
> The Equallogic has all its interfaces on the same SAN network; this is
> contrary to most implementations of multipath I've done. This is the
> vendor-recommended deployment.
>
> Whatever is choking performance, it's consistently choking it down to
> the same level.
>
> On Apr 13, 5:33 pm, Mike Christie wrote:
> > jnantel wrote:
> > > I am having a major issue with multipath + iscsi write performance
> > > with anything random, or any sequential write with data sizes smaller
> > > than 4 meg (128k 64k 32k 16k 8k). With 32k block size, I am able to
> > > get a maximum throughput of 33 meg/s write. My performance gets cut by
> > > a third with each smaller size, with 4k blocks giving me a whopping
> > > 4 meg/s combined throughput. Now bumping the data size up to 32 meg
> > > gets me 160 meg/sec throughput, 64 gives me 190 meg/s, and finally, to
> > > top it out, 128 meg gives me 210 megabytes/sec. My question is: what
> > > factors would limit my performance in the 4-128k range?
> >
> > I think linux is just not so good with smaller IO sizes like 4K. I do
> > not see good performance with Fibre Channel or iscsi.
> >
> > 64K+ should be fine, but you want to get lots of 64K+ IOs in flight. If
> > you run iostat or blktrace you should see more than 1 IO in flight. If,
> > while the test is running, you
> > cat /sys/class/scsi_host/hostX/host_busy
> > you should also see lots of IO running.
> >
> > What limits the number of IOs? On the iscsi initiator side, it could be
> > params like node.session.cmds_max or node.session.queue_depth. For a
> > decent target like the ones you have I would increase
> > node.session.cmds_max to 1024 and increase node.session.queue_depth to 128.
> >
> > What IO tool are you using? Are you doing direct IO or are you doing
> > file system IO? If you just use something like dd with bs=64K then you
> > are not going to get lots of IO running. I think you will get one 64K IO
> > in flight, so throughput is not going to be high. If you use something
> > like disktest
> > disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb
> > you should see a lot of IOs (depends on merging).
> >
> > If you were using dd with bs=128m then that IO is going to get broken
> > down into lots of smaller IOs (probably around 256K), and so the pipe is
> > nice and full.
> >
> > Another thing I noticed in RHEL is that if you increase the nice value of
> > the iscsi threads it will increase write performance sometimes. So for
> > RHEL or Oracle do
> > ps -u root | grep scsi_wq
> > Then match the scsi_wq_%HOST_ID with the "iscsiadm -m session -P 3" Host
> > Number, and then renice the thread to -20.
> >
> > Also check the logs and make sure you do not see any conn error messages.
> >
> > And then what do you get when running the IO test to the individual
> > iscsi disks instead of the dm one? Is there any difference? You might
> > want to change the rr_min_io. If you are sending smaller IOs then
> > rr_min_io of 10 is probably too small. The path is not going to get lots
> > of nice large IOs like you would want.
> >
> > > Some basics about my performance lab:
> > > 2 identical 1 gigabit paths (2 dual port Intel Pro 1000 MTs) in
> > > separate PCIe slots.
> > > Hardware:
> > > 2 x Dell R900, 6 quad core, 128 gig RAM, 2 x dual port Intel Pro MT
> > > Cisco 3750s with 32 gigabit StackWise interconnect
> > > 2 x Dell Equallogic PS5000XV arrays
> > > 1 x Dell Equallogic PS5000E array
> > > Operating systems:
> > > SLES 10 SP2, RHEL5 Update 3, Oracle Linux 5 update 3
> > > /etc/multipath.conf
> > > defaults {
> > >     udev_dir              /dev
> > >     polling_interval      10
> > >     selector              "round-robin 0"
> > >     path_grouping_policy  multibus
> > >     get
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
Well I've got some disconcerting news on this issue. No changes at any level alter the 34/meg throughput I get. I flushed multipath, blew away /var/lib/iscsi just in case. I also verified in /var/lib/iscsi the options got set. RHEL53 took my renice no problem. Some observations: Single interface iscsi gives me the exact same 34meg/sec Going with 2 interfaces it gives me 17meg/sec each interface Going with 4 interfaces it gives me 8meg/sec...etc..etc..etc. I can't seem to set node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144 in a way that actually gets used. node.session.iscsi.MaxConnections = 1can't find any docs on this, doubtful it is relevant. iscsiadm -m session -P 3 still gives me the default 65536 for xmit segment. The Equallogic has all its interfaces on the same SAN network, this is contrary to most implementations of multipath I've done. This is the vendor recommended deployment. Whatever is choking performance its consistently choking it down to the same level. On Apr 13, 5:33 pm, Mike Christie wrote: > jnantel wrote: > > > I am having a major issue with multipath + iscsi write performance > > with anything random or any sequential write with data sizes smaller > > than 4meg (128k 64k 32k 16k 8k). With 32k block size, I am able to > > get a maximum throughput of 33meg/s write. My performance gets cut by > > a third with each smaller size, with 4k blocks giving me a whopping > > 4meg/s combined throughput. Now bumping the data size up to 32meg > > gets me 160meg/sec throughput, and 64 gives me 190meg/s and finally to > > top it out 128meg gives me 210megabytes/sec. My question is what > > factors would limit my performance in the 4-128k range? > > I think linux is just not so good with smaller IO sizes like 4K. I do > not see good performance with Fibre Channel or iscsi. > > 64K+ should be fine, but you want to get lots of 64K+ IOs in flight. If > you run iostat or blktrace you should see more than 1 IO in flight. 
If > while the test is running if you > cat /sys/class/scsi_host/hostX/host_busy > you should also see lots of IO running. > > What limits the number of IO? On the iscsi initiator side, it could be > params like node.session.cmds_max or node.session.queue_depth. For a > decent target like the ones you have I would increase > node.session.cmds_max to 1024 and increase node.session.queue_depth to 128. > > What IO tool are you using? Are you doing direct IO or are you doing > file system IO? If you just use something like dd with bs=64K then you > are not going to get lots of IO running. I think you will get 1 64K IO > in flight, so throughput is not going to be high. If you use something > like disktest > disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb > > you should see a lot of IOs (depends on merging). > > If you were using dd with bs=128m then that IO is going to get broken > down into lots of smaller IOs (probably around 256K), and so the pipe is > nice and full. > > Another thing I noticed in RHEL is if you increase the nice value of the > iscsi threads it will increase write perforamnce sometimes. So for RHEL > or Oracle do > > ps -u root | grep scsi_wq > > Then patch the scsi_wq_%HOST_ID with the iscsiadm -m session -P 3 Host > Number. And then renive the thread to -20. > > Also check the logs and make sure you do not see any conn error messages. > > And then what do you get when running the IO test to the individual > iscsi disks instead of the dm one? Is there any difference? You might > want to change the rr_min_io. If you are sending smaller IOs then > rr_min_io of 10 is probably too small. The path is not going to get lots > of nice large IOs like you would want. > > > > > Some basics about my performance lab: > > > 2 identical 1 gigabit paths (2 dual port intel pro 1000 MTs) in > > separate pcie slots. 
> > Hardware:
> > 2 x Dell R900 6 quad core, 128gig ram, 2 x Dual port Intel Pro MT
> > Cisco 3750s with 32gigabit stackwise interconnect
> > 2 x Dell Equallogic PS5000XV arrays
> > 1 x Dell Equallogic PS5000E arrays
> >
> > Operating systems
> > SLES 10 SP2, RHEL 5 Update 3, Oracle Linux 5 update 3
> >
> > /etc/multipath.conf
> >
> > defaults {
> >         udev_dir                /dev
> >         polling_interval        10
> >         selector                "round-robin 0"
> >         path_grouping_policy    multibus
> >         getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
> >         prio_callout            /bin/true
> >         path_checker            readsector0
> >         features                "1 queue_if_no_path"
> >         rr_min_io               10
> >         max_fds                 8192
> >         # rr_weight             priorities
> >         failback                immediate
> >         # no_path_retry         fail
> >         # user_friendly_names   yes
> > }
> >
> > /etc/iscsi/iscsi.conf (non default values)
> >
> > node.session.timeo.replacement_timeout = 15
> > node.conn[0].timeo.noop_out_interval = 5
> > node.conn[0].timeo.noop_out_timeout = 30
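A quick arithmetic check on the observations above (a sketch; the 34 meg/sec aggregate is the figure reported in this message): splitting a fixed total evenly across round-robin paths predicts exactly the per-interface numbers seen, which points at one shared bottleneck rather than a per-path limit.

```python
# Sketch: if one shared bottleneck caps the aggregate, round-robin
# multipath should split the same total evenly across n paths.
total_mb_s = 34.0  # aggregate throughput reported above

for n_paths in (1, 2, 4):
    per_path = total_mb_s / n_paths
    print(f"{n_paths} interface(s): ~{per_path:.1f} MB/s each")
# Matches the reported 34 -> 17 -> ~8 MB/s pattern.
```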
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
On Mon, Apr 13, 2009 at 10:33 PM, Mike Christie wrote:
> I think linux is just not so good with smaller IO sizes like 4K. I do
> not see good performance with Fibre Channel or iscsi.

Can you elaborate on the above? I have already measured a throughput of more than 60 MB/s when using the SRP protocol over an InfiniBand network with a block size of 4 KB, which is definitely not bad.

Bart.
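For scale, Bart's number can be converted to an IO rate; this is plain arithmetic on the figures he quotes, not a new measurement:

```python
# 60 MB/s of 4 KB blocks expressed as IOs per second.
throughput_mb_s = 60
block_kb = 4
iops = throughput_mb_s * 1024 // block_kb
print(f"{throughput_mb_s} MB/s at {block_kb} KB per IO = {iops} IOPS")
```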
Re: Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
jnantel wrote:
> I am having a major issue with multipath + iscsi write performance
> with anything random or any sequential write with data sizes smaller
> than 4meg (128k 64k 32k 16k 8k). With 32k block size, I am able to
> get a maximum throughput of 33meg/s write. My performance gets cut by
> a third with each smaller size, with 4k blocks giving me a whopping
> 4meg/s combined throughput. Now bumping the data size up to 32meg
> gets me 160meg/sec throughput, and 64 gives me 190meg/s, and finally to
> top it out 128meg gives me 210megabytes/sec. My question is what
> factors would limit my performance in the 4-128k range?

I think linux is just not so good with smaller IO sizes like 4K. I do not see good performance with Fibre Channel or iscsi.

64K+ should be fine, but you want to get lots of 64K+ IOs in flight. If you run iostat or blktrace you should see more than 1 IO in flight. While the test is running, if you

cat /sys/class/scsi_host/hostX/host_busy

you should also see lots of IO running.

What limits the number of IO? On the iscsi initiator side, it could be params like node.session.cmds_max or node.session.queue_depth. For a decent target like the ones you have I would increase node.session.cmds_max to 1024 and increase node.session.queue_depth to 128.

What IO tool are you using? Are you doing direct IO or are you doing file system IO? If you just use something like dd with bs=64K then you are not going to get lots of IO running. I think you will get 1 64K IO in flight, so throughput is not going to be high. If you use something like disktest

disktest -PT -T30 -h1 -K128 -B64k -ID /dev/sdb

you should see a lot of IOs (depends on merging).

If you were using dd with bs=128m then that IO is going to get broken down into lots of smaller IOs (probably around 256K), and so the pipe is nice and full.

Another thing I noticed in RHEL is if you increase the nice value of the iscsi threads it will increase write performance sometimes.
So for RHEL or Oracle do

ps -u root | grep scsi_wq

Then match the scsi_wq_%HOST_ID with the iscsiadm -m session -P 3 Host Number, and then renice the thread to -20.

Also check the logs and make sure you do not see any conn error messages.

And then what do you get when running the IO test to the individual iscsi disks instead of the dm one? Is there any difference? You might want to change the rr_min_io. If you are sending smaller IOs then rr_min_io of 10 is probably too small. The path is not going to get lots of nice large IOs like you would want.

> Some basics about my performance lab:
>
> 2 identical 1 gigabit paths (2 dual port intel pro 1000 MTs) in
> separate pcie slots.
>
> Hardware:
> 2 x Dell R900 6 quad core, 128gig ram, 2 x Dual port Intel Pro MT
> Cisco 3750s with 32gigabit stackwise interconnect
> 2 x Dell Equallogic PS5000XV arrays
> 1 x Dell Equallogic PS5000E arrays
>
> Operating systems
> SLES 10 SP2, RHEL 5 Update 3, Oracle Linux 5 update 3
>
> /etc/multipath.conf
>
> defaults {
>         udev_dir                /dev
>         polling_interval        10
>         selector                "round-robin 0"
>         path_grouping_policy    multibus
>         getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
>         prio_callout            /bin/true
>         path_checker            readsector0
>         features                "1 queue_if_no_path"
>         rr_min_io               10
>         max_fds                 8192
>         # rr_weight             priorities
>         failback                immediate
>         # no_path_retry         fail
>         # user_friendly_names   yes
> }
>
> /etc/iscsi/iscsi.conf (non default values)
>
> node.session.timeo.replacement_timeout = 15
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 30
> node.session.cmds_max = 128
> node.session.queue_depth = 32
> node.session.iscsi.FirstBurstLength = 262144
> node.session.iscsi.MaxBurstLength = 16776192
> node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
> node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144
>
> discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 65536
>
> Scheduler:
>
> cat /sys/block/sdb/queue/scheduler
> [noop] anticipatory deadline cfq
> cat
> /sys/block/sdc/queue/scheduler
> [noop] anticipatory deadline cfq
>
> Command outputs:
>
> iscsiadm -m session -P 3
> iSCSI Transport Class version 2.0-724
> iscsiadm version 2.0-868
> Target: iqn.2001-05.com.equallogic:0-8a0906-2c82dfd03-64c000cfe2249e37-dc1stgdb15-sas-raid6
>         Current Portal: 10.1.253.13:3260,1
>         Persistent Portal: 10.1.253.10:3260,1
>                 **
>                 Interface:
>                 **
>                 Iface Name: ieth1
>                 Iface Transport: tcp
>                 Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15
>                 Iface IPaddress: 10.1.253.148
>                 Iface HWaddress: default
>                 Iface Netdev: e
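Mike's advice about getting many IOs in flight follows from a simple latency model: throughput of a latency-bound path is roughly (IO size × queue depth) / round-trip time. The sketch below uses an assumed 0.25 ms round trip, in the ballpark of the small-packet ping times quoted in this thread; real throughput is additionally capped by the wire, roughly 110 MB/s per GigE path.

```python
def throughput_mb_s(io_size_kb: float, ios_in_flight: int, rtt_ms: float) -> float:
    """Latency-bound throughput estimate: (size * depth) / round-trip time."""
    return (io_size_kb / 1024.0) * ios_in_flight / (rtt_ms / 1000.0)

RTT_MS = 0.25  # assumed round-trip time, illustrative only

# A single 4K IO at a time is latency-bound and slow...
print(f"4K,  depth 1:  {throughput_mb_s(4, 1, RTT_MS):7.1f} MB/s")
# ...while deeper queues recover throughput (until the wire caps it).
print(f"4K,  depth 32: {throughput_mb_s(4, 32, RTT_MS):7.1f} MB/s")
print(f"64K, depth 16: {throughput_mb_s(64, 16, RTT_MS):7.1f} MB/s")
```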
Multipath + iscsi + SLES10 SP2 / REDHAT 5.3 / Oracle Linux 5 update 3
I am having a major issue with multipath + iscsi write performance with anything random or any sequential write with data sizes smaller than 4meg (128k 64k 32k 16k 8k). With 32k block size, I am able to get a maximum throughput of 33meg/s write. My performance gets cut by a third with each smaller size, with 4k blocks giving me a whopping 4meg/s combined throughput. Now bumping the data size up to 32meg gets me 160meg/sec throughput, and 64 gives me 190meg/s, and finally to top it out 128meg gives me 210megabytes/sec. My question is what factors would limit my performance in the 4-128k range?

Some basics about my performance lab:

2 identical 1 gigabit paths (2 dual port intel pro 1000 MTs) in separate pcie slots.

Hardware:
2 x Dell R900 6 quad core, 128gig ram, 2 x Dual port Intel Pro MT
Cisco 3750s with 32gigabit stackwise interconnect
2 x Dell Equallogic PS5000XV arrays
1 x Dell Equallogic PS5000E arrays

Operating systems
SLES 10 SP2, RHEL 5 Update 3, Oracle Linux 5 update 3

/etc/multipath.conf

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        features                "1 queue_if_no_path"
        rr_min_io               10
        max_fds                 8192
        # rr_weight             priorities
        failback                immediate
        # no_path_retry         fail
        # user_friendly_names   yes
}

/etc/iscsi/iscsi.conf (non default values)

node.session.timeo.replacement_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 30
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 262144

discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 65536

Scheduler:

cat /sys/block/sdb/queue/scheduler
[noop] anticipatory deadline cfq
cat /sys/block/sdc/queue/scheduler
[noop] anticipatory deadline cfq

Command outputs:
iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-724
iscsiadm version 2.0-868
Target: iqn.2001-05.com.equallogic:0-8a0906-2c82dfd03-64c000cfe2249e37-dc1stgdb15-sas-raid6
        Current Portal: 10.1.253.13:3260,1
        Persistent Portal: 10.1.253.10:3260,1
                **
                Interface:
                **
                Iface Name: ieth1
                Iface Transport: tcp
                Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15
                Iface IPaddress: 10.1.253.148
                Iface HWaddress: default
                Iface Netdev: eth1
                SID: 3
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: Unknown
                Internal iscsid Session State: NO CHANGE
                Negotiated iSCSI params:
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                Attached SCSI devices:
                Host Number: 5  State: running
                scsi5 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb  State: running
        Current Portal: 10.1.253.12:3260,1
        Persistent Portal: 10.1.253.10:3260,1
                **
                Interface:
                **
                Iface Name: ieth2
                Iface Transport: tcp
                Iface Initiatorname: iqn.2005-04.com.linux:dc1stgdb15
                Iface IPaddress: 10.1.253.48
                Iface HWaddress: default
                Iface Netdev: eth2
                SID: 4
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: Unknown
                Internal iscsid Session State: NO CHANGE
                Negotiated iSCSI params:
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                Attached SCSI devices:
                *
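One detail worth pulling out of the session dump above: several negotiated values came back lower than what iscsi.conf requested, because the target gets a vote during login negotiation. A small illustrative check (the regex and dict layout are mine; the numbers are taken from the config and dump in this thread) makes the mismatch explicit:

```python
import re

# Values requested in the iscsi.conf posted earlier in the thread.
requested = {
    "MaxXmitDataSegmentLength": 262144,
    "FirstBurstLength": 262144,
    "MaxBurstLength": 16776192,
}

# Excerpt shaped like the "Negotiated iSCSI params" section above.
session_dump = """\
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 65536
MaxBurstLength: 262144
"""

def negotiated(dump: str) -> dict:
    """Pull 'Name: number' pairs out of an iscsiadm-style dump."""
    return {k: int(v) for k, v in re.findall(r"(\w+):\s*(\d+)", dump)}

params = negotiated(session_dump)
for name, want in requested.items():
    got = params.get(name)
    if got is not None and got != want:
        print(f"{name}: requested {want}, negotiated {got}")
```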