Re: RFC: GEOM MULTIPATH rewrite
On Nov 14, 2011, at 11:09 PM, Gary Palmer wrote:

On Tue, Nov 01, 2011 at 10:24:06PM +0200, Alexander Motin wrote:

On 01.11.2011 19:50, Dennis Kögel wrote:

Not sure if replying on-list or off-list makes more sense...

Replying on-list shares the experience with other users.

Anyway, some first impressions, on stable/9: the lab environment here is an EMC VNX / Clariion SAN, which has two Storage Processors, connected to different switches, connected to two isp(4)s on the test machine. So at any time the machine sees four paths, but only two are available (depending on which SP owns the LUN).

580# camcontrol devlist
DGC VRAID 0531            at scbus0 target 0 lun 0 (da0,pass0)
DGC VRAID 0531            at scbus0 target 1 lun 0 (da1,pass1)
DGC VRAID 0531            at scbus1 target 0 lun 0 (da2,pass2)
DGC VRAID 0531            at scbus1 target 1 lun 0 (da3,pass3)
COMPAQ RAID 1(1VOLUME OK  at scbus2 target 0 lun 0 (da4,pass4)
COMPAQ RAID 0 VOLUME OK   at scbus2 target 1 lun 0 (da5,pass5)
hp DVD D DS8D3SH HHE7     at scbus4 target 0 lun 0 (cd0,pass6)

I miss the ability to add disks to automatic-mode multipaths, but I (just now) realized this only makes sense when gmultipath has some kind of path-checking facility (like periodically trying to read sector 0 of each configured device, which is what Linux's device-mapper multipathd does).

In automatic mode the other paths are supposed to be detected via metadata reading. If in your case some paths are not readable, automatic mode can't work as expected. By the way, could you describe how your configuration is supposed to work, i.e. when the other paths will start working?

Without knowledge of the particular Clariion SAN Dennis is working with: I've seen some so-called active/active RAID controllers force a LUN failover from one controller to another (taking it offline for 3 seconds in the process) because the LUN received an I/O down a path to the controller that was formerly taking the standby role for that LUN (and it was per-LUN, so some LUNs would be owned by one controller and some by the other). During the controller switch, all I/O to the LUN would fail. Thankfully that particular RAID model where I observed this behaviour hasn't been sold in several years, but I would tend to expect such behaviour at the lower end of the storage market, with the higher-end units doing true active/active configurations. (And no, I won't name the manufacturer on a public list.)

This is exactly why Linux ships with a multipath configuration file: it can describe exactly what form of brain damage the controller in question implements so the system can work around it, and maybe even document some vendor-specific extensions so that the host can detect which controller is taking which role for a particular path.

Even some controllers that don't have pathological behaviour when they receive I/O down the wrong path have sub-optimal behaviour unless you choose the right path. NetApp SANs in particular typically have two independent controllers with a high-speed internal interconnect; however, there is a measurable and not-insignificant penalty for sending I/O for a LUN to the partner controller, which must forward it across the internal interconnect (called the VTIC, I believe) to the owning controller. I've been told, although I have not measured this myself, that it can add several ms to a transaction, which when talking about SAN storage is potentially several times what it takes to do the same I/O directly against the controller that owns the LUN.

There's probably a way to make the partner controller not advertise the LUN until it takes over in a failover scenario, but every NetApp I've worked with is set (by default, I believe) to advertise the LUN out of both controllers.

Gary

Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN. In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved when I split the active paths among the controllers installed in the server importing the pool (basically a gmultipath rotate $LUN in rc.local for half of the paths). Using active/active in this situation resulted in fluctuating performance.
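The rc.local trick described above can be scripted; the following is only a sketch, assuming label names LD_0 through LD_23 (the naming used later in this thread) and an arbitrary choice of which half of the paths to rotate, neither of which is taken from the actual setup:

    #!/bin/sh
    # Hypothetical /etc/rc.local fragment: rotate the active path of half of
    # the multipath devices so that I/O is spread across both storage
    # controllers.  Label names and the 12/12 split are assumptions.
    for i in $(jot 12 12); do          # LD_12 .. LD_23
        gmultipath rotate "LD_${i}"
    done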
Re: RFC: GEOM MULTIPATH rewrite
On 01/20/12 10:09, Nikolay Denev wrote:

Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN. In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved when I split the active paths among the controllers installed in the server importing the pool (basically gmultipath rotate $LUN in rc.local for half of the paths). Using active/active in this situation resulted in fluctuating performance.

How big was the fluctuation? Between the speed of one path and the speed of all paths? Several active/active devices that know nothing about each other will, with some probability, send part of their requests over the same links, while ZFS itself already does some balancing between vdevs.

--
Alexander Motin
Re: RFC: GEOM MULTIPATH rewrite
On 20.01.2012, at 12:51, Alexander Motin <m...@freebsd.org> wrote:

[...]

How big was the fluctuation? Between the speed of one path and the speed of all paths?

I will test in a bit and post results.

P.S.: Is there a way to enable/disable active-active on the fly? I'm currently re-labeling to achieve that.
Re: RFC: GEOM MULTIPATH rewrite
On 01/20/12 13:08, Nikolay Denev wrote:

[...]

P.S.: Is there a way to enable/disable active-active on the fly? I'm currently re-labeling to achieve that.

No, there is not right now. But for experiments you can achieve the same result by manually marking all paths except one as failed. It is not dangerous: if the remaining link fails, all the others will be restored automatically.

--
Alexander Motin
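Concretely, marking paths as failed and bringing them back can be done with the fail and restore verbs of gmultipath(8); the device and provider names below only mirror the naming used later in this thread and are an illustration, not commands anyone actually ran:

    # Mark the second path of LD_0 as failed, leaving only da0 active:
    gmultipath fail LD_0 da24
    gmultipath status
    # Bring the path back later (it is also restored automatically if the
    # remaining path fails):
    gmultipath restore LD_0 da24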
Re: RFC: GEOM MULTIPATH rewrite
On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:

[...]

No, there is not right now. But for experiments you can achieve the same result by manually marking all paths except one as failed. It is not dangerous: if the remaining link fails, all the others will be restored automatically.

I had to destroy and relabel anyway, since I was not using active-active before. Here's what I did (maybe a little too verbose):

gmultipath label -A -v LD_0 /dev/da0 /dev/da24
gmultipath label -A -v LD_1 /dev/da1 /dev/da25
gmultipath label -A -v LD_2 /dev/da2 /dev/da26
gmultipath label -A -v LD_3 /dev/da3 /dev/da27
gmultipath label -A -v LD_4 /dev/da4 /dev/da28
gmultipath label -A -v LD_5 /dev/da5 /dev/da29
gmultipath label -A -v LD_6 /dev/da6 /dev/da30
gmultipath label -A -v LD_7 /dev/da7 /dev/da31
gmultipath label -A -v LD_8 /dev/da8 /dev/da32
gmultipath label -A -v LD_9 /dev/da9 /dev/da33
gmultipath label -A -v LD_10 /dev/da10 /dev/da34
gmultipath label -A -v LD_11 /dev/da11 /dev/da35
gmultipath label -A -v LD_12 /dev/da12 /dev/da36
gmultipath label -A -v LD_13 /dev/da13 /dev/da37
gmultipath label -A -v LD_14 /dev/da14 /dev/da38
gmultipath label -A -v LD_15 /dev/da15 /dev/da39
gmultipath label -A -v LD_16 /dev/da16 /dev/da40
gmultipath label -A -v LD_17 /dev/da17 /dev/da41
gmultipath label -A -v LD_18 /dev/da18 /dev/da42
gmultipath label -A -v LD_19 /dev/da19 /dev/da43
gmultipath label -A -v LD_20 /dev/da20 /dev/da44
gmultipath label -A -v LD_21 /dev/da21 /dev/da45
gmultipath label -A -v LD_22 /dev/da22 /dev/da46
gmultipath label -A -v LD_23 /dev/da23 /dev/da47

:~# gmultipath status
Name             Status   Components
multipath/LD_0   OPTIMAL  da0 (ACTIVE) da24 (ACTIVE)
multipath/LD_1   OPTIMAL  da1 (ACTIVE) da25 (ACTIVE)
multipath/LD_2   OPTIMAL  da2 (ACTIVE) da26 (ACTIVE)
multipath/LD_3   OPTIMAL  da3 (ACTIVE) da27 (ACTIVE)
multipath/LD_4   OPTIMAL  da4 (ACTIVE) da28 (ACTIVE)
multipath/LD_5   OPTIMAL  da5 (ACTIVE) da29 (ACTIVE)
multipath/LD_6   OPTIMAL  da6 (ACTIVE) da30 (ACTIVE)
multipath/LD_7   OPTIMAL  da7 (ACTIVE) da31 (ACTIVE)
multipath/LD_8   OPTIMAL  da8 (ACTIVE) da32 (ACTIVE)
multipath/LD_9   OPTIMAL  da9 (ACTIVE) da33 (ACTIVE)
multipath/LD_10  OPTIMAL  da10 (ACTIVE) da34 (ACTIVE)
multipath/LD_11  OPTIMAL  da11 (ACTIVE) da35 (ACTIVE)
multipath/LD_12  OPTIMAL  da12 (ACTIVE) da36 (ACTIVE)
multipath/LD_13  OPTIMAL  da13 (ACTIVE) da37 (ACTIVE)
multipath/LD_14  OPTIMAL  da14 (ACTIVE) da38 (ACTIVE)
multipath/LD_15  OPTIMAL  da15 (ACTIVE) da39 (ACTIVE)
multipath/LD_16  OPTIMAL  da16 (ACTIVE) da40 (ACTIVE)
multipath/LD_17  OPTIMAL  da17 (ACTIVE) da41 (ACTIVE)
multipath/LD_18  OPTIMAL  da18 (ACTIVE) da42 (ACTIVE)
multipath/LD_19  OPTIMAL  da19 (ACTIVE) da43 (ACTIVE)
multipath/LD_20  OPTIMAL  da20 (ACTIVE) da44 (ACTIVE)
multipath/LD_21  OPTIMAL  da21 (ACTIVE) da45 (ACTIVE)
multipath/LD_22  OPTIMAL  da22 (ACTIVE) da46 (ACTIVE)
multipath/LD_23  OPTIMAL  da23 (ACTIVE) da47 (ACTIVE)

:~# zpool import tank
:~# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                STATE   READ WRITE CKSUM
        tank                ONLINE     0     0     0
          raidz2-0          ONLINE     0     0     0
            multipath/LD_0  ONLINE     0     0     0
            multipath/LD_1
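The 24 label commands above follow an obvious pattern, so the same setup could be produced with a small loop; this is only a sketch of an equivalent, assuming the da0-da23 / da24-da47 pairing shown, not what was actually typed:

    for i in $(jot 24 0); do
        gmultipath label -A -v "LD_${i}" "/dev/da${i}" "/dev/da$((i + 24))"
    done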
Re: RFC: GEOM MULTIPATH rewrite
On 01/20/12 14:13, Nikolay Denev wrote:

[...]

I had to destroy and relabel anyway, since I was not using active-active before. Here's what I did (maybe a little too verbose):

[...]

And now a very naive benchmark:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)

Now deactivate the alternative paths:

And the benchmark again:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)

P.S.: The server is running 8.2-STABLE with a dual-port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array. All 24 SAS drives are set up as single-disk RAID0 LUNs.

This difference is too huge to explain by inefficient path utilization alone. Couldn't this storage have some per-LUN port/controller affinity that penalizes concurrent access to the same LUN from different paths? Couldn't it be active/active at the port level, but active/passive for each specific LUN? If there really are two controllers inside, they may need to synchronize their caches or bounce requests between each other, and that may be expensive.

--
Alexander Motin
Re: RFC: GEOM MULTIPATH rewrite
On Jan 20, 2012, at 2:31 PM, Alexander Motin wrote:

[...]

This difference is too huge to explain by inefficient path utilization alone. Couldn't this storage have some per-LUN port/controller affinity that penalizes concurrent access to the same LUN from different paths? Couldn't it be active/active at the port level, but active/passive for each specific LUN? If there really are two controllers inside, they may need to synchronize their caches or bounce requests between each other, and that may be expensive.

Yes, I think that's what's happening. There are two controllers, each with its own CPU and cache, and cache synchronization is enabled. I will try to test multipath with both paths connected to the same controller (there are two ports on each controller), but that will require remote hands and will take some time.

In the meantime I've disabled the write-back cache on the array (which also disables the cache synchronization), and here are the results:

ACTIVE-ACTIVE:

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 2.497415 secs (214970639 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.076070 secs (498918172 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.908101 secs (281363979 bytes/sec)

ACTIVE-PASSIVE (half of the paths failed the same way as in the previous email):

:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.324483 secs (1654542913 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.795685 secs (674727909 bytes/sec)
:~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 0.233859 secs (2295702835 bytes/sec)

This increased the performance in both cases, probably because write-back caching does nothing for large sequential writes. Anyway, ACTIVE-ACTIVE here is still slower, but not by that much.
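One caveat on these dd figures: 2295702835 bytes/sec is far beyond what two 4Gbps FC links can carry, so short 512 MB runs largely measure ZFS's asynchronous write buffering in RAM rather than the paths themselves. A sketch of a run less dominated by caching might look like the following; the count is an assumption and should simply exceed the machine's physical memory:

    # Write more data than fits in RAM so throughput is bounded by the storage:
    dd if=/dev/zero of=/tank/TEST bs=1M count=65536
    # Meanwhile, in another terminal, watch the actual per-second pool throughput:
    zpool iostat tank 1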
Re: RFC: GEOM MULTIPATH rewrite
On 01/20/12 15:27, Nikolay Denev wrote:

[...]

This increased the performance in both cases, probably because write-back caching does nothing for large sequential writes. Anyway, ACTIVE-ACTIVE here is still slower, but not by that much.

Thank you for numbers, but I have some doubts about them. 2295702835 bytes/sec is
Re: RFC: GEOM MULTIPATH rewrite
On Jan 20, 2012, at 3:38 PM, Alexander Motin wrote:

[...]
Re: Improving the FreeBSD-9 boot menu
On Tue, 20 Sep 2011, Warren Block wrote:

The patch in PR 160818 makes some clarifications and improvements to the new boot menu. Obviously this is not for 9.0-RELEASE, just wanting to get it out there so people can look at it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=160818

Among other things, the patch removes the word "boot" from options that don't actually boot. The options are lined up, and enabled options are drawn in reverse video when loader_color=1 is set in /boot/loader.conf.

Just reminding people about this now that 9.0 is out. It makes what I feel are genuine usability and readability improvements to the boot menu.
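For anyone who wants the reverse-video highlighting mentioned above, the knob is an ordinary loader.conf setting; a minimal sketch, assuming the standard file location:

    # Enable ANSI colour in the loader so enabled menu options can be drawn
    # in reverse video:
    echo 'loader_color="1"' >> /boot/loader.conf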
Re: new panic in cpu_reset() with WITNESS
On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:

T New panic has been introduced somewhere between
T r229851 and r229932, that happens on shutdown if
T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.

I've run a binary search, and the panic was introduced by r229854.

--
Totus tuus, Glebius.
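For reference, the bisection described above can be done by hand with svn against the two revisions named in the original report; a rough sketch of one iteration, with the kernel configuration as a placeholder:

    cd /usr/src
    svn update -r 229891          # midpoint between known-good r229851 and bad r229932
    make buildkernel KERNCONF=GENERIC && make installkernel KERNCONF=GENERIC
    shutdown -r now               # the panic in question happens on shutdown
    # Repeat with a new midpoint depending on whether the panic appears;
    # in this case the search converged on r229854.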
Re: FS hang when creating snapshots on a UFS SU+J setup
The first step in debugging is to find out if the problem is SU+J specific. To find out, turn off SU+J but leave SU enabled. This change is done by running:

umount filesystem
tunefs -j disable filesystem
mount filesystem
cd filesystem
rm .sujournal

Success! Thanks, Mr. McKusick. I posted having this problem to the FreeBSD forum (http://forums.freebsd.org/showthread.php?t=25787), but wanted to emphasize that in two VirtualBox VMs that were created in exactly the same way, the dump issue didn't occur in the absolutely fresh FreeBSD-9.0 install (not even portsnap yet), but it did occur in the system I had installed some ports on (an Apache/MySQL/Python stack, a few additional GNU build tools, and some other miscellaneous ports). I don't know if this means anything, just hoping it might help - presumably SU+J would be a good thing. ;)

Regards,
Dale
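For completeness, the journal can be re-enabled the same way once the test is over; a sketch using a hypothetical /dev/ada0p2 mounted on /data (both names are assumptions, and tunefs wants the filesystem unmounted):

    umount /data
    tunefs -p /dev/ada0p2          # show current flags, including soft updates journaling
    tunefs -j enable /dev/ada0p2   # turn SU+J back on after the experiment
    mount /data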
Re: Improving the FreeBSD-9 boot menu
On 20-01-2012 7:57, Warren Block wrote:

[...]

Just reminding people about this now that 9.0 is out. It makes what I feel are genuine usability and readability improvements to the boot menu.

I agree. Definitely an improvement.

--
Joel
Re: Improving the FreeBSD-9 boot menu
On Fri, Jan 20, 2012 at 11:38 AM, Joel Dahl j...@vnode.se wrote:

[...]

I agree. Definitely an improvement.

Is this in the release notes?

--
Eitan Adler
Re: posix_fadvise noreuse disables file caching
On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:

Hi, I recently noticed that multimedia/vlc generates a lot of disk I/O when playing media files. For instance, when playing a 320 kbps mp3, gstat reports about 1250 kB/s (roughly 10000 kbps). That's quite a lot of overhead. It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if O_DIRECT had been specified during open(2), i.e. it disables all caching. That means every 1028-byte read turns into a 32 KiB read (the new default block size in 9.0), which explains the above numbers.

I've copied the relevant vlc code below (modules/access/file.c:Open()). It's interesting to see that on OSX it sets F_NOCACHE, which disables caching too, but combined with F_RDAHEAD there's still read-ahead caching.

I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should still cache data (and even do read-ahead if F_RDAHEAD is specified), and once data is fetched from the cache, it can be marked WONTNEED.

POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.

Is it possible to implement it this way, or if not, to just ignore the NOREUSE hint for now?

I think it would be good to improve NOREUSE, though I had sort of assumed that applications using NOREUSE would do their own buffering and read full blocks. We could perhaps reimplement NOREUSE by doing the equivalent of POSIX_FADV_DONTNEED after each read to free buffers and pages after the data is copied out to userland. I also have an XXX about whether or not NOREUSE should still allow read-ahead, as it isn't very clear what the right thing to do there is. HP-UX (IIRC) has an fadvise() that lets you specify multiple policies, so you could specify both NOREUSE and SEQUENTIAL for a single region to get read-ahead but still release memory once the data is read once.

--
John Baldwin
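A quick way to see the read amplification Tijl describes on one's own machine is to watch the disk while the player runs; a sketch, with da0 as a placeholder for whatever device actually holds the file:

    # Per-device I/O statistics, filtered to the disk of interest:
    gstat -f '^da0$'
    # The small application-level reads can also be observed directly:
    truss -p $(pgrep -n vlc) 2>&1 | grep 'read('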
amd: is there an alternative with NFSv4 capabilities?
Hello. I still use the amd automounter, but I miss NFSv4 capabilities. Since Linux seems to use a facility located deeper in the kernel, I'd like to ask whether FreeBSD has an alternative to the amd automounter with NFSv4 capabilities.

Sorry if I bother someone; I'm not aware of an alternative, and it may be the case that I'm stuck with amd ...

Cheers,
Oliver
LSI supported mps(4) driver available
The LSI-supported version of the mps(4) driver, which supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt

I plan to check it in to head next week, and then MFC it into stable/9 a week after that, most likely. Please test it out and let me know if you run into any problems. In addition to supporting WarpDrive, the driver also supports Integrated RAID. Thanks to LSI for doing the work on this driver!

I have added a number of other infrastructure changes that are necessary for the driver; here is a brief summary:

- A new Advanced Information buffer is now added to the EDT for drives that support READ CAPACITY (16). The da(4) driver updates this buffer when it grabs new read capacity data from a drive.

- The mps(4) driver will look for Advanced Information state change async events, and updates its table of drives with protection information turned on accordingly.

- The size of struct scsi_read_capacity_data_long has been bumped up to the amount specified in the latest SBC-3 draft. The hope is to avoid some future structure size bumps with that change. The API for scsi_read_capacity_16() has been changed to add a length argument. Hopefully this will future-proof it somewhat.

- __FreeBSD_version bumped for the addition of the Advanced Information buffer with the read capacity information. The mps(4) driver has a kludgy way of getting the information on versions of FreeBSD without this change.

I believe that the CAM API changes are mild enough and beneficial enough for a merge into stable/9, but they are intertwined with the unmap changes in the da(4) driver, so those changes will have to go back to stable/9 as well in order to MFC the full set of changes. Otherwise it'll just be the driver that gets merged into stable/9, and it'll use the kludgy method of getting the read capacity data for each drive.

A couple of notes about issues with this driver:

- Unlike the current mps(4) driver, it probes sequentially. If you have a lot of drives in your system, it will take a while to probe them all.

- You may see warning messages like this:

_mapping_add_new_device: failed to add the device with handle 0x0019 to persistent table because there is no free space available
_mapping_add_new_device: failed to add the device with handle 0x001a to persistent table because there is no free space available

- The driver is not endian safe. (It assumes a little-endian machine.) This is not new; the driver in the tree has the same issue.

The LSI folks know about these issues. The driver has passed their testing process. Many thanks to LSI for going through the effort to support FreeBSD.

Ken

--
Kenneth Merry k...@freebsd.org
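For anyone wanting to try the driver before it reaches head, the usual routine would be roughly the following; it assumes the linked .txt file is a unified diff against a current head checkout, which is an assumption on my part rather than something the announcement states:

    fetch -o /tmp/mps_lsi.20120120.1.txt \
        http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt
    cd /usr/src
    patch -p0 < /tmp/mps_lsi.20120120.1.txt
    make buildkernel KERNCONF=GENERIC && make installkernel KERNCONF=GENERIC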
Re: LSI supported mps(4) driver available
On Fri, Jan 20, 2012 at 12:44 PM, Kenneth D. Merry k...@freebsd.org wrote:

The LSI-supported version of the mps(4) driver that supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

Just to clarify, this will replace the existing mps(4) driver in FreeBSD 10-CURRENT and 9-STABLE? So there won't be mps(4) (FreeBSD driver) and mpslsi(4) (LSI driver) anymore? Just mps(4)?

--
Freddie Cash
fjwc...@gmail.com
Re: LSI supported mps(4) driver available
On Fri, Jan 20, 2012 at 12:53:04 -0800, Freddie Cash wrote:

Just to clarify, this will replace the existing mps(4) driver in FreeBSD 10-CURRENT and 9-STABLE?

That is correct.

So there won't be mps(4) (FreeBSD driver) and mpslsi(4) (LSI driver) anymore? Just mps(4)?

Right. Just mps(4), which will be the LSI driver.

Ken

--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
- Original Message -
From: Kenneth D. Merry k...@freebsd.org
To: freebsd-s...@freebsd.org; freebsd-current@freebsd.org
Sent: Friday, January 20, 2012 8:44 PM
Subject: LSI supported mps(4) driver available

The LSI-supported version of the mps(4) driver that supports their 6Gb SAS HBAs as well as WarpDrive controllers, is available here:

http://people.freebsd.org/~ken/lsi/mps_lsi.20120120.1.txt

I plan to check it in to head next week, and then MFC it into stable/9 a week after that most likely.

Great to see this being done, thanks to everyone! It would be even better to see this MFC'ed to 8.x as well, if all goes well. Do you think this will be possible?

Regards
Steve
Re: LSI supported mps(4) driver available
On Fri, Jan 20, 2012 at 23:14:20 -, Steven Hartland wrote:

[...]

Great to see this being done, thanks to everyone! It would be even better to see this MFC'ed to 8.x as well, if all goes well. Do you think this will be possible?

Yes, that should be doable as well. It's unlikely that all of the CAM changes will get merged back, but the driver itself shouldn't be a problem.

Ken

--
Kenneth Merry k...@freebsd.org
Re: LSI supported mps(4) driver available
On 21/01/2012, at 7:14, Kenneth D. Merry wrote:

In addition to supporting WarpDrive, the driver also supports Integrated RAID. Thanks to LSI for doing the work on this driver!

This is great news (the RAID support) - thanks very much. Is there a corresponding userland tool, or plans for one?

Thanks again.

--
Daniel O'Connor
software and network engineer for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: amd: is there an alternative with NFSv4 capabilities?
O. Hartmann wrote:

Hello. I still use the amd automounter, but I miss NFSv4 capabilities. Since Linux seems to use a facility located deeper in the kernel, I'd like to ask whether FreeBSD has an alternative to the amd automounter with NFSv4 capabilities.

I'm not aware of anything, but maybe someone else knows of an alternative?

On my "maybe it would be nice" list (not my to-do list, because I'll never get around to it) was to look at Solaris' autofs. I believe Apple switched to it and has found it works well in Mac OS X. I suspect that there are OpenSolaris sources for it out there under the CDDL, but I haven't even checked that. So, if anyone is looking for an interesting project, this might be a nice one. (I could probably provide some help with it, if someone took it on.)

rick
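Until an automounter grows NFSv4 support, the practical interim option is a static NFSv4 mount, which mount_nfs(8) already handles with the new NFS client; a sketch with a made-up server name and paths:

    # One-off NFSv4 mount (server and paths are examples only):
    mount -t nfs -o nfsv4 server.example.com:/export/home /mnt/home
    # fstab equivalent:
    # server.example.com:/export/home  /mnt/home  nfs  rw,nfsv4  0  0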
Re: atkbc not loaded with ACPI enabled in 9.0
--- On Wed, 2012/1/18, John Baldwin j...@freebsd.org wrote:

On Friday, January 13, 2012 10:27:13 pm aconnoll...@yahoo.co.jp wrote:

Please try this patch:

Index: sys/dev/atkbdc/atkbdc_isa.c
===================================================================
--- atkbdc_isa.c	(revision 230009)
+++ atkbdc_isa.c	(working copy)
@@ -87,6 +87,7 @@ static driver_t atkbdc_isa_driver = {
 static struct isa_pnp_id atkbdc_ids[] = {
 	{ 0x0303d041, "Keyboard controller (i8042)" },	/* PNP0303 */
+	{ 0x0320d041, "Keyboard controller (i8042)" },	/* PNP0320 */
 	{ 0 }
 };

--
John Baldwin

John,

Thanks for your help, but that patch doesn't appear to address the problem. I edited the atkbdc_isa.c file as you instructed, rebuilt and installed my kernel, but my integrated keyboard remains unresponsive with ACPI enabled. Here's the new output of:

dmesg -a: http://pastebin.com/h6ahmD2d
devinfo -ur: http://pastebin.com/sdNcNEJU
devinfo -vr: http://pastebin.com/P2yqQBLY

Perhaps I was supposed to remove PNP0303 support?

No, the goal was to get atkbdc to try to attach to PNP0320 devices since those have your keyboard I/O ports. Can you add some printfs to atkbdc_isa_probe() to see how many times it is getting past the ID check, and how far along it gets in each case (i.e. which failure case causes the probe routine to return an error)?

--
John Baldwin

John,

I added some printfs to the isa_probe() function to see how far it was getting. The function is called many times, as you can see in the dmesg, but mostly it exits at the PnP ID check with ENXIO. At one point it gets further, but still exits with ENXIO when port0 is found to be NULL. Any suggestions for further investigation?

edited function: http://pastebin.com/uUsVLiz2
dmesg -a: http://pastebin.com/kDtC9gvM