[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 01, 2021 at 03:47:32PM +0100, Gianluca Cecchi wrote: > On Mon, Feb 1, 2021 at 3:10 PM Nir Soffer wrote: > [snip] > > > So at the end I have the multipath.conf default file installed by vdsm > > (so without the # PRIVATE line) > > > and this in /etc/multipath/conf.d/eql.conf > > > > > > devices { > > > device { > > > vendor "EQLOGIC" > > > product "100E-00" > > > > Ben, why is this device missing from multipath builtin devices? > > > > I was using Equallogic storage since oVirt 3.6, so CentOS/RHEL 6, > and it has never been inside the multipath database as far as I remember. > But I don't know why. > The parameters I put in were from the latest EQL best practices, but they were > last updated at CentOS 7 time. > I would like to use the same parameters in CentOS 8 now and see if they > work ok. > PS the EQL product line is somewhat deprecated (in the sense of no new features and > so on..) but anyway still supported > > > > > > > path_selector "round-robin 0" > > > path_grouping_policy multibus > > > path_checker tur > > > rr_min_io_rq 10 > > > rr_weight priorities > > > failback immediate > > > features "0" > > > > This is never needed, multipath generates this value. > > > > Those were the recommended values from EQL > Latest is dated April 2016, when 8 was not out yet: > http://downloads.dell.com/solutions/storage-solution-resources/(3199-CD-L)RHEL-PSseries-Configuration.pdf > Thanks for this link. I'll add this to the default built-ins. -Ben > > > > > > Ben: please correct me if needed > > > > > no_path_retry 16 > > > > I don't think that you need this, since you should inherit the value > > from vdsm > > multipath.conf, either from the "defaults" section, or from the > > "overrides" section. > > > > You must add no_path_retry here if you want to use another value, and you > > don't > > want to use the vdsm default value. > > > > You are right; I see the value of 16 both in defaults and overrides.
But I > put it also inside the device section during my tests, suspecting it was not > being picked up, in the hope of seeing output similar to CentOS 7: > > 36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00 > size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw > > where you notice the hwhandler='0' > > Originally I remember the default value for no_path_retry was 4, but > probably it has been changed in 4.4 to 16, correct? > If I want to see the default that vdsm would create from scratch, should I > look inside > /usr/lib/python3.6/site-packages/vdsm/tool/configurators/multipath.py of my > version? > On my system with vdsm-python-4.40.40-1.el8.noarch I have this inside that > file > _NO_PATH_RETRY = 16 > > > > > > > Note that if you use your own value, you need to match it to sanlock > > io_timeout. > > See this document for more info: > > https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > > > > > } > > > > Yes I set this: > > # cat /etc/vdsm/vdsm.conf.d/99-FooIO.conf > # Configuration for FooIO storage. > > [sanlock] > # Set renewal timeout to 80 seconds > # (8 * io_timeout == 80). > io_timeout = 10 > > And for another environment with Netapp MetroCluster and 2 different sites > (I'm with RHV there...) I plan to set no_path_retry to 24 and io_timeout to > 15, to manage disaster recovery scenarios and planned maintenance with > Netapp node failover through sites taking potentially up to 120 seconds.
> > > But still I see this > > > > > > # multipath -l > > > 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00 > > > size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > > `-+- policy='round-robin 0' prio=0 status=active > > > |- 16:0:0:0 sdc 8:32 active undef running > > > `- 18:0:0:0 sde 8:64 active undef running > > > 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00 > > > size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > > `-+- policy='round-robin 0' prio=0 status=active > > > |- 15:0:0:0 sdb 8:16 active undef running > > > `- 17:0:0:0 sdd 8:48 active undef running > > > > > > that makes me think I'm not using the no_path_retry setting, but > > queue_if_no_path... I could be wrong anyway.. > > > > No, this is expected. What it means, if I understand multipath > > behavior correctly, > > is that the device queues data for no_path_retry * polling_interval seconds > > when all > > paths have failed. After that the device will fail all pending and new I/O > > until at least > > one path is recovered. > > > > > How to verify for sure (without dropping the paths, at least at the > > moment) from the config? > > > Any option with multipath and/or dmsetup commands? > > > > multipathd show config -> find your device section, it will show the current > > value for no_path_retry. > > > > Nir > > > > > I would like just to be confident about
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 01, 2021 at 04:10:02PM +0200, Nir Soffer wrote: > On Mon, Feb 1, 2021 at 1:55 PM Gianluca Cecchi > wrote: > > > > On Sat, Jan 30, 2021 at 6:05 PM Strahil Nikolov > > wrote: > >> > >> So you created that extra conf with this content but it didn't work ? > >> multipath -v4 could hint you why it was complaining. > >> > >> > >> Best Regards, > >> Strahil Nikolov > >> > > > > Ok, I missed the surrounding root part > > > > devices { > > > > } > > It seems that we need more examples in the multipath.conf file > installed by vdsm. > > > Apparently "multipathd show config" didn't complain... > > Now I put also that and it seems to work, thanks for pointing it out > > > > So at the end I have the multipath.conf default file installed by vdsm (so > > without the # PRIVATE line) > > and this in /etc/multipath/conf.d/eql.conf > > > > devices { > > device { > > vendor "EQLOGIC" > > product "100E-00" > > Ben, why is this device missing from multipath builtin devices? > We usually get these from the device vendors themselves, or at least get their blessing. But it's certainly possible to add configs even without the vendor's blessing. > > path_selector "round-robin 0" > > path_grouping_policy multibus > > path_checker tur > > rr_min_io_rq 10 > > rr_weight priorities > > failback immediate > > features "0" > > This is never needed, multipath generates this value. > > Ben: please correct me if needed That's not exactly correct. Multipath will override the "queue_if_no_path" and "retain_attached_hw_handler" features, based on the values of the retain_attached_hw_handler and no_path_retry options. It will leave any other features alone. So, features like "pg_init_retries" and "pg_init_delay_seconds" still need to be specified using the features option. But you certainly don't ever need to specify a blank features line, not that it hurts anything.
> > > no_path_retry 16 > > I don't think that you need this, since you should inherit the value from > vdsm > multipath.conf, either from the "defaults" section, or from the > "overrides" section. > > You must add no_path_retry here if you want to use another value, and you > don't > want to use the vdsm default value. > > Note that if you use your own value, you need to match it to sanlock > io_timeout. > See this document for more info: > https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > > > } > > } > > > > Recreated initrd and rebooted the host and activated it without further > > problems. > > And "multipathd show config" confirms it. > > Yes, this is the recommended way to configure multipath, thanks Strahil for > the > good advice! > > > But still I see this > > > > # multipath -l > > 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00 > > size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 16:0:0:0 sdc 8:32 active undef running > > `- 18:0:0:0 sde 8:64 active undef running > > 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00 > > size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 15:0:0:0 sdb 8:16 active undef running > > `- 17:0:0:0 sdd 8:48 active undef running > > > > that makes me think I'm not using the no_path_retry setting, but > > queue_if_no_path... I could be wrong anyway.. > > No, this is expected. What it means, if I understand multipath > behavior correctly, > is that the device queues data for no_path_retry * polling_interval seconds when > all > paths have failed. After that the device will fail all pending and new I/O > until at least > one path is recovered. Correct. This is just showing you the current configuration that the kernel is using. The kernel doesn't know anything about no_path_retry. That is something that multipathd uses internally.
It just switches on and off the queue_if_no_path feature at the appropriate time. Currently, you have usable paths, and so the queue_if_no_path feature should obviously be on. Once you've lost all your paths, and they've stayed down past the no_path_retry limit, multipathd will remove this feature from the kernel device. When a path is then restored, it will add the feature back. > > How to verify for sure (without dropping the paths, at least at the moment) > > from the config? > > Any option with multipath and/or dmsetup commands? > > multipathd show config -> find your device section, it will show the current > value for no_path_retry. > > Nir ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct:
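The no_path_retry arithmetic described above (multipathd queues I/O for no_path_retry path-checker cycles after all paths fail, then turns queue_if_no_path off) can be sketched with a small calculation. This is an illustrative sketch, not multipathd code; the 5-second polling interval is multipathd's usual default, mentioned later in this thread.

```python
# Sketch: how long a multipath device keeps queuing I/O after losing
# its last path, given no_path_retry and the path-checker interval.

POLLING_INTERVAL = 5  # seconds between multipathd path checks (default)

def queue_duration(no_path_retry, polling_interval=POLLING_INTERVAL):
    """Seconds the device queues I/O after all paths have failed,
    before multipathd clears queue_if_no_path and I/O starts failing."""
    return no_path_retry * polling_interval

# vdsm 4.4 default no_path_retry=16 -> 80 seconds of queuing
print(queue_duration(16))  # -> 80
```

With the vdsm default of 16 this gives 80 seconds of queuing, which is why it pairs with sanlock's default 10-second io_timeout (8 * 10 = 80) elsewhere in the thread.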
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 8:37 PM Gianluca Cecchi wrote: > > > > On Mon, Feb 1, 2021 at 6:51 PM David Teigland wrote: >> >> On Mon, Feb 01, 2021 at 07:18:24PM +0200, Nir Soffer wrote: >> > Assuming we could use: >> > >> > io_timeout = 10 >> > renewal_retries = 8 >> > >> > The worst case would be: >> > >> > 00 sanlock renewal succeeds >> > 19 storage fails >> > 20 sanlock try to renew lease 1/7 (timeout=10) >> > 30 sanlock renewal timeout >> > 40 sanlock try to renew lease 2/7 (timeout=10) >> > 50 sanlock renewal timeout >> > 60 sanlock try to renew lease 3/7 (timeout=10) >> > 70 sanlock renewal timeout >> > 80 sanlock try to renew lease 4/7 (timeout=10) >> > 90 sanlock renewal timeout >> > 100 sanlock try to renew lease 5/7 (timeout=10) >> > 110 sanlock renewal timeout >> > 120 sanlock try to renew lease 6/7 (timeout=10) >> > 130 sanlock renewal timeout >> > 139 storage is back >> > 140 sanlock try to renew lease 7/7 (timeout=10) >> > 140 sanlock renewal succeeds >> > >> > David, what do you think? >> >> I wish I could say, it would require some careful study to know how >> feasible it is. The timings are intricate and fundamental to correctness >> of the algorithm. >> Dave >> > > I was taking values also from reading this: > > https://access.redhat.com/solutions/5152311 > > Perhaps it needs some review? Yes, I think we need to update the effective timeout field. The values describe how the sanlock and multipath configurations are related, but they do not represent the maximum outage time.
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 6:51 PM David Teigland wrote: > On Mon, Feb 01, 2021 at 07:18:24PM +0200, Nir Soffer wrote: > > Assuming we could use: > > > > io_timeout = 10 > > renewal_retries = 8 > > > > The worst case would be: > > > > 00 sanlock renewal succeeds > > 19 storage fails > > 20 sanlock try to renew lease 1/7 (timeout=10) > > 30 sanlock renewal timeout > > 40 sanlock try to renew lease 2/7 (timeout=10) > > 50 sanlock renewal timeout > > 60 sanlock try to renew lease 3/7 (timeout=10) > > 70 sanlock renewal timeout > > 80 sanlock try to renew lease 4/7 (timeout=10) > > 90 sanlock renewal timeout > > 100 sanlock try to renew lease 5/7 (timeout=10) > > 110 sanlock renewal timeout > > 120 sanlock try to renew lease 6/7 (timeout=10) > > 130 sanlock renewal timeout > > 139 storage is back > > 140 sanlock try to renew lease 7/7 (timeout=10) > > 140 sanlock renewal succeeds > > > > David, what do you think? > > I wish I could say, it would require some careful study to know how > feasible it is. The timings are intricate and fundamental to correctness > of the algorithm. > Dave > > I was taking values also reading this: https://access.redhat.com/solutions/5152311 Perhaps it needs some review? Gianluca
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 01, 2021 at 07:18:24PM +0200, Nir Soffer wrote: > Assuming we could use: > > io_timeout = 10 > renewal_retries = 8 > > The worst case would be: > > 00 sanlock renewal succeeds > 19 storage fails > 20 sanlock try to renew lease 1/7 (timeout=10) > 30 sanlock renewal timeout > 40 sanlock try to renew lease 2/7 (timeout=10) > 50 sanlock renewal timeout > 60 sanlock try to renew lease 3/7 (timeout=10) > 70 sanlock renewal timeout > 80 sanlock try to renew lease 4/7 (timeout=10) > 90 sanlock renewal timeout > 100 sanlock try to renew lease 5/7 (timeout=10) > 110 sanlock renewal timeout > 120 sanlock try to renew lease 6/7 (timeout=10) > 130 sanlock renewal timeout > 139 storage is back > 140 sanlock try to renew lease 7/7 (timeout=10) > 140 sanlock renewal succeeds > > David, what do you think? I wish I could say, it would require some careful study to know how feasible it is. The timings are intricate and fundamental to correctness of the algorithm. Dave
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 5:23 PM Gianluca Cecchi wrote: > > On Mon, Feb 1, 2021 at 4:09 PM Nir Soffer wrote: ... >> For 120 seconds, you likely need >> >> sanlock:io_timeout=20 >> no_path_retry=32 >> > > Shouldn't the above values be for a 160 second timeout? I need 120 120 seconds for sanlock means that sanlock will expire the lease exactly 120 seconds since the last successful lease renewal. Sanlock cannot exceed this deadline since other hosts assume that timeout when acquiring a lease from a "dead" host. When using a 15 second timeout, sanlock renews the lease every 30 seconds. The best case flow is:

00  sanlock renewal succeeds
01  storage fails
30  sanlock try to renew lease 1/3 (timeout=15)
45  sanlock renewal timeout
60  sanlock try to renew lease 2/3 (timeout=15)
75  sanlock renewal timeout
90  sanlock tries to renew lease 3/3 (timeout=15)
105 sanlock renewal timeout
120 sanlock expires the lease, kills the vm/vdsm
121 storage is back

If you use a 20 second io timeout, sanlock checks every 40 seconds. The best case flow is:

00  sanlock renewal succeeds
01  storage fails
40  sanlock try to renew lease 1/3 (timeout=20)
60  sanlock renewal timeout
80  sanlock try to renew lease 2/3 (timeout=20)
100 sanlock renewal timeout
120 sanlock try to renew lease 3/3 (timeout=20)
121 storage is back
122 sanlock renewal succeeds

But, we need to consider also the worst case flow:

00  sanlock renewal succeeds
39  storage fails
40  sanlock try to renew lease 1/3 (timeout=20)
60  sanlock renewal timeout
80  sanlock try to renew lease 2/3 (timeout=20)
100 sanlock renewal timeout
120 sanlock try to renew lease 3/3 (timeout=20)
140 sanlock renewal timeout
159 storage is back
160 sanlock expires lease, kills vm/vdsm etc.

So even with a 20 second io timeout, a 120 second outage may not succeed.
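The schedule behind the timelines above can be reproduced with a few lines: sanlock attempts a renewal every 2 * io_timeout seconds and expires the lease exactly 8 * io_timeout after the last successful renewal. This is a sketch of the arithmetic as described in this message, not of sanlock's actual implementation.

```python
# Sketch of the sanlock renewal schedule described above:
# one renewal attempt every 2 * io_timeout seconds (3 attempts),
# lease expiry 8 * io_timeout after the last successful renewal.

def renewal_attempts(io_timeout, retries=3):
    """Seconds (after the last successful renewal) at which sanlock
    starts each renewal attempt."""
    return [2 * io_timeout * (i + 1) for i in range(retries)]

def lease_expiry(io_timeout):
    """Seconds after the last successful renewal at which sanlock
    expires the lease and kills the vm/vdsm."""
    return 8 * io_timeout

print(renewal_attempts(15))  # attempts at 30, 60, 90
print(lease_expiry(15))      # expiry at 120
print(renewal_attempts(20))  # attempts at 40, 80, 120
print(lease_expiry(20))      # expiry at 160
```

These values reproduce both timelines above: io_timeout=15 gives attempts at 30/60/90 with expiry at 120, and io_timeout=20 gives attempts at 40/80/120 with expiry at 160.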
In practice we can assume that we detect a storage outage sometime in the middle between sanlock renewals, so the flow would be:

00  sanlock renewal succeeds
20  storage fails
40  sanlock try to renew lease 1/3 (timeout=20)
60  sanlock renewal timeout
80  sanlock try to renew lease 2/3 (timeout=20)
100 sanlock renewal timeout
120 sanlock try to renew lease 3/3 (timeout=20)
140 storage is back
140 sanlock renewal succeeds
160 sanlock expires lease, kills vm/vdsm etc.

So I would start with a 20 second io timeout, and increase it if needed. These flows assume that the multipath timeout is configured properly. If multipath is using too short a timeout, it will fail sanlock renewal immediately instead of queuing the I/O. I also did not add the time to detect that storage is available again. multipath checks paths every 5 seconds (polling_interval), so this may add a 5 second delay from the time the storage is up until multipath detects it and tries to send queued I/O. I think the current way sanlock works is not helpful for dealing with long outages on the storage side. If we could keep the io_timeout constant (e.g. 10 seconds) and change the number of retries, it would work better and be easier to predict. Assuming we could use:

io_timeout = 10
renewal_retries = 8

The worst case would be:

00  sanlock renewal succeeds
19  storage fails
20  sanlock try to renew lease 1/7 (timeout=10)
30  sanlock renewal timeout
40  sanlock try to renew lease 2/7 (timeout=10)
50  sanlock renewal timeout
60  sanlock try to renew lease 3/7 (timeout=10)
70  sanlock renewal timeout
80  sanlock try to renew lease 4/7 (timeout=10)
90  sanlock renewal timeout
100 sanlock try to renew lease 5/7 (timeout=10)
110 sanlock renewal timeout
120 sanlock try to renew lease 6/7 (timeout=10)
130 sanlock renewal timeout
139 storage is back
140 sanlock try to renew lease 7/7 (timeout=10)
140 sanlock renewal succeeds

David, what do you think? ...
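The proposed fixed-io_timeout scheme above (io_timeout held at 10 seconds, up to 7 renewal attempts, one every 2 * io_timeout seconds) can be sketched the same way. This is only an illustration of the timeline in the proposal, not existing sanlock behavior.

```python
# Sketch of the proposed scheme: constant io_timeout, more retries,
# attempts every 2 * io_timeout seconds after the last renewal.

def proposed_attempts(io_timeout=10, retries=7):
    """Seconds at which each of the proposed renewal attempts starts."""
    return [2 * io_timeout * (i + 1) for i in range(retries)]

print(proposed_attempts())  # attempts at 20, 40, 60, 80, 100, 120, 140
```

With the last attempt at 140 seconds, a storage outage from second 19 to second 139 is still survived, matching the worst-case timeline in the proposal.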
> On another host with same config (other luns on the same storage), if I run: > > multipath reconfigure -v4 > /tmp/multipath_reconfigure_v4.txt 2>&1 > > I get this: > https://drive.google.com/file/d/1VkezFkT9IwsrYD8LoIp4-Q-j2X1dN_qR/view?usp=sharing > > anything important inside, concerned with path retry settings? I don't see anything about no_path_retry there; maybe logging was changed, or these are not the right flags to see all the info during reconfiguration. I think "multipathd show config" is the canonical way to look at the current configuration. It shows the actual values multipath will use during runtime, after local configuration was applied on top of the built-in configuration. Nir
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 4:09 PM Nir Soffer wrote: [snip] > > The easiest way to get vdsm defaults is to change the file to use > an older version > (e.g. use 1.9), remove the private comment, and run: > > vdsm-tool configure --force --module multipath > > Vdsm will upgrade your old file to the most recent version, and back up > your old file > to /etc/multipath.conf.timestamp. > thanks > For 120 seconds, you likely need > > sanlock:io_timeout=20 > no_path_retry=32 > > Shouldn't the above values be for a 160 second timeout? I need 120 Because of the different way sanlock and multipath handle timeouts. > > Also note that our QE never tested changing these settings, but your > feedback on this new > configuration is very important. > ok > > multipath -r delegates the command to the multipathd daemon; this is > probably the reason > you don't see the logs here. > > I think this will be more useful: > >multipathd reconfigure -v3 > > I'm not sure about the -v3, check the multipathd manual for the details. > > Nir > > On another host with same config (other luns on the same storage), if I run: multipath reconfigure -v4 > /tmp/multipath_reconfigure_v4.txt 2>&1 I get this: https://drive.google.com/file/d/1VkezFkT9IwsrYD8LoIp4-Q-j2X1dN_qR/view?usp=sharing anything important inside, concerned with path retry settings?
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 4:47 PM Gianluca Cecchi wrote: > > On Mon, Feb 1, 2021 at 3:10 PM Nir Soffer wrote: > [snip] > >> > So at the end I have the multipath.conf default file installed by vdsm (so >> > without the # PRIVATE line) >> > and this in /etc/multipath/conf.d/eql.conf >> > >> > devices { >> > device { >> > vendor "EQLOGIC" >> > product "100E-00" >> >> Ben, why is this device missing from multipath builtin devices? > > > I was using Equallogic storage since oVirt 3.6, so CentOS/RHEL 6, > and it has never been inside the multipath database as far as I remember. > But I don't know why. > The parameters I put in were from the latest EQL best practices, but they were last updated > at CentOS 7 time. > I would like to use the same parameters in CentOS 8 now and see if they work > ok. > PS the EQL product line is somewhat deprecated (in the sense of no new features and so > on..) but anyway still supported > >> >> >> > path_selector "round-robin 0" >> > path_grouping_policy multibus >> > path_checker tur >> > rr_min_io_rq 10 >> > rr_weight priorities >> > failback immediate >> > features "0" >> >> This is never needed, multipath generates this value. > > > Those were the recommended values from EQL > Latest is dated April 2016, when 8 was not out yet: > http://downloads.dell.com/solutions/storage-solution-resources/(3199-CD-L)RHEL-PSseries-Configuration.pdf > > >> >> >> Ben: please correct me if needed >> >> > no_path_retry 16 >> >> I don't think that you need this, since you should inherit the value from >> vdsm >> multipath.conf, either from the "defaults" section, or from the >> "overrides" section. >> >> You must add no_path_retry here if you want to use another value, and you >> don't >> want to use the vdsm default value. > > > You are right; I see the value of 16 both in defaults and overrides.
But I > put it also inside the device section during my tests, suspecting it was not > being picked up, in the hope of seeing output similar to CentOS 7: > > 36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00 > size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw > > where you notice the hwhandler='0' > > Originally I remember the default value for no_path_retry was 4, but probably > it has been changed in 4.4 to 16, correct? Yes. Working on configurable sanlock io timeout revealed that these values should match. > If I want to see the default that vdsm would create from scratch, should I look > inside /usr/lib/python3.6/site-packages/vdsm/tool/configurators/multipath.py > of my version? Yes. The easiest way to get vdsm defaults is to change the file to use an older version (e.g. use 1.9), remove the private comment, and run: vdsm-tool configure --force --module multipath Vdsm will upgrade your old file to the most recent version, and back up your old file to /etc/multipath.conf.timestamp. > On my system with vdsm-python-4.40.40-1.el8.noarch I have this inside that > file > _NO_PATH_RETRY = 16 Yes, this matches the sanlock default io timeout (10 seconds). >> Note that if you use your own value, you need to match it to sanlock >> io_timeout. >> See this document for more info: >> https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md >> >> > } > > Yes I set this: > > # cat /etc/vdsm/vdsm.conf.d/99-FooIO.conf > # Configuration for FooIO storage. > > [sanlock] > # Set renewal timeout to 80 seconds > # (8 * io_timeout == 80). > io_timeout = 10 > > And for another environment with Netapp MetroCluster and 2 different sites > (I'm with RHV there...) I plan to set no_path_retry to 24 and io_timeout to > 15, to manage disaster recovery scenarios and planned maintenance with Netapp > node failover through sites taking potentially up to 120 seconds.
For 120 seconds, you likely need sanlock:io_timeout=20 no_path_retry=32 Because of the different way sanlock and multipath handle timeouts. Also note that our QE never tested changing these settings, but your feedback on this new configuration is very important. ... > I would like just to be confident about the no_path_retry setting, because > the multipath output, also with -v2, -v3, -v4, seems not so clear to me > In 7 (as Benjamin suggested 4 years ago.. ;-) I have this: > > # multipath -r -v3 | grep no_path_retry > Feb 01 15:45:27 | 36090a0d88034667163b315f8c906b0ac: no_path_retry = 4 > (config file default) > Feb 01 15:45:27 | 36090a0c8d04f2fc4251c7c08d0a3: no_path_retry = 4 > (config file default) > > On CentOS 8.3 I get only standard error...: > > # multipath -r -v3 > Feb 01 15:46:32 | set open fds limit to 8192/262144 > Feb 01 15:46:32 | loading /lib64/multipath/libchecktur.so checker > Feb 01 15:46:32 | checker tur: message table size = 3 > Feb 01 15:46:32 | loading /lib64/multipath/libprioconst.so prioritizer > Feb 01
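The pairing suggested above (io_timeout=20 with no_path_retry=32, just like the defaults io_timeout=10 with no_path_retry=16) follows one relation: multipath should keep queuing I/O for at least the sanlock lease renewal window of 8 * io_timeout seconds, and the queue time is no_path_retry * polling_interval. A sketch of that relation, assuming multipathd's usual 5-second polling interval:

```python
# Sketch of the io_timeout <-> no_path_retry pairing used in this
# thread: queue time (no_path_retry * polling_interval) should cover
# the sanlock renewal window (8 * io_timeout).

POLLING_INTERVAL = 5  # multipathd default, seconds

def matching_no_path_retry(sanlock_io_timeout):
    """Smallest no_path_retry whose queue time covers 8 * io_timeout."""
    return (8 * sanlock_io_timeout) // POLLING_INTERVAL

print(matching_no_path_retry(10))  # -> 16, the vdsm 4.4 default pairing
print(matching_no_path_retry(20))  # -> 32, as suggested for ~120s outages
```

Both pairings mentioned in the thread (10/16 and 20/32) fall out of this formula; the authoritative rule is in the vdsm io-timeouts doc linked above.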
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 3:10 PM Nir Soffer wrote: [snip] > So at the end I have the multipath.conf default file installed by vdsm > (so without the # PRIVATE line) > > and this in /etc/multipath/conf.d/eql.conf > > > > devices { > > device { > > vendor "EQLOGIC" > > product "100E-00" > > Ben, why is this device missing from multipath builtin devices? > I was using Equallogic storage since oVirt 3.6, so CentOS/RHEL 6, and it has never been inside the multipath database as far as I remember. But I don't know why. The parameters I put in were from the latest EQL best practices, but they were last updated at CentOS 7 time. I would like to use the same parameters in CentOS 8 now and see if they work ok. PS the EQL product line is somewhat deprecated (in the sense of no new features and so on..) but anyway still supported > > > path_selector "round-robin 0" > > path_grouping_policy multibus > > path_checker tur > > rr_min_io_rq 10 > > rr_weight priorities > > failback immediate > > features "0" > > This is never needed, multipath generates this value. > Those were the recommended values from EQL Latest is dated April 2016, when 8 was not out yet: http://downloads.dell.com/solutions/storage-solution-resources/(3199-CD-L)RHEL-PSseries-Configuration.pdf > > Ben: please correct me if needed > > > no_path_retry 16 > > I don't think that you need this, since you should inherit the value > from vdsm > multipath.conf, either from the "defaults" section, or from the > "overrides" section. > > You must add no_path_retry here if you want to use another value, and you > don't > want to use the vdsm default value. > You are right; I see the value of 16 both in defaults and overrides.
But I put it also inside the device section during my tests, suspecting it was not being picked up, in the hope of seeing output similar to CentOS 7: 36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00 size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw where you notice the hwhandler='0' Originally I remember the default value for no_path_retry was 4, but probably it has been changed in 4.4 to 16, correct? If I want to see the default that vdsm would create from scratch, should I look inside /usr/lib/python3.6/site-packages/vdsm/tool/configurators/multipath.py of my version? On my system with vdsm-python-4.40.40-1.el8.noarch I have this inside that file _NO_PATH_RETRY = 16 > > Note that if you use your own value, you need to match it to sanlock > io_timeout. > See this document for more info: > https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > > > } > Yes I set this: # cat /etc/vdsm/vdsm.conf.d/99-FooIO.conf # Configuration for FooIO storage. [sanlock] # Set renewal timeout to 80 seconds # (8 * io_timeout == 80). io_timeout = 10 And for another environment with Netapp MetroCluster and 2 different sites (I'm with RHV there...) I plan to set no_path_retry to 24 and io_timeout to 15, to manage disaster recovery scenarios and planned maintenance with Netapp node failover through sites taking potentially up to 120 seconds.
> But still I see this > > > > # multipath -l > > 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00 > > size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 16:0:0:0 sdc 8:32 active undef running > > `- 18:0:0:0 sde 8:64 active undef running > > 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00 > > size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 15:0:0:0 sdb 8:16 active undef running > > `- 17:0:0:0 sdd 8:48 active undef running > > > > that makes me think I'm not using the no_path_retry setting, but > queue_if_no_path... I could be wrong anyway.. > > No, this is expected. What it means, if I understand multipath > behavior correctly, > is that the device queues data for no_path_retry * polling_interval seconds > when all > paths have failed. After that the device will fail all pending and new I/O > until at least > one path is recovered. > > > How to verify for sure (without dropping the paths, at least at the > moment) from the config? > > Any option with multipath and/or dmsetup commands? > > multipathd show config -> find your device section, it will show the current > value for no_path_retry. > > Nir > > I would like just to be confident about the no_path_retry setting, because the multipath output, also with -v2, -v3, -v4, seems not so clear to me In 7 (as Benjamin suggested 4 years ago.. ;-) I have this: # multipath -r -v3 | grep no_path_retry Feb 01 15:45:27 | 36090a0d88034667163b315f8c906b0ac: no_path_retry = 4 (config file default) Feb 01 15:45:27 | 36090a0c8d04f2fc4251c7c08d0a3: no_path_retry = 4 (config file default) On CentOS 8.3 I get
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 1, 2021 at 1:55 PM Gianluca Cecchi wrote: > > On Sat, Jan 30, 2021 at 6:05 PM Strahil Nikolov wrote: >> >> So you created that extra conf with this content but it didn't work ? >> multipath -v4 could hint you why it was complaining. >> >> >> Best Regards, >> Strahil Nikolov >> > > Ok, I missed the surrounding root part > > devices { > > } It seems that we need more examples in the multipath.conf file installed by vdsm. > Apparently "multipathd show config" didn't complain... > Now I put also that and it seems to work, thanks for pointing it out > > So at the end I have the multipath.conf default file installed by vdsm (so > without the # PRIVATE line) > and this in /etc/multipath/conf.d/eql.conf > > devices { > device { > vendor "EQLOGIC" > product "100E-00" Ben, why is this device missing from multipath builtin devices? > path_selector "round-robin 0" > path_grouping_policy multibus > path_checker tur > rr_min_io_rq 10 > rr_weight priorities > failback immediate > features "0" This is never needed, multipath generates this value. Ben: please correct me if needed > no_path_retry 16 I don't think that you need this, since you should inherit the value from vdsm multipath.conf, either from the "defaults" section, or from the "overrides" section. You must add no_path_retry here if you want to use another value, and you don't want to use the vdsm default value. Note that if you use your own value, you need to match it to sanlock io_timeout. See this document for more info: https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > } > } > > Recreated initrd and rebooted the host and activated it without further > problems. > And "multipathd show config" confirms it. Yes, this is the recommended way to configure multipath, thanks Strahil for the good advice!
> But still I see this
>
> # multipath -l
> 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00
> size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 16:0:0:0 sdc 8:32 active undef running
>   `- 18:0:0:0 sde 8:64 active undef running
> 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00
> size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 15:0:0:0 sdb 8:16 active undef running
>   `- 17:0:0:0 sdd 8:48 active undef running
>
> that lets me think I'm not using the no_path_retry setting, but
> queue_if_no_path... I could be wrong anyway..

No, this is expected. What it means, if I understand multipath behavior
correctly, is that the device queues data for no_path_retry *
polling_interval seconds when all paths have failed. After that the
device will fail all pending and new I/O until at least one path is
recovered.

> How to verify for sure (without dropping the paths, at least at the
> moment) from the config?
> Any option with multipath and/or dmsetup commands?

multipathd show config -> find your device section; it will show the
current value for no_path_retry.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/DVZSNPVMPY5RNXUI5QN7A57H2IB34CUS/
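[Editorial note] Nir's verification advice ("multipathd show config -> find your device section") can also be scripted when many hosts need checking. The sketch below is hypothetical, not part of vdsm or multipath-tools: it runs a rough regex over text in the shape of a multipath config and extracts no_path_retry for a given vendor/product pair. It deliberately does not implement the full multipath.conf grammar (no nested braces, no quoting edge cases).

```python
import re

def no_path_retry_for(config_text, vendor, product):
    """Return the no_path_retry value from the device section matching
    vendor/product, or None if the device or option is absent.
    Rough illustration only, not a full multipath.conf parser."""
    # Each device block ends at its first '}', which is enough for the
    # flat device sections shown in this thread.
    for block in re.findall(r'device\s*\{([^}]*)\}', config_text):
        if f'"{vendor}"' in block and f'"{product}"' in block:
            m = re.search(r'no_path_retry\s+(\S+)', block)
            return m.group(1) if m else None
    return None

# Sample mimicking the eql.conf drop-in from this thread; in real use
# you would feed in the output of `multipathd show config`.
sample = '''
devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_checker tur
        no_path_retry 16
    }
}
'''
print(no_path_retry_for(sample, "EQLOGIC", "100E-00"))  # 16
```

This only inspects configuration text; it does not prove what the kernel is actually doing, which is why Nir's suggestion to read the live `multipathd show config` output is the authoritative check.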
On Sat, Jan 30, 2021 at 6:05 PM Strahil Nikolov wrote:
> So you created that extra conf with this content but it didn't work ?
> multipath -v4 could hint you why it was complaining.
>
> Best Regards,
> Strahil Nikolov

Ok, I missed the surrounding root part

devices {
}

Apparently "multipathd show config" didn't complain...
Now I put also that and it seems to work, thanks for pointing it out.

So at the end I have the multipath.conf default file installed by vdsm
(so without the # PRIVATE line) and this in
/etc/multipath/conf.d/eql.conf

devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        path_checker tur
        rr_min_io_rq 10
        rr_weight priorities
        failback immediate
        features "0"
        no_path_retry 16
    }
}

Recreated the initrd, rebooted the host, and activated it without
further problems.
And "multipathd show config" confirms it.

But still I see this

# multipath -l
36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00
size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sdc 8:32 active undef running
  `- 18:0:0:0 sde 8:64 active undef running
36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:0 sdb 8:16 active undef running
  `- 17:0:0:0 sdd 8:48 active undef running

that lets me think I'm not using the no_path_retry setting, but
queue_if_no_path... I could be wrong anyway..

How to verify for sure (without dropping the paths, at least at the
moment) from the config?
Any option with multipath and/or dmsetup commands?
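[Editorial note] The behavior Nir describes earlier in the thread (the device queues I/O for no_path_retry * polling_interval seconds after the last path fails, then fails all pending and new I/O) can be made concrete with a small calculation. The values below are the ones quoted in this thread (vdsm's no_path_retry=16 with polling_interval=5, and the CentOS 7 default of no_path_retry=4); the helper function is illustrative only.

```python
# Queueing duration after all paths fail, per the explanation in this
# thread: no_path_retry retries, one per polling_interval seconds.

def queue_duration(no_path_retry: int, polling_interval: int) -> int:
    """Seconds of I/O queueing after the last path goes down."""
    return no_path_retry * polling_interval

# vdsm 4.4 defaults quoted in the thread:
print(queue_duration(16, 5))  # 80 seconds

# CentOS 7 era default (no_path_retry = 4) also quoted in the thread:
print(queue_duration(4, 5))   # 20 seconds
```

So the `queue_if_no_path` shown by `multipath -l` does not mean unbounded queueing; with these settings the device gives up after 80 seconds (or 20 seconds with the old default of 4).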
Gianluca
So you created that extra conf with this content but it didn't work ?
multipath -v4 could hint you why it was complaining.

Best Regards,
Strahil Nikolov

devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        path_checker tur
        rr_min_io_rq 10
        rr_weight priorities
        failback immediate
        features "0"
        no_path_retry 16
    }
}

On Sat, Jan 30, 2021 at 15:08, Gianluca Cecchi wrote:
On Sat, Jan 30, 2021 at 1:44 PM Strahil Nikolov wrote:
>
> I would recommend you to put your customizations at
> '/etc/multipath/conf.d/.conf' (you have to create the dir), so
> vdsm can keep multipath.conf up to date.
> For details check the header of multipath.conf
>
> Here is mine:
> # The recommended way to add configuration for your storage is to add a
> # drop-in configuration file in "/etc/multipath/conf.d/.conf".
> # Settings in drop-in configuration files override settings in this
> # file.
>
> Of course you need to rebuild dracut.

Hi and thanks. The dir was already there because oVirt had put the
vdsm_blacklist.conf file blacklisting the local disk:

# This file is managed by vdsm, do not edit!
# Any changes made to this file will be overwritten when running:
# vdsm-tool config-lvm-filter

blacklist {
    wwid "36d09466029914f0021e89c5710e256be"
}

I tried it, creating a 90-eql.conf file with only the device section
(not indented.. correct?), but for some reason the vdsm daemon didn't
start correctly (restarting too quickly) and the host kept staying in
"non operational" mode.
So I reverted to a fully customized multipath.conf.

Not sure if, when I put anything inside /etc/multipath/conf.d/, I have
to mark multipath.conf as PRIVATE anyway or not.

Gianluca
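[Editorial note] Later messages in this thread resolve the failure described here: the drop-in needs the full surrounding `devices { device { ... } }` wrapper, not a bare device section ("Ok, I missed the surrounding root part"). As a reference sketch, a working 90-eql.conf would look like the fragment below; the values are the ones quoted in this thread, and the 90-eql.conf name is the author's own choice, not a requirement.

```
# /etc/multipath/conf.d/90-eql.conf
# Drop-in settings override /etc/multipath.conf; the devices{} wrapper
# around the device{} section is required.
devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        path_checker tur
        rr_min_io_rq 10
        rr_weight priorities
        failback immediate
        features "0"
        no_path_retry 16
    }
}
```

With a drop-in like this, /etc/multipath.conf stays vdsm-managed (no "# VDSM PRIVATE" marker needed), which is the arrangement the thread converges on.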
On Sat, 30.01.2021 at 12:45 +0100, Gianluca Cecchi wrote:
> On Fri, Jan 29, 2021 at 12:00 PM Gianluca Cecchi <
> gianluca.cec...@gmail.com> wrote:
> > Hello, I'm upgrading some environments from 4.3 to 4.4. Storage
> > domains are iSCSI based, connected to an Equallogic storage array
> > (PS-6510ES), which is recognized with this vendor/product in
> > relation to multipath configuration
> >
> [snip]
> > In 8 I get this; see also the strange line about vendor or product
> > missing, but it is not true...
> >
> [snip]
>
> The message
> "device config in /etc/multipath.conf missing vendor or product
> parameter"
> was due to the empty device section (actually containing lines with
> comments). So it was solved by removing the whole section.
>
> Now, my full multipath.conf is this
>
> # head -2 /etc/multipath.conf
> # VDSM REVISION 2.0
> # VDSM PRIVATE
>
> # cat /etc/multipath.conf | grep -v "^#" | sed '/^[[:space:]]*$/d'
> defaults {
>     polling_interval 5
>     no_path_retry 16
>     user_friendly_names no
>     flush_on_last_del yes
>     fast_io_fail_tmo 5
>     dev_loss_tmo 30
>     max_fds 8192
> }
> blacklist {
>     wwid "36d09466029914f0021e89c5710e256be"
> }
> devices {
>     device {
>         vendor "EQLOGIC"
>         product "100E-00"
>         path_selector "round-robin 0"
>         path_grouping_policy multibus
>         path_checker tur
>         rr_min_io_rq 10
>         rr_weight priorities
>         failback immediate
>         features "0"
>         no_path_retry 16
>     }
> }
> overrides {
>     no_path_retry 16
> }
>
> Rebuilt the initrd and put the host online

I would recommend you to put your customizations at
'/etc/multipath/conf.d/.conf' (you have to create the dir), so vdsm can
keep multipath.conf up to date.
For details check the header of multipath.conf

Here is mine:
# The recommended way to add configuration for your storage is to add a
# drop-in configuration file in "/etc/multipath/conf.d/.conf".
# Settings in drop-in configuration files override settings in this
# file.

Of course you need to rebuild dracut.
> dracut -f /boot/$(imgbase layer --current)/initramfs-$(uname -r).img
> cp -p /boot/$(imgbase layer --current)/initramfs-$(uname -r).img /boot/
>
> rebooted the host and it seems ok with the command
>
> lsinitrd -f etc/multipath.conf /boot/initramfs-4.18.0-240.8.1.el8_3.x86_64.img
>
> and also confirmed by
>
> multipathd show config
>
> But anyway I see this
>
> # multipath -l
> 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00
> size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 16:0:0:0 sdc 8:32 active undef running
>   `- 18:0:0:0 sde 8:64 active undef running
> 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00
> size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 15:0:0:0 sdb 8:16 active undef running
>   `- 17:0:0:0 sdd 8:48 active undef running
>
> so in my opinion it is still using queue_if_no_path...
>
> What else can I try to debug this? Or is this the expected output in
> CentOS 8? What is the command to verify no_path_retry is effectively
> set for this device in CentOS 8?
> On the host still in 7 I have this for the same two luns:
>
> # multipath -l
> 36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00
> size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 16:0:0:0 sdc 8:32 active undef running
>   `- 18:0:0:0 sde 8:64 active undef running
> 36090a0d88034667163b315f8c906b0ac dm-13 EQLOGIC ,100E-00
> size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 15:0:0:0 sdb 8:16 active undef running
>   `- 17:0:0:0 sdd 8:48 active undef running
>
> Thanks
> Gianluca

Best Regards,
Strahil Nikolov
On Fri, Jan 29, 2021 at 12:00 PM Gianluca Cecchi wrote:
> Hello,
> I'm upgrading some environments from 4.3 to 4.4.
> Storage domains are iSCSI based, connected to an Equallogic storage
> array (PS-6510ES), which is recognized with this vendor/product in
> relation to multipath configuration
>
> [snip]
>
> In 8 I get this; see also the strange line about vendor or product
> missing, but it is not true...
>
> [snip]

The message
"device config in /etc/multipath.conf missing vendor or product parameter"
was due to the empty device section (actually containing lines with
comments).
So it was solved by removing the whole section.

Now, my full multipath.conf is this

# head -2 /etc/multipath.conf
# VDSM REVISION 2.0
# VDSM PRIVATE

# cat /etc/multipath.conf | grep -v "^#" | sed '/^[[:space:]]*$/d'
defaults {
    polling_interval 5
    no_path_retry 16
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 8192
}
blacklist {
    wwid "36d09466029914f0021e89c5710e256be"
}
devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        path_checker tur
        rr_min_io_rq 10
        rr_weight priorities
        failback immediate
        features "0"
        no_path_retry 16
    }
}
overrides {
    no_path_retry 16
}

Rebuilt the initrd and put the host online

dracut -f /boot/$(imgbase layer --current)/initramfs-$(uname -r).img
cp -p /boot/$(imgbase layer --current)/initramfs-$(uname -r).img /boot/

rebooted the host and it seems ok with the command

lsinitrd -f etc/multipath.conf /boot/initramfs-4.18.0-240.8.1.el8_3.x86_64.img

and also confirmed by

multipathd show config

But anyway I see this

# multipath -l
36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00
size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sdc 8:32 active undef running
  `- 18:0:0:0 sde 8:64 active undef running
36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:0 sdb 8:16 active undef running
  `- 17:0:0:0 sdd 8:48 active undef running

so in my opinion it is still using queue_if_no_path...

What else can I try to debug this? Or is this the expected output in
CentOS 8? What is the command to verify no_path_retry is effectively set
for this device in CentOS 8?

On the host still in 7 I have this for the same two luns:

# multipath -l
36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00
size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sdc 8:32 active undef running
  `- 18:0:0:0 sde 8:64 active undef running
36090a0d88034667163b315f8c906b0ac dm-13 EQLOGIC ,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 15:0:0:0 sdb 8:16 active undef running
  `- 17:0:0:0 sdd 8:48 active undef running

Thanks
Gianluca