[ovirt-users] Re: Multipath flapping with SAS via FCP
On Fri, Feb 26, 2021 at 05:29:50PM +0200, Nir Soffer wrote: > On Fri, Feb 26, 2021 at 12:07 PM Benoit Chatelain wrote: > > > > Hi Nir Soffer, > > Thank for your reply > > > > Indeed, the device fails immediately after it was reinstated. > > > > There is my 'multipathd show config' dump : > > > > defaults { > > verbosity 2 > > polling_interval 5 > > max_polling_interval 20 > > reassign_maps "no" > > multipath_dir "/lib64/multipath" > > path_selector "service-time 0" > > path_grouping_policy "failover" > > uid_attribute "ID_SERIAL" > > prio "const" > > prio_args "" > > features "0" > > path_checker "tur" > > alias_prefix "mpath" > > failback "manual" > > rr_min_io 1000 > > rr_min_io_rq 1 > > max_fds 4096 > > rr_weight "uniform" > > no_path_retry 16 > > queue_without_daemon "no" > > flush_on_last_del "yes" > > user_friendly_names "no" > > fast_io_fail_tmo 5 > > dev_loss_tmo 60 > > bindings_file "/etc/multipath/bindings" > > wwids_file "/etc/multipath/wwids" > > prkeys_file "/etc/multipath/prkeys" > > log_checker_err always > > all_tg_pt "no" > > retain_attached_hw_handler "yes" > > detect_prio "yes" > > detect_checker "yes" > > force_sync "no" > > strict_timing "no" > > deferred_remove "no" > > config_dir "/etc/multipath/conf.d" > > delay_watch_checks "no" > > delay_wait_checks "no" > > san_path_err_threshold "no" > > san_path_err_forget_rate "no" > > san_path_err_recovery_time "no" > > marginal_path_err_sample_time "no" > > marginal_path_err_rate_threshold "no" > > marginal_path_err_recheck_gap_time "no" > > marginal_path_double_failed_time "no" > > find_multipaths "on" > > uxsock_timeout 4000 > > retrigger_tries 3 > > retrigger_delay 10 > > missing_uev_wait_timeout 30 > > skip_kpartx "no" > > disable_changed_wwids ignored > > remove_retries 0 > > ghost_delay "no" > > find_multipaths_timeout -10 > > enable_foreign "" > > marginal_pathgroups "no" > > } > > blacklist { > > devnode "!^(sd[a-z]|dasd[a-z]|nvme[0-9])" > > wwid "36f402700f232e40026b41bd43a0812e5" > > 
protocol "(scsi:adt|scsi:sbp)" > > ... > > } > > blacklist_exceptions { > > protocol "scsi:sas" > > } > > devices { > > ... > > device { > > vendor "COMPELNT" > > product "Compellent Vol" > > path_grouping_policy "multibus" > > no_path_retry "queue" > > } > > ... > > } > > overrides { > > no_path_retry 16 > > } > > > > And there is my scsi disks (sdb & sdc disks) : > > > > [root@anarion-adm ~]# lsscsi -l > > [0:2:0:0]diskDELL PERC H330 Adp4.30 /dev/sda > > state=running queue_depth=256 scsi_level=6 type=0 device_blocked=0 > > timeout=90 > > [1:0:0:2]diskCOMPELNT Compellent Vol 0704 /dev/sdb > > state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 > > timeout=30 > > [1:0:1:2]diskCOMPELNT Compellent Vol 0704 /dev/sdc > > state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 > > timeout=30 > > > > > > My disk configuration is present in multipath, and the DELLEMC > > documentation & white paper don't specifying exotics configuration for > > multipathd. (I'm wrong ?) > > > > I looked modules for SAS & FCP driver, they look good : > > > > [root@anarion-adm ~]# lsmod | grep sas > > mpt3sas 303104 4 > > raid_class 16384 1 mpt3sas > > megaraid_sas 172032 2 > > scsi_transport_sas 45056 1 mpt3sas > > > > [root@anarion-adm ~]# lsmod | grep fc > > bnx2fc110592 0 > > cnic 69632 1 bnx2fc > > libfcoe77824 2 qedf,bnx2fc > > libfc 147456 3 qedf,bnx2fc,libfcoe > > scsi_transport_fc 69632 3 qedf,libfc,bnx2fc > > > > Do you think my device is misconfigured? should I check on the vendor side? > > Another idea ? :) > > I guess this is related to exposing a FC device via SAS, not sure how > this is done > and why. > > I hope Ben (RHEL multipath maintainer) can help with this. The issue here is that the device appears to be advertising itself as ready when it responds to the SCSI Test Unit Ready command. However, it is not actually able to handle IO sent to it. 
To make multipath stop flapping, you can add path_checker directio to the devices configuration for this device. That will make multipath send a read request to the device to determine if it is usable. This should fail, meaning that the device will stay in the failed state. As to why the device isn't able to handle IO, looking at the log messages:
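A sketch of such a drop-in (the file name and placement under /etc/multipath/conf.d/ are illustrative; the vendor/product strings come from the config dump above, and the other builtin COMPELNT values should still be merged in):

```
# /etc/multipath/conf.d/compellent.conf  (hypothetical file name)
devices {
    device {
        vendor  "COMPELNT"
        product "Compellent Vol"
        # directio issues an actual read instead of SCSI Test Unit Ready,
        # so a path that answers TUR but cannot serve IO stays failed
        path_checker "directio"
    }
}
```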
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 01, 2021 at 03:47:32PM +0100, Gianluca Cecchi wrote: > On Mon, Feb 1, 2021 at 3:10 PM Nir Soffer wrote: > [snip] > > > So at the end I have the multipath.conf default file installed by vdsm > > (so without the # PRIVATE line) > > > and this in /etc/multipath/conf.d/eql.conf > > > > > > devices { > > > device { > > > vendor "EQLOGIC" > > > product "100E-00" > > > > Ben, why is this device missing from multipath builtin devices? > > > > I was using Equallogic kind of storage since oVirt 3.6, so CentOS/RHEL 6, > and it has never been inside the multipath database as far as I remember. > But I don't know why. > The parameters I put were from the latest EQL best practices, but they were > updated at CentOS 7 time. > I would like to use the same parameters in CentOS 8 now and see if they > work ok. > PS the EQL line is somehow deprecated (in the sense of no new features and > so on..) but anyway still supported > > > > > > > path_selector "round-robin 0" > > > path_grouping_policy multibus > > > path_checker tur > > > rr_min_io_rq 10 > > > rr_weight priorities > > > failback immediate > > > features "0" > > > > This is never needed, multipath generates this value. > > > > Those were the recommended values from EQL > The latest is dated April 2016, when 8 was not out yet: > http://downloads.dell.com/solutions/storage-solution-resources/(3199-CD-L)RHEL-PSseries-Configuration.pdf > Thanks for this link. I'll add this to the default built-ins. -Ben > > > > > > Ben: please correct me if needed > > > > > no_path_retry 16 > > > > I don't think that you need this, since you should inherit the value > > from vdsm > > multipath.conf, either from the "defaults" section, or from the > > "overrides" section. > > > > You must add no_path_retry here if you want to use another value, and you > > don't > > want to use the vdsm default value. > > > > You are right; I see the value of 16 both in defaults and overrides.
But I > put it also inside the device section during my tests in doubt it was not > picked up in the hope to see similar output as in CentOS 7: > > 36090a0c8d04f2fc4251c7c08d0a3 dm-14 EQLOGIC ,100E-00 > size=2.4T features='1 queue_if_no_path' hwhandler='0' wp=rw > > where you notice the hwhandler='0' > > Originally I remember the default value for no_path_retry was 4 but > probably it has been changed in 4.4 to 16, correct? > If I want to see the default that vdsm would create from scratch should I > see inside > /usr/lib/python3.6/site-packages/vdsm/tool/configurators/multipath.py of my > version? > On my system with vdsm-python-4.40.40-1.el8.noarch I have this inside that > file > _NO_PATH_RETRY = 16 > > > > > > > Note that if you use your own value, you need to match it to sanlock > > io_timeout. > > See this document for more info: > > https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > > > > > } > > > > Yes I set this: > > # cat /etc/vdsm/vdsm.conf.d/99-FooIO.conf > # Configuration for FooIO storage. > > [sanlock] > # Set renewal timeout to 80 seconds > # (8 * io_timeout == 80). > io_timeout = 10 > > And for another environment with Netapp MetroCluster and 2 different sites > (I'm with RHV there...) I plan to set no_path_retry to 24 and io_timeout to > 15, to manage disaster recovery scenarios and planned maintenance with > Netapp node failover through sites taking potentially up to 120 seconds. 
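The arithmetic behind matching the two timeouts (per the io-timeouts.md document linked above) can be sketched in a few lines; polling_interval 5 is the vdsm default shown elsewhere in the thread, and the function names here are illustrative:

```python
def queue_seconds(no_path_retry, polling_interval=5):
    # multipath keeps queueing IO for roughly no_path_retry checker
    # intervals after the last path fails
    return no_path_retry * polling_interval

def sanlock_renewal_seconds(io_timeout):
    # sanlock's renewal timeout is 8 * io_timeout
    return 8 * io_timeout

# vdsm 4.4 defaults: no_path_retry 16 and io_timeout 10 line up at 80s
print(queue_seconds(16), sanlock_renewal_seconds(10))   # 80 80
# the Netapp MetroCluster plan above: no_path_retry 24, io_timeout 15 -> 120s
print(queue_seconds(24), sanlock_renewal_seconds(15))   # 120 120
```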
> > > But still I see this > > > > > > # multipath -l > > > 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00 > > > size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > > `-+- policy='round-robin 0' prio=0 status=active > > > |- 16:0:0:0 sdc 8:32 active undef running > > > `- 18:0:0:0 sde 8:64 active undef running > > > 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00 > > > size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > > `-+- policy='round-robin 0' prio=0 status=active > > > |- 15:0:0:0 sdb 8:16 active undef running > > > `- 17:0:0:0 sdd 8:48 active undef running > > > > > > that lets me think I'm not using the no_path_retry setting, but > > queue_if_no_path... I could be wrong anyway.. > > > > Not this is expected. What is means, if I understand multipath > > behavior correctly, > > that the device queue data for no_path_retry * polling_internal seconds > > when all > > paths failed. After that the device will fail all pending and new I/O > > until at least > > one path is recovered. > > > > > How to verify for sure (without dropping the paths, at least at the > > moment) from the config? > > > Any option with multipath and/or dmsetup commands? > > > > multipath show config -> find your device section, it will show the current > > value for no_path_retry. > > > > Nir > > > > > I would like just to be confident about th
[ovirt-users] Re: ovirt 4.4 and CentOS 8 and multipath with Equallogic
On Mon, Feb 01, 2021 at 04:10:02PM +0200, Nir Soffer wrote: > On Mon, Feb 1, 2021 at 1:55 PM Gianluca Cecchi > wrote: > > > > On Sat, Jan 30, 2021 at 6:05 PM Strahil Nikolov > > wrote: > >> > >> So you created that extra conf with this content but it didn't work ? > >> multipath -v4 could hint you why it was complaining. > >> > >> > >> Best Regards, > >> Strahil Nikolov > >> > > > > Ok, I missed the surrounding root part > > > > devices { > > > > } > > It seems that we need more examples in the multipath.conf file > installed by vdsm. > > > Apparently "multipathd show config" didn't complain... > > Now I put also that and it seems to work, thanks for pointing it > > > > So at the end I have the multipath.conf default file installed by vdsm (so > > without the # PRIVATE line) > > and this in /etc/multipath/conf.d/eql.conf > > > > devices { > > device { > > vendor "EQLOGIC" > > product "100E-00" > > Ben, why is this device missing from multipath builtin devices? > We usually get these from the device vendors themselves, or at least get their blessing. But it's certainly possible to add configs even without the vendor's blessing. > > path_selector "round-robin 0" > > path_grouping_policy multibus > > path_checker tur > > rr_min_io_rq 10 > > rr_weight priorities > > failback immediate > > features "0" > > This is never needed, multipath generates this value. > > Ben: please correct me if needed That's not exactly correct. Multipath will override the "queue_if_no_path" and "retain_attached_hw_handler" features, based on the values of the retain_attached_hw_handler and no_path_retry options. It will leave any other features alone. So, features like "pg_init_retries" and "pg_init_delay_seconds" still need to be specified using the features option. But you certainly don't ever need to specify a blank features line, not that it hurts anything.
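To illustrate Ben's point with a hedged example (values hypothetical): queue_if_no_path never needs to appear in features, since multipath derives it, but an "extra" feature such as pg_init_retries still would:

```
device {
    vendor  "EQLOGIC"
    product "100E-00"
    # no features "0" needed: queue_if_no_path and
    # retain_attached_hw_handler are derived from no_path_retry and the
    # retain_attached_hw_handler option
    no_path_retry 16
    # an extra feature still goes here; the leading count covers the
    # feature name and its argument (value is hypothetical)
    features "2 pg_init_retries 3"
}
```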
> > > no_path_retry16 > > I'm don't think that you need this, since you should inherit the value from > vdsm > multipath.conf, either from the "defaults" section, or from the > "overrides" section. > > You must add no_path_retry here if you want to use another value, and you > don't > want to use vdsm default value. > > Note that if you use your own value, you need to match it to sanlock > io_timeout. > See this document for more info: > https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md > > > } > > } > > > > Recreated initrd and rebooted the host and activated it without further > > problems. > > And "multipathd show config" confirms it. > > Yes, this is the recommended way to configure multipath, thanks Strahil for > the > good advice! > > > But still I see this > > > > # multipath -l > > 36090a0c8d04f2fc4251c7c08d0a3 dm-13 EQLOGIC,100E-00 > > size=2.4T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 16:0:0:0 sdc 8:32 active undef running > > `- 18:0:0:0 sde 8:64 active undef running > > 36090a0d88034667163b315f8c906b0ac dm-12 EQLOGIC,100E-00 > > size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw > > `-+- policy='round-robin 0' prio=0 status=active > > |- 15:0:0:0 sdb 8:16 active undef running > > `- 17:0:0:0 sdd 8:48 active undef running > > > > that lets me think I'm not using the no_path_retry setting, but > > queue_if_no_path... I could be wrong anyway.. > > Not this is expected. What is means, if I understand multipath > behavior correctly, > that the device queue data for no_path_retry * polling_internal seconds when > all > paths failed. After that the device will fail all pending and new I/O > until at least > one path is recovered. Correct. This is just showing you the current configuration that the kernel is using. The kernel doesn't know anything about no_path_retry. That is something that multipathd uses internally. 
It just switches on and off the queue_if_no_path feature at the appropriate time. Currently, you have usable paths, and so the queue_if_no_path feature should obviously be on. Once you've lost all your paths, and they've stayed down past the no_path_retry limit, multipathd will remove this feature from the kernel device. When a path is then restored, it will add the feature back. > > How to verify for sure (without dropping the paths, at least at the moment) > > from the config? > > Any option with multipath and/or dmsetup commands? > > multipath show config -> find your device section, it will show the current > value for no_path_retry. > > Nir ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/communit
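The lifecycle described above can be sketched as a toy model (illustrative only, not multipathd's actual logic; time is counted in checker intervals):

```python
def queue_if_no_path_set(any_path_up, intervals_all_paths_down, no_path_retry):
    # with at least one usable path the feature stays (or is restored) on
    if any_path_up:
        return True
    # with all paths down, keep queueing only until no_path_retry checker
    # intervals have elapsed; after that multipathd clears the feature and
    # pending and new IO fails
    return intervals_all_paths_down < no_path_retry

print(queue_if_no_path_set(True, 0, 16))    # True: healthy, queueing on
print(queue_if_no_path_set(False, 10, 16))  # True: outage, still in budget
print(queue_if_no_path_set(False, 16, 16))  # False: past the limit
```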
Re: [ovirt-users] VM has been paused due to storage I/O problem
On Fri, Feb 03, 2017 at 12:31:49AM +0100, Gianluca Cecchi wrote: >On Thu, Feb 2, 2017 at 10:53 PM, Benjamin Marzinski ><[1]bmarz...@redhat.com> wrote: > > > > I'm trying to mitigate inserting a timeout for my SAN devices but > I'm not > > > sure of its effectiveness as CentOS 7 behavior of "multipathd -k" > and then > > > "show config" seems different from CentOS 6.x > > > In fact my attempt for multipath.conf is this > > There was a significant change in how multipath deals with merging > device configurations between RHEL6 and RHEL7. The short answer is, as > long as you copy the entire existing configuration, and just change what > you want changed (like you did), you can ignore the change. Also, > multipath doesn't care if you quote numbers. > > If you want to verify that no_path_retry is being set as intended, you > can run: > > # multipath -r -v3 | grep no_path_retry > >Hi Benjamin, >thank you very much for the explanations, especially the long one ;-) >I tried and confirmed that I have no_path_retry = 4 as expected >The regex matching is only for merge, correct? No. Both RHEL6 and RHEL7 use regex matching to determine which device configuration to use with your device, otherwise product "^1814" would never match any device, since there is no array with a literal product string of "^1814". RHEL7 also uses the same regex matching to determine which builtin device configuration a user-supplied device configuration should modify. RHEL6 uses string matching for this. >So in your example if in RH EL 7 I put this > device { > vendor "IBM" > product "^1814" > no_path_retry 12 > } >It would not match for merging, but it would match for applying to my >device (because it is put at the end of config read backwards). Correct. The confusing point is that in the merging case, "^1814" in the user-supplied configuration is being treated as a string that needs to regex match the regular expression "^1814" in the builtin configuration. These don't match.
For matching the device configuration to the device, "^1814" in the user-supplied configuration is being treated as a regular expression that needs to regex match the actual product string of the device. >And it would apply only the no_path_retry setting, while all other ones >would not be picked from builtin configuration for device, but from >defaults in general. >So for example it would set path_checker not this way: >path_checker "rdac" >but this way: >path_checker "directio" >that is default.. >correct? exactly. -Ben
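Both kinds of matching can be sketched with Python's re as a stand-in for multipath's regex matching (the device's real product string, "1814 FASt" here, is illustrative):

```python
import re

builtin_product = "^1814"      # regex in the builtin configuration
user_product    = "^1814"      # copied verbatim into multipath.conf
device_product  = "1814 FASt"  # hypothetical string reported by the array

# merge check (RHEL7): the user's value, taken as a plain string, must be
# matched by the builtin regex -- the string "^1814" starts with '^',
# not "1814", so this fails
merges = re.match(builtin_product, user_product) is not None
print(merges)   # False

# device assignment: the user's value, taken as a regex, is matched
# against the actual product string -- this succeeds
applies = re.match(user_product, device_product) is not None
print(applies)  # True
```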
Re: [ovirt-users] VM has been paused due to storage I/O problem
On Wed, Feb 01, 2017 at 09:39:45AM +0200, Nir Soffer wrote: > On Tue, Jan 31, 2017 at 6:09 PM, Gianluca Cecchi > wrote: > > On Tue, Jan 31, 2017 at 3:23 PM, Nathanaël Blanchet > > wrote: > >> > >> exactly the same issue by there with FC EMC domain storage... > >> > >> > > > > I'm trying to mitigate inserting a timeout for my SAN devices but I'm not > > sure of its effectiveness as CentOS 7 behavior of "multipathd -k" and then > > "show config" seems different from CentOS 6.x > > In fact my attempt for multipath.conf is this There was a significant change in how multipath deals with merging device configurations between RHEL6 and RHEL7. The short answer is, as long as you copy the entire existing configuration, and just change what you want changed (like you did), you can ignore the change. Also, multipath doesn't care if you quote numbers. If you want to verify that no_path_retry is being set as intended, you can run: # multipath -r -v3 | grep no_path_retry to reload your multipath devices with verbosity turned up. You should see lines like: Feb 02 09:38:30 | mpatha: no_path_retry = 12 (controller setting) That will tell you what no_path_retry is set to. The configuration Nir suggested at the end of this email looks good to me. Now, here's the long answer: multipath allows you to merge device configurations. This means that as long as you put in the "vendor" and "product" strings, you only need to set the other values that you care about. On RHEL6, this would work device { vendor "IBM" product "^1814" no_path_retry 12 } And it would create a configuration that was exactly the same as the builtin config for this device, except that no_path_retry was set to 12. However, this wasn't as easy for users as it was supposed to be. Specifically, users would often add their device's vendor and product information, as well as whatever they wanted changed, and then be surprised when multipath didn't retain all the information from the builtin configuration as advertised.
This is because they used the actual vendor and product strings for their device, but the builtin device configuration's vendor and product strings were regexes. In RHEL6, multipath only merged configurations if the vendor and product strings matched as literal strings. So users would try device { vendor "IBM" product "1814 FASt" no_path_retry 12 } and it wouldn't work as expected, since the product strings didn't match. To fix this, when RHEL7 checks if a user configuration should be merged with a builtin configuration, all that is required is that the user configuration's vendor and product strings regex match the builtin. This means that the above configuration will work as expected in RHEL7. However the first configuration won't because "^1814" doesn't regex match "^1814". This means that multipath would treat it as a completely new configuration, and not merge any values from the builtin configuration. You can reenable the RHEL6 behaviour in RHEL7 by setting hw_str_match yes in the defaults section. Now, because the builtin configurations could handle more than one device type per configuration, since they used regexes to match the vendor and product strings, multipath couldn't just remove the original builtin configuration when users added a new configuration that modified it. Otherwise, devices that regex matched the builtin configuration's vendor and product strings but not the user configuration's vendor and product strings wouldn't have any device configuration information. So multipath keeps the original builtin configuration as well as the new one. However, when it's time to assign a device configuration to a device, multipath looks through the device configurations list backwards, and finds the first match. This means that it will always use the user configuration instead of the builtin one (since new configurations get added to the end of the list).
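The backwards search can be modeled in a few lines (a toy model; the configs and product strings are illustrative and merging is ignored):

```python
import re

# device configurations in load order: builtins first, user entries appended
configs = [
    ("IBM", "^1814", {"path_checker": "rdac"}),   # builtin, kept in place
    ("IBM", "1814 FASt", {"no_path_retry": 12}),  # user-supplied
]

def config_for(vendor, product):
    # searched from the end, so the user entry shadows the builtin it
    # overlaps with, while other 1814 models still hit the builtin
    for v, p, cfg in reversed(configs):
        if re.match(v, vendor) and re.match(p, product):
            return cfg

print(config_for("IBM", "1814 FASt"))  # {'no_path_retry': 12}
print(config_for("IBM", "1814 DS4"))   # {'path_checker': 'rdac'}
```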
Like I said before, if you add all the values you want set in your configuration, instead of relying on them being merged from the builtin configuration, then you don't need to worry about any of this. -Ben > > > > # VDSM REVISION 1.3 > > # VDSM PRIVATE > > > > defaults { > > polling_interval 5 > > no_path_retry fail > > user_friendly_names no > > flush_on_last_del yes > > fast_io_fail_tmo 5 > > dev_loss_tmo 30 > > max_fds 4096 > > } > > > > # Remove devices entries when overrides section is available. > > devices { > > device { > > # These settings overrides built-in devices settings. It does not > > apply > > # to devices without built-in settings (these use the settings in > > the > > # "defaults" section), or to devices defined in the "devices" > > section. > > # Note: This is not available yet
Re: [ovirt-users] What recovers a VM from pause?
On Mon, May 30, 2016 at 10:09:25PM +0300, Nir Soffer wrote: > On Mon, May 30, 2016 at 4:07 PM, Nicolas Ecarnot wrote: > > Hello, > > > > We're planning a move from our old building towards a new one a few meters > > away. > > > > > > > > In a similar way of Martijn > > (https://www.mail-archive.com/users@ovirt.org/msg33182.html), I have > > maintenance planed on our storage side. > > > > Say an oVirt DC is using a SAN's LUN via iSCSI (Equallogic). > > This SAN allows me to setup block replication between two SANs, seen by > > oVirt as one (Dell is naming it SyncRep). > > Then switch all the iSCSI accesses to the replicated LUN. > > > > When doing this, the iSCSI stack of each oVirt host notices the > > de-connection, tries to reconnect, and succeeds. > > Amongst our hosts, this happens between 4 and 15 seconds. > > > > When this happens fast enough, oVirt engine and the VMs don't even notice, > > and they keep running happily. > > > > When this takes more than 4 seconds, there are 2 cases : > > > > 1 - The hosts and/or oVirt and/or the SPM (I actually don't know) notices > > that there is a storage failure, and pauses the VMs. > > When the iSCSI stack reconnects, the VMs are automatically recovered from > > pause, and this all takes less than 30 seconds. That is very acceptable for > > us, as this action is extremely rare. > > > > 2 - Same storage failure, VMs paused, and some VMs stay in pause mode > > forever. > > Manual "run" action is mandatory. > > When done, everything recovers correctly. > > This is also quite acceptable, but here come my questions : > > > > My questions : (!) > > - *WHAT* process or piece of code or what oVirt parts is responsible for > > deciding when to UN-pause a VM, and at what conditions? > > Vms get paused by qemu, when you get ENOSPC or some other IO error. 
> This probably happens when a vm is writing to storage, and all paths to > storage > are faulty - with current configuration, the scsi layer will fail > after 5 seconds, > and if no path is available, the write will fail. > > If vdsm storage monitoring system detected the issue, the storage domain > will become invalid. When the storage domain will become valid again, we > try to resume all vms paused because of IO errors. > > Storage monitoring is done every 10 seconds in normal conditions, but in > current release, there can be delays of up to couple of minutes in > extreme conditions, > for example, 50 storage domains and doing lot of io. So basically, the > storage domain > monitor may miss an error on storage, never become invalid, and would > never become valid again and the vm will have to be resumed manually. > See https://bugzilla.redhat.com/1081962 > > In ovirt 4.0 monitoring should be improved, and will always monitor > storage every > 10 seconds, but even this cannot guarantee that we will detect all > storage errors > For example, if the storage outage is shorter then 10 seconds. But I > guess that chance > that storage outage was shorter then 10 seconds, but long enough to cause a vm > to pause is very low. > > > That would help me to understand why some cases are working even more > > smoothly than others. > > - Are there related timeouts I could play with in engine-config options? > > Nothing on the engine side... > > > - [a bit off-topic] Is it safe to increase some iSCSI timeouts of > > buffer-sizes in the hope this kind of disconnection would get un-noticed? > > But you may modify multipath configuration on the host. 
> > We use now this multipath configuration (/etc/multipath.conf): > > # VDSM REVISION 1.3 > > defaults { > polling_interval 5 > no_path_retry fail > user_friendly_names no > flush_on_last_del yes > fast_io_fail_tmo 5 > dev_loss_tmo 30 > max_fds 4096 > deferred_remove yes > } > > devices { > device { > all_devs yes > no_path_retry fail > } > } > > This enforces failing of io requests on devices that by default will queue such > requests for long or unlimited time. Queuing requests is very bad for vdsm, > and > causes various commands to block for minutes during storage outage, > failing various > flows in vdsm and the ui. > See https://bugzilla.redhat.com/880738 > > However, in your case, using queuing may be the best way to do the switch > from one storage to another in the smoothest way. > > You may try this setting: > > devices { > device { > all_devs yes > no_path_retry 30 > } > } > > This will queue io requests for 30 seconds before failing. > Using this normally would be a bad idea with vdsm, since during storage > outage, > vdsm may block for 30 seconds when no path is available, and is not designed > for this behavior, but blocking from time to time for short time should be ok. > > I think that modifying the configuration and reloadin
Re: [ovirt-users] Update to 3.5.1 scrambled multipath.conf?
On Mon, Jan 26, 2015 at 10:27:23AM -0500, Fabian Deutsch wrote: > > > - Original Message - > > - Original Message - > > > From: "Dan Kenigsberg" > > > To: "Gianluca Cecchi" , nsof...@redhat.com > > > Cc: "users" , ykap...@redhat.com > > > Sent: Monday, January 26, 2015 2:09:23 PM > > > Subject: Re: [ovirt-users] Update to 3.5.1 scrambled multipath.conf? > > > > > > On Sat, Jan 24, 2015 at 12:59:01AM +0100, Gianluca Cecchi wrote: > > > > Hello, > > > > on my all-in-one installation @home I had 3.5.0 with F20. > > > > Today I updated to 3.5.1. > > > > > > > > it seems it modified /etc/multipath.conf preventing me from using my > > > > second > > > > disk at all... > > > > > > > > My system has internal ssd disk (sda) for OS and one local storage > > > > domain > > > > and another disk (sdb) with some partitions (on one of them there is > > > > also > > > > another local storage domain). > > > > > > > > At reboot I was put in emergency boot because partitions at sdb disk > > > > could > > > > not be mounted (they were busy). > > > > it took me some time to understand that the problem was due to sdb gone > > > > managed as multipath device and so busy for partitions to be mounted. 
> > > > > > > > Here you can find how multipath became after update and reboot > > > > https://drive.google.com/file/d/0BwoPbcrMv8mvS0FkMnNyMTdVTms/view?usp=sharing > > > > > > > > No device-mapper-multipath update in yum.log > > > > > > > > Also it seems that after changing it, it was then reverted at boot again > > > > (I > > > > don't know if the responsible was initrd/dracut or vdsmd) so in the mean > > > > time the only thing I could do was to make the file immutable with > > > > > > > > chattr +i /etc/multipath.conf > > > > > > The "supported" method of achieving this is to place "# RHEV PRIVATE" in > > > the second line of your hand-modified multipath.conf > > > > > > I do not understand why this has happened only after upgrade to 3.5.1 - > > > 3.5.0's should have reverted you multipath.conf just as well during each > > > vdsm startup. > > > > > > The good thing is that this annoying behavior has been dropped from the > > > master branch, so that 3.6 is not going to have it. Vdsm is not to mess > > > with other services config file while it is running. The logic moved to > > > `vdsm-tool configure` > > > > > > > > > > > and so I was able to reboot and verify that my partitions on sdb were ok > > > > and I was able to mount them (for safe I also ran an fsck against them) > > > > > > > > Update ran around 19:20 and finished at 19:34 > > > > here the log in gzip format > > > > https://drive.google.com/file/d/0BwoPbcrMv8mvWjJDTXU1YjRWOFk/view?usp=sharing > > > > > > > > Reboot was done around 21:10-21:14 > > > > > > > > Here my /var/log/messages in gzip format, where you can see latest days. > > > > https://drive.google.com/file/d/0BwoPbcrMv8mvMm1ldXljd3hZWnM/view?usp=sharing > > > > > > > > > > > > Any suggestion appreciated. 
> > > > > > > > Current multipath.conf (where I also commented out the getuid_callout > > > > that > > > > is not used anymore): > > > > > > > > [root@tekkaman setup]# cat /etc/multipath.conf > > > > # RHEV REVISION 1.1 > > > > > > > > blacklist { > > > > devnode "^(sda|sdb)[0-9]*" > > > > } > > > I think what happened is: > > 1. 3.5.1 had new multipath version > > 2. So vdsm upgraded the local file > > 3. blacklist above was removed > >(it should exist in /etc/multipath.bak) > > > > To prevent local changes, you have to mark the file as private > > as Dan suggests. > > > > Seems to be related to the find_multipaths = "yes" bug: > > https://bugzilla.redhat.com/show_bug.cgi?id=1173290 > > The symptoms above sound exactly like this issue. > When find_multipaths is no (the default when the directive is not present) > what happens is that all non-blacklisted devices are tried to get claimed, and this > happened above. > > Blacklisting the devices works, or adding "find_multipaths yes" should also > work, because > in that case only devices which have more than one path (or are explicitly > named) will be > claimed by multipath. I would like to point out one issue. Once a device is claimed (even if find_multipaths wasn't set when it was claimed) it will get added to /etc/multipath/wwids. This means that if you have previously claimed a single path device, adding "find_multipaths yes" won't stop that device from being claimed in the future (since it is in the wwids file). You would need to either run: # multipath -w to remove the device's wwid from the wwids file, or run # multipath -W to reset the wwids file to only include the wwids of the current multipath devices (Obviously, you need to remove any devices that you don't want multipathed before you run this). > > My 2ct. > > - fabian > > > Ben, can you confirm that this is the same issue? Yeah, I think so. -Ben > > > > > > > > > > defaults { > > > > polling_interval 5 > > > >
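Putting Fabian's and Ben's points together, a sketch of a local multipath.conf for this all-in-one box (assuming the 3.5-era "# RHEV PRIVATE" marker mentioned earlier in the thread to keep vdsm from rewriting the file):

```
# RHEV REVISION 1.1
# RHEV PRIVATE
blacklist {
    devnode "^(sda|sdb)[0-9]*"
}
defaults {
    # only claim devices with more than one path (or explicitly named ones);
    # note Ben's caveat: a wwid already recorded in /etc/multipath/wwids is
    # still claimed until removed with multipath -w or reset with multipath -W
    find_multipaths yes
}
```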
Re: [Users] default mutipath.conf config for fedora 18 invalid
On Fri, Jan 25, 2013 at 10:53:51PM +0200, Dan Kenigsberg wrote: > On Thu, Jan 24, 2013 at 10:44:48AM -0500, Yeela Kaplan wrote: > > Hi, > > I've tested the new patch on fedora 18 vdsm host (created iscsi storage > > domain, attached, activated) and it works well. > > Even though multipath.conf no longer uses getuid_callout to recognize the > > device's wwid, > > it still knows how to deal with the attribute's existence in the conf file > > when running multipath command (only output is to stdout which we don't use > > anyway, stderr empty and rc=0). > > The relevant patch is: http://gerrit.ovirt.org/#/c/10824/ > > Given your verification, and the fact that this patch is a step forward, > I've taken it into vdsm master and acked it for ovirt-3.2. I trust Ben > Marzinski to shout at us loudly if keeping the outdated verb is terribly > wrong. There's no harm at all in using invalid keywords in multipath.conf. It just prints a warning message. > > I'd expect to see a future patch, adding getuid_callout only for > multipath versions that actually need it.