Hi Nir,

BMC is the baseboard management controller; in my case that is iLO. Yes, I set up power management for all hosts, and oVirt reports the iLO status as OK. To simulate the failure I used a remote PDU to shut off the host's power port, and after that I get what is shown in the picture I attached. Once I switch the power port back on, oVirt can read the iLO status again, sees that Linux is down, and immediately switches the SPM to another host.
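
A minimal Python sketch of the failover rule as it is described in this thread, in case it helps to reason about the PDU test. This is not oVirt engine code: `SpmHost`, `can_move_spm` and all field names are hypothetical, and treating "BMC reachable" as "fencing succeeds" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class SpmHost:
    """Hypothetical model of what the engine knows about the current SPM host."""
    responsive: bool                  # engine can still talk to the host
    power_mgmt_configured: bool       # a fence agent (e.g. ilo4) is set up
    bmc_reachable: bool               # BMC/iLO answers; False if the PDU cut all power
    manually_confirmed_rebooted: bool = False  # admin confirmed the host was rebooted


def can_move_spm(host: SpmHost) -> bool:
    """Return True if SPM may be started on another host (simplified rule)."""
    if host.responsive:
        return False  # current SPM is healthy, nothing to move
    if host.manually_confirmed_rebooted:
        return True   # manual override replaces the fence command
    # Assumption: fencing works only with a configured agent and a BMC that answers.
    return host.power_mgmt_configured and host.bmc_reachable


# PDU port off: the host and its iLO are both dark, so fencing cannot be verified.
pdu_off = SpmHost(responsive=False, power_mgmt_configured=True, bmc_reachable=False)
# PDU port back on: iLO answers again while Linux is still down.
pdu_on = SpmHost(responsive=False, power_mgmt_configured=True, bmc_reachable=True)

print(can_move_spm(pdu_off), can_move_spm(pdu_on))
```

Under these assumptions the first case stays stuck until the iLO comes back or someone confirms the reboot by hand, which matches what I saw.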

On Mon, Apr 17, 2017 at 6:07 AM Nir Soffer <[email protected]> wrote:

> On Mon, Apr 17, 2017 at 8:24 AM Konstantin Raskoshnyi <[email protected]> wrote:
>
>> But actually, it didn't work well. After the main SPM host went down I see this
>>
> [image: Screen Shot 2017-04-16 at 10.22.00 PM.png]
>>
>> 2017-04-17 05:23:15,554Z ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM Init: could not find reported vds or not up - pool: 'STG' vds_spm_id: '1'
>> 2017-04-17 05:23:15,567Z INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] SPM selection - vds seems as spm 'tank5'
>> 2017-04-17 05:23:15,567Z WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler5) [4dcc033d-26bf-49bb-bfaa-03a970dbbec1] spm vds is non responsive, stopping spm selection.
>>
>> So that means it's possible to automatically switch the SPM host
>> only if the BMC is up?
>
> BMC?
>
> If your SPM is non-responsive, the system will try to fence it. Did you
> configure power management for all hosts? Did you check that it
> works? How did you simulate a non-responsive host?
>
> If power management is not configured or fails, the system cannot
> move the SPM to another host, unless you manually confirm that the
> SPM host was rebooted.
>
> Nir
>
>> Thanks
>>
>> On Sun, Apr 16, 2017 at 8:29 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>
>>> Oh, the fence agent works fine if I select ilo4,
>>> Thank you for your help!
>>>
>>> On Sun, Apr 16, 2017 at 8:22 PM Dan Yasny <[email protected]> wrote:
>>>
>>>> On Sun, Apr 16, 2017 at 11:19 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>>
>>>>> Makes sense.
>>>>> I was trying to set it up, but it doesn't work with our staging hardware.
>>>>> We have old ilo100, I'll try again.
>>>>> Thanks!
>>>>>
>>>> It is absolutely necessary for any HA to work properly. There's of
>>>> course the "confirm host has been shutdown" option, which serves as an
>>>> override for the fence command, but it's manual
>>>>
>>>>> On Sun, Apr 16, 2017 at 8:18 PM Dan Yasny <[email protected]> wrote:
>>>>>
>>>>>> On Sun, Apr 16, 2017 at 11:15 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>>>>
>>>>>>> Fence agent under each node?
>>>>>>
>>>>>> When you configure a host, there's the power management tab, where
>>>>>> you need to enter the BMC details for the host. If you don't have
>>>>>> fencing enabled, how do you expect the system to make sure a host
>>>>>> running a service is actually down (and it is safe to start HA
>>>>>> services elsewhere), and not, for example, just unreachable by the
>>>>>> engine? How do you avoid a split brain -> SBA ?
>>>>>>
>>>>>>> On Sun, Apr 16, 2017 at 8:14 PM Dan Yasny <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Sun, Apr 16, 2017 at 11:13 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> "Corner cases"?
>>>>>>>>> I tried to simulate a crash of the SPM server and oVirt kept
>>>>>>>>> trying to reestablish the connection to the failed node.
>>>>>>>>
>>>>>>>> Did you configure fencing?
>>>>>>>>
>>>>>>>>> On Sun, Apr 16, 2017 at 8:10 PM Dan Yasny <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Apr 16, 2017 at 7:29 AM, Nir Soffer <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 16, 2017 at 2:05 PM Dan Yasny <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Apr 16, 2017 7:01 AM, "Nir Soffer" <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Apr 16, 2017 at 4:17 AM Dan Yasny <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> When you set up a storage domain, you need to specify a host
>>>>>>>>>>>>> to perform the initial storage operations, but once the SD is
>>>>>>>>>>>>> defined, its details are in the engine database, and all the
>>>>>>>>>>>>> hosts get connected to it directly. If the first host you used
>>>>>>>>>>>>> to define the SD goes down, all other hosts will still remain
>>>>>>>>>>>>> connected and work. SPM is an HA service, and if the current
>>>>>>>>>>>>> SPM host goes down, SPM gets started on another host in the DC.
>>>>>>>>>>>>> In short, unless your actual NFS exporting host goes down,
>>>>>>>>>>>>> there is no outage.
>>>>>>>>>>>>
>>>>>>>>>>>> There is no storage outage, but if you shut down the SPM host,
>>>>>>>>>>>> the SPM role will not move to a new host until the SPM host is
>>>>>>>>>>>> online again, or you confirm manually that the SPM host was
>>>>>>>>>>>> rebooted.
>>>>>>>>>>>>
>>>>>>>>>>>> In a properly configured setup the SBA should take care of
>>>>>>>>>>>> that. That's the whole point of HA services
>>>>>>>>>>>
>>>>>>>>>>> In some cases like power loss or hardware failure, there is no
>>>>>>>>>>> way to start the SPM host, and the system cannot recover
>>>>>>>>>>> automatically.
>>>>>>>>>>>
>>>>>>>>>> There are always corner cases, no doubt. But in a normal
>>>>>>>>>> situation, where an SPM host goes down because of a hardware
>>>>>>>>>> failure, it gets fenced, other hosts contend for SPM and start it.
>>>>>>>>>> No surprises there.
>>>>>>>>>>
>>>>>>>>>>> Nir
>>>>>>>>>>>
>>>>>>>>>>>> Nir
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 1:53 PM, Konstantin Raskoshnyi <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Fernando,
>>>>>>>>>>>>>> I see each host has a direct NFS mount, but yes, if the
>>>>>>>>>>>>>> main host through which I connected the NFS storage goes down,
>>>>>>>>>>>>>> the storage becomes unavailable and all VMs go down
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 10:37 AM FERNANDO FREDIANI <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Konstantin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It doesn't make much sense to make a whole cluster depend on
>>>>>>>>>>>>>>> a single host. From what I know, every host talks directly to
>>>>>>>>>>>>>>> the NFS storage array or whatever other shared storage you have.
>>>>>>>>>>>>>>> Have you tested whether that host going down affects the
>>>>>>>>>>>>>>> others, with NFS mounted directly from an NFS storage array?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fernando
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-04-15 12:42 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In oVirt you have to attach storage through a specific host.
>>>>>>>>>>>>>>>> If that host goes down, storage is not available.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Apr 15, 2017 at 7:31 AM FERNANDO FREDIANI <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, make it not go through host1: dedicate a storage
>>>>>>>>>>>>>>>>> server to running NFS and make both hosts connect to it.
>>>>>>>>>>>>>>>>> In my view NFS is much easier to manage than any other
>>>>>>>>>>>>>>>>> type of storage, especially FC and iSCSI, and performance is
>>>>>>>>>>>>>>>>> pretty much the same, so you won't get better results, other
>>>>>>>>>>>>>>>>> than management, going to another type.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Fernando
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2017-04-15 5:25 GMT-03:00 Konstantin Raskoshnyi <[email protected]>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>> I have one NFS storage,
>>>>>>>>>>>>>>>>>> it's connected through host1.
>>>>>>>>>>>>>>>>>> host2 also has access to it, I can easily migrate
>>>>>>>>>>>>>>>>>> VMs between them.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The question is - if host1 is down - all infrastructure
>>>>>>>>>>>>>>>>>> is down, since all traffic goes through host1,
>>>>>>>>>>>>>>>>>> is there any way in oVirt to use redundant storage?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Only glusterfs?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users

