[ovirt-users] Re: better understand ovirt-engine functions

2018-08-07 Thread stuartk
For future readers, here is supporting material:

An admin's view of the feature:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-Cluster_Tasks.html

A developer’s view of the feature:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/

A discussion of how this works in practice (similar failure scenario):
http://users.ovirt.narkive.com/XWYhDO6R/ovirt-users-strange-fencing-behaviour-3-5-3
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AXTTCEGNLL5BGTOH73JVDY53NTAUBFUI/


[ovirt-users] Re: better understand ovirt-engine functions

2018-08-03 Thread stuartk
I have been reading about fencing:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/
https://www.slideshare.net/MartinPeina/host-fencing-in-ovirt-fixing-the-unknown-and-allowing-vms-to-be-highly-available

Looking at Edit Cluster, I see:
X Enable Fencing
 Skip fencing if host has live lease on storage
 Skip fencing on cluster connectivity issues 
Threshold 50
i.e. Fencing is enabled and neither of the two 'Skip' options is checked

Looking at Edit Host... Power Management, I see:
X Enable Power Management
X Kdump integration
Primary
[ ... and the remaining fields are populated with our ILO address & credential 
info ...]

OK, I get it now.  Here's the story:
- ovirt-engine runs on Cluster A in Data Center A
- When ovirt-engine is unable to reach Cluster B in Data Center B, given enough 
disruption, Fencing will kick in and try to mitigate the problem using various 
techniques, including (eventually) power cycling via the ILO

One path forward for me is to check:
 X Skip fencing on cluster connectivity issues 
Threshold 50
And tweak the Threshold to be more suitable for my cluster:
 X Skip fencing on cluster connectivity issues 
Threshold 2
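
For future readers, a minimal sketch of how the 'Skip fencing on cluster connectivity issues' option can be modeled. It assumes the option means "skip fencing when the percentage of hosts in the cluster with connectivity issues is at or above the Threshold". The function name and the exact comparison are illustrative assumptions, not engine code.

```python
def skip_fencing(hosts_with_issues: int, total_hosts: int, threshold_pct: int) -> bool:
    """Return True when fencing would be skipped under the modeled policy.

    Assumption: fencing is skipped when the share of hosts in the cluster
    that have connectivity issues is at or above threshold_pct.
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    affected_pct = 100 * hosts_with_issues / total_hosts
    return affected_pct >= threshold_pct


# Cluster B has (3) Hosts. If the engine loses contact with all of them,
# 100% are affected, so a Threshold of 50 would skip fencing.
print(skip_fencing(3, 3, 50))
# If only one of the three is unreachable (about 33%), fencing proceeds.
print(skip_fencing(1, 3, 50))
```

Under this model, a lower Threshold makes the engine skip fencing more readily, which is the direction wanted when the whole remote cluster becomes unreachable at once.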

OK, I have a plausible model for understanding what has been happening.

Thank you for your assistance.

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GP4KCCM3CS3BW44U6UQQXJM3SPIUVG63/


[ovirt-users] Re: better understand ovirt-engine functions

2018-08-02 Thread Gianluca Cecchi
On Thu, Aug 2, 2018 at 1:47 PM,  wrote:

> OK, I've spent time capturing traffic from the Hosts in Cluster B back to
> Data Center A.  I don't believe most of the traffic matters:  syslog, snmp,
> icmp, influxd (grafana), ssh, cfengine
>
> After filtering out all that, I'm left with TCP 54321 -- netstat tells me
> that the Python interpreter owns this port -- I'm guessing that this daemon
> is talking with ovirt-engine down in Data Center A.
>
> No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 nor
> 49152+  (that's what I expected to see ... some sort of storage dependency
> between the two)
>
> So I'm back to wondering what happens when the conversation between
> ovirt-engine and KVM instances is disrupted?  Does it sound plausible that
> bad things happen?  Or would you say that this seems unlikely ... that
> management functions may be disrupted, but operational functions would be
> unaffected?
>
> --sk
>

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/installation_guide/networking-requirements#host-firewall-requirements_RHV_install

Port 54321 is used by both the Manager and the hosts to communicate with
one another:

VDSM communications with the Manager and other virtualization hosts.
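
One way to check whether this port is reachable from a given machine is a plain TCP connect. This is a rough sketch: the function name and the default timeout are illustrative choices, and a successful connect only shows that something is listening on the port, not that the TLS conversation with VDSM works.

```python
import socket


def port_reachable(host: str, port: int = 54321, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running this from the engine's site against each host in the remote cluster during a network change would show whether the engine-to-VDSM path is among the things being disrupted.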

I think that if the engine in site A is not able to connect to vdsmd on a
host in site B (did this happen? You only talk about disruption inside site
A, but the kind of disruption is not clear...), it should mark the host as
not responsive and eventually fence it, so that the host releases the VM
resources (if VMs are running on it) and the storage (if it is the SPM) it
is holding, and the VMs can start on other hosts.
But if all cluster B hosts become unresponsive from the engine's point of
view, I don't know what the default action would be: perhaps freeze
everything until something comes back?

Did you configure fencing in your clusters? If so, when you stop
communication inside site A, could it affect your fencing configuration
towards hosts in site B?

Gianluca
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RH77HTNBGUFLORGIQ43ONBA43R6RCP7F/


[ovirt-users] Re: better understand ovirt-engine functions

2018-08-02 Thread stuartk
OK, I've spent time capturing traffic from the Hosts in Cluster B back to Data 
Center A.  I don't believe most of the traffic matters:  syslog, snmp, icmp, 
influxd (grafana), ssh, cfengine

After filtering out all that, I'm left with TCP 54321 -- netstat tells me that 
the Python interpreter owns this port -- I'm guessing that this daemon is 
talking with ovirt-engine down in Data Center A.

No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 nor 49152+  
(that's what I expected to see ... some sort of storage dependency between the 
two)
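
The port-based sorting described above can be sketched as a small classifier. The port numbers are the ones named in this message; the function name and the labels are illustrative assumptions. Note that 54321 itself lies inside the 49152+ range, and so do ordinary ephemeral client ports, so a match on that range is only a hint of gluster brick traffic.

```python
def classify_port(port: int) -> str:
    """Map a TCP/UDP port to a coarse traffic label (labels are illustrative)."""
    if port == 54321:
        # Checked first, because 54321 also falls inside the 49152+ range.
        return "vdsm"
    if port in (24007, 24008):
        return "gluster-management"
    if port >= 49152:
        # Possible gluster brick traffic, but also the usual ephemeral range.
        return "gluster-brick-range"
    return "other"


for p in (54321, 24007, 49152, 22):
    print(p, classify_port(p))
```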

So I'm back to wondering what happens when the conversation between 
ovirt-engine and KVM instances is disrupted?  Does it sound plausible that bad 
things happen?  Or would you say that this seems unlikely ... that management 
functions may be disrupted, but operational functions would be unaffected?

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICED3LWK4UACXZJLGMGKHFR4BLXHV5W3/


[ovirt-users] Re: better understand ovirt-engine functions

2018-07-31 Thread stuartk
What do I mean by 'Hosts in Cluster B crashed':

I have (3) events (during each of which I tweaked the Data Center A 
network) in which I accumulated symptoms:
Event 1:  A handful of the several dozen VMs in Cluster B Paused due to Storage 
Issues.  Restarted the VMs to restore service.
Event 2:  Same
Event 3:  One of the (3) Hosts in Cluster B rebooted (that's what I mean by 
'crash').  gluster was unhappy also (two of the three Hosts also function as 
Gluster Bricks) ... but that could be a byproduct.

At the start of Event 3, I put hosted-engine into global maintenance mode:
hosted-engine --set-maintenance --mode=global
Why?  Because I was imagining that hosted-engine might perform some sort of 
connectivity checks with its local IP gateway ... and if it couldn't reach it, 
then emit some sort of 'shutdown' commands to *all* the KVM hosts it knows 
about (yes, I'm waving my hands a lot right here ... ergo my interest in 
reading about what kind of checks ovirt-engine performs and what kind of 
remedial action it might take based on the results of those checks).


You are suggesting that Cluster B depends, storage-wise, on Cluster A (or, more 
precisely, on Storage located at Cluster A's site).  That's where my thoughts 
turned immediately ... but thus far, I don't see it in the pcaps I've been gathering 
-- lots of ovirt-engine traffic, but nothing else.  More poking needed.

ovirt 3.5
glusterfs 3.7.6

I want to do more homework, to demonstrate that Cluster B has no storage 
dependency on Data Center A.

But back to my original question:  where might I go to better understand what 
kind of checks ovirt-engine performs on KVM hosts and what kind of remedial 
action it might take, based on the results of those checks?

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TOAV667Y3NOYPDKAIC3LL7PLT3TPP2FM/


[ovirt-users] Re: better understand ovirt-engine functions

2018-07-24 Thread Simone Tiraboschi
On Mon, Jul 23, 2018 at 7:48 PM  wrote:

> I've been reading through documentation
> https://www.ovirt.org/documentation/architecture/architecture/
> https://www.ovirt.org/documentation/self-hosted/Self-Hosted_Engine_Guide/
>
> But am struggling still to understand the role ovirt-engine plays.  Would
> anyone have recommendations for additional reading?
>
> The problem I'm tackling currently looks like this:
> - We have (2) oVirt Data Centers, each populated by a single Cluster.  The
> Data Centers are physically & network-wise 'distant' from one another
> - hosted-engine runs on (3) of the (4) Hosts in Data Center A / Cluster
> A.  hosted-engine does not run on Data Center B / Cluster B
> - When we disrupt network connectivity around Cluster A (yes, that's
> Cluster *A*), Hosts in Cluster B crash (requiring a power cycle) and Guests
> in Cluster B get stopped and paused
>
> I'm struggling to understand why mussing with Cluster A affects Cluster
> B.  From pcaps, I can see plenty of TLS traffic from Cluster A's Hosts --
> presumably from ovirt-engine running on Cluster A -- exchanged with Cluster
> B.  So, during my last maintenance window, I put hosted-engine into
> maintenance mode ... but Hosts/VMs in Cluster B were still affected.
>

The engine is the brain of your system: it's a kind of orchestrator that
starts different tasks on your hosts.
You need an engine for new tasks, but existing tasks such as running a VM
should not be affected at all by the lack of an engine.

I fear that your issue is somewhere else.
What exactly do you mean by "Hosts in Cluster B crash"? Are you sure
that your hosts in data center B are not consuming storage exposed by hosts
in data center A?


>
> Where do I go to better understand what ovirt-engine does when it is
> 'managing' Hosts & VMs?
>
> --sk
> ___
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VQKE5JRT3DC5HQVZXP72FVC75RNAOAXM/
>
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5LH4NHBO6OEBMH3BCYSQXEEMRAQGL3N2/