[ovirt-users] qemu-kvm memory leak in 4.2.5

2018-09-02 Thread Hesham Ahmed
Starting with oVirt 4.2.4 (also in 4.2.5 and maybe in 4.2.3) I am facing
some sort of memory leak. Memory usage on the hosts keeps increasing
until it reaches somewhere around 97%. Putting the host into maintenance
and back resolves it. The memory usage of the qemu-kvm processes is way
above the defined VM memory; for instance, below is the memory usage of
a VM:

   PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 12271 qemu  20   0   35.4g  30.9g   8144 S  13.3 49.3  9433:15 qemu-kvm

The VM memory settings are:

Defined Memory: 8192 MB
Physical Memory Guaranteed: 5461 MB

This is a 3-node hyperconverged cluster running the latest oVirt Node 4.2.5.
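
A quick way to compare each running VM's resident set size against what
libvirt reports, run on an affected host (a minimal sketch, not from the
original report; assumes working virsh access, which on oVirt hosts may
first need a SASL user):

    # List per-VM RSS as seen by libvirt, in MiB (read-only connection):
    for dom in $(virsh -r list --name); do
        rss_kb=$(virsh -r dommemstat "$dom" | awk '/^rss/ {print $2}')
        printf '%-30s %8d MiB\n' "$dom" "$((rss_kb / 1024))"
    done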


[ovirt-users] Re: Managing multiple oVirt installs?

2018-09-02 Thread Gianluca Cecchi
On Mon, Sep 3, 2018 at 7:22 AM Maton, Brett wrote:

> Good question, I'm interested in the solution.
>
> On 3 September 2018 at 01:39, femi adegoke wrote:
>
>> Let's say you have multiple oVirt installs.
>>
>> How can they all be "managed" by using a single engine web UI (so I don't
>> have to login 5 different times)?
>>
>>
Could you detail what you mean by "multiple oVirt installs"?
You can use a single engine and manage more than one of the so-called
"Datacenters".

Or, if for some reason you have to manage different engine installations,
you can look at ManageIQ for some kinds of operations:
http://manageiq.org/


[ovirt-users] the issue of starting a VM

2018-09-02 Thread hhz711
hello:

I created a VM on node1, but when I power on the VM and try to boot from the
CD-ROM with an ISO, it gets stuck.
The boot window only shows the following:
 "SeaBIOS (version xx)
 Machine UUID  "
Have you guys seen this before?


[ovirt-users] Re: Managing multiple oVirt installs?

2018-09-02 Thread Maton, Brett
Good question, I'm interested in the solution.

On 3 September 2018 at 01:39, femi adegoke wrote:

> Let's say you have multiple oVirt installs.
>
> How can they all be "managed" by using a single engine web UI (so I don't
> have to login 5 different times)?


[ovirt-users] Re: Gluster clients intermittently hang until first gluster server in a Replica 1 Arbiter 1 cluster is rebooted, server error: 0-management: Unlocking failed & client error: bailing out

2018-09-02 Thread Sam McLeod
Sorry, please ignore, incorrect mailing list (doh!)

--
Sam McLeod (protoporpoise on IRC)
https://twitter.com/s_mcleod
https://smcleod.net

Words are my own opinions and do not necessarily represent those of my
employer or partners.


[ovirt-users] Gluster clients intermittently hang until first gluster server in a Replica 1 Arbiter 1 cluster is rebooted, server error: 0-management: Unlocking failed & client error: bailing out fra

2018-09-02 Thread Sam McLeod
We've got an odd problem where clients are blocked from writing to Gluster 
volumes until the first node of the Gluster cluster is rebooted.

I suspect I've either configured something incorrectly with the arbiter / 
replica configuration of the volumes, or there is some sort of bug in the 
gluster client-server connection that we're triggering.

I was wondering if anyone has seen this or could point me in the right 
direction?


Environment:
Topology: 3 node cluster, replica 2, arbiter 1 (third node is metadata only).
Version: Client and Servers both running 4.1.3, both on CentOS 7, kernel 
4.18.x, (Xen) VMs with relatively fast networked SSD storage backing them, XFS.
Client: Native Gluster FUSE client mounting via the kubernetes provider

Problem:
Seemingly at random, some clients are blocked / unable to write to what
should be a highly available gluster volume.
The client gluster logs show it failing to do new file operations across 
various volumes and all three nodes of the gluster.
The server gluster (or OS) logs do not show any warnings or errors.
The client recovers and is able to write to volumes again after the first node 
of the gluster cluster is rebooted.
Until the first node of the gluster cluster is rebooted, the client fails to 
write to the volume that is (or should be) available on the second node (a 
replica) and third node (an arbiter only node).

What 'fixes' the issue:
Although the clients (kubernetes hosts) connect to all 3 nodes of the Gluster 
cluster - restarting the first gluster node always unblocks the IO and allows 
the client to continue writing.
Stopping and starting the glusterd service on the gluster server is not enough 
to fix the issue, nor is restarting its networking.
This suggests to me that the volume is unavailable for writing for some reason
and that restarting the first node in the cluster clears some sort of stale
TCP session, either between client and server or between the replicating
servers.
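
One way to test that stale-session theory without a full reboot (a sketch,
not something tried in this thread; 49154 is the brick port that appears in
the logs below):

    # On a blocked client, list established connections to the brick port,
    # with timer details; a dead-but-established session would fit the theory:
    ss -tno state established '( dport = :49154 )'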

Expected behaviour:

If the first gluster node / server had failed or was blocked from performing
operations for some reason (which it doesn't seem it is), I'd expect the
clients to access data from the second gluster node and write metadata to the
third gluster node as well, since it's an arbiter / metadata-only node.
If for some reason a gluster node was not able to serve connections to
clients, I'd expect to see errors in the volume, glusterd, or brick log files
(there are none on the first gluster node).
If the first gluster node was for some reason blocking IO on a volume, I'd 
expect that node either to show as unhealthy or unavailable in the gluster peer 
status or gluster volume status.
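
For reference, the health checks implied above, all of which looked clean in
this case (a sketch; staging_static is the volume named below):

    gluster peer status
    gluster volume status staging_static
    gluster volume heal staging_static info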


Client gluster errors:

staging_static in this example is a volume name.
You can see the client trying to connect to the second and third nodes of the
gluster cluster and failing (unsure as to why).
The server side logs on the first gluster node do not show any errors or 
problems, but the second / third node show errors in the glusterd.log when 
trying to 'unlock' the 0-management volume on the first node.


On a gluster client (a kubernetes host using the kubernetes connector which
uses the native FUSE client), when it's blocked from writing but the gluster
appears healthy (other than the errors mentioned later):

[2018-09-02 15:33:22.750874] E [rpc-clnt.c:184:call_bail] 
0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0x1cce sent = 2018-09-02 15:03:22.417773. timeout = 1800 
for <third gluster node>:49154
[2018-09-02 15:33:22.750989] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: 
remote operation failed [Transport endpoint is not connected]
[2018-09-02 16:03:23.097905] E [rpc-clnt.c:184:call_bail] 
0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0x2e21 sent = 2018-09-02 15:33:22.765751. timeout = 1800 
for <second gluster node>:49154
[2018-09-02 16:03:23.097988] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: 
remote operation failed [Transport endpoint is not connected]
[2018-09-02 16:33:23.439172] E [rpc-clnt.c:184:call_bail] 
0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0x1d4b sent = 2018-09-02 16:03:23.098133. timeout = 1800 
for <third gluster node>:49154
[2018-09-02 16:33:23.439282] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: 
remote operation failed [Transport endpoint is not connected]
[2018-09-02 17:03:23.786858] E [rpc-clnt.c:184:call_bail] 
0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) 
op(INODELK(29)) xid = 0x2ee7 sent = 2018-09-02 16:33:23.455171. timeout = 1800 
for <second gluster node>:49154
[2018-09-02 17:03:23.786971] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: 
remote operation failed [Transport endpoint is not connected]
[2018-09-02 17:33:24.160607] E [rpc-clnt.c:184:call_bail]
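
If the INODELK bail-outs above come from a lock that was granted via the
first node and never released, one way to dig further without a reboot might
be (a sketch, not verified against this exact failure; run on a gluster
server):

    # Dump brick state to /var/run/gluster/ and look for stuck inode locks:
    gluster volume statedump staging_static
    grep -B2 -A4 'inodelk' /var/run/gluster/*.dump.*

    # Hypothetically, a stale granted lock could then be cleared without a
    # reboot, e.g. on the volume root:
    gluster volume clear-locks staging_static / kind granted inode 0,0-0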

[ovirt-users] Managing multiple oVirt installs?

2018-09-02 Thread femi adegoke
Let's say you have multiple oVirt installs.

How can they all be "managed" by using a single engine web UI (so I don't have
to log in 5 different times)?


[ovirt-users] Re: Upgraded host, engine now won't boot

2018-09-02 Thread Darrell Budic
It’s definitely not starting; you’ll have to see if you can figure out why. A
couple of things to try:

- Check "virsh list" and see if it’s running, or paused for storage. (google 
"virsh saslpasswd2 
”
 if you need to add a user to do this with, it’s per host)
- It’s hyperconverged, so check your gluster volume for healing and/or split
brains, and wait for them to heal or resolve them.
- Check “gluster peer status” on each host and make sure your gluster hosts
are all talking. I’ve seen an upgrade screw up the firewall; an easy fix is to
add a rule allowing the hosts to talk to each other on your gluster network,
no questions asked (-j ACCEPT, no port, etc.), as in the second sketch below.
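
Two minimal sketches of the above, with placeholder values (the SASL username
and the peer address are examples only, not values from this thread):

    # Per host: add a libvirt SASL user so virsh can authenticate:
    saslpasswd2 -a libvirt admin
    virsh list --all

    # Permissive firewall rule of the kind described: accept everything
    # from a gluster-network peer, no port restrictions (one rule per peer):
    iptables -I INPUT -s 10.10.10.2 -j ACCEPT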

Good luck!

> From: Jim Kusznir 
> Subject: [ovirt-users] Upgraded host, engine now won't boot
> Date: September 1, 2018 at 8:38:12 PM CDT
> To: users
> 
> Hello:
> 
> I saw that there were updates to my ovirt-4.2 3 node hyperconverged system, 
> so I proceeded to apply them the usual way through the UI.
> 
> At one point, the hosted engine was migrated to one of the upgraded hosts, 
> and then went "unstable" on me.  Now, the hosted engine appears to be 
> crashed:  It gets powered up, but it never boots up to the point where it 
> responds to pings or allows logins.  After a while, the hosted engine shows 
> status (via console "hosted-engine --vm-status" command) "Powering Down".  It 
> stays there for a long time.
> 
> I tried forcing a poweroff then powering it on, but again, it never gets up 
> to where it will respond to pings.  --vm-status shows bad health, but up.
> 
> I tried running the hosted-engine --console command, but got:
> 
> [root@ovirt1 ~]# hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device 
> 
> [root@ovirt1 ~]# 
> 
> 
> I tried to run the hosted-engine --upgrade-appliance command, but it hangs at 
> obtaining certificate (understandably, as the hosted-engine is not up).
> 
> How do I recover from this? And what caused this?
> 
> --Jim