Re: [ovirt-users] rebooting hypervisors from time to time

2018-02-23 Thread Erekle Magradze

Hi,

Thanks a lot for having a look.

HA VMs were migrated, non-HA VMs were turned off, syslogs were not 
saying anything useful, and dmesg reported a graceful reboot.
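For reference, the kind of checks behind that statement would be something 
like this (a sketch; the journalctl parts assume persistent journald storage):

    last -x reboot shutdown     # is each reboot preceded by a clean shutdown entry?
    journalctl --list-boots     # list previous boots
    journalctl -b -1 -e         # tail of the previous boot, last messages before the reboot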


What are the errors indicating? Maybe there is a useful hint in them to 
proceed with the investigation?


Thanks in advance again

Cheers

Erekle


On 02/23/2018 06:15 PM, Mahdi Adnan wrote:

Hi,

The log doesn't indicate an HV reboot, and I see lots of errors in the logs.
During the reboot, what happened to the VMs inside the HV? Migrated? 
Paused? What about the system's logs? Do they indicate a graceful 
shutdown?



--

Respectfully,
Mahdi A. Mahdi


*From:* Erekle Magradze <erekle.magra...@recogizer.de>
*Sent:* Friday, February 23, 2018 2:48 PM
*To:* Mahdi Adnan; users@ovirt.org
*Subject:* Re: [ovirt-users] rebooting hypervisors from time to time

Thanks for the reply,

I've attached all the logs from yesterday. The reboot happened during 
the day, but this is not the first time, and this is not the only 
hypervisor affected.


Kind Regards

Erekle


On 02/23/2018 09:00 AM, Mahdi Adnan wrote:

Hi,

Can you post the VDSM and Engine logs ?


--

Respectfully,
Mahdi A. Mahdi


*From:* users-boun...@ovirt.org on behalf of Erekle Magradze 
<erekle.magra...@recogizer.de>

*Sent:* Thursday, February 22, 2018 11:48 PM
*To:* users@ovirt.org
*Subject:* Re: [ovirt-users] rebooting hypervisors from time to time
Dear all,

It would be great if someone could share any experience regarding a
similar case; it would be great to have a hint on where to start the investigation.

Thanks again

Cheers

Erekle


On 02/22/2018 05:05 PM, Erekle Magradze wrote:
> Hello there,
>
> I am facing the following problem: from time to time one of the
> hypervisors (there are 3 of them) is rebooting. I am using
> ovirt-release42-4.2.1-1.el7.centos.noarch and Gluster as the storage
> backend (glusterfs-3.12.5-2.el7.x86_64).
>
> I am suspecting Gluster because of, e.g., the message below from one of
> the volumes.
>
> Could you please help and suggest in which direction the
> investigation should go?
>
> Thanks in advance
>
> Cheers
>
> Erekle
>
>
> [2018-02-22 15:36:10.011687] and [2018-02-22 15:37:10.955013]
> [2018-02-22 15:41:10.198701] I [MSGID: 109063]
> [dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found
> anomalies in (null) (gfid = ----).
> Holes=1 overlaps=0
> [2018-02-22 15:41:10.198704] I [MSGID: 109063]
> [dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found
> anomalies in (null) (gfid = ----).
> Holes=1 overlaps=0
> [2018-02-22 15:42:11.293608] I [MSGID: 109063]
> [dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found
> anomalies in (null) (gfid = ----).
> Holes=1 overlaps=0
> [2018-02-22 15:53:16.245720] I [MSGID: 100030]
> [glusterfsd.c:2524:main] 0-/usr/sbin/glusterfs: Started running
> /usr/sbin/glusterfs version 3.12.5 (args: /usr/sbin/glusterfs
> --volfile-server=10.0.0.21 --volfi
> le-server=10.0.0.22 --volfile-server=10.0.0.23
> --volfile-id=/virtimages
> /rhev/data-center/mnt/glusterSD/10.0.0.21:_virtimages)
> [2018-02-22 15:53:16.263712] W [MSGID: 101002]
> [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family'
> is deprecated, preferred is 'transport.address-family', continuing
> with correction
> [2018-02-22 15:53:16.269595] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started
> thread with index 1
> [2018-02-22 15:53:16.273483] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started
> thread with index 2
> [2018-02-22 15:53:16.273594] W [MSGID: 101174]
> [graph.c:363:_log_if_unknown_option] 0-virtimages-readdir-ahead:
> option 'parallel-readdir' is not recognized
> [2018-02-22 15:53:16.273703] I [MSGID: 114020] [client.c:2360:notify]
> 0-virtimages-client-0: parent translators are ready, attempting
> connect on transport
> [2018-02-22 15:53:16.276455] I [MSGID: 114020] [client.c:2360:notify]
> 0-virtimages-client-1: parent translators are ready, attempting
> connect on transport
> [2018-02-22 15:53:16.276683] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
> 0-virtimages-client-0: changing port to 49152 (from 0)
> [2018-02-22 15:53:16.279191] I [MSGID: 114020] [client.c:2360:notify]
> 0-virtimages-client-2: parent translators are ready, attempting
> connect on transport
> [2018-02-22 15:53:16.282126] I [MSGID: 114057]
> [client-handshake.c:1478:select_serv

Re: [ovirt-users] rebooting hypervisors from time to time

2018-02-22 Thread Erekle Magradze

Dear all,

It would be great if someone could share any experience regarding a 
similar case; it would be great to have a hint on where to start the investigation.


Thanks again

Cheers

Erekle


On 02/22/2018 05:05 PM, Erekle Magradze wrote:

Hello there,

I am facing the following problem: from time to time one of the 
hypervisors (there are 3 of them) is rebooting. I am using 
ovirt-release42-4.2.1-1.el7.centos.noarch and Gluster as the storage 
backend (glusterfs-3.12.5-2.el7.x86_64).


I am suspecting Gluster because of, e.g., the message below from one of 
the volumes.


Could you please help and suggest in which direction the 
investigation should go?


Thanks in advance

Cheers

Erekle


[2018-02-22 15:36:10.011687] and [2018-02-22 15:37:10.955013]
[2018-02-22 15:41:10.198701] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:41:10.198704] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:42:11.293608] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:53:16.245720] I [MSGID: 100030] 
[glusterfsd.c:2524:main] 0-/usr/sbin/glusterfs: Started running 
/usr/sbin/glusterfs version 3.12.5 (args: /usr/sbin/glusterfs 
--volfile-server=10.0.0.21 --volfi
le-server=10.0.0.22 --volfile-server=10.0.0.23 
--volfile-id=/virtimages 
/rhev/data-center/mnt/glusterSD/10.0.0.21:_virtimages)
[2018-02-22 15:53:16.263712] W [MSGID: 101002] 
[options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' 
is deprecated, preferred is 'transport.address-family', continuing 
with correction
[2018-02-22 15:53:16.269595] I [MSGID: 101190] 
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started 
thread with index 1
[2018-02-22 15:53:16.273483] I [MSGID: 101190] 
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started 
thread with index 2
[2018-02-22 15:53:16.273594] W [MSGID: 101174] 
[graph.c:363:_log_if_unknown_option] 0-virtimages-readdir-ahead: 
option 'parallel-readdir' is not recognized
[2018-02-22 15:53:16.273703] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-0: parent translators are ready, attempting 
connect on transport
[2018-02-22 15:53:16.276455] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-1: parent translators are ready, attempting 
connect on transport
[2018-02-22 15:53:16.276683] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 
0-virtimages-client-0: changing port to 49152 (from 0)
[2018-02-22 15:53:16.279191] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-2: parent translators are ready, attempting 
connect on transport
[2018-02-22 15:53:16.282126] I [MSGID: 114057] 
[client-handshake.c:1478:select_server_supported_programs] 
0-virtimages-client-0: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2018-02-22 15:53:16.282573] I [MSGID: 114046] 
[client-handshake.c:1231:client_setvolume_cbk] 0-virtimages-client-0: 
Connected to virtimages-client-0, attached to remote volume 
'/mnt/virtimages/virtimgs'.
[2018-02-22 15:53:16.282584] I [MSGID: 114047] 
[client-handshake.c:1242:client_setvolume_cbk] 0-virtimages-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2018-02-22 15:53:16.282665] I [MSGID: 108005] 
[afr-common.c:4929:__afr_handle_child_up_event] 
0-virtimages-replicate-0: Subvolume 'virtimages-client-0' came back 
up; going online.
[2018-02-22 15:53:16.282877] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 
0-virtimages-client-1: changing port to 49152 (from 0)
[2018-02-22 15:53:16.282934] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 
0-virtimages-client-0: Server lk version = 1


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
recogizer.com


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mai

[ovirt-users] rebooting hypervisors from time to time

2018-02-22 Thread Erekle Magradze

Hello there,

I am facing the following problem: from time to time one of the 
hypervisors (there are 3 of them) is rebooting. I am using 
ovirt-release42-4.2.1-1.el7.centos.noarch and Gluster as the storage 
backend (glusterfs-3.12.5-2.el7.x86_64).


I am suspecting Gluster because of, e.g., the message below from one of 
the volumes.


Could you please help and suggest in which direction the 
investigation should go?


Thanks in advance

Cheers

Erekle


[2018-02-22 15:36:10.011687] and [2018-02-22 15:37:10.955013]
[2018-02-22 15:41:10.198701] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:41:10.198704] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:42:11.293608] I [MSGID: 109063] 
[dht-layout.c:716:dht_layout_normalize] 0-virtimages-dht: Found 
anomalies in (null) (gfid = ----). 
Holes=1 overlaps=0
[2018-02-22 15:53:16.245720] I [MSGID: 100030] [glusterfsd.c:2524:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 
3.12.5 (args: /usr/sbin/glusterfs --volfile-server=10.0.0.21 --volfi
le-server=10.0.0.22 --volfile-server=10.0.0.23 --volfile-id=/virtimages 
/rhev/data-center/mnt/glusterSD/10.0.0.21:_virtimages)
[2018-02-22 15:53:16.263712] W [MSGID: 101002] 
[options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is 
deprecated, preferred is 'transport.address-family', continuing with 
correction
[2018-02-22 15:53:16.269595] I [MSGID: 101190] 
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1
[2018-02-22 15:53:16.273483] I [MSGID: 101190] 
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 2
[2018-02-22 15:53:16.273594] W [MSGID: 101174] 
[graph.c:363:_log_if_unknown_option] 0-virtimages-readdir-ahead: option 
'parallel-readdir' is not recognized
[2018-02-22 15:53:16.273703] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-0: parent translators are ready, attempting connect 
on transport
[2018-02-22 15:53:16.276455] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-1: parent translators are ready, attempting connect 
on transport
[2018-02-22 15:53:16.276683] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 
0-virtimages-client-0: changing port to 49152 (from 0)
[2018-02-22 15:53:16.279191] I [MSGID: 114020] [client.c:2360:notify] 
0-virtimages-client-2: parent translators are ready, attempting connect 
on transport
[2018-02-22 15:53:16.282126] I [MSGID: 114057] 
[client-handshake.c:1478:select_server_supported_programs] 
0-virtimages-client-0: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2018-02-22 15:53:16.282573] I [MSGID: 114046] 
[client-handshake.c:1231:client_setvolume_cbk] 0-virtimages-client-0: 
Connected to virtimages-client-0, attached to remote volume 
'/mnt/virtimages/virtimgs'.
[2018-02-22 15:53:16.282584] I [MSGID: 114047] 
[client-handshake.c:1242:client_setvolume_cbk] 0-virtimages-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2018-02-22 15:53:16.282665] I [MSGID: 108005] 
[afr-common.c:4929:__afr_handle_child_up_event] 
0-virtimages-replicate-0: Subvolume 'virtimages-client-0' came back up; 
going online.
[2018-02-22 15:53:16.282877] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 
0-virtimages-client-1: changing port to 49152 (from 0)
[2018-02-22 15:53:16.282934] I [MSGID: 114035] 
[client-handshake.c:202:client_set_lk_version_cbk] 
0-virtimages-client-0: Server lk version = 1
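For completeness, a minimal Gluster health check on the nodes would look 
roughly like this (a sketch; the volume name virtimages is taken from the 
log above):

    gluster peer status                                 # all peers should be connected
    gluster volume status virtimages                    # bricks and self-heal daemons online?
    gluster volume heal virtimages info                 # pending heals per brick
    gluster volume heal virtimages info split-brain     # any files in split-brain?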


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] docker, kubernetes and ovirt

2018-01-18 Thread Erekle Magradze

Hi,
Look into OpenShift Origin.

Cheers

Erekle


On 01/18/2018 05:57 PM, Nathanaël Blanchet wrote:
And without the kubernetes UI plugin, how can I manage to use the 
oVirt cloud provider described on the official kubernetes site: 
https://kubernetes.io/docs/getting-started-guides/ovirt/ ?



Le 18/01/2018 à 17:55, Nathanaël Blanchet a écrit :

Hi all,

Regarding this video: https://www.youtube.com/watch?v=JyyST4ZKne8 
and 
ovedou.blogspot.fr/2014/03/running-docker-container-in-ovirt.html, it 
appears that some work has been done on integrating Docker into oVirt.


I'm interested in these two UI plugins: docker-resources (I found 
it in samples-uiplugins) and the kubernetes one, which I didn't find 
anywhere.


Can anyone tell me whether the development of those projects has 
definitively stopped, and if so, why?






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] snapshotting

2018-01-11 Thread Erekle Magradze

Hello,
Could you please share your experience with snapshotting VMs in 
oVirt 4.1 with GlusterFS as storage?

Thanks in advance
Erekle
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

That's the problem: at that time nobody had restarted the server.

Is there any scenario in which the hypervisor is restarted by the engine?

Cheers

Erekle


On 10/16/2017 04:45 PM, Piotr Kliczewski wrote:

Erekle,

For the time period you mentioned I do not see anything wrong on the
vdsm side except a restart at 2017-10-15 16:28:50,993+0200. It looks
like a manual restart.
The engine log starts at 2017-10-16 03:49:04,092+02, so I am not able
to say whether there was anything else besides the heartbeat issue
caused by the restart.

The restart was the cause of the "connection reset by peer" on the MOM side.

Thanks,
Piotr

On Mon, Oct 16, 2017 at 4:21 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi Piotr,

I have restarted the vdsm daemon on certain nodes several times; that
could be the reason.

The failure I've mentioned happened yesterday from 15:00 to 17:00.

Cheers

Erekle



On 10/16/2017 04:13 PM, Piotr Kliczewski wrote:

Erekle,

In the logs you provided I see:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read
mailbox:
/rhev/data-center/6d52512e-1c02-4509-880a-bf57cbad4bdf/mastersd/dom_md/inbox

and

StorageDomainMasterError: Error validating master storage domain: ('MD
read error',)

which seems to be the cause of vdsm being killed by sanlock, which
caused the connection reset by peer.

After the vdsm restart the storage looks good.
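A quick way to cross-check the sanlock/storage side on the affected host
would be something like this (a sketch; the mailbox path is the one from
the traceback above):

    sanlock client status       # lockspaces and resources currently held by sanlock
    tail -n 50 /var/log/sanlock.log
    dd if=/rhev/data-center/6d52512e-1c02-4509-880a-bf57cbad4bdf/mastersd/dom_md/inbox \
       of=/dev/null bs=1M count=1     # is the mailbox readable at all?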

@Nir can you take a look?

Thanks,
Piotr

On Mon, Oct 16, 2017 at 3:59 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

The issue is the following: after installation of oVirt 4.1 on three
nodes with GlusterFS as storage, the oVirt engine reported failed
events with the following message:

VDSM hostname command GetStatsVDS failed: Connection reset by peer

After that oVirt was trying to fence the affected host, and it was
excluded from production; luckily I am not running any VMs on it yet.

The logs are attached, don't be surprised by the hostnames :)

Thanks in advance

Cheers

Erekle


On 10/16/2017 03:37 PM, Dafna Ron wrote:

Hi,

Can you please tell us what issue you are actually facing? :) It
would be easier to debug an issue rather than an error message that can
be caused by several things.

Also, can you provide the engine and the vdsm logs?

thank you,
Dafna


On 10/16/2017 02:30 PM, Erekle Magradze wrote:

It was a typo in the failure message;

this is what I was getting:

VDSM hostname command GetStatsVDS failed: Connection reset by peer


On 10/16/2017 03:21 PM, Erekle Magradze wrote:

Hi,

It's getting clearer now; indeed, the momd service is disabled:

● momd.service - Memory Overcommitment Manager Daemon
 Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor
preset: disabled)
 Active: inactive (dead)

mom-vdsm is enabled and running:

● mom-vdsm.service - MOM instance configured for VDSM purposes
 Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled;
vendor
preset: enabled)
 Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 3s
ago
   Main PID: 27638 (python)
 CGroup: /system.slice/mom-vdsm.service
 └─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf

The reason I came up with digging into MOM problems is the following
problem:


VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer

which is causing fencing of the node where the failure is happening.
What could be the reason for the GetStatsVDS failure?

Best Regards
Erekle


On 10/16/2017 03:11 PM, Martin Sivak wrote:

Hi,

how do you start MOM? MOM is supposed to talk to vdsm, we do not talk
to libvirt directly. The line you posted comes from vdsm and vdsm is
telling you it can't talk to MOM.

Which MOM service is enabled? Because there are two, momd and mom-vdsm;
the second one is the one that should be enabled.

Best regards

Martin Sivak


On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi Martin,

Thanks for the answer. Unfortunately this warning message persists; does
it mean that MOM cannot communicate with libvirt? How critical is it?

Best

Erekle



On 10/16/2017 03:03 PM, Martin Sivak wrote:

Hi,

it is just a warning, there is nothing you have to solve unless it
does not resolve itself within a minute or so. If it happens only once
or twice after vdsm or mom restart then you are fine.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at
the end:

Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available,
KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated
in
favor of ping2 and confirmConnectivity

How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

__

Re: [ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

Hi Piotr,

I have restarted the vdsm daemon on certain nodes several times; that 
could be the reason.


The failure I've mentioned happened yesterday from 15:00 to 17:00.

Cheers

Erekle


On 10/16/2017 04:13 PM, Piotr Kliczewski wrote:

Erekle,

In the logs you provided I see:

IOError: [Errno 5] _handleRequests._checkForMail - Could not read
mailbox: 
/rhev/data-center/6d52512e-1c02-4509-880a-bf57cbad4bdf/mastersd/dom_md/inbox

and

StorageDomainMasterError: Error validating master storage domain: ('MD
read error',)

which seems to be the cause of vdsm being killed by sanlock, which
caused the connection reset by peer.

After the vdsm restart the storage looks good.

@Nir can you take a look?

Thanks,
Piotr

On Mon, Oct 16, 2017 at 3:59 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

The issue is the following: after installation of oVirt 4.1 on three nodes
with GlusterFS as storage, the oVirt engine reported failed events with
the following message:

VDSM hostname command GetStatsVDS failed: Connection reset by peer

After that oVirt was trying to fence the affected host, and it was excluded
from production; luckily I am not running any VMs on it yet.

The logs are attached, don't be surprised by the hostnames :)

Thanks in advance

Cheers

Erekle


On 10/16/2017 03:37 PM, Dafna Ron wrote:

Hi,

Can you please tell us what issue you are actually facing? :) It
would be easier to debug an issue rather than an error message that can
be caused by several things.

Also, can you provide the engine and the vdsm logs?

thank you,
Dafna


On 10/16/2017 02:30 PM, Erekle Magradze wrote:

It was a typo in the failure message;

this is what I was getting:

VDSM hostname command GetStatsVDS failed: Connection reset by peer


On 10/16/2017 03:21 PM, Erekle Magradze wrote:

Hi,

It's getting clearer now; indeed, the momd service is disabled:

● momd.service - Memory Overcommitment Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor
preset: disabled)
Active: inactive (dead)

mom-vdsm is enabled and running:

● mom-vdsm.service - MOM instance configured for VDSM purposes
Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled; vendor
preset: enabled)
Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 3s ago
  Main PID: 27638 (python)
CGroup: /system.slice/mom-vdsm.service
└─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf

The reason I came up with digging into MOM problems is the following
problem:


VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer

which is causing fencing of the node where the failure is happening.
What could be the reason for the GetStatsVDS failure?

Best Regards
Erekle


On 10/16/2017 03:11 PM, Martin Sivak wrote:

Hi,

how do you start MOM? MOM is supposed to talk to vdsm, we do not talk
to libvirt directly. The line you posted comes from vdsm and vdsm is
telling you it can't talk to MOM.

Which MOM service is enabled? Because there are two, momd and mom-vdsm;
the second one is the one that should be enabled.

Best regards

Martin Sivak


On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi Martin,

Thanks for the answer. Unfortunately this warning message persists; does it
mean that MOM cannot communicate with libvirt? How critical is it?

Best

Erekle



On 10/16/2017 03:03 PM, Martin Sivak wrote:

Hi,

it is just a warning, there is nothing you have to solve unless it
does not resolve itself within a minute or so. If it happens only once
or twice after vdsm or mom restart then you are fine.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at
the end:

Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available,
KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated
in
favor of ping2 and confirmConnectivity

How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
Web: www.recogizer.com

Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/
Folgen Sie uns auf Twitter https://twitter.com/recogizer


Re: [ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

It was a typo in the failure message;

this is what I was getting:

VDSM hostname command GetStatsVDS failed: Connection reset by peer


On 10/16/2017 03:21 PM, Erekle Magradze wrote:


Hi,

It's getting clearer now; indeed, the momd service is disabled:

● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; 
vendor preset: disabled)

   Active: inactive (dead)

mom-vdsm is enabled and running:

● mom-vdsm.service - MOM instance configured for VDSM purposes
   Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled; 
vendor preset: enabled)
   Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 
3s ago

 Main PID: 27638 (python)
   CGroup: /system.slice/mom-vdsm.service
   └─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf

The reason I came up with digging into MOM problems is the following 
problem:



VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer

which is causing fencing of the node where the failure is happening. 
What could be the reason for the GetStatsVDS failure?


Best Regards
Erekle


On 10/16/2017 03:11 PM, Martin Sivak wrote:

Hi,

how do you start MOM? MOM is supposed to talk to vdsm, we do not talk
to libvirt directly. The line you posted comes from vdsm and vdsm is
telling you it can't talk to MOM.

Which MOM service is enabled? Because there are two, momd and mom-vdsm;
the second one is the one that should be enabled.

Best regards

Martin Sivak


On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
<erekle.magra...@recogizer.de>  wrote:

Hi Martin,

Thanks for the answer. Unfortunately this warning message persists; does it
mean that MOM cannot communicate with libvirt? How critical is it?

Best

Erekle



On 10/16/2017 03:03 PM, Martin Sivak wrote:

Hi,

it is just a warning, there is nothing you have to solve unless it
does not resolve itself within a minute or so. If it happens only once
or twice after vdsm or mom restart then you are fine.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
<erekle.magra...@recogizer.de>  wrote:

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at
the end:

Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available,
KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated
in
favor of ping2 and confirmConnectivity

How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
Web: www.recogizer.com
 
Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/

Folgen Sie uns auf Twitter https://twitter.com/recogizer
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

Hi,

It's getting clearer now; indeed, the momd service is disabled:

● momd.service - Memory Overcommitment Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/momd.service; static; vendor 
preset: disabled)

   Active: inactive (dead)

mom-vdsm is enabled and running:

● mom-vdsm.service - MOM instance configured for VDSM purposes
   Loaded: loaded (/usr/lib/systemd/system/mom-vdsm.service; enabled; 
vendor preset: enabled)

   Active: active (running) since Mon 2017-10-16 15:14:35 CEST; 1min 3s ago
 Main PID: 27638 (python)
   CGroup: /system.slice/mom-vdsm.service
   └─27638 python /usr/sbin/momd -c /etc/vdsm/mom.conf
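To correlate the services with the GetStatsVDS failures, something like the
following would help (a sketch; adjust the time window to the failure period):

    systemctl status vdsmd mom-vdsm
    journalctl -u vdsmd -u mom-vdsm --since '2017-10-15 15:00' --until '2017-10-15 17:00'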

The reason I came up with digging into MOM problems is the following 
problem:



VDSM hostname command GetStatsVDSThanks failed: Connection reset by peer

which is causing fencing of the node where the failure is happening. 
What could be the reason for the GetStatsVDS failure?


Best Regards
Erekle


On 10/16/2017 03:11 PM, Martin Sivak wrote:

Hi,

how do you start MOM? MOM is supposed to talk to vdsm, we do not talk
to libvirt directly. The line you posted comes from vdsm and vdsm is
telling you it can't talk to MOM.

Which MOM service is enabled? Because there are two, momd and mom-vdsm;
the second one is the one that should be enabled.

Best regards

Martin Sivak


On Mon, Oct 16, 2017 at 3:04 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi Martin,

Thanks for the answer. Unfortunately this warning message persists; does it
mean that MOM cannot communicate with libvirt? How critical is it?

Best

Erekle



On 10/16/2017 03:03 PM, Martin Sivak wrote:

Hi,

it is just a warning, there is nothing you have to solve unless it
does not resolve itself within a minute or so. If it happens only once
or twice after vdsm or mom restart then you are fine.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at
the end:

Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not
available,
KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated
in
favor of ping2 and confirmConnectivity

How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

Hi Martin,

Thanks for the answer. Unfortunately this warning message persists; does 
it mean that MOM cannot communicate with libvirt? How critical is it?


Best

Erekle


On 10/16/2017 03:03 PM, Martin Sivak wrote:

Hi,

it is just a warning, there is nothing you have to solve unless it
does not resolve itself within a minute or so. If it happens only once
or twice after vdsm or mom restart then you are fine.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Oct 16, 2017 at 2:44 PM, Erekle Magradze
<erekle.magra...@recogizer.de> wrote:

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at the
end:

Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available,
KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated in
favor of ping2 and confirmConnectivity

How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] MoM is failing!!!

2017-10-16 Thread Erekle Magradze

Hi,

after running

systemctl status vdsm, I see that it is running, with this message at 
the end:


Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not available.
Oct 16 14:26:52 hostname vdsmd[2392]: vdsm throttled WARN MOM not 
available, KSM stats will be missing.
Oct 16 14:26:57 hostname vdsmd[2392]: vdsm root WARN ping was deprecated 
in favor of ping2 and confirmConnectivity


How critical is it? And how can I solve that warning?

I am using libvirt

Cheers

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] problem after update

2017-10-16 Thread Erekle Magradze

Hello guys,
I have a 3-node oVirt setup with GlusterFS volumes.
Here is the kernel version: 3.10.0-693.2.2.el7.x86_64
This is the version of gluster I am running: 3.8.15-2
The oVirt engine is running on a separate bare-metal host.

I am getting the following failure message:

VDSM  command GetStatsVDS failed: Heartbeat exceeded

After that the host is set to unreachable and the system is trying to 
fence it out (since I haven't configured power management yet, that is 
not possible).
I checked the network communication and it works fine; can you suggest 
how to investigate this problem?

Also, what is the GetStatsVDS command for?
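For reference, a minimal check on the affected host would be something like 
this (a sketch; vdsm-client ships with vdsm from oVirt 4.1, older setups 
use vdsClient instead):

    systemctl status vdsmd
    vdsm-client Host getStats       # roughly the data the engine's GetStatsVDS call asks for
    # older vdsm: vdsClient -s 0 getVdsStats

If the host answers these quickly while the engine still reports "Heartbeat 
exceeded", the next place to look is engine<->host connectivity and load 
(engine.log on the engine, vdsm.log on the host).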

Thanks in advance
Cheers
Erekle

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] disk attachment to VM

2017-09-05 Thread Erekle Magradze

Hey Guys,
Is there a way to attach an SSD directly to an oVirt VM?
Thanks in advance
Cheers
Erekle
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Good practices

2017-08-07 Thread Erekle Magradze

Hi Fernando,

Indeed, having an arbiter node is always a good idea, and it saves 
a lot of cost.


Good luck with your setup.

Cheers

Erekle


On 07.08.2017 23:03, FERNANDO FREDIANI wrote:


Thanks for the detailed answer Erekle.

I conclude that it is worth it in any scenario to have an arbiter node 
in order to avoid wasting more disk space on RAID X + Gluster 
replication on top of it. The cost seems much lower if you consider the 
running costs of the whole storage and compare them with the cost of 
building the arbiter node. Even having a fully redundant arbiter 
service with 2 nodes would make it worth it on a larger deployment.


Regards
Fernando

On 07/08/2017 17:07, Erekle Magradze wrote:


Hi Fernando (sorry for misspelling your name, I used a different 
keyboard),


So let's go with the following scenarios:

1. Let's say you have two servers (replication factor is 2), i.e. two 
bricks per volume. In this case it is strongly recommended to have an 
arbiter node, the metadata store that guarantees avoiding the 
split-brain situation; for the arbiter you don't even need a disk with 
lots of space, a tiny SSD is enough, but it should be hosted on a 
separate server. The advantage of such a setup is that you don't need 
RAID 1 for each brick: the metadata is stored on the arbiter node and 
brick replacement is easy.


2. If you have an odd number of bricks (let's say 3, i.e. replication 
factor is 3) in your volume and you didn't create an arbiter node or 
configure the quorum, the entire load for keeping the volume consistent 
resides on all 3 servers; each of them is important, each brick 
contains key information, and they need to cross-check each other 
(that's what people usually do with their first try of gluster :) ). In 
this case replacing a brick is a big pain, and RAID 1 is a good option 
to have (that's the disadvantage, i.e. losing the space and not having 
the JBOD option); the advantage is that you don't have to have an 
additional arbiter node.


3. You have an odd number of bricks and a configured arbiter node; in 
this case you can easily go with JBOD, however a good practice would be 
to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly 
sufficient for volumes with tens of TBs in size).


That's basically it.

The rest about reliability and setup scenarios you can find in the 
gluster documentation; especially look for the quorum and arbiter node 
configs+options.


Cheers

Erekle

P.S. What I was mentioning regarding good practice is mostly related 
to the operation of gluster, not installation or deployment, i.e. not 
the conceptual understanding of gluster (conceptually it's a JBOD system).


On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:


Thanks for the clarification Erekle.

However, I am surprised by this way of operating GlusterFS, as it adds 
another layer of complexity to the system (either a hardware or 
software RAID) before the gluster config and increases the system's 
overall costs.


An important point to consider is: in a RAID configuration you already 
have space 'wasted' in order to build redundancy (either RAID 1, 5, or 
6). Then, when you have GlusterFS on top of several RAIDs, you have 
more data replicated again, so you end up with the same data consuming 
more space within a group of disks and again on top of several RAIDs, 
depending on the Gluster configuration you have (in a RAID 1 config the 
same data is replicated 4 times).


Yet another downside of having a RAID (especially RAID 5 or 6) is that 
it considerably reduces write speeds, as each group of disks ends up 
having the write speed of a single disk, since all the other disks of 
that group have to wait for each other to write as well.


Therefore, if Gluster already replicates data, why does it create the 
big pain you mentioned, if the data is replicated somewhere else and 
can still be retrieved both to serve clients and to reconstruct the 
equivalent disk when it is replaced?


Fernando


On 07/08/2017 10:26, Erekle Magradze wrote:


Hi Frenando,

Here is my experience: if you consider a particular hard drive as a 
brick for a gluster volume and it dies, i.e. it becomes inaccessible, 
it's a huge hassle to discard that brick and exchange it for another 
one, since gluster sometimes tries to access that broken brick and it 
causes (at least it caused for me) a big pain. Therefore it's better to 
have a RAID as a brick, i.e. have RAID 1 (mirroring) for each brick; in 
this case, if a disk is down, you can easily exchange it and rebuild 
the RAID without going offline, i.e. without switching off the volume, 
doing brick manipulations and switching it back on.


Cheers

Erekle


On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:


For any RAID 5 or 6 configuration I normally follow a simple golden 
rule which has given good results so far:

- up to 4 disks: RAID 5
- 5 or more disks: RAID 6

However, I didn't really understand the recommendation to use 
any RAID

Re: [ovirt-users] How to shutdown an oVirt cluster with Gluster and hosted engine

2017-08-07 Thread Erekle Magradze

Hi Moacir,

First, switch off all VMs.

Second, you need to put the hosts into maintenance mode; don't start 
with SRM (of course, use the ovirt-engine if you are able to). It will 
ask you to shut down glusterfs on the machine.


Third, once all machines are in maintenance mode, you can start 
shutting them down.



If you have a hosted-engine setup, follow this [1].


Cheers

Erekle


[1] 
https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oVirt-3.6-Cluster-Shutdown-and-Startup
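For a hosted-engine setup, the shutdown order boils down to roughly the 
following (a sketch along the lines of [1]; service and volume names depend 
on the setup):

    # on one host, after all regular VMs are down:
    hosted-engine --set-maintenance --mode=global
    hosted-engine --vm-shutdown         # wait for 'down' in hosted-engine --vm-status
    # then on every host:
    systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd
    gluster volume stop <volname>       # once no client uses the volume any more
    systemctl stop glusterd
    poweroff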



On 08/07/2017 08:58 PM, Moacir Ferreira wrote:


I have installed an oVirt cluster in a KVM virtualized test 
environment. Now, how do I properly shut down the oVirt cluster, with 
Gluster and the hosted engine?


I.e.: I want to install a cluster of 3 servers and then send it to a 
remote office. How do I do it properly? I noticed that glusterd is not 
enabled to start automatically. And how do I deal with the hosted engine?



Thanks,

Moacir



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
Web: www.recogizer.com
 
Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/

Folgen Sie uns auf Twitter https://twitter.com/recogizer
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Good practices

2017-08-07 Thread Erekle Magradze

Hi Fernando (sorry for misspelling your name, I used a different keyboard),

So let's go with the following scenarios:

1. Let's say you have two servers (replication factor is 2), i.e. two 
bricks per volume. In this case it is strongly recommended to have an 
arbiter node, the metadata store that guarantees avoiding the 
split-brain situation; for the arbiter you don't even need a disk with 
lots of space, a tiny SSD is enough, but it should be hosted on a 
separate server. The advantage of such a setup is that you don't need 
RAID 1 for each brick: the metadata is stored on the arbiter node and 
brick replacement is easy.


2. If you have an odd number of bricks (let's say 3, i.e. replication 
factor is 3) in your volume and you didn't create an arbiter node or 
configure the quorum, the entire load for keeping the volume consistent 
resides on all 3 servers; each of them is important, each brick 
contains key information, and they need to cross-check each other 
(that's what people usually do with their first try of gluster :) ). In 
this case replacing a brick is a big pain, and RAID 1 is a good option 
to have (that's the disadvantage, i.e. losing the space and not having 
the JBOD option); the advantage is that you don't have to have an 
additional arbiter node.


3. You have an odd number of bricks and a configured arbiter node; in 
this case you can easily go with JBOD, however a good practice would be 
to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly 
sufficient for volumes with tens of TBs in size).


That's basically it.

The rest about reliability and setup scenarios you can find in the 
gluster documentation; especially look for the quorum and arbiter node 
configs+options.
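For reference, creating a replicated volume with an arbiter brick looks 
roughly like this (a sketch; hostnames and brick paths are made up):

    gluster volume create vmstore replica 3 arbiter 1 \
        srv1:/gluster/bricks/vmstore srv2:/gluster/bricks/vmstore \
        arb1:/gluster/bricks/vmstore-arbiter
    gluster volume start vmstore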


Cheers

Erekle

P.S. What I was mentioning regarding good practice is mostly related 
to the operation of gluster, not installation or deployment, i.e. not 
the conceptual understanding of gluster (conceptually it's a JBOD system).


On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:


Thanks for the clarification Erekle.

However, I am surprised by this way of operating GlusterFS, as it adds 
another layer of complexity to the system (either a hardware or 
software RAID) before the gluster config and increases the system's 
overall costs.


An important point to consider is: in a RAID configuration you already 
have space 'wasted' in order to build redundancy (either RAID 1, 5, or 
6). Then, when you have GlusterFS on top of several RAIDs, you have 
more data replicated again, so you end up with the same data consuming 
more space within a group of disks and again on top of several RAIDs, 
depending on the Gluster configuration you have (in a RAID 1 config 
the same data is replicated 4 times).


Yet another downside of having a RAID (especially RAID 5 or 6) is that 
it considerably reduces write speeds, as each group of disks ends up 
having the write speed of a single disk, since all the other disks of 
that group have to wait for each other to write as well.


Therefore, if Gluster already replicates data, why does it create the 
big pain you mentioned, if the data is replicated somewhere else and 
can still be retrieved both to serve clients and to reconstruct the 
equivalent disk when it is replaced?


Fernando


On 07/08/2017 10:26, Erekle Magradze wrote:


Hi Frenando,

Here is my experience: if you consider a particular hard drive as a 
brick for a gluster volume and it dies, i.e. it becomes inaccessible, 
it's a huge hassle to discard that brick and exchange it for another 
one, since gluster sometimes tries to access that broken brick and it 
causes (at least it caused for me) a big pain. Therefore it's better 
to have a RAID as a brick, i.e. have RAID 1 (mirroring) for each brick; 
in this case, if a disk is down, you can easily exchange it and 
rebuild the RAID without going offline, i.e. without switching off the 
volume, doing brick manipulations and switching it back on.


Cheers

Erekle


On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:


For any RAID 5 or 6 configuration I normally follow a simple golden 
rule which has given good results so far:

- up to 4 disks: RAID 5
- 5 or more disks: RAID 6

However, I didn't really understand the recommendation to use any RAID 
with GlusterFS. I always thought that GlusterFS likes to work in JBOD 
mode and control the disks (bricks) directly, so you can create 
whatever distribution rule you wish, and if a single disk fails you 
just replace it and it obviously has the data replicated from another 
one. The only downside of using it this way is that the replication 
data will flow across all servers, but that is not much of a big issue.


Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?

Thanks
Regards
Fernando


On 07/08/2017 03:46, Devin Acosta wrote:


Moacir,

I have recently installed multiple Red Hat Virtualization hosts for 
several different companies, and have dealt with the Red Hat

Re: [ovirt-users] Good practices

2017-08-07 Thread Erekle Magradze

Hi Franando,

So let's go with the following scenarios:

1. Let's say you have two servers (replication factor is 2), i.e. two 
bricks per volume. In this case it is strongly recommended to have an 
arbiter node, the metadata store that guarantees avoiding the 
split-brain situation; for the arbiter you don't even need a disk with 
lots of space, a tiny SSD is enough, but it should be hosted on a 
separate server. The advantage of such a setup is that you don't need 
RAID 1 for each brick: the metadata is stored on the arbiter node and 
brick replacement is easy.


2. If you have an odd number of bricks (let's say 3, i.e. replication 
factor is 3) in your volume and you didn't create an arbiter node or 
configure the quorum, the entire load for keeping the volume consistent 
resides on all 3 servers; each of them is important, each brick 
contains key information, and they need to cross-check each other 
(that's what people usually do with their first try of gluster :) ). In 
this case replacing a brick is a big pain, and RAID 1 is a good option 
to have (that's the disadvantage, i.e. losing the space and not having 
the JBOD option); the advantage is that you don't have to have an 
additional arbiter node.


3. You have an odd number of bricks and a configured arbiter node; in 
this case you can easily go with JBOD, however a good practice would be 
to have RAID 1 for the arbiter disks (tiny 128GB SSDs are perfectly 
sufficient for volumes with tens of TBs in size).


That's basically it.

The rest about reliability and setup scenarios you can find in the 
gluster documentation; especially look for the quorum and arbiter node 
configs+options.


Cheers

Erekle

P.S. What I was mentioning regarding good practice is mostly related 
to the operation of gluster, not installation or deployment, i.e. not 
the conceptual understanding of gluster (conceptually it's a JBOD system).



On 08/07/2017 05:41 PM, FERNANDO FREDIANI wrote:


Thanks for the clarification Erekle.

However, I am surprised by this way of operating GlusterFS, as it adds 
another layer of complexity to the system (either a hardware or 
software RAID) before the gluster config and increases the system's 
overall costs.


An important point to consider is: in a RAID configuration you already 
have space 'wasted' in order to build redundancy (either RAID 1, 5, or 
6). Then, when you have GlusterFS on top of several RAIDs, you have 
more data replicated again, so you end up with the same data consuming 
more space within a group of disks and again on top of several RAIDs, 
depending on the Gluster configuration you have (in a RAID 1 config 
the same data is replicated 4 times).


Yet another downside of having a RAID (especially RAID 5 or 6) is that 
it considerably reduces write speeds, as each group of disks ends up 
having the write speed of a single disk, since all the other disks of 
that group have to wait for each other to write as well.


Therefore, if Gluster already replicates data, why does it create the 
big pain you mentioned, if the data is replicated somewhere else and 
can still be retrieved both to serve clients and to reconstruct the 
equivalent disk when it is replaced?


Fernando


On 07/08/2017 10:26, Erekle Magradze wrote:


Hi Frenando,

Here is my experience: if you consider a particular hard drive as a 
brick for a gluster volume and it dies, i.e. it becomes inaccessible, 
it's a huge hassle to discard that brick and exchange it for another 
one, since gluster sometimes tries to access that broken brick and it 
causes (at least it caused for me) a big pain. Therefore it's better 
to have a RAID as a brick, i.e. have RAID 1 (mirroring) for each brick; 
in this case, if a disk is down, you can easily exchange it and 
rebuild the RAID without going offline, i.e. without switching off the 
volume, doing brick manipulations and switching it back on.


Cheers

Erekle


On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:


For any RAID 5 or 6 configuration I normally follow a simple golden 
rule which has given good results so far:

- up to 4 disks: RAID 5
- 5 or more disks: RAID 6

However, I didn't really understand the recommendation to use any RAID 
with GlusterFS. I always thought that GlusterFS likes to work in JBOD 
mode and control the disks (bricks) directly, so you can create 
whatever distribution rule you wish, and if a single disk fails you 
just replace it and it obviously has the data replicated from another 
one. The only downside of using it this way is that the replication 
data will flow across all servers, but that is not much of a big issue.


Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?

Thanks
Regards
Fernando


On 07/08/2017 03:46, Devin Acosta wrote:


Moacir,

I have recently installed multiple Red Hat Virtualization hosts for 
several different companies, and have dealt with the Red Hat 
Support Team in depth about optimal configuration in regards

Re: [ovirt-users] Good practices

2017-08-07 Thread Erekle Magradze

Hi Frenando,

Here is my experience: if you consider a particular hard drive as a 
brick for a gluster volume and it dies, i.e. it becomes inaccessible, 
it's a huge hassle to discard that brick and exchange it for another 
one, since gluster sometimes tries to access that broken brick and it 
causes (at least it caused for me) a big pain. Therefore it's better to 
have a RAID as a brick, i.e. have RAID 1 (mirroring) for each brick; in 
this case, if a disk is down, you can easily exchange it and rebuild 
the RAID without going offline, i.e. without switching off the volume, 
doing brick manipulations and switching it back on.
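For what it's worth, replacing a dead brick without taking the volume 
offline goes roughly like this (a sketch; volume, host and path names are 
made up):

    gluster volume replace-brick vmstore srv2:/gluster/bricks/dead \
        srv2:/gluster/bricks/new commit force
    gluster volume heal vmstore info    # watch the heal catch up on the new brick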


Cheers

Erekle


On 08/07/2017 03:04 PM, FERNANDO FREDIANI wrote:


For any RAID 5 or 6 configuration I normally follow a simple golden rule 
which has given good results so far:

- up to 4 disks: RAID 5
- 5 or more disks: RAID 6

However, I didn't really understand the recommendation to use any RAID 
with GlusterFS. I always thought that GlusterFS likes to work in JBOD 
mode and control the disks (bricks) directly, so you can create 
whatever distribution rule you wish, and if a single disk fails you 
just replace it and it obviously has the data replicated from another 
one. The only downside of using it this way is that the replication 
data will flow across all servers, but that is not much of a big issue.


Can anyone elaborate on using RAID + GlusterFS versus JBOD + GlusterFS?

Thanks
Regards
Fernando


On 07/08/2017 03:46, Devin Acosta wrote:


Moacir,

I have recently installed multiple Red Hat Virtualization hosts for 
several different companies, and have dealt with the Red Hat Support 
Team in depth about the optimal configuration with regard to setting up 
GlusterFS most efficiently, and I wanted to share with you what I learned.


In general, the Red Hat Virtualization team frowns upon using each disk 
of the system as just a JBOD; sure, there is some protection by having 
the data replicated, however the recommendation is to use RAID 6 
(preferred) or RAID 5, or RAID 1 at the very least.


Here is the direct quote from Red Hat when I asked about RAID and Bricks:
“A typical Gluster configuration would use RAID underneath the 
bricks. RAID 6 is most typical as it gives you 2 disk failure 
protection, but RAID 5 could be used too. Once you have the RAIDed 
bricks, you'd then apply the desired replication on top of that. The 
most popular way of doing this would be distributed replicated with 
2x replication. In general you'll get better performance with larger 
bricks. 12 drives is often a sweet spot. Another option would be to 
create a separate tier using all SSD’s.”

In order to do SSD tiering, from my understanding you would need 1 x 
NVMe drive in each server, or 4 x SSDs for the hot tier (it needs to be 
distributed-replicated for the hot tier if not using NVMe). So with 
you only having 1 SSD drive in each server, I'd suggest maybe looking 
into the NVMe option.

Since you're using only 3 servers, what I'd probably suggest is to do 
(2 Replicas + Arbiter Node); this setup actually doesn't require the 
3rd server to have big drives at all, as it only stores metadata 
about the files and not actually a full copy.

Please see the attached document that was given to me by Red Hat to 
get more information on this. Hope this information helps you.

--

Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect

On August 6, 2017 at 7:29:29 PM, Moacir Ferreira 
(moacirferre...@hotmail.com ) wrote:


I am willing to assemble an oVirt "pod" made of 3 servers, each with 
2 CPU sockets of 12 cores, 256GB RAM, 7 x 10K HDDs, and 1 SSD. The idea 
is to use GlusterFS to provide HA for the VMs. The 3 servers have a 
dual 40Gb NIC and a dual 10Gb NIC. So my intention is to create a 
loop, like a server triangle, using the 40Gb NICs for virtualization 
file (VM .qcow2) access and to move VMs around the pod (east/west 
traffic), while using the 10Gb interfaces for providing services to the 
outside world (north/south traffic).



This said, my first question is: how should I deploy GlusterFS in 
such an oVirt scenario? My questions are:



1 - Should I create 3 RAID arrays (e.g. RAID 5), one on each oVirt 
node, and then create a GlusterFS volume using them?


2 - Or should I instead create a JBOD array made of all the servers' disks?

3 - What is the best Gluster configuration to provide HA while 
not consuming too much disk space?


4 - Does an oVirt hypervisor pod like the one I am planning to build, 
and the virtualization environment, benefit from tiering when using an 
SSD disk? And if yes, will Gluster do it by default or do I have to 
configure it to do so?



Bottom line: what is the good practice for using GlusterFS in 
small pods for enterprises?



Your opinion/feedback will be really appreciated!

Moacir

___
Users mailing list
Users@ovirt.org 
http://lists.ovirt.org/mailman/listinfo/users




Re: [ovirt-users] test email

2017-07-19 Thread Erekle Magradze

test


On 07/19/2017 11:53 AM, Abi Askushi wrote:

several days without receiving any email from this list.
please test back.

Abi


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Recogizer Group GmbH

Dr.rer.nat. Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
Web: www.recogizer.com
 
Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/

Folgen Sie uns auf Twitter https://twitter.com/recogizer
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] is arbiter configuration needed?

2017-06-23 Thread Erekle Magradze

Thanks a lot, Kasturi.


On 06/23/2017 03:25 PM, knarra wrote:

On 06/23/2017 03:38 PM, Erekle Magradze wrote:

Hello,
I am using glusterfs as the storage backend for the VM images; the 
volumes for oVirt consist of three bricks. Is it still necessary to 
configure an arbiter to be on the safe side, or, since the number of 
bricks is odd, is it handled out of the box?

Thanks in advance
Cheers
Erekle
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Hi,

An arbiter volume is a special class of replica-3 volume. Arbiter is 
special because the third brick of the replica set contains only directory 
hierarchy information and metadata. Therefore, arbiter provides 
split-brain protection with the equivalent consistency of a replica-3 
volume without incurring the additional storage space overhead.


If you already have a replica volume in your config with three 
bricks, then that config should be good. You do not need to create an 
arbiter.
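A quick way to verify which kind of volume you have (a sketch; substitute 
your volume name):

    gluster volume info <volname>

A plain replica-3 volume reports "Number of Bricks: 1 x 3 = 3", while an 
arbiter volume reports "Number of Bricks: 1 x (2 + 1) = 3".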


Hope this helps !!

Thanks

kasturi



--
Recogizer Group GmbH

Erekle Magradze
Lead Big Data Engineering & DevOps
Rheinwerkallee 2, 53227 Bonn
Tel: +49 228 29974555

E-Mail erekle.magra...@recogizer.de
Web: www.recogizer.com
 
Recogizer auf LinkedIn https://www.linkedin.com/company-beta/10039182/

Folgen Sie uns auf Twitter https://twitter.com/recogizer
 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] is arbiter configuration needed?

2017-06-23 Thread Erekle Magradze

Hello,
I am using glusterfs as the storage backend for the VM images; the volumes 
for oVirt consist of three bricks. Is it still necessary to configure 
an arbiter to be on the safe side, or, since the number of bricks is odd, 
is it handled out of the box?

Thanks in advance
Cheers
Erekle
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users