[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-16 Thread Leo David
Just thinking:
Maybe different options were configured on the volumes during the update, and
that could make them unstable?
e.g. sharding or something else
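
A quick way to check that (a rough sketch; "data" is a placeholder volume name,
run on one of the gluster hosts):

gluster volume info data                      # options explicitly set on the volume, e.g. features.shard
gluster volume get data all | grep -i shard   # effective values of the sharding-related options

Comparing that output across hosts, or against a pre-upgrade note of the
settings if you kept one, would show whether anything changed.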

On Sat, Feb 16, 2019, 13:26 Darryl Scott wrote:
> Sandro
>
>
> I don't have ovirt-log-collector on my ovirt engine.  How can obtain?  I
> see a github repo to make file, I do not want to be making files on my
> ovirt-engine, just not yet, I could possible on weekend.
>
>
> Where can I obtain the ovirt-log-collector?
>
>
>
> --
> *From:* Sandro Bonazzola 
> *Sent:* Thursday, February 14, 2019 9:16:05 AM
> *To:* Jayme
> *Cc:* Darryl Scott; users
> *Subject:* Re: [ovirt-users] Re: Ovirt Cluster completely unstable
>
>
>
> On Thu, 14 Feb 2019 at 07:54, Jayme  wrote:
>
> I have a three node HCI gluster which was previously running 4.2 with zero
> problems.  I just upgraded it yesterday.  I ran in to a few bugs right away
> with the upgrade process, but aside from that I also discovered other users
> with severe GlusterFS problems since the upgrade to new GlusterFS version.
> It is less than 24 hours since I upgrade my cluster and I just got a notice
> that one of my GlusterFS bricks is offline.  There does appear to be a very
> real and serious issue here with the latest updates.
>
>
> tracking the issue on Gluster side on this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1677160
> If you can help Gluster community providing requested logs it would be
> great.
>
>
>
>
>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts both,
> compute and storage disconnecting.  AND many vms disconnecting and becoming
> unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/
>
>
>
> --
>
> SANDRO BONAZZOLA
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA <https://www.redhat.com/>
>
> sbona...@redhat.com
> <https://red.ht/sig>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ECPLXX5JIG5VCIQZDH5KWTWOCXGJYD6Z/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BDMYNYUGH72J2OQAWHHK5DJI2HVCIJH6/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-16 Thread Darryl Scott
Sandro


I don't have ovirt-log-collector on my ovirt engine.  How can I obtain it?  I see a
github repo I could build it from with make, but I do not want to be making files on my
ovirt-engine, just not yet; I could possibly do it on the weekend.


Where can I obtain the ovirt-log-collector?




From: Sandro Bonazzola 
Sent: Thursday, February 14, 2019 9:16:05 AM
To: Jayme
Cc: Darryl Scott; users
Subject: Re: [ovirt-users] Re: Ovirt Cluster completely unstable



On Thu, 14 Feb 2019 at 07:54, Jayme <jay...@gmail.com> wrote:
I have a three node HCI gluster which was previously running 4.2 with zero 
problems.  I just upgraded it yesterday.  I ran in to a few bugs right away 
with the upgrade process, but aside from that I also discovered other users 
with severe GlusterFS problems since the upgrade to new GlusterFS version.  It 
is less than 24 hours since I upgrade my cluster and I just got a notice that 
one of my GlusterFS bricks is offline.  There does appear to be a very real and 
serious issue here with the latest updates.

tracking the issue on Gluster side on this bug: 
https://bugzilla.redhat.com/show_bug.cgi?id=1677160
If you can help Gluster community providing requested logs it would be great.





On Wed, Feb 13, 2019 at 7:26 PM <dsc...@umbctraining.com> wrote:
I'm abandoning my production ovirt cluster due to instability.   I have a 7 
host cluster running about 300 vms and have been for over a year.  It has 
become unstable over the past three days.  I have random hosts both, compute 
and storage disconnecting.  AND many vms disconnecting and becoming unusable.

7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts 
running 3.12.5.  I submitted a bugzilla bug and they immediately assigned it to 
the storage people but have not responded with any meaningful information.  I 
have submitted several logs.

I have found some discussion on problems with instability with gluster 3.12.5.  
I would be willing to upgrade my gluster to a more stable version if that's the 
culprit.  I installed gluster using the ovirt gui and this is the version the 
ovirt gui installed.

Is there an ovirt health monitor available?  Where should I be looking to get a 
resolution the problems I'm facing.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/


--

SANDRO BONAZZOLA

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA<https://www.redhat.com/>

sbona...@redhat.com

<https://red.ht/sig>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ECPLXX5JIG5VCIQZDH5KWTWOCXGJYD6Z/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-15 Thread Ryan Bullock
Just to add, after we updated to 4.3 our gluster just went south.
Thankfully gluster is only secondary storage for us, and our primary
storage is an iSCSI SAN.  We migrated everything over to the SAN that we
could, but a few VMs got corrupted by gluster (data was gone). Right now
we just have gluster off and set to maintenance because the connectivity
issues were causing our main cluster to continuously migrate VMs.

Looking at the gluster hosts themselves I noticed that heal info would
often report one brick down, even if ovirt didn't. Checking the status of
glusterd would also show health checks failed.

Feb  6 15:36:37 vmc3h1 glusterfs-virtstore[17036]: [2019-02-06
23:36:37.937041] M [MSGID: 113075]
[posix-helpers.c:1957:posix_health_check_thread_proc] 0-VirstStore-posix:
health-check failed, going down
Feb  6 15:36:37 vmc3h1 glusterfs-virtstore[17036]: [2019-02-06
23:36:37.937561] M [MSGID: 113075]
[posix-helpers.c:1975:posix_health_check_thread_proc] 0-VirstStore-posix:
still alive! -> SIGTERM

I think the health-check is failing (maybe erroneously), which is then
killing the brick. When this happens it just causes a continuous cycle of
brick up/down and healing, and in turn connectivity issues.
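
To narrow that down I would look at something like this on each gluster host (a
rough sketch; substitute your real volume name for VOLNAME):

gluster volume status VOLNAME        # what glusterd thinks about each brick and its port
gluster volume heal VOLNAME info     # entries still pending heal per brick
# as a diagnostic only: relax the posix health check that kills the brick
# (default interval is 30 seconds, 0 disables the check entirely)
gluster volume set VOLNAME storage.health-check-interval 0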

This is our second time running into issues with gluster, so I think we are
going to sideline it for a while.

-Ryan

On Thu, Feb 14, 2019 at 12:47 PM Darryl Scott 
wrote:

> I do believe something went wrong after fully updating everything last
> Friday.  I updated all the ovirt compute nodes on Friday and gluster/engine
> on Saturday.  I have been experiencing these issues every since.  I have
> pour over engine.log and seems to be connection to storage issue.
>
>
> --
> *From:* Jayme 
> *Sent:* Thursday, February 14, 2019 1:52:59 AM
> *To:* Darryl Scott
> *Cc:* users
> *Subject:* Re: [ovirt-users] Ovirt Cluster completely unstable
>
> I have a three node HCI gluster which was previously running 4.2 with zero
> problems.  I just upgraded it yesterday.  I ran in to a few bugs right away
> with the upgrade process, but aside from that I also discovered other users
> with severe GlusterFS problems since the upgrade to new GlusterFS version.
> It is less than 24 hours since I upgrade my cluster and I just got a notice
> that one of my GlusterFS bricks is offline.  There does appear to be a very
> real and serious issue here with the latest updates.
>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts both,
> compute and storage disconnecting.  AND many vms disconnecting and becoming
> unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/IMUKFFANNJXLKXNVGMMJ6Y7MOLW2CQE3/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MVHTTLRZRN4XJA6BXCQQ5NIVPH7SPJCU/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Darryl Scott
I do believe something went wrong after fully updating everything last Friday.  
I updated all the ovirt compute nodes on Friday and gluster/engine on Saturday. 
I have been experiencing these issues ever since.  I have pored over
engine.log and it seems to be a connection-to-storage issue.



From: Jayme 
Sent: Thursday, February 14, 2019 1:52:59 AM
To: Darryl Scott
Cc: users
Subject: Re: [ovirt-users] Ovirt Cluster completely unstable

I have a three node HCI gluster which was previously running 4.2 with zero 
problems.  I just upgraded it yesterday.  I ran in to a few bugs right away 
with the upgrade process, but aside from that I also discovered other users 
with severe GlusterFS problems since the upgrade to new GlusterFS version.  It 
is less than 24 hours since I upgrade my cluster and I just got a notice that 
one of my GlusterFS bricks is offline.  There does appear to be a very real and 
serious issue here with the latest updates.


On Wed, Feb 13, 2019 at 7:26 PM <dsc...@umbctraining.com> wrote:
I'm abandoning my production ovirt cluster due to instability.   I have a 7 
host cluster running about 300 vms and have been for over a year.  It has 
become unstable over the past three days.  I have random hosts both, compute 
and storage disconnecting.  AND many vms disconnecting and becoming unusable.

7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts 
running 3.12.5.  I submitted a bugzilla bug and they immediately assigned it to 
the storage people but have not responded with any meaningful information.  I 
have submitted several logs.

I have found some discussion on problems with instability with gluster 3.12.5.  
I would be willing to upgrade my gluster to a more stable version if that's the 
culprit.  I installed gluster using the ovirt gui and this is the version the 
ovirt gui installed.

Is there an ovirt health monitor available?  Where should I be looking to get a 
resolution the problems I'm facing.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to 
users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IMUKFFANNJXLKXNVGMMJ6Y7MOLW2CQE3/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Donny Davis
Is it this bug?

https://bugzilla.redhat.com/show_bug.cgi?id=1651246


On Thu, Feb 14, 2019 at 11:50 AM Jayme  wrote:

> [2019-02-14 02:20:29.611099] I [login.c:110:gf_auth] 0-auth/login: allowed
> user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
> [2019-02-14 02:20:29.611131] I [MSGID: 115029]
> [server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
> client from
> CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> (version: 5.3)
> [2019-02-14 02:20:29.619521] I [MSGID: 115036]
> [server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
> connection from
> CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> [2019-02-14 02:20:29.619867] I [MSGID: 101055]
> [client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
> connection
> CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 19988 times between [2019-02-14 02:19:31.377315] and
> [2019-02-14 02:21:14.033991]
> [2019-02-14 02:21:30.303440] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 10 times between [2019-02-14 02:21:30.303440] and
> [2019-02-14 02:23:20.421140]
> [2019-02-14 02:23:33.142281] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 34 times between [2019-02-14 02:23:33.142281] and
> [2019-02-14 02:25:29.115156]
> [2019-02-14 02:25:30.326469] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-02-14 02:25:53.973830] I [addr.c:54:compare_addr_and_update]
> 0-/gluster_bricks/non_prod_b/non_prod_b: allowed = "*", received addr =
> "10.11.0.222"
> [2019-02-14 02:25:53.973896] I [login.c:110:gf_auth] 0-auth/login: allowed
> user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
> [2019-02-14 02:25:53.973928] I [MSGID: 115029]
> [server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
> client from
> CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> (version: 5.3)
> [2019-02-14 02:25:54.627728] I [MSGID: 115036]
> [server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
> connection from
> CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> [2019-02-14 02:25:54.628149] I [MSGID: 101055]
> [client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
> connection
> CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> [2019-02-14 02:25:56.396855] I [addr.c:54:compare_addr_and_update]
> 0-/gluster_bricks/non_prod_b/non_prod_b: allowed = "*", received addr =
> "10.11.0.220"
> [2019-02-14 02:25:56.396926] I [login.c:110:gf_auth] 0-auth/login: allowed
> user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
> [2019-02-14 02:25:56.396957] I [MSGID: 115029]
> [server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
> client from
> CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> (version: 5.3)
> [2019-02-14 02:25:56.404566] I [MSGID: 115036]
> [server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
> connection from
> CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> [2019-02-14 02:25:56.404866] I [MSGID: 101055]
> [client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
> connection
> CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 25 times between [2019-02-14 02:25:30.326469] and
> [2019-02-14 02:27:25.965601]
> [2019-02-14 02:28:10.538374] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 38 times between [2019-02-14 02:28:10.538374] and
> [2019-02-14 02:29:22.622679]
> [2019-02-14 02:29:48.891040] E [MSGID: 101191]
> [event

[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Jayme
[2019-02-14 02:20:29.611099] I [login.c:110:gf_auth] 0-auth/login: allowed
user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
[2019-02-14 02:20:29.611131] I [MSGID: 115029]
[server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
client from
CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
(version: 5.3)
[2019-02-14 02:20:29.619521] I [MSGID: 115036]
[server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
connection from
CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
[2019-02-14 02:20:29.619867] I [MSGID: 101055]
[client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
connection
CTX_ID:ee716e24-e187-4b57-a371-cab544f41162-GRAPH_ID:0-PID:30671-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 19988 times between [2019-02-14 02:19:31.377315] and
[2019-02-14 02:21:14.033991]
[2019-02-14 02:21:30.303440] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 10 times between [2019-02-14 02:21:30.303440] and
[2019-02-14 02:23:20.421140]
[2019-02-14 02:23:33.142281] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 34 times between [2019-02-14 02:23:33.142281] and
[2019-02-14 02:25:29.115156]
[2019-02-14 02:25:30.326469] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-02-14 02:25:53.973830] I [addr.c:54:compare_addr_and_update]
0-/gluster_bricks/non_prod_b/non_prod_b: allowed = "*", received addr =
"10.11.0.222"
[2019-02-14 02:25:53.973896] I [login.c:110:gf_auth] 0-auth/login: allowed
user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
[2019-02-14 02:25:53.973928] I [MSGID: 115029]
[server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
client from
CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
(version: 5.3)
[2019-02-14 02:25:54.627728] I [MSGID: 115036]
[server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
connection from
CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
[2019-02-14 02:25:54.628149] I [MSGID: 101055]
[client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
connection
CTX_ID:4a6b8860-8274-4b3b-b400-d66cbfb97349-GRAPH_ID:0-PID:33522-HOST:host2.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
[2019-02-14 02:25:56.396855] I [addr.c:54:compare_addr_and_update]
0-/gluster_bricks/non_prod_b/non_prod_b: allowed = "*", received addr =
"10.11.0.220"
[2019-02-14 02:25:56.396926] I [login.c:110:gf_auth] 0-auth/login: allowed
user names: 7b741fe4-72ca-41ba-8efb-7add1e4fe6f3
[2019-02-14 02:25:56.396957] I [MSGID: 115029]
[server-handshake.c:537:server_setvolume] 0-non_prod_b-server: accepted
client from
CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
(version: 5.3)
[2019-02-14 02:25:56.404566] I [MSGID: 115036]
[server.c:469:server_rpc_notify] 0-non_prod_b-server: disconnecting
connection from
CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
[2019-02-14 02:25:56.404866] I [MSGID: 101055]
[client_t.c:435:gf_client_unref] 0-non_prod_b-server: Shutting down
connection
CTX_ID:963c2196-108c-485d-aca6-a236906d2acf-GRAPH_ID:0-PID:33635-HOST:host0.replaced.domain.com-PC_NAME:non_prod_b-client-2-RECON_NO:-0
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 25 times between [2019-02-14 02:25:30.326469] and
[2019-02-14 02:27:25.965601]
[2019-02-14 02:28:10.538374] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 38 times between [2019-02-14 02:28:10.538374] and
[2019-02-14 02:29:22.622679]
[2019-02-14 02:29:48.891040] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-02-14 02:29:56.026002] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-02-14 02:31:22.494824] I [addr.c:54:compare_addr_and_update]
0-/gluster_b

[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Sahina Bose
On Thu, Feb 14, 2019 at 8:24 PM Jayme  wrote:

> https://bugzilla.redhat.com/show_bug.cgi?id=1677160 doesn't seem relevant
> to me?  Is that the correct link?
>
> Like I mentioned in a previous email I'm also having problems with Gluster
> bricks going offline since upgrading to oVirt 4.3 yesterday (previously
> I've never had a single issue with gluster nor have had a brick ever go
> down).  I suspect this will continue to happen daily as some other users on
> this group have suggested.  I was able to pull some logs from engine and
> gluster from around the time the brick dropped.  My setup is 3 node HCI and
> I was previously running the latest 4.2 updates (before upgrading to 4.3).
> My hardware is has a lot of overhead and I'm on 10Gbe gluster backend (the
> servers were certainly not under any significant amount of load when the
> brick went offline).  To recover I had to place the host in maintenance
> mode and reboot (although I suspect I could have simply unmounted and
> remounted gluster mounts).
>

Anything in the brick logs? The logs below only indicate that the engine
detected that the brick was down. To get to why the brick was marked down, the
brick logs would help.
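
For reference, the brick logs normally sit on the gluster host under
/var/log/glusterfs/bricks/, named after the brick path, so something like this
(the exact file name below is a guess based on the brick path above):

ls /var/log/glusterfs/bricks/
grep "2019-02-14 02:4" /var/log/glusterfs/bricks/gluster_bricks-non_prod_b-non_prod_b.log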


> grep "2019-02-14" engine.log-20190214 | grep "GLUSTER_BRICK_STATUS_CHANGED"
> 2019-02-14 02:41:48,018-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler1) [5ff5b093] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume
> non_prod_b of cluster Default from UP to DOWN via cli.
> 2019-02-14 03:20:11,189-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/engine/engine of volume engine
> of cluster Default from DOWN to UP via cli.
> 2019-02-14 03:20:14,819-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/prod_b/prod_b of volume prod_b
> of cluster Default from DOWN to UP via cli.
> 2019-02-14 03:20:19,692-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/isos/isos of volume isos of
> cluster Default from DOWN to UP via cli.
> 2019-02-14 03:20:25,022-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/prod_a/prod_a of volume prod_a
> of cluster Default from DOWN to UP via cli.
> 2019-02-14 03:20:29,088-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume
> non_prod_b of cluster Default from DOWN to UP via cli.
> 2019-02-14 03:20:34,099-04 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler3) [760f7851] EVENT_ID:
> GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
> host2.replaced.domain.com:/gluster_bricks/non_prod_a/non_prod_a of volume
> non_prod_a of cluster Default from DOWN to UP via cli
>
> glusterd.log
>
> # grep -B20 -A20 "2019-02-14 02:41" glusterd.log
> [2019-02-14 02:36:49.585034] I [MSGID: 106499]
> [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume non_prod_b
> [2019-02-14 02:36:49.597788] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 2 times between [2019-02-14 02:36:49.597788] and
> [2019-02-14 02:36:49.900505]
> [2019-02-14 02:36:53.437539] I [MSGID: 106499]
> [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume non_prod_a
> [2019-02-14 02:36:53.452816] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-02-14 02:36:53.864153] I [MSGID: 106499]
> [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume non_prod_a
> [2019-02-14 02:36:53.875835] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
> [2019-02-14 02:36:30.95864

[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Oliver Riesener

Hi Jayme,

btw. in the past there was a long hunt for gluster problems on this list.
As resolution, there was a failed single disk drive on one gluster host.
The drive was directly connected, without a controller or SMART checks,
so no alert was generated, only gluster problems for days.

Please check the *physical* existence and online status of your gluster drives.
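
A minimal sketch, assuming smartmontools is installed and /dev/sdb is one of the
brick disks (adjust device and mount path to your layout):

smartctl -H /dev/sdb                                             # overall health verdict
smartctl -a /dev/sdb | grep -iE 'reallocated|pending|uncorrect'  # common failure counters
grep gluster_bricks /proc/mounts                                 # brick filesystems still mounted?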

my two cents

Oliver
 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CBZIZPNEJ5B2L4VSASAQ6LO7P3WODDHZ/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Jayme
Oliver,

Thanks for the input. I do recall reading that thread before; I'm 99.9%
sure it's not the problem here, but I will double check, if only to rule
it out.  These bricks are new enterprise SSDs that are less than 3 months
old with almost 0 wear on them, and the issues I'm experiencing only started
within hours after upgrading my environment to oVirt 4.3 (and it is the same
issue other users are complaining about regarding gluster bricks going
offline after upgrading to 4.3).  I think it's fairly clear that there is a
gluster problem in play.

On Thu, Feb 14, 2019 at 11:45 AM Oliver Riesener <
oliver.riese...@hs-bremen.de> wrote:

>
> Hi Jayme,
>
> btw. in the past there was  a long hunting for gluster problems on this
> list.
> as resolution, there was a failed single disk drive on one gluster host.
> the drive was direct connected without controller and smart checks,
> so no alert was generated, only gluster problems over days.
>
> please check you *physical* existents and online status of your gluster
> drives.
>
> my two cents
>
> Oliver
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LDFUIJFUZGAVVO4EVTSVXZGUCUAKSIDP/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Sandro Bonazzola
On Thu, 14 Feb 2019 at 16:12, Darryl Scott <dsc...@umbctraining.com> wrote:

> Sandro
>
>
> I don't have ovirt-log-collector on my ovirt engine.  How can obtain?  I
> see a github repo to make file, I do not want to be making files on my
> ovirt-engine, just not yet, I could possible on weekend.
>

> Where can I obtain the ovirt-log-collector?
>
>
>
Just "yum install ovirt-log-collector" should install it for you :-)


>
> --
> *From:* Sandro Bonazzola 
> *Sent:* Thursday, February 14, 2019 9:16:05 AM
> *To:* Jayme
> *Cc:* Darryl Scott; users
> *Subject:* Re: [ovirt-users] Re: Ovirt Cluster completely unstable
>
>
>
> On Thu, 14 Feb 2019 at 07:54, Jayme  wrote:
>
> I have a three node HCI gluster which was previously running 4.2 with zero
> problems.  I just upgraded it yesterday.  I ran in to a few bugs right away
> with the upgrade process, but aside from that I also discovered other users
> with severe GlusterFS problems since the upgrade to new GlusterFS version.
> It is less than 24 hours since I upgrade my cluster and I just got a notice
> that one of my GlusterFS bricks is offline.  There does appear to be a very
> real and serious issue here with the latest updates.
>
>
> tracking the issue on Gluster side on this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1677160
> If you can help Gluster community providing requested logs it would be
> great.
>
>
>
>
>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts both,
> compute and storage disconnecting.  AND many vms disconnecting and becoming
> unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/
>
>
>
> --
>
> SANDRO BONAZZOLA
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA <https://www.redhat.com/>
>
> sbona...@redhat.com
> <https://red.ht/sig>
>


-- 

SANDRO BONAZZOLA

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA <https://www.redhat.com/>

sbona...@redhat.com
<https://red.ht/sig>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EEVEUNX7IKS6LM7TV45475GTKAY5R4J3/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Jayme
https://bugzilla.redhat.com/show_bug.cgi?id=1677160 doesn't seem relevant
to me?  Is that the correct link?

Like I mentioned in a previous email I'm also having problems with Gluster
bricks going offline since upgrading to oVirt 4.3 yesterday (previously
I've never had a single issue with gluster nor have I ever had a brick go
down).  I suspect this will continue to happen daily as some other users on
this group have suggested.  I was able to pull some logs from engine and
gluster from around the time the brick dropped.  My setup is 3 node HCI and
I was previously running the latest 4.2 updates (before upgrading to 4.3).
My hardware has a lot of overhead and I'm on a 10GbE gluster backend (the
servers were certainly not under any significant amount of load when the
brick went offline).  To recover I had to place the host in maintenance
mode and reboot (although I suspect I could have simply unmounted and
remounted gluster mounts).
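
(In hindsight, a gentler recovery might have been to restart just the dead brick
process instead of rebooting; a sketch with this volume's name, untested on 4.3:

gluster volume status non_prod_b        # confirm which brick shows N under Online
gluster volume start non_prod_b force   # restarts only the brick processes that are down
gluster volume heal non_prod_b info     # then watch self-heal catch up
)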

grep "2019-02-14" engine.log-20190214 | grep "GLUSTER_BRICK_STATUS_CHANGED"
2019-02-14 02:41:48,018-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler1) [5ff5b093] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume
non_prod_b of cluster Default from UP to DOWN via cli.
2019-02-14 03:20:11,189-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/engine/engine of volume engine of
cluster Default from DOWN to UP via cli.
2019-02-14 03:20:14,819-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/prod_b/prod_b of volume prod_b of
cluster Default from DOWN to UP via cli.
2019-02-14 03:20:19,692-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/isos/isos of volume isos of
cluster Default from DOWN to UP via cli.
2019-02-14 03:20:25,022-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/prod_a/prod_a of volume prod_a of
cluster Default from DOWN to UP via cli.
2019-02-14 03:20:29,088-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/non_prod_b/non_prod_b of volume
non_prod_b of cluster Default from DOWN to UP via cli.
2019-02-14 03:20:34,099-04 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler3) [760f7851] EVENT_ID:
GLUSTER_BRICK_STATUS_CHANGED(4,086), Detected change in status of brick
host2.replaced.domain.com:/gluster_bricks/non_prod_a/non_prod_a of volume
non_prod_a of cluster Default from DOWN to UP via cli

glusterd.log

# grep -B20 -A20 "2019-02-14 02:41" glusterd.log
[2019-02-14 02:36:49.585034] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume non_prod_b
[2019-02-14 02:36:49.597788] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 2 times between [2019-02-14 02:36:49.597788] and
[2019-02-14 02:36:49.900505]
[2019-02-14 02:36:53.437539] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.452816] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-02-14 02:36:53.864153] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume non_prod_a
[2019-02-14 02:36:53.875835] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-02-14 02:36:30.958649] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume engine
[2019-02-14 02:36:35.322129] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume prod_b
[2019-02-14 02:36:39.639645] I [MSGID: 106499]
[glusterd-handler.c:4389:__glusterd_

[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Sandro Bonazzola
On Thu, 14 Feb 2019 at 07:54, Jayme  wrote:

> I have a three node HCI gluster which was previously running 4.2 with zero
> problems.  I just upgraded it yesterday.  I ran in to a few bugs right away
> with the upgrade process, but aside from that I also discovered other users
> with severe GlusterFS problems since the upgrade to new GlusterFS version.
> It is less than 24 hours since I upgrade my cluster and I just got a notice
> that one of my GlusterFS bricks is offline.  There does appear to be a very
> real and serious issue here with the latest updates.
>

Tracking the issue on the Gluster side in this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1677160
If you can help the Gluster community by providing the requested logs, it would be
great.
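
For anyone gathering them, a minimal sketch run on each gluster host, assuming
the default log location:

tar czf gluster-logs-$(hostname)-$(date +%F).tar.gz /var/log/glusterfs

and attach the resulting archive to the bug.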




>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
>> I'm abandoning my production ovirt cluster due to instability.   I have a
>> 7 host cluster running about 300 vms and have been for over a year.  It has
>> become unstable over the past three days.  I have random hosts both,
>> compute and storage disconnecting.  AND many vms disconnecting and becoming
>> unusable.
>>
>> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
>> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
>> it to the storage people but have not responded with any meaningful
>> information.  I have submitted several logs.
>>
>> I have found some discussion on problems with instability with gluster
>> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
>> if that's the culprit.  I installed gluster using the ovirt gui and this is
>> the version the ovirt gui installed.
>>
>> Is there an ovirt health monitor available?  Where should I be looking to
>> get a resolution the problems I'm facing.
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/
>


-- 

SANDRO BONAZZOLA

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FXNK7NRK3XJRN5ZOYWOTKGMT273NFFZ7/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Sahina Bose
On Thu, Feb 14, 2019 at 4:56 AM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I have a 7 
> host cluster running about 300 vms and have been for over a year.  It has 
> become unstable over the past three days.  I have random hosts both, compute 
> and storage disconnecting.  AND many vms disconnecting and becoming unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts 
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned it 
> to the storage people but have not responded with any meaningful information. 
>  I have submitted several logs.

Can you point to the bug filed?
+Krutika Dhananjay to look at it

>
> I have found some discussion on problems with instability with gluster 
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version if 
> that's the culprit.  I installed gluster using the ovirt gui and this is the 
> version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to get 
> a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NFFMACCICXTDNEFKRDATHQ44MX44YDVX/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-14 Thread Ralf Schenk
Hello,

my problems with gluster started with 4.2.6 or 4.2.7, around the end of
September. I still have VMs paused every other day, and they are
reactivated either by HA or manually. So I can confirm your
experience. Even though I'm using bonded network connections, there are
communication problems without heavy load or other tasks running.

My new cluster on EPYC hardware running on NFS 4.2 storage volumes based
on ZFS runs rock solid, and VMs are much faster regarding I/O. Gluster
3.12.5 sucks!

Bye

On 14.02.2019 at 07:52, Jayme wrote:
> I have a three node HCI gluster which was previously running 4.2 with
> zero problems.  I just upgraded it yesterday.  I ran in to a few bugs
> right away with the upgrade process, but aside from that I also
> discovered other users with severe GlusterFS problems since the
> upgrade to new GlusterFS version.  It is less than 24 hours since I
> upgrade my cluster and I just got a notice that one of my GlusterFS
> bricks is offline.  There does appear to be a very real and serious
> issue here with the latest updates.
>
>
> On Wed, Feb 13, 2019 at 7:26 PM  wrote:
>
> I'm abandoning my production ovirt cluster due to instability.   I
> have a 7 host cluster running about 300 vms and have been for over
> a year.  It has become unstable over the past three days.  I have
> random hosts both, compute and storage disconnecting.  AND many
> vms disconnecting and becoming unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs
> hosts running 3.12.5.  I submitted a bugzilla bug and they
> immediately assigned it to the storage people but have not
> responded with any meaningful information.  I have submitted
> several logs. 
>
> I have found some discussion on problems with instability with
> gluster 3.12.5.  I would be willing to upgrade my gluster to a
> more stable version if that's the culprit.  I installed gluster
> using the ovirt gui and this is the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be
> looking to get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org 
> To unsubscribe send an email to users-le...@ovirt.org
> 
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/
-- 


*Ralf Schenk*
fon +49 (0) 24 05 / 40 83 70
fax +49 (0) 24 05 / 40 83 759
mail *r...@databay.de*
*Databay AG*
Jens-Otto-Krag-Straße 11
D-52146 Würselen
*www.databay.de* 

Sitz/Amtsgericht Aachen • HRB:8437 • USt-IdNr.: DE 210844202
Vorstand: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm.
Philipp Hermanns
Aufsichtsratsvorsitzender: Wilhelm Dohmen


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RLO37RGAEVJHRQW3YJT4LE5I7X52CUPU/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-13 Thread Jayme
I have a three node HCI gluster which was previously running 4.2 with zero
problems.  I just upgraded it yesterday.  I ran into a few bugs right away
with the upgrade process, but aside from that I also discovered other users
with severe GlusterFS problems since the upgrade to the new GlusterFS version.
It is less than 24 hours since I upgraded my cluster and I just got a notice
that one of my GlusterFS bricks is offline.  There does appear to be a very
real and serious issue here with the latest updates.


On Wed, Feb 13, 2019 at 7:26 PM  wrote:

> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts both,
> compute and storage disconnecting.  AND many vms disconnecting and becoming
> unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QULCBXHTKSCPKH4UV6GLMOLJE6J7M5UW/


[ovirt-users] Re: Ovirt Cluster completely unstable

2019-02-13 Thread Leo David
Hi,
I would have a look at engine.log; it might provide useful information.
Also, I would test a different storage type (maybe a quick NFS data domain)
and see if the problem persists with that one too.
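
Something along these lines on the engine machine would be a starting point (a
rough sketch, assuming the default log path):

grep -iE 'error|warn' /var/log/ovirt-engine/engine.log | tail -n 50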


On Thu, Feb 14, 2019, 01:26  wrote:

> I'm abandoning my production ovirt cluster due to instability.   I have a
> 7 host cluster running about 300 vms and have been for over a year.  It has
> become unstable over the past three days.  I have random hosts both,
> compute and storage disconnecting.  AND many vms disconnecting and becoming
> unusable.
>
> 7 host are 4 compute hosts running Ovirt 4.2.8 and three glusterfs hosts
> running 3.12.5.  I submitted a bugzilla bug and they immediately assigned
> it to the storage people but have not responded with any meaningful
> information.  I have submitted several logs.
>
> I have found some discussion on problems with instability with gluster
> 3.12.5.  I would be willing to upgrade my gluster to a more stable version
> if that's the culprit.  I installed gluster using the ovirt gui and this is
> the version the ovirt gui installed.
>
> Is there an ovirt health monitor available?  Where should I be looking to
> get a resolution the problems I'm facing.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BL4M3JQA3IEXCQUY4IGQXOAALRUQ7TVB/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OT2IJ7TJXFJ5BA5POEPHCDYI6LRKVGZT/