On Sun, Oct 28, 2018 at 5:17 PM fsoyer <[email protected]> wrote:
>
> Well guys,
> I can say now that I have a real problem, maybe between oVirt and the Gluster
> storage, but I can't be sure. Yesterday, I wanted to clone a VM (named
> "crij2") from a snapshot, but (this is another problem, I think) the UI never
> gave me the popup (a blank window with the cursor, then a 400 message after a
> timeout). So I decided to export it, then import it.
> The export/import finally worked, but while it was running, some VMs
> randomly became unresponsive, and one restarted on error. At that time, the
> engine was on the "ginger" node. Copy of the event log:
>
> 27 oct. 2018 20:32:12 VM crij2 started on Host victor.local.systea.fr
> 27 oct. 2018 20:31:37 VM crij2 was started by admin@internal-authz (Host: victor.local.systea.fr).
> 27 oct. 2018 20:26:39 Vm crij2 was imported successfully to Data Center Default, Cluster Default
> 27 oct. 2018 20:22:53 VM logcollector is not responding.
> 27 oct. 2018 20:22:10 VM Sogov3 is not responding.
> 27 oct. 2018 20:17:53 VM cerbere4 is not responding.
> 27 oct. 2018 20:17:49 VM cerbere3 is not responding.
> 27 oct. 2018 20:17:48 VM logcollector is not responding.
> 27 oct. 2018 20:16:38 VM Sogov3 is not responding.
> 27 oct. 2018 20:16:38 VM cerbere4 is not responding.
> 27 oct. 2018 20:16:38 VM op2drugs1 is not responding.
> 27 oct. 2018 20:16:33 VM cerbere3 is not responding.
> 27 oct. 2018 20:07:30 VM op2drugs1 is not responding.
> 27 oct. 2018 20:06:14 VM cerbere3 is not responding.
> 27 oct. 2018 20:02:27 VM cerbere3 is not responding.
> 27 oct. 2018 20:01:11 VM logcollector is not responding.
> 27 oct. 2018 20:00:56 VM zabbix is not responding.
> 27 oct. 2018 19:57:42 VM zabbix is not responding.
> 27 oct. 2018 19:57:42 VM cerbere3 is not responding.
> 27 oct. 2018 19:57:42 VM logcollector is not responding.
> 27 oct. 2018 19:54:40 VM zabbix is not responding.
> 27 oct. 2018 19:53:25 VM cerbere3 is not responding.
> 27 oct. 2018 19:53:25 VM cerbere4 is not responding.
> 27 oct. 2018 19:48:29 Starting to import Vm crij2 to Data Center Default, Cluster Default
> 27 oct. 2018 19:47:41 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 19:46:46 VM crij2 was renamed from crij2 to crij2_ok by admin.
> 27 oct. 2018 19:46:46 VM crij2 configuration was updated by admin@internal-authz.
> 27 oct. 2018 19:46:12 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 19:42:36 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 19:37:22 Vm crij2 was exported successfully to EXPORT
> 27 oct. 2018 19:36:04 VM HostedEngine is not responding.
> 27 oct. 2018 19:33:03 VM op2drugs1 is not responding.
> 27 oct. 2018 19:32:48 VM altern8 is not responding.
> 27 oct. 2018 19:32:48 VM patjoub1 is not responding.
> 27 oct. 2018 19:31:03 VM op2drugs1 is not responding.
> 27 oct. 2018 19:30:48 VM altern8 is not responding.
> 27 oct. 2018 19:30:48 VM patjoub1 is not responding.
> 27 oct. 2018 19:28:37 VM Sogov3 is not responding.
> 27 oct. 2018 19:28:07 VM altern8 is not responding.
> 27 oct. 2018 19:28:07 VM op2drugs1 is not responding.
> 27 oct. 2018 19:28:07 VM patjoub1 is not responding.
> 27 oct. 2018 19:25:10 VM Mint19 is not responding.
> 27 oct. 2018 19:25:10 VM zabbix is not responding.
> 27 oct. 2018 19:24:55 VM HostedEngine is not responding.
> 27 oct. 2018 19:23:33 VM op2drugs1 is not responding.
> 27 oct. 2018 19:23:18 VM altern8 is not responding.
> 27 oct. 2018 19:23:18 VM patjoub1 is not responding.
> 27 oct. 2018 19:21:52 VM op2drugs1 is not responding.
> 27 oct. 2018 19:20:06 VM patjoub1 is not responding.
> 27 oct. 2018 19:19:51 VM Sogov3 is not responding.
> 27 oct. 2018 19:18:26 Host ginger.local.systea.fr power management was verified successfully.
> 27 oct. 2018 19:18:26 Status of host ginger.local.systea.fr was set to Up.
> 27 oct. 2018 19:18:25 Manually synced the storage devices from host ginger.local.systea.fr
> 27 oct. 2018 19:17:51 Executing power management status on Host ginger.local.systea.fr using Proxy Host victor.local.systea.fr and Fence Agent ipmilan:10.0.0.225.
> 27 oct. 2018 19:17:39 Host ginger.local.systea.fr is not responding. It will stay in Connecting state for a grace period of 82 seconds and after that an attempt to fence the host will be issued.
> 27 oct. 2018 19:17:21 VM altern8 is not responding.
> 27 oct. 2018 19:17:21 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host ginger.local.systea.fr, Error: Network error during communication with the Host.).
> 27 oct. 2018 19:17:21 VM patjoub1 is not responding.
> 27 oct. 2018 19:17:20 VM HostedEngine is not responding.
> 27 oct. 2018 19:17:20 VM op2drugs1 is not responding.
> 27 oct. 2018 19:17:19 VDSM ginger.local.systea.fr command SpmStatusVDS failed: Connection timeout for host 'ginger.local.systea.fr', last response arrived 17279 ms ago.
> 27 oct. 2018 19:16:16 Failed to update VMs/Templates OVF data for Storage Domain DATA02 in Data Center Default.
> 27 oct. 2018 19:16:16 Failed to update OVF disks 85d67951-d610-49b3-aaab-a81850621e35, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain DATA02).
> 27 oct. 2018 19:16:16 VDSM command SetVolumeDescriptionVDS failed: Resource timeout: ()
> 27 oct. 2018 19:16:16 VM patjoub1 is not responding.
> 27 oct. 2018 19:16:16 VM op2drugs1 is not responding.
> 27 oct. 2018 19:14:46 VM patjoub1 is not responding.
> 27 oct. 2018 19:14:46 VM op2drugs1 is not responding.
> 27 oct. 2018 19:13:18 Host ginger.local.systea.fr power management was verified successfully.
> 27 oct. 2018 19:13:18 Status of host ginger.local.systea.fr was set to Up.
> 27 oct. 2018 19:13:03 Manually synced the storage devices from host ginger.local.systea.fr
> 27 oct. 2018 19:12:51 VM altern8 is not responding.
> 27 oct. 2018 19:12:51 VM HostedEngine is not responding.
> 27 oct. 2018 19:12:51 VM op2drugs1 is not responding.
> 27 oct. 2018 19:12:48 Executing power management status on Host ginger.local.systea.fr using Proxy Host victor.local.systea.fr and Fence Agent ipmilan:10.0.0.225.
> 27 oct. 2018 19:12:44 Host ginger.local.systea.fr does not enforce SELinux. Current status: DISABLED
> 27 oct. 2018 19:12:36 Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host ginger.local.systea.fr, Error: Network error during communication with the Host.).
> 27 oct. 2018 19:12:28 Host ginger.local.systea.fr is not responding. It will stay in Connecting state for a grace period of 82 seconds and after that an attempt to fence the host will be issued.
> 27 oct. 2018 19:12:28 VDSM ginger.local.systea.fr command SpmStatusVDS failed: Connection timeout for host 'ginger.local.systea.fr', last response arrived 25225 ms ago.
> 27 oct. 2018 19:10:06 VM altern8 is not responding.
> 27 oct. 2018 19:10:06 VM patjoub1 is not responding.
> 27 oct. 2018 19:10:06 VM op2drugs1 is not responding.
> 27 oct. 2018 19:08:49 VM op2drugs1 is not responding.
> 27 oct. 2018 19:08:45 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 19:08:34 VM altern8 is not responding.
> 27 oct. 2018 19:08:34 VM patjoub1 is not responding.
> 27 oct. 2018 19:08:34 VM HostedEngine is not responding.
> 27 oct. 2018 19:04:01 VM op2drugs1 is not responding.
> 27 oct. 2018 19:01:08 VM HostedEngine is not responding.
> 27 oct. 2018 19:00:53 VM zabbix is not responding.
> 27 oct. 2018 19:00:01 Trying to restart VM npi2 on Host victor.local.systea.fr
> 27 oct. 2018 18:59:14 Trying to restart VM npi2 on Host victor.local.systea.fr
> 27 oct. 2018 18:59:13 Highly Available VM np2 failed. It will be restarted automatically.
> 27 oct. 2018 18:59:13 VM npi2 is down with error. Exit message: VM has been terminated on the host.
> 27 oct. 2018 18:59:05 VM altern8 is not responding.
> 27 oct. 2018 18:58:44 Storage domain DATA02 experienced a high latency of 6.16279 seconds from host ginger.local.systea.fr. This may cause performance and functional issues. Please consult your Storage Administrator.
> 27 oct. 2018 18:57:19 VM altern8 is not responding.
> 27 oct. 2018 18:57:19 VM patjoub1 is not responding.
> 27 oct. 2018 18:57:19 VM HostedEngine is not responding.
> 27 oct. 2018 18:57:19 VM op2drugs1 is not responding.
> 27 oct. 2018 18:55:56 VM altern8 is not responding.
> 27 oct. 2018 18:55:41 VM op2drugs1 is not responding.
> 27 oct. 2018 18:55:00 VM altern8 is not responding.
> 27 oct. 2018 18:54:45 VM op2drugs1 is not responding.
> 27 oct. 2018 18:52:21 VM Sogov3 is not responding.
> 27 oct. 2018 18:52:21 VM npi2 is not responding.
> 27 oct. 2018 18:50:50 VM altern8 is not responding.
> 27 oct. 2018 18:50:47 VM zabbix is not responding.
> 27 oct. 2018 18:48:16 VM op2drugs1 is not responding.
> 27 oct. 2018 18:48:03 VM altern8 is not responding.
> 27 oct. 2018 18:48:03 VM HostedEngine is not responding.
> 27 oct. 2018 18:45:48 Starting export Vm crij2 to EXPORT
> 27 oct. 2018 18:42:57 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 18:40:44 Refresh image list succeeded for domain(s): ISO (ISO file type)
> 27 oct. 2018 18:40:04 VM crij2 is down. Exit message: User shut down from within the guest
> 27 oct. 2018 18:39:25 User admin@internal-authz got disconnected from VM crij2.
>
> I checked the network and Gluster while it was running but saw absolutely
> nothing special. The storage network was not heavily loaded (bwm-ng showed a
> maximum of 50 MB/s on bond1). The Gluster logs show no errors either (even
> though the engine reported some timeouts).
>
> This morning I started to look for the cause and wanted to post some logs to
> this thread, but I found something that had not caught my attention before,
> so I am asking about it first.
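The event list above is long enough that a per-VM tally makes the pattern easier to see. The following is a minimal sketch, not part of the original message: it counts the "is not responding" events per VM inside a time window, here the export window taken from the log (18:45:48 to 19:37:22). The function name `unresponsive_counts` and the sample selection are illustrative assumptions, and the month is hard-coded because every quoted event is dated "oct.".

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: all events are dated "oct." as in the quoted log, so the month
# is hard-coded instead of parsing French month abbreviations.
EVENT_RE = re.compile(
    r"(\d{1,2}) oct\. (\d{4}) (\d{2}):(\d{2}):(\d{2}) VM (\S+) is not responding\."
)


def unresponsive_counts(lines, start, end):
    """Count 'is not responding' events per VM with start <= time <= end."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # host events, imports, ISO refreshes and so on
        day, year, hour, minute, second = (int(g) for g in match.groups()[:5])
        when = datetime(year, 10, day, hour, minute, second)
        if start <= when <= end:
            counts[match.group(6)] += 1
    return counts


# A few lines copied from the event log quoted above.
SAMPLE = [
    "> 27 oct. 2018 20:22:53 VM logcollector is not responding.",
    "> 27 oct. 2018 19:13:18 Status of host ginger.local.systea.fr was set to Up.",
    "> 27 oct. 2018 19:12:51 VM altern8 is not responding.",
    "> 27 oct. 2018 19:12:51 VM HostedEngine is not responding.",
    "> 27 oct. 2018 19:12:51 VM op2drugs1 is not responding.",
    "> 27 oct. 2018 19:10:06 VM altern8 is not responding.",
]

# Export window, from "Starting export" to "exported successfully".
EXPORT_START = datetime(2018, 10, 27, 18, 45, 48)
EXPORT_END = datetime(2018, 10, 27, 19, 37, 22)

counts = unresponsive_counts(SAMPLE, EXPORT_START, EXPORT_END)
for vm, hits in counts.most_common():
    print(vm, hits)
```

Running the same function over the full log with the import window (19:48:29 to 20:26:39) would show whether the same VMs are affected in both phases.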
>
> To recall the configuration:
> 3 hosts with Gluster (replica 2 + arbiter). The volumes are on a separate
> network (bond1 is an aggregation of 2 Gb cards, while ovirmgmt is on bond0, 2
> NICs in backup mode).
> For now, I have declared only the first 2 nodes in the engine GUI as oVirt
> nodes, because the arbiter is a small machine with a smaller CPU (and only
> 8 GB of memory), which would have required downgrading the cluster from
> SandyBridge to Nehalem. Maybe that was a mistake. The storage network on
> bond1 was also declared in the GUI, but not yet as a Gluster storage network.
>
> The Gluster volumes themselves were declared on the storage network, using
> the names defined in /etc/hosts for the bond1 network. Here is the volume
> status:
>
> # gluster volume status
> Status of volume: DATA01
> Gluster process                                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick victorstorage.local.systea.fr:/home/data01/data01/brick     49152     0          Y       2489
> Brick gingerstorage.local.systea.fr:/home/data01/data01/brick     49152     0          Y       2531
> Brick eskarinastorage.local.systea.fr:/home/data01/data01/brick   49153     0          Y       28119
> Self-heal Daemon on localhost                                     N/A       N/A        Y       24859
> Self-heal Daemon on eskarinastorage.local.systea.fr               N/A       N/A        Y       30725
> Self-heal Daemon on victorstorage.local.systea.fr                 N/A       N/A        Y       2810
>
> Task Status of Volume DATA01
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: DATA02
> Gluster process                                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick victorstorage.local.systea.fr:/home/data02/data02/brick     49153     0          Y       2553
> Brick gingerstorage.local.systea.fr:/home/data02/data02/brick     49153     0          Y       2561
> Brick eskarinastorage.local.systea.fr:/home/data01/data02/brick   49154     0          Y       28204
> Self-heal Daemon on localhost                                     N/A       N/A        Y       24859
> Self-heal Daemon on eskarinastorage.local.systea.fr               N/A       N/A        Y       30725
> Self-heal Daemon on victorstorage.local.systea.fr                 N/A       N/A        Y       2810
>
> Task Status of Volume DATA02
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: ENGINE
> Gluster process                                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick victorstorage.local.systea.fr:/home/data02/engine/brick     49154     0          Y       2571
> Brick gingerstorage.local.systea.fr:/home/data02/engine/brick     49154     0          Y       2610
> Brick eskarinastorage.local.systea.fr:/home/data01/engine/brick   49152     0          Y       28013
> Self-heal Daemon on localhost                                     N/A       N/A        Y       24859
> Self-heal Daemon on eskarinastorage.local.systea.fr               N/A       N/A        Y       30725
> Self-heal Daemon on victorstorage.local.systea.fr                 N/A       N/A        Y       2810
>
> Task Status of Volume ENGINE
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: EXPORT
> Gluster process                                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick victorstorage.local.systea.fr:/home/data01/export/brick     49155     0          Y       2588
> Brick gingerstorage.local.systea.fr:/home/data01/export/brick     49155     0          Y       2629
> Brick eskarinastorage.local.systea.fr:/home/data01/export/brick   49156     0          Y       28384
> Self-heal Daemon on localhost                                     N/A       N/A        Y       24859
> Self-heal Daemon on eskarinastorage.local.systea.fr               N/A       N/A        Y       30725
> Self-heal Daemon on victorstorage.local.systea.fr                 N/A       N/A        Y       2810
>
> Task Status of Volume EXPORT
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: ISO
> Gluster process                                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick victorstorage.local.systea.fr:/home/data01/iso/brick        49156     0          Y       2595
> Brick gingerstorage.local.systea.fr:/home/data01/iso/brick        49156     0          Y       2636
> Brick eskarinastorage.local.systea.fr:/home/data01/iso/brick      49155     0          Y       28292
> Self-heal Daemon on localhost                                     N/A       N/A        Y       24859
> Self-heal Daemon on eskarinastorage.local.systea.fr               N/A       N/A        Y       30725
> Self-heal Daemon on victorstorage.local.systea.fr                 N/A       N/A        Y       2810
>
> Task Status of Volume ISO
> -------------------------------
>
> But a df on the nodes shows that all volumes except ENGINE were mounted over
> the ovirmgmt network (host names without "storage"):
>
> gingerstorage.local.systea.fr:/ENGINE  5,0T  226G  4,7T   5%
>     /rhev/data-center/mnt/glusterSD/gingerstorage.local.systea.fr:_ENGINE
> victor.local.systea.fr:/DATA01         1,3T  425G  862G  33%
>     /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA01
> victor.local.systea.fr:/DATA02         5,0T  226G  4,7T   5%
>     /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02
> victor.local.systea.fr:/ISO            1,3T  425G  862G  33%
>     /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_ISO
> victor.local.systea.fr:/EXPORT         1,3T  425G  862G  33%
>     /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_EXPORT
>
> I can't remember how it was declared at install time (maybe I did not notice
> it), but if I try to add a Gluster-managed domain now, it indeed offers me
> only the nodes by their ovirmgmt names, not by their storage names.
>
> The names are known only in the /etc/hosts of all nodes and the engine; there
> is no DNS for these local addresses.
>
> So: in your opinion, can this configuration be a (the) source of my problems?
> And do you have an idea how I could correct this now, without losing
> anything?
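The mismatch described above can be checked mechanically on each node. The sketch below is only an illustration under stated assumptions: it flags mount sources whose short host name does not end in "storage", which is the naming convention visible in the df output (the helper name `mounts_off_storage_network` and the `marker` parameter are hypothetical). Keep in mind that with the native GlusterFS client the host in the mount source is the server the volume file is fetched from; the client then talks to the bricks by the names recorded in the volume, so this check shows which name was used at mount time, not necessarily which network carries the data.

```python
def mounts_off_storage_network(df_sources, marker="storage"):
    """Return mount sources whose short host name lacks the storage marker.

    Assumption: storage-network names end in "storage" (victorstorage,
    gingerstorage, ...), as in the /etc/hosts convention described above.
    """
    flagged = []
    for source in df_sources:
        host, _, _volume = source.partition(":/")
        short_name = host.split(".", 1)[0]
        if not short_name.endswith(marker):
            flagged.append(source)
    return flagged


# Mount sources copied from the df output quoted above.
SOURCES = [
    "gingerstorage.local.systea.fr:/ENGINE",
    "victor.local.systea.fr:/DATA01",
    "victor.local.systea.fr:/DATA02",
    "victor.local.systea.fr:/ISO",
    "victor.local.systea.fr:/EXPORT",
]

for source in mounts_off_storage_network(SOURCES):
    print("mounted via a non-storage name:", source)
```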
I don't think this is the cause of your issues. Are there errors in the vdsm
logs? Do you have issues with storage latency? Can you check the gluster
volume profile output?

> Thanks for all suggestions.
>
> --
> Regards,
> Frank
>
> On Thursday, October 18, 2018 23:13 CEST, Nir Soffer <[email protected]> wrote:
>
> On Thu, Oct 18, 2018 at 3:43 PM fsoyer <[email protected]> wrote:
>
> Hi,
> I forgot to look in the /var/log/messages file on the host! What a shame :/
> Here is the messages file at the time of the error:
> https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737
> At the same time, the second host has no particular messages in its log.
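The reply above asks for the gluster volume profile output. One possible way to collect it is sketched here, assuming it is run as root on one of the Gluster nodes while the slow operation (for example the export) is in progress. The `gluster volume profile <VOLNAME> start|info|stop` subcommands are the standard CLI; the helper names `profile_commands` and `collect_profile`, the 60-second default wait, and the choice of DATA02 (the domain the event log reported high latency on) are assumptions made for illustration.

```python
import subprocess
import time


def profile_commands(volume):
    """Build the gluster CLI invocations for one profiling session."""
    base = ["gluster", "volume", "profile", volume]
    return {
        "start": base + ["start"],
        "info": base + ["info"],
        "stop": base + ["stop"],
    }


def collect_profile(volume, wait_seconds=60, runner=subprocess.run, sleep=time.sleep):
    """Start profiling, wait while the workload runs, and return the 'info' text.

    Profiling is stopped again even if reading the statistics fails.
    """
    commands = profile_commands(volume)
    runner(commands["start"], check=True)
    try:
        sleep(wait_seconds)  # let the slow operation generate some I/O
        done = runner(commands["info"], check=True, capture_output=True, text=True)
        return done.stdout
    finally:
        runner(commands["stop"], check=True)
```

Calling `print(collect_profile("DATA02"))` during an export should give per-brick latency figures that can be pasted into the thread. The `runner` and `sleep` parameters exist only so the function can be exercised without a Gluster installation.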

