[ovirt-users] 4.0 - 2nd node fails on deploy
Hi,

I am trying to build a 3-node HC cluster with a self-hosted engine using Gluster.

I have successfully built the first node. However, when I run hosted-engine --deploy on node 2, I get the following error:

[WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
[ ERROR ] 'version' is not stored in the HE configuration image
[ ERROR ] Unable to get the answer file from the shared storage
[ ERROR ] Failed to execute stage 'Environment customization': Unable to get the answer file from the shared storage
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed

Looking at the failure in the log file:

2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage

Looking at the detected gluster path, /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/

[root@dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
total 1049609
drwxr-xr-x. 2 vdsm kvm       4096 Oct 2 04:46 .
drwxr-xr-x. 6 vdsm kvm       4096 Oct 2 04:46 ..
-rw-rw.     1 vdsm kvm 1073741824 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
-rw-rw.     1 vdsm kvm    1048576 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
-rw-r--r--. 1 vdsm kvm        294 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta

78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file; is this the engine VM?

Copying the answers file from the primary (/etc/ovirt-hosted-engine/answers.conf) to node 2 and rerunning (hosted-engine --deploy --config-append=/root/answers.conf) produces the same error :(

I also tried on node 3, with the same result.

Happy to provide logs and other debug output.

Thanks

Jason

--
IMPORTANT!
This message has been scanned for viruses and phishing links.
However, it is your responsibility to evaluate the links and attachments you choose to click.
If you are uncertain, we always try to help.
Greetings helpd...@actnet.se

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/EYKRDWSD5FH2PEMOHVTAWL7WINTDOYIN/
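The setup log shows the image being read with dd and listed with 'tar -tvf -', with empty stdout, just before the 'version' error. A minimal sketch of the same check done by hand, assuming the configuration image is a plain tar archive that should contain a member named version (inferred from the error text, not taken from the oVirt source); conf_image_members and has_version_entry are hypothetical helper names:

```python
import tarfile


def conf_image_members(image_path):
    """Return the member names of a tar-formatted configuration image.

    An unreadable or missing image yields an empty list, which mirrors
    the empty 'tar -tvf -' output seen in the setup log.
    """
    try:
        # "r:" forces plain (uncompressed) tar, as in 'dd ... | tar -tvf -'.
        with tarfile.open(image_path, mode="r:") as archive:
            return archive.getnames()
    except (tarfile.ReadError, OSError):
        return []


def has_version_entry(image_path):
    """True when a member called 'version' is present (assumed name)."""
    return "version" in conf_image_members(image_path)
```

Run as a user that can read the volume (the log uses vdsm) and pass the path from the "fetching from:" line. An empty result would suggest the image on the shared storage has no readable archive, which points at the storage rather than at the answer file supplied on the command line.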
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
On Wed, Oct 5, 2016 at 1:56 PM, Jason Jeffrey <ja...@sudo.co.uk> wrote:
> Hi,
>
> Logs attached

Have you probed two interfaces for the same host, that is, dcasrv02 and dcastor02? Does "gluster peer status" recognize both names as belonging to the same host?

From the glusterd logs and the mount logs, the connection between the peers is lost and quorum is lost, which confirms what Simone said earlier. The logs seem to indicate network issues; check the direct link setup. See below.

From the mount logs:

[2016-10-04 17:26:15.718300] E [socket.c:2292:socket_connect_finish] 0-engine-client-2: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:26:15.718345] W [MSGID: 108001] [afr-common.c:4379:afr_notify] 0-engine-replicate-0: Client-quorum is not met
[2016-10-04 17:26:16.428290] E [socket.c:2292:socket_connect_finish] 0-engine-client-1: connection to 10.100.101.2:24007 failed (No route to host)
[2016-10-04 17:26:16.428336] E [MSGID: 108006] [afr-common.c:4321:afr_notify] 0-engine-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up

And in the glusterd logs:

[2016-10-04 17:24:39.522402] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.50.82:24007 failed (No route to host)
[2016-10-04 17:24:39.522578] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer (<1e788fc9-dfe9-4753-92c7-76a95c8d0891>), in state , has disconnected from glusterd.
[2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.
[2016-10-04 17:24:39.523314] I [MSGID: 106132] [glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick already stopped
[2016-10-04 17:24:39.526188] E [socket.c:2292:socket_connect_finish] 0-management: connection to 10.100.103.3:24007 failed (No route to host)
[2016-10-04 17:24:39.526219] I [MSGID: 106004] [glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer (<9a9c037e-96cd-4f73-9800-a1df5cdd2818>), in state , has disconnected from glusterd.

> Thanks
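The log excerpts in this reply can be summarized mechanically. A minimal sketch, assuming the log lines keep the wording quoted in this message ("connection to ADDR:PORT failed (REASON)" and "Server quorum lost for volume NAME."); summarize_gluster_log is a hypothetical helper, not part of any gluster tooling:

```python
import re

# Patterns follow the wording of the log lines quoted in the thread.
CONNECT_FAILED = re.compile(
    r"connection to (?P<addr>\d+\.\d+\.\d+\.\d+):(?P<port>\d+) "
    r"failed \((?P<reason>[^)]*)\)"
)
QUORUM_LOST = re.compile(r"Server quorum lost for volume (?P<volume>\S+?)\.")


def summarize_gluster_log(lines):
    """Collect unreachable peer addresses and volumes that lost server quorum."""
    unreachable = set()
    quorum_lost = set()
    for line in lines:
        match = CONNECT_FAILED.search(line)
        if match:
            unreachable.add((match.group("addr"), match.group("reason")))
        match = QUORUM_LOST.search(line)
        if match:
            quorum_lost.add(match.group("volume"))
    return unreachable, quorum_lost
```

Feeding it the glusterd log and the mount log from the same time window gives a short list of addresses to test by hand (for example with ping over the storage links) before looking any further at the hosted-engine setup.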
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
[Adding gluster-users ML]

The brick logs are filled with errors:

[2016-10-05 19:30:28.659061] E [MSGID: 113077] [posix-handle.c:309:posix_handle_pump] 0-engine-posix: malformed internal link /var/run/vdsm/storage/0a021563-91b5-4f49-9c6b-fff45e85a025/d84f0551-0f2b-457c-808c-6369c6708d43/1b5a5e34-818c-4914-8192-2f05733b5583 for /xpool/engine/brick/.glusterfs/b9/8e/b98ed8d2-3bf9-4b11-92fd-ca5324e131a8
[2016-10-05 19:30:28.659069] E [MSGID: 113091] [posix.c:180:posix_lookup] 0-engine-posix: Failed to create inode handle for path
The message "E [MSGID: 113018] [posix.c:198:posix_lookup] 0-engine-posix: lstat on null failed" repeated 3 times between [2016-10-05 19:30:28.656529] and [2016-10-05 19:30:28.659076]
[2016-10-05 19:30:28.659087] W [MSGID: 115005] [server-resolve.c:126:resolve_gfid_cbk] 0-engine-server: b98ed8d2-3bf9-4b11-92fd-ca5324e131a8: failed to resolve (Success)

Ravi, the above are from the data brick of the arbiter volume. Can you take a look?

Jason,

Could you also provide the mount logs from the first host (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) from around the same time frame?

On Wed, Oct 5, 2016 at 3:28 AM, Jason Jeffrey <ja...@sudo.co.uk> wrote:
> Hi,
>
> Servers are powered off when I'm not looking at the problem.
>
> There may have been instances where all three were not powered on during
> the same period.
>
> Glusterhd log attached. The xpool-engine-brick log is over 1 GB in size, so
> I've taken a sample of the last couple of days; it looks highly repetitive.
>
> Cheers
>
> Jason
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey <ja...@sudo.co.uk> wrote:
> So yes, DCASRV01 is the server (pri) and has local brick access through
> the DCASTOR01 interface.
>
> Is the issue here not the incorrect soft link?

No, this should be fine.

The issue is that your gluster volume periodically loses its server quorum and becomes unavailable. According to your logs, this happened more than once.

Can you please also attach the gluster logs for that volume?
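The effect of powering nodes off on server quorum can be illustrated with a little arithmetic. A minimal sketch, assuming the usual rule that strictly more than half of the peers in the trusted pool must be reachable; the real threshold is set by the cluster.server-quorum-ratio volume option, and server_quorum_met is a hypothetical helper:

```python
def server_quorum_met(active_peers, total_peers, ratio_percent=50.0):
    """Return True when strictly more than ratio_percent of peers are active.

    ratio_percent=50.0 is an assumed default; adjust it to match the
    cluster.server-quorum-ratio configured on the pool.
    """
    if total_peers <= 0:
        raise ValueError("total_peers must be positive")
    return active_peers * 100.0 > ratio_percent * total_peers


# Three-node pool, as in this thread:
for active in (3, 2, 1):
    print(active, "of 3 active ->", server_quorum_met(active, 3))
```

Under that assumption, a three-node pool keeps quorum with one node down but loses it as soon as a second node is unreachable, at which point glusterd stops the local bricks as the log shows.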
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
Hi,

DCASTORXX is a hosts entry for the dedicated direct 10GB links (each a private /28) between the three servers (i.e. 1 => 2 & 3, 2 => 1 & 3, etc.), planned to be used solely for storage.

I.e.

10.100.50.81   dcasrv01
10.100.101.1   dcastor01
10.100.50.82   dcasrv02
10.100.101.2   dcastor02
10.100.50.83   dcasrv03
10.100.103.3   dcastor03

These were set up with the gluster commands:

* gluster volume create iso replica 3 arbiter 1 dcastor01:/xpool/iso/brick dcastor02:/xpool/iso/brick dcastor03:/xpool/iso/brick
* gluster volume create export replica 3 arbiter 1 dcastor02:/xpool/export/brick dcastor03:/xpool/export/brick dcastor01:/xpool/export/brick
* gluster volume create engine replica 3 arbiter 1 dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick dcastor03:/xpool/engine/brick
* gluster volume create data replica 3 arbiter 1 dcastor01:/xpool/data/brick dcastor03:/xpool/data/brick dcastor02:/xpool/data/bricky

So yes, DCASRV01 is the server (pri) and has local brick access through the DCASTOR01 interface.

Is the issue here not the incorrect soft link?

lrwxrwxrwx. 1 vdsm kvm 132 Oct 3 17:27 hosted-engine.metadata -> /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93

[root@dcasrv01 /]# ls -al /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/
ls: cannot access /var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/: No such file or directory

But the data does exist:

[root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
drwxr-xr-x. 2 vdsm kvm    4096 Oct 3 17:17 .
drwxr-xr-x. 6 vdsm kvm    4096 Oct 3 17:17 ..
-rw-rw.     2 vdsm kvm 1028096 Oct 3 20:48 cee9440c-4eb8-453b-bc04-c47e6f9cbc93
-rw-rw.     2 vdsm kvm 1048576 Oct 3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.lease
-rw-r--r--. 2 vdsm kvm     283 Oct 3 17:17 cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta

Thanks

Jason

From: Simone Tiraboschi [mailto:stira...@redhat.com]
Sent: 04 October 2016 14:40
To: Jason Jeffrey <ja...@sudo.co.uk>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy

On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi <stira...@redhat.com> wrote:

On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <ja...@sudo.co.uk> wrote:

Hi,

Another problem has appeared: after rebooting the primary, the VM will not start.

It appears the symlink between the gluster mount reference and vdsm is broken.

The first host was correctly deployed, but it seems that you are facing some issue connecting to the storage.
Can you please attach the vdsm logs and /var/log/messages from the first host?

Thanks Jason,
I suspect that your issue is related to this:

Oct 4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume data. Stopping local bricks.
Oct 4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume engine. Stopping local bricks.

and for some time your gluster volume has been working.

But then:

Oct 4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct 4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t glusterfs -o backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-packages/yajsonrpc/stomp.py:352: DeprecationWarning: Dispatcher.pending is deprecated. Use Dispatcher.socket.pending instead.
Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: pending = getattr(dispatcher, 'pending', lambda: 0)
Oct 4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Oct 4 19:02:11 dcasrv01 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
Oct 4 19:02:11 dcasrv01 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent
Oct 4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]: [2016-10-04 18:02:12.384
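The hosts entries listed in this message can be grouped by subnet to see which names share a link. A minimal sketch using the addresses from the message and the /28 prefix Jason mentions for the storage links; applying /28 to the dcasrv addresses as well is an assumption, and group_by_network is a hypothetical helper:

```python
import ipaddress
from collections import defaultdict

# Addresses copied from the hosts entries in the message.
HOSTS = {
    "dcasrv01": "10.100.50.81",
    "dcastor01": "10.100.101.1",
    "dcasrv02": "10.100.50.82",
    "dcastor02": "10.100.101.2",
    "dcasrv03": "10.100.50.83",
    "dcastor03": "10.100.103.3",
}


def group_by_network(hosts, prefix_len=28):
    """Map each network (as a string) to the sorted host names inside it."""
    groups = defaultdict(list)
    for name, addr in hosts.items():
        network = ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        groups[str(network)].append(name)
    return {net: sorted(names) for net, names in groups.items()}


for net, names in sorted(group_by_network(HOSTS).items()):
    print(net, names)
```

With these entries dcastor01 and dcastor02 land in one /28 and dcastor03 in another. Whether that is intended depends on how the direct links are cabled and routed, which the thread does not show, so treat the grouping only as a prompt for checking reachability between each pair of storage addresses.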
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
> owner-gid: 36
>
> Volume Name: iso
> Type: Replicate
> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: dcastor01:/xpool/iso/brick
> Brick2: dcastor02:/xpool/iso/brick
> Brick3: dcastor03:/xpool/iso/brick (arbiter)
> Options Reconfigured:
> performance.readdir-ahead: on
> storage.owner-uid: 36
> storage.owner-gid: 36
>
> [root@dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
> Status of volume: data
> Gluster process                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------
> Brick dcastor01:/xpool/data/brick     49153     0          Y       3076
> Brick dcastor03:/xpool/data/brick     49153     0          Y       3019
> Brick dcastor02:/xpool/data/bricky    49153     0          Y       3857
> NFS Server on localhost               2049      0          Y       3097
> Self-heal Daemon on localhost         N/A       N/A        Y       3088
> NFS Server on dcastor03               2049      0          Y       3039
> Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
> NFS Server on dcasrv02                2049      0          Y       3871
> Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864
>
> Task Status of Volume data
> ------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: engine
> Gluster process                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------
> Brick dcastor01:/xpool/engine/brick   49152     0          Y       3131
> Brick dcastor02:/xpool/engine/brick   49152     0          Y       3852
> Brick dcastor03:/xpool/engine/brick   49152     0          Y       2992
> NFS Server on localhost               2049      0          Y       3097
> Self-heal Daemon on localhost         N/A       N/A        Y       3088
> NFS Server on dcastor03               2049      0          Y       3039
> Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
> NFS Server on dcasrv02                2049      0          Y       3871
> Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864
>
> Task Status of Volume engine
> ------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: export
> Gluster process                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------
> Brick dcastor02:/xpool/export/brick   49155     0          Y       3872
> Brick dcastor03:/xpool/export/brick   49155     0          Y       3147
> Brick dcastor01:/xpool/export/brick   49155     0          Y       3150
> NFS Server on localhost               2049      0          Y       3097
> Self-heal Daemon on localhost         N/A       N/A        Y       3088
> NFS Server on dcastor03               2049      0          Y       3039
> Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
> NFS Server on dcasrv02                2049      0          Y       3871
> Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864
>
> Task Status of Volume export
> ------------------------------------------------------------------------
> There are no active volume tasks
>
> Status of volume: iso
> Gluster process                       TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------
> Brick dcastor01:/xpool/iso/brick      49154     0          Y       3152
> Brick dcastor02:/xpool/iso/brick      49154     0          Y       3881
> Brick dcastor03:/xpool/iso/brick      49154     0          Y       3146
> NFS Server on localhost               2049      0          Y       3097
> Self-heal Daemon on localhost         N/A       N/A        Y       3088
> NFS Server on dcastor03               2049      0          Y       3039
> Self-heal Daemon on dcastor03         N/A       N/A
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
                                                           Y       3114
NFS Server on dcasrv02                2049      0          Y       3871
Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864

Task Status of Volume data
------------------------------------------------------------------------
There are no active volume tasks

Status of volume: engine
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------
Brick dcastor01:/xpool/engine/brick   49152     0          Y       3131
Brick dcastor02:/xpool/engine/brick   49152     0          Y       3852
Brick dcastor03:/xpool/engine/brick   49152     0          Y       2992
NFS Server on localhost               2049      0          Y       3097
Self-heal Daemon on localhost         N/A       N/A        Y       3088
NFS Server on dcastor03               2049      0          Y       3039
Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
NFS Server on dcasrv02                2049      0          Y       3871
Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864

Task Status of Volume engine
------------------------------------------------------------------------
There are no active volume tasks

Status of volume: export
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------
Brick dcastor02:/xpool/export/brick   49155     0          Y       3872
Brick dcastor03:/xpool/export/brick   49155     0          Y       3147
Brick dcastor01:/xpool/export/brick   49155     0          Y       3150
NFS Server on localhost               2049      0          Y       3097
Self-heal Daemon on localhost         N/A       N/A        Y       3088
NFS Server on dcastor03               2049      0          Y       3039
Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
NFS Server on dcasrv02                2049      0          Y       3871
Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864

Task Status of Volume export
------------------------------------------------------------------------
There are no active volume tasks

Status of volume: iso
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------
Brick dcastor01:/xpool/iso/brick      49154     0          Y       3152
Brick dcastor02:/xpool/iso/brick      49154     0          Y       3881
Brick dcastor03:/xpool/iso/brick      49154     0          Y       3146
NFS Server on localhost               2049      0          Y       3097
Self-heal Daemon on localhost         N/A       N/A        Y       3088
NFS Server on dcastor03               2049      0          Y       3039
Self-heal Daemon on dcastor03         N/A       N/A        Y       3114
NFS Server on dcasrv02                2049      0          Y       3871
Self-heal Daemon on dcasrv02          N/A       N/A        Y       3864

Task Status of Volume iso
------------------------------------------------------------------------
There are no active volume tasks

Thanks

Jason

From: users-boun...@ovirt.org [mailto:users-boun...@ovirt.org] On Behalf Of Jason Jeffrey
Sent: 03 October 2016 18:40
To: users@ovirt.org
Subject: Re: [ovirt-users] 4.0 - 2nd node fails on deploy

Hi,

Setup log attached for primary

Regards

Jason
Re: [ovirt-users] 4.0 - 2nd node fails on deploy
On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey wrote:

> Hi,
>
> I am trying to build a x3 HC cluster, with a self hosted engine using
> gluster.
>
> I have successfully built the 1st node, however when I attempt to run
> hosted-engine --deploy on node 2, I get the following error
>
> [WARNING] A configuration file must be supplied to deploy Hosted Engine on an additional host.
> [ ERROR ] 'version' is not stored in the HE configuration image
> [ ERROR ] Unable to get the answer file from the shared storage
> [ ERROR ] Failed to execute stage 'Environment customization': Unable to get the answer file from the shared storage
> [ INFO ] Stage: Clean up
> [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161002232505.conf'
> [ INFO ] Stage: Pre-termination
> [ INFO ] Stage: Termination
> [ ERROR ] Hosted Engine deployment failed
>
> Looking at the failure in the log file..

Can you please attach hosted-engine-setup logs from the first host?

> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
>
> Looking at the detected gluster path -
> /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>
> [root@dcasrv02 ~]# ls -al /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
> total 1049609
> drwxr-xr-x. 2 vdsm kvm 4096 Oct 2 04:46 .
> drwxr-xr-x. 6 vdsm kvm 4096 Oct 2 04:46 ..
> -rw-rw. 1 vdsm kvm 1073741824 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37
> -rw-rw. 1 vdsm kvm 1048576 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.lease
> -rw-r--r--. 1 vdsm kvm 294 Oct 2 04:46 78cb2527-a2e2-489a-9fad-465a72221b37.meta
>
> 78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file; is this the engine
> VM?
>
> Copying the answers file from primary (/etc/ovirt-hosted-engine/answers.conf)
> to node 2 and rerunning produces the same error :(
> (hosted-engine --deploy --config-append=/root/answers.conf)
>
> Also tried on node 3, same issues
>
> Happy to provide logs and other debugs
>
> Thanks
>
> Jason

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
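For anyone following the thread: the quoted log shows the setup tool reading the configuration volume with dd, piping it into tar, getting empty output, and then complaining that 'version' is not stored. That check can be repeated by hand. What follows is only a rough sketch under assumptions: the function name he_conf_has_version and its output strings are made up for illustration and are not part of hosted-engine, and it uses "tar -tf" (names only) instead of the "tar -tvf" seen in the log so the member names are easy to match.

```shell
#!/bin/sh
# Sketch: list the tar members of a hosted-engine configuration image the
# way the setup log does (dd piped into tar), then report whether a member
# named "version" is present.
# Assumption: he_conf_has_version and its messages are hypothetical helpers
# for this example only.
he_conf_has_version() {
    image="$1"
    # On a real host the log runs dd as the vdsm user
    # (sudo -u vdsm dd if=... bs=4k); plain dd is used here so the sketch
    # also works on an ordinary test file.
    members=$(dd if="$image" bs=4k 2>/dev/null | tar -tf - 2>/dev/null)
    # Accept both "version" and "./version" as member names.
    if printf '%s\n' "$members" | grep -Eq '^(\./)?version$'; then
        echo "version found"
        return 0
    fi
    echo "version missing"
    return 1
}
```

Called with the image path from the log (the 1 GB file under .../images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/), a "version missing" result would be consistent with the empty stdout in the log, i.e. the volume holds no readable tar archive. It does not say why the volume is empty, which is why the setup logs from the first host are still needed.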