Re: [ovirt-users] [Gluster-users] Gluster 3.7.8 from ovirt-3.6-glusterfs-epel breaks vdsm ?
On Thu, Feb 18, 2016 at 6:39 PM, Matteo wrote:
> Seems that the "info" command is broken.
>
> The status command, for example, seems to work OK.
>
> Matteo
>
> - On 17 Feb 2016, at 14:15, Sahina Bose sab...@redhat.com wrote:
>
>> [+gluster-users]
>>
>> Any known compat issues with gluster 3.7.8-1.el7 client packages and
>> glusterfs 3.7.6-1.el7 server?

There might be some new information that the 3.7.8 CLI expects, but I'd need to check to verify what it is.

We don't really test the Gluster CLI for backwards compatibility, as we expect the CLI to be used with the GlusterD on its own host. So when using the gluster CLI's --remote-host option with different versions of the CLI and glusterd, the user should expect breakage at some point.

With REST support being planned for 3.8, I'd like to retire the --remote-host option in its entirety. Going forward, users should use only the REST APIs for remote operations.

>> On 02/17/2016 04:33 PM, Matteo wrote:
>>> Hi,
>>>
>>> Today I updated the OS on one node. In the updates, the gluster client
>>> packages were upgraded from 3.7.6-1.el7 (centos-ovirt36 repository)
>>> to 3.7.8-1.el7 (ovirt-3.6-glusterfs-epel), and after reboot
>>> the node was marked not operational.
>>>
>>> Looking into the logs, vdsm was failing to get gluster volume information.
>>>
>>> The command (ovirt-storage is the gluster storage where the hosted engine is
>>> kept)
>>>
>>> gluster --mode=script volume info --remote-host=gluster1 ovirt-storage --xml
>>>
>>> was failing, returning error 2 (and no output).
>>>
>>> Doing a yum downgrade of the gluster client packages (back to 3.7.6-1.el7,
>>> centos-ovirt36) fixed everything.
>>>
>>> Data nodes are running glusterfs 3.7.6-1.el7.
>>>
>>> The funny thing is that from the oVirt node I was able to manually mount the
>>> glusterfs shares;
>>> only the volume info command was failing, thus breaking vdsm.
>>>
>>> Any hint?
>>>
>>> regards,
>>> Matteo

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
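
For anyone wanting to reproduce Matteo's observation outside of vdsm, here is a minimal Python sketch that builds the same `gluster ... volume info ... --xml` call quoted above and reports the "non-zero exit status with no output" case. The helper names (`build_volume_info_cmd`, `describe_result`, `run_volume_info`) are made up for illustration; this is not how vdsm itself invokes the CLI.

```python
import subprocess


def build_volume_info_cmd(volume, remote_host=None):
    """Build the argument list for the volume info call quoted in the thread."""
    cmd = ["gluster", "--mode=script", "volume", "info"]
    if remote_host is not None:
        cmd.append("--remote-host=" + remote_host)
    cmd += [volume, "--xml"]
    return cmd


def describe_result(returncode, stdout):
    """Classify a finished call; the wording of the result is our own."""
    if returncode == 0 and stdout.strip():
        return "ok"
    if returncode != 0 and not stdout.strip():
        # The symptom reported in the thread: error 2 and no output.
        return "failed with exit status %d and no output" % returncode
    return "unexpected: exit status %d" % returncode


def run_volume_info(volume, remote_host=None):
    """Run the command (requires the gluster CLI) and classify the result."""
    proc = subprocess.run(
        build_volume_info_cmd(volume, remote_host),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return describe_result(proc.returncode, proc.stdout)
```

On a host with the gluster client installed, `run_volume_info("ovirt-storage", "gluster1")` would run the exact command from the report; comparing the result before and after the package downgrade shows whether the CLI version is the variable.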
Re: [ovirt-users] [Gluster-users] Centos 7.1 failed to start glusterd after upgrading to ovirt 3.6
On Mon, Nov 9, 2015 at 9:06 PM, Stefano Danzi wrote:
> Here is the output from systemd-analyze critical-chain and systemd-analyze blame.
> I think that glusterd now starts too early (before networking).

You are nearly right. GlusterD did start too early. GlusterD is configured to start after network.target. But network.target in systemd only guarantees that the network management stack is up; it doesn't guarantee that the network devices have been configured and are usable (Ref [1]). This means that when GlusterD starts, the network is still not up, and hence GlusterD fails to resolve bricks.

While we could start GlusterD after network-online.target, that would break GlusterFS mounts configured in /etc/fstab with the _netdev option. Systemd automatically schedules _netdev mounts to be done after network-online.target (Ref [1], network-online.target). This could allow the GlusterFS mounts to be attempted before GlusterD is up, causing them to fail. Ordering the mounts after GlusterD can be done with systemd-220 [2], which introduced support for the `x-systemd.requires` option in fstab to order mounts after specific services, but that is not possible on el7, which has systemd-208.

[1]: https://wiki.freedesktop.org/www/Software/systemd/NetworkTarget/
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=812826

>
> [root@ovirt01 tmp]# systemd-analyze critical-chain
> The time after the unit is active or started is printed after the "@" character.
> The time the unit takes to start is printed after the "+" character.
>
> multi-user.target @17.148s
> └─ovirt-ha-agent.service @17.021s +127ms
>   └─vdsmd.service @15.871s +1.148s
>     └─vdsm-network.service @11.495s +4.373s
>       └─libvirtd.service @11.238s +254ms
>         └─iscsid.service @11.228s +8ms
>           └─network.target @11.226s
>             └─network.service @6.748s +4.476s
>               └─iptables.service @6.630s +117ms
>                 └─basic.target @6.629s
>                   └─paths.target @6.629s
>                     └─brandbot.path @6.629s
>                       └─sysinit.target @6.615s
>                         └─systemd-update-utmp.service @6.610s +4ms
>                           └─auditd.service @6.450s +157ms
>                             └─systemd-tmpfiles-setup.service @6.369s +77ms
>                               └─rhel-import-state.service @6.277s +88ms
>                                 └─local-fs.target @6.275s
>                                   └─home-glusterfs-data.mount @5.805s +470ms
>                                     └─home.mount @3.946s +1.836s
>                                       └─systemd-fsck@dev-mapper-centos_ovirt01\x2dhome.service @3.937s +7ms
>                                         └─dev-mapper-centos_ovirt01\x2dhome.device @3.936s
>
> [root@ovirt01 tmp]# systemd-analyze blame
> 4.476s network.service
> 4.373s vdsm-network.service
> 2.318s glusterd.service
> 2.076s postfix.service
> 1.836s home.mount
> 1.651s lvm2-monitor.service
> 1.258s lvm2-pvscan@9:1.service
> 1.211s systemd-udev-settle.service
> 1.148s vdsmd.service
> 1.079s dmraid-activation.service
> 1.046s boot.mount
> 904ms kdump.service
> 779ms multipathd.service
> 657ms var-lib-nfs-rpc_pipefs.mount
> 590ms systemd-fsck@dev-disk-by\x2duuid-e185849f\x2d2c82\x2d4eb2\x2da215\x2d97340e90c93e.service
> 547ms tuned.service
> 481ms kmod-static-nodes.service
> 470ms home-glusterfs-data.mount
> 427ms home-glusterfs-engine.mount
> 422ms sys-kernel-debug.mount
> 411ms dev-hugepages.mount
> 411ms dev-mqueue.mount
> 278ms systemd-fsck-root.service
> 263ms systemd-readahead-replay.service
> 254ms libvirtd.service
> 243ms systemd-tmpfiles-setup-dev.service
> 216ms systemd-modules-load.service
> 209ms rhel-readonly.service
> 195ms wdmd.service
> 192ms sanlock.service
> 191ms gssproxy.service
> 186ms systemd-udev-trigger.service
> 157ms auditd.service
> 151ms plymouth-quit-wait.service
> 151ms plymouth-quit.service
> 132ms proc-fs-nfsd.mount
> 127ms ovirt-ha-agent.service
> 117ms iptables.service
> 110ms ovirt-ha-broker.service
> 96ms avahi-daemon.service
> 89ms systemd-udevd.service
> 88ms rhel-import-state.service
> 77ms systemd-tmpfiles-setup.service
> 71ms sysstat.service
> 71ms microcode.service
> 71ms chronyd.service
> 69ms systemd-readahead-collect.service
> 68ms systemd-sysctl.service
> 65ms systemd-logind.service
> 61ms rsyslog.service
> 58ms systemd-remount-fs.service
> 46ms rpcbind.service
> 46ms nfs-config.service
> 45ms
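
The reply above explains that reaching network.target does not mean peer names can be resolved yet. One way to observe that gap, sketched here purely as an illustration and not something proposed in the thread, is a small check that retries name resolution for the peer hostnames until they resolve or a timeout expires. The function name `wait_for_resolution`, its parameters and the hostname in the usage note are all hypothetical.

```python
import socket
import time


def wait_for_resolution(hostnames, timeout=30.0, interval=1.0,
                        resolve=socket.gethostbyname,
                        sleep=time.sleep, clock=time.monotonic):
    """Retry name resolution; return the hostnames that never resolved.

    `resolve`, `sleep` and `clock` are injectable so the loop can be
    exercised without a real network.
    """
    pending = list(hostnames)
    deadline = clock() + timeout
    while pending:
        still_pending = []
        for name in pending:
            try:
                resolve(name)
            except OSError:  # socket.gaierror is a subclass of OSError
                still_pending.append(name)
        pending = still_pending
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    return pending
```

Calling `wait_for_resolution(["gluster1.example.com"])` early in boot and logging the returned list would show whether name resolution, rather than GlusterD itself, is what is not ready yet.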
Re: [ovirt-users] [Gluster-users] VM failed to start | Bad volume specification
Awesome Punit! I'm happy to have been a part of the debugging process.

~kaushal

On Wed, Mar 25, 2015 at 3:09 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi All,

With the help of the gluster community and the ovirt-china community, my issue got resolved.

The main root causes were the following:

1. The glob operation takes quite a long time, longer than the ioprocess default of 60s.
2. python-ioprocess was updated in a way that makes changing the configuration file alone ineffective, so the code has to be patched manually.

Solution (needs to be done on all the hosts):

1. Add the ioprocess timeout value to the /etc/vdsm/vdsm.conf file:

[irs]
process_pool_timeout = 180

2. Check /usr/share/vdsm/storage/outOfProcess.py, line 71, and see whether there is still IOProcess(DEFAULT_TIMEOUT) in it. If so, changing the configuration file has no effect, because timeout is now the third parameter of IOProcess.__init__(), not the second.
3. Change IOProcess(DEFAULT_TIMEOUT) to IOProcess(timeout=DEFAULT_TIMEOUT), remove the /usr/share/vdsm/storage/outOfProcess.pyc file, and restart the vdsm and supervdsm services on all hosts.

Thanks,
Punit Dambiwal

On Mon, Mar 23, 2015 at 9:18 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi All,

I am still facing the same issue. Please help me to overcome it.

Thanks,
punit

On Fri, Mar 20, 2015 at 12:22 AM, Thomas Holkenbrink thomas.holkenbr...@fibercloud.com wrote:

I've seen this before. The system thinks the storage system is up and running and then attempts to utilize it.

The way I got around it was to put a delay in the startup of the gluster node on the interface that the clients use to communicate. I use a bonded link; I then add a LINKDELAY to the interface to get the underlying system up and running before the network comes up. This then causes network-dependent features to wait for the network to finish. It adds about 10 seconds to the startup time. In our environment it works well; you may not need as long a delay.

CentOS

[root@gls1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=static
USERCTL=no
NETMASK=255.255.248.0
IPADDR=10.10.1.17
MTU=9000
IPV6INIT=no
IPV6_AUTOCONF=no
NETWORKING_IPV6=no
NM_CONTROLLED=no
LINKDELAY=10
NAME=System Storage Bond0

Hi Michal,

The storage domain is up and running and mounted on all the host nodes. As I mentioned before, it was working perfectly, but just after the reboot the VMs cannot be powered on.

[root@cpu01 log]# gluster volume info

Volume Name: ds01
Type: Distributed-Replicate
Volume ID: 369d3fdc-c8eb-46b7-a33e-0a49f2451ff6
Status: Started
Number of Bricks: 48 x 2 = 96
Transport-type: tcp
Bricks:
Brick1: cpu01:/bricks/1/vol1
Brick2: cpu02:/bricks/1/vol1
Brick3: cpu03:/bricks/1/vol1
Brick4: cpu04:/bricks/1/vol1
Brick5: cpu01:/bricks/2/vol1
Brick6: cpu02:/bricks/2/vol1
Brick7: cpu03:/bricks/2/vol1
Brick8: cpu04:/bricks/2/vol1
Brick9: cpu01:/bricks/3/vol1
Brick10: cpu02:/bricks/3/vol1
Brick11: cpu03:/bricks/3/vol1
Brick12: cpu04:/bricks/3/vol1
Brick13: cpu01:/bricks/4/vol1
Brick14: cpu02:/bricks/4/vol1
Brick15: cpu03:/bricks/4/vol1
Brick16: cpu04:/bricks/4/vol1
Brick17: cpu01:/bricks/5/vol1
Brick18: cpu02:/bricks/5/vol1
Brick19: cpu03:/bricks/5/vol1
Brick20: cpu04:/bricks/5/vol1
Brick21: cpu01:/bricks/6/vol1
Brick22: cpu02:/bricks/6/vol1
Brick23: cpu03:/bricks/6/vol1
Brick24: cpu04:/bricks/6/vol1
Brick25: cpu01:/bricks/7/vol1
Brick26: cpu02:/bricks/7/vol1
Brick27: cpu03:/bricks/7/vol1
Brick28: cpu04:/bricks/7/vol1
Brick29: cpu01:/bricks/8/vol1
Brick30: cpu02:/bricks/8/vol1
Brick31: cpu03:/bricks/8/vol1
Brick32: cpu04:/bricks/8/vol1
Brick33: cpu01:/bricks/9/vol1
Brick34: cpu02:/bricks/9/vol1
Brick35: cpu03:/bricks/9/vol1
Brick36: cpu04:/bricks/9/vol1
Brick37: cpu01:/bricks/10/vol1
Brick38: cpu02:/bricks/10/vol1
Brick39: cpu03:/bricks/10/vol1
Brick40: cpu04:/bricks/10/vol1
Brick41: cpu01:/bricks/11/vol1
Brick42: cpu02:/bricks/11/vol1
Brick43: cpu03:/bricks/11/vol1
Brick44: cpu04:/bricks/11/vol1
Brick45: cpu01:/bricks/12/vol1
Brick46: cpu02:/bricks/12/vol1
Brick47: cpu03:/bricks/12/vol1
Brick48: cpu04:/bricks/12/vol1
Brick49: cpu01:/bricks/13/vol1
Brick50: cpu02:/bricks/13/vol1
Brick51: cpu03:/bricks/13/vol1
Brick52: cpu04:/bricks/13/vol1
Brick53: cpu01:/bricks/14/vol1
Brick54: cpu02:/bricks/14/vol1
Brick55: cpu03:/bricks/14/vol1
Brick56: cpu04:/bricks/14/vol1
Brick57: cpu01:/bricks/15/vol1
Brick58: cpu02:/bricks/15/vol1
Brick59: cpu03:/bricks/15/vol1
Brick60: cpu04:/bricks/15/vol1
Brick61: cpu01:/bricks/16/vol1
Brick62: cpu02:/bricks/16/vol1
Brick63: cpu03:/bricks/16/vol1
Brick64: cpu04:/bricks/16/vol1
Brick65:
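
Punit's steps 2 and 3 in this thread come down to a positional argument silently landing in the wrong parameter after a signature change. The sketch below reproduces that effect with two stand-in classes. `OldIOProcess`, `NewIOProcess` and the `max_threads` parameter are hypothetical; the real `IOProcess.__init__()` signature is not quoted in the thread, only the fact that timeout moved from second to third position (counting `self`).

```python
DEFAULT_TIMEOUT = 180  # the value used in the thread's vdsm.conf example


class OldIOProcess:
    """Stand-in for the older signature: timeout directly follows self."""

    def __init__(self, timeout=60):
        self.timeout = timeout


class NewIOProcess:
    """Stand-in for the newer signature: another parameter precedes timeout."""

    def __init__(self, max_threads=0, timeout=60):
        self.max_threads = max_threads
        self.timeout = timeout


# The positional call works with the old signature.
old = OldIOProcess(DEFAULT_TIMEOUT)

# With the new signature the same call fills max_threads instead,
# so timeout quietly keeps its 60 s default.
broken = NewIOProcess(DEFAULT_TIMEOUT)

# Passing the value by keyword is immune to the reordering.
fixed = NewIOProcess(timeout=DEFAULT_TIMEOUT)

print(old.timeout, broken.timeout, fixed.timeout)  # 180 60 180
```

This is why editing vdsm.conf alone had no effect: the configured value was read correctly but handed to the wrong parameter until the call site used `timeout=`.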
Re: [ovirt-users] Using gluster on other hosts?
Hey Will,

It seems to me you are trying to manage GlusterFS from oVirt and to get your multi-network setup to work. As Sahina already mentioned, this is not currently possible, as oVirt doesn't have the required support.

If you want to make this work right now, I suggest you manage GlusterFS manually. You could do the following:

- Install GlusterFS on both hosts and set up a GlusterFS trusted storage pool using the 'gluster peer probe' commands. Run 'gluster peer probe gfs2' from node1 (and the reverse, just for safety).
- Create a GlusterFS volume with 'gluster volume create VOLUMENAME gfs1:BRICKPATH gfs2:BRICKPATH' and start it with 'gluster volume start VOLUMENAME'. After this you'll have GlusterFS set up on the particular network and a volume ready to be added as an oVirt storage domain.
- Now set up oVirt on the nodes with the node* network.
- Add the gfs* network to oVirt. I'm not sure if this is required, but you can try it anyway.
- Add the created GlusterFS volume as a storage domain using a gfs* address.

You should now be ready to begin using the new storage domain.

If you want to expand the volume later, you will need to do it manually with an explicit 'gluster volume add-brick' command.

You could possibly add the GlusterFS cluster to the oVirt interface, just so you can get stats and monitoring. But even then you shouldn't use the oVirt interface for any management tasks.

Multi-network support for GlusterFS within oVirt is an upcoming feature, and Sahina can give you more details on when to expect it to be available.

Thanks,
Kaushal

- Original Message -
From: Sahina Bose sab...@redhat.com
To: Will K yetanotherw...@yahoo.com, users@ovirt.org, Kaushal M kaus...@redhat.com
Sent: Friday, 9 January, 2015 11:10:48 AM
Subject: Re: [ovirt-users] Using gluster on other hosts?

On 01/08/2015 09:41 PM, Will K wrote:

That's what I did, but it didn't work for me.

1. Use the 192.168.x interface to set up gluster. I used hostnames in /etc/hosts.
2. Set up oVirt using the switched network hostnames, let's say 10.10.10.x.
3. oVirt and all that comes up fine.
4. When I try to create a storage domain, it only shows the 10.10.10.x hostnames as available.

I tried to add a brick and got something like

Host gfs2 is not in 'Peer in Cluster' state

(node2 is the hostname and gfs2 is the 192.168 name)

Which version of glusterfs do you have? Kaushal, will this work in glusterfs 3.6 and above?

Running `gluster probe peer gfs2` or `gluster probe peer 192.168.x.x` didn't work:

peer probe: failed: Probe returned with unknown errno 107

Running the probe again with the switched network hostname or IP worked fine. Maybe it is not possible with the current GlusterFS version?
http://www.gluster.org/community/documentation/index.php/Features/SplitNetwork

Will

On Thursday, January 8, 2015 3:43 AM, Sahina Bose sab...@redhat.com wrote:

On 01/08/2015 12:07 AM, Will K wrote:

Hi,

I would like to see if anyone has a good suggestion.

I have two physical hosts with 1GB connections to switched networks. The hosts also have 10GB interfaces connected directly using a Twinax cable, like a copper crossover cable. The idea was to use the 10GB link as a private network for GlusterFS until the day we want to grow out of this 2-node setup.

GlusterFS was set up with the 10GB ports using non-routable IPs and hostnames in /etc/hosts, for example, gfs1 192.168.1.1 and gfs2 192.168.1.2. I'm following the example from http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/ . Currently I'm only using a Gluster volume on node1, but the `gluster probe peer` test worked fine with node2 through the 10GB connection.

The oVirt engine was set up on physical host1 with hosted engine. Now, when I try to create a new Gluster storage domain, I can only see the host node1 available.

Is there any way I can set up oVirt on node1 and node2 while using gfs1 and gfs2 for GlusterFS? Or some way to take advantage of the 10GB connection?

If I understand right, you have 2 interfaces on each of your hosts, and you want oVirt to communicate via one interface and glusterfs to use the other?

While adding the hosts to oVirt you could use ip1 and then, while creating the volume, add the brick using the other IP address. For instance:

gluster volume create volname 192.168.1.2:/bricks/b1

Currently, there's no way to specify the IP address to use while adding a brick from the oVirt UI (we're working on this for 3.6), but you could do this from the gluster CLI. This would then be detected in the oVirt UI.

Thanks
W
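
The manual procedure Kaushal lists in this thread can be summarised as a short, fixed sequence of gluster CLI calls. The following Python sketch only assembles those command lines from the three commands he names (`gluster peer probe`, `gluster volume create`, `gluster volume start`); it does not run them. The function name `manual_setup_commands` and the volume name in the usage example are hypothetical, and the brick path is borrowed from Sahina's example.

```python
def manual_setup_commands(volume, bricks):
    """Return the gluster CLI calls for the manual setup, in order.

    `bricks` is a list of (host, brick_path) pairs. The first host is
    assumed to be the node the commands are run on, so it is not probed.
    """
    hosts = []
    for host, _path in bricks:
        if host not in hosts:
            hosts.append(host)

    # Probe every other host from the first one, e.g. gfs2 from gfs1.
    commands = [["gluster", "peer", "probe", host] for host in hosts[1:]]

    # Create the volume using the storage-network names for the bricks.
    create = ["gluster", "volume", "create", volume]
    create += ["%s:%s" % (host, path) for host, path in bricks]
    commands.append(create)

    commands.append(["gluster", "volume", "start", volume])
    return commands


if __name__ == "__main__":
    plan = manual_setup_commands(
        "volname", [("gfs1", "/bricks/b1"), ("gfs2", "/bricks/b1")]
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Keeping the brick addresses on the gfs* names throughout is the point of the exercise: it matches Kaushal's advice to do the peer probe and the volume creation by hand with the storage-network names.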
Re: [ovirt-users] Using gluster on other hosts?
That should work, provided the names being used resolve correctly and to the same values everywhere. But I'd suggest doing a manual peer probe using the alternative names before creating the volume. This way Gluster explicitly knows all the names/addresses, and this should not cause any trouble.

But as I said before, after doing the manual operations, I'd suggest avoiding any management actions from the oVirt interface, as it still doesn't know how to handle multiple networks with Gluster.

~kaushal

- Original Message -
From: Sahina Bose sab...@redhat.com
To: Kaushal M kaus...@redhat.com
Cc: Will K yetanotherw...@yahoo.com, users@ovirt.org
Sent: Friday, 9 January, 2015 1:47:17 PM
Subject: Re: [ovirt-users] Using gluster on other hosts?

On 01/09/2015 01:13 PM, Kaushal M wrote:

Hey Will,

It seems to me you are trying to manage GlusterFS from oVirt and to get your multi-network setup to work. As Sahina already mentioned, this is not currently possible, as oVirt doesn't have the required support.

If you want to make this work right now, I suggest you manage GlusterFS manually. You could do the following:

- Install GlusterFS on both hosts and set up a GlusterFS trusted storage pool using the 'gluster peer probe' commands. Run 'gluster peer probe gfs2' from node1 (and the reverse, just for safety).
- Create a GlusterFS volume with 'gluster volume create VOLUMENAME gfs1:BRICKPATH gfs2:BRICKPATH' and start it with 'gluster volume start VOLUMENAME'. After this you'll have GlusterFS set up on the particular network and a volume ready to be added as an oVirt storage domain.

To enable oVirt to use the node1 interface, is it possible to peer probe using the node1 and node2 interfaces in the steps above, i.e. gluster peer probe node2 (this is essentially what happens when a host is added with host address node1 or node2), and then create a GlusterFS volume from the CLI using the command you mentioned above?

- Now set up oVirt on the nodes with the node* network.
- Add the gfs* network to oVirt. I'm not sure if this is required, but you can try it anyway.
- Add the created GlusterFS volume as a storage domain using a gfs* address.

You should now be ready to begin using the new storage domain.

If you want to expand the volume later, you will need to do it manually with an explicit 'gluster volume add-brick' command.

You could possibly add the GlusterFS cluster to the oVirt interface, just so you can get stats and monitoring. But even then you shouldn't use the oVirt interface for any management tasks.

Multi-network support for GlusterFS within oVirt is an upcoming feature, and Sahina can give you more details on when to expect it to be available.

Thanks,
Kaushal

- Original Message -
From: Sahina Bose sab...@redhat.com
To: Will K yetanotherw...@yahoo.com, users@ovirt.org, Kaushal M kaus...@redhat.com
Sent: Friday, 9 January, 2015 11:10:48 AM
Subject: Re: [ovirt-users] Using gluster on other hosts?

On 01/08/2015 09:41 PM, Will K wrote:

That's what I did, but it didn't work for me.

1. Use the 192.168.x interface to set up gluster. I used hostnames in /etc/hosts.
2. Set up oVirt using the switched network hostnames, let's say 10.10.10.x.
3. oVirt and all that comes up fine.
4. When I try to create a storage domain, it only shows the 10.10.10.x hostnames as available.

I tried to add a brick and got something like

Host gfs2 is not in 'Peer in Cluster' state

(node2 is the hostname and gfs2 is the 192.168 name)

Which version of glusterfs do you have? Kaushal, will this work in glusterfs 3.6 and above?

Running `gluster probe peer gfs2` or `gluster probe peer 192.168.x.x` didn't work:

peer probe: failed: Probe returned with unknown errno 107

Running the probe again with the switched network hostname or IP worked fine. Maybe it is not possible with the current GlusterFS version?
http://www.gluster.org/community/documentation/index.php/Features/SplitNetwork

Will

On Thursday, January 8, 2015 3:43 AM, Sahina Bose sab...@redhat.com wrote:

On 01/08/2015 12:07 AM, Will K wrote:

Hi,

I would like to see if anyone has a good suggestion.

I have two physical hosts with 1GB connections to switched networks. The hosts also have 10GB interfaces connected directly using a Twinax cable, like a copper crossover cable. The idea was to use the 10GB link as a private network for GlusterFS until the day we want to grow out of this 2-node setup.

GlusterFS was set up with the 10GB ports using non-routable IPs and hostnames in /etc/hosts, for example, gfs1 192.168.1.1 and gfs2 192.168.1.2. I'm following the example from http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/ . Currently I'm only using Gluster volume on node1
Re: [ovirt-users] [Gluster-users] Gluster command [UNKNOWN] failed on server...
Can you replace 'Before=network-online.target' with 'Wants=network-online.target' and try the boot again? This should force the network to be online before starting GlusterD.

If even that fails, you could try adding an entry to /etc/hosts with the hostname of the system. This should prevent any more failures.

I still don't believe it's a problem with Gluster. Gluster uses APIs provided by the system to perform name resolution. These definitely work correctly, because you can start GlusterD later. Since the resolution failure only happens during boot, it points to system or network setup issues during boot. To me it seems like the network isn't completely set up at that point in time.

~kaushal

On Fri, Dec 5, 2014 at 12:47 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

It seems it's a bug in glusterfs 3.6... Even when I make systemd start the network service before glusterd, it still fails.

---
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
ExecStartPre=/etc/rc.d/init.d/network start
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
KillMode=process

[Install]
WantedBy=multi-user.target

Thanks,
Punit

On Wed, Dec 3, 2014 at 8:56 PM, Kaushal M kshlms...@gmail.com wrote:

I just remembered this.

There was another user on the mailing list a while back with a similar issue of GlusterD failing to start. The cause of his problem was the way his network was brought up. IIRC, he was using a static network configuration, and the problem vanished when he began using dhcp. Or it might have been that he was using dhcp.service and it got solved after switching to NetworkManager.

This could be one more thing you could look at.

I'll try to find the mail thread to see if it was the same problem as yours.

~kaushal

On Wed, Dec 3, 2014 at 6:22 PM, Kaushal M kshlms...@gmail.com wrote:

I don't know much about how the network target is brought up in CentOS 7, but I'll try as much as I can.

It seems to me that, after the network has been brought up and by the time GlusterD is started,
a. the machine hasn't yet received its hostname, or
b. it hasn't yet registered with the name server.

This is causing name resolution failures.

I don't know if the network target could come up without the machine getting its hostname, so I'm pretty sure it's not a.

So it seems to be b. But this kind of registration happens only in DDNS systems, which doesn't seem to be the case for you.

Both of these reasons might be wrong (most likely wrong). You'd do well to ask for help from someone with more experience in systemd + networking.

~kaushal

On Wed, Dec 3, 2014 at 10:54 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

This is the host which I rebooted. Would you mind letting me know how I can make the glusterd service come up after the network? I am using CentOS 7, if the network is the issue.

On Wed, Dec 3, 2014 at 11:54 AM, Kaushal M kshlms...@gmail.com wrote:

This peer cannot be identified.

[2014-12-03 02:29:25.998153] D [glusterd-peer-utils.c:121:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: cpu05.zne01.hkg1.ovt.36stack.com

I don't know why this address is not being resolved during boot time. If this is a valid peer, then the only reason I can think of is that the network is not up.

If you had previously detached the peer forcefully, then that could have left stale entries in some volumes. In this case as well, GlusterD will fail to identify the peer.

Do either of these reasons seem a possibility to you?

On Dec 3, 2014 8:07 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyoe5 and http://ur1.ca/iyoed

On Tue, Dec 2, 2014 at 10:43 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

In the logs you've provided, GlusterD appears to be running correctly. Could you provide the logs for the time period when GlusterD attempts to start but fails?

~kaushal

On Dec 2, 2014 8:03 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyhs5 and http://ur1.ca/iyhue

Thanks,
punit

On Tue, Dec 2, 2014 at 12:00 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

Could you start GlusterD in debug mode and provide the logs here? To start it in debug mode, append '-LDEBUG' to the ExecStart line in the service file.

~kaushal

On Mon, Dec 1, 2014 at 9:05 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Can anybody help me with this?

On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Thanks for the detailed reply... let me explain my setup first
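
When comparing unit files like the one quoted in this thread, it can help to list the ordering and dependency directives mechanically. The Python sketch below does that for the `[Unit]` section only. It rests on one fact from the systemd documentation that is not stated in the thread: `Wants=` pulls a target in but does not order the unit after it, so waiting for network-online.target is normally expressed as `Wants=` together with `After=`. The function names are made up, and the check ignores `Requires=` and drop-in files, so treat it as a rough aid rather than a verdict.

```python
def unit_dependencies(unit_text):
    """Collect After=, Before= and Wants= entries from the [Unit] section."""
    deps = {"After": [], "Before": [], "Wants": []}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in deps:
            # A directive may list several space-separated units.
            deps[key].extend(value.split())
    return deps


def waits_for_network_online(unit_text):
    """True if the unit both wants and is ordered after network-online.target."""
    deps = unit_dependencies(unit_text)
    target = "network-online.target"
    return target in deps["Wants"] and target in deps["After"]


QUOTED_UNIT = """\
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target
"""

print(unit_dependencies(QUOTED_UNIT)["Before"])  # ['network-online.target']
print(waits_for_network_online(QUOTED_UNIT))     # False
```

For the unit quoted by Punit, the sketch reports that GlusterD is ordered before network-online.target, which is consistent with it starting while name resolution is still unavailable.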
Re: [ovirt-users] [Gluster-users] Gluster command [UNKNOWN] failed on server...
I just remembered this.

There was another user on the mailing list a while back with a similar issue of GlusterD failing to start. The cause of his problem was the way his network was brought up. IIRC, he was using a static network configuration, and the problem vanished when he began using dhcp. Or it might have been that he was using dhcp.service and it got solved after switching to NetworkManager.

This could be one more thing you could look at.

I'll try to find the mail thread to see if it was the same problem as yours.

~kaushal

On Wed, Dec 3, 2014 at 6:22 PM, Kaushal M kshlms...@gmail.com wrote:

I don't know much about how the network target is brought up in CentOS 7, but I'll try as much as I can.

It seems to me that, after the network has been brought up and by the time GlusterD is started,
a. the machine hasn't yet received its hostname, or
b. it hasn't yet registered with the name server.

This is causing name resolution failures.

I don't know if the network target could come up without the machine getting its hostname, so I'm pretty sure it's not a.

So it seems to be b. But this kind of registration happens only in DDNS systems, which doesn't seem to be the case for you.

Both of these reasons might be wrong (most likely wrong). You'd do well to ask for help from someone with more experience in systemd + networking.

~kaushal

On Wed, Dec 3, 2014 at 10:54 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

This is the host which I rebooted. Would you mind letting me know how I can make the glusterd service come up after the network? I am using CentOS 7, if the network is the issue.

On Wed, Dec 3, 2014 at 11:54 AM, Kaushal M kshlms...@gmail.com wrote:

This peer cannot be identified.

[2014-12-03 02:29:25.998153] D [glusterd-peer-utils.c:121:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: cpu05.zne01.hkg1.ovt.36stack.com

I don't know why this address is not being resolved during boot time. If this is a valid peer, then the only reason I can think of is that the network is not up.

If you had previously detached the peer forcefully, then that could have left stale entries in some volumes. In this case as well, GlusterD will fail to identify the peer.

Do either of these reasons seem a possibility to you?

On Dec 3, 2014 8:07 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyoe5 and http://ur1.ca/iyoed

On Tue, Dec 2, 2014 at 10:43 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

In the logs you've provided, GlusterD appears to be running correctly. Could you provide the logs for the time period when GlusterD attempts to start but fails?

~kaushal

On Dec 2, 2014 8:03 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyhs5 and http://ur1.ca/iyhue

Thanks,
punit

On Tue, Dec 2, 2014 at 12:00 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

Could you start GlusterD in debug mode and provide the logs here? To start it in debug mode, append '-LDEBUG' to the ExecStart line in the service file.

~kaushal

On Mon, Dec 1, 2014 at 9:05 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Can anybody help me with this?

On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Thanks for the detailed reply... let me explain my setup first:

1. oVirt Engine
2. 4 hosts that are also storage machines (host and gluster combined)
3. Every host has 24 bricks

Now whenever a host machine reboots, it comes up but cannot join the cluster again, and throws the following error: Gluster command [UNKNOWN] failed on server..

Please check my comments inline:

1. Use the same string for doing the peer probe and for the brick address during volume create/add-brick. Ideally, we suggest you use properly resolvable FQDNs everywhere. If that is not possible, then use only IP addresses. Try to avoid short names.

---
[root@cpu05 ~]# gluster peer status
Number of Peers: 3

Hostname: cpu03.stack.com
Uuid: 5729b8c4-e80d-4353-b456-6f467bddbdfb
State: Peer in Cluster (Connected)

Hostname: cpu04.stack.com
Uuid: d272b790-c4b2-4bed-ba68-793656e6d7b0
State: Peer in Cluster (Connected)
Other names:
10.10.0.8

Hostname: cpu02.stack.com
Uuid: 8d8a7041-950e-40d0-85f9-58d14340ca25
State: Peer in Cluster (Connected)
[root@cpu05 ~]#

2. During boot up, make sure to launch glusterd only after the network is up. This will allow the new peer identification mechanism to do its job correctly.

I think the service itself is doing the same job:

[root@cpu05 ~]# cat /usr/lib/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run
Re: [ovirt-users] [Gluster-users] Gluster command [UNKNOWN] failed on server...
Hey Punit,

In the logs you've provided, GlusterD appears to be running correctly. Could you provide the logs for the time period when GlusterD attempts to start but fails?

~kaushal

On Dec 2, 2014 8:03 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyhs5 and http://ur1.ca/iyhue

Thanks,
punit

On Tue, Dec 2, 2014 at 12:00 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

Could you start GlusterD in debug mode and provide the logs here? To start it in debug mode, append '-LDEBUG' to the ExecStart line in the service file.

~kaushal

On Mon, Dec 1, 2014 at 9:05 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Can anybody help me with this?

On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Thanks for the detailed reply. Let me explain my setup first:

1. oVirt Engine
2. 4 hosts that also act as storage machines (host and Gluster combined)
3. Every host has 24 bricks

Now, whenever a host machine reboots, it comes up but cannot join the cluster again, and throws the following error: "Gluster command [UNKNOWN] failed on server..."

Please see my comments inline:

1. Use the same string for doing the peer probe and for the brick address during volume create/add-brick. Ideally, we suggest you use properly resolvable FQDNs everywhere. If that is not possible, then use only IP addresses. Try to avoid short names.

---
[root@cpu05 ~]# gluster peer status
Number of Peers: 3

Hostname: cpu03.stack.com
Uuid: 5729b8c4-e80d-4353-b456-6f467bddbdfb
State: Peer in Cluster (Connected)

Hostname: cpu04.stack.com
Uuid: d272b790-c4b2-4bed-ba68-793656e6d7b0
State: Peer in Cluster (Connected)
Other names:
10.10.0.8

Hostname: cpu02.stack.com
Uuid: 8d8a7041-950e-40d0-85f9-58d14340ca25
State: Peer in Cluster (Connected)
[root@cpu05 ~]#

2. During boot up, make sure to launch glusterd only after the network is up. This will allow the new peer identification mechanism to do its job correctly.

I think the service file itself already does this:

[root@cpu05 ~]# cat /usr/lib/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
KillMode=process

[Install]
WantedBy=multi-user.target
[root@cpu05 ~]#

Gluster logs:

[2014-11-24 09:22:22.147471] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.1 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2014-11-24 09:22:22.151565] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2014-11-24 09:22:22.151599] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2014-11-24 09:22:22.155216] W [rdma.c:4195:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2014-11-24 09:22:22.155264] E [rdma.c:4483:init] 0-rdma.management: Failed to initialize IB Device
[2014-11-24 09:22:22.155285] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2014-11-24 09:22:22.155354] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-11-24 09:22:22.156290] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2014-11-24 09:22:22.161318] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600
[2014-11-24 09:22:22.821800] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.825810] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.828705] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.828771] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.832670] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.835919] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.840209] E [glusterd-store.c:4248:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-11-24 09:22:22.840233] E [xlator.c:425:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-11-24 09:22:22.840245] E [graph.c:322:glusterfs_graph_init] 0-management
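
The ordering question in point 2 can be checked mechanically. What follows is a minimal Python sketch and not something posted in the thread: it reads the text of a unit file and reports whether the unit is ordered after network-online.target. The helper names unit_ordering and starts_after_network_online are made up for illustration, and the parser assumes the simple key=value layout of the glusterd.service file quoted above.

```python
def unit_ordering(unit_text):
    """Collect After= and Before= entries from the [Unit] section.

    Hypothetical helper: handles only plain "Key=value" lines and
    ignores everything outside the [Unit] section.
    """
    ordering = {"After": [], "Before": []}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are in, e.g. "[Unit]" -> "Unit".
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            key, _, value = line.partition("=")
            key = key.strip()
            if key in ordering:
                # Values are space-separated unit names.
                ordering[key].extend(value.split())
    return ordering


def starts_after_network_online(unit_text):
    """True if the unit is ordered after network-online.target
    and is not also ordered before it."""
    order = unit_ordering(unit_text)
    target = "network-online.target"
    return target in order["After"] and target not in order["Before"]


GLUSTERD_UNIT = """\
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
"""

if __name__ == "__main__":
    print(starts_after_network_online(GLUSTERD_UNIT))
```

Run against the glusterd.service quoted above, the check returns False: the unit lists network-online.target under Before=, and After=network.target on its own does not wait for the network to be configured.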
Re: [ovirt-users] [Gluster-users] Gluster command [UNKNOWN] failed on server...
This peer cannot be identified.

[2014-12-03 02:29:25.998153] D [glusterd-peer-utils.c:121:glusterd_peerinfo_find_by_hostname] 0-management: Unable to find friend: cpu05.zne01.hkg1.ovt.36stack.com

I don't know why this address is not being resolved during boot time. If this is a valid peer, then the only reason I can think of is that the network is not up.

If you had previously detached the peer forcefully, then that could have left stale entries in some volumes. In this case as well, GlusterD will fail to identify the peer.

Does either of these reasons seem a possibility to you?

On Dec 3, 2014 8:07 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyoe5 and http://ur1.ca/iyoed

On Tue, Dec 2, 2014 at 10:43 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

In the logs you've provided, GlusterD appears to be running correctly. Could you provide the logs for the time period when GlusterD attempts to start but fails?

~kaushal

On Dec 2, 2014 8:03 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Please find the logs here: http://ur1.ca/iyhs5 and http://ur1.ca/iyhue

Thanks,
punit

On Tue, Dec 2, 2014 at 12:00 PM, Kaushal M kshlms...@gmail.com wrote:

Hey Punit,

Could you start GlusterD in debug mode and provide the logs here? To start it in debug mode, append '-LDEBUG' to the ExecStart line in the service file.

~kaushal

On Mon, Dec 1, 2014 at 9:05 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Can anybody help me with this?

On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Thanks for the detailed reply. Let me explain my setup first:

1. oVirt Engine
2. 4 hosts that also act as storage machines (host and Gluster combined)
3. Every host has 24 bricks

Now, whenever a host machine reboots, it comes up but cannot join the cluster again, and throws the following error: "Gluster command [UNKNOWN] failed on server..."

Please see my comments inline:

1. Use the same string for doing the peer probe and for the brick address during volume create/add-brick. Ideally, we suggest you use properly resolvable FQDNs everywhere. If that is not possible, then use only IP addresses. Try to avoid short names.

---
[root@cpu05 ~]# gluster peer status
Number of Peers: 3

Hostname: cpu03.stack.com
Uuid: 5729b8c4-e80d-4353-b456-6f467bddbdfb
State: Peer in Cluster (Connected)

Hostname: cpu04.stack.com
Uuid: d272b790-c4b2-4bed-ba68-793656e6d7b0
State: Peer in Cluster (Connected)
Other names:
10.10.0.8

Hostname: cpu02.stack.com
Uuid: 8d8a7041-950e-40d0-85f9-58d14340ca25
State: Peer in Cluster (Connected)
[root@cpu05 ~]#

2. During boot up, make sure to launch glusterd only after the network is up. This will allow the new peer identification mechanism to do its job correctly.

I think the service file itself already does this:

[root@cpu05 ~]# cat /usr/lib/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
KillMode=process

[Install]
WantedBy=multi-user.target
[root@cpu05 ~]#

Gluster logs:

[2014-11-24 09:22:22.147471] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.1 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2014-11-24 09:22:22.151565] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2014-11-24 09:22:22.151599] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2014-11-24 09:22:22.155216] W [rdma.c:4195:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2014-11-24 09:22:22.155264] E [rdma.c:4483:init] 0-rdma.management: Failed to initialize IB Device
[2014-11-24 09:22:22.155285] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2014-11-24 09:22:22.155354] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-11-24 09:22:22.156290] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2014-11-24 09:22:22.161318] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600
[2014-11-24 09:22:22.821800] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.825810] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.828705] I
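
Whether the peer names resolve on the rebooted host can be tested directly. The following is a minimal Python sketch, assuming Python 3 is available on that host. The function unresolved_peers is a hypothetical helper, not part of Gluster, and its resolver argument exists only so the function can be exercised without a network.

```python
import socket


def unresolved_peers(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that the resolver fails to resolve.

    Hypothetical helper for illustration. By default it uses the
    system resolver through socket.getaddrinfo.
    """
    failed = []
    for name in hostnames:
        try:
            # The port is irrelevant here; only the name lookup matters.
            resolver(name, None)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Peer names as they appear in this thread.
    peers = [
        "cpu02.stack.com",
        "cpu03.stack.com",
        "cpu04.stack.com",
        "cpu05.zne01.hkg1.ovt.36stack.com",
    ]
    for name in unresolved_peers(peers):
        print("cannot resolve:", name)
```

Running this early in boot and again once the network is up would show whether name resolution is what differs between a failed and a successful GlusterD start.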
Re: [ovirt-users] [Gluster-users] Gluster command [UNKNOWN] failed on server...
Hey Punit,

Could you start GlusterD in debug mode and provide the logs here? To start it in debug mode, append '-LDEBUG' to the ExecStart line in the service file.

~kaushal

On Mon, Dec 1, 2014 at 9:05 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Can anybody help me with this?

On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Kaushal,

Thanks for the detailed reply. Let me explain my setup first:

1. oVirt Engine
2. 4 hosts that also act as storage machines (host and Gluster combined)
3. Every host has 24 bricks

Now, whenever a host machine reboots, it comes up but cannot join the cluster again, and throws the following error: "Gluster command [UNKNOWN] failed on server..."

Please see my comments inline:

1. Use the same string for doing the peer probe and for the brick address during volume create/add-brick. Ideally, we suggest you use properly resolvable FQDNs everywhere. If that is not possible, then use only IP addresses. Try to avoid short names.

---
[root@cpu05 ~]# gluster peer status
Number of Peers: 3

Hostname: cpu03.stack.com
Uuid: 5729b8c4-e80d-4353-b456-6f467bddbdfb
State: Peer in Cluster (Connected)

Hostname: cpu04.stack.com
Uuid: d272b790-c4b2-4bed-ba68-793656e6d7b0
State: Peer in Cluster (Connected)
Other names:
10.10.0.8

Hostname: cpu02.stack.com
Uuid: 8d8a7041-950e-40d0-85f9-58d14340ca25
State: Peer in Cluster (Connected)
[root@cpu05 ~]#

2. During boot up, make sure to launch glusterd only after the network is up. This will allow the new peer identification mechanism to do its job correctly.

I think the service file itself already does this:

[root@cpu05 ~]# cat /usr/lib/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
KillMode=process

[Install]
WantedBy=multi-user.target
[root@cpu05 ~]#

Gluster logs:

[2014-11-24 09:22:22.147471] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.1 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2014-11-24 09:22:22.151565] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2014-11-24 09:22:22.151599] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2014-11-24 09:22:22.155216] W [rdma.c:4195:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2014-11-24 09:22:22.155264] E [rdma.c:4483:init] 0-rdma.management: Failed to initialize IB Device
[2014-11-24 09:22:22.155285] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2014-11-24 09:22:22.155354] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-11-24 09:22:22.156290] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2014-11-24 09:22:22.161318] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600
[2014-11-24 09:22:22.821800] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.825810] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.828705] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2014-11-24 09:22:22.828771] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.832670] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.835919] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-24 09:22:22.840209] E [glusterd-store.c:4248:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-11-24 09:22:22.840233] E [xlator.c:425:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-11-24 09:22:22.840245] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2014-11-24 09:22:22.840264] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed
[2014-11-24 09:22:22.840754] W [glusterfsd.c:1194:cleanup_and_exit] (-- 0-: received signum (0), shutting down

Thanks,
Punit

On Wed, Nov 26, 2014 at 7:14 PM, Kaushal M kshlms...@gmail.com wrote:

Based on the logs I can guess that glusterd is being started before the network has come up and that the addresses given to bricks do not directly match the addresses used during peer probe. The gluster_after_reboot log