Hello all,

I did a normal host update from the web frontend. After the reboot the host is NonResponsive.
It is an HCI setup with GlusterFS, but on this host mount shows that only the engine volume is mounted; the data volume is not mounted:

[root@kvm320 ~]# mount|grep _engine
kvm380.durchhalten.intern:/engine on /rhev/data-center/mnt/glusterSD/kvm380.durchhalten.intern:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@kvm320 ~]# mount|grep _data
[root@kvm320 ~]#

On every other host it looks like this:

[root@kvm10 ~]# mount|grep _engine
kvm380.durchhalten.intern:/engine on /rhev/data-center/mnt/glusterSD/kvm380.durchhalten.intern:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@kvm10 ~]# mount|grep _data
kvm380.durchhalten.intern:/data on /rhev/data-center/mnt/glusterSD/kvm380.durchhalten.intern:_data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@kvm10 ~]#
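What I wanted to try next is mounting the data volume on kvm320 by hand, the same way it is mounted on the working hosts, to see whether the mount itself fails. This is only an untested sketch (server, volume and mount point are copied from the kvm10 output above, and the client log file name is just my guess at the usual /var/log/glusterfs naming):

# untested sketch - server/volume/mount point taken from the working kvm10 mount above
mount -t glusterfs kvm380.durchhalten.intern:/data /rhev/data-center/mnt/glusterSD/kvm380.durchhalten.intern:_data
# afterwards check the fuse client log (file name is my assumption, derived from the mount point)
less /var/log/glusterfs/rhev-data-center-mnt-glusterSD-kvm380.durchhalten.intern:_data.log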
Here is some more information:

[root@kvm320 ~]# systemctl status glusterd -l
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fr 2020-03-13 14:19:03 CET; 1h 49min ago
     Docs: man:glusterd(8)
  Process: 9263 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 9264 (glusterd)
    Tasks: 114
   CGroup: /system.slice/glusterd.service
           ├─3839 /usr/sbin/glusterfsd -s kvm320.durchhalten.intern --volfile-id data.kvm320.durchhalten.intern.gluster_bricks-data -p /var/run/gluster/vols/data/kvm320.durchhalten.intern-gluster_bricks-data.pid -S /var/run/gluster/1fd58e7c80335308.socket --brick-name /gluster_bricks/data -l /var/log/glusterfs/bricks/gluster_bricks-data.log --xlator-option *-posix.glusterd-uuid=ce474774-436a-41d3-bfdd-ab153ac77830 --process-name brick --brick-port 49152 --xlator-option data-server.listen-port=49152
           ├─3896 /usr/sbin/glusterfsd -s kvm320.durchhalten.intern --volfile-id engine.kvm320.durchhalten.intern.gluster_bricks-engine -p /var/run/gluster/vols/engine/kvm320.durchhalten.intern-gluster_bricks-engine.pid -S /var/run/gluster/5d4bcd552e3a3806.socket --brick-name /gluster_bricks/engine -l /var/log/glusterfs/bricks/gluster_bricks-engine.log --xlator-option *-posix.glusterd-uuid=ce474774-436a-41d3-bfdd-ab153ac77830 --process-name brick --brick-port 49153 --xlator-option engine-server.listen-port=49153
           ├─4032 /usr/sbin/glusterfsd -s kvm320.durchhalten.intern --volfile-id home.kvm320.durchhalten.intern.gluster_bricks-home -p /var/run/gluster/vols/home/kvm320.durchhalten.intern-gluster_bricks-home.pid -S /var/run/gluster/050dbbce51bc7cb8.socket --brick-name /gluster_bricks/home -l /var/log/glusterfs/bricks/gluster_bricks-home.log --xlator-option *-posix.glusterd-uuid=ce474774-436a-41d3-bfdd-ab153ac77830 --process-name brick --brick-port 49154 --xlator-option home-server.listen-port=49154
           ├─9264 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           └─9381 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/6b30e1d260d31419.socket --xlator-option *replicate*.node-uuid=ce474774-436a-41d3-bfdd-ab153ac77830 --process-name glustershd --client-pid=-6

Mär 13 14:19:02 kvm320.durchhalten.intern systemd[1]: Starting GlusterFS, a clustered file-system server...
Mär 13 14:19:03 kvm320.durchhalten.intern systemd[1]: Started GlusterFS, a clustered file-system server.
Mär 13 14:19:12 kvm320.durchhalten.intern glusterd[9264]: [2020-03-13 13:19:12.869310] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting local bricks.
Mär 13 14:19:13 kvm320.durchhalten.intern glusterd[9264]: [2020-03-13 13:19:13.115484] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting local bricks.

Everything seems to be connected:

[root@kvm380 ~]# gluster peer status
Number of Peers: 3

Hostname: kvm360.durchhalten.intern
Uuid: 2f6cf198-afe5-4e61-b018-e7e4c10793b4
State: Peer in Cluster (Connected)

Hostname: kvm320.durchhalten.intern
Uuid: ce474774-436a-41d3-bfdd-ab153ac77830
State: Peer in Cluster (Connected)
Other names:
192.168.200.231

Hostname: kvm10
Uuid: 33cd77a6-3cda-4e21-bd45-a907044f410b
State: Peer in Cluster (Connected)
Other names:
kvm10

And this is the state of the bricks for engine and data:

[root@kvm320 ~]# gluster volume status engine
Status of volume: engine
Gluster process                                         TCP Port  RDMA Port  Online  Pid
-----------------------------------------------------------------------------------------
Brick kvm10:/gluster_bricks/engine                      49153     0          Y       2996
Brick kvm320.durchhalten.intern:/gluster_bricks/engine  49153     0          Y       3896
Brick kvm360.durchhalten.intern:/gluster_bricks/engine  49153     0          Y       5066
Self-heal Daemon on localhost                           N/A       N/A        Y       9381
Self-heal Daemon on kvm10                               N/A       N/A        Y       2325
Self-heal Daemon on kvm380.durchhalten.intern           N/A       N/A        Y       1070
Self-heal Daemon on kvm360.durchhalten.intern           N/A       N/A        Y       20403

Task Status of Volume engine
-----------------------------------------------------------------------------------------
There are no active volume tasks

[root@kvm320 ~]# gluster volume status data
Status of volume: data
Gluster process                                         TCP Port  RDMA Port  Online  Pid
-----------------------------------------------------------------------------------------
Brick kvm10:/gluster_bricks/data                        49152     0          Y       2966
Brick kvm360.durchhalten.intern:/gluster_bricks/data    49152     0          Y       5058
Brick kvm320.durchhalten.intern:/gluster_bricks/data    49152     0          Y       3839
Self-heal Daemon on localhost                           N/A       N/A        Y       9381
Self-heal Daemon on kvm10                               N/A       N/A        Y       2325
Self-heal Daemon on kvm380.durchhalten.intern           N/A       N/A        Y       1070
Self-heal Daemon on kvm360.durchhalten.intern           N/A       N/A        Y       20403

Task Status of Volume data
-----------------------------------------------------------------------------------------
There are no active volume tasks
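Since all bricks and self-heal daemons are reported as online, I assume the volumes themselves are healthy, but as a sanity check I would also look for pending heals. Just a sketch, I have not included that output here:

# sketch only - standard gluster heal status queries
gluster volume heal engine info
gluster volume heal data info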
And this is what happens when I restart glusterd on the host:

[2020-03-13 15:11:46.470990] W [glusterd-locks.c:807:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x23a0a) [0x7fa6e1cd0a0a] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x2e320) [0x7fa6e1cdb320] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0xed3b3) [0x7fa6e1d9a3b3] ) 0-management: Lock for vol data not held
[2020-03-13 15:11:46.471039] W [glusterd-locks.c:807:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x23a0a) [0x7fa6e1cd0a0a] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x2e320) [0x7fa6e1cdb320] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0xed3b3) [0x7fa6e1d9a3b3] ) 0-management: Lock for vol engine not held
[2020-03-13 15:11:46.471092] W [glusterd-locks.c:807:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x23a0a) [0x7fa6e1cd0a0a] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0x2e320) [0x7fa6e1cdb320] -->/usr/lib64/glusterfs/6.7/xlator/mgmt/glusterd.so(+0xed3b3) [0x7fa6e1d9a3b3] ) 0-management: Lock for vol home not held
[2020-03-13 15:11:46.471105] W [MSGID: 106117] [glusterd-handler.c:6466:__glusterd_peer_rpc_notify] 0-management: Lock not released for home
[2020-03-13 15:11:46.871933] I [MSGID: 106163] [glusterd-handshake.c:1389:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 40000
[2020-03-13 15:11:46.952276] I [MSGID: 106490] [glusterd-handler.c:2611:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: ce474774-436a-41d3-bfdd-ab153ac77830
[2020-03-13 15:11:57.353876] I [MSGID: 106493] [glusterd-handler.c:3883:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to kvm320.durchhalten.intern (0), ret: 0, op_ret: 0
[2020-03-13 15:11:57.366239] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2020-03-13 15:11:57.366273] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2020-03-13 15:11:57.366295] I [MSGID: 106599] [glusterd-nfs-svc.c:81:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2020-03-13 15:11:57.368510] I [MSGID: 106568] [glusterd-proc-mgmt.c:92:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 31828

Here come the disconnects from gluster, even though the peer is still shown as connected above:

[2020-03-13 15:11:57.370534] I [MSGID: 106006] [glusterd-svc-mgmt.c:356:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
[2020-03-13 15:11:58.368834] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is stopped
[2020-03-13 15:11:58.368972] I [MSGID: 106567] [glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd service
[2020-03-13 15:11:59.372807] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: quotad already stopped
[2020-03-13 15:11:59.372883] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: quotad service is stopped
[2020-03-13 15:11:59.373068] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2020-03-13 15:11:59.373094] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is stopped
[2020-03-13 15:11:59.373269] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already stopped
[2020-03-13 15:11:59.373292] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is stopped
[2020-03-13 15:11:59.373787] I [MSGID: 106492] [glusterd-handler.c:2796:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: ce474774-436a-41d3-bfdd-ab153ac77830
[2020-03-13 15:11:59.375418] I [MSGID: 106502] [glusterd-handler.c:2837:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2020-03-13 15:11:59.375651] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: ce474774-436a-41d3-bfdd-ab153ac77830
[2020-03-13 15:11:59.432044] I [MSGID: 106493] [glusterd-rpc-ops.c:468:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: ce474774-436a-41d3-bfdd-ab153ac77830, host: kvm320.durchhalten.intern, port: 0
[2020-03-13 15:11:59.465527] I [MSGID: 106492] [glusterd-handler.c:2796:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: ce474774-436a-41d3-bfdd-ab153ac77830
[2020-03-13 15:11:59.466839] I [MSGID: 106502] [glusterd-handler.c:2837:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2020-03-13 15:11:59.566501] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: ce474774-436a-41d3-bfdd-ab153ac77830
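If I understand the oVirt HCI setup correctly, the mounts under /rhev/data-center/mnt/glusterSD are created by VDSM when the engine activates the host, so gluster being healthy on its own may not be enough. I have not dug into that yet, but I would probably check VDSM on kvm320 next, roughly like this (service name and log path are the standard VDSM ones, as far as I know):

# sketch only - is vdsm running, and what did it log around host activation?
systemctl status vdsmd
tail -n 100 /var/log/vdsm/vdsm.log

Maybe putting the host into Maintenance and activating it again from the web UI would re-trigger the mount, but I did not want to try that before asking here.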
Does someone have advice for me?

Thanks, bye
Stefan