[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Andreas Elvers
After rebooting the node that was not able to mount the gluster volume, things 
eventually improved. The SPM for the Datacenter went away and restarted, and 
suddenly node03 was able to mount the gluster volume. In between I was down to 
1/3 active bricks, which results in a read-only GlusterFS. I was lucky to still 
have the engine on NFS. But anyway... 
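(For what it's worth: with only one brick of three up, client quorum is lost, which is 
why the volume went read-only. A quick way to double-check the quorum options on the 
volume, assuming the usual replica-3 setup, is:

gluster volume get vmstore cluster.quorum-type
gluster volume get vmstore cluster.server-quorum-type)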

Thanks for your thoughts.


[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Strahil
Fix those disconnected nodes and run find against a node that has successfully 
mounted the volume.
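
In practice that would look something like this (just a sketch; assuming node03 is the 
peer that dropped off and node02 has a working mount):

# on the disconnected peer
ssh node03 systemctl restart glusterd
ssh node03 gluster peer status

# then walk the volume from a node with a healthy mount to kick off healing
ssh node02 'find /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore -exec stat {} \; > /dev/null'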

Best Regards,
Strahil Nikolov

On Apr 24, 2019 15:31, Andreas Elvers wrote:
>
> The file handle is stale so find will display: 
>
> "find: 
> '/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore': 
> Transport endpoint is not connected" 
>
> "stat /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore" 
> will output 
> stat: cannot stat 
> '/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore': 
> Transport endpoint is not connected 
>
> All Nodes are peering with the other nodes: 
> - 
> Saiph:~ andreas$ ssh node01 gluster peer status 
> Number of Peers: 2 
>
> Hostname: node02.infra.solutions.work 
> Uuid: 87fab40a-2395-41ce-857d-0b846e078cdb 
> State: Peer in Cluster (Connected) 
>
> Hostname: node03.infra.solutions.work 
> Uuid: 49025f81-e7c1-4760-be03-f36e0f403d26 
> State: Peer in Cluster (Connected) 
>  
> Saiph:~ andreas$ ssh node02 gluster peer status 
> Number of Peers: 2 
>
> Hostname: node03.infra.solutions.work 
> Uuid: 49025f81-e7c1-4760-be03-f36e0f403d26 
> State: Peer in Cluster (Disconnected) 
>
> Hostname: node01.infra.solutions.work 
> Uuid: f25e6bff-e5e2-465f-a33e-9148bef94633 
> State: Peer in Cluster (Connected) 
>  
> ssh node03 gluster peer status 
> Number of Peers: 2 
>
> Hostname: node02.infra.solutions.work 
> Uuid: 87fab40a-2395-41ce-857d-0b846e078cdb 
> State: Peer in Cluster (Connected) 
>
> Hostname: node01.infra.solutions.work 
> Uuid: f25e6bff-e5e2-465f-a33e-9148bef94633 
> State: Peer in Cluster (Connected)


[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Andreas Elvers
The file handle is stale so find will display:

"find: '/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore': 
Transport endpoint is not connected"

"stat /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore" 
will output
stat: cannot stat 
'/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore': 
Transport endpoint is not connected
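
If it helps, the FUSE client log for that mount point should show why the client lost 
its bricks. The log normally sits under /var/log/glusterfs/ and is named after the 
mount path with the slashes turned into dashes (the exact file name may differ):

grep -iE 'disconnect|connection refused|shutting down' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-node01.infra.solutions.work:_vmstore.log | tail -n 20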

All Nodes are peering with the other nodes:
- 
Saiph:~ andreas$ ssh node01 gluster peer status
Number of Peers: 2

Hostname: node02.infra.solutions.work
Uuid: 87fab40a-2395-41ce-857d-0b846e078cdb
State: Peer in Cluster (Connected)

Hostname: node03.infra.solutions.work
Uuid: 49025f81-e7c1-4760-be03-f36e0f403d26
State: Peer in Cluster (Connected)

Saiph:~ andreas$ ssh node02 gluster peer status
Number of Peers: 2

Hostname: node03.infra.solutions.work
Uuid: 49025f81-e7c1-4760-be03-f36e0f403d26
State: Peer in Cluster (Disconnected)

Hostname: node01.infra.solutions.work
Uuid: f25e6bff-e5e2-465f-a33e-9148bef94633
State: Peer in Cluster (Connected)

ssh node03 gluster peer status
Number of Peers: 2

Hostname: node02.infra.solutions.work
Uuid: 87fab40a-2395-41ce-857d-0b846e078cdb
State: Peer in Cluster (Connected)

Hostname: node01.infra.solutions.work
Uuid: f25e6bff-e5e2-465f-a33e-9148bef94633
State: Peer in Cluster (Connected)


[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Strahil Nikolov
Try to run a find from a working server (for example node02):

find /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore -exec 
stat {} \;


Also, check if all peers see each other.
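For example, a quick loop like this (using the node names from this thread) makes it 
easy to compare all three views at once:

for h in node01 node02 node03; do echo "== $h =="; ssh $h gluster peer status; done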
Best Regards,
Strahil Nikolov

On Wednesday, April 24, 2019, 3:27:41 AM GMT-4, Andreas Elvers wrote:

Hi,

I am currently upgrading my oVirt setup from 4.2.8 to 4.3.3.1.

The setup consists of:

Datacenter/Cluster Default: [fully upgraded to 4.3.3.1]
  2 nodes (node04, node05) - NFS storage domain with self-hosted engine

Datacenter Luise:
  Cluster1: 3 nodes (node01, node02, node03) - Node NG with GlusterFS - Ceph Cinder storage domain
            [Node1 and Node3 are upgraded to 4.3.3.1, Node2 is on 4.2.8]
  Cluster2: 1 node (node06) - only Ceph Cinder storage domain [fully upgraded to 4.3.3.1]


Problems started when upgrading Luise/Cluster1 with GlusterFS:
(I always waited for GlusterFS to be fully synced before proceeding to the next 
step)

- Upgrade node01 to 4.3.3 -> OK
- Upgrade node03 to 4.3.3.1 -> OK
- Upgrade node01 to 4.3.3.1 -> GlusterFS became unstable.
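
(Side note: at this point node02 is still on 4.2.8, so the cluster is running mixed 
gluster versions. It may be worth comparing the installed versions and the cluster 
op-version; this is just a sanity check, not a known cause:

for h in node01 node02 node03; do ssh $h gluster --version | head -1; done
gluster volume get all cluster.op-version)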


I now get the error message:

VDSM node03.infra.solutions.work command ConnectStoragePoolVDS failed: Cannot 
find master domain: u'spUUID=f3218bf7-6158-4b2b-b272-51cdc3280376, 
msdUUID=02a32017-cbe6-4407-b825-4e558b784157'
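
(That msdUUID matches the 02a32017-... directory that shows up in the heal info output 
below, so the master domain apparently lives on the vmstore volume. Assuming that is 
the case, a quick check from a node with a working mount would be:

ssh node01 ls -l /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore/02a32017-cbe6-4407-b825-4e558b784157/dom_md)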

And on node03 there is a problem with Gluster:

node03#: ls -l 
/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore
ls: cannot access 
/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore: Transport 
endpoint is not connected

The directory is available on node01 and node02.
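
(One more thing worth checking on node03, just a sketch: whether the FUSE mount entry 
and its glusterfs client process are still there at all, or whether the client died 
and left a stale mount behind:

ssh node03 'mount | grep _vmstore'
ssh node03 'ps -ef | grep glusterfs | grep vmstore')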

The engine is reporting the brick on node03 as down. Node03 and Node06 are 
shown as NonOperational, because they are not able to access the gluster 
storage domain. 

A “gluster peer status” on node1, node2, and node3 shows all peers connected.

“gluster volume heal vmstore info” shows for all nodes:


gluster volume heal vmstore info
Brick node01.infra.solutions.work:/gluster_bricks/vmstore/vmstore
Status: Transport endpoint is not connected
Number of entries: -

Brick node02.infra.solutions.work:/gluster_bricks/vmstore/vmstore



/02a32017-cbe6-4407-b825-4e558b784157/dom_md/ids
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.66

/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.60
/02a32017-cbe6-4407-b825-4e558b784157/images/a3a10398-9698-4b73-84d9-9735448e3534/6161e310-4ad6-42d9-8117-5a89c5b2b4b6


/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.96


/.shard/d66880de-3fa1-4362-8c43-574a173c5f7d.133


/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.38
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.67
/__DIRECT_IO_TEST__


/02a32017-cbe6-4407-b825-4e558b784157/images/493188b2-c137-4440-99ee-43a753842a7d/9aa2d139-e3bd-406b-8fe0-b189123eaa73

/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.64
/.shard/d66880de-3fa1-4362-8c43-574a173c5f7d.132



/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.44
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.9
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.69

/02a32017-cbe6-4407-b825-4e558b784157/images/12e647fb-20aa-4957-b659-05fa75a9215e/f7e4b2a3-ab84-4eb5-a4e7-7208ddad8156





/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.35
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.32


/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.39


/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.34
/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.68
Status: Connected
Number of entries: 47

Brick node03.infra.solutions.work:/gluster_bricks/vmstore/vmstore
/02a32017-cbe6-4407-b825-4e558b784157/images/12e647fb-20aa-4957-b659-05fa75a9215e/f7e4b2a3-ab84-4eb5-a4e7-7208ddad8156











/.shard/d66880de-3fa1-4362-8c43-574a173c5f7d.133






/02a32017-cbe6-4407-b825-4e558b784157/images/493188b2-c137-4440-99ee-43a753842a7d/9aa2d139-e3bd-406b-8fe0-b189123eaa73






/.shard/40948f85-2212-47f9-bd5e-102a8dd632b8.44







/02a32017-cbe6-4407-b825-4e558b784157/dom_md/ids



/02a32017-cbe6-4407-b825-4e558b784157/images/a3a10398-9698-4b73-84d9-9735448e3534/6161e310-4ad6-42d9-8117-5a89c5b2b4b6



/.shard/d66880de-3fa1-4362-8c43-574a173c5f7d.132



/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 47

On node03 there are several self-heal processes that seem to be doing nothing.
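
One option (not tried here) would be to kick off a full heal and then watch the 
pending counts, assuming the self-heal daemons are actually connected to all bricks:

gluster volume heal vmstore full
gluster volume heal vmstore statistics heal-count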

Oh well.. What now?

Best regards,
- Andreas

[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Andreas Elvers
Restarting improved things a little. The bricks on node03 are still shown as down, 
but "gluster volume status" looks better.

Saiph:~ andreas$ ssh node01 gluster volume status vmstore
Status of volume: vmstore
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick node01.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   24543
Brick node02.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49154 0  Y   23795
Brick node03.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   1617
Self-heal Daemon on localhost   N/A   N/AY   32121
Self-heal Daemon on node03.infra.solutions.
workN/A   N/AY   25798
Self-heal Daemon on node02.infra.solutions.
workN/A   N/AY   30879

Task Status of Volume vmstore
--
There are no active volume tasks

Saiph:~ andreas$ ssh node02 gluster volume status vmstore
Status of volume: vmstore
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick node01.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   24543
Brick node02.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49154 0  Y   23795
Brick node03.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   1617
Self-heal Daemon on localhost   N/A   N/AY   30879
Self-heal Daemon on node03.infra.solutions.
workN/A   N/AY   25798
Self-heal Daemon on node01.infra.solutions.
workN/A   N/AY   32121

Task Status of Volume vmstore
--
There are no active volume tasks

Saiph:~ andreas$ ssh node03 gluster volume status vmstore
Status of volume: vmstore
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick node01.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   24543
Brick node02.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49154 0  Y   23795
Brick node03.infra.solutions.work:/gluster_
bricks/vmstore/vmstore  49157 0  Y   1617
Self-heal Daemon on localhost   N/A   N/AY   25798
Self-heal Daemon on node01.infra.solutions.
workN/A   N/AY   32121
Self-heal Daemon on node02.infra.solutions.
workN/A   N/AY   30879

Task Status of Volume vmstore
--
There are no active volume tasks
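
All three bricks report Online "Y" in the output above, so the engine display may 
simply be lagging behind. If a brick process were genuinely down, the usual way to 
respawn it (assuming the brick itself is healthy) would be:

gluster volume start vmstore force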


[ovirt-users] Re: Upgrading from 4.2.8 to 4.3.3 broke Node NG GlusterFS

2019-04-24 Thread Andreas Elvers
"systemctl restart glusterd" on node03 did not help. Still getting:

node03#: ls -l 
/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore 
ls: cannot access 
/rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore: Transport 
endpoint is not connected

Engine still shows bricks on node03 as down.
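
At this point the FUSE mount itself looks wedged rather than glusterd. One option, 
not from this thread and only a sketch, would be to lazily unmount the stale mount 
point and let vdsm remount it when the storage domain is activated again, or remount 
it by hand:

umount -l /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore
mount -t glusterfs node01.infra.solutions.work:/vmstore /rhev/data-center/mnt/glusterSD/node01.infra.solutions.work:_vmstore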