Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-09-08 Thread yayo (j)
2017-07-19 11:22 GMT+02:00 yayo (j) :

> running "gluster volume heal engine" doesn't solve the problem...
>
> Some extra info:
>
> We have recently changed the gluster setup from 2 (fully replicated) + 1
> arbiter to a 3-node fully replicated cluster, but I don't know whether this is
> the problem...
>
>

Hi,

Sorry for the follow-up. I just want to report that, after upgrading all nodes
to the same level, all problems are solved and the cluster now works very well!
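
(In case it helps anyone who finds this thread later: a quick way to confirm
that all nodes really are on the same level is to compare package versions,
something along these lines, run on every node and then diff the outputs:

  rpm -qa | egrep 'glusterfs|vdsm|ovirt' | sort
)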

Thank you all for the support!


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-25 Thread yayo (j)
2017-07-25 7:42 GMT+02:00 Kasturi Narra :

> These errors are because glusternw is not assigned to the correct
> interface. Once you attach it, these errors should go away. This has
> nothing to do with the problem you are seeing.
>

Hi,

Are you talking about errors like these?

2017-07-24 15:54:02,209+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515'
with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'


How do I assign "glusternw" to the correct interface?
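
For the record, the checks I can think of, using the gdnode* host names from
this thread (the admin-portal path below is from memory and may differ slightly
between 4.1 minor versions):

  # which addresses the brick host names resolve to on this node
  getent hosts gdnode01 gdnode02 gdnode04

  # which names/addresses glusterd itself knows for the peers
  gluster peer status

The gluster network role itself seems to be set per cluster in the web admin,
under the cluster's "Logical Networks" sub-tab -> "Manage Networks", by ticking
the gluster network role for the dedicated network.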

The other errors about unsynced gluster elements still remain... This is a
production environment, so is there any chance to subscribe to RH support?

Thank you


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-24 Thread Kasturi Narra
These errors are because glusternw is not assigned to the correct
interface. Once you attach it, these errors should go away. This has
nothing to do with the problem you are seeing.

Sahina, any idea about the engine not showing the correct volume info?

On Mon, Jul 24, 2017 at 7:30 PM, yayo (j)  wrote:

> Hi,
>
> UI refreshed but the problem still remains ...
>
> No specific error; I only have these errors, but I've read that there is no
> problem if I have this kind of error:
>
>
> 2017-07-24 15:53:59,823+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
> (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName =
> node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true',
> hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
> 2017-07-24 15:54:01,066+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
> (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return:
> [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
> 2017-07-24 15:54:01,076+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
> (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName =
> node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true',
> hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
> 2017-07-24 15:54:02,209+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick'
> of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,212+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick'
> of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,215+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick'
> of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,218+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick'
> of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,221+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick'
> of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,224+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
> (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick'
> of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster
> network found in cluster '0002-0002-0002-0002-017a'
> 2017-07-24 15:54:02,224+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
> (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return:
> {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062,
> c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3
>
>
> Thank you
>
>
> 2017-07-24 8:12 GMT+02:00 Kasturi Narra :
>
>> Hi,
>>
>>Regarding the UI showing incorrect information about engine and data
>> volumes, can you please refresh the UI and see if the issue persists  plus
>> any errors in the engine.log files ?
>>
>> Thanks
>> kasturi
>>
>> On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N 
>> wrote:
>>
>>>
>>> On 07/21/2017 11:41 PM, yayo (j) wrote:
>>>
>>> Hi,
>>>
>>> Sorry for follow up again, but, checking the ovirt interface I've found
>>> that ovirt report the "engine" volume as an "arbiter" configuration and the
>>> "data" volume as full replicated volume. Check these screenshots:
>>>
>>>
>>> This is probably some refresh bug in the UI, Sahina might be able to
>>> tell you.
>>>
>>>
>>> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFf
>>> VmR5aDQ?usp=sharing
>>>
>>> But the "gluster volume info" command report that all 2 volume are full
>>> replicated:
>>>
>>>
>>> *Volu

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-24 Thread yayo (j)
>
>> All these IPs are pingable and the hosts are resolvable across all 3 nodes, but
>> only the 10.10.10.0 network is the dedicated network for gluster (resolved
>> using the gdnode* host names) ... Do you think that removing the other entries
>> can fix the problem? If so, sorry, but how can I remove the other entries?
>>
> I don't think having extra entries could be a problem. Did you check the
> fuse mount logs for disconnect messages that I referred to in the other
> email?
>



* tail -f
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-dvirtgluster\:engine.log*

*NODE01:*


[2017-07-24 07:34:00.799347] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt:
failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 07:44:46.687334] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt:
Exhausted all volfile servers
[2017-07-24 09:04:25.951350] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt:
failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 09:15:11.839357] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt:
Exhausted all volfile servers
[2017-07-24 10:34:51.231353] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt:
failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 10:45:36.991321] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt:
Exhausted all volfile servers
[2017-07-24 12:05:16.383323] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt:
failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 12:16:02.271320] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt:
Exhausted all volfile servers
[2017-07-24 13:35:41.535308] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt:
failed to connect with remote-host: gdnode03 (Transport endpoint is not connected)
[2017-07-24 13:46:27.423304] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt:
Exhausted all volfile servers



Why gdnode03 again? It was removed from gluster! It was the arbiter node...
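
A guess on my side about where the stale reference could still come from
(paths are those of a default 4.1 hosted-engine setup):

  # does fstab still list gdnode03 as a (backup) volfile server?
  grep -i volfile /etc/fstab

  # the running fuse mount shows its volfile servers on its command line
  ps -ef | grep '[g]lusterfs' | grep engine

  # hosted-engine keeps its own mount options here
  grep -i mnt_options /etc/ovirt-hosted-engine/hosted-engine.conf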


*NODE02:*


[2017-07-24 14:08:18.709209] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de.
sources=0 [1]  sinks=2
[2017-07-24 14:08:38.746688] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:08:38.749379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
sources=0 [1]  sinks=2
[2017-07-24 14:08:46.068001] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de.
sources=0 [1]  sinks=2
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81"
repeated 3 times between [2017-07-24 14:08:38.746688] and [2017-07-24 14:10:09.088625]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
sources=0 [1]  sinks=2 " repeated 3 times between [2017-07-24 14:08:38.749379] and
[2017-07-24 14:10:09.091377]
[2017-07-24 14:10:19.384379] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de.
sources=0 [1]  sinks=2
[2017-07-24 14:10:39.433155] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-24 14:10:39.435847] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
sources=0 [1]  sinks=2



*NODE04:*


[2017-07-24 14:08:56.789598] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
sources=[0] 1  sinks=2
[2017-07-24 14:09:17.231987] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de.
sources=[0] 1  sinks=2
[2017-07-24 14:09:38.039541] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
sources=[0] 1  sinks=2
[2017-07-24 14:09:48.875602] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on db56ac00-fd5b-4326-a879-326ff56181de.
sources=[0] 1  sinks=2
[2017-07-24 14:10:39.832068] I [MSGID: 108026] [afr-self

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-24 Thread yayo (j)
Hi,

UI refreshed but the problem still remains ...

No specific error; I only have these errors, but I've read that there is no
problem if I have this kind of error:


2017-07-24 15:53:59,823+02 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler2) [b7590c4] START,
GlusterServersListVDSCommand(HostName
= node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
2017-07-24 15:54:01,066+02 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand]
(DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand,
return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED,
gdnode04:CONNECTED], log id: 29a62417
2017-07-24 15:54:01,076+02 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler2) [b7590c4] START,
GlusterVolumesListVDSCommand(HostName
= node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true',
hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
2017-07-24 15:54:02,209+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-
4f2a23d17515' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,212+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-
4f2a23d17515' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,215+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-
4f2a23d17515' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,218+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-
c8275d4a7c2d' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,221+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-
c8275d4a7c2d' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,224+02 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler2) [b7590c4] Could not associate brick
'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-
c8275d4a7c2d' with correct network as no gluster network found in cluster
'0002-0002-0002-0002-017a'
2017-07-24 15:54:02,224+02 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand,
return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.
common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72
-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.
gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3


Thank you


2017-07-24 8:12 GMT+02:00 Kasturi Narra :

> Hi,
>
>Regarding the UI showing incorrect information about engine and data
> volumes, can you please refresh the UI and see if the issue persists  plus
> any errors in the engine.log files ?
>
> Thanks
> kasturi
>
> On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N 
> wrote:
>
>>
>> On 07/21/2017 11:41 PM, yayo (j) wrote:
>>
>> Hi,
>>
>> Sorry for follow up again, but, checking the ovirt interface I've found
>> that ovirt report the "engine" volume as an "arbiter" configuration and the
>> "data" volume as full replicated volume. Check these screenshots:
>>
>>
>> This is probably some refresh bug in the UI, Sahina might be able to tell
>> you.
>>
>>
>> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFf
>> VmR5aDQ?usp=sharing
>>
>> But the "gluster volume info" command report that all 2 volume are full
>> replicated:
>>
>>
>> *Volume Name: data*
>> *Type: Replicate*
>> *Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d*
>> *Status: Started*
>> *Snapshot Count: 0*
>> *Number of Bricks: 1 x 3 = 3*
>> *Transport-type: tcp*
>> *Bricks:*
>> *Brick1: gdnode01:/gluster/data/brick*
>> *Brick2: gdnode02:/gluster/data/brick*
>> *Brick3: gdnode04:/gluster/data/brick*
>> *Options Reconfigured:*
>> *nfs.disable: on*
>> *performance.readdir-ahead: on*
>> *transport.address-family: inet*
>> *storage.owner-uid: 36*
>> *per

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-23 Thread Kasturi Narra
Hi,

   Regarding the UI showing incorrect information about the engine and data
volumes, can you please refresh the UI and see if the issue persists, and also
check for any errors in the engine.log files?

Thanks
kasturi

On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N 
wrote:

>
> On 07/21/2017 11:41 PM, yayo (j) wrote:
>
> Hi,
>
> Sorry for follow up again, but, checking the ovirt interface I've found
> that ovirt report the "engine" volume as an "arbiter" configuration and the
> "data" volume as full replicated volume. Check these screenshots:
>
>
> This is probably some refresh bug in the UI, Sahina might be able to tell
> you.
>
>
> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?
> usp=sharing
>
> But the "gluster volume info" command report that all 2 volume are full
> replicated:
>
>
> *Volume Name: data*
> *Type: Replicate*
> *Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d*
> *Status: Started*
> *Snapshot Count: 0*
> *Number of Bricks: 1 x 3 = 3*
> *Transport-type: tcp*
> *Bricks:*
> *Brick1: gdnode01:/gluster/data/brick*
> *Brick2: gdnode02:/gluster/data/brick*
> *Brick3: gdnode04:/gluster/data/brick*
> *Options Reconfigured:*
> *nfs.disable: on*
> *performance.readdir-ahead: on*
> *transport.address-family: inet*
> *storage.owner-uid: 36*
> *performance.quick-read: off*
> *performance.read-ahead: off*
> *performance.io-cache: off*
> *performance.stat-prefetch: off*
> *performance.low-prio-threads: 32*
> *network.remote-dio: enable*
> *cluster.eager-lock: enable*
> *cluster.quorum-type: auto*
> *cluster.server-quorum-type: server*
> *cluster.data-self-heal-algorithm: full*
> *cluster.locking-scheme: granular*
> *cluster.shd-max-threads: 8*
> *cluster.shd-wait-qlength: 1*
> *features.shard: on*
> *user.cifs: off*
> *storage.owner-gid: 36*
> *features.shard-block-size: 512MB*
> *network.ping-timeout: 30*
> *performance.strict-o-direct: on*
> *cluster.granular-entry-heal: on*
> *auth.allow: **
> *server.allow-insecure: on*
>
>
>
>
>
> *Volume Name: engine*
> *Type: Replicate*
> *Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515*
> *Status: Started*
> *Snapshot Count: 0*
> *Number of Bricks: 1 x 3 = 3*
> *Transport-type: tcp*
> *Bricks:*
> *Brick1: gdnode01:/gluster/engine/brick*
> *Brick2: gdnode02:/gluster/engine/brick*
> *Brick3: gdnode04:/gluster/engine/brick*
> *Options Reconfigured:*
> *nfs.disable: on*
> *performance.readdir-ahead: on*
> *transport.address-family: inet*
> *storage.owner-uid: 36*
> *performance.quick-read: off*
> *performance.read-ahead: off*
> *performance.io-cache: off*
> *performance.stat-prefetch: off*
> *performance.low-prio-threads: 32*
> *network.remote-dio: off*
> *cluster.eager-lock: enable*
> *cluster.quorum-type: auto*
> *cluster.server-quorum-type: server*
> *cluster.data-self-heal-algorithm: full*
> *cluster.locking-scheme: granular*
> *cluster.shd-max-threads: 8*
> *cluster.shd-wait-qlength: 1*
> *features.shard: on*
> *user.cifs: off*
> *storage.owner-gid: 36*
> *features.shard-block-size: 512MB*
> *network.ping-timeout: 30*
> *performance.strict-o-direct: on*
> *cluster.granular-entry-heal: on*
> *auth.allow: **
>
>   server.allow-insecure: on
>
>
> 2017-07-21 19:13 GMT+02:00 yayo (j) :
>
>> 2017-07-20 14:48 GMT+02:00 Ravishankar N :
>>
>>>
>>> But it does  say something. All these gfids of completed heals in the
>>> log below are the for the ones that you have given the getfattr output of.
>>> So what is likely happening is there is an intermittent connection problem
>>> between your mount and the brick process, leading to pending heals again
>>> after the heal gets completed, which is why the numbers are varying each
>>> time. You would need to check why that is the case.
>>> Hope this helps,
>>> Ravi
>>>
>>>
>>>
>>> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
>>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>>> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
>>> sources=[0] 1  sinks=2*
>>> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
>>> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
>>> 0-engine-replicate-0: performing metadata selfheal on
>>> f05b9742-2771-484a-85fc-5b6974bcef81*
>>> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
>>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>>> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
>>> sources=[0] 1  sinks=2*
>>>
>>>
>>
>> Hi,
>>
>> following your suggestion, I've checked the "peer" status and I found
>> that there is too many name for the hosts, I don't know if this can be the
>> problem or part of it:
>>
>> *gluster peer status on NODE01:*
>> *Number of Peers: 2*
>>
>> *Hostname: dnode02.localdomain.local*
>> *Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
>> *State: Peer in Cluster (Connected)*
>> *Other names:*
>> *192.168.10.52*
>> *dnode02.localdomain.local*
>> *10.10.20.90*
>> *10.10.10.20*
>>
>>
>>
>>
>> *gluster peer status on NODE02:*
>> *Number of Peers: 2*
>>
>> *Hostname: dnode01

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread Ravishankar N


On 07/21/2017 11:41 PM, yayo (j) wrote:

Hi,

Sorry for following up again, but, checking the oVirt interface, I've
found that oVirt reports the "engine" volume as an "arbiter"
configuration and the "data" volume as a fully replicated volume. See
these screenshots:


This is probably some refresh bug in the UI, Sahina might be able to 
tell you.


https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are 
full replicated:



/Volume Name: data/
/Type: Replicate/
/Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d/
/Status: Started/
/Snapshot Count: 0/
/Number of Bricks: 1 x 3 = 3/
/Transport-type: tcp/
/Bricks:/
/Brick1: gdnode01:/gluster/data/brick/
/Brick2: gdnode02:/gluster/data/brick/
/Brick3: gdnode04:/gluster/data/brick/
/Options Reconfigured:/
/nfs.disable: on/
/performance.readdir-ahead: on/
/transport.address-family: inet/
/storage.owner-uid: 36/
/performance.quick-read: off/
/performance.read-ahead: off/
/performance.io-cache: off/
/performance.stat-prefetch: off/
/performance.low-prio-threads: 32/
/network.remote-dio: enable/
/cluster.eager-lock: enable/
/cluster.quorum-type: auto/
/cluster.server-quorum-type: server/
/cluster.data-self-heal-algorithm: full/
/cluster.locking-scheme: granular/
/cluster.shd-max-threads: 8/
/cluster.shd-wait-qlength: 1/
/features.shard: on/
/user.cifs: off/
/storage.owner-gid: 36/
/features.shard-block-size: 512MB/
/network.ping-timeout: 30/
/performance.strict-o-direct: on/
/cluster.granular-entry-heal: on/
/auth.allow: */
/server.allow-insecure: on/





/Volume Name: engine/
/Type: Replicate/
/Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515/
/Status: Started/
/Snapshot Count: 0/
/Number of Bricks: 1 x 3 = 3/
/Transport-type: tcp/
/Bricks:/
/Brick1: gdnode01:/gluster/engine/brick/
/Brick2: gdnode02:/gluster/engine/brick/
/Brick3: gdnode04:/gluster/engine/brick/
/Options Reconfigured:/
/nfs.disable: on/
/performance.readdir-ahead: on/
/transport.address-family: inet/
/storage.owner-uid: 36/
/performance.quick-read: off/
/performance.read-ahead: off/
/performance.io-cache: off/
/performance.stat-prefetch: off/
/performance.low-prio-threads: 32/
/network.remote-dio: off/
/cluster.eager-lock: enable/
/cluster.quorum-type: auto/
/cluster.server-quorum-type: server/
/cluster.data-self-heal-algorithm: full/
/cluster.locking-scheme: granular/
/cluster.shd-max-threads: 8/
/cluster.shd-wait-qlength: 1/
/features.shard: on/
/user.cifs: off/
/storage.owner-gid: 36/
/features.shard-block-size: 512MB/
/network.ping-timeout: 30/
/performance.strict-o-direct: on/
/cluster.granular-entry-heal: on/
/auth.allow: */

  server.allow-insecure: on


2017-07-21 19:13 GMT+02:00 yayo (j) >:


2017-07-20 14:48 GMT+02:00 Ravishankar N mailto:ravishan...@redhat.com>>:


But it does  say something. All these gfids of completed heals
in the log below are the for the ones that you have given the
getfattr output of. So what is likely happening is there is an
intermittent connection problem between your mount and the
brick process, leading to pending heals again after the heal
gets completed, which is why the numbers are varying each
time. You would need to check why that is the case.
Hope this helps,
Ravi




/[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on
e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2/
/[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81/
/[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2/




Hi,

following your suggestion, I've checked the "peer" status and I
found that there is too many name for the hosts, I don't know if
this can be the problem or part of it:

/*gluster peer status on NODE01:*/
/Number of Peers: 2/
/
/
/Hostname: dnode02.localdomain.local/
/Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd/
/State: Peer in Cluster (Connected)/
/Other names:/
/192.168.10.52/
/dnode02.localdomain.local/
/10.10.20.90/
/10.10.10.20/

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
Hi,

Sorry for following up again, but, checking the oVirt interface, I've found
that oVirt reports the "engine" volume as an "arbiter" configuration and the
"data" volume as a fully replicated volume. See these screenshots:

https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are full
replicated:


*Volume Name: data*
*Type: Replicate*
*Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d*
*Status: Started*
*Snapshot Count: 0*
*Number of Bricks: 1 x 3 = 3*
*Transport-type: tcp*
*Bricks:*
*Brick1: gdnode01:/gluster/data/brick*
*Brick2: gdnode02:/gluster/data/brick*
*Brick3: gdnode04:/gluster/data/brick*
*Options Reconfigured:*
*nfs.disable: on*
*performance.readdir-ahead: on*
*transport.address-family: inet*
*storage.owner-uid: 36*
*performance.quick-read: off*
*performance.read-ahead: off*
*performance.io-cache: off*
*performance.stat-prefetch: off*
*performance.low-prio-threads: 32*
*network.remote-dio: enable*
*cluster.eager-lock: enable*
*cluster.quorum-type: auto*
*cluster.server-quorum-type: server*
*cluster.data-self-heal-algorithm: full*
*cluster.locking-scheme: granular*
*cluster.shd-max-threads: 8*
*cluster.shd-wait-qlength: 1*
*features.shard: on*
*user.cifs: off*
*storage.owner-gid: 36*
*features.shard-block-size: 512MB*
*network.ping-timeout: 30*
*performance.strict-o-direct: on*
*cluster.granular-entry-heal: on*
*auth.allow: **
*server.allow-insecure: on*





*Volume Name: engine*
*Type: Replicate*
*Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515*
*Status: Started*
*Snapshot Count: 0*
*Number of Bricks: 1 x 3 = 3*
*Transport-type: tcp*
*Bricks:*
*Brick1: gdnode01:/gluster/engine/brick*
*Brick2: gdnode02:/gluster/engine/brick*
*Brick3: gdnode04:/gluster/engine/brick*
*Options Reconfigured:*
*nfs.disable: on*
*performance.readdir-ahead: on*
*transport.address-family: inet*
*storage.owner-uid: 36*
*performance.quick-read: off*
*performance.read-ahead: off*
*performance.io-cache: off*
*performance.stat-prefetch: off*
*performance.low-prio-threads: 32*
*network.remote-dio: off*
*cluster.eager-lock: enable*
*cluster.quorum-type: auto*
*cluster.server-quorum-type: server*
*cluster.data-self-heal-algorithm: full*
*cluster.locking-scheme: granular*
*cluster.shd-max-threads: 8*
*cluster.shd-wait-qlength: 1*
*features.shard: on*
*user.cifs: off*
*storage.owner-gid: 36*
*features.shard-block-size: 512MB*
*network.ping-timeout: 30*
*performance.strict-o-direct: on*
*cluster.granular-entry-heal: on*
*auth.allow: **

  server.allow-insecure: on


2017-07-21 19:13 GMT+02:00 yayo (j) :

> 2017-07-20 14:48 GMT+02:00 Ravishankar N :
>
>>
>> But it does  say something. All these gfids of completed heals in the log
>> below are the for the ones that you have given the getfattr output of. So
>> what is likely happening is there is an intermittent connection problem
>> between your mount and the brick process, leading to pending heals again
>> after the heal gets completed, which is why the numbers are varying each
>> time. You would need to check why that is the case.
>> Hope this helps,
>> Ravi
>>
>>
>>
>> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
>> sources=[0] 1  sinks=2*
>> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
>> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
>> 0-engine-replicate-0: performing metadata selfheal on
>> f05b9742-2771-484a-85fc-5b6974bcef81*
>> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
>> sources=[0] 1  sinks=2*
>>
>>
>
> Hi,
>
> following your suggestion, I've checked the "peer" status and I found that
> there is too many name for the hosts, I don't know if this can be the
> problem or part of it:
>
> *gluster peer status on NODE01:*
> *Number of Peers: 2*
>
> *Hostname: dnode02.localdomain.local*
> *Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *192.168.10.52*
> *dnode02.localdomain.local*
> *10.10.20.90*
> *10.10.10.20*
>
>
>
>
> *gluster peer status on NODE02:*
> *Number of Peers: 2*
>
> *Hostname: dnode01.localdomain.local*
> *Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *gdnode01*
> *10.10.10.10*
>
> *Hostname: gdnode04*
> *Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *192.168.10.54*
> *10.10.10.40*
>
>
> *gluster peer status on NODE04:*
> *Number of Peers: 2*
>
> *Hostname: dnode02.neridom.dom*
> *Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *10.10.20.90*
> *gdnode02*
> *192.168.10.52*
> *10.10.10.20*
>
> *Hostname: dnode01.localdomain.local*
> *Uuid: a568bd60

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
2017-07-20 14:48 GMT+02:00 Ravishankar N :

>
> But it does  say something. All these gfids of completed heals in the log
> below are the for the ones that you have given the getfattr output of. So
> what is likely happening is there is an intermittent connection problem
> between your mount and the brick process, leading to pending heals again
> after the heal gets completed, which is why the numbers are varying each
> time. You would need to check why that is the case.
> Hope this helps,
> Ravi
>
>
>
> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
> sources=[0] 1  sinks=2*
> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
> 0-engine-replicate-0: performing metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81*
> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
> sources=[0] 1  sinks=2*
>
>

Hi,

following your suggestion, I've checked the "peer" status and I found that
there are too many names for the hosts; I don't know if this can be the
problem or part of it:

*gluster peer status on NODE01:*
*Number of Peers: 2*

*Hostname: dnode02.localdomain.local*
*Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
*State: Peer in Cluster (Connected)*
*Other names:*
*192.168.10.52*
*dnode02.localdomain.local*
*10.10.20.90*
*10.10.10.20*




*gluster peer status on NODE02:*
*Number of Peers: 2*

*Hostname: dnode01.localdomain.local*
*Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
*State: Peer in Cluster (Connected)*
*Other names:*
*gdnode01*
*10.10.10.10*

*Hostname: gdnode04*
*Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828*
*State: Peer in Cluster (Connected)*
*Other names:*
*192.168.10.54*
*10.10.10.40*


*gluster peer status on NODE04:*
*Number of Peers: 2*

*Hostname: dnode02.neridom.dom*
*Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
*State: Peer in Cluster (Connected)*
*Other names:*
*10.10.20.90*
*gdnode02*
*192.168.10.52*
*10.10.10.20*

*Hostname: dnode01.localdomain.local*
*Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
*State: Peer in Cluster (Connected)*
*Other names:*
*gdnode01*
*10.10.10.10*



All these IPs are pingable and the hosts are resolvable across all 3 nodes, but
only the 10.10.10.0 network is the dedicated network for gluster (resolved
using the gdnode* host names) ... Do you think that removing the other entries
can fix the problem? If so, sorry, but how can I remove the other entries?
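
If it helps, this is where I think the extra names are coming from (corrections
welcome): glusterd keeps one file per peer under /var/lib/glusterd/peers/, and
each file lists every name it has learned for that peer, e.g.:

  ls /var/lib/glusterd/peers/
  cat /var/lib/glusterd/peers/7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd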

And what about SELinux?

Thank you


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread Ravishankar N


On 07/21/2017 02:55 PM, yayo (j) wrote:
2017-07-20 14:48 GMT+02:00 Ravishankar N >:



But it does  say something. All these gfids of completed heals in
the log below are the for the ones that you have given the
getfattr output of. So what is likely happening is there is an
intermittent connection problem between your mount and the brick
process, leading to pending heals again after the heal gets
completed, which is why the numbers are varying each time. You
would need to check why that is the case.
Hope this helps,
Ravi




/[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on
e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2/
/[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81/
/[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2/




Hi,

But we have 2 gluster volumes on the same network and the other one 
(the "Data" gluster) doesn't have any problems. Why do you think there is a 
network problem?


Because pending self-heals come into the picture when I/O from the 
clients (mounts) does not succeed on some bricks. They are mostly due to

(a) the client losing connection to some bricks (likely),
(b) the I/O failing on the bricks themselves (unlikely).

If most of the I/O is also going to the 3rd brick (since you say the 
files are already present on all bricks and I/O is successful), then it 
is likely to be (a).



How can I check this on a gluster infrastructure?

In the fuse mount logs for the engine volume, check if there are any 
messages for brick disconnects. Something along the lines of 
"disconnected from volname-client-x".
Just guessing here, but maybe even the 'data' volume did experience 
disconnects and self-heals later and you did not observe it when you ran 
heal info. See the glustershd log or mount log for self-heal 
completion messages on 0-data-replicate-0 also.
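
Something along these lines should surface both, assuming the default log
locations that appear earlier in this thread:

grep -E "disconnected from|Exhausted all volfile" \
    /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log

grep "Completed data selfheal" /var/log/glusterfs/glustershd.log | grep 0-data-replicate-0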


Regards,
Ravi

Thank you







Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
2017-07-20 14:48 GMT+02:00 Ravishankar N :

>
> But it does  say something. All these gfids of completed heals in the log
> below are the for the ones that you have given the getfattr output of. So
> what is likely happening is there is an intermittent connection problem
> between your mount and the brick process, leading to pending heals again
> after the heal gets completed, which is why the numbers are varying each
> time. You would need to check why that is the case.
> Hope this helps,
> Ravi
>
>
>
> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
> sources=[0] 1  sinks=2*
> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
> 0-engine-replicate-0: performing metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81*
> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
> sources=[0] 1  sinks=2*
>
>

Hi,

But we have 2 gluster volumes on the same network and the other one (the
"Data" gluster) doesn't have any problems. Why do you think there is a network
problem?  How can I check this on a gluster infrastructure?

Thank you


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-20 Thread Ravishankar N



On 07/20/2017 03:42 PM, yayo (j) wrote:


2017-07-20 11:34 GMT+02:00 Ravishankar N >:



Could you check if the self-heal daemon on all nodes is connected
to the 3 bricks? You will need to check the glustershd.log for that.
If it is not connected, try restarting the shd using `gluster
volume start engine force`, then launch the heal command like you
did earlier and see if heals happen.


I've executed the command on all 3 nodes (I know one is enough); after 
that, the "heal" command reports between 6 and 10 entries ... 
(sometimes 6, sometimes 8, sometimes 10)



The glustershd.log doesn't say anything:


But it does say something. All these gfids of completed heals in the 
log below are for the ones that you have given the getfattr output 
of. So what is likely happening is that there is an intermittent connection 
problem between your mount and the brick process, leading to pending 
heals again after the heal gets completed, which is why the numbers are 
varying each time. You would need to check why that is the case.

Hope this helps,
Ravi



/[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on
e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2/
/[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81/
/[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2/


If it doesn't, please provide the getfattr outputs of the 12 files
from all 3 nodes using `getfattr -d -m . -e hex
//gluster/engine/brick//path-to-file` ?


*/NODE01:/*
/getfattr: Removing leading '/' from absolute path names/
/# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.68/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x0012/
/trusted.bit-rot.version=0x090059647d5b000447e9/
/trusted.gfid=0xe3565b5014954e5bae883bceca47b7d9/
/
/
/getfattr: Removing leading '/' from absolute path names/
/# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.48/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x000e/
/trusted.bit-rot.version=0x090059647d5b000447e9/
/trusted.gfid=0x676067891f344c1586b8c0d05b07f187/
/
/
/getfattr: Removing leading '/' from absolute path names/
/# file:

gluster/engine/brick/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x0055/
/trusted.bit-rot.version=0x090059647d5b000447e9/
/trusted.gfid=0x8aa745646740403ead51f56d9ca5d7a7/
/trusted.glusterfs.shard.block-size=0x2000/

/trusted.glusterfs.shard.file-size=0x000c80d4f229/
/
/
/getfattr: Removing leading '/' from absolute path names/
/# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.60/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x0007/
/trusted.bit-rot.version=0x090059647d5b000447e9/
/trusted.gfid=0x4e33ac33dddb4e29b4a351770b81166a/
/
/
/getfattr: Removing leading '/' from absolute path names/
/# file:
gluster/engine/brick/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x/
/trusted.bit-rot.version=0x0f0059647d5b000447e9/
/trusted.gfid=0x2581cb9ac2b74bd9ac17a09bd2f001b3/
/trusted.glusterfs.shard.block-size=0x2000/

/trusted.glusterfs.shard.file-size=0x00100800/
/
/
/getfattr: Removing leading '/' from absolute path names/
/# file: gluster/engine/brick/__DIRECT_IO_TEST__/
/trusted.afr.dirty=0x/
/trusted.afr.engine-client-1=0x/
/trusted.afr.engine-client-2=0x/
/trusted.gfid=0xf05b97422771484a85fc5b6974bcef81/
/trusted.gluster

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-20 Thread yayo (j)
2017-07-20 11:34 GMT+02:00 Ravishankar N :

>
> Could you check if the self-heal daemon on all nodes is connected to the 3
> bricks? You will need to check the glustershd.log for that.
> If it is not connected, try restarting the shd using `gluster volume start
> engine force`, then launch the heal command like you did earlier and see if
> heals happen.
>
>
I've executed the command on all 3 nodes (I know one is enough); after that,
the "heal" command reports between 6 and 10 entries ... (sometimes 6,
sometimes 8, sometimes 10)


The glustershd.log doesn't say anything:

*[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
sources=[0] 1  sinks=2*
*[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81*
*[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
sources=[0] 1  sinks=2*




> If it doesn't, please provide the getfattr outputs of the 12 files from
> all 3 nodes using `getfattr -d -m . -e hex */gluster/engine/brick/*
> path-to-file` ?
>
>
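
Before the outputs, my own reading of these values (happy to be corrected): the
trusted.afr.<volume>-client-N keys pack three pending-operation counters
(data / metadata / entry), so for example:

  # same getfattr, restricted to the afr keys, on one of the shards above
  getfattr -d -m trusted.afr -e hex \
      /gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.68
  # a non-zero data counter under trusted.afr.engine-client-2 (with zeros for
  # client-0/client-1) would mean bricks 1 and 2 still record pending data
  # changes against the copy on the third brick (node04).
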
*NODE01:*
*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.68*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x0012*
*trusted.bit-rot.version=0x090059647d5b000447e9*
*trusted.gfid=0xe3565b5014954e5bae883bceca47b7d9*

*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.48*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x000e*
*trusted.bit-rot.version=0x090059647d5b000447e9*
*trusted.gfid=0x676067891f344c1586b8c0d05b07f187*

*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x0055*
*trusted.bit-rot.version=0x090059647d5b000447e9*
*trusted.gfid=0x8aa745646740403ead51f56d9ca5d7a7*
*trusted.glusterfs.shard.block-size=0x2000*
*trusted.glusterfs.shard.file-size=0x000c80d4f229*

*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.60*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x0007*
*trusted.bit-rot.version=0x090059647d5b000447e9*
*trusted.gfid=0x4e33ac33dddb4e29b4a351770b81166a*

*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x*
*trusted.bit-rot.version=0x0f0059647d5b000447e9*
*trusted.gfid=0x2581cb9ac2b74bd9ac17a09bd2f001b3*
*trusted.glusterfs.shard.block-size=0x2000*
*trusted.glusterfs.shard.file-size=0x00100800*

*getfattr: Removing leading '/' from absolute path names*
*# file: gluster/engine/brick/__DIRECT_IO_TEST__*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x*
*trusted.gfid=0xf05b97422771484a85fc5b6974bcef81*
*trusted.glusterfs.shard.block-size=0x2000*
*trusted.glusterfs.shard.file-size=0x*

*getfattr: Removing leading '/' from absolute path names*
*# file:
gluster/engine/brick/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/88d41053-a257-4272-9e2e-2f3de0743b81/6573ed08-d3ed-4d12-9227-2c95941e1ad6*
*trusted.afr.dirty=0x*
*trusted.afr.engine-client-1=0x*
*trusted.afr.engine-client-2=0x0001*
*trusted.bit-rot.version=0x0f0059647d5b000447e9*
*trusted.gfid=0xe6dfd556340b4b76b47b7b6f5bd74327*
*trusted.glusterfs.shard.block-size=0x2000*
*trusted.glusterfs.shard.file-size=0x00100800*

*getfattr: Removing leading '/' from absolute path names*
*# file:
glust

Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-20 Thread Ravishankar N


On 07/20/2017 02:20 PM, yayo (j) wrote:

Hi,

Thank you for the answer and sorry for delay:

2017-07-19 16:55 GMT+02:00 Ravishankar N >:


1. What does the glustershd.log say on all 3 nodes when you run
the command? Does it complain about these files?


No, glustershd.log is clean; there are no extra log entries after the command on any of the 3 nodes


Could you check if the self-heal daemon on all nodes is connected to the 
3 bricks? You will need to check the glustershd.log for that.
If it is not connected, try restarting the shd using `gluster volume 
start engine force`, then launch the heal command like you did earlier 
and see if heals happen.
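
A quick way to check both, if it helps (default paths assumed):

gluster volume status engine
    # the "Self-heal Daemon on <host>" rows should show Online = Y on every node

grep -iE "engine-client-[0-2].*connect" /var/log/glusterfs/glustershd.log | tail -n 20
    # recent connect/disconnect messages from this node's shd towards each brick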


If it doesn't, please provide the getfattr outputs of the 12 files from 
all 3 nodes using `getfattr -d -m . -e hex 
//gluster/engine/brick//path-to-file` ?


Thanks,
Ravi


2. Are these 12 files also present in the 3rd data brick?


I've just checked: all the files exist on all 3 nodes

3. Can you provide the output of `gluster volume info` for
this volume?



/Volume Name: engine/
/Type: Replicate/
/Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515/
/Status: Started/
/Snapshot Count: 0/
/Number of Bricks: 1 x 3 = 3/
/Transport-type: tcp/
/Bricks:/
/Brick1: node01:/gluster/engine/brick/
/Brick2: node02:/gluster/engine/brick/
/Brick3: node04:/gluster/engine/brick/
/Options Reconfigured:/
/nfs.disable: on/
/performance.readdir-ahead: on/
/transport.address-family: inet/
/storage.owner-uid: 36/
/performance.quick-read: off/
/performance.read-ahead: off/
/performance.io-cache: off/
/performance.stat-prefetch: off/
/performance.low-prio-threads: 32/
/network.remote-dio: off/
/cluster.eager-lock: enable/
/cluster.quorum-type: auto/
/cluster.server-quorum-type: server/
/cluster.data-self-heal-algorithm: full/
/cluster.locking-scheme: granular/
/cluster.shd-max-threads: 8/
/cluster.shd-wait-qlength: 1/
/features.shard: on/
/user.cifs: off/
/storage.owner-gid: 36/
/features.shard-block-size: 512MB/
/network.ping-timeout: 30/
/performance.strict-o-direct: on/
/cluster.granular-entry-heal: on/
/auth.allow: */

  server.allow-insecure: on





Some extra info:

We have recently changed the gluster from: 2 (full
repliacated) + 1 arbiter to 3 full replicated cluster



Just curious, how did you do this? `remove-brick` of arbiter
brick  followed by an `add-brick` to increase to replica-3?


Yes


#gluster volume remove-brick engine replica 2 
node03:/gluster/data/brick force *(OK!)*


#gluster volume heal engine info *(no entries!)*

#gluster volume add-brick engine replica 3 
node04:/gluster/engine/brick *(OK!)*


*After some minutes*

[root@node01 ~]#  gluster volume heal engine info
Brick node01:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick node02:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick node04:/gluster/engine/brick
Status: Connected
Number of entries: 0

Thanks,
Ravi


One more piece of info (I don't know if this can be the problem): five 
days ago a blackout suddenly shut down the network switch (including the 
gluster network) of nodes 03 and 04 ... But I don't know whether this problem 
started after that blackout


Thank you!





Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-20 Thread yayo (j)
Hi,

Thank you for the answer and sorry for the delay:

2017-07-19 16:55 GMT+02:00 Ravishankar N :

> 1. What does the glustershd.log say on all 3 nodes when you run the
> command? Does it complain about these files?
>

No, glustershd.log is clean; there are no extra log entries after the command on any of the 3 nodes


> 2. Are these 12 files also present in the 3rd data brick?
>

I've just checked: all the files exist on all 3 nodes


> 3. Can you provide the output of `gluster volume info` for this volume?
>


*Volume Name: engine*
*Type: Replicate*
*Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515*
*Status: Started*
*Snapshot Count: 0*
*Number of Bricks: 1 x 3 = 3*
*Transport-type: tcp*
*Bricks:*
*Brick1: node01:/gluster/engine/brick*
*Brick2: node02:/gluster/engine/brick*
*Brick3: node04:/gluster/engine/brick*
*Options Reconfigured:*
*nfs.disable: on*
*performance.readdir-ahead: on*
*transport.address-family: inet*
*storage.owner-uid: 36*
*performance.quick-read: off*
*performance.read-ahead: off*
*performance.io-cache: off*
*performance.stat-prefetch: off*
*performance.low-prio-threads: 32*
*network.remote-dio: off*
*cluster.eager-lock: enable*
*cluster.quorum-type: auto*
*cluster.server-quorum-type: server*
*cluster.data-self-heal-algorithm: full*
*cluster.locking-scheme: granular*
*cluster.shd-max-threads: 8*
*cluster.shd-wait-qlength: 1*
*features.shard: on*
*user.cifs: off*
*storage.owner-gid: 36*
*features.shard-block-size: 512MB*
*network.ping-timeout: 30*
*performance.strict-o-direct: on*
*cluster.granular-entry-heal: on*
*auth.allow: **

  server.allow-insecure: on





>
> Some extra info:
>>
>> We have recently changed the gluster from: 2 (full repliacated) + 1
>> arbiter to 3 full replicated cluster
>>
>
> Just curious, how did you do this? `remove-brick` of arbiter brick
> followed by an `add-brick` to increase to replica-3?
>
>
Yes


#gluster volume remove-brick engine replica 2 node03:/gluster/data/brick
force *(OK!)*

#gluster volume heal engine info *(no entries!)*

#gluster volume add-brick engine replica 3 node04:/gluster/engine/brick
*(OK!)*

*After some minutes*

[root@node01 ~]#  gluster volume heal engine info
Brick node01:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick node02:/gluster/engine/brick
Status: Connected
Number of entries: 0

Brick node04:/gluster/engine/brick
Status: Connected
Number of entries: 0



> Thanks,
> Ravi
>

One more piece of info (I don't know if this can be the problem): five days ago
a blackout suddenly shut down the network switch (including the gluster
network) of nodes 03 and 04 ... But I don't know whether this problem started
after that blackout
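
Side question, in case it matters: after the add-brick, should I also have
triggered a full sweep explicitly, i.e.:

#gluster volume heal engine full

or is the automatic heal after add-brick supposed to be enough?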

Thank you!


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-19 Thread Ravishankar N



On 07/19/2017 08:02 PM, Sahina Bose wrote:

[Adding gluster-users]

On Wed, Jul 19, 2017 at 2:52 PM, yayo (j) > wrote:


Hi all,

We have an oVirt hyperconverged cluster with hosted engine on 3
fully replicated nodes. This cluster has 2 gluster volumes:

- data: volume for the Data (Master) Domain (for VMs)
- engine: volume for the hosted_storage Domain (for the hosted engine)

We have this problem: the "engine" gluster volume always has unsynced
elements and we can't fix the problem; on the command line we have
tried to use the "heal" command but elements always remain
unsynced ...

Below is the heal command "status":

[root@node01 ~]# gluster volume heal engine info
Brick node01:/gluster/engine/brick
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.48
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.64
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.60
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.2
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.68

/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01

/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/88d41053-a257-4272-9e2e-2f3de0743b81/6573ed08-d3ed-4d12-9227-2c95941e1ad6
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.61
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.1
/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids
/.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.20
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 12

Brick node02:/gluster/engine/brick

/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01

/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids



/__DIRECT_IO_TEST__



/8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/88d41053-a257-4272-9e2e-2f3de0743b81/6573ed08-d3ed-4d12-9227-2c95941e1ad6


Status: Connected
Number of entries: 12

Brick node04:/gluster/engine/brick
Status: Connected
Number of entries: 0


running the "gluster volume heal engine" don't solve the problem...



1. What does the glustershd.log say on all 3 nodes when you run the 
command? Does it complain about these files?

2. Are these 12 files also present in the 3rd data brick?
3. Can you provide the output of `gluster volume info` for this volume?


Some extra info:

We have recently changed the gluster setup from 2 (fully replicated) +
1 arbiter to a 3-node fully replicated cluster



Just curious, how did you do this? `remove-brick` of arbiter brick 
followed by an `add-brick` to increase to replica-3?


Thanks,
Ravi


but I don't know whether this is the problem...

The "data" volume is good and healthy and has no unsynced entries.

oVirt refuses to put node02 and node01 into "maintenance mode"
and complains about "unsynced elements"

How can I fix this?
Thank you








Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-19 Thread Sahina Bose
[Adding gluster-users]

On Wed, Jul 19, 2017 at 2:52 PM, yayo (j)  wrote:

> Hi all,
>
> We have an oVirt hyperconverged cluster with hosted engine on 3 fully
> replicated nodes. This cluster has 2 gluster volumes:
>
> - data: volume for the Data (Master) Domain (for VMs)
> - engine: volume for the hosted_storage Domain (for the hosted engine)
>
> We have this problem: the "engine" gluster volume always has unsynced
> elements and we can't fix the problem; on the command line we have tried to use
> the "heal" command but elements always remain unsynced ...
>
> Below is the heal command "status":
>
> [root@node01 ~]# gluster volume heal engine info
> Brick node01:/gluster/engine/brick
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.48
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.64
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.60
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.2
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.68
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-
> 52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/88d41053-
> a257-4272-9e2e-2f3de0743b81/6573ed08-d3ed-4d12-9227-2c95941e1ad6
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.61
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.1
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids
> /.shard/8aa74564-6740-403e-ad51-f56d9ca5d7a7.20
> /__DIRECT_IO_TEST__
> Status: Connected
> Number of entries: 12
>
> Brick node02:/gluster/engine/brick
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/19d71267-
> 52a4-42a3-bb1e-e3145361c0c2/7a215635-02f3-47db-80db-8b689c6a8f01
> 
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/dom_md/ids
> 
> 
> 
> /__DIRECT_IO_TEST__
> 
> 
> /8f215dd2-8531-4a4f-b6ed-ea789dd8821b/images/88d41053-
> a257-4272-9e2e-2f3de0743b81/6573ed08-d3ed-4d12-9227-2c95941e1ad6
> 
> 
> Status: Connected
> Number of entries: 12
>
> Brick node04:/gluster/engine/brick
> Status: Connected
> Number of entries: 0
>
>
>
> running "gluster volume heal engine" doesn't solve the problem...
>
> Some extra info:
>
> We have recently changed the gluster setup from 2 (fully replicated) + 1
> arbiter to a 3-node fully replicated cluster, but I don't know whether this is
> the problem...
>
> The "data" volume is good and healthy and has no unsynced entries.
>
> oVirt refuses to put node02 and node01 into "maintenance mode" and
> complains about "unsynced elements"
>
> How can I fix this?
> Thank you
>
>
>