Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-11-05 Thread Ville-Pekka Vainio
Hi!

Bumping an old thread, because there’s now activity around this bug. The
github issue is https://github.com/gluster/glusterfs/issues/2492
We just hit this bug after an update from GlusterFS 7.x to 9.4. We did not
see it in our test environment, so we went ahead with the update, but the
bug is still there in 9.4. Apparently the fix should be
https://github.com/gluster/glusterfs/pull/2509 which should get backported
to 9.x.

We worked around this issue by identifying the server with the bug and
restarting the GlusterFS processes on it. On an EL/CentOS/Fedora-based
system there was one small thing that surprised me; maybe this will help
others.

There is a unit, /usr/lib/systemd/system/glusterfsd.service, which does not
really start anything (its ExecStart just runs /bin/true), but which, when
stopped, kills the brick processes on the server. If you try
“systemctl stop glusterfsd” without having started the service first (even
though starting it does nothing), systemd will not do anything. Only if you
first start the service and then stop it will systemd actually run the
ExecStop command.
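
In case it is useful, the workaround boils down to something like the
following (a sketch, not verbatim from our history; it assumes ss(8) is
available and that glusterfsd.service behaves as shipped in the EL
packages):

  # Find the affected server: compare the brick ports glusterd
  # advertises with the ports the brick processes actually listen on.
  gluster volume status | grep -A1 '^Brick'
  ss -tlnp | grep glusterfsd

  # On the affected server only: kill the bricks via the unit, then
  # let glusterd respawn them. The start is needed first, otherwise
  # systemd considers the unit inactive and skips ExecStop.
  systemctl start glusterfsd    # effectively a no-op (/bin/true)
  systemctl stop glusterfsd     # runs ExecStop, kills the brick processes
  systemctl restart glusterd    # brings the brick processes back up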


Best regards,
Ville-Pekka




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-31 Thread Marco Fais
Srijan

no problem at all -- thanks for your help. If you need any additional
information please let me know.

Regards,
Marco


On Thu, 27 May 2021 at 18:39, Srijan Sivakumar  wrote:

> Hi Marco,
>
> Thank you for opening the issue. I'll check the log contents and get back
> to you.
>
> On Thu, May 27, 2021 at 10:50 PM Marco Fais  wrote:
>
>> Srijan
>>
>> thanks a million -- I have opened the issue as requested here:
>>
>> https://github.com/gluster/glusterfs/issues/2492
>>
>> I have attached the glusterd.log and glustershd.log files, but please let
>> me know if there is any other test I should do or logs I should provide.
>>
>>
>> Thanks,
>> Marco
>>
>>
>> On Wed, 26 May 2021 at 18:09, Srijan Sivakumar 
>> wrote:
>>
>>> Hi Marco,
>>>
>>> If possible, let's open an issue in github and track this from there. I
>>> am checking the previous mails in the chain to see if I can infer something
>>> about the situation. It would be helpful if we could analyze this with the
>>> help of log files. Especially glusterd.log and glustershd.log.
>>>
>>> To open an issue, you can use this link: Open a new issue
>>> 
>>>
>>> On Wed, May 26, 2021 at 5:02 PM Marco Fais  wrote:
>>>
 Ravi,

 thanks a million.
 @Mohit, @Srijan please let me know if you need any additional
 information.

 Thanks,
 Marco


 On Tue, 25 May 2021 at 17:28, Ravishankar N 
 wrote:

> Hi Marco,
> I haven't had any luck yet.  Adding Mohit and Srijan who work in
> glusterd in case they have some inputs.
> -Ravi
>
>
> On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:
>
>> Hi Ravi
>>
>> just wondering if you have any further thoughts on this --
>> unfortunately it is something still very much affecting us at the moment.
>> I am trying to understand how to troubleshoot it further but haven't
>> been able to make much progress...
>>
>> Thanks,
>> Marco
>>
>>
>> On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:
>>
>>> Just to complete...
>>>
>>> from the FUSE mount log on server 2 I see the same errors as in
>>> glustershd.log on node 1:
>>>
>>> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
>>> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
>>> ready, attempting connect on transport []
>>> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>>> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
>>> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
>>> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
>>> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
>>> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
>>> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
>>> {remote_subvol=/bricks/vm_b3_vol/brick}]
>>> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
>>> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
>>> is met
>>> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
>>> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
>>> ready, attempting connect on transport []
>>> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>>> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
>>> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
>>> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
>>> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
>>> [client-handshake.c:1128:select_server_supported_programs]
>>> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
>>> {Num=1298437}, {Version=400}]
>>> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
>>> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> failed to set the volume [{errno=2}, {error=No such file or directory}]
>>> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
>>> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
>>> argument}]
>>> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
>>> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> SETVOLUME on remote-host failed [{remote-error=Brick not found},
>>> {errno=2}, {error=No such file or directory}]
>>> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
>>> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> sending CHILD_CONNECTING event []
>>> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
>>> [client.c:2229:client_rpc_notify] 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-27 Thread Srijan Sivakumar
Hi Marco,

Thank you for opening the issue. I'll check the log contents and get back
to you.

On Thu, May 27, 2021 at 10:50 PM Marco Fais  wrote:

> Srijan
>
> thanks a million -- I have opened the issue as requested here:
>
> https://github.com/gluster/glusterfs/issues/2492
>
> I have attached the glusterd.log and glustershd.log files, but please let
> me know if there is any other test I should do or logs I should provide.
>
>
> Thanks,
> Marco
>
>
> On Wed, 26 May 2021 at 18:09, Srijan Sivakumar 
> wrote:
>
>> Hi Marco,
>>
>> If possible, let's open an issue in github and track this from there. I
>> am checking the previous mails in the chain to see if I can infer something
>> about the situation. It would be helpful if we could analyze this with the
>> help of log files. Especially glusterd.log and glustershd.log.
>>
>> To open an issue, you can use this link: Open a new issue
>> 
>>
>> On Wed, May 26, 2021 at 5:02 PM Marco Fais  wrote:
>>
>>> Ravi,
>>>
>>> thanks a million.
>>> @Mohit, @Srijan please let me know if you need any additional
>>> information.
>>>
>>> Thanks,
>>> Marco
>>>
>>>
>>> On Tue, 25 May 2021 at 17:28, Ravishankar N 
>>> wrote:
>>>
 Hi Marco,
 I haven't had any luck yet.  Adding Mohit and Srijan who work in
 glusterd in case they have some inputs.
 -Ravi


 On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:

> Hi Ravi
>
> just wondering if you have any further thoughts on this --
> unfortunately it is something still very much affecting us at the moment.
> I am trying to understand how to troubleshoot it further but haven't
> been able to make much progress...
>
> Thanks,
> Marco
>
>
> On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:
>
>> Just to complete...
>>
>> from the FUSE mount log on server 2 I see the same errors as in
>> glustershd.log on node 1:
>>
>> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
>> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
>> ready, attempting connect on transport []
>> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
>> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
>> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
>> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
>> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
>> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
>> {remote_subvol=/bricks/vm_b3_vol/brick}]
>> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
>> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
>> is met
>> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
>> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
>> ready, attempting connect on transport []
>> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
>> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
>> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
>> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
>> [client-handshake.c:1128:select_server_supported_programs]
>> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
>> {Num=1298437}, {Version=400}]
>> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
>> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> failed to set the volume [{errno=2}, {error=No such file or directory}]
>> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
>> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
>> argument}]
>> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
>> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> SETVOLUME on remote-host failed [{remote-error=Brick not found},
>> {errno=2}, {error=No such file or directory}]
>> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
>> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> sending CHILD_CONNECTING event []
>> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
>> [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
>> from client, process will keep trying to connect glusterd until brick's
>> port is available [{conn-name=VM_Storage_1-client-11}]
>>
>> On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:
>>
>>> HI Ravi,
>>>
>>> thanks again for your help.
>>>
>>> Here is the output of "cat
>>> 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-27 Thread Marco Fais
Srijan

thanks a million -- I have opened the issue as requested here:

https://github.com/gluster/glusterfs/issues/2492

I have attached the glusterd.log and glustershd.log files, but please let
me know if there is any other test I should do or logs I should provide.


Thanks,
Marco


On Wed, 26 May 2021 at 18:09, Srijan Sivakumar  wrote:

> Hi Marco,
>
> If possible, let's open an issue in github and track this from there. I am
> checking the previous mails in the chain to see if I can infer something
> about the situation. It would be helpful if we could analyze this with the
> help of log files. Especially glusterd.log and glustershd.log.
>
> To open an issue, you can use this link: Open a new issue
> 
>
> On Wed, May 26, 2021 at 5:02 PM Marco Fais  wrote:
>
>> Ravi,
>>
>> thanks a million.
>> @Mohit, @Srijan please let me know if you need any additional information.
>>
>> Thanks,
>> Marco
>>
>>
>> On Tue, 25 May 2021 at 17:28, Ravishankar N 
>> wrote:
>>
>>> Hi Marco,
>>> I haven't had any luck yet.  Adding Mohit and Srijan who work in
>>> glusterd in case they have some inputs.
>>> -Ravi
>>>
>>>
>>> On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:
>>>
 Hi Ravi

 just wondering if you have any further thoughts on this --
 unfortunately it is something still very much affecting us at the moment.
 I am trying to understand how to troubleshoot it further but haven't
 been able to make much progress...

 Thanks,
 Marco


 On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:

> Just to complete...
>
> from the FUSE mount log on server 2 I see the same errors as in
> glustershd.log on node 1:
>
> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
> ready, attempting connect on transport []
> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
> {remote_subvol=/bricks/vm_b3_vol/brick}]
> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
> is met
> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
> ready, attempting connect on transport []
> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
> [client-handshake.c:1128:select_server_supported_programs]
> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
> {Num=1298437}, {Version=400}]
> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> failed to set the volume [{errno=2}, {error=No such file or directory}]
> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
> argument}]
> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> SETVOLUME on remote-host failed [{remote-error=Brick not found},
> {errno=2}, {error=No such file or directory}]
> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> sending CHILD_CONNECTING event []
> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
> [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
> from client, process will keep trying to connect glusterd until brick's
> port is available [{conn-name=VM_Storage_1-client-11}]
>
> On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:
>
>> HI Ravi,
>>
>> thanks again for your help.
>>
>> Here is the output of "cat
>> graphs/active/VM_Storage_1-client-11/private" from the same node
>> where glustershd is complaining:
>>
>> [xlator.protocol.client.VM_Storage_1-client-11.priv]
>> fd.0.remote_fd = 1
>> -- = --
>> granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-26 Thread Srijan Sivakumar
Hi Marco,

If possible, let's open an issue in github and track this from there. I am
checking the previous mails in the chain to see if I can infer something
about the situation. It would be helpful if we could analyze this with the
help of log files. Especially glusterd.log and glustershd.log.

To open an issue, you can use this link: Open a new issue


On Wed, May 26, 2021 at 5:02 PM Marco Fais  wrote:

> Ravi,
>
> thanks a million.
> @Mohit, @Srijan please let me know if you need any additional information.
>
> Thanks,
> Marco
>
>
> On Tue, 25 May 2021 at 17:28, Ravishankar N 
> wrote:
>
>> Hi Marco,
>> I haven't had any luck yet.  Adding Mohit and Srijan who work in glusterd
>> in case they have some inputs.
>> -Ravi
>>
>>
>> On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:
>>
>>> Hi Ravi
>>>
>>> just wondering if you have any further thoughts on this -- unfortunately
>>> it is something still very much affecting us at the moment.
>>> I am trying to understand how to troubleshoot it further but haven't
>>> been able to make much progress...
>>>
>>> Thanks,
>>> Marco
>>>
>>>
>>> On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:
>>>
 Just to complete...

 from the FUSE mount log on server 2 I see the same errors as in
 glustershd.log on node 1:

 [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
 [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
 ready, attempting connect on transport []
 [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
 [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
 0-VM_Storage_1-client-11: intentional socket shutdown(20)
 [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
 [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
 Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
 {remote_subvol=/bricks/vm_b3_vol/brick}]
 [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
 [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
 is met
 [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
 [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
 ready, attempting connect on transport []
 [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
 [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
 0-VM_Storage_1-client-12: intentional socket shutdown(26)
 [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
 [client-handshake.c:1128:select_server_supported_programs]
 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
 {Num=1298437}, {Version=400}]
 [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
 [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
 failed to set the volume [{errno=2}, {error=No such file or directory}]
 [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
 [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
 failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
 argument}]
 [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
 [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
 SETVOLUME on remote-host failed [{remote-error=Brick not found},
 {errno=2}, {error=No such file or directory}]
 [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
 [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
 sending CHILD_CONNECTING event []
 [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
 [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
 from client, process will keep trying to connect glusterd until brick's
 port is available [{conn-name=VM_Storage_1-client-11}]

 On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:

> HI Ravi,
>
> thanks again for your help.
>
> Here is the output of "cat
> graphs/active/VM_Storage_1-client-11/private" from the same node
> where glustershd is complaining:
>
> [xlator.protocol.client.VM_Storage_1-client-11.priv]
> fd.0.remote_fd = 1
> -- = --
> granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK
> fl_type = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type =
> F_RDLCK, l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK
> fl_type = F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type =
> F_RDLCK, l_start = 101, l_len = 1
> granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK
> fl_type = F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type =
> F_RDLCK, l_start = 103, l_len = 1

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-26 Thread Marco Fais
Ravi,

thanks a million.
@Mohit, @Srijan please let me know if you need any additional information.

Thanks,
Marco


On Tue, 25 May 2021 at 17:28, Ravishankar N  wrote:

> Hi Marco,
> I haven't had any luck yet.  Adding Mohit and Srijan who work in glusterd
> in case they have some inputs.
> -Ravi
>
>
> On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:
>
>> Hi Ravi
>>
>> just wondering if you have any further thoughts on this -- unfortunately
>> it is something still very much affecting us at the moment.
>> I am trying to understand how to troubleshoot it further but haven't been
>> able to make much progress...
>>
>> Thanks,
>> Marco
>>
>>
>> On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:
>>
>>> Just to complete...
>>>
>>> from the FUSE mount log on server 2 I see the same errors as in
>>> glustershd.log on node 1:
>>>
>>> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
>>> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
>>> ready, attempting connect on transport []
>>> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>>> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
>>> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
>>> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
>>> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
>>> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
>>> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
>>> {remote_subvol=/bricks/vm_b3_vol/brick}]
>>> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
>>> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
>>> is met
>>> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
>>> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
>>> ready, attempting connect on transport []
>>> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>>> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
>>> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
>>> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
>>> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
>>> [client-handshake.c:1128:select_server_supported_programs]
>>> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
>>> {Num=1298437}, {Version=400}]
>>> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
>>> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> failed to set the volume [{errno=2}, {error=No such file or directory}]
>>> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
>>> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
>>> argument}]
>>> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
>>> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> SETVOLUME on remote-host failed [{remote-error=Brick not found},
>>> {errno=2}, {error=No such file or directory}]
>>> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
>>> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>>> sending CHILD_CONNECTING event []
>>> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
>>> [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
>>> from client, process will keep trying to connect glusterd until brick's
>>> port is available [{conn-name=VM_Storage_1-client-11}]
>>>
>>> On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:
>>>
 HI Ravi,

 thanks again for your help.

 Here is the output of "cat
 graphs/active/VM_Storage_1-client-11/private" from the same node
 where glustershd is complaining:

 [xlator.protocol.client.VM_Storage_1-client-11.priv]
 fd.0.remote_fd = 1
 -- = --
 granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
 l_start = 100, l_len = 1
 granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
 l_start = 101, l_len = 1
 granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
 l_start = 103, l_len = 1
 granted-posix-lock[3] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
 l_start = 201, l_len = 1
 granted-posix-lock[4] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
 l_start = 203, l_len = 1
 -- = --
 fd.1.remote_fd = 0
 -- = --
 granted-posix-lock[0] = owner = b43238094746d9fe, cmd = F_SETLK fl_type
 = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-25 Thread Ravishankar N
Hi Marco,
I haven't had any luck yet.  Adding Mohit and Srijan who work in glusterd
in case they have some inputs.
-Ravi


On Tue, May 25, 2021 at 9:31 PM Marco Fais  wrote:

> Hi Ravi
>
> just wondering if you have any further thoughts on this -- unfortunately
> it is something still very much affecting us at the moment.
> I am trying to understand how to troubleshoot it further but haven't been
> able to make much progress...
>
> Thanks,
> Marco
>
>
> On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:
>
>> Just to complete...
>>
>> from the FUSE mount log on server 2 I see the same errors as in
>> glustershd.log on node 1:
>>
>> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
>> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
>> ready, attempting connect on transport []
>> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
>> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
>> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
>> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
>> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
>> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
>> {remote_subvol=/bricks/vm_b3_vol/brick}]
>> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
>> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
>> is met
>> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
>> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
>> ready, attempting connect on transport []
>> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
>> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
>> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
>> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
>> [client-handshake.c:1128:select_server_supported_programs]
>> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
>> {Num=1298437}, {Version=400}]
>> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
>> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> failed to set the volume [{errno=2}, {error=No such file or directory}]
>> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
>> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
>> argument}]
>> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
>> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> SETVOLUME on remote-host failed [{remote-error=Brick not found},
>> {errno=2}, {error=No such file or directory}]
>> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
>> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
>> sending CHILD_CONNECTING event []
>> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
>> [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
>> from client, process will keep trying to connect glusterd until brick's
>> port is available [{conn-name=VM_Storage_1-client-11}]
>>
>> On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:
>>
>>> HI Ravi,
>>>
>>> thanks again for your help.
>>>
>>> Here is the output of "cat graphs/active/VM_Storage_1-client-11/private"
>>> from the same node where glustershd is complaining:
>>>
>>> [xlator.protocol.client.VM_Storage_1-client-11.priv]
>>> fd.0.remote_fd = 1
>>> -- = --
>>> granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
>>> l_start = 100, l_len = 1
>>> granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
>>> l_start = 101, l_len = 1
>>> granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
>>> l_start = 103, l_len = 1
>>> granted-posix-lock[3] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
>>> l_start = 201, l_len = 1
>>> granted-posix-lock[4] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
>>> l_start = 203, l_len = 1
>>> -- = --
>>> fd.1.remote_fd = 0
>>> -- = --
>>> granted-posix-lock[0] = owner = b43238094746d9fe, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
>>> l_start = 100, l_len = 1
>>> granted-posix-lock[1] = owner = b43238094746d9fe, cmd = F_SETLK fl_type
>>> = F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
>>> l_start = 201, l_len = 1
>>> granted-posix-lock[2] = owner = b43238094746d9fe, cmd = F_SETLK 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-25 Thread Marco Fais
Hi Ravi

just wondering if you have any further thoughts on this -- unfortunately it
is something still very much affecting us at the moment.
I am trying to understand how to troubleshoot it further but haven't been
able to make much progress...

Thanks,
Marco


On Thu, 20 May 2021 at 19:04, Marco Fais  wrote:

> Just to complete...
>
> from the FUSE mount log on server 2 I see the same errors as in
> glustershd.log on node 1:
>
> [2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020]
> [client.c:2319:notify] 0-VM_Storage_1-client-11: parent translators are
> ready, attempting connect on transport []
> [2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 0-VM_Storage_1-client-11: changing port to 49170 (from 0)
> [2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
> 0-VM_Storage_1-client-11: intentional socket shutdown(20)
> [2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
> [client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
> Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
> {remote_subvol=/bricks/vm_b3_vol/brick}]
> [2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
> [afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum
> is met
> [2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020]
> [client.c:2319:notify] 0-VM_Storage_1-client-13: parent translators are
> ready, attempting connect on transport []
> [2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 0-VM_Storage_1-client-12: changing port to 49170 (from 0)
> [2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
> 0-VM_Storage_1-client-12: intentional socket shutdown(26)
> [2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
> [client-handshake.c:1128:select_server_supported_programs]
> 0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
> {Num=1298437}, {Version=400}]
> [2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
> [client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> failed to set the volume [{errno=2}, {error=No such file or directory}]
> [2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
> [client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
> argument}]
> [2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
> [client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> SETVOLUME on remote-host failed [{remote-error=Brick not found},
> {errno=2}, {error=No such file or directory}]
> [2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
> [client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
> sending CHILD_CONNECTING event []
> [2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
> [client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
> from client, process will keep trying to connect glusterd until brick's
> port is available [{conn-name=VM_Storage_1-client-11}]
>
> On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:
>
>> HI Ravi,
>>
>> thanks again for your help.
>>
>> Here is the output of "cat graphs/active/VM_Storage_1-client-11/private"
>> from the same node where glustershd is complaining:
>>
>> [xlator.protocol.client.VM_Storage_1-client-11.priv]
>> fd.0.remote_fd = 1
>> -- = --
>> granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
>> l_start = 100, l_len = 1
>> granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
>> l_start = 101, l_len = 1
>> granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
>> l_start = 103, l_len = 1
>> granted-posix-lock[3] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
>> l_start = 201, l_len = 1
>> granted-posix-lock[4] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
>> l_start = 203, l_len = 1
>> -- = --
>> fd.1.remote_fd = 0
>> -- = --
>> granted-posix-lock[0] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
>> l_start = 100, l_len = 1
>> granted-posix-lock[1] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
>> l_start = 201, l_len = 1
>> granted-posix-lock[2] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
>> l_start = 203, l_len = 1
>> -- = --
>> fd.2.remote_fd = 3
>> -- = --
>> granted-posix-lock[0] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
>> F_RDLCK, fl_start = 100, fl_end 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-20 Thread Marco Fais
Just to complete...

from the FUSE mount log on server 2 I see the same errors as in
glustershd.log on node 1:

[2021-05-20 17:58:34.157971 +0000] I [MSGID: 114020] [client.c:2319:notify]
0-VM_Storage_1-client-11: parent translators are ready, attempting connect
on transport []
[2021-05-20 17:58:34.160586 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
0-VM_Storage_1-client-11: changing port to 49170 (from 0)
[2021-05-20 17:58:34.160608 +0000] I [socket.c:849:__socket_shutdown]
0-VM_Storage_1-client-11: intentional socket shutdown(20)
[2021-05-20 17:58:34.161403 +0000] I [MSGID: 114046]
[client-handshake.c:857:client_setvolume_cbk] 0-VM_Storage_1-client-10:
Connected, attached to remote volume [{conn-name=VM_Storage_1-client-10},
{remote_subvol=/bricks/vm_b3_vol/brick}]
[2021-05-20 17:58:34.161513 +0000] I [MSGID: 108002]
[afr-common.c:6435:afr_notify] 0-VM_Storage_1-replicate-3: Client-quorum is
met
[2021-05-20 17:58:34.162043 +0000] I [MSGID: 114020] [client.c:2319:notify]
0-VM_Storage_1-client-13: parent translators are ready, attempting connect
on transport []
[2021-05-20 17:58:34.162491 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
0-VM_Storage_1-client-12: changing port to 49170 (from 0)
[2021-05-20 17:58:34.162507 +0000] I [socket.c:849:__socket_shutdown]
0-VM_Storage_1-client-12: intentional socket shutdown(26)
[2021-05-20 17:58:34.163076 +0000] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
0-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-05-20 17:58:34.163339 +0000] W [MSGID: 114043]
[client-handshake.c:727:client_setvolume_cbk] 0-VM_Storage_1-client-11:
failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-05-20 17:58:34.163351 +0000] W [MSGID: 114007]
[client-handshake.c:752:client_setvolume_cbk] 0-VM_Storage_1-client-11:
failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
argument}]
[2021-05-20 17:58:34.163360 +0000] E [MSGID: 114044]
[client-handshake.c:757:client_setvolume_cbk] 0-VM_Storage_1-client-11:
SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
{error=No such file or directory}]
[2021-05-20 17:58:34.163365 +0000] I [MSGID: 114051]
[client-handshake.c:879:client_setvolume_cbk] 0-VM_Storage_1-client-11:
sending CHILD_CONNECTING event []
[2021-05-20 17:58:34.163425 +0000] I [MSGID: 114018]
[client.c:2229:client_rpc_notify] 0-VM_Storage_1-client-11: disconnected
from client, process will keep trying to connect glusterd until brick's
port is available [{conn-name=VM_Storage_1-client-11}]
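
For anyone searching later: the relevant failures are easy to pull out of
the logs with something like the following (a sketch, assuming the default
/var/log/glusterfs locations):

  grep -l 'SETVOLUME on remote-host failed' /var/log/glusterfs/*.log
  grep 'MSGID: 114044' /var/log/glusterfs/glustershd.log | tail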

On Thu, 20 May 2021 at 18:54, Marco Fais  wrote:

> HI Ravi,
>
> thanks again for your help.
>
> Here is the output of "cat graphs/active/VM_Storage_1-client-11/private"
> from the same node where glustershd is complaining:
>
> [xlator.protocol.client.VM_Storage_1-client-11.priv]
> fd.0.remote_fd = 1
> -- = --
> granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
> l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
> l_start = 101, l_len = 1
> granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
> l_start = 103, l_len = 1
> granted-posix-lock[3] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
> l_start = 201, l_len = 1
> granted-posix-lock[4] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
> l_start = 203, l_len = 1
> -- = --
> fd.1.remote_fd = 0
> -- = --
> granted-posix-lock[0] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
> l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
> l_start = 201, l_len = 1
> granted-posix-lock[2] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
> l_start = 203, l_len = 1
> -- = --
> fd.2.remote_fd = 3
> -- = --
> granted-posix-lock[0] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
> l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
> l_start = 201, l_len = 1
> granted-posix-lock[2] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
> F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
> l_start = 203, l_len = 1
> -- = --
> fd.3.remote_fd = 2
> 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-20 Thread Marco Fais
HI Ravi,

thanks again for your help.

Here is the output of "cat graphs/active/VM_Storage_1-client-11/private"
from the same node where glustershd is complaining:

[xlator.protocol.client.VM_Storage_1-client-11.priv]
fd.0.remote_fd = 1
-- = --
granted-posix-lock[0] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
l_start = 100, l_len = 1
granted-posix-lock[1] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
l_start = 101, l_len = 1
granted-posix-lock[2] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
l_start = 103, l_len = 1
granted-posix-lock[3] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
l_start = 201, l_len = 1
granted-posix-lock[4] = owner = 7904e87d91693fb7, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
l_start = 203, l_len = 1
-- = --
fd.1.remote_fd = 0
-- = --
granted-posix-lock[0] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
l_start = 100, l_len = 1
granted-posix-lock[1] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
l_start = 201, l_len = 1
granted-posix-lock[2] = owner = b43238094746d9fe, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
l_start = 203, l_len = 1
-- = --
fd.2.remote_fd = 3
-- = --
granted-posix-lock[0] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
l_start = 100, l_len = 1
granted-posix-lock[1] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
l_start = 201, l_len = 1
granted-posix-lock[2] = owner = 53526588c515153b, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
l_start = 203, l_len = 1
-- = --
fd.3.remote_fd = 2
-- = --
granted-posix-lock[0] = owner = 889461581e4fda22, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK,
l_start = 100, l_len = 1
granted-posix-lock[1] = owner = 889461581e4fda22, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK,
l_start = 101, l_len = 1
granted-posix-lock[2] = owner = 889461581e4fda22, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 103, fl_end = 103, user_flock: l_type = F_RDLCK,
l_start = 103, l_len = 1
granted-posix-lock[3] = owner = 889461581e4fda22, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 201, fl_end = 201, user_flock: l_type = F_RDLCK,
l_start = 201, l_len = 1
granted-posix-lock[4] = owner = 889461581e4fda22, cmd = F_SETLK fl_type =
F_RDLCK, fl_start = 203, fl_end = 203, user_flock: l_type = F_RDLCK,
l_start = 203, l_len = 1
-- = --
connected = 1
total_bytes_read = 6665235356
ping_timeout = 42
total_bytes_written = 4756303549
ping_msgs_sent = 3662
msgs_sent = 16786186

So they seem to be connected there.
However -- apparently they are not connected on server 2 (where I have
just re-mounted the volume):
[root@lab-cnvirt-h02 .meta]# cat
graphs/active/VM_Storage_1-client-11/private
[xlator.protocol.client.VM_Storage_1-client-11.priv]
connected = 0
total_bytes_read = 50020
ping_timeout = 42
total_bytes_written = 84628
ping_msgs_sent = 0
msgs_sent = 0
[root@lab-cnvirt-h02 .meta]# cat
graphs/active/VM_Storage_1-client-20/private
[xlator.protocol.client.VM_Storage_1-client-20.priv]
connected = 0
total_bytes_read = 53300
ping_timeout = 42
total_bytes_written = 90180
ping_msgs_sent = 0
msgs_sent = 0

The other bricks look connected...

Regards,
Marco


On Thu, 20 May 2021 at 14:02, Ravishankar N  wrote:

> Hi Marco,
>
> On Wed, May 19, 2021 at 8:02 PM Marco Fais  wrote:
>
>> Hi Ravi,
>>
>> thanks a million for your reply.
>>
>> I have replicated the issue in my test cluster by bringing one of the
>> nodes down, and then up again.
>> The glustershd process in the restarted node is now complaining about
>> connectivity to two bricks in one of my volumes:
>>
>> ---
>> [2021-05-19 14:05:14.462133 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
>> 2-VM_Storage_1-client-11: changing port to 49170 (from 0)
>> [2021-05-19 14:05:14.464971 +0000] I [MSGID: 114057]
>> [client-handshake.c:1128:select_server_supported_programs]
>> 2-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
>> {Num=1298437}, {Version=400}]
>> [2021-05-19 14:05:14.465209 +0000] W [MSGID: 114043]
>> [client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-11:
>> failed to set the volume [{errno=2}, {error=No such file or directory}]
>> [2021-05-19 14:05:14.465236 +0000] W [MSGID: 114007]
>> 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-20 Thread Ravishankar N
Hi Marco,

On Wed, May 19, 2021 at 8:02 PM Marco Fais  wrote:

> Hi Ravi,
>
> thanks a million for your reply.
>
> I have replicated the issue in my test cluster by bringing one of the
> nodes down, and then up again.
> The glustershd process in the restarted node is now complaining about
> connectivity to two bricks in one of my volumes:
>
> ---
> [2021-05-19 14:05:14.462133 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 2-VM_Storage_1-client-11: changing port to 49170 (from 0)
> [2021-05-19 14:05:14.464971 +0000] I [MSGID: 114057]
> [client-handshake.c:1128:select_server_supported_programs]
> 2-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
> {Num=1298437}, {Version=400}]
> [2021-05-19 14:05:14.465209 +0000] W [MSGID: 114043]
> [client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-11:
> failed to set the volume [{errno=2}, {error=No such file or directory}]
> [2021-05-19 14:05:14.465236 +0000] W [MSGID: 114007]
> [client-handshake.c:752:client_setvolume_cbk] 2-VM_Storage_1-client-11:
> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
> argument}]
> [2021-05-19 14:05:14.465248 +0000] E [MSGID: 114044]
> [client-handshake.c:757:client_setvolume_cbk] 2-VM_Storage_1-client-11:
> SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
> {error=No such file or directory}]
> [2021-05-19 14:05:14.465256 +0000] I [MSGID: 114051]
> [client-handshake.c:879:client_setvolume_cbk] 2-VM_Storage_1-client-11:
> sending CHILD_CONNECTING event []
> [2021-05-19 14:05:14.465291 +0000] I [MSGID: 114018]
> [client.c:2229:client_rpc_notify] 2-VM_Storage_1-client-11: disconnected
> from client, process will keep trying to connect glusterd until brick's
> port is available [{conn-name=VM_Storage_1-client-11}]
> [2021-05-19 14:05:14.473598 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
> 2-VM_Storage_1-client-20: changing port to 49173 (from 0)
>

The above logs indicate that shd is trying to connect to the bricks on
ports 49170 and 49173 respectively, when it should have done so using 49172
and 49169 (as per the volume status and ps output). Shd gets the brick port
numbers info from glusterd, so I'm not sure what is going on here.  Do you
have fuse mounts on this particular node?  If you don't, you can mount it
temporarily, then check if the connection to the bricks is successful from
the .meta folder of the mount:

cd /path-to-fuse-mount
cd .meta
cat graphs/active/VM_Storage_1-client-11/private
cat graphs/active/VM_Storage_1-client-20/private
etc. and check if connected=1 or 0.
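
If it is easier, something like this checks every client xlator of the
volume in one go (a sketch; the mount point and server name are only
examples based on this thread):

  mount -t glusterfs lab-cnvirt-h03-storage:/VM_Storage_1 /mnt/test
  cd /mnt/test/.meta/graphs/active
  for d in VM_Storage_1-client-*; do
      echo -n "$d: "; grep '^connected' "$d/private"
  done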

I just wanted to see whether it is only the shd, or whether the other
clients are also unable to connect to the bricks from this node. FWIW, I
tried upgrading
from 7.9 to 8.4 on a test machine and the shd was able to connect to the
bricks just fine.
Regards,
Ravi




> [2021-05-19 14:05:14.476543 +0000] I [MSGID: 114057]
> [client-handshake.c:1128:select_server_supported_programs]
> 2-VM_Storage_1-client-20: Using Program [{Program-name=GlusterFS 4.x v1},
> {Num=1298437}, {Version=400}]
> [2021-05-19 14:05:14.476764 +0000] W [MSGID: 114043]
> [client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-20:
> failed to set the volume [{errno=2}, {error=No such file or directory}]
> [2021-05-19 14:05:14.476785 +0000] W [MSGID: 114007]
> [client-handshake.c:752:client_setvolume_cbk] 2-VM_Storage_1-client-20:
> failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
> argument}]
> [2021-05-19 14:05:14.476799 +0000] E [MSGID: 114044]
> [client-handshake.c:757:client_setvolume_cbk] 2-VM_Storage_1-client-20:
> SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
> {error=No such file or directory}]
> [2021-05-19 14:05:14.476812 +0000] I [MSGID: 114051]
> [client-handshake.c:879:client_setvolume_cbk] 2-VM_Storage_1-client-20:
> sending CHILD_CONNECTING event []
> [2021-05-19 14:05:14.476849 +0000] I [MSGID: 114018]
> [client.c:2229:client_rpc_notify] 2-VM_Storage_1-client-20: disconnected
> from client, process will keep trying to connect glusterd until brick's
> port is available [{conn-name=VM_Storage_1-client-20}]
> ---
>
> The two bricks are the following:
> VM_Storage_1-client-20 --> Brick21:
> lab-cnvirt-h03-storage:/bricks/vm_b5_arb/brick (arbiter)
> VM_Storage_1-client-11 --> Brick12:
> lab-cnvirt-h03-storage:/bricks/vm_b3_arb/brick (arbiter)
> (In this case the issue is on two arbiter nodes, but it is not always
> the case)
>
> The port information via "gluster volume status VM_Storage_1" on the
> affected node (same as the one running the glustershd reporting the issue)
> is:
> Brick lab-cnvirt-h03-storage:/bricks/vm_b5_arb/brick
> *49172 *0  Y   3978256
> Brick lab-cnvirt-h03-storage:/bricks/vm_b3_arb/brick
> *49169 *0  Y   3978224
>
> This is aligned to the actual port of the process:
> root 3978256  1.5  0.0 1999568 30372 ?   Ssl  May18  15:56
> /usr/sbin/glusterfsd -s 

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-19 Thread Marco Fais
Hi Strahil

thanks for your reply. Just to clarify my setup: while I am using the same
nodes for oVirt and Gluster, I do manage the two independently (so Gluster
is not managed by oVirt).

See below for the output you have requested:
*gluster pool list*
UUIDHostnameState
a2a62dd6-49b2-4eb6-a7e2-59c75723f5c7ovirt-node3-storage Connected
83f24b13-eaad-4443-9dc3-0152b74385f4ovirt-node2-storage Connected
acb80b35-d6ac-4085-87cd-ba69ff3f81e6localhost   Connected

For simplicity I will only send the output of one of the affected volumes
for the info command:

Volume Name: VM_Storage_1
Type: Distributed-Replicate
Volume ID: 1a4e23db-1c98-4d89-b888-b4ae2e0ad5fc
Status: Started
Snapshot Count: 0
Number of Bricks: 9 x (2 + 1) = 27
Transport-type: tcp
Bricks:
Brick1: lab-cnvirt-h01-storage:/bricks/vm_b1_vol/brick
Brick2: lab-cnvirt-h02-storage:/bricks/vm_b1_vol/brick
Brick3: lab-cnvirt-h03-storage:/bricks/vm_b1_arb/brick (arbiter)
Brick4: lab-cnvirt-h03-storage:/bricks/vm_b1_vol/brick
Brick5: lab-cnvirt-h01-storage:/bricks/vm_b2_vol/brick
Brick6: lab-cnvirt-h02-storage:/bricks/vm_b2_arb/brick (arbiter)
Brick7: lab-cnvirt-h02-storage:/bricks/vm_b2_vol/brick
Brick8: lab-cnvirt-h03-storage:/bricks/vm_b2_vol/brick
Brick9: lab-cnvirt-h01-storage:/bricks/vm_b1_arb/brick (arbiter)
Brick10: lab-cnvirt-h01-storage:/bricks/vm_b3_vol/brick
Brick11: lab-cnvirt-h02-storage:/bricks/vm_b3_vol/brick
Brick12: lab-cnvirt-h03-storage:/bricks/vm_b3_arb/brick (arbiter)
Brick13: lab-cnvirt-h03-storage:/bricks/vm_b3_vol/brick
Brick14: lab-cnvirt-h01-storage:/bricks/vm_b4_vol/brick
Brick15: lab-cnvirt-h02-storage:/bricks/vm_b4_arb/brick (arbiter)
Brick16: lab-cnvirt-h02-storage:/bricks/vm_b4_vol/brick
Brick17: lab-cnvirt-h03-storage:/bricks/vm_b4_vol/brick
Brick18: lab-cnvirt-h01-storage:/bricks/vm_b3_arb/brick (arbiter)
Brick19: lab-cnvirt-h01-storage:/bricks/vm_b5_vol/brick
Brick20: lab-cnvirt-h02-storage:/bricks/vm_b5_vol/brick
Brick21: lab-cnvirt-h03-storage:/bricks/vm_b5_arb/brick (arbiter)
Brick22: lab-cnvirt-h03-storage:/bricks/vm_b5_vol/brick
Brick23: lab-cnvirt-h01-storage:/bricks/vm_b6_vol/brick
Brick24: lab-cnvirt-h02-storage:/bricks/vm_b6_arb/brick (arbiter)
Brick25: lab-cnvirt-h02-storage:/bricks/vm_b6_vol/brick
Brick26: lab-cnvirt-h03-storage:/bricks/vm_b6_vol/brick
Brick27: lab-cnvirt-h01-storage:/bricks/vm_b5_arb/brick (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: enable
storage.owner-uid: 36
storage.owner-gid: 36
performance.strict-o-direct: on
cluster.read-hash-mode: 3
performance.client-io-threads: on
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.fips-mode-rchecksum: on
nfs.disable: on
transport.address-family: inet

I have also tried to set cluster.server-quorum-type: none but with no
difference.
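
For reference, that was along these lines (volume name as above; the reset
assumes the option had been set explicitly):

  gluster volume set VM_Storage_1 cluster.server-quorum-type none
  gluster volume reset VM_Storage_1 cluster.server-quorum-type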

Thanks
Marco


On Wed, 19 May 2021 at 07:48, Strahil Nikolov  wrote:

> I think that we also have to take a look at the quorum settings.
> Usually oVirt adds hosts as part of the TSP even if they have no bricks
> involved in the volume.
>
> Can you provide the output of:
>
> 'gluster pool list'
> 'gluster volume info all'
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, May 19, 2021 at 8:31, Ravishankar N
>  wrote:
> 






Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-19 Thread Marco Fais
Hi Ravi,

thanks a million for your reply.

I have replicated the issue in my test cluster by bringing one of the nodes
down, and then up again.
The glustershd process in the restarted node is now complaining about
connectivity to two bricks in one of my volumes:

---
[2021-05-19 14:05:14.462133 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-11: changing port to 49170 (from 0)
[2021-05-19 14:05:14.464971 +0000] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
2-VM_Storage_1-client-11: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-05-19 14:05:14.465209 +0000] W [MSGID: 114043]
[client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-11:
failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-05-19 14:05:14.465236 +0000] W [MSGID: 114007]
[client-handshake.c:752:client_setvolume_cbk] 2-VM_Storage_1-client-11:
failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
argument}]
[2021-05-19 14:05:14.465248 +0000] E [MSGID: 114044]
[client-handshake.c:757:client_setvolume_cbk] 2-VM_Storage_1-client-11:
SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
{error=No such file or directory}]
[2021-05-19 14:05:14.465256 +0000] I [MSGID: 114051]
[client-handshake.c:879:client_setvolume_cbk] 2-VM_Storage_1-client-11:
sending CHILD_CONNECTING event []
[2021-05-19 14:05:14.465291 +0000] I [MSGID: 114018]
[client.c:2229:client_rpc_notify] 2-VM_Storage_1-client-11: disconnected
from client, process will keep trying to connect glusterd until brick's
port is available [{conn-name=VM_Storage_1-client-11}]
[2021-05-19 14:05:14.473598 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-20: changing port to 49173 (from 0)
[2021-05-19 14:05:14.476543 +0000] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
2-VM_Storage_1-client-20: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-05-19 14:05:14.476764 +0000] W [MSGID: 114043]
[client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-20:
failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-05-19 14:05:14.476785 +0000] W [MSGID: 114007]
[client-handshake.c:752:client_setvolume_cbk] 2-VM_Storage_1-client-20:
failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
argument}]
[2021-05-19 14:05:14.476799 +0000] E [MSGID: 114044]
[client-handshake.c:757:client_setvolume_cbk] 2-VM_Storage_1-client-20:
SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
{error=No such file or directory}]
[2021-05-19 14:05:14.476812 +0000] I [MSGID: 114051]
[client-handshake.c:879:client_setvolume_cbk] 2-VM_Storage_1-client-20:
sending CHILD_CONNECTING event []
[2021-05-19 14:05:14.476849 +0000] I [MSGID: 114018]
[client.c:2229:client_rpc_notify] 2-VM_Storage_1-client-20: disconnected
from client, process will keep trying to connect glusterd until brick's
port is available [{conn-name=VM_Storage_1-client-20}]
---

The two bricks are the following:
VM_Storage_1-client-20 --> Brick21:
lab-cnvirt-h03-storage:/bricks/vm_b5_arb/brick (arbiter)
VM_Storage_1-client-11 --> Brick12:
lab-cnvirt-h03-storage:/bricks/vm_b3_arb/brick (arbiter)
(In this case the issue is on two arbiter bricks, but that is not always the
case.)

The port information from "gluster volume status VM_Storage_1" on the affected
node (the same node running the glustershd instance that reports the issue)
is:
Brick lab-cnvirt-h03-storage:/bricks/vm_b5_arb/brick   49172  0  Y  3978256
Brick lab-cnvirt-h03-storage:/bricks/vm_b3_arb/brick   49169  0  Y  3978224

This matches the actual ports used by the brick processes:
root 3978256  1.5  0.0 1999568 30372 ?   Ssl  May18  15:56
/usr/sbin/glusterfsd -s lab-cnvirt-h03-storage --volfile-id
VM_Storage_1.lab-cnvirt-h03-storage.bricks-vm_b5_arb-brick -p
/var/run/gluster/vols/VM_Storage_1/lab-cnvirt-h03-storage-bricks-vm_b5_arb-brick.pid
-S /var/run/gluster/2b1dd3ca06d39a59.socket --brick-name
/bricks/vm_b5_arb/brick -l
/var/log/glusterfs/bricks/bricks-vm_b5_arb-brick.log --xlator-option
*-posix.glusterd-uuid=a2a62dd6-49b2-4eb6-a7e2-59c75723f5c7 --process-name
brick --brick-port 49172 --xlator-option VM_Storage_1-server.listen-port=49172
root 3978224  4.3  0.0 1867976 27928 ?   Ssl  May18  44:55
/usr/sbin/glusterfsd -s lab-cnvirt-h03-storage --volfile-id
VM_Storage_1.lab-cnvirt-h03-storage.bricks-vm_b3_arb-brick -p
/var/run/gluster/vols/VM_Storage_1/lab-cnvirt-h03-storage-bricks-vm_b3_arb-brick.pid
-S /var/run/gluster/00d461b7d79badc9.socket --brick-name
/bricks/vm_b3_arb/brick -l
/var/log/glusterfs/bricks/bricks-vm_b3_arb-brick.log --xlator-option
*-posix.glusterd-uuid=a2a62dd6-49b2-4eb6-a7e2-59c75723f5c7 --process-name
brick --brick-port 49169 --xlator-option VM_Storage_1-server.listen-port=49169

So the issue seems to be specifically that glustershd is trying to reach these
bricks on stale ports (49170 and 49173 in the log above) rather than on the
ports they are actually listening on (49169 and 49172).
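
For reference, here is a rough, untested sketch that makes the comparison
above mechanical. It assumes the standard "gluster volume status" and ps
output formats (and that the brick lines are not wrapped); VM_Storage_1 is
just the volume from this thread:

---
#!/bin/bash
# Print the brick ports glusterd advertises next to the ports the local
# glusterfsd processes were actually started with, to spot a mismatch.
VOL=VM_Storage_1

echo "== ports advertised by glusterd =="
gluster volume status "$VOL" | awk '/^Brick/ { print $2, $3 }'

echo "== ports the local brick processes listen on =="
ps -eo args | awk '/[g]lusterfsd/ {
    brick = ""; port = "";
    for (i = 1; i < NF; i++) {
        if ($i == "--brick-name") brick = $(i + 1);
        if ($i == "--brick-port") port = $(i + 1);
    }
    if (brick != "") print brick, port;
}'
---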

Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-19 Thread Strahil Nikolov
I think that we also have to take a look at the quorum settings. Usually oVirt
adds hosts as part of the TSP even if they got no bricks involved in the volume.

Can you provide the output of:

'gluster pool list'
'gluster volume info all'
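
(As a side note, a quick sketch for inspecting the quorum side of things;
'gluster volume get' should be available on any 8.x/9.x install, and
VM_Storage_1 is simply the volume name used earlier in this thread:)

gluster pool list                                      # all TSP members, with or without bricks
gluster volume get VM_Storage_1 all | grep -i quorum   # quorum options in effect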
Best Regards,
Strahil Nikolov
 


Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-18 Thread Ravishankar N
On Mon, May 17, 2021 at 4:22 PM Marco Fais  wrote:

> Hi,
>
> I am having significant issues with glustershd with releases 8.4 and 9.1.
>
> My oVirt clusters are using gluster storage backends, and were running
> fine with Gluster 7.x (shipped with earlier versions of oVirt Node 4.4.x).
> Recently the oVirt project moved to Gluster 8.4 for the nodes, and hence I
> have moved to this release when upgrading my clusters.
>
> Since then I am having issues whenever one of the nodes is brought down;
> when the nodes come back up online the bricks are typically back up and
> working, but some (random) glustershd processes in the various nodes seem
> to have issues connecting to some of them.
>
>
When the issue happens, can you check whether the TCP port number of the brick
(glusterfsd) processes displayed in `gluster volume status` matches the
actual port numbers observed (i.e. the --brick-port argument) when you run
`ps aux | grep glusterfsd`? If they don't match, then glusterd has incorrect
brick port information in its memory and is serving it to glustershd.
Restarting glusterd instead of (killing the bricks + `volume start force`)
should fix it, although we need to find out why glusterd serves incorrect
port numbers.
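
(For concreteness, the two alternatives look roughly like this on an EL-style
system; a sketch, not a verified procedure for every release:)

# Alternative 1: restart only the management daemon (bricks keep running)
systemctl restart glusterd

# Alternative 2: kill the affected brick and force-start the volume
kill <pid-of-glusterfsd>                 # placeholder; pid from "gluster volume status"
gluster volume start VM_Storage_1 force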

If they do match, then can you take a statedump of glustershd to check that
it is indeed disconnected from the bricks? You will need to verify that
'connected=1' in the statedump. See "Self-heal is stuck/ not getting
completed." section in
https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-afr/.
Statedump can be taken by `kill -SIGUSR1 $pid-of-glustershd`. It will be
generated in the /var/run/gluster/ directory.
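
A sketch of those steps (the pid-file path is an assumption and may differ
between releases, hence the pgrep fallback):

SHD_PID=$(cat /var/run/gluster/glustershd/glustershd.pid 2>/dev/null \
          || pgrep -f glustershd | head -1)
kill -SIGUSR1 "$SHD_PID"                 # ask glustershd to write a statedump
sleep 2                                  # give it a moment to finish writing
DUMP=$(ls -t /var/run/gluster/glusterdump.${SHD_PID}* 2>/dev/null | head -1)
grep 'connected' "$DUMP"                 # connected=0 => still cut off from a brick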

Regards,
Ravi






[Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-17 Thread Marco Fais
Hi,

I am having significant issues with glustershd with releases 8.4 and 9.1.

My oVirt clusters are using gluster storage backends, and were running fine
with Gluster 7.x (shipped with earlier versions of oVirt Node 4.4.x).
Recently the oVirt project moved to Gluster 8.4 for the nodes, and hence I
have moved to this release when upgrading my clusters.

Since then I am having issues whenever one of the nodes is brought down;
when the nodes come back up online the bricks are typically back up and
working, but some (random) glustershd processes in the various nodes seem
to have issues connecting to some of them.

Typically, when this happens, the files are not getting healed:

VM_Storage_1
Distributed_replicate  Started (UP) - 27/27 Bricks Up
   Capacity: (27.10% used) 2.00 TiB/8.00 TiB (used/total)
   Self-Heal:
      lab-cnvirt-h01-storage:/bricks/vm_b1_vol/brick (8 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b1_vol/brick (8 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b2_vol/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b2_arb/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b2_vol/brick (5 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b1_arb/brick (5 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b3_vol/brick (9 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b3_vol/brick (9 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b4_vol/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b4_arb/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b4_vol/brick (10 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b3_arb/brick (10 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b5_vol/brick (3 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b5_vol/brick (3 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b6_vol/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b6_arb/brick (4 File(s) to heal).
      lab-cnvirt-h02-storage:/bricks/vm_b6_vol/brick (9 File(s) to heal).
      lab-cnvirt-h01-storage:/bricks/vm_b5_arb/brick (9 File(s) to heal).

(They will never heal; the number of files to heal, however, changes.)
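
(To watch whether those counts ever drain, something like this helps; "heal
info summary" should exist on the 8.x/9.x CLI, though the output format may
vary:)

gluster volume heal VM_Storage_1 info summary
watch -n 30 'gluster volume heal VM_Storage_1 info summary'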

In the glustershd.log files, I can see the following continuously:
[2021-05-17 10:27:30.531561 +] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-3: changing port to 49154 (from 0)
[2021-05-17 10:27:30.533709 +] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-7: changing port to 49155 (from 0)
[2021-05-17 10:27:30.534211 +] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
2-VM_Storage_1-client-3: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-05-17 10:27:30.534514 +] W [MSGID: 114043]
[client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-3:
failed to set the volume [{errno=2}, {error=No such file or directory}]
The message "I [MSGID: 114018] [client.c:2229:client_rpc_notify]
2-VM_Storage_1-client-3: disconnected from client, process will keep trying
to connect glusterd until brick's port is available
[{conn-name=VM_Storage_1-client-3}]" repeated 4 times between [2021-05-17
10:27:18.510668 +] and [2021-05-17 10:27:30.534569 +]
[2021-05-17 10:27:30.536254 +] I [MSGID: 114057]
[client-handshake.c:1128:select_server_supported_programs]
2-VM_Storage_1-client-7: Using Program [{Program-name=GlusterFS 4.x v1},
{Num=1298437}, {Version=400}]
[2021-05-17 10:27:30.536620 +] W [MSGID: 114043]
[client-handshake.c:727:client_setvolume_cbk] 2-VM_Storage_1-client-7:
failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-05-17 10:27:30.536638 +] W [MSGID: 114007]
[client-handshake.c:752:client_setvolume_cbk] 2-VM_Storage_1-client-7:
failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid
argument}]
[2021-05-17 10:27:30.536651 +] E [MSGID: 114044]
[client-handshake.c:757:client_setvolume_cbk] 2-VM_Storage_1-client-7:
SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2},
{error=No such file or directory}]
[2021-05-17 10:27:30.536660 +] I [MSGID: 114051]
[client-handshake.c:879:client_setvolume_cbk] 2-VM_Storage_1-client-7:
sending CHILD_CONNECTING event []
[2021-05-17 10:27:30.536686 +] I [MSGID: 114018]
[client.c:2229:client_rpc_notify] 2-VM_Storage_1-client-7: disconnected
from client, process will keep trying to connect glusterd until brick's
port is available [{conn-name=VM_Storage_1-client-7}]
[2021-05-17 10:27:33.537589 +] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-3: changing port to 49154 (from 0)
[2021-05-17 10:27:33.539554 +] I [rpc-clnt.c:1968:rpc_clnt_reconfig]
2-VM_Storage_1-client-7: changing port to 49155 (from 0)

From my understanding, the process is trying to connect to the brick on the
wrong port:
lab-cnvirt-h03-storage:-bricks-vm_b2_vol-brick:8:brick-id=VM_Storage_1-client-7
Brick