Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t

2018-08-07 Thread Mohit Agrawal
I have posted a patch https://review.gluster.org/#/c/20657/ and start
brick-mux regression to validate the patch.

Thanks
Mohit Agrawal

On Wed, Aug 8, 2018 at 7:22 AM, Atin Mukherjee  wrote:

> +Mohit
>
> Requesting Mohit for help.
>
> On Wed, 8 Aug 2018 at 06:53, Shyam Ranganathan 
> wrote:
>
>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>>
>> Ashish/Atin, the above test failed in run:
>> https://build.gluster.org/job/regression-on-demand-
>> multiplex/172/consoleFull
>>
>> The above run is based on patchset 4 of
>> https://review.gluster.org/#/c/20637/4
>>
>> The logs look as below, and as Ashish is unable to reproduce this, and
>> all failures are on line 78 with a heal outstanding of 105, looks like
>> this run may provide some possibilities on narrowing it down.
>>
>> The problem seems to be glustershd not connecting to one of the bricks
>> that is restarted, and hence failing to heal that brick. This also looks
>> like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t
>>
>> ==
>> Test times from: cat ./glusterd.log | grep TEST
>> [2018-08-06 20:56:28.177386]:++
>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script
>> --wignore volume heal patchy full ++
>> [2018-08-06 20:56:28.767209]:++
>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count
>> patchy ++
>> [2018-08-06 20:57:48.957136]:++
>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o
>> 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o
>> ++
>> ==
>> Repeated connection failure to client-3 in glustershd.log:
>> [2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig]
>> 0-patchy-client-3: changing port to 49152 (from 0)
>> [2018-08-06 20:56:30.222738] W [MSGID: 114043]
>> [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed
>> to set the volume [Resource temporarily unavailable]
>> [2018-08-06 20:56:30.222788] W [MSGID: 114007]
>> [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed
>> to get 'process-uuid' from reply dict [Invalid argument]
>> [2018-08-06 20:56:30.222813] E [MSGID: 114044]
>> [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3:
>> SETVOLUME on remote-host failed: cleanup flag is set for xlator.  Try
>> again later [Resource tempor
>> arily unavailable]
>> [2018-08-06 20:56:30.222845] I [MSGID: 114051]
>> [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3:
>> sending CHILD_CONNECTING event
>> [2018-08-06 20:56:30.222919] I [MSGID: 114018]
>> [client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from
>> patchy-client-3. Client process will keep trying to connect to glusterd
>> until brick's port is
>>  available
>> ==
>> Repeated connection messages close to above retries in
>> d-backends-patchy0.log:
>> [2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update]
>> 0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1"
>> [2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login:
>> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
>> The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict:
>> key 'trusted.ec.version' is would not be sent on wire in future [Invalid
>> argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and
>>  [2018-08-06 20:56:37.933084]
>> [2018-08-06 20:56:38.530067] I [MSGID: 115029]
>> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
>> client from
>> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-
>> PID:10506-HOST:builder104.clo
>> ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev)
>> [2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update]
>> 0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1"
>> [2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login:
>> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
>> [2018-08-06 20:56:38.540555] I [MSGID: 115029]
>> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
>> client from
>> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-
>> PID:10506-HOST:builder104.clo
>> ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev)
>> [2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update]
>> 0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1"
>> [2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login:
>> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
>> [2018-08-06 20:56:38.552494] I [MSGID: 115029]
>> [server-handshake.c:786:server_setvolume] 0-patchy-server: 

Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t

2018-08-07 Thread Atin Mukherjee
+Mohit

Requesting Mohit for help.

On Wed, 8 Aug 2018 at 06:53, Shyam Ranganathan  wrote:

> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > 5) Current test failures
> > We still have the following tests failing and some without any RCA or
> > attention, (If something is incorrect, write back).
> >
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>
> Ashish/Atin, the above test failed in run:
>
> https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull
>
> The above run is based on patchset 4 of
> https://review.gluster.org/#/c/20637/4
>
> The logs look as below, and as Ashish is unable to reproduce this, and
> all failures are on line 78 with a heal outstanding of 105, looks like
> this run may provide some possibilities on narrowing it down.
>
> The problem seems to be glustershd not connecting to one of the bricks
> that is restarted, and hence failing to heal that brick. This also looks
> like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t
>
> ==
> Test times from: cat ./glusterd.log | grep TEST
> [2018-08-06 20:56:28.177386]:++
> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script
> --wignore volume heal patchy full ++
> [2018-08-06 20:56:28.767209]:++
> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count
> patchy ++
> [2018-08-06 20:57:48.957136]:++
> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o
> 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o
> ++
> ==
> Repeated connection failure to client-3 in glustershd.log:
> [2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig]
> 0-patchy-client-3: changing port to 49152 (from 0)
> [2018-08-06 20:56:30.222738] W [MSGID: 114043]
> [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed
> to set the volume [Resource temporarily unavailable]
> [2018-08-06 20:56:30.222788] W [MSGID: 114007]
> [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed
> to get 'process-uuid' from reply dict [Invalid argument]
> [2018-08-06 20:56:30.222813] E [MSGID: 114044]
> [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3:
> SETVOLUME on remote-host failed: cleanup flag is set for xlator.  Try
> again later [Resource tempor
> arily unavailable]
> [2018-08-06 20:56:30.222845] I [MSGID: 114051]
> [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3:
> sending CHILD_CONNECTING event
> [2018-08-06 20:56:30.222919] I [MSGID: 114018]
> [client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from
> patchy-client-3. Client process will keep trying to connect to glusterd
> until brick's port is
>  available
> ==
> Repeated connection messages close to above retries in
> d-backends-patchy0.log:
> [2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update]
> 0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1"
> [2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login:
> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
> The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict:
> key 'trusted.ec.version' is would not be sent on wire in future [Invalid
> argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and
>  [2018-08-06 20:56:37.933084]
> [2018-08-06 20:56:38.530067] I [MSGID: 115029]
> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
> client from
>
> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
> ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev)
> [2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update]
> 0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1"
> [2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login:
> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
> [2018-08-06 20:56:38.540555] I [MSGID: 115029]
> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
> client from
>
> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
> ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev)
> [2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update]
> 0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1"
> [2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login:
> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
> [2018-08-06 20:56:38.552494] I [MSGID: 115029]
> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
> client from
>
> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
> ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev)
> [2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update]
> 0-/d/backends/patchy4: allowed = "*", received 

Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t

2018-08-07 Thread Shyam Ranganathan
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/ec/bug-1236065.t (Ashish)

Ashish/Atin, the above test failed in run:
https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull

The above run is based on patchset 4 of
https://review.gluster.org/#/c/20637/4

The logs look as below, and as Ashish is unable to reproduce this, and
all failures are on line 78 with a heal outstanding of 105, looks like
this run may provide some possibilities on narrowing it down.

The problem seems to be glustershd not connecting to one of the bricks
that is restarted, and hence failing to heal that brick. This also looks
like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t

==
Test times from: cat ./glusterd.log | grep TEST
[2018-08-06 20:56:28.177386]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script
--wignore volume heal patchy full ++
[2018-08-06 20:56:28.767209]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count
patchy ++
[2018-08-06 20:57:48.957136]:++
G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o
13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o
++
==
Repeated connection failure to client-3 in glustershd.log:
[2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig]
0-patchy-client-3: changing port to 49152 (from 0)
[2018-08-06 20:56:30.222738] W [MSGID: 114043]
[client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed
to set the volume [Resource temporarily unavailable]
[2018-08-06 20:56:30.222788] W [MSGID: 114007]
[client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed
to get 'process-uuid' from reply dict [Invalid argument]
[2018-08-06 20:56:30.222813] E [MSGID: 114044]
[client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3:
SETVOLUME on remote-host failed: cleanup flag is set for xlator.  Try
again later [Resource tempor
arily unavailable]
[2018-08-06 20:56:30.222845] I [MSGID: 114051]
[client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3:
sending CHILD_CONNECTING event
[2018-08-06 20:56:30.222919] I [MSGID: 114018]
[client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from
patchy-client-3. Client process will keep trying to connect to glusterd
until brick's port is
 available
==
Repeated connection messages close to above retries in
d-backends-patchy0.log:
[2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict:
key 'trusted.ec.version' is would not be sent on wire in future [Invalid
argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and
 [2018-08-06 20:56:37.933084]
[2018-08-06 20:56:38.530067] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.540555] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.552494] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from
CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo
ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev)
[2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update]
0-/d/backends/patchy4: allowed = "*", received addr = "127.0.0.1"
[2018-08-06 20:56:38.571701] I [login.c:111:gf_auth] 0-auth/login:
allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05
[2018-08-06 20:56:38.571723] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-server: accepted
client from