Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t
I have posted a patch https://review.gluster.org/#/c/20657/ and start brick-mux regression to validate the patch. Thanks Mohit Agrawal On Wed, Aug 8, 2018 at 7:22 AM, Atin Mukherjee wrote: > +Mohit > > Requesting Mohit for help. > > On Wed, 8 Aug 2018 at 06:53, Shyam Ranganathan > wrote: > >> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote: >> > 5) Current test failures >> > We still have the following tests failing and some without any RCA or >> > attention, (If something is incorrect, write back). >> > >> > ./tests/bugs/ec/bug-1236065.t (Ashish) >> >> Ashish/Atin, the above test failed in run: >> https://build.gluster.org/job/regression-on-demand- >> multiplex/172/consoleFull >> >> The above run is based on patchset 4 of >> https://review.gluster.org/#/c/20637/4 >> >> The logs look as below, and as Ashish is unable to reproduce this, and >> all failures are on line 78 with a heal outstanding of 105, looks like >> this run may provide some possibilities on narrowing it down. >> >> The problem seems to be glustershd not connecting to one of the bricks >> that is restarted, and hence failing to heal that brick. This also looks >> like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t >> >> == >> Test times from: cat ./glusterd.log | grep TEST >> [2018-08-06 20:56:28.177386]:++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script >> --wignore volume heal patchy full ++ >> [2018-08-06 20:56:28.767209]:++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count >> patchy ++ >> [2018-08-06 20:57:48.957136]:++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o >> 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o >> ++ >> == >> Repeated connection failure to client-3 in glustershd.log: >> [2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig] >> 0-patchy-client-3: changing port to 49152 (from 0) >> [2018-08-06 20:56:30.222738] W [MSGID: 114043] >> [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed >> to set the volume [Resource temporarily unavailable] >> [2018-08-06 20:56:30.222788] W [MSGID: 114007] >> [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed >> to get 'process-uuid' from reply dict [Invalid argument] >> [2018-08-06 20:56:30.222813] E [MSGID: 114044] >> [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3: >> SETVOLUME on remote-host failed: cleanup flag is set for xlator. Try >> again later [Resource tempor >> arily unavailable] >> [2018-08-06 20:56:30.222845] I [MSGID: 114051] >> [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3: >> sending CHILD_CONNECTING event >> [2018-08-06 20:56:30.222919] I [MSGID: 114018] >> [client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from >> patchy-client-3. Client process will keep trying to connect to glusterd >> until brick's port is >> available >> == >> Repeated connection messages close to above retries in >> d-backends-patchy0.log: >> [2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update] >> 0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1" >> [2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login: >> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 >> The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: >> key 'trusted.ec.version' is would not be sent on wire in future [Invalid >> argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and >> [2018-08-06 20:56:37.933084] >> [2018-08-06 20:56:38.530067] I [MSGID: 115029] >> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted >> client from >> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0- >> PID:10506-HOST:builder104.clo >> ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev) >> [2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update] >> 0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1" >> [2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login: >> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 >> [2018-08-06 20:56:38.540555] I [MSGID: 115029] >> [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted >> client from >> CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0- >> PID:10506-HOST:builder104.clo >> ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev) >> [2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update] >> 0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1" >> [2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login: >> allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 >> [2018-08-06 20:56:38.552494] I [MSGID: 115029] >> [server-handshake.c:786:server_setvolume] 0-patchy-server:
Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t
+Mohit Requesting Mohit for help. On Wed, 8 Aug 2018 at 06:53, Shyam Ranganathan wrote: > On 08/07/2018 07:37 PM, Shyam Ranganathan wrote: > > 5) Current test failures > > We still have the following tests failing and some without any RCA or > > attention, (If something is incorrect, write back). > > > > ./tests/bugs/ec/bug-1236065.t (Ashish) > > Ashish/Atin, the above test failed in run: > > https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull > > The above run is based on patchset 4 of > https://review.gluster.org/#/c/20637/4 > > The logs look as below, and as Ashish is unable to reproduce this, and > all failures are on line 78 with a heal outstanding of 105, looks like > this run may provide some possibilities on narrowing it down. > > The problem seems to be glustershd not connecting to one of the bricks > that is restarted, and hence failing to heal that brick. This also looks > like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t > > == > Test times from: cat ./glusterd.log | grep TEST > [2018-08-06 20:56:28.177386]:++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script > --wignore volume heal patchy full ++ > [2018-08-06 20:56:28.767209]:++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count > patchy ++ > [2018-08-06 20:57:48.957136]:++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o > 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o > ++ > == > Repeated connection failure to client-3 in glustershd.log: > [2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig] > 0-patchy-client-3: changing port to 49152 (from 0) > [2018-08-06 20:56:30.222738] W [MSGID: 114043] > [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed > to set the volume [Resource temporarily unavailable] > [2018-08-06 20:56:30.222788] W [MSGID: 114007] > [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed > to get 'process-uuid' from reply dict [Invalid argument] > [2018-08-06 20:56:30.222813] E [MSGID: 114044] > [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3: > SETVOLUME on remote-host failed: cleanup flag is set for xlator. Try > again later [Resource tempor > arily unavailable] > [2018-08-06 20:56:30.222845] I [MSGID: 114051] > [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3: > sending CHILD_CONNECTING event > [2018-08-06 20:56:30.222919] I [MSGID: 114018] > [client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from > patchy-client-3. Client process will keep trying to connect to glusterd > until brick's port is > available > == > Repeated connection messages close to above retries in > d-backends-patchy0.log: > [2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update] > 0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1" > [2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login: > allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 > The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: > key 'trusted.ec.version' is would not be sent on wire in future [Invalid > argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and > [2018-08-06 20:56:37.933084] > [2018-08-06 20:56:38.530067] I [MSGID: 115029] > [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted > client from > > CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo > ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev) > [2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update] > 0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1" > [2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login: > allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 > [2018-08-06 20:56:38.540555] I [MSGID: 115029] > [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted > client from > > CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo > ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev) > [2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update] > 0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1" > [2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login: > allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 > [2018-08-06 20:56:38.552494] I [MSGID: 115029] > [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted > client from > > CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo > ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev) > [2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update] > 0-/d/backends/patchy4: allowed = "*", received
Re: [Gluster-Maintainers] [Gluster-devel] Test: ./tests/bugs/ec/bug-1236065.t
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote: > 5) Current test failures > We still have the following tests failing and some without any RCA or > attention, (If something is incorrect, write back). > > ./tests/bugs/ec/bug-1236065.t (Ashish) Ashish/Atin, the above test failed in run: https://build.gluster.org/job/regression-on-demand-multiplex/172/consoleFull The above run is based on patchset 4 of https://review.gluster.org/#/c/20637/4 The logs look as below, and as Ashish is unable to reproduce this, and all failures are on line 78 with a heal outstanding of 105, looks like this run may provide some possibilities on narrowing it down. The problem seems to be glustershd not connecting to one of the bricks that is restarted, and hence failing to heal that brick. This also looks like what Ravi RCAd for the test: ./tests/bugs/replicate/bug-1363721.t == Test times from: cat ./glusterd.log | grep TEST [2018-08-06 20:56:28.177386]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 77 gluster --mode=script --wignore volume heal patchy full ++ [2018-08-06 20:56:28.767209]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 78 ^0$ get_pending_heal_count patchy ++ [2018-08-06 20:57:48.957136]:++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 80 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++ == Repeated connection failure to client-3 in glustershd.log: [2018-08-06 20:56:30.218482] I [rpc-clnt.c:2087:rpc_clnt_reconfig] 0-patchy-client-3: changing port to 49152 (from 0) [2018-08-06 20:56:30.222738] W [MSGID: 114043] [client-handshake.c:1061:client_setvolume_cbk] 0-patchy-client-3: failed to set the volume [Resource temporarily unavailable] [2018-08-06 20:56:30.222788] W [MSGID: 114007] [client-handshake.c:1090:client_setvolume_cbk] 0-patchy-client-3: failed to get 'process-uuid' from reply dict [Invalid argument] [2018-08-06 20:56:30.222813] E [MSGID: 114044] [client-handshake.c:1096:client_setvolume_cbk] 0-patchy-client-3: SETVOLUME on remote-host failed: cleanup flag is set for xlator. Try again later [Resource tempor arily unavailable] [2018-08-06 20:56:30.222845] I [MSGID: 114051] [client-handshake.c:1201:client_setvolume_cbk] 0-patchy-client-3: sending CHILD_CONNECTING event [2018-08-06 20:56:30.222919] I [MSGID: 114018] [client.c:2255:client_rpc_notify] 0-patchy-client-3: disconnected from patchy-client-3. Client process will keep trying to connect to glusterd until brick's port is available == Repeated connection messages close to above retries in d-backends-patchy0.log: [2018-08-06 20:56:38.530009] I [addr.c:55:compare_addr_and_update] 0-/d/backends/patchy0: allowed = "*", received addr = "127.0.0.1" [2018-08-06 20:56:38.530044] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 The message "I [MSGID: 101016] [glusterfs3.h:739:dict_to_xdr] 0-dict: key 'trusted.ec.version' is would not be sent on wire in future [Invalid argument]" repeated 6 times between [2018-08-06 20:56:37.931040] and [2018-08-06 20:56:37.933084] [2018-08-06 20:56:38.530067] I [MSGID: 115029] [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted client from CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo ud.gluster.org-PC_NAME:patchy-client-0-RECON_NO:-0 (version: 4.2dev) [2018-08-06 20:56:38.540499] I [addr.c:55:compare_addr_and_update] 0-/d/backends/patchy1: allowed = "*", received addr = "127.0.0.1" [2018-08-06 20:56:38.540533] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 [2018-08-06 20:56:38.540555] I [MSGID: 115029] [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted client from CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo ud.gluster.org-PC_NAME:patchy-client-1-RECON_NO:-0 (version: 4.2dev) [2018-08-06 20:56:38.552442] I [addr.c:55:compare_addr_and_update] 0-/d/backends/patchy2: allowed = "*", received addr = "127.0.0.1" [2018-08-06 20:56:38.552472] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 [2018-08-06 20:56:38.552494] I [MSGID: 115029] [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted client from CTX_ID:cb3b4fed-62a4-4ad5-8b92-97838c651b22-GRAPH_ID:0-PID:10506-HOST:builder104.clo ud.gluster.org-PC_NAME:patchy-client-2-RECON_NO:-0 (version: 4.2dev) [2018-08-06 20:56:38.571671] I [addr.c:55:compare_addr_and_update] 0-/d/backends/patchy4: allowed = "*", received addr = "127.0.0.1" [2018-08-06 20:56:38.571701] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 756f302a-66eb-4cc0-8f91-797183312f05 [2018-08-06 20:56:38.571723] I [MSGID: 115029] [server-handshake.c:786:server_setvolume] 0-patchy-server: accepted client from