I am seeing a pause when the .t runs that seems to last roughly as long as whatever timeout we put in EXPECT_WITHIN.
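One reading of that observation is that the condition being polled never becomes true, so EXPECT_WITHIN keeps re-running its check until the whole timeout has elapsed. Below is a minimal POSIX sh sketch of that polling behaviour. It assumes EXPECT_WITHIN works roughly this way. The name expect_within_sketch is hypothetical, and the real helper in the Gluster test framework differs in detail.

```shell
#!/bin/sh
# Hypothetical, simplified model of an EXPECT_WITHIN-style poll (NOT the
# real helper from the Gluster test framework).
# Usage: expect_within_sketch TIMEOUT EXPECTED_REGEX CMD [ARGS...]
# Re-runs CMD about once per second until the last line of its output
# matches EXPECTED_REGEX, or until TIMEOUT seconds have elapsed.
expect_within_sketch() {
    ews_timeout=$1
    ews_expected=$2
    shift 2
    ews_end=$(( $(date +%s) + ews_timeout ))
    while [ "$(date +%s)" -lt "$ews_end" ]; do
        ews_out=$("$@" | tail -n 1)
        if printf '%s\n' "$ews_out" | grep -Eq -- "$ews_expected"; then
            return 0    # condition met: returns as soon as it matches
        fi
        sleep 1
    done
    return 1            # never matched: the full timeout was spent waiting
}

# A condition that never becomes true stalls the run for the whole timeout.
start=$(date +%s)
expect_within_sketch 3 '^done$' echo pending
rc=$?
echo "rc=$rc, waited about $(( $(date +%s) - start ))s"
```

The design point is that a check which matches early costs almost nothing. A check that never matches costs the full timeout on every run, which would line up with a pause as long as the EXPECT_WITHIN value.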
[2016-09-01 03:24:21.852744] I [common.c:1134:pl_does_monkey_want_stuck_lock] 0-patchy-locks: stuck lock
[2016-09-01 03:24:21.852775] W [inodelk.c:659:pl_inode_setlk] 0-patchy-locks: MONKEY LOCKING (forcing stuck lock)! at 2016-09-01 03:24:21
[2016-09-01 03:24:21.852792] I [server-rpc-fops.c:317:server_finodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:24:21.861937] I [server-rpc-fops.c:5682:server3_3_inodelk] 0-patchy-server: inbound
[2016-09-01 03:24:21.862318] I [server-rpc-fops.c:278:server_inodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:24:21.862627] I [server-rpc-fops.c:5682:server3_3_inodelk] 0-patchy-server: inbound   <<---- No I/O after this.
[2016-09-01 03:27:19.6N]:++++++++++ G_LOG:tests/features/lock_revocation.t: TEST: 52 append_to_file /mnt/glusterfs/1/testfile ++++++++++
[2016-09-01 03:27:19.871044] I [server-rpc-fops.c:5772:server3_3_finodelk] 0-patchy-server: inbound
[2016-09-01 03:27:19.871280] I [clear.c:219:clrlk_clear_inodelk] 0-patchy-locks: 2
[2016-09-01 03:27:19.871307] I [clear.c:273:clrlk_clear_inodelk] 0-patchy-locks: released_granted
[2016-09-01 03:27:19.871330] I [server-rpc-fops.c:278:server_inodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:27:19.871389] W [inodelk.c:228:__inodelk_prune_stale] 0-patchy-locks: Lock revocation [reason: age; gfid: 3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; age: 178 sec] - Inode lock revoked: 0 granted & 1 blocked locks cleared

We can prevent the hang by adding "$CLI volume stop $V0", but then the test fails. When that happens, perfused prints the following error on the console:

perfused: perfuse_node_inactive: perfuse_node_fsync failed error = 57: Resource temporarily unavailable   <<--- I wonder if this happens because the INODELK fop fails with EAGAIN.

I am also seeing weird behaviour where the log says it is releasing granted locks but then reports that it cleared 1 blocked lock.

+Manu

I think there are two things going on here:
1) There is a hang. I am still assuming it is a Gluster issue until proven otherwise.
2) I have to figure out why the counters disagree with what the other log messages say. I kept going through the code and it seems fine. It should have printed that it released 1 granted lock and 0 blocked locks, but it prints the two counts in reverse.

If you run git diff on nbslave72.cloud.gluster.org, you can see the changes I made. Could you please help?

On Sun, Aug 28, 2016 at 7:36 AM, Atin Mukherjee <amukh...@redhat.com> wrote:

> This is still bothering us a lot and looks like there is a genuine issue
> in the code which is making the process to be hung/deadlocked?
>
> Raghavendra T - any more findings?
>
>
> On Friday 19 August 2016, Atin Mukherjee <amukh...@redhat.com> wrote:
>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1368421
>>
>> NetBSD regressions are getting aborted very frequently. Apart from the
>> infra issue related to connectivity (Nigel has started looking into it),
>> lock_revocation.t is getting hung in such instances which is causing run to
>> be aborted after 300 minutes. This has already started impacting the
>> patches to get in which eventually impacts the upcoming release cycles.
>>
>> I'd request the feature owner/maintainer to have a look at it asap.
>>
>> --Atin
>>
>
>
> --
> --Atin
>
> _______________________________________________
> maintainers mailing list
> maintain...@gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
>

--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel