[Gluster-devel] namespace.t fails with brick multiplexing enabled
Hi Varsha,

Thanks for your first feature, "namespace", in GlusterFS! As we run periodic regression jobs with brick multiplexing, we have seen that tests/basic/namespace.t fails consistently with brick multiplexing enabled. I went through the function check_samples() in the test file, and it looks to me as if the function was written on the assumption that every process is associated with one brick instance and has one log file, which is not the case with brick multiplexing [1]. If you have further questions on brick multiplexing, feel free to ask.

[1] http://blog.gluster.org/brick-multiplexing-in-gluster-3-10/
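For illustration, here is a minimal sketch in Python of what a multiplexing-safe check could look like (the real test is a shell script, and the log directory, file glob, and function name below are assumptions, not the actual namespace.t code): since several bricks can share one glusterfsd process and therefore one log file, the test has to search every brick log rather than deriving one log file per brick.

```python
# Minimal sketch, assuming a hypothetical log layout; this is not the
# actual namespace.t code (which is a shell script).
import glob
import re

def check_samples(pattern, logdir="/var/log/glusterfs/bricks"):
    """Return True if `pattern` appears in any brick log.

    With brick multiplexing, a brick's messages may land in a log file
    named after a different brick (the first one attached to the shared
    glusterfsd process), so we search all logs instead of computing one
    log file name per brick process.
    """
    rx = re.compile(pattern)
    for path in glob.glob(logdir + "/*.log"):
        with open(path, errors="replace") as logf:
            if any(rx.search(line) for line in logf):
                return True
    return False
```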
Re: [Gluster-devel] Report ESTALE as ENOENT
On Fri, Feb 23, 2018 at 6:33 AM, J. Bruce Fields wrote:
> On Thu, Feb 22, 2018 at 01:17:58PM +0530, Raghavendra G wrote:
> > On Wed, Oct 11, 2017 at 7:32 PM, J. Bruce Fields wrote:
> > > On Wed, Oct 11, 2017 at 04:11:51PM +0530, Raghavendra G wrote:
> > > > On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfie...@fieldses.org> wrote:
> > > > > On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:
> > > > > > I would prefer to:
> > > > > >
> > > > > > 1. Return ENOENT for all system calls that operate on a path.
> > > > > >
> > > > > > 2. ESTALE might be ok for file descriptor based operations.
> > > > >
> > > > > Note that operations which operate on paths can fail with ESTALE when
> > > > > they attempt to look up a component within a directory that no longer
> > > > > exists.
> > > >
> > > > But "man 2 rmdir" and "man 2 unlink" don't list ESTALE as a valid error.
> > >
> > > In fact, almost no man pages list ESTALE as a valid error:
> > >
> > > [bfields@patate man-pages]$ git grep ESTALE
> > > Changes.old:Change description for ESTALE
> > > man2/open_by_handle_at.2:.B ESTALE
> > > man2/open_by_handle_at.2:.B ESTALE
> > > man3/errno.3:.B ESTALE
> > >
> > > Cc'ing Michael Kerrisk for advice. Is there some reason for that, or
> > > can we fix those man pages?
> > >
> > > > Also, rm doesn't seem to handle ESTALE either [4].
> > > >
> > > > [4] https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305
> > >
> > > I *think* that code is just deciding whether a given error should be
> > > silently ignored in the rm -f case. I don't think -ESTALE (indicating
> > > the directory is bad) is such an error, so I think this code is correct.
> > > But my understanding may be wrong.
> >
> > For a local filesystem, we may not end up with ESTALE errors. But when rmdir
> > is executed from multiple clients of a network fs (like NFS or GlusterFS),
> > unlink or rmdir can easily fail with ESTALE, as another rm invocation
> > could've deleted the object. I think this is what has happened in bugs like:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1546717
> > https://bugzilla.redhat.com/show_bug.cgi?id=1245065
> >
> > This was in fact the earlier motivation for converting ESTALE into ENOENT, so
> > that rm would ignore it. Now that I've reverted the fix, the bug has
> > promptly resurfaced :)
> >
> > There is one glitch, though. Bug 1245065 mentions that some parts of the
> > directory structure remain undeleted. From my understanding, at least one
> > instance of rm (the one racing ahead of all the others and causing them to
> > fail) should've deleted the directory structure completely. I still need to
> > understand the directory traversal done by rm to find out whether there is a
> > cyclic dependency between two rms that causes both of them to fail.
>
> I don't see how you could avoid that. The clients are each caching
> multiple subdirectories of the tree, and there's no guarantee that one
> client has fresher caches of every subdirectory. There's also no
> guarantee that the client that's ahead stays ahead: another client that
> sees which objects the first client has already deleted can leapfrog
> ahead.

What are the drawbacks of applications (like rm) treating ESTALE as equivalent to ENOENT? It seems to me that, from the application's perspective, they both convey similar information. If rm could ignore ESTALE just as it ignores ENOENT, we probably wouldn't run into this issue.
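To make the suggestion above concrete, here is a minimal sketch in Python (rm itself is C, and this is not the coreutils remove.c logic; the function name is illustrative) of what treating ESTALE as equivalent to ENOENT would look like on the unlink path:

```python
# Minimal sketch of "ignore ESTALE the way rm -f ignores ENOENT";
# illustrative only, not the coreutils implementation.
import errno
import os

def remove_entry(path):
    """Unlink path, treating 'already gone' errors as success.

    On a network filesystem (NFS, GlusterFS), a racing client may have
    deleted path or its parent directory; that surfaces as ESTALE rather
    than ENOENT, but both mean there is nothing left for us to remove.
    """
    try:
        os.unlink(path)
    except OSError as err:
        if err.errno not in (errno.ENOENT, errno.ESTALE):
            raise
```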
> I think the solution is just not to do that--NFS clients aren't really
> equipped to handle directory operations on directories that are deleted
> out from under them, and there probably aren't any hacks on the server
> side that will fix that. If there's a real need for this kind of case,
> we may need to work on the protocol itself. For now, all we may be able
> to do is educate users about what NFS can and can't do.
>
> --b.

--
Raghavendra G
[Gluster-devel] Weekly Untriaged Bugs
[...truncated 6 lines...]

https://bugzilla.redhat.com/1541261 / build: "glustereventsd-SuSE.in" is missing in extras/init.d
https://bugzilla.redhat.com/1544851 / build: Redefinitions of IXDR_GET_LONG and IXDR_PUT_LONG when libtirpc is used
https://bugzilla.redhat.com/1547888 / core: [brick-mux] incorrect event-thread scaling in server_reconfigure()
https://bugzilla.redhat.com/1545142 / core: GlusterFS - Memory Leak during adding directories
https://bugzilla.redhat.com/1544090 / core: possible memleak in glusterfsd process with brick multiplexing on
https://bugzilla.redhat.com/1540882 / disperse: Do lock conflict check correctly for wait-list
https://bugzilla.redhat.com/1547127 / distribute: Typo error in __dht_check_free_space function log message
https://bugzilla.redhat.com/1543585 / fuse: Client Memory Usage Drastically Increased from 3.12 to 3.13 for Replicate 3 Volumes
https://bugzilla.redhat.com/1539657 / geo-replication: Georeplication tests intermittently fail
https://bugzilla.redhat.com/1542979 / geo-replication: glibc fix for CVE-2018-101 breaks geo-replication
https://bugzilla.redhat.com/1544638 / glusterd: 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on Ubuntu 14
https://bugzilla.redhat.com/1540249 / glusterd: Gluster is trying to use a port outside documentation and firewalld's glusterfs.xml
https://bugzilla.redhat.com/1546932 / glusterd: systemd units does not stop all gluster daemons
https://bugzilla.redhat.com/1540868 / glusterd: Volume start commit failed, Commit failed for operation Start on local node
https://bugzilla.redhat.com/1548517 / glusterd: write failed with EINVAL due O_DIRECT write buffer with unaligned size
https://bugzilla.redhat.com/1546645 / project-infrastructure: Ansible should setup correct selinux context for /archives
https://bugzilla.redhat.com/1545003 / project-infrastructure: Create a new list: automated-test...@gluster.org
https://bugzilla.redhat.com/1547791 / project-infrastructure: Jenkins API error
https://bugzilla.redhat.com/1544378 / project-infrastructure: mailman list moderation redirects from https to http
https://bugzilla.redhat.com/1545891 / project-infrastructure: Provide a automated way to update bugzilla status with patch merge.
https://bugzilla.redhat.com/1540478 / quota: Change quota option of many volumes concurrently, some commit operation failed.
https://bugzilla.redhat.com/1539680 / rdma: RDMA transport bricks crash
https://bugzilla.redhat.com/1544961 / rpc: libgfrpc does not export IPv6 RPC methods even with --with-ipv6-default
https://bugzilla.redhat.com/1546295 / rpc: Official packages don't default to IPv6
https://bugzilla.redhat.com/1542934 / rpc: Seeing timer errors in the rebalance logs
https://bugzilla.redhat.com/1542072 / scripts: Syntactical errors in hook scripts for managing SELinux context on bricks #2 (S10selinux-label-brick.sh + S10selinux-del-fcontext.sh)
https://bugzilla.redhat.com/1540759 / tiering: Failure to demote tiered volume file that is continuously modified by client during hot tier detachment.
https://bugzilla.redhat.com/1540376 / tiering: Tiered volume performance degrades badly after a volume stop/start or system restart.

[...truncated 2 lines...]