[Gluster-devel] namespace.t fails with brick multiplexing enabled

2018-02-25 Thread Atin Mukherjee
Hi Varsha,

Thanks for your first feature, "namespace", in GlusterFS! Since we run
periodic regression jobs with brick multiplexing enabled, we have seen that
tests/basic/namespace.t fails consistently in that mode. I went through the
function check_samples () in the test file, and it looks to me like it was
written with the assumption that every process is associated with exactly
one brick instance and has one log file, which is not the case with brick
multiplexing [1]. If you have any further questions about brick
multiplexing, feel free to ask.

[1] http://blog.gluster.org/brick-multiplexing-in-gluster-3-10/
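
For illustration only, a rough Python sketch of a multiplex-friendly check
(the real check_samples () lives in the shell test; the log directory and
the pattern handling below are assumptions, not the actual test code): it
searches every brick log instead of deriving exactly one log file per
brick, since with multiplexing several bricks share one glusterfsd process
and hence one log file.

    import glob
    import re

    def check_samples_mux(pattern, logdir="/var/log/glusterfs/bricks"):
        """Return True if *pattern* appears in any brick log under *logdir*."""
        rx = re.compile(pattern)
        for logfile in glob.glob(logdir + "/*.log"):
            with open(logfile, errors="replace") as f:
                if any(rx.search(line) for line in f):
                    return True
        return False
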
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Report ESTALE as ENOENT

2018-02-25 Thread Raghavendra G
On Fri, Feb 23, 2018 at 6:33 AM, J. Bruce Fields wrote:

> On Thu, Feb 22, 2018 at 01:17:58PM +0530, Raghavendra G wrote:
> > On Wed, Oct 11, 2017 at 7:32 PM, J. Bruce Fields wrote:
> >
> > > On Wed, Oct 11, 2017 at 04:11:51PM +0530, Raghavendra G wrote:
> > > > On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfie...@fieldses.org> wrote:
> > > >
> > > > > On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote:
> > > > > > I would prefer to:
> > > > > >
> > > > > > 1. Return ENOENT for all system calls that operate on a path.
> > > > > >
> > > > > > 2. ESTALE might be ok for file descriptor based operations.
> > > > >
> > > > > Note that operations which operate on paths can fail with ESTALE
> > > > > when they attempt to look up a component within a directory that
> > > > > no longer exists.
> > > > >
> > > >
> > > > But, "man 2 rmdir" or "man 2 unlink" doesn't list ESTALE as a valid
> > > > error.
> > >
> > > In fact, almost no man pages list ESTALE as a valid error:
> > >
> > > [bfields@patate man-pages]$ git grep ESTALE
> > > Changes.old:Change description for ESTALE
> > > man2/open_by_handle_at.2:.B ESTALE
> > > man2/open_by_handle_at.2:.B ESTALE
> > > man3/errno.3:.B ESTALE
> > >
> > > Cc'ing Michael Kerrisk for advice.  Is there some reason for that, or
> > > can we fix those man pages?
> > >
> > > > Also, rm doesn't seem to handle ESTALE either [3]
> > > >
> > > > [3] https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305
> > >
> > > I *think* that code is just deciding whether a given error should be
> > > silently ignored in the rm -f case.  I don't think -ESTALE (indicating
> > > the directory is bad) is such an error, so I think this code is
> > > correct.
> > > But my understanding may be wrong.
> > >
> >
> > For a local filesystem, we may not end up with ESTALE errors. But when
> > rmdir is executed from multiple clients of a network fs (like NFS,
> > Glusterfs), unlink or rmdir can easily fail with ESTALE, as the other rm
> > invocation could've deleted the entry. I think this is what has happened
> > in bugs like:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1546717
> > https://bugzilla.redhat.com/show_bug.cgi?id=1245065
> >
> > This, in fact, was the earlier motivation to convert ESTALE into ENOENT,
> > so that rm would ignore it. Now that I reverted that fix, it looks like
> > the bug has promptly resurfaced :)
> >
> > There is one glitch though. Bug 1245065 mentions that some parts of the
> > directory structure remain undeleted. From my understanding, at least
> > one instance of rm (the one racing ahead of all the others and causing
> > them to fail) should've deleted the directory structure completely.
> > Though I still need to understand the directory traversal done by rm to
> > find out whether there is a cyclic dependency between two rms that
> > causes both of them to fail.
>
> I don't see how you could avoid that.  The clients are each caching
> multiple subdirectories of the tree, and there's no guarantee that 1
> client has fresher caches of every subdirectory.  There's also no
> guarantee that the client that's ahead stays ahead--another client that
> sees which objects the first client has already deleted can leapfrog
> ahead.
>

What are the drawbacks of applications (like rm) treating ESTALE as
equivalent to ENOENT? It seems to me that, from the application's
perspective, they both convey similar information. If rm could ignore
ESTALE just as it does ENOENT, we probably wouldn't run into this issue.
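
For illustration only, here is a minimal Python sketch of the behaviour
being discussed (coreutils rm is written in C, and the function name below
is hypothetical): ESTALE is ignored in the same "already gone" way that
rm -f already ignores ENOENT. Whether hiding ESTALE like this is safe for
every path-based caller is exactly the open question in this thread.

    import errno
    import os

    def remove_ignoring_missing(path):
        """Unlink *path*, ignoring errors that mean it no longer exists."""
        try:
            os.unlink(path)
        except OSError as e:
            # ENOENT: the entry never existed or was already removed locally.
            # ESTALE: on a network fs, another client may have removed it
            # out from under us (the assumption being debated here).
            if e.errno not in (errno.ENOENT, errno.ESTALE):
                raise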


> I think the solution is just not to do that--NFS clients aren't really
> equipped to handle directory operations on directories that are deleted
> out from under them, and there probably aren't any hacks on the server
> side that will fix that.  If there's a real need for this kind of case,
> we may need to work on the protocol itself.  For now all we may be able
> to do is educate users about what NFS can and can't do.
>
> --b.



-- 
Raghavendra G

[Gluster-devel] Weekly Untriaged Bugs

2018-02-25 Thread jenkins
[...truncated 6 lines...]
https://bugzilla.redhat.com/1541261 / build: "glustereventsd-SuSE.in" is 
missing in extras/init.d
https://bugzilla.redhat.com/1544851 / build: Redefinitions of IXDR_GET_LONG and 
IXDR_PUT_LONG when libtirpc is used
https://bugzilla.redhat.com/1547888 / core: [brick-mux] incorrect event-thread 
scaling in server_reconfigure()
https://bugzilla.redhat.com/1545142 / core: GlusterFS - Memory Leak during 
adding directories
https://bugzilla.redhat.com/1544090 / core: possible memleak in glusterfsd 
process with brick multiplexing on
https://bugzilla.redhat.com/1540882 / disperse: Do lock conflict check 
correctly for wait-list
https://bugzilla.redhat.com/1547127 / distribute: Typo error in 
__dht_check_free_space function log message
https://bugzilla.redhat.com/1543585 / fuse: Client Memory Usage Drastically 
Increased from 3.12 to 3.13 for Replicate 3 Volumes
https://bugzilla.redhat.com/1539657 / geo-replication: Georeplication tests 
intermittently fail
https://bugzilla.redhat.com/1542979 / geo-replication: glibc fix for 
CVE-2018-101 breaks geo-replication
https://bugzilla.redhat.com/1544638 / glusterd: 3.8 -> 3.10 rolling upgrade 
fails (same for 3.12 or 3.13) on Ubuntu 14
https://bugzilla.redhat.com/1540249 / glusterd: Gluster is trying to use a port 
outside documentation and firewalld's glusterfs.xml
https://bugzilla.redhat.com/1546932 / glusterd: systemd units does not stop all 
gluster daemons
https://bugzilla.redhat.com/1540868 / glusterd: Volume start commit failed, 
Commit failed for operation Start on local node
https://bugzilla.redhat.com/1548517 / glusterd: write failed with EINVAL due 
O_DIRECT write buffer with unaligned size
https://bugzilla.redhat.com/1546645 / project-infrastructure: Ansible should 
setup correct selinux context for /archives
https://bugzilla.redhat.com/1545003 / project-infrastructure: Create a new 
list: automated-test...@gluster.org
https://bugzilla.redhat.com/1547791 / project-infrastructure: Jenkins API error
https://bugzilla.redhat.com/1544378 / project-infrastructure: mailman list 
moderation redirects from https to http
https://bugzilla.redhat.com/1545891 / project-infrastructure: Provide a 
automated way to update bugzilla status with patch merge.
https://bugzilla.redhat.com/1540478 / quota: Change quota option of many 
volumes concurrently, some commit operation failed.
https://bugzilla.redhat.com/1539680 / rdma: RDMA transport bricks crash
https://bugzilla.redhat.com/1544961 / rpc: libgfrpc does not export IPv6 RPC 
methods even with --with-ipv6-default
https://bugzilla.redhat.com/1546295 / rpc: Official packages don't default to 
IPv6
https://bugzilla.redhat.com/1542934 / rpc: Seeing timer errors in the rebalance 
logs
https://bugzilla.redhat.com/1542072 / scripts: Syntactical errors in hook 
scripts for managing SELinux context on bricks #2 (S10selinux-label-brick.sh + 
S10selinux-del-fcontext.sh)
https://bugzilla.redhat.com/1540759 / tiering: Failure to demote tiered volume 
file that is continuously modified by client during hot tier detachment.
https://bugzilla.redhat.com/1540376 / tiering: Tiered volume performance 
degrades badly after a volume stop/start or system restart.
[...truncated 2 lines...]
