Re: [Nfs-ganesha-devel] NFS Ganesha rm -rf randomly not working in Solaris.

2016-03-14 Thread Tushar Shinde
Hi Matt,

I am not able to reproduce this rm issue; I will get back to you
if I manage to repro it.


I have a query about commit d669eb56d19f1d722e1919d0f39e7055caa88ddd.

In that fix, if the flag is CACHE_INODE_FLAG_NONE, then
cache_inode_avl_lookup_k checks the cookie tree,
">object.dir.avl.c;"

Consider the following flow:
- The directory gets modified outside the current server's knowledge
(because of a cluster filesystem).
- In cache_inode_readdir -> cache_inode_lock_trust_attrs ->
cache_inode_invalidate_all_cached_dirent ->
cache_inode_release_dirents(entry, CACHE_INODE_AVL_BOTH); the cookie
tree entries get released.
- Later, cache_inode_readdir_populate rebuilds the 't' tree.

In this case the cookie tree was cleaned because of the "BOTH" flag.
But if the server had sent the client a cookie for an entry that was
then deleted (and that entry existed in the 'c' tree), we will return
a bad cookie without setting the following values:
*nbfound = 0;
*eod_met = true;

Do you think that, instead of checking the 'c' tree, we could check
the size of the 't' tree and, if that tree is empty, set the above
values? I tested with avltree_size('t') for the above case and it
works.
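
A minimal sketch of the check proposed above. This is not Ganesha code:
struct toy_dir, toy_seek_failed and the TOY_* status values are
hypothetical names, and the 't' tree is reduced to an entry count. It
only illustrates answering a failed cookie seek with zero entries and
end-of-directory when the 't' tree is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached directory. Only the number of
 * live dirents in the 't' tree is modelled. */
struct toy_dir {
	size_t t_size;
};

/* Hypothetical status codes, not the real cache_inode status values. */
enum toy_status { TOY_SUCCESS, TOY_BAD_COOKIE };

/* Called after a seek to the client's cookie found nothing.
 * If the 't' tree is empty, report an empty, finished listing;
 * otherwise leave the outputs untouched and report a bad cookie. */
static enum toy_status toy_seek_failed(const struct toy_dir *dir,
				       unsigned int *nbfound,
				       bool *eod_met)
{
	assert(dir != NULL && nbfound != NULL && eod_met != NULL);

	if (dir->t_size == 0) {
		*nbfound = 0;
		*eod_met = true;
		return TOY_SUCCESS;
	}
	return TOY_BAD_COOKIE;
}
```

With t_size == 0 the function returns TOY_SUCCESS and sets both
outputs; with any live entries it returns TOY_BAD_COOKIE and leaves
them as they were.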

Thank you,
Tushar.

On Thu, Mar 10, 2016 at 10:38 PM, Matt Benjamin <mbenja...@redhat.com> wrote:
> Hi,
>
> Sorry if I missed it, but do you have pcap trace(s) showing the protocol 
> operations when this reproduces?
>
> Thanks
>
> Matt
>
> ----- Original Message -----
>> From: "Tushar Shinde" <mtk.tus...@gmail.com>
>> To: "Tushar Shinde" <mtk.tus...@gmail.com>, 
>> nfs-ganesha-devel@lists.sourceforge.net
>> Sent: Thursday, March 10, 2016 8:51:56 AM
>> Subject: Re: [Nfs-ganesha-devel] NFS Ganesha rm -rf randomly not working in  
>>  Solaris.
>>
>> Hello Malahal,
>>
>> Thanks for the reply.
>> I tried that fix, but I am seeing the following messages from the
>> Solaris client. The issue occurs somewhat randomly.
>>
>> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
>> moved or linked to another directory during the execution of rm
>> rm: Unable to remove directory
>> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
>> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
>> moved or linked to another directory during the execution of rm
>> rm: Unable to remove directory
>> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
>> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
>> moved or linked to another directory during the execution of rm
>> rm: Unable to remove directory
>> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
>> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
>> moved or linked to another directory during the execution of rm
>>
>>
>> The filesystem is not updated by anything other than the Ganesha
>> server itself. This issue only occurs on Solaris; I have not seen it
>> on Linux.
>>
>> Thank you,
>> Tushar
>>
>>
>>
>> On Wed, Mar 9, 2016 at 3:41 AM, Malahal Naineni <mala...@us.ibm.com> wrote:
>> > Does the following fix the issue?
>> >
>> > commit d669eb56d19f1d722e1919d0f39e7055caa88ddd
>> > Author: Krishna Harathi <khara...@exablox.com>
>> > Date:   Fri Nov 6 00:03:58 2015 +
>> >
>> > nfsv3 - fix malformed packet response in readdir when zero entries are
>> > returned. Also in cache_inode_readdir.
>> >
>> > Change-Id: If9489308b863d8a819511fbd8cb0ed7f633e283f
>> > Signed-off-by: Krishna Harathi <khara...@exablox.com>
>> >
>> >
>> > Tushar Shinde [mtk.tus...@gmail.com] wrote:
>> >> Hi All,
>> >>
>> >> Env:
>> >> Server: NFS Ganesha V2.2-stable x86_64, on RHEL 6.5
>> >> Client: SunOS 5.10 Generic_150400-30 sun4v sparc
>> >> SUNW,SPARC-Enterprise-T5120
>> >> bash-3.2# cat /etc/release
>> >>Oracle Solaris 10 1/13 s10s_u11wos_24a SPARC
>> >>   Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights
>> >>   reserved.
>> >> Assembled 17 January 2013
>> >>
>> >>
>> >> Issue:
>> >> From the Solaris client, rm -rf fails randomly. Many times it
>> >> reports EEXIST, or that a directory was moved or linked during rm.
>> >>
>> >> I was debugging this issue and the following are my observations.
>> >> Please guide me to a patch (if already fixed) or any clue to sol

Re: [Nfs-ganesha-devel] NFS Ganesha rm -rf randomly not working in Solaris.

2016-03-10 Thread Matt Benjamin
Hi,

Sorry if I missed it, but do you have pcap trace(s) showing the protocol 
operations when this reproduces?

Thanks

Matt

----- Original Message -----
> From: "Tushar Shinde" <mtk.tus...@gmail.com>
> To: "Tushar Shinde" <mtk.tus...@gmail.com>, 
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Thursday, March 10, 2016 8:51:56 AM
> Subject: Re: [Nfs-ganesha-devel] NFS Ganesha rm -rf randomly not working in   
> Solaris.
> 
> Hello Malahal,
> 
> Thanks for the reply.
> I tried that fix, but I am seeing the following messages from the
> Solaris client. The issue occurs somewhat randomly.
> 
> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
> moved or linked to another directory during the execution of rm
> rm: Unable to remove directory
> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
> moved or linked to another directory during the execution of rm
> rm: Unable to remove directory
> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
> moved or linked to another directory during the execution of rm
> rm: Unable to remove directory
> /mnt/vdbench2/bigfileset/0003/0004: No such file or directory
> rm: WARNING: A subdirectory of /mnt/vdbench2/bigfileset/0003 was
> moved or linked to another directory during the execution of rm
> 
> 
> The filesystem is not updated by anything other than the Ganesha
> server itself. This issue only occurs on Solaris; I have not seen it
> on Linux.
> 
> Thank you,
> Tushar
> 
> 
> 
> On Wed, Mar 9, 2016 at 3:41 AM, Malahal Naineni <mala...@us.ibm.com> wrote:
> > Does the following fix the issue?
> >
> > commit d669eb56d19f1d722e1919d0f39e7055caa88ddd
> > Author: Krishna Harathi <khara...@exablox.com>
> > Date:   Fri Nov 6 00:03:58 2015 +
> >
> > nfsv3 - fix malformed packet response in readdir when zero entries are
> > returned. Also in cache_inode_readdir.
> >
> > Change-Id: If9489308b863d8a819511fbd8cb0ed7f633e283f
> > Signed-off-by: Krishna Harathi <khara...@exablox.com>
> >
> >
> > Tushar Shinde [mtk.tus...@gmail.com] wrote:
> >> Hi All,
> >>
> >> Env:
> >> Server: NFS Ganesha V2.2-stable x86_64, on RHEL 6.5
> >> Client: SunOS 5.10 Generic_150400-30 sun4v sparc
> >> SUNW,SPARC-Enterprise-T5120
> >> bash-3.2# cat /etc/release
> >>Oracle Solaris 10 1/13 s10s_u11wos_24a SPARC
> >>   Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights
> >>   reserved.
> >> Assembled 17 January 2013
> >>
> >>
> >> Issue:
> >> From the Solaris client, rm -rf fails randomly. Many times it
> >> reports EEXIST, or that a directory was moved or linked during rm.
> >>
> >> I was debugging this issue and the following are my observations.
> >> Please guide me to a patch (if already fixed) or any clue to solve
> >> this.
> >>
> >> After debugging, I found that the problem occurs when unlink is in
> >> progress in the same directory that is being read:
> >> while(ent = readdir()) {
> >> remove()
> >> }
> >>
> >> On Solaris, getdents returns ENOENT on the last call, whereas it
> >> should return 0. The client is sending a cookie that was deleted on
> >> the server by the last remove call. Since there are no files left,
> >> Ganesha finds no new node and prints errors like the following.
> >>
> >> On Client:
> >> getdents(4, 0x7E70, 8192)   Err#2 ENOENT
> >>
> >> On Server
> >> 2016-03-08 12:53:28 : ganesha.nfsd-30982[work-9] nfs3_readdir :F_DBG
> >> :---> nfs3_readdir: count=8192  cookie=18411418809641280358
> >> estimated_num_entries=120
> >> 2016-03-08 12:53:28 : ganesha.nfsd-30982[work-9] cache_inode_readdir
> >> :F_DBG :Enter
> >> 2016-03-08 12:53:28 : ganesha.nfsd-30982[work-9] cache_inode_access_sw
> >> :DEBUG :INODE: DEBUG: access_type=0X8401
> >> 2016-03-08 12:53:28 : ganesha.nfsd-30982[work-9]
> >> cache_inode_avl_lookup_k :DEBUG :found deleted supremum (nil)
> >> 2016-03-08 12:53:28 : ganesha.nfsd-30982[work-9] cache_inode_readdir
> >> :F_DBG :seek to cookie=18411418809641280358 fail
> >>
> >> The thing to note here is that cookie 18411418809641280358 is no
> >> longer alive, and hence the lookup returns nil.
> >>
> >> I tried sending CACHE_INODE_SUCCESS with 0 entries instead of
> >> CACHE