Re: [Nfs-ganesha-devel] libntirpc not available anymore?

2019-03-04 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

Very strange. You can add my remote (https://github.com/malahal/ntirpc.git)
and fetch it in src/libntirpc. It should have the latest commit needed, so
"git submodule update --init --recursive" should pass as long as you fetch
the remote.
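
In concrete terms, the workaround above might look like this from the top of a
ganesha checkout (the remote name "malahal" is arbitrary; the URL is the one
given above):

```shell
# Fetch the pinned commit from the alternate remote, then retry the
# submodule update from the superproject.
cd src/libntirpc
git remote add malahal https://github.com/malahal/ntirpc.git
git fetch malahal
cd ../..
git submodule update --init --recursive
```

If src/libntirpc was never cloned at all, `git remote add` will fail there; in
that case cloning the fork into src/libntirpc first (e.g. `git clone
https://github.com/malahal/ntirpc.git src/libntirpc`) should let the submodule
update find the commit.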

Regards, Malahal.

On Mon, Mar 4, 2019 at 1:07 AM Sriram Patil via Nfs-ganesha-devel <
nfs-ganesha-devel@lists.sourceforge.net> wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
>
> Hi,
>
>
>
> I pulled in the latest ganesha code and tried to pull ntirpc with "git
> submodule update --init --recursive". It is throwing the following error:
>
>
>
> Cloning into 'src/libntirpc'...
>
> Username for 'https://github.com': srirampatil
>
> Password for 'https://srirampa...@github.com':
>
> remote: Repository not found.
>
> fatal: repository 'https://github.com/nfs-ganesha/ntirpc.git/' not found
>
> fatal: clone of 'https://github.com/nfs-ganesha/ntirpc.git' into
> submodule path 'src/libntirpc' failed
>
>
>
>
>
> The libntirpc repo does not exist anymore in nfs-ganesha account on github.
>
>
>
> Thanks,
>
> Sriram
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Ganesha 2.6.3 Segfault

2018-10-01 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

I did notice some issues with rpc.statd (hangs) with some versions.
Unfortunately, Ganesha also uses rpc.statd for NFSv3 locking (in fact, I
saw rpc.statd hangs with Ganesha). If you really want to get away from the
rpc.statd issues you are having with kNFS, I would suggest NFSv4 mounts. If
you are unable to resolve your Ganesha issue, one option is to try NFSv4
with kNFS.

Regards, Malahal.

On Mon, Oct 1, 2018 at 10:03 PM David C  wrote:

> Hi All
>
> Thanks for the suggestions, I'll have a go and report back. Sharing some
> more info that may or may not be relevant:
>
> The clients I'm accessing Ganesha with were previously accessing a Kernel
> NFS server running on CentOS. I was experiencing frequent issues where
> lockd on the server would go into uninterruptible sleep and I needed to
> stop the nfslock service, clear out /var/lib/nfs/statd dirs and start
> nfslock again to get things working.
>
> Now that these clients are accessing the nfs-ganesha server, I'm seeing
> similar behaviour, the clients were showing "lockd: server *ipaddr *not
> responding" and I had to restart nfs-ganesha to resolve. I don't know if
> these crashes are related to that in any way?
>
> The other thing to note is some of the exports are on a cephfs mount but
> I'm using the VFS FSAL, not the CEPH FSAL.
>
> When it is working, performance seems good, and the crashes don't appear
> to happen during periods of high I/O.
>
> Thanks,
>
>
>
> On Mon, Oct 1, 2018 at 4:30 PM Malahal Naineni  wrote:
>
>> This list has been deprecated. Please subscribe to the new devel list at
>> lists.nfs-ganesha.org.
>> David, another option is to test with Ganesha 2.7, as you are able to
>> recreate easily with V2.6.3.
>>
>> On Mon, Oct 1, 2018 at 7:49 PM Daniel Gryniewicz  wrote:
>>
>>> This list has been deprecated. Please subscribe to the new devel list at
>>> lists.nfs-ganesha.org.
>>>
>>> I'm not seeing any easy way that cmpf could be corrupted.  The structure
>>> before it is fairly complex, with its last element being an integer, so
>>> it's unlikely that something wrote off the end of that.  That leaves a
>>> random memory corruption, which is almost impossible to detect.
>>>
>>> David, can you rebuild your Ganesha?  If so, can you build with the
>>> Address Sanitizer on?  To do this, install libasan on your distro, and
>>> then pass -DSANITIZE_ADDRESS=ON to cmake.  With ASAN enabled, you may
>>> get a crash at the time of corruption, rather than at some future point.
>>>
>>> Daniel
>>>
>>> On 10/01/2018 09:20 AM, Malahal Naineni wrote:
>>> > This list has been deprecated. Please subscribe to the new devel list
>>> at lists.nfs-ganesha.org.
>>> >
>>> >
>>> >
>>> > Looking at the code, head->cmpf should be the "clnt_req_xid_cmpf"
>>> > function address. Your gdb didn't show that, and I don't know how that
>>> > could happen with the V2.6.3 code. @Dan, any insights for this issue?
>>> >
>>> > On Mon, Oct 1, 2018 at 2:22 PM David C >> > <mailto:dcsysengin...@gmail.com>> wrote:
>>> >
>>> > Hi Malahal
>>> >
>>> > Result of that command:
>>> >
>>> > (gdb) p head->cmpf
>>> > $1 = (opr_rbtree_cmpf_t) 0x31fb0b405ba000b7
>>> >
>>> > Thanks,
>>> >
>>> > On Mon, Oct 1, 2018 at 5:55 AM Malahal Naineni >> > <mailto:mala...@gmail.com>> wrote:
>>> >
>>> > Looks like the head is messed up. Run these in gdb and let us
>>> > know the second command's output. 1. "frame 0"   2.
>>> > "p head->cmpf".  I believe, head->cmpf function is NULL or bad
>>> > leading to this segfault. I haven't seen this crash before and
>>> > never used Ganesha 2.6 version.
>>> >
>>> > Regards, Malahal.
>>> >
>>> > On Mon, Oct 1, 2018 at 1:25 AM David C <
>>> dcsysengin...@gmail.com
>>> > <mailto:dcsysengin...@gmail.com>> wrote:
>>> >
>>> > Hi Malahal
>>> >
>>> > I've set up ABRT so I'm now getting coredumps for the
>>> > crashes. I've installed debuginfo package for nfs-ganesha
>>> > and libntirpc.
>>> >
>>

Re: [Nfs-ganesha-devel] Ganesha 2.6.3 Segfault

2018-10-01 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

David, another option is to test with Ganesha 2.7, as you are able to
recreate easily with V2.6.3.

On Mon, Oct 1, 2018 at 7:49 PM Daniel Gryniewicz  wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
>
> I'm not seeing any easy way that cmpf could be corrupted.  The structure
> before it is fairly complex, with its last element being an integer, so
> it's unlikely that something wrote off the end of that.  That leaves a
> random memory corruption, which is almost impossible to detect.
>
> David, can you rebuild your Ganesha?  If so, can you build with the
> Address Sanitizer on?  To do this, install libasan on your distro, and
> then pass -DSANITIZE_ADDRESS=ON to cmake.  With ASAN enabled, you may
> get a crash at the time of corruption, rather than at some future point.
>
> Daniel
>
> On 10/01/2018 09:20 AM, Malahal Naineni wrote:
> > This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> >
> >
> >
> > Looking at the code, head->cmpf should be the "clnt_req_xid_cmpf"
> > function address. Your gdb didn't show that, and I don't know how that
> > could happen with the V2.6.3 code. @Dan, any insights for this issue?
> >
> > On Mon, Oct 1, 2018 at 2:22 PM David C  > <mailto:dcsysengin...@gmail.com>> wrote:
> >
> > Hi Malahal
> >
> > Result of that command:
> >
> > (gdb) p head->cmpf
> > $1 = (opr_rbtree_cmpf_t) 0x31fb0b405ba000b7
> >
> > Thanks,
> >
> > On Mon, Oct 1, 2018 at 5:55 AM Malahal Naineni  > <mailto:mala...@gmail.com>> wrote:
> >
> > Looks like the head is messed up. Run these in gdb and let us
> > know the second command's output. 1. "frame 0"   2.
> > "p head->cmpf".  I believe, head->cmpf function is NULL or bad
> > leading to this segfault. I haven't seen this crash before and
> > never used Ganesha 2.6 version.
> >
> > Regards, Malahal.
> >
> > On Mon, Oct 1, 2018 at 1:25 AM David C  > <mailto:dcsysengin...@gmail.com>> wrote:
> >
> > Hi Malahal
> >
> > I've set up ABRT so I'm now getting coredumps for the
> > crashes. I've installed debuginfo package for nfs-ganesha
> > and libntirpc.
> >
> > I'd be really grateful if you could give me some guidance on
> > debugging this.
> >
> > Some info on the latest crash:
> >
> > The following was echoed to the kernel log:
> >
> > traps: ganesha.nfsd[28589] general protection
> > ip:7fcf2421dded sp:7fcd9d4d03a0 error:0 in
> > libntirpc.so.1.6.3[7fcf2420d000+3d000]
> >
> >
> > Last lines of output from # gdb /usr/bin/ganesha.nfsd
> coredump:
> >
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `/usr/bin/ganesha.nfsd -L
> > /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x7fcf2421dded in opr_rbtree_insert
> > (head=head@entry=0x7fcef800c528,
> > node=node@entry=0x7fce68004750) at
> > /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
> > 271 switch (head->cmpf(node, parent)) {
> > Missing separate debuginfos, use: debuginfo-install
> > bzip2-libs-1.0.6-13.el7.x86_64
> > dbus-libs-1.10.24-7.el7.x86_64
> > elfutils-libelf-0.170-4.el7.x86_64
> > elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
> > gssproxy-0.7.0-17.el7.x86_64
> > keyutils-libs-1.5.8-3.el7.x86_64
> > krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64
> > libblkid-2.23.2-52.el7.x86_64 libcap-2.22-9.el7.x86_64
> > libcom_err-1.42.9-12.el7_5.x86_64
> > libgcc-4.8.5-28.el7_5.1.x86_64 libgcrypt-1.5.3-14.el7.x86_64
> > libgpg-error-1.12-3.el7.x86_64
> > libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-12.el7.x86_64
> > libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
> > 

Re: [Nfs-ganesha-devel] Ganesha 2.6.3 Segfault

2018-10-01 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

Looking at the code, head->cmpf should be the "clnt_req_xid_cmpf" function
address. Your gdb didn't show that, and I don't know how that could happen
with the V2.6.3 code. @Dan, any insights for this issue?

On Mon, Oct 1, 2018 at 2:22 PM David C  wrote:

> Hi Malahal
>
> Result of that command:
>
> (gdb) p head->cmpf
> $1 = (opr_rbtree_cmpf_t) 0x31fb0b405ba000b7
>
> Thanks,
>
> On Mon, Oct 1, 2018 at 5:55 AM Malahal Naineni  wrote:
>
>> Looks like the head is messed up. Run these in gdb and let us know the
>> second command's output. 1. "frame 0"   2. "p head->cmpf".  I believe,
>> head->cmpf function is NULL or bad leading to this segfault. I haven't seen
>> this crash before and never used Ganesha 2.6 version.
>>
>> Regards, Malahal.
>>
>> On Mon, Oct 1, 2018 at 1:25 AM David C  wrote:
>>
>>> Hi Malahal
>>>
>>> I've set up ABRT so I'm now getting coredumps for the crashes. I've
>>> installed debuginfo package for nfs-ganesha and libntirpc.
>>>
>>> I'd be really grateful if you could give me some guidance on debugging
>>> this.
>>>
>>> Some info on the latest crash:
>>>
>>> The following was echoed to the kernel log:
>>>
>>> traps: ganesha.nfsd[28589] general protection ip:7fcf2421dded
>>>> sp:7fcd9d4d03a0 error:0 in libntirpc.so.1.6.3[7fcf2420d000+3d000]
>>>>
>>>
>>> Last lines of output from # gdb /usr/bin/ganesha.nfsd coredump:
>>>
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `/usr/bin/ganesha.nfsd -L
>>> /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x7fcf2421dded in opr_rbtree_insert (head=head@entry=0x7fcef800c528,
>>> node=node@entry=0x7fce68004750) at
>>> /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
>>> 271 switch (head->cmpf(node, parent)) {
>>> Missing separate debuginfos, use: debuginfo-install
>>> bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
>>> elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64
>>> glibc-2.17-222.el7.x86_64 gssproxy-0.7.0-17.el7.x86_64
>>> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64
>>> libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-52.el7.x86_64
>>> libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>>> libgcc-4.8.5-28.el7_5.1.x86_64 libgcrypt-1.5.3-14.el7.x86_64
>>> libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64
>>> libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7.x86_64
>>> lz4-1.7.5-2.el7.x86_64 pcre-8.32-17.el7.x86_64
>>> systemd-libs-219-57.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
>>> zlib-1.2.7-17.el7.x86_64
>>>
>>> Output from bt:
>>>
>>> (gdb) bt
>>> #0  0x7fcf2421dded in opr_rbtree_insert (head=head@entry=0x7fcef800c528,
>>> node=node@entry=0x7fce68004750) at
>>> /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
>>> #1  0x7fcf24218eac in clnt_req_setup (cc=cc@entry=0x7fce68004720,
>>> timeout=...) at /usr/src/debug/ntirpc-1.6.3/src/clnt_generic.c:515
>>> #2  0x55d62490347f in nsm_unmonitor (host=host@entry=0x7fce00018ea0)
>>> at /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/nsm.c:219
>>> #3  0x55d6249425cf in dec_nsm_client_ref (client=0x7fce00018ea0) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:857
>>> #4  0x55d624942f61 in free_nlm_client (client=0x7fce00017500) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1039
>>> #5  0x55d6249431d3 in dec_nlm_client_ref (client=0x7fce00017500) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1130
>>> #6  0x55d6249439ae in free_nlm_owner (owner=owner@entry=0x7fce00024bc0)
>>> at /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1314
>>> #7  0x55d624929a48 in free_state_owner (owner=0x7fce00024bc0) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:818
>>> #8  0x55d624929dc0 in dec_state_owner_ref (owner=0x7fce00024bc0) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:968
>>> #9  0x55d6248ff173 in nlm4_Unlock (args=0x7fce68003b98,
>>> req=0x7fce68003490, res=0x7fce68000d70) at
>>> /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/n

Re: [Nfs-ganesha-devel] Ganesha 2.6.3 Segfault

2018-09-30 Thread Malahal Naineni
.6
>
> Thanks for your assistance so far on this
> David
>
>
>
>
>
>
>
>
> On Fri, Sep 28, 2018 at 8:06 PM David C  wrote:
>
>> Thanks, Malahal. I'll get the coredumps enabled. I've had a few more
>> crashes today, hopefully they'll shed some light on the issue.
>>
>> On Fri, Sep 28, 2018 at 1:20 PM Malahal Naineni 
>> wrote:
>>
>>> You need to enable coredumps for ganesha. Here are some instructions!
>>> Step2 is NOT needed as your packages are signed:
>>>
>>> https://ganltc.github.io/setup-to-take-ganesha-coredumps.html
>>>
>>> On Fri, Sep 28, 2018 at 4:38 PM David C  wrote:
>>>
>>>> This list has been deprecated. Please subscribe to the new devel list
>>>> at lists.nfs-ganesha.org.
>>>> Hi All
>>>>
>>>> CentOS 7.5
>>>> nfs-ganesha-2.6.3-1.el7.x86_64
>>>> nfs-ganesha-vfs-2.6.3-1.el7.x86_64
>>>> libntirpc-1.6.3-1.el7.x86_64
>>>>
>>>> My Ganesha service crashed and the following was echoed to my kernel
>>>> log:
>>>>
>>>> ganesha.nfsd[28752]: segfault at 0 ip   (null) sp
>>>>> 7ff9a2af8458 error 14 in ganesha.nfsd[559170ef3000+1a4000]
>>>>>
>>>>
>>>> Nothing in my ganesha.log
>>>>
>>>> These are the log settings from my ganesha.conf:
>>>>
>>>> LOG {
>>>>> ## Default log level for all components
>>>>> Default_Log_Level = DEBUG;
>>>>>
>>>>> ## Configure per-component log levels.
>>>>> #Components {
>>>>> #FSAL = INFO;
>>>>> #NFS4 = EVENT;
>>>>> #}
>>>>>
>>>>> ## Where to log
>>>>> Facility {
>>>>> name = FILE;
>>>>> destination = "/var/log/ganesha.log";
>>>>> enable = active;
>>>>> }
>>>>> }
>>>>>
>>>>
>>>> This is an example of one of my exports (they're all Nfsv3 with VFS
>>>> FSAL):
>>>>
>>>> EXPORT
>>>>> {
>>>>> Export_Id = 80;
>>>>> Path = /mnt/dir;
>>>>> Pseudo = /mnt/dir;
>>>>> Access_Type = RW;
>>>>> Protocols = 3;
>>>>> Transports = TCP;
>>>>> Squash = no_root_squash;
>>>>> Disable_ACL=False;
>>>>> Filesystem_Id = 101.1;
>>>>> CLIENT {
>>>>>Clients = *;
>>>>>Squash = None;
>>>>>Access_Type = RW;
>>>>> }
>>>>> FSAL {
>>>>>   Name = VFS;
>>>>>  }
>>>>> }
>>>>>
>>>>>
>>>> The exports are mounted on CentOS 7.4 clients with autofs-5.0.7 and
>>>> nfs-utils-1.3.0-0.48.el7_4.x86_64
>>>>
>>>> This crash occurred approx 2 hours after I increased the number of
>>>> clients accessing the server by approx five clients; I don't know if
>>>> that's related.
>>>>
>>>> Could someone help me troubleshoot this please?
>>>>
>>>> Many thanks
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ___
>>>> Nfs-ganesha-devel mailing list
>>>> Nfs-ganesha-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>>>
>>>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Ganesha 2.6.3 Segfault

2018-09-28 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

You need to enable coredumps for ganesha. Here are some instructions! Step 2
is NOT needed as your packages are signed:

https://ganltc.github.io/setup-to-take-ganesha-coredumps.html
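
Independent of the linked page, the generic systemd approach is roughly as
follows (the unit name and core path are assumptions; the instructions above
are the authoritative steps for ganesha):

```shell
# Allow the service to dump core (drop-in override for the unit):
mkdir -p /etc/systemd/system/nfs-ganesha.service.d
printf '[Service]\nLimitCORE=infinity\n' \
    > /etc/systemd/system/nfs-ganesha.service.d/core.conf
systemctl daemon-reload && systemctl restart nfs-ganesha
# Point the kernel at a writable location for core files:
echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
```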

On Fri, Sep 28, 2018 at 4:38 PM David C  wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> Hi All
>
> CentOS 7.5
> nfs-ganesha-2.6.3-1.el7.x86_64
> nfs-ganesha-vfs-2.6.3-1.el7.x86_64
> libntirpc-1.6.3-1.el7.x86_64
>
> My Ganesha service crashed and the following was echoed to my kernel log:
>
> ganesha.nfsd[28752]: segfault at 0 ip   (null) sp 7ff9a2af8458
>> error 14 in ganesha.nfsd[559170ef3000+1a4000]
>>
>
> Nothing in my ganesha.log
>
> These are the log settings from my ganesha.conf:
>
> LOG {
>> ## Default log level for all components
>> Default_Log_Level = DEBUG;
>>
>> ## Configure per-component log levels.
>> #Components {
>> #FSAL = INFO;
>> #NFS4 = EVENT;
>> #}
>>
>> ## Where to log
>> Facility {
>> name = FILE;
>> destination = "/var/log/ganesha.log";
>> enable = active;
>> }
>> }
>>
>
> This is an example of one of my exports (they're all Nfsv3 with VFS FSAL):
>
> EXPORT
>> {
>> Export_Id = 80;
>> Path = /mnt/dir;
>> Pseudo = /mnt/dir;
>> Access_Type = RW;
>> Protocols = 3;
>> Transports = TCP;
>> Squash = no_root_squash;
>> Disable_ACL=False;
>> Filesystem_Id = 101.1;
>> CLIENT {
>>Clients = *;
>>Squash = None;
>>Access_Type = RW;
>> }
>> FSAL {
>>   Name = VFS;
>>  }
>> }
>>
>>
> The exports are mounted on CentOS 7.4 clients with autofs-5.0.7 and
> nfs-utils-1.3.0-0.48.el7_4.x86_64
>
> This crash occurred approx 2 hours after I increased the number of
> clients accessing the server by approx five clients; I don't know if
> that's related.
>
> Could someone help me troubleshoot this please?
>
> Many thanks
> David
>
>
>
>
>
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] 2.6.3 Health status is unhealthy

2018-09-25 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

The health check is sometimes buggy, so we set "heartbeat_freq = 0;" in our
configs! We should fix this.
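
As a config fragment, that workaround might look like the following
(NFS_CORE_PARAM as the enclosing block is my assumption; verify against your
ganesha.conf documentation):

```
NFS_CORE_PARAM {
    # 0 disables the DBus heartbeat, silencing the spurious
    # "Health status is unhealthy" warnings discussed here.
    heartbeat_freq = 0;
}
```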

On Tue, Sep 25, 2018 at 5:47 PM, Daniel Gryniewicz  wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> No, but the health check happens every 5 seconds, and the message is not
> rate limited, so if it doesn't happen again in 5 seconds, then the health
> issue has cleared.
>
> Daniel
>
> On 09/24/2018 03:36 PM, David C wrote:
>
>> Hi Daniel
>>
>> Thanks for the response.
>>
>> It's only happened once since the server was started at midnight. There
>> would have been very little activity in that time so that would seem to
>> support your theory.
>>
>> I'll monitor it to see if it reoccurs.
>>
>> Is there anything I need to do to clear the unhealthy status? Would you
>> expect there to be a message to say the server has returned to a healthy
>> state?
>>
>> Thanks
>> David
>>
>>
>> On Mon, 24 Sep 2018, 18:09 Daniel Gryniewicz, > d...@redhat.com>> wrote:
>>
>> This list has been deprecated. Please subscribe to the new devel
>> list at lists.nfs-ganesha.org .
>> I think this is due to the low traffic.  What that check says is
>> that we
>> got new ops enqueued (1, in this case) but no ops dequeued.  However,
>> since there was only 1 op enqueued, I suspect that the issue is that
>> no
>> ops came in during the sampling period, except for one right at the
>> end,
>> which hasn't been handled yet.
>>
>> Does this message keep occurring?  Or does it happen only once?
>>
>> Daniel
>>
>> On 09/24/2018 12:28 PM, David C wrote:
>>  > This list has been deprecated. Please subscribe to the new devel
>> list at lists.nfs-ganesha.org .
>>  >
>>  >
>>  >
>>  > Hi All
>>  >
>>  > CentOS 7.5
>>  > nfs-ganesha-vfs-2.6.3-1.el7.x86_64
>>  > nfs-ganesha-2.6.3-1.el7.x86_64
>>  > libntirpc-1.6.3-1.el7.x86_64
>>  >
>>  > Exporting some directories with VFS FSAL
>>  >
>>  > Nfsv3 only, currently very light traffic (a few clients
>> connecting).
>>  >
>>  > After starting Ganesha the following was logged after about 12
>> hours:
>>  >
>>  > 24/09/2018 12:11:00 : epoch 5ba8165e : fsrv01:
>>  > ganesha.nfsd-22835[dbus_heartbeat] nfs_health :DBUS :WARN
>> :Health
>>  > status is unhealthy. enq new: 11925, old: 11924; deq new:
>> 11924,
>>  > old: 11924
>>  >
>>  >
>>  > Nfs access still seems fine from the clients.
>>  >
>>  > Could someone point me in the direction of how to diagnose this
>> message
>>  > please?
>>  >
>>  > Thanks,
>>  > David
>>  >
>>  >
>>  >
>>  >
>>  > ___
>>  > Nfs-ganesha-devel mailing list
>>  > Nfs-ganesha-devel@lists.sourceforge.net
>> 
>>  > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>  >
>>
>>
>>
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> 
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>
>>
>
>
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nfs4 idmapping issue with sssd fully qualified domain names

2018-08-09 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

nfs4_gss_princ_to_ids() should have succeeded if you have set up your
system correctly. We use "winbind" with AD.
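
Frank's suggestion further down (try the domain in capitals) would make the
reporter's idmapd.conf look like this (values taken from the message below;
whether case actually matters here is the open question):

```
[General]
Domain = AD.DOMAIN.COM

[Translation]
Method = nsswitch
```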

On Wed, Aug 8, 2018 at 11:06 PM, Frank Filz  wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> On 08/08/2018 01:55 AM, Sri Krishnachowdary kankanala wrote:
>
>> This list has been deprecated. Please subscribe to the new devel list at
>> lists.nfs-ganesha.org.
>>
>>
>> Hi,
>>
>> Can someone please reply to this.
>>
>> Thanks,
>> Sri Krishna
>>
>> On Thu, Aug 2, 2018 at 1:18 PM, Sri Krishnachowdary kankanala <
>> kankanalaki...@gmail.com > wrote:
>>
>> Hi,
>>
>> I have AD server configured on windows 2012 server. I joined
>> centos node to AD using sssd. I configured sssd with fully
>> qualified domain names for users.
>> I mounted the nfs4 ganesha's export using krb5.
>>
>> I create a file from client node logged in as us...@ad.domain.com
>>  but when I do "ls -I" I see below
>> entries where as I expect the owner to be us...@ad.domain.com
>> 
>>
>> -rw-r--r-- 1 4294967294 4294967294 0 Aug  1 23:12 file1
>>
>>
>> I see the below error in ganesha logs:
>>
>>
>> nfs_req_creds :Could not map principal us...@ad.domain.com
>>  to uid
>>
>>
>> I further went ahead and used nfs4_set_debug() to get more logs
>> and found the below in ganesha logs when principal2uid() is called:
>>
>> nfs4_gss_princ_to_ids: calling nsswitch->princ_to_ids
>>
>> nss_getpwnam: name 'us...@ad.domain.com
>> ' domain '(null)': resulting localname
>> 'user1'
>>
>> nfs4_gss_princ_to_ids: nsswitch->princ_to_ids returned -2
>>
>> nfs4_gss_princ_to_ids: final return value is -2
>>
>>
>>
>> Relevant entries in my idmap.conf:
>>
>>[General]
>>
>> Domain = ad.domain.com 
>>
>>
>> [Translation]
>>
>> Method = nsswitch
>>
>>
>>
> Did you try putting your domain in all caps in your idmap.conf?
>
> The same setup works if I disable fully qualified domain names
>> from sssd.
>>
>> Is there a way to use other methods like umich_ldap and get Fully
>> qualified AD domain  running with nfs4 ganesha?
>> Can you please list the steps I need to follow on order to do that?
>>
>>
> I'm not personally familiar with using AD.
>
> Frank
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] FD reuse in NFSv3

2018-05-07 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

>> NFS client is a legacy implementation and it is not going to be fixed.

How is it going to work if there is a real network failure? In any case,
the behavior you are describing should never happen. Do you have this bug
in your FSAL?

commit c11eb9e421ac90fb788144dc1144888e6f20f7f0
Author: Malahal Naineni <mala...@us.ibm.com>
Date:   Sun Dec 3 23:59:58 2017 +0530

Prevent OPEN upgrade closing an "fd" while using it.



On Mon, May 7, 2018 at 5:18 PM, You Me <yourindian...@gmail.com> wrote:

> This list has been deprecated. Please subscribe to the new devel list at
> lists.nfs-ganesha.org.
> My NFS client does not retry I/O.
> Ganesha is reusing FD instead of opening file for every read request.
> Sometimes it closes the fd even when there is a worker waiting still in
> read() call on the same fd. As a result that worker returns read error and
> ganesha drops the RPC. NFS client does not get any reply and it eventually
> times out the I/O.
>
> I see 3 solutions.
>
> 1. NFS client implement I/O retry
> 2. Ganesha open the file and try read again when file not opened error is
> returned by FSAL.
> 3. Ganesha open the file for every read RPC instead of reusing FD.
>
> NFS client is a legacy implementation and it is not going to be fixed.
> Is there a way to achieve solutions 2 or 3?
>
> --Satish
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Byte range lock length and overflow

2018-04-25 Thread Malahal Naineni
This list has been deprecated. Please subscribe to the new devel list at
lists.nfs-ganesha.org.

knfsd seems to send EINVAL, based on a quick look at the code. Treating it
as end of file (0) may work, but we will have to check that all cthon tests
pass!

On Wed, Apr 25, 2018 at 4:26 AM, Frank Filz  wrote:

> In NFS v4 and NLM, the lock length type is uint64_t. In many of our
> supported filesystems, lock length is defined as off_t or off64_t which are
> signed quantities.
>
>
>
> The Windows NFS v3 client at least has demonstrated issuing a lock length
> of UINT64_MAX.
>
>
>
> FSAL_GPFS returns an error if lock length is greater than LONG_MAX.
>
>
>
> FSAL_GLUSTER and FSAL_VFS stuff the value into an off64_t, and then
> complains about lock length being less than zero.
>
>
>
> I’m not sure about the other FSALs.
>
>
>
> Returning an error ends up making Windows applications not work which is
> not ideal.
>
>
>
> I think it might be best if inside the FSAL, we just quietly reduce lock
> length to a valid value, though I don’t know if that should be:
>
>
>
> 0 – indicating end of file
>
> INT64_MAX – largest possible length
>
> INT64_MAX – offset – so end of lock doesn’t exceed INT64_MAX
>
> Does anyone have any thoughts?
>
>
>
> Frank
>
> ___
> Devel mailing list -- de...@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-le...@lists.nfs-ganesha.org
>
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] ganesha 2.5 - specifying network ID for allowing specific clients

2018-03-29 Thread Malahal Naineni
The CLIENT subblock should be the last one; keep the other settings at the
very beginning. I am not sure whether that is your problem, but some parts
of the code assume that CLIENT/FSAL blocks are last in the list.
Regards, Malahal.

On Wed, Mar 28, 2018 at 11:46 PM, You Me  wrote:

> EXPORT {
> Export_Id = 26541;
> Path = "/cloud_client3";
> CLIENT {
> Clients = 172.19.109.0/22;
> Access_type = RW;
> }
> CLIENT {
> Clients = 2.2.2.2;
> Access_type = RW;
> }
> CLIENT {
> Clients = 172.19.19.23;
> Access_type = RW;
> }
> Disable_ACL = TRUE;
> Anonymous_uid = 4294967294;
> Anonymous_gid = 4294967294;
> Squash = no_root_squash;
> Pseudo = "/cloud_client3";
> SecType = "sys";
> Protocols = 4;
> }
>
> I am not able to mount the share from NFS client with IP address
> 172.19.109.44.
>
> [root@express ~]#  mount 172.24.25.245:/cloud_client3 /mnt/cloud3
> mount.nfs: access denied by server while mounting 172.24.25.245:
> /cloud_client3
> [root@express ~]# ifconfig
> em1: flags=4163  mtu 1500
> inet 172.19.109.44  netmask 255.255.252.0  broadcast 172.19.111.255
>
> ganesha.log
> 
> 28/03/2018 14:04:41 : epoch 5aba57da : vmclient10 :
> ganesha.nfsd-31628[work-13] nfs4_export_check_access :NFS4 :INFO :NFS4:
> INFO: Access not allowed on Export_Id 26541 /cloud_client3 for client
> :::172.19.
>
> Am I configuring the share correctly?
> -
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nfstest_delegation

2018-03-28 Thread Malahal Naineni
Yes, it should be on by default if the code is stable!

regards, malahal.

On Wed, Mar 28, 2018 at 3:49 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> I see that Patrice hasn't posted here about this problem yet.
>
> Linux client folks say our V2.7-dev delegations aren't working.
>
> At this week's bake-a-thon, Patrice has tried turning it on a
> couple of different ways.  Shouldn't delegations be on by default?
>
> Could we get the nfstest suite added to CI?
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nss_getpwnam: name 't...@my.dom@localdomain' does not map into domain 'nix.my.dom'

2018-03-22 Thread Malahal Naineni
That could be a reason why I thought we need two symbols for a feature. For
example, USE_GPFS_FSAL could be used at the cmake command line and GPFS_FSAL
could be used in the option().

Can't we use this option() inside conditionals?

regards, malahal.

On Thu, Mar 22, 2018 at 8:43 PM, Daniel Gryniewicz <d...@redhat.com> wrote:

> I don't think this works because of option().  This defines the value to
> it's default (OFF if no default is given), so the value is always defined.
> We can skip using option, but this will break anyone using any tools to
> automate their cmake.
>
> What we need is for option() to distinguish between using the default
> value and having it configured on.
>
> I can play with this a bit and see if I can get something to work, but it
> will be ugly, since cmake doesn't natively support this.
>
> Daniel
>
> On 03/22/2018 10:50 AM, Malahal Naineni wrote:
>
>> Here is what I wanted. Let us say there is a compilation feature called
>> USE_FSAL_GPFS. I want these possibilities:
>>
>> 1. If I enable this feature at the "cmake command line", enable this. If
>> it can't be enabled due to missing packages, then please fail cmake!
>> 2. If I disable this feature at the "cmake command line", please disable
>> it. This is easy.
>> 3. If I neither enable nor disable at the cmake command line, then it can
>> be auto-enabled if sufficient packages are installed.
>>
>> I am not sure if the following works for what I am thinking of (I added
>> braces for clarity):
>>
>> if (DEFINED USE_FSAL_GPFS) {
>>     if (USE_FSAL_GPFS) {
>>         case A: admin wants it. Check headers and libs (or packages).
>>                 If it can't be enabled, fail.
>>     } else () {
>>         case B: admin doesn't want it
>>     }
>> } else () {    # not defined by the admin
>>     case C: We want to enable this feature if required packages are installed.
>>     case D: We don't care, just disable
>> }
>>
>> I don't know if DEFINED keyword works the way I want it though. Note that
>> case A is the only one that fails here.
>>
>> Regards, Malahal.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 22, 2018 at 5:33 PM, Daniel Gryniewicz <d...@redhat.com
>> <mailto:d...@redhat.com>> wrote:
>>
>> So, there is an option STRICT_PACKAGE that is supposed to enable
>> this. It's not fully utilized, but it's mostly there.
>>
>> The problem is that we can't tell whether the default is being used
>> (lots of options are on by default but disable themselves if the
>> packages aren't installed) or if the user explicitly turned them on.
>> CMake doesn't seem to give us that information, that I've found.
>>  So, instead, we have STRICT_PACKAGE, and you'll have to explicitly
>> turn off everything that's on by default but that you don't want.
>>
>> If you know of a better way of doing this, then I'm happy to listen
>> and help implement it.
>>
>> Daniel
>>
>> On 03/22/2018 12:28 AM, Malahal Naineni wrote:
>>
>> If I specify an option on the cmake command line, I would like
>> it to be honoured, if not, simply fail. Today,  cmake only gives
>> a warning if it can't meet my option's requirements. Can some
>> cmake guru fix this first?
>>
>> On Tue, Mar 20, 2018 at 8:38 PM, Daniel Gryniewicz
>> <d...@redhat.com <mailto:d...@redhat.com>
>> <mailto:d...@redhat.com <mailto:d...@redhat.com>>> wrote:
>>
>>  It's probably a good idea to add the build options to
>> --version
>>  output, or something.  That way we can ask for it in these
>> types of
>>  situations.  I've added a card to the wishlist for this.
>>
>>  Daniel
>>
>>  On Tue, Mar 20, 2018 at 9:39 AM, TomK <tomk...@mdevsys.com
>> <mailto:tomk...@mdevsys.com>
>>  <mailto:tomk...@mdevsys.com <mailto:tomk...@mdevsys.com>>>
>>
>> wrote:
>>   > On 3/19/2018 9:54 AM, Frank Filz wrote:
>>   >>>
>>   >>> Solved.
>>   >>>
>>   >>> Here's the solution in case it can help someone else.
>>   >>>
>>   >>> To get a certain feature in NFS Ganesha, I had to
>> compile the V2.6
>>   >&

Re: [Nfs-ganesha-devel] nss_getpwnam: name 't...@my.dom@localdomain' does not map into domain 'nix.my.dom'

2018-03-22 Thread Malahal Naineni
Here is what I wanted. Let us say there is a compilation feature called
USE_FSAL_GPFS. I want these possibilities:

1. If I enable this feature at the "cmake command line", enable this. If it
can't be enabled due to missing packages, then please fail cmake!
2. If I disable this feature at the "cmake command line", please disable
it. This is easy.
3. If I neither enable nor disable at the cmake command line, then it can
be auto-enabled if sufficient packages are installed.

I am not sure if the following works for what I am thinking of (I added
braces for clarity):

if (DEFINED USE_FSAL_GPFS) {
    if (USE_FSAL_GPFS) {
        case A: admin wants it. Check headers and libs (or packages).
                If it can't be enabled, fail.
    } else () {
        case B: admin doesn't want it
    }
} else () {    # not defined by the admin
    case C: We want to enable this feature if required packages are installed.
    case D: We don't care, just disable
}

I don't know if DEFINED keyword works the way I want it though. Note that
case A is the only one that fails here.
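One way to approximate cases A-D without relying on DEFINED (which option() defeats, since option() always creates the cache variable) is a string cache variable with an AUTO default. A minimal sketch, not Ganesha's actual cmake code; the gpfs.h probe and variable names are illustrative:

```cmake
include(CheckIncludeFiles)

# Tri-state build switch: ON / OFF / AUTO (default).
set(USE_FSAL_GPFS "AUTO" CACHE STRING "Build the GPFS FSAL (ON/OFF/AUTO)")

check_include_files("gpfs.h" GPFS_FOUND)   # stand-in for the real probe

if(USE_FSAL_GPFS STREQUAL "AUTO")
  # Cases C/D: nobody asked; enable only if the packages are there.
  set(USE_FSAL_GPFS ${GPFS_FOUND})
elseif(USE_FSAL_GPFS AND NOT GPFS_FOUND)
  # Case A: admin explicitly asked for it but it cannot be built -> fail.
  message(FATAL_ERROR "USE_FSAL_GPFS=ON but GPFS headers/libs not found")
endif()
# Case B (explicit OFF) falls through with the feature disabled.
```

An explicit -DUSE_FSAL_GPFS=ON then fails hard, while the default stays auto-detect.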

Regards, Malahal.







On Thu, Mar 22, 2018 at 5:33 PM, Daniel Gryniewicz <d...@redhat.com> wrote:

> So, there is an option STRICT_PACKAGE that is supposed to enable this.
> It's not fully utilized, but it's mostly there.
>
> The problem is that we can't tell whether the default is being used (lots
> of options are on by default but disable themselves if the packages aren't
> installed) or if the user explicitly turned them on. CMake doesn't seem to
> give us that information, that I've found.  So, instead, we have
> STRICT_PACKAGE, and you'll have to explicitly turn off everything that's on
> by default but that you don't want.
>
> If you know of a better way of doing this, then I'm happy to listen and
> help implement it.
>
> Daniel
>
> On 03/22/2018 12:28 AM, Malahal Naineni wrote:
>
>> If I specify an option on the cmake command line, I would like it to be
>> honoured, if not, simply fail. Today,  cmake only gives a warning if it
>> can't meet my option's requirements. Can some cmake guru fix this first?
>>
>> On Tue, Mar 20, 2018 at 8:38 PM, Daniel Gryniewicz <d...@redhat.com
>> <mailto:d...@redhat.com>> wrote:
>>
>> It's probably a good idea to add the build options to --version
>> output, or something.  That way we can ask for it in these types of
>> situations.  I've added a card to the wishlist for this.
>>
>> Daniel
>>
>> On Tue, Mar 20, 2018 at 9:39 AM, TomK <tomk...@mdevsys.com
>> <mailto:tomk...@mdevsys.com>> wrote:
>>  > On 3/19/2018 9:54 AM, Frank Filz wrote:
>>  >>>
>>  >>> Solved.
>>  >>>
>>  >>> Here's the solution in case it can help someone else.
>>  >>>
>>  >>> To get a certain feature in NFS Ganesha, I had to compile the
>> V2.6
>>  >>> release from source.  When configuring to compile, idmapd
>> support got
>>  >>> disabled since packages were missing:
>>  >>>
>>  >>> libnfsidmap-devel-0.25-17.el7.x86_64
>>  >>>
>>  >>> Installed the above package and recompiled with nfsidmap
>> support enabled
>>  >>> and this issue went away.  Users now show up properly off the
>> NFS mount
>>  >>> on clients.
>>  >>
>>  >>
>>  >> Oh, well that was a simple fix :-)
>>  >>
>>  >> I wonder if we could make changes in our cmake files to make it
>> easier to
>>  >> see when stuff got left out due to missing packages? I've been
>> caught out
>>  >> myself.
>>  >>
>>  >> Frank
>>  >>
>>  > Yep, sure was an easy fix.
>>  >
>>  > Wouldn't mind seeing that.  Maybe even a way to find out what
>> options went
>>  > into compiling packages for each distro.
>>  >
>>  >
>>  > --
>>  > Cheers,
>>  > Tom K.
>>  >
>> 
>> -
>>  >
>>  > Living on earth is expensive, but it includes a free trip around
>> the sun.
>>  >
>>
>>
>>
>


Re: [Nfs-ganesha-devel] nss_getpwnam: name 't...@my.dom@localdomain' does not map into domain 'nix.my.dom'

2018-03-21 Thread Malahal Naineni
If I specify an option on the cmake command line, I would like it to be
honoured, if not, simply fail. Today,  cmake only gives a warning if it
can't meet my option's requirements. Can some cmake guru fix this first?

On Tue, Mar 20, 2018 at 8:38 PM, Daniel Gryniewicz  wrote:

> It's probably a good idea to add the build options to --version
> output, or something.  That way we can ask for it in these types of
> situations.  I've added a card to the wishlist for this.
>
> Daniel
>
> On Tue, Mar 20, 2018 at 9:39 AM, TomK  wrote:
> > On 3/19/2018 9:54 AM, Frank Filz wrote:
> >>>
> >>> Solved.
> >>>
> >>> Here's the solution in case it can help someone else.
> >>>
> >>> To get a certain feature in NFS Ganesha, I had to compile the V2.6
> >>> release from source.  When configuring to compile, idmapd support got
> >>> disabled since packages were missing:
> >>>
> >>> libnfsidmap-devel-0.25-17.el7.x86_64
> >>>
> >>> Installed the above package and recompiled with nfsidmap support
> enabled
> >>> and this issue went away.  Users now show up properly off the NFS mount
> >>> on clients.
> >>
> >>
> >> Oh, well that was a simple fix :-)
> >>
> >> I wonder if we could make changes in our cmake files to make it easier
> to
> >> see when stuff got left out due to missing packages? I've been caught
> out
> >> myself.
> >>
> >> Frank
> >>
> > Yep, sure was an easy fix.
> >
> > Wouldn't mind seeing that.  Maybe even a way to find out what options
> went
> > into compiling packages for each distro.
> >
> >
> > --
> > Cheers,
> > Tom K.
> > 
> -
> >
> > Living on earth is expensive, but it includes a free trip around the sun.
> >
>


Re: [Nfs-ganesha-devel] nss_getpwnam: name 't...@my.dom@localdomain' does not map into domain 'nix.my.dom'

2018-03-07 Thread Malahal Naineni
>> Tried identical idmapd.conf files on client and server but rpcidmapd
tries to start the local copy of nfsd on the nfs Ganesha servers but that
competes with

NFS Ganesha doesn't need the rpcidmapd daemon running, so refrain from
running it. Ganesha uses the idmapd libraries, so you should be good as
long as you have the libraries installed (part of the nfs-utils package on
RHEL, I think).

Regards, Malahal.

On Tue, Mar 6, 2018 at 9:15 PM, Tom  wrote:

> t...@my.dom is an ad user.   Nix.my.dom is a subdomain managed freeipa.
>
> Tried identical idmapd.conf files on client and server but rpcidmapd tries
> to start the local copy of nfsd on the nfs Ganesha servers but that
> competes with nfs-Ganesha and won’t bind on port 2049.  So I need to change
> the port for the old nfs to 12049 etc to get the old nfs started so
> rpcidmapd can start on the Ganesha nfs servers.  They made it a dependency.
>
> That’s when things get messy.   I may try to uninstall the built in nfs
> packages but not sure if they will also pull out the rpcidmapd ones too.
>
> Cheers,
> Tom
>
> Sent from my iPhone
>
> > On Mar 6, 2018, at 9:00 AM, Daniel Gryniewicz  wrote:
> >
> > Based on the error messages, your client is not sending t...@nix.my.dom
> but is sending t...@my.dom@localdomain.  Something is mis-configured on
> the client.  Have you tried having identical (including case) idmapd.conf
> files on both the client and server?
> >
> > Idmap configuration has historically been very picky and hard to set up,
> and I'm far from an expert on it.
> >
> > Daniel
> >
> >> On 03/06/2018 08:24 AM, TomK wrote:
> >> Hey Guy's,
> >> Getting below message which in turn fails to list proper UID / GID on
> NFSv4 mounts from within an unprivileged account. All files show up with
> owner and group as nobody / nobody when viewed from the client.
> >> Wondering if anyone saw this and what the solution could be here?
> >> If not the right list, let me know please.
> >> [root@client01 etc]# cat /etc/idmapd.conf|grep -v "#"| sed -e "/^$/d"
> >> [General]
> >> Verbosity = 7
> >> Domain = nix.my.dom
> >> [Mapping]
> >> [Translation]
> >> [Static]
> >> [UMICH_SCHEMA]
> >> LDAP_server = ldap-server.local.domain.edu
> >> LDAP_base = dc=local,dc=domain,dc=edu
> >> [root@client01 etc]#
> >> Mount looks like this:
> >> nfs-c01.nix.my.dom:/n/my.dom on /n/my.dom type nfs4
> (rw,relatime,vers=4.0,rsize=8192,wsize=8192,namlen=255,
> hard,proto=tcp,port=0,timeo=10,retrans=2,sec=sys,clientaddr=192.168.0.236,
> local_lock=none,addr=192.168.0.80) /var/log/messages
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: key: 0x3f2c257b type: uid
> value: t...@my.dom@localdomain timeout 600
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid: calling
> nsswitch->name_to_uid
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nss_getpwnam: name 
> >> 't...@my.dom@localdomain'
> domain 'nix.my.dom': resulting localname '(null)'
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nss_getpwnam: name 
> >> 't...@my.dom@localdomain'
> does not map into domain 'nix.my.dom'
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid:
> nsswitch->name_to_uid returned -22
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid: final
> return value is -22
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid: calling
> nsswitch->name_to_uid
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nss_getpwnam: name
> 'nob...@nix.my.dom' domain 'nix.my.dom': resulting localname 'nobody'
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid:
> nsswitch->name_to_uid returned 0
> >> Mar  6 00:17:27 client01 nfsidmap[14396]: nfs4_name_to_uid: final
> return value is 0
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: key: 0x324b0048 type: gid
> value: t...@my.dom@localdomain timeout 600
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid: calling
> nsswitch->name_to_gid
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid:
> nsswitch->name_to_gid returned -22
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid: final
> return value is -22
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid: calling
> nsswitch->name_to_gid
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid:
> nsswitch->name_to_gid returned 0
> >> Mar  6 00:17:27 client01 nfsidmap[14398]: nfs4_name_to_gid: final
> return value is 0
> >> Mar  6 00:17:31 client01 systemd-logind: Removed session 23.
> >> Result of:
> >> systemctl restart rpcidmapd
> >> /var/log/messages
> >> ---
> >> Mar  5 23:46:12 client01 systemd: Stopping Automounts filesystems on
> demand...
> >> Mar  5 23:46:13 client01 systemd: Stopped Automounts filesystems on
> demand.
> >> Mar  5 23:48:51 client01 systemd: Stopping NFSv4 ID-name mapping
> service...
> >> Mar  5 23:48:51 client01 systemd: Starting Preprocess NFS
> configuration...
> >> Mar  5 23:48:51 client01 systemd: Started Preprocess NFS configuration.
> >> Mar  

Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-18 Thread Malahal Naineni
Dan, only the decoder threads are per export and ACKs are sent by the TCP
layer itself. I am pretty sure clients do send multiple requests on the
same socket (we have seen it exceed the default 512 after which we drop
requests) and we use multiple worker threads.

Regards, Malahal.

On Wed, Feb 14, 2018 at 7:02 PM, Daniel Gryniewicz  wrote:

> How many clients are you using?  Each client op can only (currently) be
> handled in a single thread, and clients won't send more ops until the
> current one is ack'd, so Ganesha can basically only parallelize on a
> per-client basis at the moment.
>
> I'm sure there are locking issues; so far we've mostly worked on
> correctness rather than performance.  2.6 has changed the threading model a
> fair amount, and 2.7 will have more improvements, but it's a slow process.
>
> Daniel
>
> On 02/13/2018 06:38 PM, Deepak Jagtap wrote:
>
>> Thanks Daniel!
>>
>> Yeah user-kernel context switching is definitely adding up latency, but I
>> wonder if rpc or some locking overhead is also in the picture.
>>
>> With 70% read 30% random workload nfs ganesha CPU usage was close to 170%
>> while remaining 2 cores were pretty much unused (~18K IOPS, latency ~8ms)
>>
>> With 100% read 30% random nfs ganesha CPU usage ~250% ( ~50K IOPS,
>> latency ~2ms).
>>
>>
>> -Deepak
>>
>> 
>> *From:* Daniel Gryniewicz 
>> *Sent:* Tuesday, February 13, 2018 6:15:47 AM
>> *To:* nfs-ganesha-devel@lists.sourceforge.net
>> *Subject:* Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance
>> Also keep in mind that FSAL VFS can never, by its very nature, beat
>> knfsd, since it has to do everything knfsd does, but also has userspace
>> <-> kernelspace transitions.  Ganesha's strength is exporting
>> userspace-based cluster filesystems.
>>
>> That said, we're always working to make Ganesha faster, and I'm sure
>> there's gains to be made, even in these circumstances.
>>
>> Daniel
>>
>>
>> On 02/12/2018 07:01 PM, Deepak Jagtap wrote:
>>
>>> Hey Guys,
>>>
>>>
>>> I ran few performance tests to compare nfs gansha and nfs kernel server
>>> and noticed significant difference.
>>>
>>>
>>> Please find my test result:
>>>
>>>
>>> SSD formated with EXT3 exported using nfs ganesha  : ~18K IOPSAvg
>>> latency: ~8ms   Throughput: ~60MBPS
>>>
>>> same directory exported using nfs kernel server: ~75K IOPS
>>> Avg latency: ~0.8ms Throughput: ~300MBPS
>>>
>>>
>>> nfs kernel and nfs ganesha both of them are configured with 128
>>> worker threads. nfs ganesha is configured with VFS FSAL.
>>>
>>>
>>> Am I missing something major in nfs ganesha config or this is expected
>>> behavior.
>>>
>>> Appreciate any inputs as how the performance can be improved for nfs
>>> ganesha.
>>>
>>>
>>>
>>> Please find following ganesha config file that I am using:
>>>
>>>
>>> NFS_Core_Param
>>> {
>>>   Nb_Worker = 128 ;
>>> }
>>>
>>> EXPORT
>>> {
>>>   # Export Id (mandatory, each EXPORT must have a unique Export_Id)
>>>  Export_Id = 77;
>>>  # Exported path (mandatory)
>>>  Path = /host/test;
>>>  Protocols = 3;
>>>  # Pseudo Path (required for NFS v4)
>>>  Pseudo = /host/test;
>>>  # Required for access (default is None)
>>>  # Could use CLIENT blocks instead
>>>  Access_Type = RW;
>>>  # Exporting FSAL
>>>  FSAL {
>>>   Name = VFS;
>>>  }
>>>  CLIENT
>>>  {
>>>   Clients = *;
>>>   Squash = None;
>>>   Access_Type = RW;
>>>  }
>>> }
>>>
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Deepak
>>>
>>>
>>>
>>> 
>>>
>>>
>>
>> 
>>
>
>
> 
>

Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count

2018-02-18 Thread Malahal Naineni
See https://review.gerrithub.io/#/c/391267/ for the GPFS FSAL. You could do
something similar for the VFS FSAL if that is the one you are using.

Regards, Malahal.

On Thu, Feb 15, 2018 at 1:19 AM, bharat singh <bharat064...@gmail.com>
wrote:

> Yeah, that worked and I don't see this going below -1. So initializing it
> to a non-zero value has avoided this for now.
>
> But I still see the 4k fd limit being exhausted after 24hrs of IO. My
> setup currently shows open_fd_count=13k but there are only 30 files.
> # ls -al /proc/25832/fd | wc -l
> 559
>
> Also /proc won't give any clue. So I still believe there are more leaks to
> this counter than the one I saw in fsal_rdwr()
> Regarding the proper fix, when would it be available for us to try it out?
>
>
> On Mon, Feb 12, 2018 at 10:10 AM, Malahal Naineni <mala...@gmail.com>
> wrote:
>
>> Technically you should use atomic fetch to read it, at least on some
>> archs. Also your assertion might not be hit even if the atomic ops are
>> working right. In fact, they better be working correctly.
>>
>> As an example, say it is 1 and both threads check for assertion. Then
>> both threads decrement and the end value would be -1.  If you want to catch
>> in an assert, then please use the return value of the atomic decrement
>> operation for the assertion.
>>
>>
>>
>> On Mon, Feb 12, 2018 at 9:55 PM bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>>> Yeah. Looks like lock-free updates to open_fd_count is creating the
>>> issue.
>>> There is no double close, as I couldn’t hit the assert(open_fd_count >
>>> 0) I have added before the decrements.
>>>
>>> And once it hits this state, it ping-pongs between 0 & ULLONG_MAX.
>>>
>>> So as a workaround I have initialized open_fd_count = <number of worker
>>> thds> to avoid these racey decrements. I haven’t seen the warnings after
>>> this change over a couple of hours of testing.
>>>
>>>
>>>
>>> [work-162] fsal_open :FSAL :CRIT :before increment open_fd_count0
>>> [work-162] fsal_open :FSAL :CRIT :after increment open_fd_count1
>>> [work-128] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-128] fsal_close :FSAL :CRIT :after decrement open_fd_count0
>>> [work-153] fsal_open :FSAL :CRIT :before increment open_fd_count0
>>> [work-153] fsal_open :FSAL :CRIT :after increment open_fd_count1
>>> [work-153] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-162] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-153] fsal_close :FSAL :CRIT :after decrement open_fd_count0
>>> [work-162] fsal_close :FSAL :CRIT :after decrement
>>> open_fd_count18446744073709551615
>>> [work-148] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit
>>> Exceeded.  Disabling FD Cache and waking LRU thread.
>>> open_fd_count=18446744073709551615, fds_hard_limit=4055
>>>
>>> [work-111] fsal_open :FSAL :CRIT :before increment
>>> open_fd_count18446744073709551615
>>> [work-111] fsal_open :FSAL :CRIT :after increment open_fd_count0
>>> [cache_lru] lru_run :INODE LRU :EVENT :Re-enabling FD cache.
>>> [work-111] fsal_close :FSAL :CRIT :before decrement open_fd_count0
>>> [work-111] fsal_close :FSAL :CRIT :after decrement
>>> open_fd_count18446744073709551615
>>>
>>> -bharat
>>>
>>> On Sun, Feb 11, 2018 at 10:32 PM, Frank Filz <ffilz...@mindspring.com>
>>> wrote:
>>>
>>>> Yea, open_fd_count is broken…
>>>>
>>>>
>>>>
>>>> We have been working on the right way to fix it.
>>>>
>>>>
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>> *From:* bharat singh [mailto:bharat064...@gmail.com]
>>>> *Sent:* Saturday, February 10, 2018 7:42 PM
>>>> *To:* Malahal Naineni <mala...@gmail.com>
>>>> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
>>>> *Subject:* Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high
>>>> open_fd_count
>>>>
>>>>
>>>>
>>>> Hey,
>>>>
>>>>
>>>>
>>>> I think there is a leak in open_fd_count.
>>>>
>>>>
>>>>
>>>> fsal_rdwr() uses fsal_open() to open the file, but uses
>>>> obj->obj_ops.close(obj) to close the file and there is no decrement of
>>>> open_fd_count.
>>>>
>>>> So this counter keeps increasin

Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-15 Thread Malahal Naineni
As Bill said, it is not applicable to V2.6. It is there in V2.5 (yes,
please see src/config_samples/config.txt in that version for details).

On Wed, Feb 14, 2018 at 4:50 AM, Deepak Jagtap <deepak.jag...@maxta.com>
wrote:

> Thanks Malahal, William!
>
>
> Tried both v2.5-stable and 2.6 (next branch).
>
> Noticed marginal improvement, ~19K IOPS with 2.6 compared to ~18K IOPS
> with 2.5.
>
> Couldn't find anything in the 2.6 source with name 'Dispatch_Max_Reqs_Xprt
> '? Is this configurable from config file?
>
>
> Regards,
>
> Deepak
> --
> *From:* William Allen Simpson <william.allen.simp...@gmail.com>
> *Sent:* Tuesday, February 13, 2018 4:38:11 AM
> *To:* Malahal Naineni; Matt Benjamin
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance
>
> On 2/13/18 1:21 AM, Malahal Naineni wrote:
> > If your latency is high, then you most likely need to
> change Dispatch_Max_Reqs_Xprt. What your Dispatch_Max_Reqs_Xprt value?
> >
> That shouldn't do anything anymore in V2.6, other than 9P.
>
> 
>


Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count

2018-02-12 Thread Malahal Naineni
Technically you should use atomic fetch to read it, at least on some
archs. Also your assertion might not be hit even if the atomic ops are
working right. In fact, they better be working correctly.

As an example, say it is 1 and both threads check for assertion. Then both
threads decrement and the end value would be -1.  If you want to catch in
an assert, then please use the return value of the atomic decrement
operation for the assertion.



On Mon, Feb 12, 2018 at 9:55 PM bharat singh <bharat064...@gmail.com> wrote:

> Yeah. Looks like lock-free updates to open_fd_count is creating the
> issue.
> There is no double close, as I couldn’t hit the assert(open_fd_count > 0)
> I have added before the decrements.
>
> And once it hits this state, it ping-pongs between 0 & ULLONG_MAX.
>
> So as a workaround I have initialized open_fd_count = <number of worker
> thds> to avoid these racey decrements. I haven’t seen the warnings after this
> change over a couple of hours of testing.
>
>
>
> [work-162] fsal_open :FSAL :CRIT :before increment open_fd_count0
> [work-162] fsal_open :FSAL :CRIT :after increment open_fd_count1
> [work-128] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-128] fsal_close :FSAL :CRIT :after decrement open_fd_count0
> [work-153] fsal_open :FSAL :CRIT :before increment open_fd_count0
> [work-153] fsal_open :FSAL :CRIT :after increment open_fd_count1
> [work-153] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-162] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-153] fsal_close :FSAL :CRIT :after decrement open_fd_count0
> [work-162] fsal_close :FSAL :CRIT :after decrement
> open_fd_count18446744073709551615
> [work-148] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit
> Exceeded.  Disabling FD Cache and waking LRU thread.
> open_fd_count=18446744073709551615, fds_hard_limit=4055
>
> [work-111] fsal_open :FSAL :CRIT :before increment
> open_fd_count18446744073709551615
> [work-111] fsal_open :FSAL :CRIT :after increment open_fd_count0
> [cache_lru] lru_run :INODE LRU :EVENT :Re-enabling FD cache.
> [work-111] fsal_close :FSAL :CRIT :before decrement open_fd_count0
> [work-111] fsal_close :FSAL :CRIT :after decrement
> open_fd_count18446744073709551615
>
> -bharat
>
> On Sun, Feb 11, 2018 at 10:32 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
>
>> Yea, open_fd_count is broken…
>>
>>
>>
>> We have been working on the right way to fix it.
>>
>>
>>
>> Frank
>>
>>
>>
>> *From:* bharat singh [mailto:bharat064...@gmail.com]
>> *Sent:* Saturday, February 10, 2018 7:42 PM
>> *To:* Malahal Naineni <mala...@gmail.com>
>> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
>> *Subject:* Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high
>> open_fd_count
>>
>>
>>
>> Hey,
>>
>>
>>
>> I think there is a leak in open_fd_count.
>>
>>
>>
>> fsal_rdwr() uses fsal_open() to open the file, but uses
>> obj->obj_ops.close(obj) to close the file and there is no decrement of
>> open_fd_count.
>>
>> So this counter keeps increasing and I could easily hit the 4k hard limit
>> with prolonged read/writes.
>>
>>
>>
>> I changed it to use fsal_close() as it also does the decrement. After
>> this change the open_fd_count was looking OK.
>>
>> But recently I saw open_fd_count being underflown to
>> open_fd_count=18446744073709551615
>>
>>
>>
>> So i am suspecting a double close. Any suggestions ?
>>
>>
>>
>>  Code snippet from // V2.5-stable/src/FSAL/fsal_helper.c
>>
>> fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
>>
>> fsal_io_direction_t io_direction,
>>
>> uint64_t offset, size_t io_size,
>>
>> size_t *bytes_moved, void *buffer,
>>
>> bool *eof,
>>
>> bool *sync, struct io_info *info)
>>
>> {
>>
>> ...
>>
>>   loflags = obj->obj_ops.status(obj);
>>
>>   while ((!fsal_is_open(obj))
>>
>>  || (loflags && loflags != FSAL_O_RDWR && loflags !=
>> openflags)) {
>>
>>   loflags = obj->obj_ops.status(obj);
>>
>>   if ((!fsal_is_open(obj))
>>
>>   || (loflags && loflags != FSAL_O_RDWR
>>
>>   && loflags != openflags)) {
>>
>>  

Re: [Nfs-ganesha-devel] Correct initialization sequence

2018-01-30 Thread Malahal Naineni
Looking at the code, dupreq2_pkginit() only depends on Ganesha config
processing to initialize a few things, so it should be OK to call it anytime
after Ganesha config processing.

Regards, Malahal.

On Wed, Jan 31, 2018 at 8:00 AM, Pradeep  wrote:

> Hi Bill,
>
> Is it ok to move dupreq2_pkginit() before nfs_Init_svc() so that we won't
> hit the crash below?
>
> #0  0x7fb54dd7923b in raise () from /lib64/libpthread.so.0
> #1  0x00442ebd in crash_handler (signo=11, info=0x7fb546efc430,
> ctx=0x7fb546efc300) at /usr/src/debug/nfs-ganesha-2.
> 6-rc2/MainNFSD/nfs_init.c:263
> #2  
> #3  0x004de670 in nfs_dupreq_get_drc (req=0x7fb546422800) at
> /usr/src/debug/nfs-ganesha-2.6-rc2/RPCAL/nfs_dupreq.c:579
> #4  0x004e00bf in nfs_dupreq_start (reqnfs=0x7fb546422800,
> req=0x7fb546422800) at /usr/src/debug/nfs-ganesha-2.
> 6-rc2/RPCAL/nfs_dupreq.c:1011
> #5  0x00457825 in nfs_rpc_process_request (reqdata=0x7fb546422800)
> at /usr/src/debug/nfs-ganesha-2.6-rc2/MainNFSD/nfs_worker_thread.c:852
> #6  0x004599a7 in nfs_rpc_valid_NFS (req=0x7fb546422800) at
> /usr/src/debug/nfs-ganesha-2.6-rc2/MainNFSD/nfs_worker_thread.c:1555
>
> (gdb) print drc_st
> $1 = (struct drc_st *) 0x0
> (gdb) print nfs_init.init_complete
> $2 = false
>
> On Tue, Jan 30, 2018 at 1:39 PM, Matt Benjamin 
> wrote:
>
>> reordering, I hope
>>
>> Matt
>>
>> On Tue, Jan 30, 2018 at 1:40 PM, Pradeep  wrote:
>> > Hello,
>> >
>> > It is possible to receive requests anytime after nfs_Init_svc() is
>> > completed. We initialize several things in nfs_Init() after this. This
>> could
>> > lead to processing of incoming requests racing with the rest of
>> > initialization (ex: dupreq2_pkginit()). Is it possible to re-order
>> > nfs_Init_svc() so that the rest of ganesha is ready to process requests as
>> soon
>> > as we start listening on the NFS port? Another way is to return
>> NFS4ERR_DELAY
>> > until 'nfs_init.init_complete' is true. Any thoughts?
>> >
>> >
>> > Thanks,
>> > Pradeep
>> >
>> > 
>> >
>>
>>
>>
>> --
>>
>> Matt Benjamin
>> Red Hat, Inc.
>> 315 West Huron Street, Suite 140A
>> Ann Arbor, Michigan 48103
>>
>> http://www.redhat.com/en/technologies/storage
>>
>> tel.  734-821-5101
>> fax.  734-769-8938
>> cel.  734-216-5309
>>
>
>
>
>


[Nfs-ganesha-devel] Announce release of V2.5.5 (V2.5-stable)

2018-01-27 Thread Malahal Naineni
Hi, I just pushed the V2.5.5 tag, which includes many bug fixes from the V2.6-rc1 tag.

Highlights


* File descriptor leaks
* Sending invalid attributes
* Ref count leaks
* Cookie collisions in mdcache
* ABBA deadlocks
* Memory leaks

And many more

Regards, Malahal.


[Nfs-ganesha-devel] V2.5.5 pre-release

2018-01-25 Thread Malahal Naineni
Folks, V2.5.5 has been created with a bunch of new commits from V2.6-rc1. Make
sure it works with your FSALs. I will push it very soon, so please report any
issues to this mailing list ASAP.

The pre-release branch is in my personal github account as below:

repo: https://github.com/malahal/nfs-ganesha.git
branch: V2.5-stable


Re: [Nfs-ganesha-devel] Patches not backported to V2.5-stable

2018-01-04 Thread Malahal Naineni
Hmm, you listed patches from very old tags as well. V2.5 should have only
patches that fix defects. We should NOT be adding new features, cleanup
work, etc. from V2.6 to V2.5-stable. For example, you listed gerrit
change-id . It implemented a new
feature and is buggy as well, so there is no point in taking it.

I would like people to identify the real bugs they fixed; that should be
the list to backport to V2.5.  Occasionally, folks may need a new feature
that is NOT intrusive; we could backport that kind as well.

Regards, Malahal.

On Fri, Jan 5, 2018 at 9:01 AM, Frank Filz  wrote:

> I did some work with my script for extracting patch titles and change ids
> and with some manual work on the output, produced the following list of
> patches not included in V2.5-stable (the V2.6 tags are included to help
> identify when the patches arrived):
>
> > I0ccf28339b9296115520a5d54538f02df5ee0089 V2.6-dev.22
> > I1425e3c3246ccd3fa13bd07e06069677b71abcfa NFS: don't trash stateid when
> returning error on FREE_STATEID
> > I789b76f8c5c5a158b846281d1c4491d3ccde538c Pullup NTIRPC through #98
> > Icc9523912e1e3653c3b40f83e0f603349c5e2a8f Consolidate 9P queues and
> workers
> > I367b7e9e2e51f8980d1296dfee50b6c847cd0ad2 FSAL_CEPH: no need to set
> credentials
> > I123dab91583379d191933363c7e99b0946a6b913 config_samples: fix config
> block
> examples
> > I767785d9615e2d9216bec2e4a47a72caaa2cc14d FSAL_GLUSTER : add support for
> rdma volumes
> > Id272de01fce18a19262426891a94ca66072ba232 Fix revoke_owner_layouts
> accessing uninitialized op_ctx
> > I9975962ad441c33302a41301cf4ef53f92737418 glist: preserve the order when
> two items have same priority
> > I95aa5269ecbd4883b1c9a9ea6d1f471b58a3e41a FSAL_GLUSTER: close fd without
> setting credentials at handle_release()
> > I2db688224f44e0e5ad390a643b8a0732eb77a7a4 nfs4 - Add missing put_ref in
> OP_FREE_STATEID
> > I039a5558e1e0bd845bed74a9158f3c732097463e MDCACHE - Fix stacking over
> NULL
> > Ia48857fddab0a334d3c3a815a677745dc6f7d51c NFS4.1 - Allow client to
> specifiy slot count
> > I1f012b50b7ad5f7e5d214072c7041d8a4f649b3a NFS4 STATE: Fixup export (and
> obj) refcounts for layout and delegation
> > Ia87a41cc6ed38659b45fe51dc38153c6ecef547f NFS4 STATE: Fixup export
> refcounts for lock and open states
> > Icc4f17e0a39498f8f07bf828212dea3c7c5ba19c MDCACHE and VFS: Improve debug
> of export release and don't crash
> > Icb6a7b682fd6fa3039c7968dafac6bf0328af98b Improve debug of export
> refcounts
> > I3022e83a8f30987c1429c1d61df450f161a3af6e V2.6-dev.21
> > Ib93b78cf68347a9cd5e39cfa98ff4deba40ddd45 TEST and TOOLS: Cleanup new
> checkpatch errors
> > Ifc595d60f52ef38cbb957322f096b7ddd9fd8619 SUPPORT: Cleanup new
> checkpatch
> errors
> > Ic0e034732f4e29e9e01d0c6fad49c2ecd48c380d LOG: Cleanup new checkpatch
> errors
> > Ide72a52987de228535c07cf5638da2d8632181d6 HASHTABLE: Cleanup new
> checkpatch errors
> > I0c996e15f70a351a76ad9108a29a937fd97c8d0a DBUS: Cleanup new checkpatch
> errors
> > I2fab11037ae8543e81a2edd4a2294ceb609a9cb9 SAL: Cleanup new checkpatch
> errors
> > I8d866f47efdcc04de0fa35ded62d246eb0c6cb82 RPCAL: Cleanup new checkpatch
> errors
> > I9115d5b8a7949b132b956639d79e2f096dcf1808 RQUOTA: Cleanup new checkpatch
> errors
> > I5366af45e9b2acaf3d4d1690ce27b7ce76046b12 NLM: Cleanup new checkpatch
> errors
> > I75bd17c2cd20431b0c36aedbddd85cceb8c6c4ab NFS: Cleanup new checkpatch
> errors
> > I2dd4e39d115449bd855f77187f9d98e357368eb2 NFS4: Cleanup new checkpatch
> errors
> > I8d78ffc75fc254b4ba016bd35348ab4ef906badd NFS3: Cleanup new checkpatch
> errors
> > Idc725fe5915dabb8e301b1bd4e0a4b91b1f4fc2f MNT: Cleanup new checkpatch
> errors
> > I481a4cbe21b5623f20f623ab30b04b23259e4918 9P: Cleanup new checkpatch
> errors
> > I5e7b9fcd169387eb4929067e135738666f12adca MainNFSD: Cleanup new
> checkpatch
> errors
> > I6efa95690dbee6d1af9de67df31d76932a426e24 FSAL and FSAL_UP: Cleanup new
> checkpatch errors
> > Ieb84f4fef3df09fbcc9a16cdb57ea94cfaa4325b NULL: Cleanup new checkpatch
> errors
> > I744df098a9b4fc75d15e0044f6555a1e07d51df9 MDCACHE: Cleanup new
> checkpatch
> errors
> > If42b81b3abcfde4561a0303a9a3521f2da55885a RGW: Cleanup new checkpatch
> errors
> > I939f5faa5339f0d5fef89fa1e0d91d7cbefade2e PROXY: Cleanup new checkpatch
> errors
> > I759c57121457c8fa4186b95c4480517c0d973de3 VFS: Cleanup new checkpatch
> errors
> > I34a1668d8d4216e60e46a1b11e41fbf8c79bc482 GPFS: Clean up new checkpatch
> errors
> > Ib47c7c5e256ff3e7d6dd70fae486eb2cba704534 GLUSTER: Clean up new
> checkpatch
> errors
> > I364aff7db1a143e554a07560f1878a56a372a9c7 CEPH: Cleanup new checkpatch
> errors
> > I42b182912722bb7704cf20ebb93e0d5e0ab7a5df Update checkpatch.pl from
> kernel
> v4.15-rc2
> > I51b00df253f7e63edfa2d85a649632a4bae1d9fa V2.6-dev.20
> > Iff945dbc5a645b0fd1bd8474f88d74bff49430bc ntirpc pullup - fix leak
> > Id8c66de80e6b998653878bfefcfcd22f74789dd9 NFS4.1 - Make the slot table
> size configurable
> > I178733cd95bb27f3875e802578d4a7f02844daca CMake - Allow 

Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count

2018-01-02 Thread Malahal Naineni
The links I gave you have everything you need. You should be able to
download gerrit reviews with "git review -d " or download them from the
gerrit web GUI.

"390496" is merged upstream, but the other one is not merged yet.

$ git log --oneline --grep='Fix closing global file descriptors' origin/next
5c2efa8f0 Fix closing global file descriptors





On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com> wrote:

> Thanks Malahal
>
> Can you point me to these issues/fixes. I will try to patch V2.5-stable
> and run my tests.
>
> Thanks,
> Bharat
>
> On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com>
> wrote:
>
>> >> I see that mdcache keeps growing beyond the high water mark and lru
>> reclamation can’t keep up.
>>
>> mdcache is different from "FD" cache. I don't think we found an issue
>> with mdcache itself. We found a couple of issues with the "FD cache"
>>
>> 1) https://review.gerrithub.io/#/c/391266/
>> 2) https://review.gerrithub.io/#/c/390496/
>>
>> Neither of them is in V2.5-stable at this point. We will have to
>> backport these and others soon.
>>
>> Regards, Malahal.
>>
>> On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>>> Adding nfs-ganesha-support..
>>>
>>>
>>> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> I am testing the NFSv3 Ganesha implementation against the nfstest_io tool. I
>>>> see that mdcache keeps growing beyond the high water mark and lru
>>>> reclamation can’t keep up.
>>>>
>>>>
>>>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded.  The LRU
>>>> thread is unable to make progress in reclaiming FDs.  Disabling FD cache.
>>>>
>>>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark,
>>>> waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686,
>>>> lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>>>
>>>>
>>>> I am on Ganesha V2.5.2 with default config settings
>>>>
>>>>
>>>> So couple of questions:
>>>>
>>>> 1. Is Ganesha tested against these kinds of tools, which do a bunch of
>>>> open/close operations in quick succession?
>>>>
>>>> 2. Is there a way to suppress these error messages and/or expedite the
>>>> lru reclamation process.
>>>>
>>>> 3. Any suggestions regarding the usage of these kind of tools with
>>>> Ganesha.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bharat
>>>>
>>>
>>>
>>>
>>> --
>>> -Bharat
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> -Bharat
>
>
>


Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count

2018-01-01 Thread Malahal Naineni
>> I see that mdcache keeps growing beyond the high water mark and lru
reclamation can’t keep up.

mdcache is different from the "FD" cache. I don't think we found an issue with
mdcache itself. We found a couple of issues with the "FD cache"

1) https://review.gerrithub.io/#/c/391266/
2) https://review.gerrithub.io/#/c/390496/

Neither of them is in V2.5-stable at this point. We will have to backport
these and others soon.

Regards, Malahal.

On Mon, Jan 1, 2018 at 11:04 PM, bharat singh 
wrote:

> Adding nfs-ganesha-support..
>
>
> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh 
> wrote:
>
>> Hello,
>>
>>
>> I am testing the NFSv3 Ganesha implementation against the nfstest_io tool. I see
>> that mdcache keeps growing beyond the high water mark and lru
>> reclamation can’t keep up.
>>
>>
>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded.  The LRU
>> thread is unable to make progress in reclaiming FDs.  Disabling FD cache.
>>
>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark,
>> waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686,
>> lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>
>>
>> I am on Ganesha V2.5.2 with default config settings
>>
>>
>> So couple of questions:
>>
>> 1. Is Ganesha tested against these kinds of tools, which do a bunch of
>> open/close operations in quick succession?
>>
>> 2. Is there a way to suppress these error messages and/or expedite the
>> lru reclamation process.
>>
>> 3. Any suggestions regarding the usage of these kind of tools with
>> Ganesha.
>>
>>
>>
>> Thanks,
>>
>> Bharat
>>
>
>
>
> --
> -Bharat
>
>
>
>
>


Re: [Nfs-ganesha-devel] Proposal to manage global file descriptors

2017-12-11 Thread Malahal Naineni
I know the code is a bit messy, but do you still see an issue compiling the
Linux kernel after commit 5c2efa8f077fafa82023f5aec5e2c474c5ed2fdf?

Regards, Malahal.

On Thu, Oct 19, 2017 at 2:07 PM, LUCAS Patrice  wrote:

> On 10/18/17 15:40, Frank Filz wrote:
>
>> Hmm, this discussion got stalled, but Patrice reminded me that we need to
>> continue it...
>>
>> On 09/21/2017 07:45 PM, Frank Filz wrote:
>>>
 Philippe discovered that recent Ganesha will no longer allow compiling
 the linux kernel due to dangling open file descriptors.

 I'm not sure if there is any true leak, the simple test of echo foo >
 /mnt/foo does show a remaining open fd for /mnt/foo, however that is
 the global fd opened in the course of doing a getattrs on FSAL_ VFS.

 We have been talking about how the current management of open file
 descriptors doesn't really work, so I have a couple proposals:

 1. We really should have a limit on the number of states we allow. Now
 that NLM locks and shares also have a state_t, it would be simple to
 have a count of how many are in use, and return a resource error if an
 operation requires creating a new one past the limit. This can be a
 hard limit with no grace, if the limit is hit, then alloc_state fails.

>>> This I agree with.
>>>
>>> 2. Management of the global fd is more complex, so here goes:

 Part of the proposal is a way for the FSAL to indicate that an FSAL
 call used the global fd in a way that consumes some kind of resource
 the FSAL would like managed.

 FSAL_PROXY should never indicate that (anonymous I/O should be done
 using a special stateid, and a simple file create should result in the
 open stateid immediately being closed, if that's not the case, then
 it's easy enough to indicate use of a limited resource.

 FSAL_VFS would indicate use of the resource any time it utilizes the
 global fd. If it uses a temp fd that is closed after performing the
 operation, it would not indicate use of the limited resource.

 FSAL_GPFS, FSAL_GLUSTER, and FSAL_CEPH should all be similar to

>>> FSAL_VFS.
>>>
 FSAL_RGW only has a global fd, and I don't quite understand how it is
 managed.

>>> If only PROXY doesn't set this, then maybe it's added complexity we don't
>>> need.  Just assume it's set.
>>>
>> Matt, could you chime in on RGW? It sounds like FSAL_RGW and/or the RGW
>> library really manage the open/close state of the file. If so, then you
>> don't need hints from MDCACHE LRU...
>>
>> The main part of the proposal is to actually create a new LRU queue
 for objects that are using the limited resource.

 If we are at the hard limit on the limited resource and an entry that
 is not already in the LRU uses the resource, then we would reap an
 existing entry and call fsal_close on it to release the resource. If
 an entry was not available to be reaped, we would temporarily exceed
 the limit just like we do with mdcache entries.

 If an FSAL call resulted in use of the resource and the entry was
 already in the resource LRU, then it would be bumped to MRU of L1.

 The LRU run thread for the resource would demote objects from LRU L1
 to MRU of L2, and call fsal_close and remove objects from LRU of L2. I
 think it should work to close any files that have not been used in a given
 amount of time, really using the L1 and L2 to give a shorter life to
 objects for which the resource is used once and then not used again,
 whereas a file that is accessed multiple times would have more
 resistance to being closed. I think the exact mechanics here may need

>>> some tuning, but that's the general idea.
>>>
 The idea here is to be constantly closing files that have not been
 accessed recently, and also to better manage a count of the files for
 which we are actually using the resources, and not keep a file open
 just because for some reason we do lots of lookups or stats of it (we
 might have to open it for getattrs, but then we might serve a bunch of
 cached attrs, which doesn't go to disk, might as well close the fd).

>>> This sounds almost exactly like the existing LRU thread, except that it
>>>
>> ignores
>>
>>> refcount.  If you remove global FD from the obj_handle, then the LRU as
>>> it
>>> currently exists becomes unnecessary for MDCACHE entries, as they only
>>> need a simple, single-level LRU based only on initial refcounts.  The
>>>
>> current,
>>
>>> multi-level LRU only exists to close the global FD when transitioning LRU
>>> levels.
>>>
>> The multi-level LRU for handle cache still have some value for scan
>> resistance.
>>
>> So, what it sounds like to me is that you're splitting the LRU for entries
>>>
>> from
>>
>>> the LRU for global FDs.  Is this correct?  If so, I think this
>>> complicates
>>>
>> the two
>>
>>> sets of LRU 

Re: [Nfs-ganesha-devel] crash in jemalloc leading to a deadlock.

2017-11-16 Thread Malahal Naineni
Yes. we should handle postrotate signal.

On Wed, Nov 15, 2017 at 11:53 PM, Frank Filz <ffilz...@mindspring.com>
wrote:

> If we keep the logging fd open, then we need a signal to tell it that
> logrotate has occurred…
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 9:44 PM
> *To:* Frank Filz <ffilz...@mindspring.com>
> *Cc:* d...@redhat.com; nfs-ganesha-devel@lists.sourceforge.net
>
> *Subject:* Re: [Nfs-ganesha-devel] crash in jemalloc leading to a
> deadlock.
>
>
>
> Silly glibc, they should have provided a backtrace_symbols_func() taking a
> callback function. Using backtrace_symbols_fd() is a bit harder for us
> here. Here is a thought:
>
>
>
> 1. Our logger should NOT open/re-open for every message. It should just
> open once.
>
> 2. Our logger should have an interface to provide such fd using API or
> global symbol.
>
> 3. There are some tricks to get "fd" with syslog tracing as well
>
> 4. Then pass such fd from step 2 to backtrace_symbols_fd()
>
>
>
> Step 1 might be optional, as open() should mostly succeed anyway.
>
>
>
>
>
> On Thu, Nov 9, 2017 at 7:53 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
>
> That might be a good solution, though what fd would we use? Can we safely
> open an fd during a sighandler?
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Wednesday, November 8, 2017 11:51 PM
> *To:* d...@redhat.com
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* Re: [Nfs-ganesha-devel] crash in jemalloc leading to a
> deadlock.
>
>
>
> backtrace_symbols_fd() takes the same buffer and size arguments as
> backtrace_symbols(), but instead of returning an array of strings to the
> caller, it writes the strings, one per line, to the file descriptor fd.
> backtrace_symbols_fd() does not call malloc(3), and so can be employed in
> situations where the latter function might fail.
>
>
>
> On Thu, Nov 9, 2017 at 12:24 AM, Daniel Gryniewicz <d...@redhat.com>
> wrote:
>
> Allocating in a backtrace seems like a very bad idea.  If there's ever a
> crash during an allocation, it is guaranteed to deadlock.
>
> Daniel
>
>
>
> On 11/08/2017 01:43 PM, Pradeep wrote:
>
> I'm using Ganesha 2.6 dev.12 with jemalloc-3.6.0 and hitting a case
> where jemalloc seems to be holding a lock and crashing. In Ganesha's
> gsh_backtrace(), we try to allocate memory and that hangs (ending up in
> deadlock). Have you seen this before? Perhaps it is a good idea not to
> allocate memory in the backtrace path?
>
>
> #0 0x7f49b51ff1bd in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x7f49b51fad02 in _L_lock_791 () from /lib64/libpthread.so.0
> #2 0x7f49b51fac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x7f49b65d12dc in arena_bin_malloc_hard () from
> /lib64/libjemalloc.so.1
> #4 0x7f49b65d1516 in je_arena_tcache_fill_small () from
> /lib64/libjemalloc.so.1
> #5 0x7f49b65ea6ff in je_tcache_alloc_small_hard () from
> /lib64/libjemalloc.so.1
> #6 0x7f49b65ca14f in malloc () from /lib64/libjemalloc.so.1
> #7 0x7f49b6c5a785 in _dl_scope_free () from /lib64/ld-linux-x86-64.so.2
> #8 0x7f49b6c55841 in _dl_map_object_deps () from
> /lib64/ld-linux-x86-64.so.2
> #9 0x7f49b6c5ba4b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
> #10 0x7f49b6c57364 in _dl_catch_error () from
> /lib64/ld-linux-x86-64.so.2
> #11 0x7f49b6c5b35b in _dl_open () from /lib64/ld-linux-x86-64.so.2
> #12 0x7f49b48f5ff2 in do_dlopen () from /lib64/libc.so.6
> #13 0x7f49b6c57364 in _dl_catch_error () from
> /lib64/ld-linux-x86-64.so.2
> #14 0x7f49b48f60b2 in __libc_dlopen_mode () from /lib64/libc.so.6
> #15 0x7f49b48cf595 in init () from /lib64/libc.so.6
> #16 0x7f49b51fdbb0 in pthread_once () from /lib64/libpthread.so.0
> #17 0x7f49b48cf6ac in backtrace () from /lib64/libc.so.6
> #18 0x0045193d in gsh_backtrace () at
> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:228
> #19 0x004519fe in crash_handler (signo=11,
> info=0x7f49b155db70, ctx=0x7f49b155da40) at
> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:244
> #20 
> #21 0x7f49b65d0c61 in arena_purge () from /lib64/libjemalloc.so.1
> #22 0x7f49b65d218d in je_arena_dalloc_large () from
> /lib64/libjemalloc.so.1
>

Re: [Nfs-ganesha-devel] crash in jemalloc leading to a deadlock.

2017-11-14 Thread Malahal Naineni
Silly glibc, they should have provided a backtrace_symbols_func() taking a
callback function. Using backtrace_symbols_fd() is a bit harder for us
here. Here is a thought:

1. Our logger should NOT open/re-open for every message. It should just
open once.
2. Our logger should have an interface to provide such fd using API or
global symbol.
3. There are some tricks to get "fd" with syslog tracing as well
4. Then pass such fd from step 2 to backtrace_symbols_fd()

Step 1 might be optional, as open() should mostly succeed anyway.


On Thu, Nov 9, 2017 at 7:53 PM, Frank Filz <ffilz...@mindspring.com> wrote:

> That might be a good solution, though what fd would we use? Can we safely
> open an fd during a sighandler?
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Wednesday, November 8, 2017 11:51 PM
> *To:* d...@redhat.com
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* Re: [Nfs-ganesha-devel] crash in jemalloc leading to a
> deadlock.
>
>
>
> backtrace_symbols_fd() takes the same buffer and size arguments as
> backtrace_symbols(), but instead of returning an array of strings to the
> caller, it writes the strings, one per line, to the file descriptor fd.
> backtrace_symbols_fd() does not call malloc(3), and so can be employed in
> situations where the latter function might fail.
>
>
>
> On Thu, Nov 9, 2017 at 12:24 AM, Daniel Gryniewicz <d...@redhat.com>
> wrote:
>
> Allocating in a backtrace seems like a very bad idea.  If there's ever a
> crash during an allocation, it is guaranteed to deadlock.
>
> Daniel
>
>
>
> On 11/08/2017 01:43 PM, Pradeep wrote:
>
> I'm using Ganesha 2.6 dev.12 with jemalloc-3.6.0 and hitting a case
> where jemalloc seems to be holding a lock and crashing. In Ganesha's
> gsh_backtrace(), we try to allocate memory and that hangs (ending up in
> deadlock). Have you seen this before? Perhaps it is a good idea not to
> allocate memory in the backtrace path?
>
>
> #0 0x7f49b51ff1bd in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x7f49b51fad02 in _L_lock_791 () from /lib64/libpthread.so.0
> #2 0x7f49b51fac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x7f49b65d12dc in arena_bin_malloc_hard () from
> /lib64/libjemalloc.so.1
> #4 0x7f49b65d1516 in je_arena_tcache_fill_small () from
> /lib64/libjemalloc.so.1
> #5 0x7f49b65ea6ff in je_tcache_alloc_small_hard () from
> /lib64/libjemalloc.so.1
> #6 0x7f49b65ca14f in malloc () from /lib64/libjemalloc.so.1
> #7 0x7f49b6c5a785 in _dl_scope_free () from /lib64/ld-linux-x86-64.so.2
> #8 0x7f49b6c55841 in _dl_map_object_deps () from
> /lib64/ld-linux-x86-64.so.2
> #9 0x7f49b6c5ba4b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
> #10 0x7f49b6c57364 in _dl_catch_error () from
> /lib64/ld-linux-x86-64.so.2
> #11 0x7f49b6c5b35b in _dl_open () from /lib64/ld-linux-x86-64.so.2
> #12 0x7f49b48f5ff2 in do_dlopen () from /lib64/libc.so.6
> #13 0x7f49b6c57364 in _dl_catch_error () from
> /lib64/ld-linux-x86-64.so.2
> #14 0x7f49b48f60b2 in __libc_dlopen_mode () from /lib64/libc.so.6
> #15 0x7f49b48cf595 in init () from /lib64/libc.so.6
> #16 0x7f49b51fdbb0 in pthread_once () from /lib64/libpthread.so.0
> #17 0x7f49b48cf6ac in backtrace () from /lib64/libc.so.6
> #18 0x0045193d in gsh_backtrace () at
> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:228
> #19 0x004519fe in crash_handler (signo=11,
> info=0x7f49b155db70, ctx=0x7f49b155da40) at
> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:244
> #20 
> #21 0x7f49b65d0c61 in arena_purge () from /lib64/libjemalloc.so.1
> #22 0x7f49b65d218d in je_arena_dalloc_large () from
> /lib64/libjemalloc.so.1
>
>
>
>
>
>
>
>

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.17

2017-11-14 Thread Malahal Naineni
I do build RPMs. If you are compiling directly and running "make install",
it *should* install all the needed files, though.

Regards, Malahal.

On Wed, Nov 15, 2017 at 1:56 AM, Marc Eshel <es...@us.ibm.com> wrote:

> I did update the submodules but did not install libntirpc; where do you
> get it?
> Thanks, Marc.
>
>
>
> From:   Malahal Naineni <mala...@gmail.com>
> To: Marc Eshel <es...@us.ibm.com>
> Cc: Frank Filz <ffilz...@mindspring.com>,
> nfs-ganesha-devel@lists.sourceforge.net
> Date:   11/14/2017 11:37 AM
> Subject:Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.17
>
>
>
> Marc, I just built and loaded V2.6-dev.17. I am able to mount and do "ls"
> from the client. Did you run "git submodule update" and install
> libntirpc as a separate RPM as well?
>
> Regards, Malahal.
>
> On Tue, Nov 14, 2017 at 11:06 PM, Marc Eshel <es...@us.ibm.com> wrote:
> I skipped a couple of dev releases, but now I get this when I try to
> mount.
>
> (gdb) c
> Continuing.
> [New Thread 0x71b3b700 (LWP 18716)]
> [New Thread 0x7ffe411ce700 (LWP 18717)]
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x71b3b700 (LWP 18716)]
> 0x75e13989 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
>
>
> From:   "Frank Filz" <ffilz...@mindspring.com>
> To: <nfs-ganesha-devel@lists.sourceforge.net>
> Date:   11/12/2017 09:36 PM
> Subject:[Nfs-ganesha-devel] Announce Push of V2.6-dev.17
>
>
>
> Branch next
>
> Tag:V2.6-dev.17
>
> Sorry for the delay on this, forgot to send before I came up from office
> on
> Friday...
>
> NOTE: This merge includes an ntirpc pullup, please update your submodule.
>
> Release Highlights
>
> * ntirpc pullup
>
> * SAL: Various cleanup of state recovery bits
>
> * SAL: allow grace period to be lifted early if all clients have sent
> RECLAIM_COMPLETE
>
> * CEPH: do an inode lookup vs. MDS when the Inode is not in cache
>
> * 9P lock: aquire state_lock properly
>
> * Set thread names in FSAL_PROXY and ntirpc initiated threads
>
> * Allow configuration of NFSv4 minor versions.
>
> * Lower message log level for a non-existent user
>
> * Fix cmake failure when /etc/os-release is not present
>
> * GLUSTER: glusterfs_create_export() SEGV for typo ganesha.conf
>
> * handle hosts via libcidr to unify IPv4/IPv4 host/network clients
>
> * Add some detail to config documentation
>
> * NFSv4.1+ return special invalid stateid on close per Section 8.2.3
>
> * Give temp fd in fsal_reopen_obj when verification fails for a fd's
> openflags
>
> * GPFS: Set a FDs 'openflags=FSAL_O_CLOSED' when fd=-1 is set
>
> * Various RPC callback and timeout fixes
>
> Signed-off-by: Frank S. Filz <ffilz...@mindspring.com>
>
> Contents:
>
> d8e89f7 Frank S. Filz V2.6-dev.17
> 542ea90 William Allen Simpson Pull up NTIRPC through #91
> 645f410 Madhu Thorat [GPFS] Set a FDs 'openflags=FSAL_O_CLOSED' when fd=-1
> is set
> 482672a Madhu Thorat Give temp fd in fsal_reopen_obj when verification
> fails
> for a fd's openflags
> 05ade07 Frank S. Filz NFSv4.1+ return special invalid stateid on close per
> Section 8.2.3
> 9bd00bd Frank S. Filz Add some detail to config documentation
> 5ca449d Jan-Martin Rämer handle hosts via libcidr to unify IPv4/IPv4
> host/network clients
> 0819dc4 Kaleb S. KEITHLEY fsal_gluster: glusterfs_create_export() SEGV for
> typo ganesha.conf
> a5da1a0 Malahal Naineni Fix cmake failure when /etc/os-release is not
> present
> ed4bace Malahal Naineni Lower message log level for a non-existent user
> abcd932 Malahal Naineni Allow configuration of NFSv4 minor versions.
> 1a9f1e0 Dominique Martinet FSAL_PROXY: set thread names for logging
> 3b857f1 Dominique Martinet 9P lock: aquire state_lock properly
> 302ab52 Jeff Layton SAL: allow grace period to be lifted early if all
> clients have sent RECLAIM_COMPLETE
> 476c206 Jeff Layton FSAL_CEPH: do an inode lookup vs. MDS when the Inode
> is
> not in cache
> 08a953a Jeff Layton recovery_fs: ensure we free the cid_recov_tag when
> removing the entry
> 0f34e77 Jeff Layton recovery_fs: remove unnecessary conditionals from
> fs_read_recov_clids_impl
> 875495c Jeff Layton NFSv4: remove stable-storage client record on
> DESTROY_CLIENTID
> 35ba7dd Jeff Layton NFSv4: make cid_allow_reclaim a bool
> e016e58 Jeff Layton SAL: fix locking around clnt->cid_recov_tag
> 3e228c8 Jeff Layton SAL: remove check_clid recovery operation
> e132294 Jeff Layton SAL: 

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.17

2017-11-14 Thread Malahal Naineni
Marc, I just built and loaded V2.6-dev.17. I am able to mount and do "ls"
from the client. Did you run "git submodule update" and install libntirpc
as a separate RPM as well?

Regards, Malahal.

On Tue, Nov 14, 2017 at 11:06 PM, Marc Eshel <es...@us.ibm.com> wrote:

> I skipped a couple of dev releases, but now I get this when I try to
> mount.
>
> (gdb) c
> Continuing.
> [New Thread 0x71b3b700 (LWP 18716)]
> [New Thread 0x7ffe411ce700 (LWP 18717)]
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x71b3b700 (LWP 18716)]
> 0x75e13989 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
>
>
> From:   "Frank Filz" <ffilz...@mindspring.com>
> To: <nfs-ganesha-devel@lists.sourceforge.net>
> Date:   11/12/2017 09:36 PM
> Subject:[Nfs-ganesha-devel] Announce Push of V2.6-dev.17
>
>
>
> Branch next
>
> Tag:V2.6-dev.17
>
> Sorry for the delay on this, forgot to send before I came up from office
> on
> Friday...
>
> NOTE: This merge includes an ntirpc pullup, please update your submodule.
>
> Release Highlights
>
> * ntirpc pullup
>
> * SAL: Various cleanup of state recovery bits
>
> * SAL: allow grace period to be lifted early if all clients have sent
> RECLAIM_COMPLETE
>
> * CEPH: do an inode lookup vs. MDS when the Inode is not in cache
>
> * 9P lock: aquire state_lock properly
>
> * Set thread names in FSAL_PROXY and ntirpc initiated threads
>
> * Allow configuration of NFSv4 minor versions.
>
> * Lower message log level for a non-existent user
>
> * Fix cmake failure when /etc/os-release is not present
>
> * GLUSTER: glusterfs_create_export() SEGV for typo ganesha.conf
>
> * handle hosts via libcidr to unify IPv4/IPv4 host/network clients
>
> * Add some detail to config documentation
>
> * NFSv4.1+ return special invalid stateid on close per Section 8.2.3
>
> * Give temp fd in fsal_reopen_obj when verification fails for a fd's
> openflags
>
> * GPFS: Set a FDs 'openflags=FSAL_O_CLOSED' when fd=-1 is set
>
> * Various RPC callback and timeout fixes
>
> Signed-off-by: Frank S. Filz <ffilz...@mindspring.com>
>
> Contents:
>
> d8e89f7 Frank S. Filz V2.6-dev.17
> 542ea90 William Allen Simpson Pull up NTIRPC through #91
> 645f410 Madhu Thorat [GPFS] Set a FDs 'openflags=FSAL_O_CLOSED' when fd=-1
> is set
> 482672a Madhu Thorat Give temp fd in fsal_reopen_obj when verification
> fails
> for a fd's openflags
> 05ade07 Frank S. Filz NFSv4.1+ return special invalid stateid on close per
> Section 8.2.3
> 9bd00bd Frank S. Filz Add some detail to config documentation
> 5ca449d Jan-Martin Rämer handle hosts via libcidr to unify IPv4/IPv4
> host/network clients
> 0819dc4 Kaleb S. KEITHLEY fsal_gluster: glusterfs_create_export() SEGV for
> typo ganesha.conf
> a5da1a0 Malahal Naineni Fix cmake failure when /etc/os-release is not
> present
> ed4bace Malahal Naineni Lower message log level for a non-existent user
> abcd932 Malahal Naineni Allow configuration of NFSv4 minor versions.
> 1a9f1e0 Dominique Martinet FSAL_PROXY: set thread names for logging
> 3b857f1 Dominique Martinet 9P lock: aquire state_lock properly
> 302ab52 Jeff Layton SAL: allow grace period to be lifted early if all
> clients have sent RECLAIM_COMPLETE
> 476c206 Jeff Layton FSAL_CEPH: do an inode lookup vs. MDS when the Inode
> is
> not in cache
> 08a953a Jeff Layton recovery_fs: ensure we free the cid_recov_tag when
> removing the entry
> 0f34e77 Jeff Layton recovery_fs: remove unnecessary conditionals from
> fs_read_recov_clids_impl
> 875495c Jeff Layton NFSv4: remove stable-storage client record on
> DESTROY_CLIENTID
> 35ba7dd Jeff Layton NFSv4: make cid_allow_reclaim a bool
> e016e58 Jeff Layton SAL: fix locking around clnt->cid_recov_tag
> 3e228c8 Jeff Layton SAL: remove check_clid recovery operation
> e132294 Jeff Layton SAL: clean up nfs4_check_deleg_reclaim a bit
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org!
> http://sdm.link/slashdot

Re: [Nfs-ganesha-devel] crash in jemalloc leading to a deadlock.

2017-11-08 Thread Malahal Naineni
backtrace_symbols_fd() takes the same buffer and size arguments as
backtrace_symbols(), but instead of returning an array of strings to the
caller, it writes the strings, one per line, to the file descriptor fd.
backtrace_symbols_fd() does not call malloc(3), and so can be employed in
situations where the latter function might fail.
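A minimal sketch of that approach (my names, not Ganesha's actual gsh_backtrace()): prime backtrace() once at startup — its first call can dlopen() libgcc, which allocates, exactly the path visible in the trace quoted further down — and then dump from the crash path with backtrace_symbols_fd(), which never calls malloc:

```c
#include <execinfo.h>

#define MAX_FRAMES 64

/* Call once at startup: the first backtrace() call may dlopen() libgcc,
 * which allocates memory; priming here keeps later calls malloc-free. */
static void backtrace_prime(void)
{
	void *frames[1];

	(void)backtrace(frames, 1);
}

/* Dump the current call stack to fd. backtrace_symbols_fd() writes one
 * symbol per line directly to the descriptor and does not call malloc(3),
 * so this is safe even if the allocator's locks are held. */
static int dump_backtrace(int fd)
{
	void *frames[MAX_FRAMES];
	int n = backtrace(frames, MAX_FRAMES);

	backtrace_symbols_fd(frames, n, fd);
	return n;
}
```

In a real crash handler, dump_backtrace(STDERR_FILENO) would replace any allocation-based formatting.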

On Thu, Nov 9, 2017 at 12:24 AM, Daniel Gryniewicz  wrote:

> Allocating in a backtrace seems like a very bad idea.  If there's ever a
> crash during an allocation, it is guaranteed to deadlock.
>
> Daniel
>
>
> On 11/08/2017 01:43 PM, Pradeep wrote:
>
>> I'm using Ganesha 2.6-dev.12 with jemalloc-3.6.0 and hitting a case
>> where jemalloc seems to be holding a lock and crashing. In Ganesha's
>> gsh_backtrace(), we try to allocate memory and that hangs (we ended up in
>> a deadlock). Have you seen this before? Perhaps it is a good idea not to
>> allocate memory in the backtrace path?
>>
>>
>> #0 0x7f49b51ff1bd in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1 0x7f49b51fad02 in _L_lock_791 () from /lib64/libpthread.so.0
>> #2 0x7f49b51fac08 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3 0x7f49b65d12dc in arena_bin_malloc_hard () from
>> /lib64/libjemalloc.so.1
>> #4 0x7f49b65d1516 in je_arena_tcache_fill_small () from
>> /lib64/libjemalloc.so.1
>> #5 0x7f49b65ea6ff in je_tcache_alloc_small_hard () from
>> /lib64/libjemalloc.so.1
>> #6 0x7f49b65ca14f in malloc () from /lib64/libjemalloc.so.1
>> #7 0x7f49b6c5a785 in _dl_scope_free () from
>> /lib64/ld-linux-x86-64.so.2
>> #8 0x7f49b6c55841 in _dl_map_object_deps () from
>> /lib64/ld-linux-x86-64.so.2
>> #9 0x7f49b6c5ba4b in dl_open_worker () from
>> /lib64/ld-linux-x86-64.so.2
>> #10 0x7f49b6c57364 in _dl_catch_error () from
>> /lib64/ld-linux-x86-64.so.2
>> #11 0x7f49b6c5b35b in _dl_open () from /lib64/ld-linux-x86-64.so.2
>> #12 0x7f49b48f5ff2 in do_dlopen () from /lib64/libc.so.6
>> #13 0x7f49b6c57364 in _dl_catch_error () from
>> /lib64/ld-linux-x86-64.so.2
>> #14 0x7f49b48f60b2 in __libc_dlopen_mode () from /lib64/libc.so.6
>> #15 0x7f49b48cf595 in init () from /lib64/libc.so.6
>> #16 0x7f49b51fdbb0 in pthread_once () from /lib64/libpthread.so.0
>> #17 0x7f49b48cf6ac in backtrace () from /lib64/libc.so.6
>> #18 0x0045193d in gsh_backtrace () at
>> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:228
>> #19 0x004519fe in crash_handler (signo=11,
>> info=0x7f49b155db70, ctx=0x7f49b155da40) at
>> /usr/src/debug/nfs-ganesha-2.6-dev.12/MainNFSD/nfs_init.c:244
>> #20 
>> #21 0x7f49b65d0c61 in arena_purge () from /lib64/libjemalloc.so.1
>> #22 0x7f49b65d218d in je_arena_dalloc_large () from
>> /lib64/libjemalloc.so.1
>>
>> 
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>
>>
>
>


Re: [Nfs-ganesha-devel] CI failures

2017-11-02 Thread Malahal Naineni
I think there are two threads, one calling state_unlock() and the other
calling state_lock(). The latter doesn't acquire the state_lock, leading to
the current crash. My patch is not the culprit. Based on my code reading, the
caller of state_lock() is expected to acquire the state_lock if needed. That
is what NFS seems to do.

regards, malahal.

On Thu, Nov 2, 2017 at 9:28 PM, Daniel Gryniewicz  wrote:

> On 11/02/2017 11:46 AM, Frank Filz wrote:
>
>> Ok, so this patch: https://review.gerrithub.io/#/c/385433/ has a real
>> failure visible, however, it clearly has nothing to do with the patch at
>> hand.
>>
>> How do we want to handle that for merge? The patch clearly is ready for
>> merge, but with a -1 Verify, if we're going to make this verification
>> stuff
>> meaningful, we can't proceed.
>>
>> Frank
>>
>>
> It's a use-after-free on the state lock.  As such, it may be caused by the
> previous commit in the sequence:
>
> https://review.gerrithub.io/#/c/385104/
>
> Daniel
>
>


Re: [Nfs-ganesha-devel] Backport list for 2.5.4

2017-11-02 Thread Malahal Naineni
Dan, I remember that we waited for the recovery code (aka IP failover code)
reorganization patches to go into V2.6 alone. Do they now have enough
runtime to get merged into V2.5 stable branch?

Regards, Malahal.

On Tue, Oct 31, 2017 at 11:37 PM, Daniel Gryniewicz  wrote:

> Here's the set of commits that downstream Ceph needs.  Gluster can also
> use the non-Ceph related ones.
>
> Note, these are oldest first, not newest first.
>
> Daniel
>
>
> commit b862fe360b2a0f1b1d9d5d6a8b91f1550b66b269
> Author: Gui Hecheng 
> AuthorDate: Thu Mar 30 10:44:25 2017 +0800
> Commit: Frank S. Filz 
> CommitDate: Fri Aug 11 14:31:22 2017 -0700
>
> SAL: extract fs logic from nfs4_recovery
>
> This is a prepare patch for modulized recovery backends.
> - define recovery apis: struct nfs_recovery_backend
> - define hooks for recovery_fs module
>
> Change-Id: I45523ef9a0e6f9a801fc733b095ba2965dd8751b
> Signed-off-by: Gui Hecheng 
> commit cb787a1cf4a4df4da672c6b00cb0724db5d99e4d
> Author: Gui Hecheng 
> AuthorDate: Thu Mar 30 10:50:18 2017 +0800
> Commit: Frank S. Filz 
> CommitDate: Fri Aug 11 14:31:23 2017 -0700
>
> SAL: introduce new recovery backend based on rados kv store
>
> Use rados OMAP API to implement a kv store for client tracking data
>
> Change-Id: I1aec1e110a2fba87ae39a1439818a363b6cfc822
> Signed-off-by: Gui Hecheng 
> commit fbc905015d01a7f2548b81d84f35b76524543f13
> Author: Gui Hecheng 
> AuthorDate: Wed May 3 09:58:34 2017 +0800
> Commit: Frank S. Filz 
> CommitDate: Fri Aug 11 14:31:23 2017 -0700
>
> cmake: make modulized recovery backends compile as modules
>
> - add USE_RADOS_RECOV option for new rados kv backend
> - keep original fs backend as default
>
> Change-Id: I26c2c4f9a433e6cd70f113fa05194d6817b9377a
> Signed-off-by: Gui Hecheng 
> commit eb4eea1343251f17fe39de48426bc4363eaef957
> Author: Gui Hecheng 
> AuthorDate: Thu May 4 22:43:17 2017 +0800
> Commit: Frank S. Filz 
> CommitDate: Fri Aug 11 14:31:23 2017 -0700
>
> config: add new config options for rados_kv recovery backend
>
> - new config block: RADOS_KV
> - new option: ceph_conf, userid, pool
>
> Change-Id: Id44afa70e8b5adb2cb2b9d48a807b0046f604f30
> Signed-off-by: Gui Hecheng 
> commit f7a09d87851f64a68c2438fdc09372703bcbebec
> Author: Matt Benjamin 
> AuthorDate: Thu Jul 20 15:21:00 2017 -0400
> Commit: Frank S. Filz 
> CommitDate: Thu Aug 17 14:46:29 2017 -0700
>
> config: add config_url and RADOS url provider
>
> Provides a mechanism to to load nfs-ganesha config sections (e.g.,
> export blocks) from a generic URL.  Includes a URL provider
> which maps URLs to Ceph RADOS objects.
>
> Change-Id: I9067eaef2b38a78e9f1a877dfb9eb3c176239e71
> Signed-off-by: Matt Benjamin 
> commit b6ce63479c965c12d2d3417abd1dd082cf0967b8
> Author: Matt Benjamin 
> AuthorDate: Fri Sep 22 14:21:46 2017 -0400
> Commit: Frank S. Filz 
> CommitDate: Fri Sep 22 14:06:12 2017 -0700
>
> rpm spec: add RADOS_URLS
>
> Change-Id: I60ebd4cb5bc3b3184704b8951a5392ed91846cdd
> Signed-off-by: Matt Benjamin 
> commit 247c4a61cd743e7b3430bb0a9780c3f6d3f73a44
> Author: Matt Benjamin 
> AuthorDate: Fri Sep 22 15:38:37 2017 -0400
> Commit: Frank S. Filz 
> CommitDate: Fri Sep 22 14:06:28 2017 -0700
>
> rados url: handle error from rados_read()
>
> Change-Id: If437a989ddaea108216c28af99fab6da0f089e01
> Signed-off-by: Matt Benjamin 
> commit d9f0536b7f3cbe6b9b4d0dc5b4e4acd3337d41b5
> Author: Jeff Layton 
> AuthorDate: Fri Oct 6 14:23:23 2017 -0400
> Commit: Frank S. Filz 
> CommitDate: Fri Oct 6 14:26:45 2017 -0700
>
> FSAL_CEPH: don't clobber the return code with the getlk call
>
> If a lock is denied, the code will call getlk to get the conflicting
> lock
> info. That action then clobbers the return code and makes the lock
> appear
> to be a success.
>
> Also, no need to check conflicting_lock twice here.
>
> See: https://github.com/nfs-ganesha/nfs-ganesha/issues/205
>
> Change-Id: Ibfc8ca92bec84518573f425131ce969479ae15dd
> Signed-off-by: Jeff Layton 
> commit 13a2f2dce7aff5cc86bdb96b058cbb4d20898b66
> Author: Matt Benjamin 
> AuthorDate: Thu Oct 19 12:58:27 2017 -0400
> Commit: Frank S. 

Re: [Nfs-ganesha-devel] ABBA deadlock in 2.5 (likely in 2.6 as well)

2017-10-23 Thread Malahal Naineni
Sounds good, will give it a try with nasty "racer". Thank you Dan.

Regards, Malahal.

On Mon, Oct 23, 2017 at 8:27 PM, Daniel Gryniewicz <d...@redhat.com> wrote:

> Maybe something like this:
>
> https://paste.fedoraproject.org/paste/CptGkmoRutBKYjno5FiSjg/
>
> Daniel
>
>
> On 10/23/2017 10:13 AM, Malahal Naineni wrote:
>
>> Let us say we have an X/Y path in the file system. We have an attr_lock and
>> a content_lock on each object. The locks we are interested in here are the
>> attr_lock on X (hereafter referred to as AX) and the content_lock on X
>> (hereafter CX). Similarly, we have AY and CY for the object named Y.
>>
>> 1. Thread 50 (lookup called for X) takes AX and waits for CX (attr_lock
>> followed by content_lock is the expected order)
>>
>> 2. Thread 251 (readdirplus on X) takes CX, AY and then waits for CY for
>> processing object Y
>>
>> 3. Thread 132 (readdirplus on Y) takes CY, and then waits for AX (this
>> is due to lookup of parent)
>>
>> A classic dining-philosophers cycle: 1 waits for 2, 2 waits for 3, and 3
>> waits for 1. The lock ordering for an object's attr_lock and content_lock
>> is attr_lock followed by content_lock. We could require that parent
>> locks be acquired before child locks, but DOTDOT appears as a
>> child in readdirplus/readdir. If we can handle the parent differently, we
>> might be OK. Any help would be appreciated.
>>
>> Regards, Malahal.
>>
>> (gdb) thread 50
>> [Switching to thread 50 (Thread 0x3fff6cffe850 (LWP 37851))]
>> #0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
>> /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
>> /lib64/libpthread.so.0
>> #1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffccc0541f0,
>> need_acl=false, invalidate=true)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FS
>> AL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1187
>> #2  0x101ae1b0 in mdcache_getattrs (obj_hdl=0x3ffccc054228,
>> attrs_out=0x3fff6cffcfb8)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FS
>> AL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1228
>> #3  0x100b9fdc in nfs_SetPostOpAttr (obj=0x3ffccc054228,
>> Fattr=0x3ffe64050dc8, attrs=0x0)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Pr
>> otocols/NFS/nfs_proto_tools.c:91
>> #4  0x100c6ba8 in nfs3_lookup (arg=0x3ffa9ff04780,
>> req=0x3ffa9ff03f78, res=0x3ffe64050d50)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Pr
>> otocols/NFS/nfs3_lookup.c:131
>> #5  0x10065220 in nfs_rpc_execute (reqdata=0x3ffa9ff03f50)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Ma
>> inNFSD/nfs_worker_thread.c:1290
>> #6  0x10065c9c in worker_run (ctx=0x10013c1d3f0)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Ma
>> inNFSD/nfs_worker_thread.c:1562
>> #7  0x101670f4 in fridgethr_start_routine (arg=0x10013c1d3f0)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/su
>> pport/fridgethr.c:550
>> #8  0x3fff8a4fc2bc in .start_thread () from /lib64/libpthread.so.0
>> #9  0x3fff8a31b304 in .__clone () from /lib64/libc.so.6
>> (gdb) frame 1
>> #1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffccc0541f0,
>> need_acl=false, invalidate=true)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FS
>> AL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1187
>> 1187        PTHREAD_RWLOCK_wrlock(&entry->content_lock);
>>
>> (gdb) p entry
>> $1 = (mdcache_entry_t *) 0x3ffccc0541f0
>> (gdb) p entry->content_lock
>> $2 = {__data = {__lock = 0, __nr_readers = 1, __readers_wakeup = 2408,
>> __writer_wakeup = 4494, __nr_readers_queued = 0,
>>  __nr_writers_queued = 6, __writer = 0, __shared = 0, __pad1 = 0,
>> __pad2 = 0, __flags = 0},
>>__size = "\000\000\000\000\000\000\000\001\000\000\th\000\000\021\216
>> \000\000\000\000\000\000\000\006", '\000' , __align =
>> 1}
>> (gdb) thread 251
>> [Switching to thread 251 (Thread 0x3fff2e7fe850 (LWP 37976))]
>> #0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
>> /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
>> /lib64/libpthread.so.0
>> #1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffc30041bd0,
>> need_acl=false, invalidate=true)
>>  at /usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FS

[Nfs-ganesha-devel] ABBA deadlock in 2.5 (likely in 2.6 as well)

2017-10-23 Thread Malahal Naineni
Let us say we have an X/Y path in the file system. We have an attr_lock and
a content_lock on each object. The locks we are interested in here are the
attr_lock on X (hereafter referred to as AX) and the content_lock on X
(hereafter CX). Similarly, we have AY and CY for the object named Y.

1. Thread 50 (lookup called for X) takes AX and waits for CX (attr_lock
   followed by content_lock is the expected order)

2. Thread 251 (readdirplus on X) takes CX, AY and then waits for CY for
   processing object Y

3. Thread 132 (readdirplus on Y) takes CY, and then waits for AX (this
   is due to lookup of parent)

A classic dining-philosophers cycle: 1 waits for 2, 2 waits for 3, and 3
waits for 1. The lock ordering for an object's attr_lock and content_lock
is attr_lock followed by content_lock. We could require that parent
locks be acquired before child locks, but DOTDOT appears as a
child in readdirplus/readdir. If we can handle the parent differently, we
might be OK. Any help would be appreciated.

Regards, Malahal.

(gdb) thread 50
[Switching to thread 50 (Thread 0x3fff6cffe850 (LWP 37851))]
#0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
(gdb) bt
#0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffccc0541f0,
need_acl=false, invalidate=true)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1187
#2  0x101ae1b0 in mdcache_getattrs (obj_hdl=0x3ffccc054228,
attrs_out=0x3fff6cffcfb8)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1228
#3  0x100b9fdc in nfs_SetPostOpAttr (obj=0x3ffccc054228,
Fattr=0x3ffe64050dc8, attrs=0x0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Protocols/NFS/nfs_proto_tools.c:91
#4  0x100c6ba8 in nfs3_lookup (arg=0x3ffa9ff04780,
req=0x3ffa9ff03f78, res=0x3ffe64050d50)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/Protocols/NFS/nfs3_lookup.c:131
#5  0x10065220 in nfs_rpc_execute (reqdata=0x3ffa9ff03f50)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
#6  0x10065c9c in worker_run (ctx=0x10013c1d3f0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
#7  0x101670f4 in fridgethr_start_routine (arg=0x10013c1d3f0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/support/fridgethr.c:550
#8  0x3fff8a4fc2bc in .start_thread () from /lib64/libpthread.so.0
#9  0x3fff8a31b304 in .__clone () from /lib64/libc.so.6
(gdb) frame 1
#1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffccc0541f0,
need_acl=false, invalidate=true)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1187
1187 PTHREAD_RWLOCK_wrlock(&entry->content_lock);
(gdb) p entry
$1 = (mdcache_entry_t *) 0x3ffccc0541f0
(gdb) p entry->content_lock
$2 = {__data = {__lock = 0, __nr_readers = 1, __readers_wakeup = 2408,
__writer_wakeup = 4494, __nr_readers_queued = 0,
__nr_writers_queued = 6, __writer = 0, __shared = 0, __pad1 = 0, __pad2
= 0, __flags = 0},
  __size =
"\000\000\000\000\000\000\000\001\000\000\th\000\000\021\216\000\000\000\000\000\000\000\006",
'\000' , __align = 1}
(gdb) thread 251
[Switching to thread 251 (Thread 0x3fff2e7fe850 (LWP 37976))]
#0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
(gdb) bt
#0  0x3fff8a50089c in .__pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1  0x101adaa4 in mdcache_refresh_attrs (entry=0x3ffc30041bd0,
need_acl=false, invalidate=true)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1187
#2  0x101ae1b0 in mdcache_getattrs (obj_hdl=0x3ffc30041c08,
attrs_out=0x3fff2e7fcb58)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1228
#3  0x101c0f50 in mdcache_readdir_chunked
(directory=0x3ffccc0541f0, whence=1298220731,
dir_state=0x3fff2e7fcee8, cb=@0x1024b040: 0x1003f524 ,
attrmask=122830, eod_met=0x3fff2e7fd01c)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3047
#4  0x101ab998 in mdcache_readdir (dir_hdl=0x3ffccc054228,
whence=0x3fff2e7fcfa0, dir_state=0x3fff2e7fcee8,
cb=@0x1024b040: 0x1003f524 , attrmask=122830,
eod_met=0x3fff2e7fd01c)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm008.20M6-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:637
#5  0x10040090 in fsal_readdir (directory=0x3ffccc054228,
cookie=1298220731, nbfound=0x3fff2e7fd018,
eod_met=0x3fff2e7fd01c, attrmask=122830, cb=@0x10250bb0: 0x100cabf4
,
opaque=0x3fff2e7fd038) at

Re: [Nfs-ganesha-devel] The life of tcp drc

2017-10-13 Thread Malahal Naineni
#1. Looks like a bug! Lines 629 and 630 should be deleted.
#2. See nfs_rpc_free_user_data(). It sets xp_u2 to NULL, and the drc ref is
decremented there.
#3. The lifetime of the drc should start when it is allocated
in nfs_dupreq_get_drc() using alloc_tcp_drc().
  It can live beyond the xprt's xp_u2 being set to NULL, and lives until
we decide to free it in drc_free_expired() using free_tcp_drc().

Regards, Malahal.
PS: The comment "drc cache maintains a ref count." seems to imply that it
will hold a refcount just for being in the hash table itself. I may have
kept those two lines because of that, but it doesn't make sense, as the
refcnt would never go to zero that way.
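The get/put discipline implied above can be sketched as follows; drc_t and these function names are illustrative, not Ganesha's actual API, and for simplicity the last put frees the object directly rather than parking it on a recycle queue as the real code does:

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct drc {
	pthread_mutex_t mtx;
	int refcnt;
} drc_t;

/* Allocate with exactly one ref, owned by the creator (the xprt). */
static drc_t *drc_alloc(void)
{
	drc_t *d = calloc(1, sizeof(*d));

	pthread_mutex_init(&d->mtx, NULL);
	d->refcnt = 1;
	return d;
}

/* Every additional holder (each in-flight request) takes its own ref. */
static void drc_get(drc_t *d)
{
	pthread_mutex_lock(&d->mtx);
	d->refcnt++;
	pthread_mutex_unlock(&d->mtx);
}

/* Returns 1 if this put dropped the last ref and freed the drc. The hash
 * table itself holds no standing ref, so the count really can reach zero. */
static int drc_put(drc_t *d)
{
	int last;

	pthread_mutex_lock(&d->mtx);
	last = (--d->refcnt == 0);
	pthread_mutex_unlock(&d->mtx);
	if (last) {
		pthread_mutex_destroy(&d->mtx);
		free(d);
	}
	return last;
}
```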

On Thu, Oct 12, 2017 at 3:48 PM, Kinglong Mee  wrote:

> Describes in src/RPCAL/nfs_dupreq.c,
>
>  * The life of tcp drc: it gets allocated when we process the first
>  * request on the connection. It is put into rbtree (tcp_drc_recycle_t).
>  * drc cache maintains a ref count. Every request as well as the xprt
>  * holds a ref count. Its ref count should go to zero when the
>  * connection's xprt gets freed (all requests should be completed on the
>  * xprt by this time). When the ref count goes to zero, it is also put
>  * into a recycle queue (tcp_drc_recycle_q). When a reconnection
>  * happens, we hope to find the same drc that was used before, and the
>  * ref count goes up again. At the same time, the drc will be removed
>  * from the recycle queue. Only drc's with ref count zero end up in the
>  * recycle queue. If a reconnection doesn't happen in time, the drc gets
>  * freed by drc_free_expired() after some period of inactivety.
>
> Some questions about the life time of tcp drc,
> 1. The are two references of drc for xprt in nfs_dupreq_get_drc().
>629 /* xprt ref */
>630 drc->refcnt = 1;
>...
>638 (void)nfs_dupreq_ref_drc(drc);  /* xprt ref
> */
>...
>653 req->rq_xprt->xp_u2 = (void *)drc;
>
>    I think it's a bug. The first one needs to be removed. Right?
>
> 2. There is no place that decreases the reference of the drc for the xprt.
>    The xprt argument in nfs_dupreq_put_drc() is unused.
>    Should it be used to decrease the ref?
>    I think nfs_dupreq_put_drc() is the right place to decrease the ref.
>
> 3. My doubt is: what is the lifetime of the drc stored in req->rq_xprt->xp_u2?
>    Does it start at #1 and end at #2 (req->rq_xprt->xp_u2 = NULL)?
>    If so, the bad case is that we always look up the drc from tcp_drc_recycle_t.
>
>    Otherwise, if we don't put the reference at #2, when should we put it?
>    The bad case then is that the drc ref stays at 1 forever; am I right?
>
> thanks,
> Kinglong Mee
>
>


Re: [Nfs-ganesha-devel] V2.5-stable maintenance

2017-10-07 Thread Malahal Naineni
Soumya, I added that commit in addition to what I posted. I tried merging all
34 commits, but a couple of them failed, so I left out the following. If you
need any of these, please let me know!

1) 09303d9b1 FSAL_PROXY : storing stateid from background NFS server
Merge was successful, but compilation failed. Looks like it needs some
other commit(s) as well.

2) d89d67db2 nfs: fix error handling in nfs_rpc_v41_single
Merge failed but upon further inspection, this is NOT applicable to
V2.5-stable.

The other 32 commits were all cherry-picked, with a few needing merge-conflict
resolution. Here is the branch:
https://github.com/malahal/nfs-ganesha/commits/V2.5-stable

I will publish it early next week (and may take a few commits from the dev.13
tag as well) after some trivial testing!

Regards, Malahal.

On Thu, Oct 5, 2017 at 8:32 PM, Soumya Koduri <skod...@redhat.com> wrote:

> Hi Malahal,
>
> On 10/05/2017 09:06 AM, Malahal Naineni wrote:
>
>> 85bd9217d GLUSTER: make sure to free xstat when meeting error
>>
>
>
> Before applying the above patch, I request to backport below commit as
> well -
>
> 39119aa FSAL_GLUSTER: Use glfs_xreaddirplus_r for readdir
>
> Thanks,
> Soumya
>


Re: [Nfs-ganesha-devel] V2.5-stable maintenance

2017-10-06 Thread Malahal Naineni
I use "git log --grep=" to get all the commits and then run "git
branch --contains" on each returned commit. We should be able to make use
of gerrit's Change-Id for this purpose as well. A little cumbersome, but this
is better than maintaining another file that we update manually.

"git notes" is another option!
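A rough sketch of that workflow as a shell helper (the function name and the example pattern are placeholders; it assumes it runs inside a clone with the relevant branches present):

```shell
# List, for each commit whose message matches the given pattern, the
# local branches that already contain it -- so commits missing from a
# stable branch stand out at a glance.
list_backports() {
	pattern=$1
	for c in $(git log --all --grep="$pattern" --format=%H); do
		printf '%s: %s\n' "$c" \
			"$(git branch --contains "$c" | tr -d ' *' | paste -sd, -)"
	done
}
```

For example, `list_backports backport` prints one line per matching commit, pairing its hash with a comma-separated branch list.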

Regards, Malahal.

On Fri, Oct 6, 2017 at 2:13 AM, Frank Filz <ffilz...@mindspring.com> wrote:

> Hmm, I wonder if what would work best is to have a Google Spreadsheet with
> column 1 listing the commits in next, and columns to indicate if backport
> is required or complete for each older stable version we care about?
>
>
>
> I know when handling support issues it would be really helpful to know if
> a given fix was backported or not…
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Wednesday, October 4, 2017 8:52 PM
> *To:* Frank Filz <ffilz...@mindspring.com>
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* Re: [Nfs-ganesha-devel] V2.5-stable maintenance
>
>
>
> It would be nice to "tag" commits that we want to back port with git notes
> or some such thing.
>
>
>
> Last week's V2.6 commits that need backporting will be ported this week (a
> week behind in time to allow some testing in V2.6)
>
> Whoever is going to tag the release can "cherry-pick" the commits (any
> merge conflicts will be coordinated with the author).
>
>
>
> Regards, Malahal.
>
>
>
> On Thu, Oct 5, 2017 at 3:09 AM, Frank Filz <ffilz...@mindspring.com>
> wrote:
>
> When we first talked about stable maintenance for V2.5, it seemed like IBM
> was most likely to be using V2.5 and we elected Malahal to be V2.5-stable
> maintainer.
>
> It now seems we have pressure from other fronts for V2.5 stable fixes, I'd
> like to open the discussion on how best to manage it? I'd be happy to do
> additional backports and V2.5-stable tagging, or we could ask Kaleb to do
> it, or we can operate on a "make requests to Malahal" procedure.
>
> Any input here is welcome, I want to make sure we respect everyone's needs
> for V2.5-stable.
>
> Frank
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
>
>
>
>
>


Re: [Nfs-ganesha-devel] V2.5-stable maintenance

2017-10-04 Thread Malahal Naineni
It would be nice to "tag" commits that we want to back port with git notes
or some such thing.

Last week's V2.6 commits that need backporting will be ported this week (a
week behind in time to allow some testing in V2.6)
Whoever is going to tag the release can "cherry-pick" the commits (any
merge conflicts will be coordinated with the author).

Regards, Malahal.

On Thu, Oct 5, 2017 at 3:09 AM, Frank Filz  wrote:

> When we first talked about stable maintenance for V2.5, it seemed like IBM
> was most likely to be using V2.5 and we elected Malahal to be V2.5-stable
> maintainer.
>
> It now seems we have pressure from other fronts for V2.5 stable fixes, I'd
> like to open the discussion on how best to manage it? I'd be happy to do
> additional backports and V2.5-stable tagging, or we could ask Kaleb to do
> it, or we can operate on a "make requests to Malahal" procedure.
>
> Any input here is welcome, I want to make sure we respect everyone's needs
> for V2.5-stable.
>
> Frank
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
>


Re: [Nfs-ganesha-devel] V2.5-stable maintenance

2017-10-04 Thread Malahal Naineni
I haven't received any requests from anyone, so I haven't made any progress
here! This is what my quick look at the commits indicated. I added git
notes to these commits in my github account repo if you want to check. Here
is the list, let me know if you need any additions or removals. I haven't
attempted any merges yet, so I may add some trivial commits to this list to
make 'cherry-pick' happy!

$ git notes --ref=backport list | awk '{print $2}' | xargs -n1 git log -1
--oneline
09303d9b1 FSAL_PROXY : storing stateid from background NFS server
09c74126a nfs: fix error handling in layoutrecall code
0d045c355 GPFS_FSAL : Collect and extract performance stats
14250b586 Fixup request_mask handling in mdcache_refresh_attrs
32c723a0c [GPFS] read_dirents: check status of FD gathering instead of FD
itself.
40ee79dbf packaging (rpm): /var/log/ganesha has incorrect owner (selinux)
42abee8a1 Fix Dispatch_Max_Reqs max value in documentation.
55b5fc62c FSAL_GLUSTER: Free glfs object in case of export creation failures
57c9c3032 Fix open_for_locks parameter in gpfs_lock_op2
698ce898b Fix rpc-statd.service path on debian
7934673a5 Create v4.1+ openowner with confirmed flag already set
7d0629b3b Fix dec_client_record_ref accessing freed memory
7f95da09b [GPFS] return first failure from gpfs_read2 and gpfs_write2
85bd9217d GLUSTER: make sure to free xstat when meeting error
8b67807c5 Use 'v6disabled' flag to know if IPv6 is disabled
8e7b14d75 [GPFS] Removed get_my_fd()
9fbb6bc1a [GPFS] remove duplicate code in find_fd()
a5d2eb13b make the LIBEXECDIR valid for distro Debian or Ubuntu
a5fc5c1ad setclientid: free clientid if client_r_addr is too long.
a635a4b1f FSAL_PROXY : manage clean close of PROXY threads
c0f2d9f31 nfs: nfs_rpc_get_chan must hold cid_mutex to walk sessions list
ca2a2143c FSAL_RGW depends on FSAL status method that it doesn't implement
cca16acb7 FSAL_PROXY : preserving request_mask in getattrs
ce56e157e Remove nfs_rpc_dispatch_stop
d7f1dc1ef GPFS_FSAL : Rectification of perf stats code
d89d67db2 nfs: fix error handling in nfs_rpc_v41_single
deccd5613 FSAL_GLUSTER: detach export in case of failures
e5db2a83d Fix sleep path for debian
f48772730 Fix to make sure op_ctx is set when calling mdcache_lru_unref().
f4e18dddf DBus: Shutdown dbus thread before closing the connection
f527e489c Allow cancellation of upcall threads even if not ready
f7b76ea1b Use state_lock to prevent race between FREE_STATEID and LOCK/new
lock owner
fea84d795 [GPFS] Check find_fd() return status in gpfs_lock_op2

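For completeness, attaching one of these notes is a one-liner; the sketch
below does it in a throwaway repo with a stand-in commit, then lists tagged
commits exactly as the command above does:

```shell
# Illustrative: add a "backport" note to a commit, then list all noted
# commits the same way as the command in the mail above.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email dev@example.com && git config user.name dev
git commit -q --allow-empty -m "FSAL_PROXY : storing stateid (stand-in)"
sha=$(git rev-parse HEAD)
git notes --ref=backport add -m "backport to V2.5" "$sha"
git notes --ref=backport list | awk '{print $2}' | xargs -n1 git log -1 --oneline
```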

On Thu, Oct 5, 2017 at 3:09 AM, Frank Filz  wrote:

> When we first talked about stable maintenance for V2.5, it seemed like IBM
> was most likely to be using V2.5 and we elected Malahal to be V2.5-stable
> maintainer.
>
> It now seems we have pressure from other fronts for V2.5 stable fixes, I'd
> like to open the discussion on how best to manage it? I'd be happy to do
> additional backports and V2.5-stable tagging, or we could ask Kaleb to do
> it, or we can operate on a "make requests to Malahal" procedure.
>
> Any input here is welcome, I want to make sure we respect everyone's needs
> for V2.5-stable.
>
> Frank
>
>
>


Re: [Nfs-ganesha-devel] Multiple stateid issues

2017-09-22 Thread Malahal Naineni
Still chasing it, but we are running into clientid/client_record issues now.
I will have to chase the client_record bug we have; we seem to be using
freed memory. I posted a patch, but that didn't help.

Regards, Malahal.

On Thu, Sep 21, 2017 at 11:08 AM, Frank Filz 
wrote:

> Malahal reported that it's possible to have multiple lock stateids for the
> same lock owner/file combination.
>
> I'm not quite sure where the problem is, but here's a fix that will prevent
> races between FREE_STATEID and LOCK/new lock owner for NFS v4.1.
>
> Malahal, if you could find out more about what the path in your case is,
> perhaps we can amend this new logic to fix it. There must be some issue
> with
> LOCK/new lock owner, though I know you are on 4.0 which doesn't have
> FREE_STATEID.
>
> Interestingly, in 4.0, if all locks had been released on a stateid, and
> then
> the client issued a LOCK/new lock owner, and that LOCK request fails,
> Ganesha will NOW free the stateid... (that was true before my patch - we
> always delete the stateid on lock failure when new lock owner was
> requested,
> even if we found an existing stateid).
>
> I wonder if we have any path where 4.0 issues concurrent LOCK/new lock
> owner
> requests? I don't think that should happen because of the open owner seqid
> sequencing, but maybe it can happen. In that case, if one lock succeeded
> and
> the 2nd failed, the stateid could be destroyed along with the held lock...
>
> Here's my patch:
>
> https://review.gerrithub.io/#/c/379421/
>
>
>


Re: [Nfs-ganesha-devel] Proposal to manage global file descriptors

2017-09-22 Thread Malahal Naineni
The commit call also uses the global fd, and RHEL 6.3 clients do send
commits in your case. If you use a recent client, it is probably due to
getattr, though. Ganesha seems to close global file descriptors in the
reaper call if we set cache_fds to FALSE in our ganesha config. Since the
reaper runs only periodically, we should see little performance
degradation, if any, by closing global fds (with NFSv3) every 90 seconds or
so. That is the temporary workaround we are going to use in the short term.
Opening files and never closing them is a bad implementation...

Regards, Malahal.

On Fri, Sep 22, 2017 at 5:15 AM, Frank Filz  wrote:

> Philippe discovered that recent Ganesha will no longer allow compiling the
> linux kernel due to dangling open file descriptors.
>
> I'm not sure if there is any true leak, the simple test of echo foo >
> /mnt/foo does show a remaining open fd for /mnt/foo, however that is the
> global fd opened in the course of doing a getattrs on FSAL_ VFS.
>
> We have been talking about how the current management of open file
> descriptors doesn't really work, so I have a couple proposals:
>
> 1. We really should have a limit on the number of states we allow. Now that
> NLM locks and shares also have a state_t, it would be simple to have a
> count
> of how many are in use, and return a resource error if an operation
> requires
> creating a new one past the limit. This can be a hard limit with no grace,
> if the limit is hit, then alloc_state fails.
>
> 2. Management of the global fd is more complex, so here goes:
>
> Part of the proposal is a way for the FSAL to indicate that an FSAL call
> used the global fd in a way that consumes some kind of resource the FSAL
> would like managed.
>
> FSAL_PROXY should never indicate that (anonymous I/O should be done using a
> special stateid, and a simple file create should result in the open stateid
> immediately being closed, if that's not the case, then it's easy enough to
> indicate use of a limited resource.
>
> FSAL_VFS would indicate use of the resource any time it utilizes the global
> fd. If it uses a temp fd that is closed after performing the operation, it
> would not indicate use of the limited resource.
>
> FSAL_GPFS, FSAL_GLUSTER, and FSAL_CEPH should all be similar to FSAL_VFS.
>
> FSAL_RGW only has a global fd, and I don't quite understand how it is
> managed.
>
> The main part of the proposal is to actually create a new LRU queue for
> objects that are using the limited resource.
>
> If we are at the hard limit on the limited resource and an entry that is
> not
> already in the LRU uses the resource, then we would reap an existing entry
> and call fsal_close on it to release the resource. If an entry was not
> available to be reaped, we would temporarily exceed the limit just like we
> do with mdcache entries.
>
> If an FSAL call resulted in use of the resource and the entry was already
> in
> the resource LRU, then it would be bumped to MRU of L1.
>
> The LRU run thread for the resource would demote objects from LRU L1 to MRU
> of L2, and call fsal_close and remove objects from LRU of L2. I think it
> should work to close any files that have not been used in the amount of
> time, really using the L1 and L2 to give a shorter life to objects for
> which
> the resource is used once and then not used again, whereas a file that is
> accessed multiple times would have more resistance to being closed. I think
> the exact mechanics here may need some tuning, but that's the general idea.
>
> The idea here is to be constantly closing files that have not been accessed
> recently, and also to better manage a count of the files for which we are
> actually using the resources, and not keep a file open just because for
> some
> reason we do lots of lookups or stats of it (we might have to open it for
> getattrs, but then we might serve a bunch of cached attrs, which doesn't go
> to disk, might as well close the fd).
>
> I also propose making the limit for the resource configurable independent
> of
> the ulimit for file descriptors, though if an FSAL is loaded that actually
> uses file descriptors for open files should check that the ulimit is big
> enough, it should also include the limit on state_t also. Of course it will
> be impossible to account for file descriptors used for sockets, log files,
> config files, or random libraries that like to open files...
>
> The time has come to fix this...
>
> Frank
>
>
>
>

Re: [Nfs-ganesha-devel] Recommended stable release for NFS-Ganesha

2017-09-16 Thread Malahal Naineni
I just updated the wiki. 2.5.2 is the latest recommended release.

On Fri, Sep 15, 2017 at 2:30 AM, Madhu Venugopal <
madhu.venugo...@riverbed.com> wrote:

> Hi,
>
> I am writing to enquire about the recommended stable release for
> NFS-Ganesha. I see that 2.5.2 is out on https://github.com/nfs-
> ganesha/nfs-ganesha/releases. But the wiki page at https://github.com/nfs-
> ganesha/nfs-ganesha/wiki only talks about versions 2.3 and 2.4. It has a
> line in there saying : "The current 2.4 release is 2.4.1. We recommend
> users upgrade to this version.” Is this still the case? Or does 2.5.2 serve
> as the new stable release?
>
> We use NFS-Ganesha to serve ESXi datastores using NFS v3. Currently we are
> on version 2.2.0 and are looking to upgrade to a newer version. Hence the
> question.
>
> Thanks,
> Madhu
>
> 
>
>


Re: [Nfs-ganesha-devel] Continuing CI pain

2017-09-14 Thread Malahal Naineni
One of our colleagues ran into this with the RHEL 7.4 update. Our
ganesha.nfsd wasn't able to read our config files, which are stored in the
/var directory. :-(

On Thu, Sep 14, 2017 at 9:47 PM, Daniel Gryniewicz  wrote:

> As far as I know, this doesn't happen on Fedora, so it hasn't been
> reported anywhere.
>
> On Thu, Sep 14, 2017 at 6:58 AM, Niels de Vos  wrote:
> > On Wed, Sep 13, 2017 at 10:39:52AM +0200, Niels de Vos wrote:
> >> On Tue, Sep 12, 2017 at 06:41:49PM -0400, William Allen Simpson wrote:
> >> > On 9/12/17 6:06 PM, Frank Filz wrote:
> >> > > So this failure:
> >> > >
> >> > > https://ci.centos.org//job/nfs_ganesha_cthon04/1436/console
> >> > >
> >> > > Is an example of where we need some improvement. I looked at the
> top and
> >> > > scrolled down to the end. I have no idea why it failed. This is a
> case of
> >> > > too much information without a concise error report.
> >> > >
> >> > Installed:
> >> >   libntirpc.x86_64 0:1.6.0-dev.7.el7.centos
> >> >   nfs-ganesha.x86_64 0:2.6-dev.7.el7.centos
> >> >   nfs-ganesha-gluster.x86_64 0:2.6-dev.7.el7.centos
> >> >
> >> > Complete!
> >> > + systemctl start nfs-ganesha
> >> > Job for nfs-ganesha.service failed because the control process exited
> with
> >> > error code. See "systemctl status nfs-ganesha.service" and
> "journalctl -xe"
> >> > for details.
> >> > Build step 'Execute shell' marked build as failure
> >> > Finished: FAILURE
> >> >
> >> > ===
> >> >
> >> > Why not print "systemctl status nfs-ganesha.service" and "journalctl
> -xe"?
> >> >
> >> > Originally I assumed that it was some obscure problem with my code,
> but
> >> > then I looked around, and it seems to be all the submissions for
> >> > nfs_ganesha_cthon04 at the moment
> >>
> >> The additional information will now be logged as well. This change in
> >> the centos-ci branch does it:
> >>
> >>   https://github.com/nfs-ganesha/ci-tests/pull/14/files
> >>
> >> A (manually started) test run logs the errors more clearly:
> >>
> >>   https://ci.centos.org/job/nfs_ganesha_cthon04/1439/console
> >>
> >>
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org bash[20711]: 13/09/2017
> 09:33:41 : epoch 59b8ed65 : n9.pufty.ci.centos.org :
> ganesha.nfsd-20711[main] create_log_facility :LOG :CRIT :Cannot create new
> log file (/var/log/ganesha/ganesha.log), because: Permission denied
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org bash[20711]: 13/09/2017
> 09:33:41 : epoch 59b8ed65 : n9.pufty.ci.centos.org :
> ganesha.nfsd-20711[main] init_logging :LOG :FATAL :Create error (Permission
> denied) for FILE (/var/log/ganesha/ganesha.log) logging!
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org systemd[1]:
> nfs-ganesha.service: control process exited, code=exited status=2
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org systemd[1]: Failed to start
> NFS-Ganesha file server.
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org systemd[1]: Unit
> nfs-ganesha.service entered failed state.
> >> Sep 13 09:33:41 n9.pufty.ci.centos.org systemd[1]: nfs-ganesha.service
> failed.
> >>
> >>
> >> Why creating the logfile fail is not clear to me. Maybe something in the
> >> packaging was changed and the /var/log/ganesha/ directory is not
> >> writable for the ganesha.nfsd process anymore? Have changes for running
> >> as non-root been merged, maybe?
> >
> > This seems to be a problem with the CentOS rebuild of RHEL-7.4. The
> > CentOS CI gets the new packages before the release is made available for
> > all users. I have run a test with SELinux in Permissive mode, and this
> > passed just fine.
> >
> > https://ci.centos.org/job/nfs_ganesha_cthon04/1445/consoleFull
> >
> > As a temporary (hopefully!) solution, doing a 'setenforce 0' in the
> > preparation script should help here:
> >   https://github.com/nfs-ganesha/ci-tests/pull/15
> >
> > I would like to know if this problem has been reported against Fedora or
> > RHEL already. Once the bug is fixed in selinux-policy for RHEL, the
> > CentOS package will get an update soon after, and we can run our tests
> > with SELinux in Enforcing mode again.
> >
> > Thanks,
> > Niels
> >
> > 
>

Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-07 Thread Malahal Naineni
Last time I tried, I got the same. A thread was waiting in epoll_wait()
with a 29-second timeout; shutdown only completed after that timeout
expired.

On Fri, Sep 8, 2017 at 3:46 AM, Frank Filz  wrote:

> I wanted to see what is up with shutdown lately...
>
> Running under gdb, I hit a long pause, but shutdown is completing for me,
> during that pause, these are the active threads:
>
> (gdb) thread apply all bt
>
> Thread 276 (Thread 0x7fff6b2fb700 (LWP 5364)):
> #0  0x7638027d in nanosleep () from /lib64/libpthread.so.0
> #1  0x75f4f3dd in work_pool_shutdown (pool=0x76167ac0
> ) at
> /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:318
> #2  0x75f4116d in svc_shutdown (flags=0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc.c:811
> #3  0x0045a5c8 in do_shutdown () at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:512
> #4  0x0045a8b6 in admin_thread (UnusedArg=0x0) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:545
> #5  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #6  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 274 (Thread 0x7fff6c2fd700 (LWP 5362)):
> #0  0x7637fd9d in accept () from /lib64/libpthread.so.0
> #1  0x00458287 in _9p_dispatcher_thread (Arg=0x0) at
> /home/ffilz/ganesha/review/src/MainNFSD/9p_dispatcher.c:582
> #2  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #3  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 9 (Thread 0x77f0e700 (LWP 5083)):
> #0  0x75a49043 in epoll_wait () from /lib64/libc.so.6
> #1  0x75f45ce1 in svc_rqst_epoll_loop (sr_rec=0x72ece8c0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:893
> #2  0x75f45e1e in svc_rqst_run_task (wpe=0x72ece8d0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:945
> #3  0x75f4ede1 in work_pool_thread (arg=0x7fffeffd7080) at
> /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:171
> #4  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #5  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x723fe700 (LWP 5077)):
> #0  0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x004fe9cf in fridgethr_freeze (fr=0x72c44480,
> thr_ctx=0x72c13580) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2  0x004ff1f9 in fridgethr_start_routine (arg=0x72c13580) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x72bff700 (LWP 5076)):
> #0  0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x004fe9cf in fridgethr_freeze (fr=0x72c44480,
> thr_ctx=0x72c13300) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2  0x004ff1f9 in fridgethr_start_routine (arg=0x72c13300) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x77f43140 (LWP 5049)):
> #0  0x763786ad in pthread_join () from /lib64/libpthread.so.0
> #1  0x004529f1 in nfs_start (p_start_info=0x7d2ef8
> ) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_init.c:960
> #2  0x0041d253 in main (argc=8, argv=0x7fffe3d8) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_main.c:494
>
>
>


Re: [Nfs-ganesha-devel] V2.6 WRT14

2017-09-01 Thread Malahal Naineni
Hopefully, it fixes this valgrind warning I just got:

==17120== Thread 13:
==17120== Conditional jump or move depends on uninitialised value(s)
==17120==at 0x6886A15: svc_vc_recv (svc_vc.c:745)
==17120==by 0x6883573: svc_rqst_xprt_task (svc_rqst.c:683)
==17120==by 0x68839F3: svc_rqst_epoll_events (svc_rqst.c:856)
==17120==by 0x6883B43: svc_rqst_epoll_loop (svc_rqst.c:907)
==17120==by 0x6883C15: svc_rqst_run_task (svc_rqst.c:945)
==17120==by 0x688CC2B: work_pool_thread (work_pool.c:197)
==17120==by 0x6441DC4: start_thread (pthread_create.c:308)
==17120==by 0x6DB673C: clone (clone.S:113)


On Fri, Sep 1, 2017 at 6:56 PM, Daniel Gryniewicz  wrote:

> On 09/01/2017 07:09 AM, William Allen Simpson wrote:
>
>> On 8/30/17 1:34 PM, William Allen Simpson wrote:
>>
>>> On 8/28/17 1:23 AM, Frank Filz wrote:
>>>
 WRT14 is the test that failed that made me kick Bill’s patch out of
 dev.5, then I couldn’t get it to fail again, so I included the patch in
 dev.6.

 Since it turned out not to be a dev.5 issue.  Malahal is reporting
>>> dev.3.
>>>
>>> We should start a new thread, as this isn't about dev.6.
>>>
>>>
>>> I think there is something that is timing sensitive that is exposed, and
 maybe c291141 is the trigger.

 I need someone who can reliably re-create to dive into it… Maybe that
 will be me this week…

 Malahal says he's able to reliably reproduce on GPFS, was going to
>>> test VFS too.
>>>
>>> He also was going to send us his config, but we haven't seen that yet.
>>>
>>
>> DanG was able to reproduce.  Turns out the key was not using loopback to
>> test; required sending over an actual network to induce the timing.  Then
>> logging worked.
>>
>> It only occurs where WRT14 sends a minimal 4 byte XDR fragment after a
>> series of long ones, and the TCP segmentation happens to align.
>>
>> Although I'm not yet able to reproduce myself, the problem was obvious
>> walking through the code.  An accidental line deletion (setting a local
>> flags variable) in 1 of 3 paths.  (That line was present in earlier
>> versions.)  Because it is set properly in the most common path, the
>> compiler didn't give an uninitialized warning.
>>
>> Hopefully DanG will be able to verify, and we'll get it in today!
>>
>
> I've verified the fix.
>
> Daniel
>
> 
>


Re: [Nfs-ganesha-devel] Crash in TIRPC with Ganesha 2.6-dev.5

2017-09-01 Thread Malahal Naineni
Thank you, Bill. I hope we can get it backported to V2.5 as well.

Regards, Malahal.

On Fri, Sep 1, 2017 at 4:15 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 8/31/17 5:42 PM, William Allen Simpson wrote:
>
>> On 8/31/17 12:17 PM, Pradeep wrote:
>>
>>> Thanks Dan and Bill for the quick response. As Dan suggested, is moving
>>> svc_rqst_xprt_register() to the end​ of svc_vc_rendezvous()​ the right​ fix?
>>>
>>> Partly.  Also needs checking the error return.
>>
>> https://github.com/linuxbox2/ntirpc/commit/28d3e96d4a7296805
>> 2216303b496521b364898a6
>>
>> Tested.  Will submit tomorrow.
>>
>
> Simplified.  Just test the error return, no need for saving it in a local
> variable (removed code left over from verifying the problem).
>
> Submitted with another fix for a missing local variable initialization
> that may fix the WRT14 issue.
>
> Dan will review (and verify test) himself, hoping that this will make it
> into today's update.
>
>
> 
>


Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.6

2017-08-25 Thread Malahal Naineni
Hi Bill and Frank, I tried pynfs with the latest V2.6, and WRT14 fails for
me. It passed with dev-2 and failed with dev-3. The only suspect commit at
this point is c29114162bb553270835c8d51d4184ce8bb1ab32

Can someone verify whether WRT14 (st_write.testLargeWrite) passes for them?
Do we run pynfs as part of our CI tests? It looks like we don't!

On Sat, Aug 26, 2017 at 3:12 AM, Frank Filz  wrote:

> Branch next
>
> Tag:V2.6-dev.6
>
> Release Highlights
>
> * Remove FSAL_ZFS and libzfswrap
>
> * FSAL_PROXY : preserving request_mask in getattrs
>
> * FSAL_PROXY : add verbosity in EXCHANGE_ID
>
> * GLUSTER: make sure to free xstat when meeting error
>
> * DBUS interface for purging idmapper cache.
>
> * Napalm nfs_worker_thread NFS_REQUEST queue
>
> Signed-off-by: Frank S. Filz 
>
> Contents:
>
> 28ffcad Frank S. Filz V2.6-dev.6
> 24de99c William Allen Simpson Napalm nfs_worker_thread NFS_REQUEST queue
> b3666c6 Rishabh Sharma DBUS interface for purging idmapper cache.
> 85bd921 Kinglong Mee GLUSTER: make sure to free xstat when meeting error
> 82fcb41 Patrice LUCAS FSAL_PROXY : add verbosity in EXCHANGE_ID
> cca16ac Patrice LUCAS FSAL_PROXY : preserving request_mask in getattrs
> 9238fd9 Frank S. Filz Remove libzfswrap
> 10c9226 Frank S. Filz Strip FSAL_ZFS out
>
>
>


Re: [Nfs-ganesha-devel] V2.5.2

2017-08-24 Thread Malahal Naineni
Sorry, I created one locally but forgot to push it. Thank you.

On Thu, Aug 24, 2017 at 12:28 AM, Frank Filz 
wrote:

> Malahal pushed the commits for V2.5.2, but he didn't push a signed tag. In
> the interests of moving things along, I have pushed a signed tag for V2.5.2
>
> Frank
>
>
>


Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-17 Thread Malahal Naineni
I did, but the failover/failback code re-org looked like a contributed
change, so I am not positive.

On Thu, Aug 17, 2017 at 7:40 PM, Frank Filz <ffilz...@mindspring.com> wrote:

> Hmm, did you cherry pick in the original order?
>
>
>
> I’ll take a look at this later today.
>
>
>
> Frank
>
>
>
> From: Malahal Naineni [mailto:mala...@gmail.com]
> Sent: Wednesday, August 16, 2017 11:34 PM
> To: Matt Benjamin <mbenj...@redhat.com>
> Cc: Frank Filz <ffilz...@mindspring.com>; Soumya Koduri <
> skod...@redhat.com>; nfs-ganesha-devel <nfs-ganesha-devel@lists.
> sourceforge.net>
>
> Subject: Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
>
>
>
> Dan, I backported everything that was needed except the following 2 as I
> don't want to mess with cmake! Can you please quickly send ported patches?
> Appreciate your help. The latest V2.5 code is at  my personal github branch
> V2.5-stable:
>
>
>
> https://github.com/malahal/nfs-ganesha/commits/V2.5-stable
>
>
>
> The following 2 commits failed to apply:
>
>
>
> 6bd32da613e26a768ac1dc4db1001395bd10c295 CMake - Have 'make dist'
> generate the correct tarball name
>
> ff98ea64b6d1228443a35b2f7ceb3c61c0a0c1d1 Build libntirpc package when not
> using system ntirpc
>
>
>
>
>
>
>
> On Wed, Aug 16, 2017 at 10:47 PM, Matt Benjamin <mbenj...@redhat.com>
> wrote:
>
> Hi Frank,
>
Re: [Nfs-ganesha-devel] v2.6-dev-4 leaves 271 threads hanging around

2017-08-17 Thread Malahal Naineni
Bill, I tried to reproduce without gdb. It goes down after a few seconds
(around 30) due to svc_rqst_epoll_loop() waiting for about 29 seconds. I
tried with gdb as well, and it came out too. I saw only a few threads (about
10) after sending the signal. Can you tell me how I can reproduce without
'gdb'? (gpfs fsal has some issues with gdb at times..)

Regards, Malahal.

On Thu, Aug 17, 2017 at 4:56 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 8/17/17 7:19 AM, William Allen Simpson wrote:
>
>> On 8/15/17 11:53 AM, William Allen Simpson wrote:
>>
>>> Rather than spam the entire list, if anybody wants the gdb bt.  I can
>>> send the ganesha.log, too, but it's bigger.
>>>
>>> To test, rm the log, setup the libraries, gdb, run -F -- and on another
>>> connection pkill ganesha.  Nothing else.  That's always my first test.
>>>
>>
>> Retested with V2.6-dev.4a
>>
>> Took a long shower, 6:35-7:05, to ensure plenty of time.  Same result.
>> Exactly 271 dangling threads, almost all waiting in nfs_rpc_dequeue_req.
>>
>
> But the exact same code just verified on both centos and CEA at:
>
> # View Change 
>
> Was there a required config parameter change?
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-17 Thread Malahal Naineni
Dan, I backported everything that was needed except the following 2 as I
don't want to mess with cmake! Can you please quickly send ported patches?
Appreciate your help. The latest V2.5 code is at my personal github branch
V2.5-stable:

https://github.com/malahal/nfs-ganesha/commits/V2.5-stable

The following 2 commits failed to apply:

6bd32da613e26a768ac1dc4db1001395bd10c295 CMake - Have 'make dist' generate
the correct tarball name
ff98ea64b6d1228443a35b2f7ceb3c61c0a0c1d1 Build libntirpc package when not
using system ntirpc



On Wed, Aug 16, 2017 at 10:47 PM, Matt Benjamin <mbenj...@redhat.com> wrote:

> Hi Frank,
>
> On Wed, Aug 16, 2017 at 1:11 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
> > Oh, nice.
>
> >
> >
> > Matt, what about this one?
> >
> >
> >
> > 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
>
> RHCS doesn't officially support this, but I'd say it would be nice to have.
>
> Matt
>
> >
> >
> >
> > Frank
> >
> >
> >
> >
> >
> > From: Malahal Naineni [mailto:mala...@gmail.com]
> > Sent: Wednesday, August 16, 2017 9:28 AM
> > To: Soumya Koduri <skod...@redhat.com>
> > Cc: Frank Filz <ffilz...@mindspring.com>; d...@redhat.com; Matt Benjamin
> > <mbenj...@redhat.com>; nfs-ganesha-devel
> > <nfs-ganesha-devel@lists.sourceforge.net>
> > Subject: Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
> >
> >
> >
> > I pushed a notes branch "refs/notes/backport" which has a note saying
> > "backport to V2.5". You should be able to fetch this special branch with
> > "git fetch origin refs/notes/*:refs/notes/*". After fetching this special
> > branch, you should do "export GIT_NOTES_REF=refs/notes/backport" in your
> > SHELL and then run the usual "git log" to see if I missed any commits you
> > are interested in.
> >
> >
> >
> > Alternatively, the following are the commits that will NOT be back
> ported.
> > Let me know if you need any of these. I will cherry pick things tomorrow
> and
> > publish the branch, if there are no comments...
> >
> >
> >
> > 00b9e0798 Revert "CMake - Have 'make dist' generate the correct tarball
> > name"
> >
> > 1b60d5df2 FSAL_MEM - fix UP thread init/cleanup
> >
> > 39119aab0 FSAL_GLUSTER: Use glfs_xreaddirplus_r for readdir
> >
> > 4b4e21ed9 Manpage - Fix installing manpages in RPM
> >
> > 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
> >
> > b862fe360 SAL: extract fs logic from nfs4_recovery
> >
> > c29114162 Napalm dispatch plus plus
> >
> > c8bc40b69 CMake - Have 'make dist' generate the correct tarball name
> >
> > cb787a1cf SAL: introduce new recovery backend based on rados kv store
> >
> > eadfc762e New (empty) sample config
> >
> > eb4eea134 config: add new config options for rados_kv recovery backend
> >
> > fbc905015 cmake: make modulized recovery backends compile as modules
> >
> >
> >
> >
> >
> > On Fri, Aug 11, 2017 at 8:08 AM, Soumya Koduri <skod...@redhat.com>
> wrote:
> >
> >
> >> commit 7f2d461277521301a417ca368d3c7656edbfc903
> >>  FSAL_GLUSTER: Reset caller_garray to NULL upon free
> >>
> >
> > Yes
> >
> > On 08/09/2017 08:57 PM, Frank Filz wrote:
> >
> > 39119aa Soumya Koduri FSAL_GLUSTER: Use glfs_xreaddirplus_r for
> > readdir
> >
> > Yes? No? It's sort of a new feature, but may be critical for some use
> cases.
> > I'd rather it go into stable than end up separately backported for
> > downstream.
> >
> >
> > Right..as it is more of a new feature, wrt upstream we wanted it to be
> part
> > of only 2.6 on wards so as not to break stable branch (in case if there
> are
> > nit issues).
> >
> > But yes we may end up back-porting to downstream if we do not rebase to
> 2.6
> > by then.
> >
> > Thanks,
> > Soumya
> >
> >
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
> >
> >
> >
> > Virus-free. www.avast.com
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
>


Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-16 Thread Malahal Naineni
I pushed a notes branch "refs/notes/backport" which has a note saying
"backport to V2.5". You should be able to fetch this special branch with
"git fetch origin refs/notes/*:refs/notes/*". After fetching this special
branch, you should do "export GIT_NOTES_REF=refs/notes/backport" in your
SHELL and then run the usual "git log" to see if I missed any commits you
are interested in.

Alternatively, the following are the commits that will NOT be backported.
Let me know if you need any of these. I will cherry pick things tomorrow
and publish the branch, if there are no comments...

00b9e0798 Revert "CMake - Have 'make dist' generate the correct tarball
name"
1b60d5df2 FSAL_MEM - fix UP thread init/cleanup
39119aab0 FSAL_GLUSTER: Use glfs_xreaddirplus_r for readdir
4b4e21ed9 Manpage - Fix installing manpages in RPM
814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
b862fe360 SAL: extract fs logic from nfs4_recovery
c29114162 Napalm dispatch plus plus
c8bc40b69 CMake - Have 'make dist' generate the correct tarball name
cb787a1cf SAL: introduce new recovery backend based on rados kv store
eadfc762e New (empty) sample config
eb4eea134 config: add new config options for rados_kv recovery backend
fbc905015 cmake: make modulized recovery backends compile as modules


On Fri, Aug 11, 2017 at 8:08 AM, Soumya Koduri  wrote:

>
> > commit 7f2d461277521301a417ca368d3c7656edbfc903
> >  FSAL_GLUSTER: Reset caller_garray to NULL upon free
> >
>
> Yes
>
> On 08/09/2017 08:57 PM, Frank Filz wrote:
>
>> 39119aa Soumya Koduri FSAL_GLUSTER: Use glfs_xreaddirplus_r for
>>> readdir
>>>
>> Yes? No? It's sort of a new feature, but may be critical for some use
>> cases.
>> I'd rather it go into stable than end up separately backported for
>> downstream.
>>
>>
> Right..as it is more of a new feature, wrt upstream we wanted it to be
> part of only 2.6 on wards so as not to break stable branch (in case if
> there are nit issues).
>
> But yes we may end up back-porting to downstream if we do not rebase to
> 2.6 by then.
>
> Thanks,
> Soumya
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-14 Thread Malahal Naineni
Hi Matt and Bill, we were able to reproduce this crash very easily with a
sleep after closing "fd". After my fix, things worked fine. The changes
are extensive but mostly trivial. Appreciate any high level review.

ganesha changes (last but one commit at
https://github.com/ganltc/nfs-ganesha/commits/ibm2.3).

Corresponding ntirpc commit (last commit)
https://github.com/ganltc/ntirpc/commits/ibm2.3

On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni <mala...@gmail.com> wrote:

> Unfortunately, I need a fix for this issue against ganesha2.3.
>
> Regards, Malahal.
>
> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson <
> william.allen.simp...@gmail.com> wrote:
>
>> On 8/13/17 11:50 PM, Malahal Naineni wrote:
>>
>>>  >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
>>> UDP. It's a mess, and has been a mess for a long time.
>>>
>>> We get a file descriptor fd and then create "rec", but while destroying
>>> things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
>>> clnt_dg_destroy() (and other places) might help fix this issue, but I am
>>> not positive as I am not familiar with this code.
>>>
>>> I am also working on a blind replacement of "fd" by "struct gfd" where
>>> struct gfd has the "fd" as well as a "generation number". The generation
>>> number is incremented when ever such "fd" is created (e.g. accept() call or
>>> socket() call). The changes are many but they are trivial.
>>>
>>> Any thoughts?
>>>
>>> It's not really interesting for the current code base.  In V2.5, I've
>> already eliminated all the various copies of fd, and every SVCXPRT is
>> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
>> tree (svc_rqst).  So there's no longer any possibility of multiple
>> generations of fd.
>>
>> That said, the last remaining problem is clnt_dg clnt_call, where the
>> fd can be passed to poll() at the same time as another copy is passed to
>> (or being removed from) epoll().  Requires a complete re-write.
>>
>> I'd started doing the re-write long long ago, even made the rpc_ctx
>> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
>> But there are still many problems redesigning with async callbacks.
>>
>> I'm looking at the short-term fix I've mentioned earlier, that we should
>> try TCP before UDP, but given our current code base doesn't even compile,
>> I've given up until next week.
>>
>
>


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-14 Thread Malahal Naineni
Unfortunately, I need a fix for this issue against ganesha2.3.

Regards, Malahal.

On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 8/13/17 11:50 PM, Malahal Naineni wrote:
>
>>  >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
>> UDP. It's a mess, and has been a mess for a long time.
>>
>> We get a file descriptor fd and then create "rec", but while destroying
>> things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
>> clnt_dg_destroy() (and other places) might help fix this issue, but I am
>> not positive as I am not familiar with this code.
>>
>> I am also working on a blind replacement of "fd" by "struct gfd" where
>> struct gfd has the "fd" as well as a "generation number". The generation
>> number is incremented when ever such "fd" is created (e.g. accept() call or
>> socket() call). The changes are many but they are trivial.
>>
>> Any thoughts?
>>
>> It's not really interesting for the current code base.  In V2.5, I've
> already eliminated all the various copies of fd, and every SVCXPRT is
> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
> tree (svc_rqst).  So there's no longer any possibility of multiple
> generations of fd.
>
> That said, the last remaining problem is clnt_dg clnt_call, where the
> fd can be passed to poll() at the same time as another copy is passed to
> (or being removed from) epoll().  Requires a complete re-write.
>
> I'd started doing the re-write long long ago, even made the rpc_ctx
> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
> But there are still many problems redesigning with async callbacks.
>
> I'm looking at the short-term fix I've mentioned earlier, that we should
> try TCP before UDP, but given our current code base doesn't even compile,
> I've given up until next week.
>


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-13 Thread Malahal Naineni
>> That trace is the NSM clnt_dg clnt_call, the only use of outgoing UDP.
It's a mess, and has been a mess for a long time.

We get a file descriptor fd and then create "rec", but while destroying
things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
clnt_dg_destroy() (and other places) might help fix this issue, but I am
not positive as I am not familiar with this code.

I am also working on a blind replacement of "fd" by "struct gfd" where
struct gfd has the "fd" as well as a "generation number". The generation
number is incremented whenever such "fd" is created (e.g. accept() call or
socket() call). The changes are many but they are trivial.

Any thoughts?

Regards, Malahal.



On Fri, Aug 11, 2017 at 8:50 PM, Malahal Naineni <mala...@gmail.com> wrote:

> We do support TCP and UDP for NSM. If this customer is using clients that
> support TCP, then we don't need UDP for this customer. Isn't there a config
> where we can say the daemon only support TCP alone for NSM?
>
> There might be couple MacOS customers that might need UDP, but then I am
> not sure though. They do make use of UDP (including Linux) first if
> available.
>
> Regards, Malahal.
>
> On Fri, Aug 11, 2017 at 8:17 PM, William Allen Simpson <
> william.allen.simp...@gmail.com> wrote:
>
>> On 8/11/17 8:56 AM, Matt Benjamin wrote:
>>
>>> On Fri, Aug 11, 2017 at 8:44 AM, William Allen Simpson
>>> <william.allen.simp...@gmail.com> wrote:
>>>
>>>> On 8/11/17 8:26 AM, William Allen Simpson wrote:
>>>>
>>>>>
>>>>> On 8/11/17 2:29 AM, Malahal Naineni wrote:
>>>>>
>>>>>>
>>>>>> Following confirms that Thread1 (TCP) is trying to use the same "rec"
>>>>>> as
>>>>>> Thread42 (UDP), it is easy to reproduce on the customer system!
>>>>>>
>>>>>> There are 2 duplicated fd indexed trees, not well coordinated.  My
>>>>> 2015
>>>>> code to fix this went in Feb/Mar timeframe for Ganesha v2.5/ntirpc 1.5.
>>>>>
>>>>
>>>>
>>>> That trace is the NSM clnt_dg clnt_call, the only use of outgoing UDP.
>>>> It's a mess, and has been a mess for a long time.
>>>>
>>>> There is still an analogous problem (Dominique reported) where UDP
>>>> uses poll() on an fd at the same time that TCP uses epoll() on the
>>>> same fd.
>>>>
>>>> That's why I was asking whether your IBM systems support TCP for NSM?
>>>>
>>>> It would be a much easier back-portable fix to Ganesha to require TCP.
>>>> The code passes "tcp" parameter, but for some as yet unknown reason
>>>> tries UDP, too.
>>>>
>>>> Again, does IBM support TCP for NSM?
>>>>
>>>
>>> That doesn't sound like it's fixing anything at all.  If someone wants
>>> to do this on a downstream, they're welcome, but we've already had the
>>> upstream discussion about this.
>>>
>>> Who is this royal "we"?
>>
>> Everybody agrees that we need to support UDP incoming to Ganesha.  That's
>> src/svc_dg.c.
>>
>> Whereas src/clnt_dg.c has long been problematic.  As it is used in only
>> one place, it doesn't get much testing.  Ganesha v2.5/ntirpc v1.5 tried to
>> fix the known non-MT cases in clnt_dg.  And my code finally blessed on
>> Tuesday may have fixed some more.  But that won't help IBM shipping v2.3.
>>
>> Linux supports TCP for NSM.  If IBM supports TCP too, we're good to go.
>> That looks like a relatively simple easily back-portable fix.
>>
>> I'm pretty sure that Malahal actually has to please customers.
>>
>
>


[Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-10 Thread Malahal Naineni
Hi All,

One of our customers reported the following backtrace. The returned
"rec" seems to be corrupted. Based on oflags, rpc_dplx_lookup_rec() didn't
allocate the "rec" in this call path. Its refcount is 2. More importantly
rec.hdl.xd is 0x51 (a bogus pointer) leading to the crash. GDB data is at
the end of this email. Note that this crash is observed in the latest
ganesha 2.3 release.

Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), looks like rec's
refcnt can go to 0 and then back up. Also, rpc_dplx_unref releases the
rec-lock and then acquires the hash-lock to preserve the lock order. After
dropping the lock at line 359 below, someone else could grab and change
refcnt to 1. The second thread could call rpc_dplx_unref() after it is done
beating the first thread and free the "rec". The first thread accessing
"&rec->node_k" at line 361 is in danger as it might be accessing freed
memory. In any case, this is NOT our backtrace here. :-(

Also, looking at the users of this "rec", they seem to close the file
descriptor and then call rpc_dplx_unref(). This has very nasty side effects
if my understanding is right. Say, thread one has fd 100, it closed it and
is calling rpc_dplx_unref to free the "rec", but in the mean time another
thread gets fd 100, and is calling rpc_dplx_lookup_rec(). At this point the
second thread is going to use the same "rec" as the first thread, correct?
Can it happen that a "rec" that belonged to UDP is now being given to a
thread doing "TCP"? This is one way I can explain the backtrace! The first
thread has to be UDP that doesn't need "xd" and the second thread should be
"TCP" where it finds that the "xd" is uninitialized because the "rec" was
allocated by a UDP thread. If you are still reading this email, kudos and a
big thank you.

357         if (rec->refcnt == 0) {
358                 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt,
                                                 rec->fd_k);
359                 REC_UNLOCK(rec);
360                 rwlock_wrlock(&t->lock);
361                 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
362                 rec = NULL;


BORING GDB STUFF:

(gdb) bt
#0  0x3fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144,
recvsz=262144, allocated=0x3ffab97fdb4c)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
#1  0x3fff7aaad224 in rendezvous_request (xprt=0x1000b125310,
req=0x3ffa2c0008f0)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
#2  0x10065104 in thr_decode_rpc_request (context=0x0,
xprt=0x1000b125310)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
#3  0x100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
#4  0x10195744 in fridgethr_start_routine (arg=0x3ffedc001280)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561

(gdb) p oflags
$1 = 0
(gdb) p rec->hdl.xd
$2 = (struct x_vc_data *) 0x51
(gdb) p *rec
$3 = {fd_k = 166878, locktrace = {mtx = {__data = {__lock = 2, __count = 0,
__owner = 92274, __nusers = 1, __kind = 3,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size =
"\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000'
,
  __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref",
line = 89}, node_k = {left = 0x0,
right = 0x0, parent = 0x3ff9c80034f0, red = 1, gen = 639163}, refcnt =
2, send = {lock = {we = {mtx = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
__spins = 0, __list = {__prev = 0x0,
  __next = 0x0}}, __size = '\000' , "\003",
'\000' , __align = 0},
cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
__wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0}, __size = '\000' , __align = 0}},
  lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}}, recv =
{lock = {we = {mtx = {__data = {__lock = 0,
__count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins =
0, __list = {__prev = 0x0, __next = 0x0}},
  __size = '\000' , "\003", '\000' , __align = 0}, cv = {__data = {
__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
__woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
__broadcast_seq = 0}, __size = '\000' ,
__align = 0}}, lock_flag_value = 0, locktrace = {
func = 0x3ffc00d8 "\300L\001", line = 0}}}, hdl = {xd = 0x51,
xprt = 0x0}}
(gdb)


Re: [Nfs-ganesha-devel] Weekly conference call timing

2017-08-10 Thread Malahal Naineni
Hour earlier is very good for me on any day except Friday.

On Thu, Aug 10, 2017 at 2:20 PM, Swen Schillig  wrote:

> Hi Frank
>
> I'd prefer to keep it the same day, an hour earlier is fine with me.
> If you need to move to another day, friday would suit me best.
>
> Cheers Swen
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] only use of UDP client is NSM

2017-08-09 Thread Malahal Naineni
This talks about XNFS, not sure how it is different from NFS though. Here
is a link that says UDP is mandatory for NSM (and TCP is optional):

http://pubs.opengroup.org/onlinepubs/9629799/chap11.htm

On Wed, Aug 9, 2017 at 9:28 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 8/8/17 11:46 PM, Malahal Naineni wrote:
>
>>  >> AFAICT from grep'ing the NFS documents, NFSv3 NSM *MUST* support TCP.
>>
>> As far as I know, there is only one version (version 1) of NSM protocol.
>> Let us not mix this with NFS protocol. Here is a wording I read from an NSM
>> document:
>>
>> "The NSM Protocol is required to support the UDP/IP transport protocol to
>> allow the NLM to operate. However, implementors may also choose to support
>> the TCP/IP transport protocol."
>>
>> Could you point (or send) me at that document?
>


Re: [Nfs-ganesha-devel] commit test Comparisons

2017-06-29 Thread Malahal Naineni
My brain has an easier time when a variable is compared with a constant
(meaning the constant is on the right side of the operator) than the other
way around. Most programmers I have come across write it this way rather
than comparing a constant with a variable.

I always thought that programmers doing the "constant on left" are just
trying to trick me! :-) Kidding aside, what benefits does it have other
than the compiler having an easier time detecting your mistakes?

Regards, Malahal.

On Thu, Jun 29, 2017 at 6:42 AM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 6/28/17 8:41 PM, William Allen Simpson wrote:
>
>> This is a good programming practice of long-standing value.
>>
>> Why of why do these evil commit tests keep creeping in?
>>
>> bill@simpson91:~/rdma/nfs-ganesha$ git commit --amend -a
>> WARNING: Comparisons should place the constant on the right side of the
>> test
>> #17: FILE: src/MainNFSD/nfs_rpc_dispatcher_thread.c:1777:
>> +if (XPRT_DONE <= stat) {
>>
>>
> Yep, it's another Joe Perches idiocy.  The bane of programmers since he
> changed all the if && || tests in 2009 in the Linux network stack to
> trailing form.  He tried to put that into checkpatch, and there was a
> general uprising -- as there's a lot in kernel NFS code.
>
> Thousands of annoying changes that made it impossible to rebase.
>
> Because of course this is so much more readable:
>
> if (something
> #ifdef TEST
>   &&
> morestuff
> #endif
> ) {
>
>
> I've been teaching leading constants in C for almost 40 years.  Leading
> constants are preferred in many cases, not least because they yield
> better compiler error warnings and better code readability.
>
> As do leading && || 
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] IRC Channel

2017-06-20 Thread Malahal Naineni
I could run a proxy, but is there a place where I can see all the IRC chat
messages from when I was gone?

Regards, Malahal.

On Mon, Jun 19, 2017 at 9:13 PM, Frank Filz  wrote:

> I'm not sure how well it's known, we do have a #ganesha channel on FreeNode
> where many of the developers and a few users hang out. We are pretty
> friendly and willing to answer questions.
>
> Frank
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] ntirpc GSS over TCP checksum

2017-06-15 Thread Malahal Naineni
Bill, sorry for the delay. Some in our team do test krb5 authentication but
I am not familiar with this code other than some context hashing. If you
need some testing, feel free to give me a link. I can build and give the
bits to folks who test it.

Regards, malahal.

On Wed, Jun 7, 2017 at 9:41 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> Malahal, I'm told you know this GSS code best, or at least test it.
> And Marcus might have written it.
>
> In svc_auth_gss.c, svcauth_gss_accept_sec_context(), it calls
> svc_getargs() -- same as standard tirpc.
>
> On the UDP side, that calls SVCAUTH_UNWRAP() and possibly
> svc_dg_freeargs() on failure.  No checksum is done here, it is in
> svc_dg_recv() over the raw data.
>
> On the TCP side, that calls SVCAUTH_UNWRAP() and possibly
> svc_dg_freeargs() on failure.  But on success, it does the
> checksum over the authenticated/decrypted data.
>
> Also, for UDP I'm changing to match the TCP code, so that we don't
> have the checksum expense for non-cached error returns.
>
> This means that for GSS, the checksum is done twice?
>
> Standard tirpc has no checksum.
>
> (RDMA doesn't do the checksum at all.)
>
> Would it be OK to remove the GSS call to svc_getargs() and call
> SVCAUTH_UNWRAP() directly?
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] ntirpc v1.5.2 .gitignore patch

2017-06-06 Thread Malahal Naineni
>> Much easier to edit the textual patch visually, then git am.

I have never manually edited patches! I use "git mergetool" after a rebase
failure, which brings up vim in 4 windows. Then it is a matter of editing a
file in vim with all your usual vim commands. I know, it is important to
know one 'workflow' very well and stick to it.

Regards, Malahal.

On Mon, Jun 5, 2017 at 1:57 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 6/3/17 3:56 PM, Malahal Naineni wrote:
>
>> I don't understand why you need to keep patches when git gives you the
>> power to store your code in its own branches/stashes. Arguably, git commits
>> have more information than a simple patch diff.
>> That's why I use git format-patch.  The default directory is the top of
>>
> the git repository.  All patches there should be ignored.
>
> I know I'm an old fuddy-duddy, but I've been using git practically since
> it was released -- for kernel and dARPA projects.  So I'm probably using
> older techniques, such a preparing my patch series for email as was
> originally intended.
>
> Moreover, I don't find keeping a lot of branches and commits around in
> the git tree very helpful.  Many of my patches here are 2 years old and
> need a lot of massaging to re-commit.  So rebase causes a lot of pain.
>
> Much easier to edit the textual patch visually, then git am.
>
>


Re: [Nfs-ganesha-devel] ntirpc v1.5.2 .gitignore patch

2017-06-03 Thread Malahal Naineni
I don't understand why you need to keep patches when git gives you the
power to store your code in its own branches/stashes. Arguably, git commits
have more information than a simple patch diff.

On Fri, Jun 2, 2017 at 9:49 PM, Niels de Vos  wrote:

> On Fri, Jun 02, 2017 at 10:48:45AM -0400, Daniel Gryniewicz wrote:
> > https://github.com/nfs-ganesha/ntirpc/pull/49
>
> The nfs-ganesha .gitignore was improved to only un-ignore patches in the
> debian packaging directory. It would have been good to apply that same
> logic to ntirpc:
>   https://review.gerrithub.io/362414
>
> Niels
>
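The un-ignore logic being discussed (ignore `*.patch` everywhere except under the debian packaging directory) can be sketched with `git check-ignore`; the paths here are illustrative, not the actual repo layout:

```shell
# Ignore *.patch globally, but keep patches under debian/ tracked.
tmp=$(mktemp -d); cd "$tmp"
git init -q .
printf '*.patch\n!debian/**/*.patch\n' > .gitignore
mkdir -p debian
touch top.patch debian/fix.patch
git check-ignore -q top.patch && echo "top.patch ignored"
git check-ignore -q debian/fix.patch || echo "debian/fix.patch kept"
```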
>
> >
> > Daniel
> >
> > On 06/02/2017 09:40 AM, William Allen Simpson wrote:
> > > Just updated my ntirpc, and suddenly my git commit -a and such are
> > > listing my patch files (in red).  What the heck?  Why this change?
> > >
> > > Since I've got a lot of outstanding patches in various stages of
> > > progress, this is really a terrible inconvenience.
> > >
> > > Worse than an inconvenience, as changing branches or reset --hard
> > > will throw away my patches
> > >
> > > 
> --
> > >
> > > Check out the vibrant tech community on one of the world's most
> > > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > > ___
> > > Nfs-ganesha-devel mailing list
> > > Nfs-ganesha-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] Question on DBUS /org/ganesha/nfsd/admin

2017-05-31 Thread Malahal Naineni
I don't think anyone still uses the Qt-based scripts. ganesha_mgr is a non-GUI
Python script that we use, and it should work for you as well.

Regards, Malahal.
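For a CLI-only system, plain dbus-send also works. A hedged sketch (a fragment, not runnable standalone: it requires a running nfs-ganesha with DBus enabled) that introspects the admin object named in this thread rather than guessing at method names:

```shell
# Introspect first: this standard D-Bus method lists whatever methods
# your build actually exposes on /org/ganesha/nfsd/admin.
dbus-send --system --print-reply \
  --dest=org.ganesha.nfsd /org/ganesha/nfsd/admin \
  org.freedesktop.DBus.Introspectable.Introspect
```

Once the introspection output confirms a method, the same dbus-send invocation can call it directly.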

On Wed, May 31, 2017 at 9:39 AM, Manish Gupta 
wrote:

> Team
>
> I was trying to find the various components in Ganesha for which I have
> enabled logging, and their log levels.
>
> DBUS interface provides a way through /org/ganesha/nfsd/admin
> 
>  interface
> to do so.
>
> In fact, the Ganesha team has provided several Python-based scripts to
> achieve this, but the admin script depends on QtDBus.
> My OS is CLI-based, so I cannot use them to find the relevant info.
>
> Has anyone tried or attempted to solve the same problem?
>
> Is there any other way (like dbus-send) to achieve the same?
>
> --
> Thanks
> -Manish
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>


Re: [Nfs-ganesha-devel] Conditional compile failures

2017-05-09 Thread Malahal Naineni
By 'VFS' you mean USE_FSAL_VFS? It is like any other FSAL, correct.  I
want to add _VALGRIND_MEMCHECK to the list as well, although it is only used by
the GPFS FSAL at the moment. We also use _MSPAC_SUPPORT.

Others that could be of interest (I don't know much about them though)
LTTng
RDMA
_NO_PORTMAPPER
BLKID/UID
BLKIN tracing

Regards, Malahal.
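A hedged sketch of how such preset option combinations might be exercised locally. Only USE_FSAL_VFS is named explicitly above, so the other -D flags here are assumptions to be checked against the project's CMakeLists.txt; this is a fragment that needs an nfs-ganesha source checkout, so it is not runnable standalone:

```shell
# Build one option combination out-of-tree so developers can reproduce
# a CI configuration before pushing for review (flag names assumed).
mkdir -p build-vfs && cd build-vfs
cmake -DUSE_FSAL_VFS=ON -DUSE_DBUS=ON ../src
make -j"$(nproc)"
```

A separate build directory per combination keeps the CMake caches apart, which sidesteps the cache-clearing problem Frank mentions below.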


On Tue, May 9, 2017 at 5:43 PM, Daniel Gryniewicz <d...@redhat.com> wrote:

> I read this as meaning that we should add the CI settings in as cmake
> options, so devs can easily reproduce those builds.  This is probably a
> good idea.  It does not seem that CMake supports random config settings,
> so we'd have to write a wrapper for that, which seems problematic.  But
> having a few preset common configs that both devs and the CI can run is
> probably a good idea.
>
> Do people have suggestions?  My list would be:
>
> VFS
> DBUS
> 9P
> NFS3/NLM
> VSOCK
> ADMIN_TOOLS/GUI_ADMIN_TOOLS
>
> Daniel
>
> On 05/08/2017 12:33 PM, Frank Filz wrote:
> > I’ve tried building with different cmake options, and half the time I
> > don’t succeed in getting the cache cleared correctly…
> >
> >
> >
> > Plus we have to train everyone to run all those configs…
> >
> >
> >
> > Much easier to use CI to run the various configs. And for anything
> > compile/build related, I don’t have a problem if the CI immediately
> > starts issuing Verify-1. If there’s a build option set someone thinks
> > might be relevant that we don’t succeed in compiling for, we need to fix
> > it immediately.
> >
> >
> >
> > Frank
> >
> >
> >
> > *From:*Malahal Naineni [mailto:mala...@gmail.com]
> > *Sent:* Monday, May 8, 2017 9:01 AM
> > *To:* Frank Filz <ffilz...@mindspring.com>
> > *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> > *Subject:* Re: [Nfs-ganesha-devel] Conditional compile failures
> >
> >
> >
> > If we make those as some kind of cmake build configs, then developers
> > can also build those combinations before we push our code for review.
> > Maybe, have randconfig as well for developers to use but don't enforce
> > in CI for the time being (until it has some run time!)
> >
> >
> >
> > Regards, malahal.
> >
> >
> >
> > On Mon, May 8, 2017 at 9:19 PM, Frank Filz <ffilz...@mindspring.com
> > <mailto:ffilz...@mindspring.com>> wrote:
> >
> > Every once in a while we get a report of a build failure from
> > someone using
> > build options other than what we normally build with.
> >
> > I'd like to propose that we pick some common combinations folks
> > might like
> > to build with and have those as part of one of our CI tests run on
> every
> > patch since that is totally the type of thing that's easy to
> > automate and a
> > royal pain to test manually.
> >
> > Frank
> >
> >
> > ---
> > This email has been checked for viruses by Avast antivirus software.
> > https://www.avast.com/antivirus
> >
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
> >
> >
> >
> >
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >
> >
> >
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] Announce Push of V2.5-rc3

2017-05-04 Thread Malahal Naineni
Yes, I also noticed this, and we have a defect filed. It is easy to fix, but the
whole notion of passing the export from the upcall thread needs some fixing. This
is already broken, though.

On May 5, 2017 12:01 AM, "Marc Eshel" <es...@us.ibm.com> wrote:

> Hi Frank,
>
> You recently added
> commit 65599c645a81edbe8b953cc29ac29978671a11be
> Author: Frank S. Filz <ffilz...@mindspring.com>
> Date:   Wed Mar 1 17:28:18 2017 -0800
>
> Fill in FATTR4_SUPPORTED_ATTRS from FSAL fs_supported_attrs
>
>
> but when posix2fsal_attributes() is called from the up thread op_ctx is
> null.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x73442280 (LWP 10492)]
> 0x0041d36f in posix2fsal_attributes (buffstat=0x73441020,
> fsalattr=0x73440e40)
> at /nas/ganesha/new-ganesha/src/FSAL/fsal_convert.c:422
> 422 fsalattr->supported =
> op_ctx->fsal_export->exp_ops.fs_supported_attrs(
> (gdb) where
> #0  0x0041d36f in posix2fsal_attributes (buffstat=0x73441020,
> fsalattr=0x73440e40)
> at /nas/ganesha/new-ganesha/src/FSAL/fsal_convert.c:422
> #1  0x734527e2 in GPFSFSAL_UP_Thread (Arg=0x7ecb40) at
> /nas/ganesha/new-ganesha/src/FSAL/FSAL_GPFS/fsal_up.c:342
> #2  0x765cadf3 in start_thread (arg=0x73442280) at
> pthread_create.c:308
> #3  0x75c8b3dd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> (gdb) p op_ctx
> $1 = (struct req_op_context *) 0x0
>
>
>
>
>
>
> From:   "Frank Filz" <ffilz...@mindspring.com>
> To: <nfs-ganesha-devel@lists.sourceforge.net>
> Date:   04/28/2017 03:47 PM
> Subject:[Nfs-ganesha-devel] Announce Push of V2.5-rc3
>
>
>
> Branch next
>
> Tag:V2.5-rc3
>
> Release Highlights
>
> * fix nlm and nsm refcounts
>
> * make state_owner hash table work like other hash tables
>
> * Fix F_GETLK/SETLK/SETLKW having F_GETLK64/SETLK64/SETLKW64 value
>
> * Don't call state_share_remove with support_ex
>
> * fixup use of handles into wire, host, key, and fsal object (3 patches)
>
> * FSAL_MEM - Check for valid attrs. CID #161621
>
> * Fix NFSv4 messages with NFS_V4 component
>
> * FSAL_GLUSTER - initialize buffxstat. CID 161510
>
> * Fix borked MDCACHE LTTng tracepoints
>
> * Fix CMake configuration so ganesha_conf is installed correctly.
>
> * fix some things in FSAL_GPFS
>
> * logging (non-root): move log files from /var/log to /var/log/ganesha/
>
> Signed-off-by: Frank S. Filz <ffilz...@mindspring.com>
>
> Contents:
>
> 1c71400 Frank S. Filz V2.5-rc3
> 58877b0 Kaleb S. KEITHLEY logging (non-root): move log files from /var/log
> to /var/log/ganesha/
> afa662e Swen Schillig [FSAL_GPFS] Fix root_fd gathering on export.
> 5eac292 Swen Schillig [FSAL_GPFS] Remove dead code.
> 36b5776 Swen Schillig [FSAL_GPFS] Remove duplicate code.
> 4fa6683 Wyllys Ingersoll Fix CMake configuration so ganesha_conf is
> installed correctly.
> 06a274f Daniel Gryniewicz Fix borked MDCACHE LTTng tracepoints
> 5d8126b Daniel Gryniewicz FSAL_GLUSTER - initialize buffxstat. CID 161510
> 872f174 Malahal Naineni Fix NFSv4 messages with NFS_V4 component
> 870bd0a Malahal Naineni Replace mdcache_key_t in struct fsdir by
> host-handle
> 292f3b0 Daniel Gryniewicz Clean up handle method naming in FSAL API
> ab9d2f1 Daniel Gryniewicz FSAL_MEM - Check for valid attrs. CID #161621
> ffd225b Malahal Naineni Add handle_to_key export operation.
> c362183 Malahal Naineni Don't call state_share_remove with support_ex
> 4fe2a3a Malahal Naineni Fix F_GETLK/SETLK/SETLKW having
> F_GETLK64/SETLK64/SETLKW64 value
> b049eb9 Malahal Naineni Convert state_owner hash table to behave like
> others
> 52e0e12 Malahal Naineni Fix nlm state refcount going up from zero
> acb632c Malahal Naineni Fix nsm client refcount going up from zero
> feb12d2 Malahal Naineni Fix nlm client refcount going up from zero
>
>
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
>
>
>
>
> 

Re: [Nfs-ganesha-devel] drc refcnt

2017-05-03 Thread Malahal Naineni
Matt, you are correct. We lose some memory (drc and dupreqs) for a client
that never reconnects. A solely time-based strategy is not scalable either,
unless we fork multiple threads for doing this. My understanding is
that there will be one time-based strategy (hopefully, with a period long
enough that it does not interfere with the current strategy) in __addition__ to
the current retiring strategy.

Regards, Malahal.

On Thu, May 4, 2017 at 3:56 AM, Matt Benjamin <mbenja...@redhat.com> wrote:

> Hi Guys,
>
> To get on the record here, the current retire strategy using new requests
> to retire old ones is an intrinsic good, particularly with TCP and related
> cots-ord transports where requests are totally ordered.  I don't think
> moving to a strictly time-based strategy is preferable.  Apparently the
> actually observed or theorized issue has to do with not disposing of
> requests in invalidated DRCs?  That seems to be a special case, no?
>
> Matt
>
> - Original Message -
> > From: "Malahal Naineni" <mala...@gmail.com>
> > To: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> > Cc: "Matt Benjamin" <mbenja...@redhat.com>, nfs-ganesha-devel@lists.
> sourceforge.net
> > Sent: Tuesday, May 2, 2017 2:21:48 AM
> > Subject: Re: [Nfs-ganesha-devel] drc refcnt
> >
> > Sorry, every cacheable request holds a ref on its DRC as well as its
> > DUPREQ. The ref on DUPREQ should be released when the request goes away
> > (via nfs_dupreq_rele). The ref on DRC will be released when the
> > corresponding DUPREQ request gets released. Since we release DUPREQs
> while
> > processing other requests, you are right that the DRC won't be freed if
> > there are no more requests that would use the same DRC.
> >
> > I think we should be freeing dupreq periodically using a timed function,
> > something like that drc_free_expired.
> >
> > Regards, Malahal.
> >
> >
> >
> > On Tue, May 2, 2017 at 10:38 AM, Satya Prakash GS <
> g.satyaprak...@gmail.com>
> > wrote:
> >
> > > > On Tue, May 2, 2017 at 7:58 AM, Malahal Naineni <mala...@gmail.com>
> > > wrote:
> > > > A dupreq will place a refcount on its DRC when it calls xxx_get_drc,
> so
> > > we
> > > > will release that DRC refcount when we free the dupreq.
> > >
> > > Ok, so every dupreq holds a ref on the drc. In case of drc cache hit,
> > > a dupreq entry can ref the
> > > drc more than once. This is still fine because unless the dupreq entry
> > > ref goes to zero the drc isn't freed.
> > >
> > > > nfs_dupreq_finish() shouldn't free its own dupreq. When it does free
> some
> > > > other dupreq, we will release DRC refcount corresponding to that
> dupreq.
> > >
> > > > When we free all dupreqs that belong to a DRC
> > >
> > > In the case of a disconnected client when are all the dupreqs freed ?
> > >
> > > When all the filesystem operations subside from a client (mount point
> > > is no longer in use),
> > > nfs_dupreq_finish doesn't get called anymore. This is the only place
> > > where dupreq entries are removed from
> > > the drc. If the entries aren't removed from drc, drc refcnt doesn't go
> to
> > > 0.
> > >
> > > >, its refcount should go to
> > > > zero (maybe another ref is held by the socket itself, so the socket
> has
> > > to
> > > > be closed as well).
> > > >
> > > >
> > > > In fact, if we release DRC refcount without freeing the dupreq, that
> > > would
> > > > be a bug!
> > > >
> > > > Regards, Malahal.
> > > >
> > > Thanks,
> > > Satya.
> > >
> >
>
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>


Re: [Nfs-ganesha-devel] user cred in FSAL when krb5 is used

2017-05-02 Thread Malahal Naineni
The FSAL should still get uid/gid with krb5 authentication. The krb5
authentication and the retrieval of uid/gid are part of core Ganesha.

Regards, Malahal.

On Tue, May 2, 2017 at 7:27 PM, Satish Chandra Kilaru 
wrote:

> Hi All,
>
> What will an FSAL see when an AD user mounts a share using krb5 security?
> If SYS security is used, the FSAL gets uid/gid.
>
> struct user_cred {
> uid_t caller_uid;
> gid_t caller_gid;
> unsigned int caller_glen;
> gid_t *caller_garray;
> };
>
> Looks like that structure supports only SYS security.
>
> FSAL needs to see some identifier that uniquely identifies the AD user
> accessing the share. Where can this info be found?
>
> --Satish
> Please Donate to www.wikipedia.org
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>


Re: [Nfs-ganesha-devel] drc refcnt

2017-05-02 Thread Malahal Naineni
Sorry, every cacheable request holds a ref on its DRC as well as its
DUPREQ. The ref on DUPREQ should be released when the request goes away
(via nfs_dupreq_rele). The ref on DRC will be released when the
corresponding DUPREQ request gets released. Since we release DUPREQs while
processing other requests, you are right that the DRC won't be freed if
there are no more requests that would use the same DRC.

I think we should be freeing dupreqs periodically using a timed function,
something like drc_free_expired.

Regards, Malahal.



On Tue, May 2, 2017 at 10:38 AM, Satya Prakash GS <g.satyaprak...@gmail.com>
wrote:

> > On Tue, May 2, 2017 at 7:58 AM, Malahal Naineni <mala...@gmail.com>
> wrote:
> > A dupreq will place a refcount on its DRC when it calls xxx_get_drc, so
> we
> > will release that DRC refcount when we free the dupreq.
>
> Ok, so every dupreq holds a ref on the drc. In case of drc cache hit,
> a dupreq entry can ref the
> drc more than once. This is still fine because unless the dupreq entry
> ref goes to zero the drc isn't freed.
>
> > nfs_dupreq_finish() shouldn't free its own dupreq. When it does free some
> > other dupreq, we will release DRC refcount corresponding to that dupreq.
>
> > When we free all dupreqs that belong to a DRC
>
> In the case of a disconnected client when are all the dupreqs freed ?
>
> When all the filesystem operations subside from a client (mount point
> is no longer in use),
> nfs_dupreq_finish doesn't get called anymore. This is the only place
> where dupreq entries are removed from
> the drc. If the entries aren't removed from drc, drc refcnt doesn't go to
> 0.
>
> >, its refcount should go to
> > zero (maybe another ref is held by the socket itself, so the socket has
> to
> > be closed as well).
> >
> >
> > In fact, if we release DRC refcount without freeing the dupreq, that
> would
> > be a bug!
> >
> > Regards, Malahal.
> >
> Thanks,
> Satya.
>


Re: [Nfs-ganesha-devel] drc refcnt

2017-05-01 Thread Malahal Naineni
A dupreq will place a refcount on its DRC when it calls xxx_get_drc, so we
will release that DRC refcount when we free the dupreq.
nfs_dupreq_finish() shouldn't free its own dupreq. When it does free some
other dupreq, we will release DRC refcount corresponding to that dupreq.
When we free all dupreqs that belong to a DRC, its refcount should go to
zero (maybe another ref is held by the socket itself, so the socket has to
be closed as well).

In fact, if we release DRC refcount without freeing the dupreq, that would
be a bug!

Regards, Malahal.


On Mon, May 1, 2017 at 10:42 PM, Satya Prakash GS 
wrote:

> Daniel,
>
> I meant to say - nfs_dupreq_finish doesn't call put_drc always. It
> does only if it meets certain criteria (drc_should_retire).
> Say the maxsize is 1000, hiwat is 800 and retire window size = 0.
> At the time of unmount if the drc size is just 100 wouldn't the
> refcount stay > 0.
>
> Thanks,
> Satya.
>
> >nfs_dupreq_finish() calls dupreq_entry_put() at about line 1238, and
> >nfs_dupreq_put_drc() at about line 1222, so I think this is okay.
>
> >Daniel
>
> >On 05/01/2017 11:08 AM, Satya Prakash GS wrote:
> >> Hi,
> >>
> >> DRC refcnt is incremented on every get_drc. However, every
> >> nfs_dupreq_finish doesn't call a put_drc. How is it ensured that the
> >> drc refcnt drops to zero. On doing an umount, is drc eventually
> >> cleaned up.
> >>
> >> Thanks,
> >> Satya.
> >>
> >> 
> --
> >> Check out the vibrant tech community on one of the world's most
> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> ___
> >> Nfs-ganesha-devel mailing list
> >> Nfs-ganesha-devel@...
> >> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >>
>
>
> On Mon, May 1, 2017 at 9:09 PM, Matt Benjamin 
> wrote:
> > Hi Satya,
> >
> > I don't -think- that's the case (that DRCs are leaked).  If so, we would
> certainly wish to correct it.  Malahal has most recently updated these code
> paths.
> >
> > Regards,
> >
> > Matt
> >
> > - Original Message -
> >> From: "Satya Prakash GS" 
> >> To: nfs-ganesha-devel@lists.sourceforge.net
> >> Sent: Monday, May 1, 2017 11:08:48 AM
> >> Subject: [Nfs-ganesha-devel] drc refcnt
> >>
> >> Hi,
> >>
> >> DRC refcnt is incremented on every get_drc. However, every
> >> nfs_dupreq_finish doesn't call a put_drc. How is it ensured that the
> >> drc refcnt drops to zero. On doing an umount, is drc eventually
> >> cleaned up.
> >>
> >> Thanks,
> >> Satya.
> >>
> >> 
> --
> >> Check out the vibrant tech community on one of the world's most
> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> ___
> >> Nfs-ganesha-devel mailing list
> >> Nfs-ganesha-devel@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >>
> >
> > --
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel.  734-821-5101
> > fax.  734-769-8938
> > cel.  734-216-5309
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] Key / handle fixups

2017-04-26 Thread Malahal Naineni
Dan, I think your code should work, except that you call
create_handle(fh_desc) after calling extract_handle(fh_desc) in
mdcache_locate_wire(). I think making a copy here would preserve the wire
handle before passing it to create_handle. Same thing
with nfs3_FhandleToCache(), I believe.

GPFS currently assumes that create_handle() really gets a host-handle (not
just GPFS; everyone else as well, since it lacked a flags parameter prior to your
patch), NOT a wire handle. That is easy to change, but I wonder if we really
need endian conversion in extract_handle() as well as create_handle(). So I
added a handle_to_key export operation to get the key, and extract_handle now only
extracts the handle, not the key.

Here are a couple of commits; the second one is really a cleanup. I have only
compiled them and haven't done any testing at this point. I will test after
some comments!

https://review.gerrithub.io/358746 Add handle_to_key export operation.
https://review.gerrithub.io/358747 Replace mdcache_key_t in struct fsdir by
host-handle



On Wed, Apr 26, 2017 at 8:30 PM, Daniel Gryniewicz  wrote:

> Hi, Malahal.
>
> Since we discussed wire handle / key issues on the call, I went and took
> a look, and it seems I screwed it up when I did the MDCACHE conversion.
> I was calling extract_handle() before calling create_handle(),
> effectively removing wire handles for the system.
>
> I believe that the set of API functions we have is sufficient, and so
> I've fixed the usage. I've put up a PR here:
> https://review.gerrithub.io/358710
>
> Could you take a look at the way it is now, and see if it works for GPFS?
>
> Thanks,
> Daniel
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>


Re: [Nfs-ganesha-devel] Device or resource busy when runltp cleanup test-files

2017-04-12 Thread Malahal Naineni
Sorry, I missed your note below. This is definitely due to the NFS
client's silly rename. All ".nfs" files are due to the silly-rename
implementation in the NFS client. You might want to read about it.

>> rm: cannot remove ‘/mnt/nfs/ltp-JEYAuky2dz/.nfsaa46457a6a72f8ea14f5’: 
>> Device or resource busy
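For reference, silly rename is easy to reproduce from any NFS client; this is a fragment (the mount point and file names are illustrative, and it needs a live NFS mount, so it is not runnable standalone):

```shell
cd /mnt/nfs/testdir            # hypothetical NFS mount
echo data > keepopen
sleep 30 < keepopen &          # hold an open descriptor on the file
rm keepopen                    # the client silly-renames it to .nfsXXXXXXXX
ls -a                          # a .nfs... entry remains until the fd closes
```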

On Thu, Apr 13, 2017 at 7:37 AM, Malahal Naineni <mala...@gmail.com> wrote:
> What are the file names under the directory? What does "ls -la" show
> both at the client and at the server in that directory?
>
> On Thu, Apr 13, 2017 at 5:13 AM, Kinglong Mee <mijinl...@open-fs.com> wrote:
>> There are some files under "rmderQsjV" (that's not a silly rename dir) really
>> at underlying filesystem, but the nfs client shows empty.
>>
>> Are there some problems in MDCACHE or cache timeouts?
>>
>> On 4/12/2017 22:48, Malahal Naineni wrote:
>>> Could be due to NFS client silly rename.
>>>
>>> On Apr 12, 2017 8:06 PM, "Kinglong Mee" <mijinl...@open-fs.com 
>>> <mailto:mijinl...@open-fs.com>> wrote:
>>>
>>> When I testing ganesha nfs bases on glusterfs, the runltp always 
>>> warning as,
>>>
>>> rm: cannot remove 
>>> ‘/mnt/nfs/ltp-JEYAuky2dz/.nfsaa46457a6a72f8ea14f5’: Device or resource 
>>> busy
>>> rm: cannot remove ‘/mnt/nfs/ltp-JEYAuky2dz/rmderQsjV’: Directory not 
>>> empty
>>>
>>> and, "rmderQsjV" also contains files at the back-end, and nfs client 
>>> shows empty.
>>>
>>> My test environments are,
>>> Centos 7 (kernel-3.10.0-514.10.2.el7.x86_64),
>>> Glusterfs (glusterfs-3.8.10-1.el7.x86_64),
>>> NFS-Ganesha (nfs-ganesha-2.3.3-1.el7.x86_64).
>>>
>>> #cat /etc/ganesha/ganesha.conf
>>> EXPORT
>>> {
>>>  SecType = "sys";
>>>  Pseudo = "/gvtest";
>>>  Squash = "No_Root_Squash";
>>>  Access_Type = "RW";
>>>  Path = "/gvtest";
>>>  Export_Id = 1;
>>> FSAL {
>>> Name = "GLUSTER";
>>> Hostname = "localhost";
>>> Volume = "gvtest";
>>> Volpath = "/";
>>>  }
>>> }
>>>
>>> # gluster volume info
>>>
>>> Volume Name: gvtest
>>> Type: Distribute
>>> Volume ID: 65d20de1-16cd-4ae8-a860-254b3d6c56d0
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.9.111:/gluster-test/gvtest
>>> Brick2: 192.168.9.112:/gluster-test/gvtest
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: off
>>> transport.address-family: inet
>>> performance.write-behind: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.quick-read: off
>>> performance.open-behind: off
>>> performance.stat-prefetch: off
>>>
>>>
>>> 
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> Nfs-ganesha-devel mailing list
>>> Nfs-ganesha-devel@lists.sourceforge.net 
>>> <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel 
>>> <https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel>
>>>
>>>
>>>
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>
>>>
>>>
>>> ___
>>> Nfs-ganesha-devel mailing list
>>> Nfs-ganesha-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>>



Re: [Nfs-ganesha-devel] Device or resource busy when runltp cleanup test-files

2017-04-12 Thread Malahal Naineni
What are the file names under the directory? What does "ls -la" show
both at the client and at the server in that directory?

On Thu, Apr 13, 2017 at 5:13 AM, Kinglong Mee <mijinl...@open-fs.com> wrote:
> There are some files under "rmderQsjV" (that's not a silly rename dir) really
> at underlying filesystem, but the nfs client shows empty.
>
> Are there some problems in MDCACHE or cache timeouts?
>
> On 4/12/2017 22:48, Malahal Naineni wrote:
>> Could be due to NFS client silly rename.
>>
>> On Apr 12, 2017 8:06 PM, "Kinglong Mee" <mijinl...@open-fs.com 
>> <mailto:mijinl...@open-fs.com>> wrote:
>>
>> When I testing ganesha nfs bases on glusterfs, the runltp always warning 
>> as,
>>
>> rm: cannot remove 
>> ‘/mnt/nfs/ltp-JEYAuky2dz/.nfsaa46457a6a72f8ea14f5’: Device or resource 
>> busy
>> rm: cannot remove ‘/mnt/nfs/ltp-JEYAuky2dz/rmderQsjV’: Directory not 
>> empty
>>
>> and, "rmderQsjV" also contains files at the back-end, and nfs client 
>> shows empty.
>>
>> My test environments are,
>> Centos 7 (kernel-3.10.0-514.10.2.el7.x86_64),
>> Glusterfs (glusterfs-3.8.10-1.el7.x86_64),
>> NFS-Ganesha (nfs-ganesha-2.3.3-1.el7.x86_64).
>>
>> #cat /etc/ganesha/ganesha.conf
>> EXPORT
>> {
>>  SecType = "sys";
>>  Pseudo = "/gvtest";
>>  Squash = "No_Root_Squash";
>>  Access_Type = "RW";
>>  Path = "/gvtest";
>>  Export_Id = 1;
>> FSAL {
>> Name = "GLUSTER";
>> Hostname = "localhost";
>> Volume = "gvtest";
>> Volpath = "/";
>>  }
>> }
>>
>> # gluster volume info
>>
>> Volume Name: gvtest
>> Type: Distribute
>> Volume ID: 65d20de1-16cd-4ae8-a860-254b3d6c56d0
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.9.111:/gluster-test/gvtest
>> Brick2: 192.168.9.112:/gluster-test/gvtest
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: off
>> transport.address-family: inet
>> performance.write-behind: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.quick-read: off
>> performance.open-behind: off
>> performance.stat-prefetch: off
>>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Question about key vs object handle vs nfs handle

2017-04-11 Thread Malahal Naineni
Take, for example, nfs3_FhandleToCache(). It gets the wire handle and calls
the extract_handle() method. What is the job of extract_handle()? Assume it
is supposed to return the "key" handle, as the comments there say. Then
nfs3_FhandleToCache() calls create_handle() with the "key" handle. This is
unexpected, right?

Note that GPFS fsal indeed returns "key" handle in V2.3 and handles
get corrupted due to create_handle() getting "key" handle rather than
the full handle!

In V2.5, GPFS extract_handle() returns full handle (after some endian
conversions).  Note that it doesn't actually return "key" handle as
expected, but mdcache_create_handle() assumes that it is given a "key"
handle. So the "mdcache_create_handle --> mdcache_locate_keyed -->
mdcache_find_keyed" **should** never return success with GPFS
objects!

Am I missing something?

On Mon, Apr 10, 2017 at 11:32 PM, Frank Filz  wrote:
>> On 04/10/2017 11:57 AM, Frank Filz wrote:
>> >> Hi All, there is usually a 1:1 relationship (ignoring handles across
>> > architectures
>> >> and versions) between nfs handle and object handle. One thing that is
>> >> not clear is the "key" which is used for hashing the objects
>> > (mdcache_entry_t).
>> >> Ganesha 2.5 has handle_to_key() method to take unique bits from the
>> >> handle for caching purposes. How did this work in ganesha 2.3? Ganesha
>> >> 2.3 comments say that extract_handle method is used to get the key,
>> >> but this eventually results in losing the actual handle.
>> >>
>> >>
>> >> Also comment at mdcache_create_handle() indicates that it takes "key"
>> >> but the actual argument "struct gsh_buffdesc *hdl_desc" is written as
>> >> "Buffer descriptor for the "wire" handle".  Is this wire handle or key?
>> >> If this is not the "key", then we will be looking for an incorrect
>> >> object,
>> > right?
>> >
>> > Looking over the code, I THINK everything is right...
>> >
>> > There is still a handle_digest method, and for GPFS it uses
>> > gpfs_sizeof_handle to determine the size to copy, so that may well
>> > copy the full handle.
>> >
>> > The code is definitely very confusing...
>> >
>> > There IS a handle_cmp method...
>> >
>> > I suggest we consider re-writing and clarifying all of this handle
>> > processing code.
>> >
>> > I think everywhere, the handle digest should be whatever the FSAL
>> > wants encapsulated.
>> >
>> > The handle_to_key function should be used to extract the key to be
>> > hashed by anything that wants to hash things, and since we will be
>> > using that for hashing, we should drop the handle_cmp method (or have
>> > a default version that just calls handle_to_key on the two handles it
>> > will compare, and then compare memory using the buffer descriptors
>> returned by handle_to_key).
>>
>> This is, in fact, what it does.
>
> Yea, it does look like handle_to_key is passed a gsh_buffdesc which is then
> filled in with the start address and length of what the FSAL would like to
> be the key, so no memcpy occurs. On the other hand, handle_digest is passed
> a gsh_buffdesc which describes where to place the handle, so a memcpy
> occurs. That is probably unavoidable since to create a wire handle we have
> to append the handle to the Ganesha handle header.
>
>> > FSALs having a custom handle_cmp isn't useful because they can't pick
>> > and choose fields in the handle to compare (since a hash of such a
>> > discontinuous handles would hash bytes that aren't part of the
>> > comparison, generating different hashes for handles that are the same
>> key).
>>
>> But it may be able to do this comparison faster, and avoiding copying
> things
>> around.
>
> We shouldn't need to copy the handle bytes around, just form a new buffer
> descriptor that indicates the start of the key and it's length.
>
> If we want keys that are not a contiguous set of bytes, then the hashing
> needs to be done by the FSAL also, we can't hash bytes 0-31 but only compare
> 0-3, 8-11, 16-19, 24-27... Since the hash would include bytes that could be
> different in different versions of the same handle.
>
> So yea, everything works. But it confuses people until they follow through
> all of the code...
>
> Frank
>
>> > I THINK this really only affects GPFS which puts extra stuff into some
>> > of it's handles that may be different at different times for the same
> object.
>> >
>> > Part of why this was never really clear is the folks who did all the
>> > stuff really didn't understand GPFS handles (I remember lots of
>> > confusion before we got it right).
>> >
>> > Frank
>> >
>> >
>>
>> Daniel

[Nfs-ganesha-devel] Question about key vs object handle vs nfs handle

2017-04-10 Thread Malahal Naineni
Hi All, there is usually a 1:1 relationship (ignoring handles across
architectures and versions) between nfs handle and object handle. One
thing that is not clear is the "key" which is used for hashing the
objects (mdcache_entry_t).  Ganesha 2.5 has handle_to_key() method to
take unique bits from the handle for caching purposes. How did this
work in ganesha 2.3? Ganesha 2.3 comments say that extract_handle
method is used to get the key, but this eventually results in losing
the actual handle.


Also comment at mdcache_create_handle() indicates that it takes "key"
but the actual argument "struct gsh_buffdesc *hdl_desc" is written as
"Buffer descriptor for the "wire" handle".  Is this wire handle or
key?
If this is not the "key", then we will be looking for an incorrect
object, right?

Regards, Malahal.



[Nfs-ganesha-devel] handling init_export_root failures

2017-04-07 Thread Malahal Naineni
Hi All,

If init_export_root() returns a failure, the caller does not handle it
properly. The call path is exports_pkginit() -->
init_export_cb() --> init_export_root(). The last function is supposed
to set exp_root_obj if it succeeds; if it fails, this won't be set.

The caller of init_export_root() just bails out processing the rest of
exports. The failed export still exists in the list without
exp_root_obj. The problem is originally found in Ganesha 2.3.

The dynamic exports code doesn't go through this path. Is there a
reason why we need two different code paths? Can't we combine both
(dynamic exports time and start up time) code paths? Any suggestion to
fix it?

PS: I will try removing the export in init_export_root() before
returning the failure and see how that goes. Need to do lock
manipulations as well.

Regards, Malahal.



Re: [Nfs-ganesha-devel] What is PROTO_CLIENT client type ?

2017-03-29 Thread Malahal Naineni
I think, if you specify a hostname that doesn't resolve to an IP, you
get this kind of warning. Let us know if you find something else!

Regards, Malahal.

On Wed, Mar 29, 2017 at 10:32 PM, Daniel Gryniewicz  wrote:
> It appears PROTO_CLIENT is a placeholder for an uninitialized entry in
> the client config list.  Can you post the config for this failure.
>
> As for aborting, critical is not the same as fatal.  In fact, major is
> worse than critical, and fatal is worse than major.  In general, we
> don't abort on config errors for things that can be updated at runtime,
> such as exports or clients, since people with long-running instances
> don't want it to abort when config is updated, but would rather get an
> error that they can fix and retry, without cutting off existing connections.
>
> Daniel
>
> On 03/29/2017 08:10 AM, Sachin C Punadikar wrote:
>> Hi,
>> One of the Ganesha (Ganesha 2.2) user reported below kind of error:
>>
>> 2017-03-07T14:01:54.134482-06:00 mgmt002st001 nfs-ganesha[2197212]:
>> [work-236] LogClientListEntry :EXPORT :CRIT :  0x2320470 PROTO_CLIENT:
>> (no_root_squash, RWrw,,   ,   ,
>> ,,)
>> 2017-03-07T14:01:54.145127-06:00 mgmt002st001 nfs-ganesha[2197212]:
>> [work-134] LogClientListEntry :EXPORT :CRIT :  0x2320470 PROTO_CLIENT:
>> (no_root_squash, RWrw,,   ,   ,
>> ,,)
>> 2017-03-07T14:01:54.145956-06:00 mgmt002st001 nfs-ganesha[2197212]:
>> [work-163] LogClientListEntry :EXPORT :CRIT :  0x2320470 PROTO_CLIENT:
>> (no_root_squash, RWrw,,   ,   ,
>> ,,)
>>
>> When I looked at the code, we do check all defined NFS clients (from the
>> export info) and report above critical error on finding any client
>> having type PROTO_CLIENT. This check is performed on every access to
>> that export.
>> I would like to know:
>> 1. What does it mean when the client type is PROTO_CLIENT ?
>> 2. Why the client definition can not have type PROTO_CLIENT ?
>> 3. If the error is critical, are we still supposed to continue to run
>> Ganesha or should exit ?
>>
>> with regards,
>> Sachin Punadikar
>>



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-19 Thread Malahal Naineni
If I understand correctly, you have a renewable ticket and commands fail
when the ticket expires? I will have our folks test it. Do you have any
more details on reproducing this issue?

On Fri, Mar 17, 2017 at 9:59 AM, Satya Prakash GS
 wrote:
> Has anyone seen client ops failing with error -13 because of context
> expiry on client (gss_verify_mic fails).
> Surprisingly with little load, it's consistently reproducible on my setup.
> Can someone point me to the relevant commits if this has already been fixed.
>
> Thanks,
> Satya.
>
> On Mon, Mar 13, 2017 at 4:01 PM, Satya Prakash GS
>  wrote:
>> My bad, I should have mentioned the version in the original post.
>>
>> Malahal was kind enough to share a list of relevant commits. With the
>> patches I continued to see the issue. I suspect the client code is not
>> handling GSS_S_CONTEXT_EXPIRED correctly on a call to gss_verify_mic.
>> Instead, I fixed the server code to time out the ticket 5 minutes before
>> the actual expiry (Ganesha already times the ticket out 5 seconds
>> early).
>> So far, the issue hasn't got reproduced but I will continue running
>> the test for a day or two before confirming if the fix works. Do you
>> see any issue with this fix ?
>>
>> Thanks,
>> Satya.
>>



Re: [Nfs-ganesha-devel] maxwrite/maxread : uint64_t or uint32_t

2017-03-13 Thread Malahal Naineni
Probably a bug! Having said that, we have a long way to go from the current
1 MB before we get beyond 4 GB. It is nice to fix it, though.

Regards, Malahal.

On Mon, Mar 13, 2017 at 3:33 PM, LUCAS Patrice  wrote:
> Hi,
>
>
> Why fs_maxwrite and fs_maxread export functions are returning an
> uint32_t (src/include/fsal_api.h:859) whereas maxread and maxwrite
> values are often managed as uint64_t, like in the gsh_export structure
> (src/include/export_mgr.h:64) or in the fsal_dynamicfsinfo_t structure
> (src/include/fsal_types.h:792), or in NFSv4.1 RFC5661 or NFSv4 RFC7530
> recommended attributes (section 5.7) ?
>
>
> Regards,
>
> --
> Patrice LUCAS
> Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S
> tel : +33 (0)1 69 26 47 86
> e-mail : patrice.lu...@cea.fr
>

--
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-12 Thread Malahal Naineni
>>  Indeed, 2.4 was mostly a bug fix release

Actually, 2.4 has a couple of big features as far as the ganesha project is
concerned, but Bill is probably indicating that the libntirpc corresponding
to ganesha 2.4 is mostly a bug-fix release.

Regards, Malahal.

On Sun, Mar 12, 2017 at 8:15 PM, William Allen Simpson
 wrote:
> On 3/11/17 8:15 AM, Satya Prakash GS wrote:
>>
>> We are using 2.3-stable. Given that most of our testing has been done
>> it's a bit difficult for us to move to 2.5 now but we can take fixes
>> from 2.5.
>>
> Sorry, I should have asked long ago what version you were using.
>
> On this list, I always assume that you are using the most recent -dev
> release.  There are an awful lot of bug fixes since 2.3.  Indeed, 2.4
> was mostly a bug fix release, and 2.5 is supposed to be a performance
> release (but has a fair number of bug fixes, too).



Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-11 Thread Malahal Naineni
gd->gen is not used in the latest code. If I remember, there was a bug
removing recent cached entries resulting in permission errors. What
version are you using? Try using V2.5.

Regards, Malahal.

On Sat, Mar 11, 2017 at 12:54 AM, Satya Prakash GS
 wrote:
> On Sat, Mar 11, 2017 at 12:37 AM, William Allen Simpson
>  wrote:
>> I'm not familiar with this code, so not likely to be much help.
>> Looks mostly written by Matt, but Malahal made the most recent
>> changes in July 2016.
>>
>> On 3/10/17 9:35 AM, Satya Prakash GS wrote:
>>>
>>> Is this a possibility :
>>>
>>> Server first rejects a client op with CREDPROBLEM/REJECTEDCRED,
>>> Client does an upcall and gssd initializes the context with the server.
>>> However the server recycles it immediately before the operation was
>>> retried (looks like there is a bug in the LRU implementation on
>>> Ganesha. To make things worse, I enabled the server debugs and it
>>> slowed down the client operations making the eviction of the entry
>>> easier). This happens thrice failing the client op.
>>>
>> Problem is not obvious.
>>
>> axp->gen is initialized to zero with the rest of *axp -- mem_zalloc().
>>
>> gd->gen is initialized to zero by alloc_svc_rpc_gss_data().
>>
>> axp->gen is bumped by one (++) each time it is handled by LRU code in
>> authgss_ctx_hash_get().
>>
>
> If a node gen isn't getting incremented it means that node is not
> being looked up often.
>
>> atomic_inc_uint32_t(>gen) is immediately after that.
>>
>> You think gd->gen also needs to be set to axp->gen in _set()?
>>
>
>> I'm not sure they are related.  There are many gd per axp, so
>> axp->gen could be much higher than gd->gen.
>>
>
> From authgss_ctx_gc_idle:
>
> if (abs(axp->gen - gd->gen) > __svc_params->gss.max_idle_gen) {
> Remove the entry from the tree; //gd is no more in the cache after this
> }
>
> Translates to - gd wasn't looked up in quite sometime let's clean it up.
>
> //gss.max_idle_gen -> by default set to 1024
>
> If tree's gen is 5000 and a new node gets inserted into the tree, node
> gen shouldn't start at 0 or it might pass the above condition in the
> next authgss_ctx_gc_idle call.
>
>> Both _get and _set are only called in svc_auth_gss.c _svcauth_gss().
>>
>> Admittedly, it is hard to track that there are 2 fields both called gen.
>>
>>> Thanks,
>>> Satya.
>>>
>>> On Thu, Mar 9, 2017 at 8:07 PM, Satya Prakash GS
>>>  wrote:

 Looks like the gen field in svc_rpc_gss_data is used to check the
 freshness of a context. However it is not initialized to axp->gen in
 authgss_ctx_hash_set.
 Will this not result in evicting the entries out early or am I missing
 something ?

 Thanks,
 Satya.

>>>
>
> Thanks,
> Satya.



Re: [Nfs-ganesha-devel] avltree question

2017-03-09 Thread Malahal Naineni
Based on my code reading, both functions return the node matching the key
if one is present. If the key isn't in the tree, _inf() gives the node that
would be the prev of the key, and _sup() gives the next of the key. You
should be able to call avltree_inf() and then call avltree_next() on the
returned node (which should be the same as avltree_sup()). You could also
do avltree_sup() and then avltree_prev().

Regards, Malahal.

On Fri, Mar 10, 2017 at 6:30 AM, Frank Filz  wrote:
>> If node > key, we are going "left", so smaller values should be on the left 
>> side
>> of the tree for this to make sense, correct?
>
> Ok, the order of operands for the compare function and how that maps to res > 
> 0 makes for confusion... Now I get it...
>
> I have another question, I'm trying to understand what avltree_inf and 
> avltree_sup are supposed to accomplish.
>
> I have need of a function to find the nodes that would be on either side of a 
> particular key without inserting the key into the tree (at which point of 
> course one could get avltree_prev and avltree_next).
>
> Thanks
>
> Frank
>
>> Regards, Malahal.
>>
>> On Thu, Mar 9, 2017 at 6:35 AM, Frank Filz  wrote:
>> > Looking at the AVL code, I'm wondering what order it sorts in.
>> >
>> > I'm confused by the following code:
>> >
>> > while (node) {
>> > res = avl_dirent_hk_cmpf(node, key);
>> > if (res == 0)
>> > return node;
>> > if (res > 0)
>> > node = node->left;
>> > else
>> > node = node->right;
>> > }
>> >
>> > Given that the comparison functions return -1, 0, 1 for <, ==, >, it
>> > seems like it effectively sorts in reverse order.
>> >
>> > Is that correct, or am I confused how the tree works...
>> >
>> > Frank
>> >
>> >
>> >
>> > ---
>> > This email has been checked for viruses by Avast antivirus software.
>> > https://www.avast.com/antivirus
>
>



Re: [Nfs-ganesha-devel] avltree question

2017-03-08 Thread Malahal Naineni
If node > key, we are going "left", so smaller values should be on the
left side of the tree for this to make sense, correct?

Regards, Malahal.

On Thu, Mar 9, 2017 at 6:35 AM, Frank Filz  wrote:
> Looking at the AVL code, I'm wondering what order it sorts in.
>
> I'm confused by the following code:
>
> while (node) {
> res = avl_dirent_hk_cmpf(node, key);
> if (res == 0)
> return node;
> if (res > 0)
> node = node->left;
> else
> node = node->right;
> }
>
> Given that the comparison functions return -1, 0, 1 for <, ==, >, it seems
> like it effectively sorts in reverse order.
>
> Is that correct, or am I confused how the tree works...
>
> Frank



Re: [Nfs-ganesha-devel] ganesha config editor user interface thoughts

2017-03-06 Thread Malahal Naineni
One line change to allow path! I will add it. Yes, it will pick the first
export with the given key (if there are multiple).

Regards, Malahal.

On Mar 6, 2017 11:54 PM, "Frank Filz" <ffilz...@mindspring.com> wrote:

> In a non-NFSv4 environment where the flag to switch NFS v3 and 9P mount to
> use Pseudo, Path SHOULD be unique (it could still be non-unique if Tag is
> used), so you could allow Path, and just return error is Path is not unique
> (actually, we will just find the first export with that Path, so could
> still work). You could also allow specification by Tag, on the other hand,
> with the new config to use Pseudo, I’d love to deprecate Tag.
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Monday, March 6, 2017 5:08 AM
> *To:* d...@redhat.com
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* Re: [Nfs-ganesha-devel] ganesha config editor user interface
> thoughts
>
>
>
> Yes, it makes sense while creating for sure. Someone needs to remember the
> exportid while changing an entry that was created before. So I added
> exportid and pseudo for now. We can add anything later.  "export export_id
> 14" and "export pseudo /root/exp1" are valid specifications now
>
>
>
>
>
> On Mar 6, 2017 6:30 PM, "Daniel Gryniewicz" <d...@redhat.com> wrote:
>
> Isn't export_id the only actual always-required unique key?  Maybe just
> use that?
>
> Daniel
>
> On 03/05/2017 11:39 PM, Malahal Naineni wrote:
> > Posted a patch that works at gerritio. It just creates blocks and key
> > value pairs without checking if they constitute a valid ganesha config
> > block. Currently, the "export" block takes "pseudo value" and the "client"
> > block takes "clients value" as additional arguments. pseudo may not be
> > used in NFSv3 only environments (not sure about 9P).
> >
> > I am thinking to support "pseudo", "path" or "export_id" as well. So
> > to change an export block that has export id as 16, one would do
> > "ganesha_conf set export export_id 16 --param1 value1 --param2
> > value2". If one wants to use pseudo instead, it can be done as
> > "ganesha_conf set export pseudo /nfsroot/spath1 value1 --param2 value2"
> >
> > I am thinking of allowing "export_id", "pseudo" and "path" keys for
> > "export" block identification. We only use "clients" for the client
> > block, but to be consistent with the export block, we will have
> > "ganesha_conf export export_id 16 client clients 192.168.1.0 --param1
> > --value" to change the corresponding "client" block.
> >
> > Any suggestions or issues with this approach?
> >
> > Regards, Malahal.
> >
> > On Mon, Feb 27, 2017 at 2:45 PM, Dominique Martinet
> > <dominique.marti...@cea.fr> wrote:
> >> Malahal Naineni wrote on Sat, Feb 25, 2017 at 03:33:17PM +0530:
> >>> - All config is in blocks
> >>> - Most blocks are unique with their tag names
> >>>   - exceptions: "export" and "client" blocks.
> >>>   - "export" is unique by "path" value
> >>
> >> More like unique by pseudo path; path can be identical for various
> >> reasons (e.g. exporting the same backend multiple times with different
> >> options)
> >>
> >>>   - "client" is unique by "clients" value with in the export block.
> >>> - Log blocks have few subblocks.
> >>
> >> export can also have an arbitrary number of sub-blocks (for FSAL and
> >> stackable FSALs); I think the syntax here should be generic enough and
> >> recursively handled e.g. maybe
> >> ganesha_config set blockname.subblock[.subblock[...]] key value
> >>
> >> --
> >> Dominique

Re: [Nfs-ganesha-devel] ganesha config editor user interface thoughts

2017-03-06 Thread Malahal Naineni
Yes, it makes sense while creating for sure. Someone needs to remember the
exportid while changing an entry that was created before. So I added
exportid and pseudo for now. We can add anything later.  "export export_id
14" and "export pseudo /root/exp1" are valid specifications now


On Mar 6, 2017 6:30 PM, "Daniel Gryniewicz" <d...@redhat.com> wrote:

> Isn't export_id the only actual always-required unique key?  Maybe just
> use that?
>
> Daniel
>
> On 03/05/2017 11:39 PM, Malahal Naineni wrote:
> > Posted a patch that works at gerritio. It just creates blocks and key
> > value pairs without checking if they constitute a valid ganesha config
> > block. Currently, the "export" block takes "pseudo value" and the "client"
> > block takes "clients value" as additional arguments. pseudo may not be
> > used in NFSv3 only environments (not sure about 9P).
> >
> > I am thinking to support "pseudo", "path" or "export_id" as well. So
> > to change an export block that has export id as 16, one would do
> > "ganesha_conf set export export_id 16 --param1 value1 --param2
> > value2". If one wants to use pseudo instead, it can be done as
> > "ganesha_conf set export pseudo /nfsroot/spath1 value1 --param2 value2"
> >
> > I am thinking of allowing "export_id", "pseudo" and "path" keys for
> > "export" block identification. We only use "clients" for the client
> > block, but to be consistent with the export block, we will have
> > "ganesha_conf export export_id 16 client clients 192.168.1.0 --param1
> > --value" to change the corresponding "client" block.
> >
> > Any suggestions or issues with this approach?
> >
> > Regards, Malahal.
> >
> > On Mon, Feb 27, 2017 at 2:45 PM, Dominique Martinet
> > <dominique.marti...@cea.fr> wrote:
> >> Malahal Naineni wrote on Sat, Feb 25, 2017 at 03:33:17PM +0530:
> >>> - All config is in blocks
> >>> - Most blocks are unique with their tag names
> >>>   - exceptions: "export" and "client" blocks.
> >>>   - "export" is unique by "path" value
> >>
> >> More like unique by pseudo path; path can be identical for various
> >> reasons (e.g. exporting the same backend multiple times with different
> >> options)
> >>
> >>>   - "client" is unique by "clients" value with in the export block.
> >>> - Log blocks have few subblocks.
> >>
> >> export can also have an arbitrary number of sub-blocks (for FSAL and
> >> stackable FSALs); I think the syntax here should be generic enough and
> >> recursively handled e.g. maybe
> >> ganesha_config set blockname.subblock[.subblock[...]] key value
> >>
> >> --
> >> Dominique
> >
> > 
> --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] ganesha config editor user interface thoughts

2017-02-27 Thread Malahal Naineni
Based on Sriram's options idea:

"ganesha_config set blocknames [sub-blocknames] --parameter value"
will created block/subblocks if doesn't exist and will add the given
parameter values inside the block.

For example, "ganesha_config set NFS_CORE_PARAM --MNT_port " will
create nfs_core_param block if it is not already there and will add
MNT_port as .

"ganesha_config set LOG COMPONENTS --IDMAPPER FULL_DEBUG"  will create

 LOG {
 COMPONENTS {
   IDMAPPER = FULL_DEBUG;
 }
}
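By the same pattern, the NFS_CORE_PARAM command above would presumably leave a block like this in the config file (the port value 20048 below is purely hypothetical, since the original message elided the actual value):

```
NFS_CORE_PARAM {
    MNT_port = 20048;
}
```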

Regards, Malahal.



On Mon, Feb 27, 2017 at 12:45 PM, sriram patil <spsrirampa...@gmail.com> wrote:
> Hi Malahal,
>
> Can we have the key value pairs as options to the commands? For example,
>
> ganesha_config add export path --pseudo  --protocols 4 --access
> RW --exportid 30
>
> This way the block can be setup in a single command. Also, one does not have
> to worry about the exact key and cases (like Export_id).
>
> But, the line of code will increase for sure.
>
> Thanks,
> Sriram
>
>
> On Sun, Feb 26, 2017 at 9:26 AM, Frank Filz <ffilz...@mindspring.com> wrote:
>>
>> There are some empty blocks; for example, log components can be empty.
>>
>> Sent from my iPhone
>>
>> > On Feb 25, 2017, at 7:20 PM, Malahal Naineni <mala...@gmail.com> wrote:
>> >
>> > Assuming that there is no point in creating empty blocks, then we can
>> > just have "set" and "del" commands. The "set" command's last two
>> > arguments are always "key, value" pairs.
>> >
>> > ganesha_conf set block [subblocks] key value
>> > ganesha_conf del block [subblocks] [key]
>> >
>> > We might need a show command  to complement this as well.
>> >
>> >> On Sat, Feb 25, 2017 at 3:33 PM, Malahal Naineni <mala...@gmail.com>
>> >> wrote:
>> >> Hi All, As as I said last week, here are my thoughts on command line
>> >> interface to edit ganesha config. Appreciate any thoughts on this.
>> >>
>> >> Observations:
>> >>
>> >> - All config is in blocks
>> >> - Most blocks are unique with their tag names
>> >>  - exceptions: "export" and "client" blocks.
>> >>  - "export" is unique by "path" value
>> >>  - "client" is unique by "clients" value with in the export block.
>> >> - Log blocks have few subblocks.
>> >> - Blocks contain a list of key value pairs and possibly some subblocks.
>> >>
>> >> Commands to create a block/subblock
>> >> (block and subblock names should be validated)
>> >>
>> >> ganesha_config add blockname
>> >> ganesha_config add log [subblocks]
>> >> ganesha_config add export path
>> >> ganesha_config add export path client clients
>> >>
>> >> Add, delete, modify a key value pair in a block/subblock
>> >> (key and values need to be validated)
>> >>
>> >> ganesha_config mod blockname key value
>> >> ganesha_config mod log [sub-blocks] key value
>> >> ganesha_config mod export path key value
>> >> ganesha_config mod export path client clients key value
>> >>
>> >> Absence of "value" will delete the key itself from the block. This
>> >> means "key"
>> >> name can't be a subblock name. Is this true today and want to preserve
>> >> this
>> >> behavior? Otherwise, we will have to use "delete" as a special reserved
>> >> value to delete a key. Any thoughts?
>> >>
>> >>
>> >> Commands to delete a block/subblock (same as their "add" counter parts)
>> >>
>> >> ganesha_config del blockname
>> >> ganesha_config del log [subblocks]
>> >> ganesha_config del export path
>> >> ganesha_config del export path client clients
>> >
>> >
>> > --
>> > Check out the vibrant tech community on one of the world's most
>> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> > ___
>> > Nfs-ganesha-devel mailing list
>> > Nfs-ganesha-devel@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>
>>
>>
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] how does ganesha limit number of FDs

2017-02-16 Thread Malahal Naineni
The only way for ganesha to close NFSv3 opens was with this fd count limit
in an earlier release. Now, do we open for each NFSv3 read/write and close
after the read/write?

On Tue, Feb 14, 2017 at 12:51 AM, Frank Filz 
wrote:

> Open fd tracking is an area Ganesha actually needs some additional work,
> however, the tracking in the non-support_ex methods did not limit the
> number of NFS v4 opens in any way, it just was used to manage how many fds
> Ganesha kept open for I/O access to files.
>
>
>
> With support_ex, an fd is associated with each file clients have open (NFS
> v4), and additional fds are (may) be used per file/lock owner pair.
>
>
>
> Ultimately we do need to have a limit on how much state (NFS v4 open,
> NLM_SHARE, locks, delegations, layouts) is active.
>
>
>
> Frank
>
>
>
> *From:* Naresh Babu [mailto:snareshb...@gmail.com]
> *Sent:* Saturday, February 11, 2017 4:21 PM
> *To:* nfs-ganesha-devel@lists.sourceforge.net
> *Subject:* [Nfs-ganesha-devel] how does ganesha limit number of FDs
>
>
>
> If a particular FSAL implements extended ops like
> open2/write2/read2/commit2, open_fd_count doesn't get tracked. Then, how
> does ganesha limit the number of open FDs? In general, what is the
> rationale behind not tracking open fd count?
>
>
>
> Thanks,
>
> Naresh
>
>
> --
> [image: Avast logo] 
>
> This email has been checked for viruses by Avast antivirus software.
> www.avast.com 
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] Please restore this branch in ntirpc repo

2017-01-08 Thread Malahal Naineni
The IBM ganesha 1.5 project, as well as the upstream ganesha 1.5 version (1.5.x
branch), uses the duplex-9 branch. Please restore this branch.

Regards, Malahal.

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] push to gerrit forbidden

2016-12-05 Thread Malahal Naineni
Thank you, Niels. The ssh-based remote worked fine. Thank you for the
information.

Regards, Malahal.

On Mon, Dec 5, 2016 at 10:17 AM, Niels de Vos <nde...@redhat.com> wrote:
> On Mon, Dec 05, 2016 at 01:16:39AM +0530, Malahal Naineni wrote:
>> Since Swen and I are getting into issue, I assume it is due to how we set
>> up our repos. I have the following:
>>
>> $ git remote -v | grep gerrit
>> gerrit https://gerrithub.io/ffilz/nfs-ganesha (fetch)
>> gerrit https://gerrithub.io/ffilz/nfs-ganesha (push)
>>
>>
>> How about you guys? We do "git push gerrit HEAD:refs/for/next". It all used
>> to work until couple weeks ago. Things must have changed on gerrit end.
>
> I was able to post a patch last week (or the week before) with the help
> of 'git review' (from the git-review package).
>
> My git remote for GerritHub uses ssh, not https like yours. Maybe that
> is a key difference?
>
>$ git remote -v | grep gerrithub
>gerrit   ssh://nixpa...@review.gerrithub.io:29418/ffilz/nfs-ganesha 
> (fetch)
>gerrit   ssh://nixpa...@review.gerrithub.io:29418/ffilz/nfs-ganesha 
> (push)
>
> You should be able to add a public ssh-key in your gerrithub settings, I
> do not think it allows for password login.
>
> HTH,
> Niels
>
>
>>
>> Regards, Malahal.
>>
>> On Fri, Dec 2, 2016 at 10:37 AM, Frank Filz <ffilz...@mindspring.com> wrote:
>>
>> > I don’t see anything wrong in the permissions.
>> >
>> >
>> >
>> > Let me know if it’s still a problem tomorrow and we’ll look for where to
>> > get help.
>> >
>> >
>> >
>> > Thanks
>> >
>> >
>> >
>> > Frank
>> >
>> >
>> >
>> > *From:* Malahal Naineni [mailto:mala...@gmail.com]
>> > *Sent:* Thursday, December 1, 2016 5:11 PM
>> > *To:* Frank Filz <ffilz...@mindspring.com>
>> > *Cc:* nfs-ganesha-devel@lists.sourceforge.net; Swen Schillig <
>> > s...@vnet.ibm.com>
>> > *Subject:* Re: [Nfs-ganesha-devel] push to gerrit forbidden
>> >
>> >
>> >
>> > I also got same when I tried to remove WIP to one of my patches.
>> >
>> >
>> >
>> >
>> >
>> > On Dec 2, 2016 5:59 AM, "Frank Filz" <ffilz...@mindspring.com> wrote:
>> >
>> > > Is there something wrong with gerrit or am I just not allowed to push
>> > > anymore ?
>> > >
>> > > $ git push gerrit HEAD:refs/for/next
>> > > Counting objects: 26, done.
>> > > Delta compression using up to 8 threads.
>> > > Compressing objects: 100% (26/26), done.
>> > > Writing objects: 100% (26/26), 3.25 KiB | 0 bytes/s, done.
>> > > Total 26 (delta 22), reused 0 (delta 0)
>> > > error: RPC failed; HTTP 403 curl 22 The requested URL returned error: 403
>> > > Forbidden
>> > > fatal: The remote end hung up unexpectedly
>> > > fatal: The remote end hung up unexpectedly
>> > >
>> > > Thanks for your support in advance.
>> >
>> > Hmm, not sure what happened here.
>> >
>> > Has anyone else had issues?
>> >
>> > Frank
>> >
>> >
>> > ---
>> > This email has been checked for viruses by Avast antivirus software.
>> > https://www.avast.com/antivirus
>> >
>> >
>> > 
>> > --
>> > Check out the vibrant tech community on one of the world's most
>> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> > ___
>> > Nfs-ganesha-devel mailing list
>> > Nfs-ganesha-devel@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>> >
>> >
>> >
>> > --
>> >
>> >
>
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>

--
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] push to gerrit forbidden

2016-12-04 Thread Malahal Naineni
Since Swen and I are both running into this issue, I assume it is due to how we set
up our repos. I have the following:

$ git remote -v | grep gerrit
gerrit https://gerrithub.io/ffilz/nfs-ganesha (fetch)
gerrit https://gerrithub.io/ffilz/nfs-ganesha (push)


How about you guys? We do "git push gerrit HEAD:refs/for/next". It all used
to work until a couple of weeks ago. Things must have changed on the gerrit end.

Regards, Malahal.

On Fri, Dec 2, 2016 at 10:37 AM, Frank Filz <ffilz...@mindspring.com> wrote:

> I don’t see anything wrong in the permissions.
>
>
>
> Let me know if it’s still a problem tomorrow and we’ll look for where to
> get help.
>
>
>
> Thanks
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Thursday, December 1, 2016 5:11 PM
> *To:* Frank Filz <ffilz...@mindspring.com>
> *Cc:* nfs-ganesha-devel@lists.sourceforge.net; Swen Schillig <
> s...@vnet.ibm.com>
> *Subject:* Re: [Nfs-ganesha-devel] push to gerrit forbidden
>
>
>
> I also got same when I tried to remove WIP to one of my patches.
>
>
>
>
>
> On Dec 2, 2016 5:59 AM, "Frank Filz" <ffilz...@mindspring.com> wrote:
>
> > Is there something wrong with gerrit or am I just not allowed to push
> > anymore ?
> >
> > $ git push gerrit HEAD:refs/for/next
> > Counting objects: 26, done.
> > Delta compression using up to 8 threads.
> > Compressing objects: 100% (26/26), done.
> > Writing objects: 100% (26/26), 3.25 KiB | 0 bytes/s, done.
> > Total 26 (delta 22), reused 0 (delta 0)
> > error: RPC failed; HTTP 403 curl 22 The requested URL returned error: 403
> > Forbidden
> > fatal: The remote end hung up unexpectedly
> > fatal: The remote end hung up unexpectedly
> >
> > Thanks for your support in advance.
>
> Hmm, not sure what happened here.
>
> Has anyone else had issues?
>
> Frank
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
>
> --
>
>
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] push to gerrit forbidden

2016-12-01 Thread Malahal Naineni
I also got same when I tried to remove WIP to one of my patches.


On Dec 2, 2016 5:59 AM, "Frank Filz"  wrote:

> > Is there something wrong with gerrit or am I just not allowed to push
> > anymore ?
> >
> > $ git push gerrit HEAD:refs/for/next
> > Counting objects: 26, done.
> > Delta compression using up to 8 threads.
> > Compressing objects: 100% (26/26), done.
> > Writing objects: 100% (26/26), 3.25 KiB | 0 bytes/s, done.
> > Total 26 (delta 22), reused 0 (delta 0)
> > error: RPC failed; HTTP 403 curl 22 The requested URL returned error: 403
> > Forbidden
> > fatal: The remote end hung up unexpectedly
> > fatal: The remote end hung up unexpectedly
> >
> > Thanks for your support in advance.
>
> Hmm, not sure what happened here.
>
> Has anyone else had issues?
>
> Frank
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] pre-commit refusals for ANSI formatted strings

2016-11-29 Thread Malahal Naineni
Forgot to include the mailing list in my previous email, also my
cut-paste garbled what I wanted to say, but you get the idea!

On Tue, Nov 29, 2016 at 9:38 PM, Malahal Naineni <mala...@gmail.com> wrote:
> Bill indicated he didn't like splitting like this:
>
> "NFS DISPATCHER: FAILURE: Error while calling svc_sendreply on a new
> request. rpcxid=%"
> PRIu32
> " socket=%d function:%s client:%s program:%"
> PRIu32
> " nfs version:%"
> PRIu32
> " proc:%"
> PRIu32
> " errno: %d",
>
> He could actually do, something like the following:
>
> "NFS DISPATCHER: FAILURE: Error while calling svc_sendreply on a new
> request. rpcxid=%"
> PRIu32 " socket=%d function:%s client:%s program:%" PRIu32
> " nfs version:%" PRIu32 " proc:%"  PRIu32 " errno: %d",
>
> The first one may exceed 80 chars, but that should be fine for the
> current checkpatch. I am not sure if that works for Bill, but I don't
> like checkpatch configuration dictating us what to do for no good
> reason. Is there a good reason for this madness?
>
> Regards, Malahal.
>
>
> On Tue, Nov 29, 2016 at 7:11 PM, Daniel Gryniewicz <d...@redhat.com> wrote:
>> On 11/29/2016 02:37 AM, William Allen Simpson wrote:
>>> On 11/28/16 1:28 PM, Malahal Naineni wrote:
>>>> I think SPLIT_STRING is enabled by default. This supposedly helps some
>>>> linux kernel hackers to map a given syslog message with the actual
>>>> code line by simply grepping. Debatable as printk could contain string
>>>> symbols evading the matches. Enabling this and the 80 character line
>>>> limit is a mess! Both are debatable. I am for disabling SPLIT_STRING
>>>> warning from our checkpatch.conf.
>>>>
>>>> Any comments to keep it and how useful that was so far? I never really
>>>> needed it!
>>>>
>>>>> When did somebody turn this stupidity on?!?!
>>>>>
>>>>> Turn this code-nazi nonsense off
>>>>>
>>> Please turn SPLIT_STRING off.  This was a patch that committed perfectly
>>> well in the past.  Yet after much work rebasing over the holiday, now
>>> won't commit at all -- no matter how many hours I've spent massaging it.
>>>
>>> SPLIT_STRING makes it really hard to review printf formats.  Because they
>>> mostly are off the screen, we now never see most of the line without
>>> tedious scrolling back and forth
>>>
>>> Human readability and code maintainability trump some nitwit using grep.
>>>
>>> Do it today, so that next week I can try to commit again!
>>>
>>
>> The issue is with the PRIu32 placeholders, which are not part of the
>> quoted string (by necessity).  You need to split at those.
>>
>> Daniel
>>
>>
>> --
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

--
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] ERROR: trailing statements should be on next line

2016-11-26 Thread Malahal Naineni
I am not sure about the exact error that checkpatch is complaining
about here.  Maybe it is getting confused with our conditionals! About the
_USE_NLM, it is done incorrectly the way I understand it. We should
define P_NLM only if _USE_NLM is defined. Then the current code will
run into compilation issues, and fixing them would get the code right!

If the intention is to remove the feature and NOT the code size, then
this could just be a config parameter (enable_NLM = [true]/false)
which will make the code easier to read. Does anyone know the original
intention of _USE_NLM?

Regards, Malahal.

On Sun, Nov 27, 2016 at 5:44 AM, William Allen Simpson
 wrote:
> Should the #ifdef and #endif be one line higher?  That lets me commit.
>
> Or is this just new commit hook code-zanyism?
>
> The error is for my test code; there are some defines to try to make
> things fit in 80 columns.  See the current code listing below
>
> The current code will no longer commit.
>
> ===
>
> ERROR: trailing statements should be on next line
> #216: FILE: src/MainNFSD/nfs_rpc_dispatcher_thread.c:1765:
> +   } else if (req->rq_msg.cb_prog == NFS_pcpp[P_MNT]
> [...]
> +  && ((NFS_pcpco & CORE_OPTION_NFSV3) != 0)) {
>
>
> ===
>
>  } else if (req->rq_prog == nfs_param.core_param.program[P_NLM]
> #ifdef _USE_NLM
> && ((nfs_param.core_param.core_options & 
> CORE_OPTION_NFSV3)
> != 0)) {
>  if (req->rq_vers == NLM4_VERS) {
>  if (req->rq_proc <= NLMPROC4_FREE_ALL)
>  return true;
>  else
>  goto noproc_err;
>  } else {
>  lo_vers = NLM4_VERS;
>  hi_vers = NLM4_VERS;
>  goto progvers_err;
>  }
>  } else if (req->rq_prog == nfs_param.core_param.program[P_MNT]
> #endif /* _USE_NLM */
> && ((nfs_param.core_param.core_options & 
> CORE_OPTION_NFSV3)
> != 0)) {
>  /* Some clients may use the wrong mount version to umount, so
>   * always allow umount, otherwise only allow request if the
>   * appropriate mount version is enabled.  Also need to allow
>   * dump and export, so just disallow mount if version not
>   * supported.
>   */
>
> --
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

--
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Fwd: [nfs-ganesha/nfs-ganesha] RGW: failing to bind to librados should be caught (#123)

2016-10-26 Thread Malahal Naineni
Yes, we can do that. I have few more. I will try an update next week.

On Oct 26, 2016 8:39 PM, "Daniel Gryniewicz"  wrote:

> Malahal,
>
> This PR has been requested to be applied to 2.3.  Is that okay?
>
> Daniel
>
>
>  Forwarded Message 
> Subject:[nfs-ganesha/nfs-ganesha] RGW: failing to bind to librados
> should be caught (#123)
> Date:   Sun, 23 Oct 2016 11:25:26 -0700
> From:   Karol Mroz 
> Reply-To:   nfs-ganesha/nfs-ganesha
>  1424c29692a169ce0b026...@reply.github.com>
> To: nfs-ganesha/nfs-ganesha 
>
>
>
> If librados initialization fails and is unhandled, the rgw_mount() code
> path ultimately causes a segfault. Check the status of librgw_create()
> and handle accordingly.
>
> Fixes #112 
>
> Change-Id: Ie95caf6b2329e87c30552cf97e018133f0b219d9
> Signed-off-by: Karol Mroz km...@suse.de 
> (cherry picked from commit edf4f57
>  edf4f579897f5cea71339c444800cf910e03325d>)
>
> 
>
>
>  You can view, comment on, or merge this pull request online at:
>
> https://github.com/nfs-ganesha/nfs-ganesha/pull/123
>
>
>  Commit Summary
>
>* RGW: failing to bind to librados should be caught
>
>
>  File Changes
>
>* *M* src/FSAL/FSAL_RGW/main.c
>  
> (7)
>
>
>  Patch Links:
>
>* https://github.com/nfs-ganesha/nfs-ganesha/pull/123.patch
>* https://github.com/nfs-ganesha/nfs-ganesha/pull/123.diff
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> , or mute the
> thread
>  b587qNcRk6CXleks5q26aWgaJpZM4KeNCS>.
>
>
> 
> --
> The Command Line: Reinvented for Modern Developers
> Did the resurgence of CLI tooling catch you by surprise?
> Reconnect with the command line and become more productive.
> Learn the new .NET and ASP.NET CLI. Get your free copy!
> http://sdm.link/telerik
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
The Command Line: Reinvented for Modern Developers
Did the resurgence of CLI tooling catch you by surprise?
Reconnect with the command line and become more productive. 
Learn the new .NET and ASP.NET CLI. Get your free copy!
http://sdm.link/telerik
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] assert in dec_state_owner_ref() with V2.4.0.3

2016-10-26 Thread Malahal Naineni
Please post if you have an easy reproducer. We will try to recreate
and root cause it.

On Wed, Oct 26, 2016 at 6:15 AM, Eric Eastman
 wrote:
> A little more info on this issue.  I did a 24 hour run of my test
> using the POSIX FSAL with an ext4 file system as the backstore, and
> saw 9 asserts during this test run, all caused by the variable
> "refcount" ending up at -1.  The errors seem to be occurring while
> running "rm -rf" on a directory with 1000 sub-directories, with each
> having 11 files in it.
>
> This looks to me like a race condition and I am having issues finding
> the root cause reading through the source code.  There are notes from
> commit e7307c5, dated Jan 5 2016,  on "Resolve race between
> get_state_owner and dec_state_owner_ref differently"  so this looks
> like an area that there has been issues before.
>
> If anyone has an idea on what the root problem is or where to look,
> please let me know, as we cannot use Ganesha NFS if it is going to
> assert during production.
>
> Thanks,
> Eric
>
> On Thu, Oct 20, 2016 at 1:22 AM, Eric Eastman
>  wrote:
>> While testing Ganesha NFS V2.4.0.3 using the CEPH FSAL to a ceph file
>> system, I am seeing the ganesha.nfsd process die due to an assert call
>> multiple times per hour.  I have also seen it die at the same place in
>> the code using the VFS FSAL with a ext4 file system, but it dies much
>> less often.
>>
>> It is dying at line 917 in src/SAL/state_misc.c, which is called by
>> src/SAL/state_misc.c at line 1010.  The assert call is in
>> dec_state_owner_ref() at the line:
>>
>>assert(refcount > 0);
>>
>> Looking at the core files and adding in some debugging code confirms
>> that refcount is -1 when the assert call is made.
>>
>> It looks like the owner count is trying to go to -1 in
>> uncache_nfs4_owner(), but as it occurs only on occasions, I think it
>> is a race condition.
>>
>> Info on the build:
>>
>> Host OS is Ubuntu 14.04 with a 4.8.2 x86_64 kernel on a 8 processor system
>>
>> Cmake command:
>> # cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc
>> -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src
>>
>> # ganesha.nfsd -v
>> ganesha.nfsd compiled on Oct 17 2016 at 16:50:18
>> Release = V2.4.0.3
>> Release comment = GANESHA file server is 64 bits compliant and
>> supports NFS v3,4.0,4.1 (pNFS) and 9P
>> Git HEAD = 0f55a9a97a4bf232fb0e42542e4ca7491fbf84ce
>> Git Describe = V2.4.0.3-0-g0f55a9a
>>
>> # ceph -v
>> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>>
>> # cat ganesha.conf
>> LOG {
>> components {
>>ALL = INFO;
>> }
>> }
>>
>> EXPORT_DEFAULTS {
>> SecType = none, sys;
>> Protocols = 3, 4;
>> Transports = TCP;
>> }
>>
>> # define CephFS export
>> EXPORT {
>> Export_ID = 42;
>> Path = /top;
>> Pseudo = /top;
>> Access_Type = RW;
>> Squash = No_Root_Squash;
>> FSAL {
>> Name = CEPH;
>> }
>> }
>>
>> The VFS export for the ext4 tests was:
>>
>> # define CephFS export
>> EXPORT {
>> Export_ID = 43;
>> Path = /var/top;
>> Pseudo = /var/top;
>> Access_Type = RW;
>> Squash = No_Root_Squash;
>> FSAL {
>> Name = VFS;
>> }
>> }
>>
>> The test was 2 Ubuntu 14.04 NFS clients each having 6 processes,
>> writing 11,000 256k files in separate directory trees with 11 files
>> per lowest level node. On each Ubuntu client, 3 processes wrote to a
>> NFS 3 mount and 3 wrote to a NFS 4 mount. The files are then read and
>> verified, deleted, and the test restarts.
>>
>> Regards,
>> Eric
>
> --
> The Command Line: Reinvented for Modern Developers
> Did the resurgence of CLI tooling catch you by surprise?
> Reconnect with the command line and become more productive.
> Learn the new .NET and ASP.NET CLI. Get your free copy!
> http://sdm.link/telerik
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

--
The Command Line: Reinvented for Modern Developers
Did the resurgence of CLI tooling catch you by surprise?
Reconnect with the command line and become more productive. 
Learn the new .NET and ASP.NET CLI. Get your free copy!
http://sdm.link/telerik
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach

2016-10-23 Thread Malahal Naineni
Frame 45 has 1MB as max read/writes from ganesha.
 
Regards, Malahal.
 
- Original message -
From: Malahal Naineni/Beaverton/IBM
To: Olaf Weiser/Germany/IBM@IBMDE
Cc: d...@redhat.com, mala...@gmail.com, nfs-ganesha-devel@lists.sourceforge.net
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
Date: Sat, Oct 22, 2016 7:35 AM
This 64MB could be for the pseudo export. Let me check some more...
 
Regards, Malahal.
 
- Original message -
From: Malahal Naineni/Beaverton/IBM
To: Olaf Weiser/Germany/IBM@IBMDE
Cc: d...@redhat.com, mala...@gmail.com, nfs-ganesha-devel@lists.sourceforge.net
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
Date: Sat, Oct 22, 2016 6:34 AM
Ganesha is sending 64MB as max write and max read size in frame 17 in client's tcpdump you attached. I am assuming that you changed from the default 1MB. What is your client's distro/kernel version?
 
Regards, Malahal.
 
- Original message -
From: Olaf Weiser/Germany/IBM
To: Daniel Gryniewicz <d...@redhat.com>
Cc: Malahal Naineni <mala...@gmail.com>, Malahal Naineni <nain...@us.ibm.com>, NFS Ganesha Developers <nfs-ganesha-devel@lists.sourceforge.net>
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
Date: Sat, Oct 22, 2016 1:37 AM

Hi Malahal, as requested I collected two tcpdumps (server side and client side). We tried to extract some information, but couldn't find anything that explains the 256K limit. @Daniel - we use RHEL 7.1 and have this issue. Do you have a solution?

(See attached file: tcpdumfile.client)
(See attached file: tcpdumfile.server)

*** I just cancelled the tcpdump after the file system was mounted. Thank you very much in advance for checking/investigating the data.

Mit freundlichen Grüßen / Kind regards

Olaf Weiser
EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage Platform
IBM Deutschland, IBM Allee 1, 71139 Ehningen
Phone: +49-170-579-44-66
E-Mail: olaf.wei...@de.ibm.com
IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940

Daniel Gryniewicz ---10/21/2016 07:57:06 AM---
From: Daniel Gryniewicz <d...@redhat.com>
To: Malahal Naineni <nain...@us.ibm.com>
Cc: Malahal Naineni <mala...@gmail.com>, NFS Ganesha Developers <nfs-ganesha-devel@lists.sourceforge.net>, Olaf Weiser/Germany/IBM@IBMDE
Date: 10/21/2016 07:57 AM
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach

A user just showed up on IRC claiming this exact problem, but only on
CentOS/RHEL.  I know it doesn't happen on Fedora.  Maybe it's related
to default buffer sizes on sockets, or something?

On Thu, Oct 20, 2016 at 10:44 PM, Malahal Naineni <nain...@us.ibm.com> wrote:
> Please provide the ganesha config and a tcpdump that includes the FSINFO
> request (start tcpdump before the mount and do a very small I/O before you
> kill the tcpdump).  Also, I am assuming that the GPFS that deals with
> ganesha is not splitting the I/O, as these traces don't indicate what
> ganesha is actually using, right?
>
> I know for a fact that we have seen I/Os with 1MB, so this can't be a
> ganesha hard limitation (including NFSv4, which I just experimented with).
>
> Regards, Malahal.
>
>
> - Original message -
> From: Malahal Naineni <mala...@gmail.com>
> To: Marc Eshel/Almaden/IBM@IBMUS
> Cc: Frank Filz <ffilz...@mindspring.com>, Matt Benjamin <mbenja...@redhat.com>, Malahal Naineni/Beaverton/IBM@IBMUS, NFS Ganesha Developers <nfs-ganesha-devel@lists.sourceforge.net>, Olaf Weiser <olaf.wei...@de.ibm.com>
> Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
> Date: Fri, Oct 21, 2016 7:55 AM
>
> I think that rpc code is about receive/send size, which is quite
> different from the NFS I/O size. The default max I/O size is 1M with the
> GPFS FSAL, but it is the client that could also limit it. Since the same
> client is not limiting with kNFS, I am pretty sure you have something
> wrong in your ganesha config. The actual value should be provided to the
> NFS client as part of the FSINFO response. Please look at the FSINFO
> response with tcpdump.
>
> I know for a fact we have seen I/Os with 1MB at least with NFSv3. I
> will exp

Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach

2016-10-23 Thread Malahal Naineni
Ganesha is sending 64MB as max write and max read size in frame 17 in client's tcpdump you attached. I am assuming that you changed from the default 1MB. What is your client's distro/kernel version?
 
Regards, Malahal.
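The maximums Ganesha advertises in the FSINFO response can be capped per export. A minimal ganesha.conf sketch of what that looks like; the export id, paths, and the 1MB values here are illustrative and not taken from this thread:

```
EXPORT
{
    Export_Id = 77;           # illustrative
    Path = /gpfs/fs1;         # illustrative
    Pseudo = /gpfs/fs1;
    # Cap the read/write transfer sizes (in bytes) advertised to clients,
    # e.g. in the NFSv3 FSINFO reply. 1048576 = 1MB, hypothetical values:
    MaxRead = 1048576;
    MaxWrite = 1048576;
    FSAL { Name = GPFS; }
}
```

If these are left unset, the FSAL's defaults apply, which is why a surprising 64MB in the trace suggests the config differs from the assumed default.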
 
- Original message -
From: Olaf Weiser/Germany/IBM
To: Daniel Gryniewicz <d...@redhat.com>
Cc: Malahal Naineni <mala...@gmail.com>, Malahal Naineni <nain...@us.ibm.com>, NFS Ganesha Developers <nfs-ganesha-devel@lists.sourceforge.net>
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
Date: Sat, Oct 22, 2016 1:37 AM

Hi Malahal, as requested, I collected two tcpdumps (server side and client side). We tried to extract some information, but couldn't find anything that explains the 256K limit.

@Daniel - we use RHEL 7.1 ... and have this issue. Do you have a solution?

(See attached file: tcpdumfile.client)
(See attached file: tcpdumfile.server)

[...]

> On Fri, Oct 21, 2016 at 7:19 AM, Marc Eshel <es...@us.ibm.com> wrote:
>> We are not able to get IO bigger than 256K with Ganesha, same client and
>> kNFS can get 1M.
>> Is there something in Ganesha that limits the IO size? see attached email
>> maxsize = 256 * 1024; /* XXX */, is that a problem?
>> Marc.
>>
>> From: Sven Oehme/Almaden/IBM
>> To: Olaf Weiser/Germany/IBM@IBMDE
>> Cc: Malahal Naineni/Beaverton/IBM@IBMUS, dhil...@us.ibm.com, fschm...@us.ibm.com, gfsch...@us.ibm.com, Marc Eshel/Almaden/IBM@IBMUS, robg...@us.ibm.com
>> Date: 10/20/2016 05:32 PM
>> Subject:
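Besides inspecting the FSINFO reply in a trace, the limit the client actually negotiated can be read on the client itself: the effective rsize/wsize appear in the mount options in /proc/mounts (or via `nfsstat -m`). A small shell sketch; the options string below is a hypothetical example standing in for a real /proc/mounts line:

```shell
# Hypothetical NFS mount options as they might appear in /proc/mounts;
# on a real client: grep ' nfs' /proc/mounts   or   nfsstat -m
opts="rw,relatime,vers=3,rsize=262144,wsize=262144,proto=tcp,port=2049"

# Pull out the negotiated read/write transfer sizes (bytes).
rsize=$(printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^rsize=//p')
wsize=$(printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^wsize=//p')

echo "negotiated rsize=$rsize wsize=$wsize"
# If these show 262144 (256K) while the server allows 1M or more,
# the cap was imposed during mount negotiation on the client side.
```

This distinguishes a server-side cap (wrong value in FSINFO) from a client-side one (client clamping what the server offered).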

Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach

2016-10-23 Thread Malahal Naineni
This 64MB could be for the pseudo export. Let me check some more...
 
Regards, Malahal.
 
- Original message -
From: Malahal Naineni/Beaverton/IBM
To: Olaf Weiser/Germany/IBM@IBMDE
Cc: d...@redhat.com, mala...@gmail.com, nfs-ganesha-devel@lists.sourceforge.net
Subject: Re: [Nfs-ganesha-devel] nfs testing for SAP - status - new traces / new approach
Date: Sat, Oct 22, 2016 6:34 AM
Ganesha is sending 64MB as max write and max read size in frame 17 in client's tcpdump you attached. I am assuming that you changed from the default 1MB. What is your client's distro/kernel version?
 
Regards, Malahal.
 
