Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson

On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:

On 09/08/2017 09:07 AM, William Allen Simpson wrote:

On 9/7/17 10:47 PM, Malahal Naineni wrote:

Last time I tried, I got the same. A thread was waiting in epoll_wait() with
a 29 second timeout; it continued only after that timeout expired.


I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1 second intervals (previously 5 seconds)
waiting for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



I'm looking at using an eventfd to wake up threads on shutdown.  That way, we 
can sleep for a long time while polling.


There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) something elsewhere calls clnt_vc_ncreatef() and clnt_vc_call() over and
over, which sets up another transport epoll fd and then deletes it after
each reply.

Presumably this is unregistering services.  Should probably unregister
services *before* nfs_rpc_dispatch_stop() kills the listeners?

Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call()
repeatedly instead.  No need to emulate UDP with TCP!

(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(),
svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports.  The
fix is to re-kill the channels previously killed in step #1, even though
they no longer have any open transports.

(6) work_pool_shutdown() waited until timeout caused that one remaining
channel for the epoll fd (step #2) to terminate.

This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags.  This fix means they are
not needed anymore.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] Announce Push of V2.6-dev.8

2017-09-08 Thread Frank Filz
Branch next

Tag: V2.6-dev.8

Release Highlights

* Various Debian fixes

* Fix open_for_locks parameter in gpfs_lock_op2

* Fix Dispatch_Max_Reqs max value in documentation.

* FSAL_RGW: Remove obsolete (non-support_ex) create method

* PROXY improvements

* Fix to make sure op_ctx is set when calling mdcache_lru_unref().

* Fixes to improve the call back foundation for delegations

* setclientid: free clientid if client_r_addr is too long.

* [GPFS] read_dirents: check status of FD gathering instead of FD itself.

Signed-off-by: Frank S. Filz 

Contents:

2e2b8e6 Frank S. Filz V2.6-dev.8
a5d2eb1 Renaud Fortier make the LIBEXECDIR valid for distro Debian or Ubuntu
32c723a Swen Schillig [GPFS] read_dirents: check status of FD gathering
instead of FD itself.
a5fc5c1 Swen Schillig setclientid: free clientid if client_r_addr is too
long.
7d1aec0 Jeff Layton nfs: use OPEN_DELEGATE_NONE_EXT when not granting a
delegation on v4.1+
72f778f Jeff Layton Take a reference to the session over a v4.1+ callback
86e6d81 Jeff Layton nfs: fix cleanup after delegrecall
9205652 Jeff Layton nfs: nfs41_complete_single -> nfs41_release_single
f487727 Pradeep Fix to make sure op_ctx is set when calling
mdcache_lru_unref().
9e3d200 Patrice LUCAS FSAL_PROXY : code cleaning, remove useless comments
09303d9 Patrice LUCAS FSAL_PROXY : storing stateid from background NFS
server
1feb444 Frank S. Filz FSAL_RGW: Remove obsolete (non-support_ex) create
method
698ce89 Malahal Naineni Fix rpc-statd.service path on debian
e5db2a8 Malahal Naineni Fix sleep path for debian
42abee8 Malahal Naineni Fix Dispatch_Max_Reqs max value in documentation.
57c9c30 Malahal Naineni Fix open_for_locks parameter in gpfs_lock_op2


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus




Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Daniel Gryniewicz



It would really help if we could have someone with better time zone
overlap with me who could manage the CI stuff, but that may not be

realistic.

We can sign up anyone in the NFS-Ganesha community to do this. It takes a
little time to get familiar with the scripts and tools that are used, but
once that has settled it is relatively straightforward.

Volunteers?


I hope we can have more folks on this. I've tried to understand CI stuff and
just get lost...



I'll join.  I have Centos logins already, although that may not extend 
to CI stuff.


Daniel



Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Frank Filz
> Added retry logic which is now live, and should get applied to all
> upcoming tests:
...

> > I'd say they need to be kept at least a week, if we could have time
> > based retention rather than number of results retention, I think that
> > would help.
> 
> Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not
> really cost us anything, so I'll change it to 14 days. A screenshot for
> these settings has been attached. It can be that I missed updating a job
> so let us know in case logs are deleted too early.


...

> I really understand this, a CI should be helpful in identifying problems,
> and not introduce problems by itself. Let's try hard to not have you
> needing to rant about it much more :-)

Thanks for the efforts, I hope that will make CentOS CI a lot more usable.

> > It would really help if we could have someone with better time zone
> > overlap with me who could manage the CI stuff, but that may not be
> > realistic.
> 
> We can sign up anyone in the NFS-Ganesha community to do this. It takes a
> little time to get familiar with the scripts and tools that are used, but
> once that has settled it is relatively straightforward.
> 
> Volunteers?

I hope we can have more folks on this. I've tried to understand CI stuff and
just get lost...

Frank






Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Niels de Vos
And now with screenshot! :)

Have a good weekend,
Niels


On Fri, Sep 08, 2017 at 05:41:30PM +0200, Niels de Vos wrote:
> On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > > Lately, we have been plagued by a lot of intermittent test failures.
> > > >
> > > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > > These have not been resolved by the latest ntirpc pullup.
> > > >
> > > > Additionally, we see a lot of intermittent failures in the continuous
> > > > integration.
> > > >
> > > > A big issue with the Centos CI is that it seems to have a fragile
> > > > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > > > and then fires a Verified -1. This makes it hard to evaluate what
> > > > patches are actually ready for integration.
> > > 
> > > We can look into this, but it helps if you can provide a link to the
> > > patch in GerritHub or the job in the CI.
> > 
> > Here's one merged last week with a Gluster CI Verify -1:
> > 
> > https://review.gerrithub.io/#/c/375463/
> > 
> > And just to preserve it in case... here's the log:
> > 
> > Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> > [EnvInject] - Loading node environment variables.
> > Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
> > /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
> > [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe
> > /tmp/jenkins5031649144466335345.sh
> > + set +x
> > (curl progress meter output elided)
> > Traceback (most recent call last):
> >   File "bootstrap.py", line 33, in <module>
> > b=json.loads(dat)
> >   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
> > return _default_decoder.decode(s)
> >   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
> > obj, end = self.raw_decode(s, idx=_w(s, 0).end())
> >   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
> > raise ValueError("No JSON object could be decoded")
> > ValueError: No JSON object could be decoded
> > https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console :
> > FAILED
> > Build step 'Execute shell' marked build as failure
> > Finished: FAILURE
> > 
> > Which tells me not much about why it failed, though it looks like a failure
> > that has nothing to do with Ganesha...
> 
> From #centos-devel on Freenode:
> 
> 15:49 < ndevos> bstinson: is 
> https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a 
> known duffy problem? and how can the jobs work around this?
> 15:51 < bstinson> ndevos: you may be hitting the rate limit
> 15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen 
> when a series of patches get sent
> 15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a 
> failure?
> 15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over 
> 5 minutes
> 15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be 
> acceptable?
> 15:59 < ndevos> bstinson: is there a particular message returned by duffy 
> when the rate limit is hit? the reply is not json, but maybe some error?
> 15:59 < ndevos> (in plain text format?)
> 15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain 
> text error message
> 16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes'
> 
> Added retry logic which is now live, and should get applied to all
> upcoming tests:
> 
> https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129
> 
> 
> > > > An additional issue with the Centos CI is that the failure logs often
> > > > aren't preserved long enough to even diagnose the issue.
> > > 
> > > That is something we can change. Some jobs do not delete the results, but
> > > others seem to do. How long (in days), or how many results would you
> > > like to keep?
> > 
> > I'd say they need to be kept at least a week, if we could have time based
> > retention rather than number of results retention, I think that would help.
> 
> Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not
> really cost us anything, so I'll change it to 14 days. A screenshot for
> these settings has been attached. It can be that I missed updating a job
> so let us know in case logs are deleted too early.
> 
> > At least after a week, it's reasonable to expect folks to rebase their
> > patches and re-submit, which would trigger a new run.
> > 
> > > > The result is that honestly, I mostly ignore the Centos CI results...

Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Niels de Vos
On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > Lately, we have been plagued by a lot of intermittent test failures.
> > >
> > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > These have not been resolved by the latest ntirpc pullup.
> > >
> > > Additionally, we see a lot of intermittent failures in the continuous
> > > integration.
> > >
> > > A big issue with the Centos CI is that it seems to have a fragile
> > > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > > and then fires a Verified -1. This makes it hard to evaluate what
> > > patches are actually ready for integration.
> > 
> > We can look into this, but it helps if you can provide a link to the
> > patch in GerritHub or the job in the CI.
> 
> Here's one merged last week with a Gluster CI Verify -1:
> 
> https://review.gerrithub.io/#/c/375463/
> 
> And just to preserve it in case... here's the log:
> 
> Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> [EnvInject] - Loading node environment variables.
> Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
> /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
> [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe
> /tmp/jenkins5031649144466335345.sh
> + set +x
> (curl progress meter output elided)
> Traceback (most recent call last):
>   File "bootstrap.py", line 33, in <module>
> b=json.loads(dat)
>   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
> return _default_decoder.decode(s)
>   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
> obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
> raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console :
> FAILED
> Build step 'Execute shell' marked build as failure
> Finished: FAILURE
> 
> Which tells me not much about why it failed, though it looks like a failure
> that has nothing to do with Ganesha...

From #centos-devel on Freenode:

15:49 < ndevos> bstinson: is 
https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a known 
duffy problem? and how can the jobs work around this?
15:51 < bstinson> ndevos: you may be hitting the rate limit
15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen when 
a series of patches get sent
15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a 
failure?
15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over 5 
minutes
15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be 
acceptable?
15:59 < ndevos> bstinson: is there a particular message returned by duffy when 
the rate limit is hit? the reply is not json, but maybe some error?
15:59 < ndevos> (in plain text format?)
15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain 
text error message
16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes'

Added retry logic which is now live, and should get applied to all
upcoming tests:

https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129


> > > An additional issue with the Centos CI is that the failure logs often
> > > aren't preserved long enough to even diagnose the issue.
> > 
> > That is something we can change. Some jobs do not delete the results, but
> > others seem to do. How long (in days), or how many results would you like
> to
> > keep?
> 
> I'd say they need to be kept at least a week, if we could have time based
> retention rather than number of results retention, I think that would help.

Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not
really cost us anything, so I'll change it to 14 days. A screenshot for
these settings has been attached. It can be that I missed updating a job
so let us know in case logs are deleted too early.

> At least after a week, it's reasonable to expect folks to rebase their
> patches and re-submit, which would trigger a new run.
> 
> > > The result is that honestly, I mostly ignore the Centos CI results.
> > > They almost might as well not be run...
> > 
> > This is definitely not what we want, so let's fix the problems.
> 
> Yea, and thus my rant...

I really understand this, a CI should be helpful in identifying
problems, and not introduce problems by itself. Let's try hard to not
have you needing to rant about it much more :-)

[Nfs-ganesha-devel] Problem connecting with FSAL_PROXY to a server sharing with fsid specified

2017-09-08 Thread Doug Ortega
I'm trying to export /dev/shm from an NFS v4.x server to FSAL_PROXY.
The server is running Centos 7.3.  To share /dev/shm, 'fsid=x' is required
in /etc/exports.  I can mount from a 4.0 Linux client with the command:


   mount -v -t nfs -o vers=4.0 :/ /mnt/test

without issue.  However, FSAL_PROXY cannot access the share with either:

   Path="/"    (in which case the system crashes; pxy_lookup_path()
   returns a success status but doesn't create a handle)
   Path="/dev/shm"  (in which case I get "no such file or directory")

I have reproduced the problem with other server mount points structured the
same way (i.e., with fsid=x in exports).  The problem occurs in 2.5.0 and
also in the latest 2.6.  How can I connect to this share?


Server /etc/exports:

   /dev/shm *(fsid=1,rw,async,no_subtree_check,no_root_squash)

ganesha.conf:

 NFSV4
 {
 DomainName="localdomain";
 }

 EXPORT
 {
 Export_Id = 77;

 CLIENT {
 clients = *;
 Access_Type = RW;
 }

 Path = "/dev/shm";
 # Path = "/";

 Pseudo = "/proxy/dell/home_dell";
 SecType = "sys";
 Disable_ACL = TRUE;

 FSAL {
 Name = PROXY;
 }
 }

 PROXY
 {
 Remote_Server
 {
 Srv_Addr=;
 Use_Privileged_Client_Port = TRUE;
 }
 }





Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Frank Filz
> On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > Lately, we have been plagued by a lot of intermittent test failures.
> >
> > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > These have not been resolved by the latest ntirpc pullup.
> >
> > Additionally, we see a lot of intermittent failures in the continuous
> > integration.
> >
> > A big issue with the Centos CI is that it seems to have a fragile
> > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > and then fires a Verified -1. This makes it hard to evaluate what
> > patches are actually ready for integration.
> 
> We can look into this, but it helps if you can provide a link to the
> patch in GerritHub or the job in the CI.

Here's one merged last week with a Gluster CI Verify -1:

https://review.gerrithub.io/#/c/375463/

And just to preserve it in case... here's the log:

Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
[EnvInject] - Loading node environment variables.
Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
/home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
[nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe
/tmp/jenkins5031649144466335345.sh
+ set +x
  (curl progress meter output elided)
Traceback (most recent call last):
  File "bootstrap.py", line 33, in <module>
b=json.loads(dat)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console :
FAILED
Build step 'Execute shell' marked build as failure
Finished: FAILURE

Which tells me not much about why it failed, though it looks like a failure
that has nothing to do with Ganesha...

> > An additional issue with the Centos CI is that the failure logs often
> > aren't preserved long enough to even diagnose the issue.
> 
> That is something we can change. Some jobs do not delete the results, but
> others seem to do. How long (in days), or how many results would you
> like to keep?

I'd say they need to be kept at least a week, if we could have time based
retention rather than number of results retention, I think that would help.

At least after a week, it's reasonable to expect folks to rebase their
patches and re-submit, which would trigger a new run.

> > The result is that honestly, I mostly ignore the Centos CI results.
> > They almost might as well not be run...
> 
> This is definitely not what we want, so let's fix the problems.

Yea, and thus my rant...

> > Let's talk about CI more on a near time concall (it would help if
> > Niels and Jiffin could join a call to talk about this, our next call
> > might be too soon for that).
> 
> Tuesdays tend to be very busy for me, and I am not sure I can join the
> call next week. Arthy did some work on the jobs in the CentOS CI, she could
> probably work with Jiffin to make any changes that improve the experience
> for you. I'm happy to help out where I can too, of course :-)

If we can figure out another time to have a CI call, that would be helpful.
It would be good to pull in Patrice from CEA as well as anyone else who
cares.

It would really help if we could have someone with better time zone overlap
with me who could manage the CI stuff, but that may not be realistic.

Frank

p.s. here's another patch that had a failure, in this case, it looks like
the CI ran again and passed 2nd time:

https://review.gerrithub.io/#/c/377712/

Log:

Triggered by Gerrit: https://review.gerrithub.io/377712 in silent mode.
[EnvInject] - Loading node environment variables.
Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace
/home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster
[nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe
/tmp/jenkins2362831021118510052.sh
+ set +x
  (curl progress meter output elided)
Traceback (most recent call last):
  File "bootstrap.py", line 33, in <module>
b=json.loads(dat)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
...

Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread Daniel Gryniewicz

On 09/08/2017 09:07 AM, William Allen Simpson wrote:

On 9/7/17 10:47 PM, Malahal Naineni wrote:
Last time I tried, I got the same. A thread was waiting in epoll_wait()
with a 29 second timeout; it continued only after that timeout expired.



I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1 second intervals (previously 5 seconds)
waiting for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



I'm looking at using an eventfd to wake up threads on shutdown.  That 
way, we can sleep for a long time while polling.


Daniel



Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson

On 9/7/17 10:47 PM, Malahal Naineni wrote:

Last time I tried, I got the same. A thread was waiting in epoll_wait() with
a 29 second timeout; it continued only after that timeout expired.


I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1 second intervals (previously 5 seconds)
waiting for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: GPFS: disable delegations temporarily

2017-09-08 Thread GerritHub
From Jeff Layton:

Jeff Layton has uploaded this change for review.
( https://review.gerrithub.io/377712 )


Change subject: GPFS: disable delegations temporarily
..

GPFS: disable delegations temporarily

The core delegation code has been pretty broken for a long time now, so
we're reworking it into something saner. Once we do that, GPFS will need
to be converted to use the new interfaces. In the meantime, ensure that
GPFS won't try to get a delegation by disabling it in the FSAL's
staticinfo.

Change-Id: I82b78b4dfc36601e8fdd74cb91e47b8894c20635
Signed-off-by: Jeff Layton 
---
M src/FSAL/FSAL_GPFS/main.c
1 file changed, 3 insertions(+), 0 deletions(-)



  git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha 
refs/changes/12/377712/1
-- 
To view, visit https://review.gerrithub.io/377712
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I82b78b4dfc36601e8fdd74cb91e47b8894c20635
Gerrit-Change-Number: 377712
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton 


[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: ceph: wire up delegation requests and callbacks

2017-09-08 Thread GerritHub
From Jeff Layton:

Jeff Layton has uploaded this change for review.
( https://review.gerrithub.io/377714 )


Change subject: ceph: wire up delegation requests and callbacks
..

ceph: wire up delegation requests and callbacks

Allow ganesha to request a delegation from cephfs using the new
ceph_ll_delegation call. For now, only read delegations are supported,
but it should be possible to support write delegations in the future.

When the callback comes in from cephfs, issue a CB_RECALL to the
NFS client.

Change-Id: I110f357368c22af9a5710a5db56ea7eef8feb315
Signed-off-by: Jeff Layton 
---
M src/CMakeLists.txt
M src/FSAL/FSAL_CEPH/handle.c
M src/FSAL/FSAL_CEPH/main.c
M src/cmake/modules/FindCephFS.cmake
M src/include/config-h.in.cmake
5 files changed, 87 insertions(+), 0 deletions(-)



  git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha 
refs/changes/14/377714/1
-- 
To view, visit https://review.gerrithub.io/377714
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I110f357368c22af9a5710a5db56ea7eef8feb315
Gerrit-Change-Number: 377714
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton 


[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Rework delegations to use new lease_op2 interface

2017-09-08 Thread GerritHub
From Jeff Layton:

Hello Soumya,

I'd like you to do a code review. Please visit

https://review.gerrithub.io/377711

to review the following change.


Change subject: Rework delegations to use new lease_op2 interface
..

Rework delegations to use new lease_op2 interface

This patch is heavily based on Soumya Koduri's original work to add a
new lease_op2 operation. The original proposal had quite a few more
fields to handle reclaims, and different fields to handle setting and
removing and whether it was a read-only delegation.

Ganesha never reclaims delegations, so I don't think we really need
that setting at this point. Also, the r/w setting is ignored when
we're removing a delegation. That leaves us with only 3 different
states for lease_op2 -- no delegation, read delegation, or read/write
delegation.

This proposed lease_op2 interface just takes an owner pointer and a
fsal_deleg_t enum that is one of FSAL_DELEG_NONE, FSAL_DELEG_RD, or
FSAL_DELEG_WR. I think this covers all of the use cases and is simpler
for FSALs to get right.

Change-Id: I9ef314125c5eb86fd21c449787349b78f32da1df
Signed-off-by: Soumya Koduri 
Signed-off-by: Jeff Layton 
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h
M src/FSAL/default_methods.c
M src/SAL/state_deleg.c
M src/include/fsal_api.h
M src/include/fsal_types.h
7 files changed, 86 insertions(+), 38 deletions(-)



  git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha 
refs/changes/11/377711/1
-- 
To view, visit https://review.gerrithub.io/377711
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ef314125c5eb86fd21c449787349b78f32da1df
Gerrit-Change-Number: 377711
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton 
Gerrit-Reviewer: Soumya 


[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: NFSv4: drop DO_DELEGATION define

2017-09-08 Thread GerritHub
From Jeff Layton:

Jeff Layton has uploaded this change for review.
( https://review.gerrithub.io/377713 )


Change subject: NFSv4: drop DO_DELEGATION define
..

NFSv4: drop DO_DELEGATION define

I don't think we need the delegation code commented out. There are
enough configuration settings that nothing should really have this on
now anyway, and anything that does (GPFS?) should basically work with
the other recent changes.

Also, just remove state_open_deleg_conflict. The counters for this live
in the FSAL now, so it can't work anyway. The FSAL must enforce this if
necessary.

Change-Id: I92c32f142b211ea9f1f9fb0b27c39c2de57fc257
Signed-off-by: Jeff Layton 
---
M src/Protocols/NFS/nfs4_op_open.c
M src/SAL/state_deleg.c
M src/include/sal_functions.h
3 files changed, 0 insertions(+), 85 deletions(-)



  git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha 
refs/changes/13/377713/1
-- 
To view, visit https://review.gerrithub.io/377713
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I92c32f142b211ea9f1f9fb0b27c39c2de57fc257
Gerrit-Change-Number: 377713
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton 


Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-08 Thread Niels de Vos
On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> Lately, we have been plagued by a lot of intermittent test failures.
> 
> I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16. These
> have not been resolved by the latest ntirpc pullup.
> 
> Additionally, we see a lot of intermittent failures in the continuous
> integration.
> 
> A big issue with the Centos CI is that it seems to have a fragile setup, and
> sometimes doesn't even succeed in trying to build Ganesha, and then fires a
> Verified -1. This makes it hard to evaluate what patches are actually ready
> for integration.

We can look into this, but it helps if you can provide a link to the
patch in GerritHub or the job in the CI.

> An additional issue with the Centos CI is that the failure logs often aren't
> preserved long enough to even diagnose the issue.

That is something we can change. Some jobs do not delete the results,
but others seem to do. How long (in days), or how many results would you
like to keep?

> The result is that honestly, I mostly ignore the Centos CI results. They
> almost might as well not be run...

This is definitely not what we want, so let's fix the problems.

... snip!

> Let's talk about CI more on a near time concall (it would help if Niels and
> Jiffin could join a call to talk about this, our next call might be too soon
> for that).

Tuesdays tend to be very busy for me, and I am not sure I can join the
call next week. Arthy did some work on the jobs in the CentOS CI, she
could probably work with Jiffin to make any changes that improve the
experience for you. I'm happy to help out where I can too, of course :-)

Thanks,
Niels
