Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:
> On 09/08/2017 09:07 AM, William Allen Simpson wrote:
>> On 9/7/17 10:47 PM, Malahal Naineni wrote:
>>> Last time I tried, I got the same. A thread was waiting in
>>> epoll_wait() with a 29 second timeout, and things started working
>>> after that timeout expired.
>>
>> I have seen the same, after I sped up the work pool shutdown. The
>> work pool shutdown will nanosleep in 1 second intervals (was 5
>> seconds) waiting for that last thread.
>>
>> I don't know how/why a thread is getting into epoll_wait() during
>> the window between svc_rqst_shutdown() and work_pool_shutdown(), but
>> that's what happens sometimes. Probably need yet another flag in
>> svc_rqst_shutdown().
>
> I'm looking at using an eventfd to wake up threads on shutdown. That
> way, we can sleep for a long time while polling.

There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) Something elsewhere calls clnt_vc_ncreatef() and clnt_vc_call()
over and over, which sets up another transport epoll fd and then
deletes it after each reply. Presumably this is unregistering services.
We should probably unregister services *before* nfs_rpc_dispatch_stop()
kills the listeners. It should also call clnt_vc_ncreatef() once, and
then call clnt_vc_call() repeatedly instead. No need to emulate UDP
with TCP!

(3) It then calls svc_shutdown(), which in turn calls
svc_xprt_shutdown(), svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports. The
fix is to kill again the channels previously killed in step (1), even
though they no longer have any open transports.

(6) work_pool_shutdown() waited until a timeout caused that one
remaining channel for the epoll fd (step (2)) to terminate.
This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags. This fix means they are
no longer needed.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
[Nfs-ganesha-devel] Announce Push of V2.6-dev.8
Branch: next
Tag: V2.6-dev.8

Release Highlights

* Various Debian fixes
* Fix open_for_locks parameter in gpfs_lock_op2
* Fix Dispatch_Max_Reqs max value in documentation
* FSAL_RGW: Remove obsolete (non-support_ex) create method
* PROXY improvements
* Fix to make sure op_ctx is set when calling mdcache_lru_unref()
* Fixes to improve the call back foundation for delegations
* setclientid: free clientid if client_r_addr is too long
* [GPFS] read_dirents: check status of FD gathering instead of FD itself

Signed-off-by: Frank S. Filz

Contents:

2e2b8e6 Frank S. Filz V2.6-dev.8
a5d2eb1 Renaud Fortier make the LIBEXECDIR valid for distro Debian or Ubuntu
32c723a Swen Schillig [GPFS] read_dirents: check status of FD gathering instead of FD itself.
a5fc5c1 Swen Schillig setclientid: free clientid if client_r_addr is too long.
7d1aec0 Jeff Layton nfs: use OPEN_DELEGATE_NONE_EXT when not granting a delegation on v4.1+
72f778f Jeff Layton Take a reference to the session over a v4.1+ callback
86e6d81 Jeff Layton nfs: fix cleanup after delegrecall
9205652 Jeff Layton nfs: nfs41_complete_single -> nfs41_release_single
f487727 Pradeep Fix to make sure op_ctx is set when calling mdcache_lru_unref().
9e3d200 Patrice LUCAS FSAL_PROXY : code cleaning, remove useless comments
09303d9 Patrice LUCAS FSAL_PROXY : storing stateid from background NFS server
1feb444 Frank S. Filz FSAL_RGW: Remove obsolete (non-support_ex) create method
698ce89 Malahal Naineni Fix rpc-statd.service path on debian
e5db2a8 Malahal Naineni Fix sleep path for debian
42abee8 Malahal Naineni Fix Dispatch_Max_Reqs max value in documentation.
57c9c30 Malahal Naineni Fix open_for_locks parameter in gpfs_lock_op2
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
> > > It would really help if we could have someone with better time zone
> > > overlap with me who could manage the CI stuff, but that may not be
> > > realistic.
> >
> > We can sign up anyone in the NFS-Ganesha community to do this. It
> > takes a little time to get familiar with the scripts and tools that
> > are used, but once that's settled it is relatively straightforward.
> >
> > Volunteers?
>
> I hope we can have more folks on this. I've tried to understand the CI
> stuff and just get lost...

I'll join. I have CentOS logins already, although that may not extend to
the CI stuff.

Daniel
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
> Added a retry logic which is now live, and should get applied for all
> upcoming tests:

...

> > I'd say they need to be kept at least a week; if we could have time
> > based retention rather than number of results retention, I think that
> > would help.
>
> Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not
> really cost us anything, so I'll change it to 14 days. A screenshot of
> these settings has been attached. It can be that I missed updating a
> job, so let us know in case logs are deleted too early.

...

> I really understand this; a CI should be helpful in identifying
> problems, and not introduce problems of its own. Let's try hard to not
> have you needing to rant about it much more :-)

Thanks for the efforts, I hope that will make the CentOS CI a lot more
usable.

> > It would really help if we could have someone with better time zone
> > overlap with me who could manage the CI stuff, but that may not be
> > realistic.
>
> We can sign up anyone in the NFS-Ganesha community to do this. It takes
> a little time to get familiar with the scripts and tools that are used,
> but once that's settled it is relatively straightforward.
>
> Volunteers?

I hope we can have more folks on this. I've tried to understand the CI
stuff and just get lost...

Frank
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
And now with screenshot! :)

Have a good weekend,
Niels

On Fri, Sep 08, 2017 at 05:41:30PM +0200, Niels de Vos wrote:
> On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > > Lately, we have been plagued by a lot of intermittent test failures.
> > > >
> > > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > > These have not been resolved by the latest ntirpc pullup.
> > > >
> > > > Additionally, we see a lot of intermittent failures in the continuous
> > > > integration.
> > > >
> > > > A big issue with the Centos CI is that it seems to have a fragile
> > > > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > > > and then fires a Verified -1. This makes it hard to evaluate what
> > > > patches are actually ready for integration.
> > >
> > > We can look into this, but it helps if you can provide a link to the
> > > patch in GerritHub or the job in the CI.
> >
> > Here's one merged last week with a Gluster CI Verify -1:
> >
> > https://review.gerrithub.io/#/c/375463/
> >
> > And just to preserve it in case... here's the log:
> >
> > Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> > [EnvInject] - Loading node environment variables.
> > Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace > > /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster > > [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe > > /tmp/jenkins5031649144466335345.sh > > + set +x > > % Total% Received % Xferd Average Speed TimeTime Time > > Current > > Dload Upload Total SpentLeft > > Speed > > > > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- > > 0 > > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- > > 0 > > 100 1735 100 17350 0 8723 0 --:--:-- --:--:-- --:--:-- > > 8718 > > Traceback (most recent call last): > > File "bootstrap.py", line 33, in > > b=json.loads(dat) > > File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads > > return _default_decoder.decode(s) > > File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode > > obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > > File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode > > raise ValueError("No JSON object could be decoded") > > ValueError: No JSON object could be decoded > > https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console : > > FAILED > > Build step 'Execute shell' marked build as failure > > Finished: FAILURE > > > > Which tells me not much about why it failed, though it looks like a failure > > that has nothing to do with Ganesha... > > From #centos-devel on Freenode: > > 15:49 < ndevos> bstinson: is > https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a > known duffy problem? and how can the jobs work around this? > 15:51 < bstinson> ndevos: you may be hitting the rate limit > 15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen > when a series of patches get sent > 15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a > failure? > 15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over > 5 minutes > 15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be > acceptible? 
> 15:59 < ndevos> bstinson: is there a particular message returned by duffy > when the rate limit is hit? the reply is not json, but maybe some error? > 15:59 < ndevos> (in plain text format?) > 15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain > text error message > 16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes' > > Added a retry logic which is now live, and should get applied for all > upcoming tests: > > https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129 > > > > > > An additional issue with the Centos CI is that the failure logs often > > > > aren't preserved long enough to even diagnose the issue. > > > > > > That is something we can change. Some jobs do not delete the results, but > > > others seem to do. How long (in days), or how many results would you like > > to > > > keep? > > > > I'd say they need to be kept at least a week, if we could have time based > > retention rather than number of results retention, I think that would help. > > Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not > really cost us anything, so I'll change it to 14 days. A screenshot for > these settings has been attached. It can be that I missed updating a job > so let us know in case logs are deleted too early. > > > At least after a week, it's reasonable to expect folks to rebase their > > patches and re-submit, which would trigger a new run. > > > > > > The result is that honestly, I mostly igno
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
On Fri, Sep 08, 2017 at 06:55:19AM -0700, Frank Filz wrote:
> > On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > > Lately, we have been plagued by a lot of intermittent test failures.
> > >
> > > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > > These have not been resolved by the latest ntirpc pullup.
> > >
> > > Additionally, we see a lot of intermittent failures in the continuous
> > > integration.
> > >
> > > A big issue with the Centos CI is that it seems to have a fragile
> > > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > > and then fires a Verified -1. This makes it hard to evaluate what
> > > patches are actually ready for integration.
> >
> > We can look into this, but it helps if you can provide a link to the
> > patch in GerritHub or the job in the CI.
>
> Here's one merged last week with a Gluster CI Verify -1:
>
> https://review.gerrithub.io/#/c/375463/
>
> And just to preserve it in case... here's the log:
>
> Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
> [EnvInject] - Loading node environment variables.
> Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace > /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster > [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe > /tmp/jenkins5031649144466335345.sh > + set +x > % Total% Received % Xferd Average Speed TimeTime Time > Current > Dload Upload Total SpentLeft > Speed > > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- > 0 > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- > 0 > 100 1735 100 17350 0 8723 0 --:--:-- --:--:-- --:--:-- > 8718 > Traceback (most recent call last): > File "bootstrap.py", line 33, in > b=json.loads(dat) > File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads > return _default_decoder.decode(s) > File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode > obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode > raise ValueError("No JSON object could be decoded") > ValueError: No JSON object could be decoded > https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console : > FAILED > Build step 'Execute shell' marked build as failure > Finished: FAILURE > > Which tells me not much about why it failed, though it looks like a failure > that has nothing to do with Ganesha... >From #centos-devel on Freenode: 15:49 < ndevos> bstinson: is https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3487/console a known duffy problem? and how can the jobs work around this? 15:51 < bstinson> ndevos: you may be hitting the rate limit 15:52 < ndevos> bstinson: oh, that is possible, I guess... it might happen when a series of patches get sent 15:53 < ndevos> bstinson: should I do a sleep and retry in case of such a failure? 15:55 < bstinson> ndevos: yeah, that should work. we measure your usage over 5 minutes 15:57 < ndevos> bstinson: ok, so sleeping 5 minutes, retry and loop should be acceptible? 15:59 < ndevos> bstinson: is there a particular message returned by duffy when the rate limit is hit? 
the reply is not json, but maybe some error? 15:59 < ndevos> (in plain text format?) 15:59 < bstinson> yeah 5 minutes should be acceptable, it does return a plain text error message 16:00 < bstinson> 'Deployment rate over quota, try again in a few minutes' Added a retry logic which is now live, and should get applied for all upcoming tests: https://github.com/nfs-ganesha/ci-tests/commit/ed055058c7956ebb703464c742837a9ace797129 > > > An additional issue with the Centos CI is that the failure logs often > > > aren't preserved long enough to even diagnose the issue. > > > > That is something we can change. Some jobs do not delete the results, but > > others seem to do. How long (in days), or how many results would you like > to > > keep? > > I'd say they need to be kept at least a week, if we could have time based > retention rather than number of results retention, I think that would help. Some jobs seem to have been set to keep 7 days, max 7 jobs. It does not really cost us anything, so I'll change it to 14 days. A screenshot for these settings has been attached. It can be that I missed updating a job so let us know in case logs are deleted too early. > At least after a week, it's reasonable to expect folks to rebase their > patches and re-submit, which would trigger a new run. > > > > The result is that honestly, I mostly ignore the Centos CI results. > > > They almost might as well not be run... > > > > This is definitely not what we want, so lets fix the problems. > > Yea, and thus my rant... I really understand this, a CI should be helpful in identifying problems, and not introduce problems from itself. Lets try hard to not have you needing to rant
[Nfs-ganesha-devel] Problem connecting with FSAL_PROXY to a server sharing with fsid specified
I'm trying to export /dev/shm from an NFS v4.x server to FSAL_PROXY.
The server is running CentOS 7.3. To share /dev/shm, 'fsid=x' is
required in /etc/exports.

I can mount from a 4.0 Linux client with the command:

    mount -v -t nfs -o vers=4.0 :/ /mnt/test

without issue. However, FSAL_PROXY cannot access the share with either:

    Path="/" (in which case the system crashes; pxy_lookup_path()
    returns a success error code but doesn't create a handle)

    Path="/dev/shm" (in which case I get "no such file or directory")

I have reproduced the problem with other server mount points structured
the same way (i.e., with fsid=x in exports). The problem is replicated
in 2.5.0 and also in the latest 2.6.

How can I connect to this share?

Server /etc/exports:

    (): /dev/shm *(fsid=1,rw,async,no_subtree_check,no_root_squash)

ganesha.conf:

    NFSV4
    {
        DomainName = "localdomain";
    }

    EXPORT
    {
        Export_Id = 77;
        CLIENT
        {
            clients = *;
            Access_Type = RW;
        }
        Path = "/dev/shm";
        # Path = "/";
        Pseudo = "/proxy/dell/home_dell";
        SecType = "sys";
        Disable_ACL = TRUE;
        FSAL
        {
            Name = PROXY;
        }
    }

    PROXY
    {
        Remote_Server
        {
            Srv_Addr = ;
            Use_Privileged_Client_Port = TRUE;
        }
    }
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
> On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> > Lately, we have been plagued by a lot of intermittent test failures.
> >
> > I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> > These have not been resolved by the latest ntirpc pullup.
> >
> > Additionally, we see a lot of intermittent failures in the continuous
> > integration.
> >
> > A big issue with the Centos CI is that it seems to have a fragile
> > setup, and sometimes doesn't even succeed in trying to build Ganesha,
> > and then fires a Verified -1. This makes it hard to evaluate what
> > patches are actually ready for integration.
>
> We can look into this, but it helps if you can provide a link to the
> patch in GerritHub or the job in the CI.

Here's one merged last week with a Gluster CI Verify -1:

https://review.gerrithub.io/#/c/375463/

And just to preserve it in case... here's the log:

Triggered by Gerrit: https://review.gerrithub.io/375463 in silent mode.
[EnvInject] - Loading node environment variables.
Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe /tmp/jenkins5031649144466335345.sh + set +x % Total% Received % Xferd Average Speed TimeTime Time Current Dload Upload Total SpentLeft Speed 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 1735 100 17350 0 8723 0 --:--:-- --:--:-- --:--:-- 8718 Traceback (most recent call last): File "bootstrap.py", line 33, in b=json.loads(dat) File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads return _default_decoder.decode(s) File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3455//console : FAILED Build step 'Execute shell' marked build as failure Finished: FAILURE Which tells me not much about why it failed, though it looks like a failure that has nothing to do with Ganesha... > > An additional issue with the Centos CI is that the failure logs often > > aren't preserved long enough to even diagnose the issue. > > That is something we can change. Some jobs do not delete the results, but > others seem to do. How long (in days), or how many results would you like to > keep? I'd say they need to be kept at least a week, if we could have time based retention rather than number of results retention, I think that would help. At least after a week, it's reasonable to expect folks to rebase their patches and re-submit, which would trigger a new run. > > The result is that honestly, I mostly ignore the Centos CI results. > > They almost might as well not be run... > > This is definitely not what we want, so lets fix the problems. Yea, and thus my rant... 
> > Let's talk about CI more on a near time concall (it would help if > > Niels and Jiffin could join a call to talk about this, our next call > > might be too soon for that). > > Tuesdays tend to be very busy for me, and I am not sure I can join the call > next week. Arthy did some work on the jobs in the CentOS CI, she could > probably work with Jiffin to make any changes that improve the experience > for you. I'm happy to help out where I can too, of course :-) If we can figure out another time to have a CI call, that would be helpful. It would be good to pull in Patrice from CEA as well as anyone else who cares. It would really help if we could have someone with better time zone overlap with me who could manage the CI stuff, but that may not be realistic. Frank p.s. here's another patch that had a failure, in this case, it looks like the CI ran again and passed 2nd time: https://review.gerrithub.io/#/c/377712/ Log: Triggered by Gerrit: https://review.gerrithub.io/377712 in silent mode. [EnvInject] - Loading node environment variables. Building remotely on nfs-ganesha-ci-slave01 (nfs-ganesha) in workspace /home/nfs-ganesha/workspace/nfs-ganesha_trigger-fsal_gluster [nfs-ganesha_trigger-fsal_gluster] $ /bin/sh -xe /tmp/jenkins2362831021118510052.sh + set +x % Total% Received % Xferd Average Speed TimeTime Time Current Dload Upload Total SpentLeft Speed 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 1735 100 17350 0 9161 0 --:--:-- --:--:-- --:--:-- 9179 Traceback (most recent call last): File "bootstrap.py", line 33, in b=json.loads(dat) File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads return _default_decoder.decode(s) File "/usr/lib64/p
Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 09/08/2017 09:07 AM, William Allen Simpson wrote:
> On 9/7/17 10:47 PM, Malahal Naineni wrote:
>> Last time I tried, I got the same. A thread was waiting in
>> epoll_wait() with a 29 second timeout, and things started working
>> after that timeout expired.
>
> I have seen the same, after I sped up the work pool shutdown. The work
> pool shutdown will nanosleep in 1 second intervals (was 5 seconds)
> waiting for that last thread.
>
> I don't know how/why a thread is getting into epoll_wait() during the
> window between svc_rqst_shutdown() and work_pool_shutdown(), but
> that's what happens sometimes. Probably need yet another flag in
> svc_rqst_shutdown().

I'm looking at using an eventfd to wake up threads on shutdown. That
way, we can sleep for a long time while polling.

Daniel
Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 9/7/17 10:47 PM, Malahal Naineni wrote:
> Last time I tried, I got the same. A thread was waiting in epoll_wait()
> with a 29 second timeout, and things started working after that timeout
> expired.

I have seen the same, after I sped up the work pool shutdown. The work
pool shutdown will nanosleep in 1 second intervals (was 5 seconds)
waiting for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes. Probably need yet another flag in
svc_rqst_shutdown().
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: GPFS: disable delegations temporarily
From Jeff Layton:

Jeff Layton has uploaded this change for review:
https://review.gerrithub.io/377712

Change subject: GPFS: disable delegations temporarily
..

GPFS: disable delegations temporarily

The core delegation code has been pretty broken for a long time now, so
we're reworking it into something saner. Once we do that, GPFS will need
to be converted to use the new interfaces. In the meantime, ensure that
GPFS won't try to get a delegation by disabling it in the FSAL's
staticinfo.

Change-Id: I82b78b4dfc36601e8fdd74cb91e47b8894c20635
Signed-off-by: Jeff Layton
---
M src/FSAL/FSAL_GPFS/main.c
1 file changed, 3 insertions(+), 0 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/12/377712/1

--
To view, visit https://review.gerrithub.io/377712
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I82b78b4dfc36601e8fdd74cb91e47b8894c20635
Gerrit-Change-Number: 377712
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: ceph: wire up delegation requests and callbacks
From Jeff Layton:

Jeff Layton has uploaded this change for review:
https://review.gerrithub.io/377714

Change subject: ceph: wire up delegation requests and callbacks
..

ceph: wire up delegation requests and callbacks

Allow ganesha to request a delegation from cephfs using the new
ceph_ll_delegation call. For now, only read delegations are supported,
but it should be possible to support write delegations in the future.

When the callback comes in from cephfs, issue a CB_RECALL to the NFS
client.

Change-Id: I110f357368c22af9a5710a5db56ea7eef8feb315
Signed-off-by: Jeff Layton
---
M src/CMakeLists.txt
M src/FSAL/FSAL_CEPH/handle.c
M src/FSAL/FSAL_CEPH/main.c
M src/cmake/modules/FindCephFS.cmake
M src/include/config-h.in.cmake
5 files changed, 87 insertions(+), 0 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/14/377714/1

--
To view, visit https://review.gerrithub.io/377714
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I110f357368c22af9a5710a5db56ea7eef8feb315
Gerrit-Change-Number: 377714
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Rework delegations to use new lease_op2 interface
From Jeff Layton:

Hello Soumya,

I'd like you to do a code review. Please visit
https://review.gerrithub.io/377711 to review the following change.

Change subject: Rework delegations to use new lease_op2 interface
..

Rework delegations to use new lease_op2 interface

This patch is heavily based on Soumya Koduri's original work to add a
new lease_op2 operation. The original proposal had quite a few more
fields to handle reclaims, and different fields to handle setting and
removing and whether it was a read-only delegation.

Ganesha never reclaims delegations, so I don't think we really need
that setting at this point. Also, the r/w setting is ignored when we're
removing a delegation. That leaves us with only 3 different states for
lease_op2 -- no delegation, read delegation, or read/write delegation.

This proposed lease_op2 interface just takes an owner pointer and a
fsal_deleg_t enum, that is one of FSAL_DELEG_NONE, FSAL_DELEG_RD, or
FSAL_DELEG_WR. I think this covers all of the use cases and is simpler
for FSALs to get right.

Change-Id: I9ef314125c5eb86fd21c449787349b78f32da1df
Signed-off-by: Soumya Koduri
Signed-off-by: Jeff Layton
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h
M src/FSAL/default_methods.c
M src/SAL/state_deleg.c
M src/include/fsal_api.h
M src/include/fsal_types.h
7 files changed, 86 insertions(+), 38 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/11/377711/1

--
To view, visit https://review.gerrithub.io/377711
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ef314125c5eb86fd21c449787349b78f32da1df
Gerrit-Change-Number: 377711
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton
Gerrit-Reviewer: Soumya
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: NFSv4: drop DO_DELEGATION define
From Jeff Layton:

Jeff Layton has uploaded this change for review:
https://review.gerrithub.io/377713

Change subject: NFSv4: drop DO_DELEGATION define
..

NFSv4: drop DO_DELEGATION define

I don't think we need the delegation code commented out. There are
enough configuration settings that nothing should really have this on
now anyway, and anything that does (GPFS?) should basically work with
the other recent changes.

Also, just remove state_open_deleg_conflict. The counters for this live
in the FSAL now, so it can't work anyway. The FSAL must enforce this if
necessary.

Change-Id: I92c32f142b211ea9f1f9fb0b27c39c2de57fc257
Signed-off-by: Jeff Layton
---
M src/Protocols/NFS/nfs4_op_open.c
M src/SAL/state_deleg.c
M src/include/sal_functions.h
3 files changed, 0 insertions(+), 85 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/13/377713/1

--
To view, visit https://review.gerrithub.io/377713
To unsubscribe, visit https://review.gerrithub.io/settings

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I92c32f142b211ea9f1f9fb0b27c39c2de57fc257
Gerrit-Change-Number: 377713
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton
Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration
On Fri, Sep 01, 2017 at 03:09:34PM -0700, Frank Filz wrote:
> Lately, we have been plagued by a lot of intermittent test failures.
>
> I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16.
> These have not been resolved by the latest ntirpc pullup.
>
> Additionally, we see a lot of intermittent failures in the continuous
> integration.
>
> A big issue with the Centos CI is that it seems to have a fragile
> setup, and sometimes doesn't even succeed in trying to build Ganesha,
> and then fires a Verified -1. This makes it hard to evaluate what
> patches are actually ready for integration.

We can look into this, but it helps if you can provide a link to the
patch in GerritHub or the job in the CI.

> An additional issue with the Centos CI is that the failure logs often
> aren't preserved long enough to even diagnose the issue.

That is something we can change. Some jobs do not delete the results,
but others seem to. How long (in days), or how many results, would you
like to keep?

> The result is that honestly, I mostly ignore the Centos CI results.
> They almost might as well not be run...

This is definitely not what we want, so let's fix the problems.

... snip!

> Let's talk about CI more on a near-term concall (it would help if Niels
> and Jiffin could join a call to talk about this; our next call might be
> too soon for that).

Tuesdays tend to be very busy for me, and I am not sure I can join the
call next week. Arthy did some work on the jobs in the CentOS CI; she
could probably work with Jiffin to make any changes that improve the
experience for you. I'm happy to help out where I can too, of course :-)

Thanks,
Niels