[Nfs-ganesha-devel] Pull up NTIRPC #80 & #81

2017-10-03 Thread William Allen Simpson
https://review.gerrithub.io/#/c/380970/ Begging for a mid-week dev release. These patches attempt to fix a crash found early in Bake-a-thon. DanG and I couldn't reproduce, so we need this to determine whether it has fixed the QE crash -- so they can move onward with more testing. I'd hoped

Re: [Nfs-ganesha-devel] clnt_call callbacks

2017-09-21 Thread William Allen Simpson
On 9/20/17 10:08 AM, William Allen Simpson wrote: My current expectation is that various fields of the Ganesha rpc_call_t should be merged/replaced by fields in the NTI_RPC struct svc_req so that async dispatch can handle the callback. To be more concrete, there are currently 3 structs

Re: [Nfs-ganesha-devel] clnt_call callbacks

2017-09-21 Thread William Allen Simpson
in the call thread rather than in a listening service thread. That should be part of this effort On Wed, Sep 20, 2017 at 10:08 AM, William Allen Simpson <william.allen.simp...@gmail.com> wrote: Currently, when clnt_call() is invoked, that thread waits for the result to come bac

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.10

2017-09-20 Thread William Allen Simpson
It's harder, as that message block appears twice in the code. I've pushed a patch to linuxbox2 ntirpc branch was16: svc_vc_wait to distinguish MSG_WAITALL Try that with Default_Log_Level = FULL_DEBUG; We'll get to the bottom of this.

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.10

2017-09-20 Thread William Allen Simpson
On 9/20/17 5:36 PM, William Allen Simpson wrote: On 9/20/17 2:04 PM, Marc Eshel wrote: recv errno 0 Belay that. Different message. recv returned -1, then errno was 0. I don't know what errno 0 means. rlen = recv(xprt->xp_fd, uv->v.vio_tail, xd->sx_fbtbc, MSG

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.10

2017-09-20 Thread William Allen Simpson
On 9/20/17 2:04 PM, Marc Eshel wrote: recv errno 0 "recv() returns 0 only when you request a 0-byte buffer or the other peer has gracefully disconnected" We know it's not 0-byte fragments, because that's not allowed: if (unlikely(!xd->sx_fbtbc)) {
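The recv() cases discussed in this thread (a positive length, 0 on orderly peer shutdown, -1 with an errno) can be sketched as a small classifier. This is only an illustration under stated assumptions: the names recv_outcome, classify_recv and recv_and_classify are hypothetical and are not part of ntirpc. The wrapper saves errno immediately after the call, which removes one possible source of a confusing "errno 0" report; whether that is what happened in the report above is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical outcome names for illustration; not ntirpc identifiers. */
enum recv_outcome {
	RECV_DATA,          /* rlen > 0: bytes were received */
	RECV_EMPTY_REQUEST, /* rlen == 0 because 0 bytes were requested */
	RECV_PEER_CLOSED,   /* rlen == 0: peer performed an orderly shutdown */
	RECV_RETRY,         /* rlen == -1 with a transient errno */
	RECV_ERROR          /* rlen == -1 with any other errno (including 0) */
};

/* Classify a recv() result from its return value, the requested length,
 * and the errno value saved right after the call. */
enum recv_outcome
classify_recv(ssize_t rlen, size_t requested, int saved_errno)
{
	assert(rlen >= -1); /* recv() returns -1, 0, or a positive count */

	if (rlen > 0)
		return RECV_DATA;
	if (rlen == 0)
		return requested == 0 ? RECV_EMPTY_REQUEST : RECV_PEER_CLOSED;
	if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK ||
	    saved_errno == EINTR)
		return RECV_RETRY;
	return RECV_ERROR;
}

/* Call recv() and capture errno before anything else (logging, locking)
 * has a chance to overwrite it. */
enum recv_outcome
recv_and_classify(int fd, void *buf, size_t len, int flags, ssize_t *rlen)
{
	int saved_errno;

	errno = 0;
	*rlen = recv(fd, buf, len, flags);
	saved_errno = errno;
	return classify_recv(*rlen, len, saved_errno);
}
```

A caller would log the saved errno value together with the outcome, rather than reading errno again later in the error path.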

Re: [Nfs-ganesha-devel] CEA-HPC test upgrade

2017-09-20 Thread William Allen Simpson
On 9/20/17 10:24 AM, William Allen Simpson wrote: Besides, I've not even started the UDP side.  Just some quick and dirty TCP code.  I've tried to make that transport agnostic.  But it's not well integrated or ready for prime time.  I'm looking for better ideas. I probably ought to mention

Re: [Nfs-ganesha-devel] CEA-HPC test upgrade

2017-09-20 Thread William Allen Simpson
On 9/20/17 9:45 AM, Frank Filz wrote: On 9/20/17 8:57 AM, Jeff Layton wrote: On Wed, 2017-09-20 at 08:50 -0400, William Allen Simpson wrote: Soumya, Jeff, how close is new delegation code to being stable? The new interface is merged, and I don't see us needing to make further changes

[Nfs-ganesha-devel] clnt_call callbacks

2017-09-20 Thread William Allen Simpson
Currently, when clnt_call() is invoked, that thread waits for the result to come back over the network. There are 250 or so "fridge" threads during startup for this alone. I've already changed the rpc_ctx_xfer_replymsg() to be transport agnostic (it was clnt_vc only). And added a result

Re: [Nfs-ganesha-devel] CEA-HPC test upgrade

2017-09-20 Thread William Allen Simpson
On 9/20/17 9:31 AM, William Allen Simpson wrote: On 9/20/17 8:57 AM, Jeff Layton wrote: FSAL_GPFS still needs to be converted to use the new interface (not too hard to do, but I'd rather it be done by someone who is able to test the result). I have a patch for FSAL_CEPH more or less ready

Re: [Nfs-ganesha-devel] CEA-HPC test upgrade

2017-09-20 Thread William Allen Simpson
On 9/20/17 8:57 AM, Jeff Layton wrote: On Wed, 2017-09-20 at 08:50 -0400, William Allen Simpson wrote: Soumya, Jeff, how close is new delegation code to being stable? The new interface is merged, and I don't see us needing to make further changes to it. Excellent. It's too close to Friday

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.10

2017-09-20 Thread William Allen Simpson
h next Tag:V2.6-dev.10 NOTE: This release contains an ntirpc pullup, please update your submodules Release Highlights * ntirpc pullup Signed-off-by: Frank S. Filz <ffilz...@mindspring.com> Contents: 2a86360 Frank S. Filz V2.6-dev.10 32db433 William Allen Simpson Pull up NTIRPC #75

Re: [Nfs-ganesha-devel] CEA-HPC test upgrade

2017-09-20 Thread William Allen Simpson
On 9/20/17 5:47 AM, LUCAS Patrice wrote: The CEA-HPC test configuration is now upgraded with a newer linux kernel (from centos 7.3) which solves the previous OPEN_DELEGATE_NONE_EXT bug. The CEA-HPC test should now be considered as relevant. Thank you! Much appreciated. Soumya, Jeff, how

Re: [Nfs-ganesha-devel] Continuing CI pain

2017-09-13 Thread William Allen Simpson
On 9/13/17 4:39 AM, Niels de Vos wrote: Why creating the logfile fails is not clear to me. Maybe something in the packaging was changed and the /var/log/ganesha/ directory is not writable for the ganesha.nfsd process anymore? Have changes for running as non-root been merged, maybe? Why does the

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: TIRPC_DEBUG_FLAG_DEFAULT

2017-09-13 Thread William Allen Simpson
On 9/13/17 12:10 AM, William Allen Simpson wrote: On 9/12/17 1:25 PM, William Allen Simpson wrote: On 9/12/17 10:12 AM, GerritHub wrote: william.allen.simp...@gmail.com has uploaded this change for *review*. View Change <https://review.gerrithub.io/378137> NTIRPC should configure e

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: TIRPC_DEBUG_FLAG_DEFAULT

2017-09-12 Thread William Allen Simpson
On 9/12/17 1:25 PM, William Allen Simpson wrote: On 9/12/17 10:12 AM, GerritHub wrote: william.allen.simp...@gmail.com has uploaded this change for *review*. View Change <https://review.gerrithub.io/378137> NTIRPC should configure exactly like Ganesha.  It's less confusing. On the

Re: [Nfs-ganesha-devel] Continuing CI pain

2017-09-12 Thread William Allen Simpson
On 9/12/17 6:06 PM, Frank Filz wrote: So this failure: https://ci.centos.org//job/nfs_ganesha_cthon04/1436/console Is an example of where we need some improvement. I looked at the top and scrolled down to the end. I have no idea why it failed. This is a case of too much information without a

Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-11 Thread William Allen Simpson
On 9/9/17 12:16 AM, William Allen Simpson wrote: On 9/8/17 9:44 AM, Daniel Gryniewicz wrote: On 09/08/2017 09:07 AM, William Allen Simpson wrote: On 9/7/17 10:47 PM, Malahal Naineni wrote: Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout

Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson
On 9/8/17 9:44 AM, Daniel Gryniewicz wrote: On 09/08/2017 09:07 AM, William Allen Simpson wrote: On 9/7/17 10:47 PM, Malahal Naineni wrote: Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout. I have seen

Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson
On 9/7/17 10:47 PM, Malahal Naineni wrote: Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout. I have seen the same, after I sped up the work pool shutdown. The work pool shutdown will nanosleep 1 second

Re: [Nfs-ganesha-devel] Segfault seen in libntirpc code even when values of the input arguments to function 'recvmsg' look fine

2017-09-07 Thread William Allen Simpson
On 9/6/17 5:58 PM, Frank Filz wrote: Ganesha uses ntirpc for NFS, MOUNTD, RQUOTA, and NLM (It does not support the non-standardized POSIX ACL sideband protocol for NFS 3). There has been discussion of Ganesha bringing rpc.statd inside Ganesha, I don't know if any clients use UDP for NSM

[Nfs-ganesha-devel] xdr_free in nfs_worker_thread.c

2017-09-06 Thread William Allen Simpson
In my most recent Napalm patch, I'd moved all the code that accesses the function_desc in nfs_worker_thread.c into nfs_worker_thread.c, hopefully making it easier to understand. Looking at it today, I remember that I've had a long time question. Both uses of xdr_free() use xdr_decode_func().

Re: [Nfs-ganesha-devel] V2.6 WRT14

2017-09-06 Thread William Allen Simpson
I think we're good here. I worked back to git checkout 81fbc56, and that's exactly the problem that I've fixed, where flags wasn't initialized in 1 of 3 code paths: if (xd->sx_fbtbc || (flags & UIO_FLAG_MORE)) { On 9/5/17 8:53 AM, Daniel Gryniewicz wrote: Can you pinpoint the line in your

Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-06 Thread William Allen Simpson
On 9/5/17 7:59 AM, Swen Schillig wrote: On Tue, 2017-09-05 at 05:41 -0400, William Allen Simpson wrote: Of course, my WRT5 passes. But this is wonderful.  Please tell us how you get this 100% reproducible result, so that we can reproduce it I'm afraid I'm not doing anything special

Re: [Nfs-ganesha-devel] Segfault seen in libntirpc code even when values of the input arguments to function 'recvmsg' look fine

2017-09-06 Thread William Allen Simpson
On 9/5/17 10:44 AM, Daniel Gryniewicz wrote: I'm stumped, then.  It all looks fine to me. I think you'll find that things work better by switching to TCP. The UDP client code (clnt_dg) is badly bugged in general. That needs re-writing, but UDP hasn't been a priority. There were unlocked

Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-05 Thread William Allen Simpson
On 9/4/17 6:59 AM, Swen Schillig wrote: On Sat, 2017-09-02 at 00:15 -0400, William Allen Simpson wrote: On 9/1/17 6:09 PM, Frank Filz wrote: Lately, we have been plagued by a lot of intermittent test failures. I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16. These have

Re: [Nfs-ganesha-devel] Intermittent test failures - manual tests and continuous integration

2017-09-01 Thread William Allen Simpson
On 9/1/17 6:09 PM, Frank Filz wrote: Lately, we have been plagued by a lot of intermittent test failures. I have seen intermittent failures in pynfs WRT14, WRT15, and WRT16. These have not been resolved by the latest ntirpc pullup. Details? What's WRT16? My pynfs results say: WRT13

[Nfs-ganesha-devel] V2.6 WRT14

2017-09-01 Thread William Allen Simpson
On 8/30/17 1:34 PM, William Allen Simpson wrote: On 8/28/17 1:23 AM, Frank Filz wrote: WRT14 is the test that failed that made me kick Bill’s patch out of dev.5, then I couldn’t get it to fail again, so I included the patch in dev.6. Since it turned out not to be a dev.5 issue.  Malahal

Re: [Nfs-ganesha-devel] Crash in TIRPC with Ganesha 2.6-dev.5

2017-09-01 Thread William Allen Simpson
On 8/31/17 5:42 PM, William Allen Simpson wrote: On 8/31/17 12:17 PM, Pradeep wrote: Thanks Dan and Bill for the quick response. As Dan suggested, is moving svc_rqst_xprt_register() to the end of svc_vc_rendezvous() the right fix? Partly.  Also needs checking the error return. https

Re: [Nfs-ganesha-devel] Crash in TIRPC with Ganesha 2.6-dev.5

2017-08-31 Thread William Allen Simpson
On 8/31/17 9:14 AM, Daniel Gryniewicz wrote: On 08/30/2017 10:06 PM, Pradeep wrote: Hi all, I'm hitting a crash in TIRPC with Ganesha 2.6-dev.5. It appears to me that there is a race between an incoming RPC message on a new xprt (for which accept() was done on the FD) and TIRPC setting the

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.6

2017-08-26 Thread William Allen Simpson
On 8/26/17 6:00 AM, William Allen Simpson wrote: On 8/26/17 1:11 AM, Malahal Naineni wrote: Hi Bill and Frank, I tried pynfs with the latest V2.6, WRT14 fails for me. It passed with dev-2 and failed with dev-3. The only commit that is suspect at this point

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.6

2017-08-26 Thread William Allen Simpson
On 8/26/17 7:06 AM, William Allen Simpson wrote: On 8/25/17 5:42 PM, Frank Filz wrote: Branch next Tag:V2.6-dev.6 After pynfs 4.0 seeing (among others): Direct leak of 2480 byte(s) in 2 object(s) allocated from: #0 0x76efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0) #1 0x59e871

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.6

2017-08-26 Thread William Allen Simpson
On 8/25/17 5:42 PM, Frank Filz wrote: Branch next Tag:V2.6-dev.6 In addition to the leaks reported in dev.5 (simple startup followed by pkill ganesha), after one mount followed by umount seeing: Indirect leak of 1024 byte(s) in 1 object(s) allocated from: #0 0x76efce20 in malloc

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.6

2017-08-26 Thread William Allen Simpson
On 8/26/17 1:11 AM, Malahal Naineni wrote: Hi Bill and Frank, I tried pynfs with the latest V2.6, WRT14 fails for me. It passed with dev-2 and failed with dev-3. The only commit that is suspect at this point is c29114162bb553270835c8d51d4184ce8bb1ab32 Can someone verify if WRT14

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Remove support_ex FSAL method

2017-08-22 Thread William Allen Simpson
On 8/11/17 4:38 PM, GerritHub wrote: Frank Filz has uploaded this change for *review*. View Change Remove support_ex FSAL method Frank, I reviewed the simple removes, and even this one, but got flummoxed where you change code to assume support_ex. I

Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev.5

2017-08-18 Thread William Allen Simpson
On 8/17/17 8:01 PM, Frank Filz wrote: Branch next Tag:V2.6-dev.5 A quick test gave a pile of address sanitizer errors: ==11249==ERROR: LeakSanitizer: detected memory leaks Direct leak of 176 byte(s) in 1 object(s) allocated from: #0 0x76efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0)

Re: [Nfs-ganesha-devel] v2.6-dev-4 leaves 271 threads hanging around

2017-08-18 Thread William Allen Simpson
. I tried with gdb as well, it came out too. I saw only few threads (about 10) after sending the signal. Can you tell me how I can reproduce without 'gdb' ? (gpfs fsal has some issues with gdb at times..) Regards, Malahal. On Thu, Aug 17, 2017 at 4:56 PM, William Allen Simpson <william.allen.s

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Napalm nfs_worker_thread NFS_REQUEST queue

2017-08-17 Thread William Allen Simpson
nge-Id: Ib5ce7e184a2029ff36830e8b0d59d96df3f717fa Signed-off-by: William Allen Simpson <william.allen.simp...@redhat.com> --- M src/MainNFSD/nfs_rpc_dispatcher_thread.c M src/MainNFSD/nfs_worker_thread.c M src/include/nfs_init.h 3 files changed, 89 insertions(+), 138 deletions(-)

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: WIP dev-4 test no change

2017-08-17 Thread William Allen Simpson
Works in dev.4a! Thanks, Malahal. On 8/13/17 8:07 AM, William Allen Simpson wrote: I was trying to test my next code, and had strange ASan results. So I dropped back to dev-4; it compiles, but never terminates. So I made this WIP test with no change, and CI doesn't even compile

Re: [Nfs-ganesha-devel] v2.6-dev-4 leaves 271 threads hanging around

2017-08-17 Thread William Allen Simpson
On 8/15/17 11:53 AM, William Allen Simpson wrote: Rather than spam the entire list, if anybody wants the gdb bt. I can send the ganesha.log, too, but it's bigger. To test, rm the log, setup the libraries, gdb, run -F -- and on another connection pkill ganesha. Nothing else. That's always my

[Nfs-ganesha-devel] v2.6-dev-4 leaves 271 threads hanging around

2017-08-15 Thread William Allen Simpson
Rather than spam the entire list, if anybody wants the gdb bt. I can send the ganesha.log, too, but it's bigger. To test, rm the log, setup the libraries, gdb, run -F -- and on another connection pkill ganesha. Nothing else. That's always my first test.

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-15 Thread William Allen Simpson
...@gmail.com <mailto:mala...@gmail.com>> wrote: Unfortunately, I need a fix for this issue against ganesha2.3. Regards, Malahal. On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson <william.allen.simp...@gmail.com <mailto:william.allen.simp...@gmail.com>> wrote:

Re: [Nfs-ganesha-devel] exporting cephfs as nfs share on RDMA transport

2017-08-15 Thread William Allen Simpson
On 8/14/17 1:31 PM, Raju Rangoju wrote: So, I was wondering maybe there is something else that needs to be configured for RDMA transport? Or Am I missing something? Can someone please provide some pointers on this. RDMA was never completed. It required a significant re-write of both ntirpc

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-14 Thread William Allen Simpson
On 8/13/17 11:50 PM, Malahal Naineni wrote: >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing UDP. It's a mess, and has been a mess for a long time. We get a file descriptor fd and then create "rec", but while destroying things, we close "fd" and then rpc_dplx_unref().

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: WIP dev-4 test no change

2017-08-13 Thread William Allen Simpson
base. On 8/13/17 7:51 AM, GerritHub wrote: william.allen.simp...@gmail.com has uploaded this change for *review*. View Change <https://review.gerrithub.io/374060> WIP dev-4 test no change Change-Id: I28e58e32c151c0ea8adafe60ae56b84b2f9bd02f Signed-off-by: William Allen S

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-11 Thread William Allen Simpson
On 8/11/17 8:35 AM, Matt Benjamin wrote: On Fri, Aug 11, 2017 at 8:26 AM, William Allen Simpson <william.allen.simp...@gmail.com> wrote: On 8/11/17 2:29 AM, Malahal Naineni wrote: Following confirms that Thread1 (TCP) is trying to use the same "rec" as Thread42 (UDP), it is e

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-11 Thread William Allen Simpson
On 8/11/17 2:29 AM, Malahal Naineni wrote: Following confirms that Thread1 (TCP) is trying to use the same "rec" as Thread42 (UDP), it is easy to reproduce on the customer system! There are 2 duplicated fd indexed trees, not well coordinated. My 2015 code to fix this went in Feb/Mar

Re: [Nfs-ganesha-devel] only use of UDP client is NSM

2017-08-08 Thread William Allen Simpson
On 8/8/17 1:58 PM, Daniel Gryniewicz wrote: On 08/08/2017 01:17 PM, William Allen Simpson wrote: NSM should be accessible by TCP. Why are we using UDP? Is there a downstream need? Yes, there is a downstream need for NSM. Would prefer folks answer the question asked. I didn't ask about

[Nfs-ganesha-devel] only use of UDP client is NSM

2017-08-08 Thread William Allen Simpson
Frank, Dominique tracked it down: #0 0x4e2ea0 in calloc (/export/nfs-ganesha/build/MainNFSD/ganesha.nfsd+0x4e2ea0) #1 0x5d0447 in gsh_calloc__ /export/nfs-ganesha/src/include/abstract_mem.h:145:12 #2 0x758f7eb6 in svc_dg_xprt_zalloc

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-07 Thread William Allen Simpson
On 8/7/17 9:42 AM, Frank Filz wrote: It never has been. In cache_inode, a pin-ref kept it from being reaped, now any ref beyond 1 keeps it. Guess we need to do something about that... We need to put limits on state somewhere, that would take care of it mostly. We could still have some files

Re: [Nfs-ganesha-devel] NFSv4 delegation in Ganesha

2017-08-07 Thread William Allen Simpson
On 8/7/17 2:29 AM, Soumya Koduri wrote: clubbing with lock, a dedicated fop Amusing India'isms? -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org!

[Nfs-ganesha-devel] Formal request to merge PRs no later than Thursdays

2017-08-05 Thread William Allen Simpson
The second week in a row, Frank's merge late on Friday caused a build conflict. Many/most of our developers don't work on Frank's schedule. Therefore, these conflicts cannot be resolved for upwards of 3 days. We have a status conference call on Tuesdays. There's no reason that we could not

[Nfs-ganesha-devel] address sanitizer spurious pynfs errors

2017-08-03 Thread William Allen Simpson
[root@simpson91 nfs4.0]# ./testserver.py 127.0.0.1:/testnfs/test40-v4 --maketree all Got error: Connection closed Sleeping for 120 seconds: Normally, no connection closed. Same results at end, though. Some runs, says that twice, and then hangs. Have to kill the connection, ^c and ^d do not

[Nfs-ganesha-devel] address sanitizer defeats -O0

2017-08-03 Thread William Allen Simpson
It's improperly reorganizing code. === Accessing freed memory at ==> Logically, this couldn't happen! int free_nfs_request(request_data_t *reqdata) { atomic_dec_uint32_t(>r_d_refs); LogDebug(COMPONENT_DISPATCH, "%s: %p fd %d xp_refs %" PRIu32 " r_d_refs %"

[Nfs-ganesha-devel] make dist failures

2017-08-02 Thread William Allen Simpson
Last week, the new make dist code failed because both ntirpc and ganesha tried to build target "dist". This week, dist cannot find ntirpc-1.6.0.tar.gz? https://ci.centos.org/job/nfs-ganesha_trigger-fsal_gluster/3196//console I don't see this on my local system, even building as a submodule, so

Re: [Nfs-ganesha-devel] Compile options

2017-08-02 Thread William Allen Simpson
On 8/1/17 1:14 PM, Daniel Gryniewicz wrote: On 08/01/2017 12:49 PM, Daniel Gryniewicz wrote: On the call, it was requested that I start a thread listing compile options that I use, so others can chime in, allowing everyone to learn from each other. Here's my standard list: CFLAGS=-O0 -g

Re: [Nfs-ganesha-devel] destroying timing window

2017-07-30 Thread William Allen Simpson
On 7/30/17 7:45 AM, William Allen Simpson wrote: On 7/29/17 8:46 AM, William Allen Simpson wrote: Because it will be an API change to SVC_RELEASE() -- a macro and inline function -- we cannot backport. Obviously, we haven't seen this very often. But it will be a good reason for moving

Re: [Nfs-ganesha-devel] destroying timing window

2017-07-30 Thread William Allen Simpson
On 7/29/17 8:46 AM, William Allen Simpson wrote: On 7/28/17 7:37 AM, Dominique Martinet wrote: I also get a random crash with ASAN during init, one out of 3-4 times [...] I'm not able to reproduce. But Matt and I agree that you've found a race condition that has been there for a long time

[Nfs-ganesha-devel] destroying timing window

2017-07-29 Thread William Allen Simpson
Thought I'd share my findings and proposed solution with the whole group. On 7/28/17 7:37 AM, Dominique Martinet wrote: I also get a random crash with ASAN during init, one out of 3-4 times Here's the stack: #0 0x00507470 in __sanitizer::Die() () #1 0x004ea1be in

Re: [Nfs-ganesha-devel] parent->content_lock.__data.__writer

2017-07-28 Thread William Allen Simpson
FYI, my cmake output: [bill@wasite w541]$ . cmake-rdma.sh -- cmake version 3.9 -- The C compiler identification is GNU 7.1.1 -- The CXX compiler identification is GNU 7.1.1 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler

[Nfs-ganesha-devel] parent->content_lock.__data.__writer

2017-07-28 Thread William Allen Simpson
Setting up another test machine under Fedora 26, got this error that I've not seen before. There's another library that cmake isn't checking? [ 72%] Building C object FSAL/Stackable_FSALs/FSAL_MDCACHE/CMakeFiles/fsalmdcache.dir/mdcache_helpers.c.o In file included from

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Napalm dispatch

2017-07-26 Thread William Allen Simpson
On 7/26/17 5:47 AM, William Allen Simpson wrote: On 7/14/17 1:08 PM, GerritHub wrote: william.allen.simp...@gmail.com has uploaded this change for *review*. View Change <https://review.gerrithub.io/369641> Refresh call for review. A note on the patch format. Some functions remain in

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Napalm dispatch

2017-07-26 Thread William Allen Simpson
On 7/26/17 5:47 AM, William Allen Simpson wrote: On 7/14/17 1:08 PM, GerritHub wrote: william.allen.simp...@gmail.com has uploaded this change for *review*. View Change <https://review.gerrithub.io/369641> Refresh call for review. I'd already made this available for pre-review on

Re: [Nfs-ganesha-devel] Balancing input workers

2017-07-23 Thread William Allen Simpson
, I'm not sure what you mean by lookahead. On Fri, Jul 21, 2017 at 11:06 AM, William Allen Simpson <william.allen.simp...@gmail.com> wrote: My current Napalm code essentially gives the following priorities: New UDP, TCP, RDMA, or 9P connections are the "same" priority, as they eac

Re: [Nfs-ganesha-devel] About FSAL_GLUSTER performance

2017-07-21 Thread William Allen Simpson
On 7/21/17 6:19 AM, Jiffin Tony Thottan wrote: On 21/07/17 13:11, gui mark wrote: Hi all (cc maintainers), We've tried a performance test comparing nfs-ganesha and gNFS from gluster, we found that gNFS outperforms nfs-ganesha by nearly 2 times on OPS. As I read through the code

[Nfs-ganesha-devel] Balancing input workers

2017-07-21 Thread William Allen Simpson
My current Napalm code essentially gives the following priorities: New UDP, TCP, RDMA, or 9P connections are the "same" priority, as they each have their own channel, and they each have a dedicated epoll thread. The only limit is the OS runs out of file descriptors and rejects the connection

Re: [Nfs-ganesha-devel] Where else is the nTIRPC submodule version stored?

2017-07-21 Thread William Allen Simpson
On 7/21/17 9:08 AM, Daniel Gryniewicz wrote: -set(NTIRPC_MIN_VERSION 1.5.0) +set(NTIRPC_MIN_VERSION 1.6.0) Had already done that part. > -%{_libdir}/libntirpc.so.1.5 > +%{_libdir}/libntirpc.so.@NTIRPC_ABI_EMBED@ That's what I was suggesting. Thanks.

Re: [Nfs-ganesha-devel] Where else is the nTIRPC submodule version stored?

2017-07-20 Thread William Allen Simpson
I've updated my patch to do the same thing as Niels. But this is inherently unsatisfying. What we really might need is to split @NTIRPC_VERSION_EMBED@ into two strings for V2.6 going forward: @NTIRPC_PATCH_VERSION_EMBED@ @NTIRPC_MINOR_VERSION_EMBED@ Or something like it. I'll talk with DanG

Re: [Nfs-ganesha-devel] Where else is the nTIRPC submodule version stored?

2017-07-20 Thread William Allen Simpson
On 7/20/17 5:48 PM, William Allen Simpson wrote: On 7/20/17 5:37 PM, William Allen Simpson wrote: In https://review.gerrithub.io/#/c/369641/ https://ci.centos.org//job/nfs_ganesha_cthon04/1047/console The libntirpc submodule version is updated to 1.6. Why is there still a reference somewhere

Re: [Nfs-ganesha-devel] Where else is the nTIRPC submodule version stored?

2017-07-20 Thread William Allen Simpson
On 7/20/17 5:37 PM, William Allen Simpson wrote: In https://review.gerrithub.io/#/c/369641/ https://ci.centos.org//job/nfs_ganesha_cthon04/1047/console The libntirpc submodule version is updated to 1.6. Why is there still a reference somewhere to libntirpc.so.1.5? It may be src/nfs

[Nfs-ganesha-devel] Where else is the nTIRPC submodule version stored?

2017-07-20 Thread William Allen Simpson
In https://review.gerrithub.io/#/c/369641/ https://ci.centos.org//job/nfs_ganesha_cthon04/1047/console The libntirpc submodule version is updated to 1.6. Why is there still a reference somewhere to libntirpc.so.1.5? OTOH, CEA-HPC failed in mysterious ways on previous versions of this patch,

[Nfs-ganesha-devel] WIP pre-review WAS26napalm

2017-07-10 Thread William Allen Simpson
https://github.com/linuxbox2/nfs-ganesha/tree/was26napalm If you compare with the now up-to-date next branch in the same tree, the changes should be fairly easy to review. I'm planning on pushing that gerrithub review later in the week. For compiling purposes, you'd need:

Re: [Nfs-ganesha-devel] commit test Comparisons

2017-06-28 Thread William Allen Simpson
On 6/28/17 8:41 PM, William Allen Simpson wrote: This is a good programming practice of long-standing value. Why oh why do these evil commit tests keep creeping in? bill@simpson91:~/rdma/nfs-ganesha$ git commit --amend -a WARNING: Comparisons should place the constant on the right side

[Nfs-ganesha-devel] commit test Comparisons

2017-06-28 Thread William Allen Simpson
This is a good programming practice of long-standing value. Why oh why do these evil commit tests keep creeping in? bill@simpson91:~/rdma/nfs-ganesha$ git commit --amend -a WARNING: Comparisons should place the constant on the right side of the test #17: FILE:

[Nfs-ganesha-devel] Trying Ganesha 2.5.0.2

2017-06-25 Thread William Allen Simpson
-- Found NTIRPC: /home/bill/rdma/install/include/ntirpc (found suitable version "1.5.2", minimum required is "1.4.0") Somebody needs to update that for Ganesha 2.5-stable, as the minimum ntirpc that will compile is 1.5.0, and the current version is 1.5.2.

[Nfs-ganesha-devel] cache and hash and partitions should be primes

2017-06-21 Thread William Allen Simpson
Was looking through ntirpc cache/hash sizes, and discovered that: svc_auth_des.c has 64 (not prime); authgss_hash.c has 255 (not prime). Configurable number of partitions aren't checked for primality. So began checking Ganesha as well. src/support/export_mgr.c has a nice size of 769,
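The primality check described above can be sketched as follows. This is a minimal illustration, assuming simple trial division is fast enough for table sizes that are validated once at configuration time; is_prime and next_prime are hypothetical helper names and are not existing ntirpc or Ganesha functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trial division by odd numbers up to sqrt(n). */
static bool is_prime(uint32_t n)
{
	if (n < 2)
		return false;
	if (n % 2 == 0)
		return n == 2;
	/* Widen to 64 bits so d * d cannot overflow. */
	for (uint32_t d = 3; (uint64_t)d * d <= n; d += 2) {
		if (n % d == 0)
			return false;
	}
	return true;
}

/* Round a configured size up to the nearest prime (n itself if prime). */
static uint32_t next_prime(uint32_t n)
{
	/* 4294967291 is the largest prime that fits in 32 bits; beyond it
	 * the loop below would wrap around. */
	assert(n <= 4294967291u);
	while (!is_prime(n))
		n++;
	return n;
}
```

With these helpers, the sizes mentioned above behave as expected: 64 and 255 are rejected by is_prime(), 769 is accepted, and next_prime() would round 64 up to 67 and 255 up to 257.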

[Nfs-ganesha-devel] dividends from new interrupt processing

2017-06-20 Thread William Allen Simpson
Just a note that I'm seeing unexpected improvements from just the interrupt processing. For some unknown timing reason, client responses are better aggregated in buffers, allowing faster inline processing. Note the change from non-INLINE to INLINE Of course, I don't know why we are

Re: [Nfs-ganesha-devel] async dispatch not good

2017-06-20 Thread William Allen Simpson
On 6/19/17 10:47 PM, Matt Benjamin wrote: From: "William Allen Simpson" <william.allen.simp...@gmail.com> While I'm thinking about it, why does Ganesha call svc_reg()? AFAICT, that's just filling in a tree that is never used anymore. Can I remove that code in Ganesha? It's a

Re: [Nfs-ganesha-devel] UDP VSOCK?

2017-06-19 Thread William Allen Simpson
On 6/19/17 3:44 PM, Matt Benjamin wrote: there is no UDP vsock, it's always a stream socket, this could be done differently, as desired Good, 'cause I've already submitted the patch. Also, VSOCK only needs to support NFS v3 and v4, not the other programs? But I could be wrong? This

Re: [Nfs-ganesha-devel] async dispatch not good

2017-06-19 Thread William Allen Simpson
On 6/19/17 3:41 PM, Matt Benjamin wrote: it's not about memory, this is the problem we're trying to avoid but, referring for context to our verbal discussion earlier today, your suggestion to hybridize the existing output side (which depends on blocking sockets) and an async input side using

[Nfs-ganesha-devel] async dispatch not good

2017-06-19 Thread William Allen Simpson
As folks may have noticed, I've been re-working my old 2015 dispatch patches that eliminate the network input-side queues in Ganesha. Matt had wanted fully async non-blocking I-O. I've been poking at it for a week, and now am sure that's the wrong way to go. It might still be good for FSALs.

Re: [Nfs-ganesha-devel] VSOCK initialization issues?

2017-06-17 Thread William Allen Simpson
On 6/16/17 4:38 PM, William Allen Simpson wrote: Tried to talk to DanG today, but he went home earlier than usual. So maybe somebody else knows: void Create_SVCXPRTs(void) { protos p; LogFullDebug(COMPONENT_DISPATCH, "Allocation of the SVCXPRT"); for (p = P_NFS; p <

[Nfs-ganesha-devel] UDP VSOCK?

2017-06-16 Thread William Allen Simpson
Tried to talk to DanG today, but he went home earlier than usual. So maybe somebody else knows: void Create_SVCXPRTs(void) { protos p; LogFullDebug(COMPONENT_DISPATCH, "Allocation of the SVCXPRT"); for (p = P_NFS; p < P_COUNT; p++) if

[Nfs-ganesha-devel] no longer allowing every protocol on every port

2017-06-16 Thread William Allen Simpson
I've discovered that Ganesha accepts every protocol on every port. That is, an NLM port will actually allow NFSv3 or v4 and vice versa. Checked with Bruce Fields, the kernel doesn't do that. One of my old Napalm patches (Aug 3, 2015) combined is_rpc_call_valid() with nfs_rpc_get_funcdesc();

Re: [Nfs-ganesha-devel] ntirpc GSS over TCP checksum

2017-06-16 Thread William Allen Simpson
it. OK, will do. On Wed, Jun 7, 2017 at 9:41 PM, William Allen Simpson <william.allen.simp...@gmail.com <mailto:william.allen.simp...@gmail.com>> wrote: [...] This means that for GSS, the checksum is done twice? Standard tirpc has no checksum. (RDMA doesn't do the che

Re: [Nfs-ganesha-devel] ntirpc v1.5.2 .gitignore patch

2017-06-05 Thread William Allen Simpson
On 6/3/17 3:56 PM, Malahal Naineni wrote: I don't understand why you need to keep patches when git gives you the power to store your code in its own branches/stashes. Arguably, git commits have more information than a simple patch diff. That's why I use git format-patch. The default directory

Re: [Nfs-ganesha-devel] ntirpc v1.5.2 .gitignore patch

2017-06-05 Thread William Allen Simpson
On 6/2/17 12:19 PM, Niels de Vos wrote: On Fri, Jun 02, 2017 at 10:48:45AM -0400, Daniel Gryniewicz wrote: https://github.com/nfs-ganesha/ntirpc/pull/49 I'd already found that. Log wasn't very helpful. In fact, seemed wrong. There's no good reason to change the practices for all upstreams

Re: [Nfs-ganesha-devel] Any plans to support callback API model for FSALs?

2017-05-12 Thread William Allen Simpson
Soumya and I have been working on-and-off for a couple of months on a design for both async callback and zero-copy, based upon APIs already implemented for Gluster. Once we have something comprehensive and well-written, I'd like to get feedback from other FSALs. And of course, zero-copy is the

Re: [Nfs-ganesha-devel] dispatch queues

2017-03-10 Thread William Allen Simpson
On 3/9/17 10:22 AM, Matt Benjamin wrote: >> From: "William Allen Simpson" <william.allen.simp...@gmail.com> >> After a somewhat loud discussion with Matt, we've agreed on a >> different approach. This will also be useful for fully async IO >> that is

Re: [Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and ntirpc?

2017-03-09 Thread William Allen Simpson
On 3/9/17 10:12 AM, Daniel Gryniewicz wrote: > It probably should stay. ntirpc is intended to be useful to more than > Ganesha, and this seems like a useful feature for potential users. It's > not the codepaths called by Ganesha, so it doesn't cause any problems. > It *is* the codepaths called

[Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and ntirpc?

2017-03-09 Thread William Allen Simpson
Anybody have any objections to my removing the ntirpc version? Clearly, this is done in RPCAL/nfs_duprec.c, so why is it also in libntirpc/src/svc_dg.c? According to blame, Matt, Malahal, and Frank have all worked on this, but not since early 2015.

Re: [Nfs-ganesha-devel] dispatch queues

2017-03-08 Thread William Allen Simpson
On 3/8/17 5:34 AM, William Allen Simpson wrote: > Ganesha currently has 2 phases of dispatch queuing: one for input and > decoding, then another for executing/encoding output. (I've fixed the > third queue for later sending the output, where the thread should stay > hot as long as

[Nfs-ganesha-devel] dispatch queues

2017-03-08 Thread William Allen Simpson
Ganesha currently has 2 phases of dispatch queuing: one for input and decoding, then another for executing/encoding output. (I've fixed the third queue for later sending the output, where the thread should stay hot as long as there's more to process.) On Monday, Matt told me we were having

Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-07 Thread William Allen Simpson
On 3/7/17 4:56 AM, William Allen Simpson wrote: > On 3/6/17 6:58 PM, Matt Benjamin wrote: >> Looking briefly at section 5.3.3.3 of rfc2203, it seems like that would be >> correct. If the client has just refreshed its credentials, why is it >> continuing to send with the ex

Re: [Nfs-ganesha-devel] Permission denied error with Kerberos enabled

2017-03-07 Thread William Allen Simpson
On 3/6/17 6:58 PM, Matt Benjamin wrote: > Looking briefly at section 5.3.3.3 of rfc2203, it seems like that would be > correct. If the client has just refreshed its credentials, why is it > continuing to send with the expired context? > I don't know, but I'll take a look. Now that we always

[Nfs-ganesha-devel] ntirpc ready for V2.5 this week?

2017-03-04 Thread William Allen Simpson
In December to mid-January, I prepared a fairly large amount of internal alloc/free and lock restructuring. It took only a few days to integrate some old patches and add some new ideas. Some of these patches were originally ~18 months old. It's taken two months to feed those patches into the

[Nfs-ganesha-devel] revert/repair ntirpc patch b4254b92

2017-02-24 Thread William Allen Simpson
Last week, Daniel G had to revert one of my patches, because the CEA tests segfaulted during shutdown. Neither of us could reproduce. Yesterday, I updated my Fedora debuginfos. Suddenly, I could reproduce! After _many_ *MANY* hours, I've discovered that it wasn't in my patch. :( ntirpc git

Re: [Nfs-ganesha-devel] [RFC] change struct state_t to include fsal_fd

2017-02-23 Thread William Allen Simpson
On 2/23/17 3:43 AM, Swen Schillig wrote: > while removing some legacy GPFS code I stumbled over this > > my_fd = (struct gpfs_fd *)(state + 1); > > which is probably just copied from an early version to all our FSALs. > Looks like the state struct is followed by per-FSAL independent data. A
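
The `(state + 1)` expression quoted in this snippet is the usual C idiom for private data stored directly after a generic struct in the same allocation. A minimal sketch follows, assuming hypothetical `demo_state` and `demo_fd` types that stand in for the real `state_t` and `struct gpfs_fd`, whose definitions are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real Ganesha definitions. */
struct demo_state {
	int state_type;
};

struct demo_fd {
	int fd;
	int openflags;
};

/* Allocate the generic state and the private fd data as one block. */
static struct demo_state *demo_alloc_state(int type)
{
	struct demo_state *state =
		calloc(1, sizeof(struct demo_state) + sizeof(struct demo_fd));

	assert(state != NULL);
	state->state_type = type;
	return state;
}

/* Same arithmetic as the quoted line: step over one whole state struct. */
static struct demo_fd *demo_state_to_fd(struct demo_state *state)
{
	return (struct demo_fd *)(state + 1);
}
```

The cast only works if the trailing type's alignment is satisfied at that offset, which holds for these two all-`int` structs but is not guaranteed in general. That hidden layout assumption is one reason to prefer an explicit member.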

Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: pull up ntirpc #41 & #42

2017-02-23 Thread William Allen Simpson
On 2/21/17 12:30 PM, GerritHub wrote: > william.allen.simp...@gmail.com uploaded this change for *review*. > > View Change > > pull up ntirpc #41 > > Remove unused SVC_VC_CREATE_BOTHWAYS > This is now: pull up ntirpc #41 & #42 Many thanks to Daniel G for his

[Nfs-ganesha-devel] ntirpc pull request #41

2017-02-17 Thread William Allen Simpson
https://github.com/nfs-ganesha/ntirpc/pull/41 https://github.com/linuxbox2/ntirpc/tree/was14wrap There's even one of Swen's ideas in it, so I'm hoping that he'll take a look.
