Re: [Nfs-ganesha-devel] libntirpc thread local storage

2017-12-09 Thread William Allen Simpson

On 12/9/17 5:28 PM, Matt Benjamin wrote:

I've already proposed we remove this.  No one is invested in it, I don't think.


OK.  I'll take a poke at it today.  It makes sense that this is a
good time to handle, as we've already made a major change to
CLNT_CALL in this release.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] libntirpc thread local storage

2017-12-09 Thread Matt Benjamin
I've already proposed we remove this.  No one is invested in it, I don't
think.

Matt

On Dec 10, 2017 2:38 AM, "William Allen Simpson" <
william.allen.simp...@gmail.com> wrote:

> I've run into another TLS problem.  It's been there since tirpc.
>
> Apparently, once upon a time, rpc_createerr was a static global.
> It still says that in the man pages.
>
> When a client create function fails, they stash the error there,
> and return NULL for the CLIENT.  Basically, you check for NULL,
> and then check rpc_createerr
>
> This is also used extensively by the RPC bind code.
>
> Then, they made it a keyed thread local to try and salvage it
> without a major code re-write.
>
> With async threads, that's not working.  We've got memory leaks.
> AFAICT, only on errors.  But accessing them on a different
> thread gives the wrong error code (or none at all).  Not good.
>
> All the functions that use it are still not MT-safe, usually
> because they stash a string in global memory without locking.
> They need re-definition to alloc/free the string.
>
> Worse, it's not a good definition.
>
> rpc_createerr has both clnt_stat and rpc_err, but struct rpc_err
> also has a clnt_stat (since original tirpc).  clnt_stat is not
> consistently set properly, as it is in two places.  So the error
> checking code is often wrong.
>
> I'd like to get rid of the whole mess, but that means every client
> create would have new semantics.  Fortunately, there aren't many
> (in Ganesha).  Plus we already have new definitions -- all named
> *_ncreate with a tirpc_compat.h to munge them back.
>
> But should we do it now?  Or in 2.7?  We've been living with it for
> years, although recent code changes have made it worse.  Again, it
> only happens on errors.  Especially for RPC bind.
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] libntirpc thread local storage

2017-12-09 Thread William Allen Simpson

I've run into another TLS problem.  It's been there since tirpc.

Apparently, once upon a time, rpc_createerr was a static global.
It still says that in the man pages.

When a client create function fails, they stash the error there,
and return NULL for the CLIENT.  Basically, you check for NULL,
and then check rpc_createerr

This is also used extensively by the RPC bind code.

Then, they made it a keyed thread local to try and salvage it
without a major code re-write.

With async threads, that's not working.  We've got memory leaks.
AFAICT, only on errors.  But accessing them on a different
thread gives the wrong error code (or none at all).  Not good.

All the functions that use it are still not MT-safe, usually
because they stash a string in global memory without locking.
They need re-definition to alloc/free the string.

Worse, it's not a good definition.

rpc_createerr has both clnt_stat and rpc_err, but struct rpc_err
also has a clnt_stat (since original tirpc).  clnt_stat is not
consistently set properly, as it is in two places.  So the error
checking code is often wrong.

I'd like to get rid of the whole mess, but that means every client
create would have new semantics.  Fortunately, there aren't many
(in Ganesha).  Plus we already have new definitions -- all named
*_ncreate with a tirpc_compat.h to munge them back.

But should we do it now?  Or in 2.7?  We've been living with it for
years, although recent code changes have made it worse.  Again, it
only happens on errors.  Especially for RPC bind.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Stacked FSALs and fsal_export parameters and op_ctx

2017-12-09 Thread William Allen Simpson

On 12/8/17 10:13 AM, Matt Benjamin wrote:

I'd like to see this use of TLS as a "hidden parameter" replaced
regardless.  It has been a source of bugs, and locks us into a
pthreads execution model I think needlessly.


With future async FSAL calls, it's going to stop working.

We already have a svc_req and now clnt_req.  That's what will be
returned on the new thread.  So it behooves us to stash the op_ctx
pointer in them (as a void *).

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel