Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-03 Thread Matt Benjamin
Hi Frank,

On Tue, Apr 3, 2018 at 11:33 AM, Frank Filz <ffilz...@mindspring.com> wrote:
> Thanks for the explanation. You are observing in practice something I 
> considered in theory...
>
> I like the idea of demoting entries when entries > entries_hiwat, Matt, 
> Daniel, do you see any negative side effects to that?

It sounds reasonable to me.  I don't remember anything that would make
it problematic.
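
For illustration only, a rough sketch of the demote-when-over-hiwat idea
against a simplified LRU; none of these names are the real mdcache API:

/* Hypothetical sketch, not the mdcache LRU code: once the entry count
 * exceeds entries_hiwat, walk from the cold end and demote unpinned
 * entries until the count drops back under the high-water mark. */
#include <stdbool.h>
#include <stddef.h>

struct cache_entry {
	struct cache_entry *lru_prev;   /* toward warmer entries */
	bool pinned;                    /* in use; cannot be demoted */
};

struct cache_lru {
	struct cache_entry *coldest;    /* tail of the LRU */
	size_t entries;
	size_t entries_hiwat;
};

static void maybe_demote(struct cache_lru *lru,
			 void (*demote)(struct cache_entry *))
{
	struct cache_entry *e = lru->coldest;

	while (lru->entries > lru->entries_hiwat && e != NULL) {
		struct cache_entry *warmer = e->lru_prev;

		if (!e->pinned) {
			demote(e);      /* e.g. drop to a colder level, release refs */
			lru->entries--;
		}
		e = warmer;
	}
}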

>
> About the open fds, my thinking is shifting to not keeping an fd open for 
> getattr/setattr. If the global fd is already open use it; otherwise, just
> open a temp fd for the operation. With NFS v4 clients, that will virtually 
> eliminate global fd usage and for V3 clients will mean the global fd is only 
> open for files the client is doing I/O on.

That sounds like a great idea.
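
A minimal sketch of that approach, purely illustrative (the handle layout
and function name are invented, not the actual FSAL interface):

#include <stdbool.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

struct obj_handle {
	int global_fd;          /* -1 when the global fd is not open */
	const char *path;
};

/* Reuse the global fd if it happens to be open; otherwise open a
 * short-lived fd just for this getattr and close it immediately. */
static int getattr_tempfd(struct obj_handle *hdl, struct stat *st)
{
	int fd = hdl->global_fd;
	bool temp = false;
	int rc = 0;

	if (fd < 0) {
		fd = open(hdl->path, O_RDONLY | O_NOFOLLOW);
		if (fd < 0)
			return -errno;
		temp = true;
	}

	if (fstat(fd, st) < 0)
		rc = -errno;

	if (temp)
		close(fd);      /* never promoted to the global fd */

	return rc;
}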

Matt

>
> Frank
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfstest_delegation

2018-03-28 Thread Matt Benjamin
In order to test delegations, Ganesha needs to be in a FSAL
configuration that supports them.

To the best of my understanding, delegations are enabled by default
for GPFS, which until recently was the only FSAL which supported
delegations.  Jeff Layton recently added delegation support for CEPH.
You need a recent Ceph to use them, due to a libcephfs API dependency,
iiuc.

Matt

On Wed, Mar 28, 2018 at 7:26 AM, Malahal Naineni <mala...@gmail.com> wrote:
> Yes, it should be default if the code is stable!
>
> regards, malahal.
>
> On Wed, Mar 28, 2018 at 3:49 PM, William Allen Simpson
> <william.allen.simp...@gmail.com> wrote:
>>
>> I see that Patrice hasn't posted here about this problem yet.
>>
>> Linux client folks say our V2.7-dev delegations aren't working.
>>
>> At this week's bake-a-thon, Patrice has tried turning it on a
>> couple of different ways.  Shouldn't delegations be on by default?
>>
>> Could we get the nfstest suite added to CI?
>>
>>
>
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping profile

2018-03-25 Thread Matt Benjamin
With N=10 and num_calls=100, on Lemon, test_rbt averages 2.8M
reqs/s.  That's about half the rate when N=1, which I think is
expected.  If this is really an available rbt in-order
search-remove-insert retire rate when N is 10, my intuition is that
it's sufficiently fast not to be the bottleneck your result claims,
and I think it's necessary to understand why.

Matt

On Sun, Mar 25, 2018 at 6:17 PM, Matt Benjamin <mbenj...@redhat.com> wrote:
> 1 What is the peak outstanding size of outstanding calls
>
> 1.1 if e.g. > 100k is that correct: as last week, why would a sensible
> client issue more than e.g. 1000 calls without seeing replies?
>
> 1.3 if outstanding calls is <= 1, why can test_rbt retire millions of
> duty cycles / s in that scenario?
>
> 2 what does the search workload look like when replies are mixed with calls?
> I.e., the bidirectional RPC this is intended for?
>
> 2.2 Hint: xid dist is not generally sorted;  client defines only its own
> issue order, not reply order nor peer xids;  why is it safe to base reply
> matching around xids being in sorted order?
>
> Matt
>
> On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson
> <william.allen.simp...@gmail.com> wrote:
>>
>> On 3/24/18 7:50 AM, William Allen Simpson wrote:
>> > Noting that the top problem is exactly my prediction by knowledge of
>> > the code:
>> >clnt_req_callback() opr_rbtree_insert()
>> >
>> > The second is also exactly as expected:
>> >
>> >svc_rqst_expire_insert() opr_rbtree_insert() svc_rqst_expire_cmpf()
>> >
>> > These are both inserted in ascending order, sorted in ascending order,
>> > and removed in ascending order
>> >
>> > QED: rb_tree is a poor data structure for this purpose.
>>
>> I've replaced those 2 rbtrees with TAILQ, so that we are not
>> spending 49% of the time there anymore, and am now seeing:
>>
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>>
>> This is probably good enough for now.  Time to move on to
>> more interesting things.
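
A rough sketch of the shape of that change, assuming strictly ascending
xids and (mostly) FIFO retire order; the structure names are invented,
not the actual ntirpc types:

/* If calls are created in ascending xid order and retire in roughly the
 * same order, a TAILQ gives O(1) insert at the tail and O(1) removal
 * from the head, with no rebalancing. */
#include <sys/queue.h>
#include <stdint.h>

struct call_entry {
	uint32_t xid;
	TAILQ_ENTRY(call_entry) q;
};

TAILQ_HEAD(call_queue, call_entry);

static void call_insert(struct call_queue *cq, struct call_entry *ce)
{
	TAILQ_INSERT_TAIL(cq, ce, q);           /* new xids are always highest */
}

static struct call_entry *call_match(struct call_queue *cq, uint32_t xid)
{
	struct call_entry *ce;

	/* With in-order replies this loop almost always stops at the head;
	 * out-of-order replies degrade to a linear scan. */
	TAILQ_FOREACH(ce, cq, q) {
		if (ce->xid == xid) {
			TAILQ_REMOVE(cq, ce, q);
			return ce;
		}
	}
	return NULL;
}

Whether the in-order assumption holds for bidirectional traffic is exactly
what the questions quoted below probe.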
>>
>>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping profile

2018-03-25 Thread Matt Benjamin
1 What is the peak outstanding size of outstanding calls

1.1 if e.g. > 100k is that correct: as last week, why would a sensible
client issue more than e.g. 1000 calls without seeing replies?

1.3 if outstanding calls is <= 1, why can test_rbt retire millions of
duty cycles / s in that scenario?

2 what does the search workload look like when replies are mixed with
calls?  I.e., the bidirectional RPC this is intended for?

2.2 Hint: xid dist is not generally sorted;  client defines only its own
issue order, not reply order nor peer xids;  why is it safe to base reply
matching around xids being in sorted order?

Matt

On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 3/24/18 7:50 AM, William Allen Simpson wrote:
> > Noting that the top problem is exactly my prediction by knowledge of
> > the code:
> >clnt_req_callback() opr_rbtree_insert()
> >
> > The second is also exactly as expected:
> >
> >svc_rqst_expire_insert() opr_rbtree_insert() svc_rqst_expire_cmpf()
> >
> > These are both inserted in ascending order, sorted in ascending order,
> > and removed in ascending order
> >
> > QED: rb_tree is a poor data structure for this purpose.
>
> I've replaced those 2 rbtrees with TAILQ, so that we are not
> spending 49% of the time there anymore, and am now seeing:
>
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>
> This is probably good enough for now.  Time to move on to
> more interesting things.
>
>


Re: [Nfs-ganesha-devel] rpcping

2018-03-14 Thread Matt Benjamin
Hi Bill,

I was not (not intentionally, and, I think, not at all) imputing
sentiment to Daniel nor myself.  I was paraphrasing statements Daniel
made not only informally to me, but publicly in this week's
nfs-ganesha call, in which you were present.

I'm not denigrating your work, but I did dispute your conclusion on
the performance of rbtree in the client in a specific scenario, for
which I provided detailed measurement and a program for reproducing
them.  I did also challenge the unambiguous claim by you that
something about my use of rbtree in ntirpc and DRC was an
embarrassment to computing science, but did not use inappropriate
language or imputations (fighting words) in doing so.  THAT appears to
be a denigration of my work by you, not the other way around.  There
were other examples in that email, but it was the most glaring.  I
could go into that pattern further, but I don't think I can or should
on a public mailing list.  I won't post further on this or similar
threads.

Matt

On Wed, Mar 14, 2018 at 4:27 PM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 3/14/18 7:27 AM, Matt Benjamin wrote:
>>
>> Daniel doesn't think you've measured much accurately yet, but at least
>> the effort (if not the discussion) aims to.
>>
> I'm sure Daniel can speak for himself.  At your time of writing,
> Daniel had not yet arrived in the office after my post this am.
>
> So I'm assuming you're speculating.  Or denigrating my work and
> attributing that sentiment to Daniel.  I'd appreciate you cease
> doing that.
>
> I've done my best with the Tigran's code design that you held onto
> for 6 years without putting it into the tree or keeping it up-to-date.
>
> At this time, there's no indication any numbers are in error.
>
> If you have quantitative information, please provide it.



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping

2018-03-14 Thread Matt Benjamin
Daniel doesn't think you've measured much accurately yet, but at least
the effort (if not the discussion) aims to.

On Wed, Mar 14, 2018 at 2:54 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:

Matt

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping

2018-03-13 Thread Matt Benjamin
On Tue, Mar 13, 2018 at 2:38 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 3/12/18 6:25 PM, Matt Benjamin wrote:
>>
>> If I understand correctly, we always insert records in xid order, and
>> xid is monotonically increasing by 1.  I guess pings might come back
>> in any order,
>
>
> No, they always come back in order.  This is TCP.  I've gone to some
> lengths to fix the problem that operations were being executed in
> arbitrary order.  (As was reported in the past.)

We're aware of the issues with former req queuing.  It was one of my
top priorities to fix in napalm, and we did it.

>
> For UDP, there is always the possibility of loss or re-ordering of
> datagrams, one of the reasons for switching to TCP in NFSv3 (and
> eliminating UDP in NFSv4).
>
> Threads can still block in apparently random order, because of
> timing variances inside FSAL calls.  Should not be an issue here.
>
>
>> but if we assume xids retire in xid order also,
>
>
> They do.  Should be no variance.  Eliminating the dupreq caching --
> also using the rbtree -- significantly improved the timing.

It's certainly correct not to cache, but it's also a special case that
arises from...benchmarking with rpcping, not NFS.
Same goes for retire order.  Who said, let's assume the rpcping
requests retire in order?  Oh yes, me above.  Do you think NFS
requests in general are required to retire in arrival order?  No, of
course not.  What workload is the general case for the DRC?  NFS.

>
> Apparently picked the worst tree choice for this data, according to
> computer science. If all you have is a hammer

What motivates you to write this stuff?

Here are two facts you may have overlooked:

1. The DRC has a constant insert-delete workload, and for this
application, IIRC, I put the last inserted entries directly into the
cache.  This both applies standard art on trees (rbtree vs. avl
performance on insert/delete-heavy workloads) and ostensibly avoids
searching the tree in the common case (I measured the hit rate
informally; it looked to be working).

2. The key in the DRC caches is hk, not xid.
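
Illustratively (not the nfs-ganesha DRC code), a key built around a
request checksum rather than the bare xid looks something like:

#include <stdint.h>

struct dupreq_key {
	uint64_t hk;        /* checksum/hash over the decoded request */
	uint32_t xid;       /* kept only as a cheap first-pass discriminator */
};

static int dupreq_key_cmp(const struct dupreq_key *a,
			  const struct dupreq_key *b)
{
	if (a->hk != b->hk)
		return a->hk < b->hk ? -1 : 1;
	if (a->xid != b->xid)
		return a->xid < b->xid ? -1 : 1;
	return 0;   /* same hash and xid: treat as a retransmission */
}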

>
>
>> and keep
>> a window of 1 records in-tree, that seems maybe like a reasonable
>> starting point for measuring this?
> I've not tried 10,000 or 100,000 recently.  (The original code
>
> default sent 100,000.)
>
> I've not recorded how many remain in-tree during the run.
>
> In my measurements, using the new CLNT_CALL_BACK(), the client thread
> starts sending a stream of pings.  In every case, it peaks at a
> relatively stable rate.
>
> For 1,000, <4,000/s.  For 100, 40,000/s.  Fairly linear relationship.
>
> By running multiple threads, I showed that each individual thread ran
> roughly the same (on average).  But there is some variance per run.
>
> I only posted the 5 thread results, lowest and highest achieved.
>
> My original message had up to 200 threads and 4 results, but I decided
> such a long series was overkill, so removed them before sending.
>
> That 4,000 and 40,000 per client thread was stable across all runs.
>
>
>> I wrote a gtest program (gerrit) that I think does the above in a
>> single thread, no locks, for 1M cycles (search, remove, insert).  On
>> lemon, compiled at O2, the gtest profiling says the test finishes in
>> less than 150ms (I saw as low as 124).  That's over 6M cycles/s, I
>> think.
>>
> What have you compared it to?  Need a gtest of avl and tailq with the
> same data.  That's what the papers I looked at do

The point is, that is very low latency, a lot less than I expected.
It's probably minimized by CPU caching and so forth, but it tries to
address the more basic question: is latency from searching the rb tree,
expected or unexpected, a likely contributor to overall latency?
If we get 2M retires per second (let alone 6-7M), is that a likely
supposition?

The rb tree either is, or isn't a major contributor to latency.  We'll
ditch it if it is.  Substituting a tailq (linear search) seems an
unlikely choice, but if you can prove your case with the numbers, no
one's going to object.

Matt

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping

2018-03-12 Thread Matt Benjamin
That's certainly suggestive.

I found it hard to believe the red-black tree performance could be
that bad, at a loading of 10K items--even inserting, searching, and
removing in-order.  Then again, I never benchmarked the opr_rbtree
code.

If I understand correctly, we always insert records in xid order, and
xid is monotonically increasing by 1.  I guess pings might come back
in any order, but if we assume xids retire in xid order also, and keep
a window of 1 records in-tree, that seems maybe like a reasonable
starting point for measuring this?

I wrote a gtest program (gerrit) that I think does the above in a
single thread, no locks, for 1M cycles (search, remove, insert).  On
lemon, compiled at O2, the gtest profiling says the test finishes in
less than 150ms (I saw as low as 124).  That's over 6M cycles/s, I
think.
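
Not the gtest program itself, but a stand-alone approximation of the same
access pattern, using POSIX tsearch()/tdelete() (a balanced binary tree in
glibc) as a stand-in for opr_rbtree; the window size is an assumption:

#include <search.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define WINDOW 10000
#define CYCLES 1000000

static int cmp(const void *a, const void *b)
{
	uint64_t ka = *(const uint64_t *)a, kb = *(const uint64_t *)b;

	return (ka > kb) - (ka < kb);
}

int main(void)
{
	static uint64_t keys[WINDOW + CYCLES];
	void *root = NULL;
	struct timespec t0, t1;
	uint64_t i;

	for (i = 0; i < WINDOW + CYCLES; i++)
		keys[i] = i;
	for (i = 0; i < WINDOW; i++)
		tsearch(&keys[i], &root, cmp);          /* preload the window */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < CYCLES; i++) {
		tdelete(&keys[i], &root, cmp);          /* retire lowest xid */
		tsearch(&keys[WINDOW + i], &root, cmp); /* insert newest xid */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.0f cycles/s\n", CYCLES /
	       ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9));
	return 0;
}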

Matt

On Mon, Mar 12, 2018 at 4:06 PM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> [These are with a Ganesha that doesn't dupreq cache the null operation.]
>
> Just how slow is this RB tree?
>
> Here's a comparison of 1000 entries versus 100 entries in ops per second:
>
> rpcping tcp localhost threads=5 count=1000 (port=2049 program=13
> version=3 procedure=0): average 2963.2517, total 14816.2587
> rpcping tcp localhost threads=5 count=1000 (port=2049 program=13
> version=3 procedure=0): average 3999.0897, total 19995.4486
>
> rpcping tcp localhost threads=5 count=100 (port=2049 program=13
> version=3 procedure=0): average 39738.1842, total 198690.9208
> rpcping tcp localhost threads=5 count=100 (port=2049 program=13
> version=3 procedure=0): average 39913.1032, total 199565.5161



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] zero-copy read

2018-03-11 Thread Matt Benjamin
No, you won't.

Matt

On Sun, Mar 11, 2018 at 7:15 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 3/10/18 11:18 AM, Matt Benjamin wrote:
>>
>> Marcus has code that prototypes using gss_iov from mit-krb5 1.1.12.  I
>> recall describing this to you in 2013.
>>
> That would be surprising, as I didn't start working on this project
> until a year or so later than that
>
> Anyway, last year Marcus sent me a link to his prototype.  It's
> hopelessly out of date by now.  I'll need to start over.



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] zero-copy read

2018-03-10 Thread Matt Benjamin
Marcus has code that prototypes using gss_iov from mit-krb5 1.1.12.  I
recall describing this to you in 2013.

On Sat, Mar 10, 2018 at 11:12 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 3/10/18 10:24 AM, William Allen Simpson wrote:
>
> But as I delved deeper, I'll have to make GSS work on vector i-o, as
> it currently requires one big buffer input.  This will take a while.
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] zero-copy read

2018-03-10 Thread Matt Benjamin
Hi Bill,

On Sat, Mar 10, 2018 at 10:24 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> Now that DanG has a workable vector i-o for read and write, I'm
> trying again to make reading zero-copy.  Man-oh-man, do we have
> our work cut out for us
>
> It seems that currently we provide a buffer to read.  Then XDR
> makes a new object, puts headers into it, makes another data_val
> and copies data into that, then it is all eventually passed to
> ntirpc where a buffer is created and all copied into that.
>
> If GSS, another copy is made.  (This one cannot be avoided.)
>
> So we're copying large amounts of data 4-5 times.   Not counting
> whatever the FSAL library call does internally.

Sounds like you have lots to work on.

>
> Then there is NFS_LOOKAHEAD_READ, and a nfs_request_lookahead.
> Could somebody explain what that is doing?

These are just tracking the ops being decoded, currently to help
decide if reqs should be cached.  It copies no data.

>
> AFAICT, the only test is in dup_req, and it won't keep the dup_req
> "because large, though idempotent".  Isn't a large read exactly
> where we'd benefit from holding onto a dup_req?

Basically, no.  The purpose of the DRC is to identify requests that
change the state of files/objects (are non-idempotent), and to satisfy
a client's retry of such a request, without re-executing it.  The goal
is not to be a data cache.

>
> NFS_LOOKAHEAD_HIGH_LATENCY is never used.

It's a constant.

>
> There are a lot of XDR tests for x_public having the pointer to
> nfs_request_lookahead, yet setting that pointer is one of the
> early things in nfs_worker_thread.c nfs_rpc_process_request().

It sets the structure in x_public so that it can be accessed from
within XDR decoders.

>
> Finally, and what I'll do this weekend, my attempt to edit
> xdr_nfs23.c won't pass checkpatch commit, because all the headers
> are still pre-1989, pre-ANSI K&R
>
> Unfortunately, Red Hat Linux doesn't seem to have cproto built-in,
> even though it's on the usual https://linux.die.net/man/1/cproto.
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] review request https://review.gerrithub.io/#/c/390652/

2018-03-09 Thread Matt Benjamin
Hi Satya,

To reply, to a reply on the top level (can even be blank), all your
inline comments will publish then.

Matt

On Fri, Mar 9, 2018 at 11:21 AM, Satya Prakash GS
<g.satyaprak...@gmail.com> wrote:
> I had replied to the comments on the same day Matt posted. My replies show
> as drafts, looks like I have to publish them. I don't see a publish button
> either. Can you guys help me out.
>
> Thanks,
> Satya.
>
> On 9 Mar 2018 20:48, "Frank Filz" <ffilz...@mindspring.com> wrote:
>>
>> Matt had called for additional discussion on this, so let's get that
>> discussion going.
>>
>> Could you address Matt's questions?
>>
>> Frank
>>
>> > -Original Message-
>> > From: Satya Prakash GS [mailto:g.satyaprak...@gmail.com]
>> > Sent: Friday, March 9, 2018 4:17 AM
>> > To: nfs-ganesha-devel@lists.sourceforge.net
>> > Cc: Malahal Naineni <mala...@gmail.com>; Frank Filz
>> > <ffilz...@mindspring.com>
>> > Subject: review request https://review.gerrithub.io/#/c/390652/
>> >
>> > Can somebody please review this change :
>> > https://review.gerrithub.io/#/c/390652/
>> >
>> > It addresses this issue :
>> >
>> > Leak in DRC when client disconnects nfs_dupreq_finish doesn't call
>> > put_drc
>> > always. It does only if it meets certain criteria (drc_should_retire).
>> > This can leak
>> > the drc and the dupreq entries within it when the client disconnects.
>> > More
>> > information can be found here : https://sourceforge.net/p/nfs-
>> > ganesha/mailman/message/35815930/
>> >
>> > 
>> >
>> > Main idea behind the change.
>> >
>> > Introduced a new drc queue which holds all the active drc objects
>> > (tcp_drc_q in
>> > drc_st).
>> > Every new drc is added to tcp_drc_q initially. Eventually it is moved to
>> > tcp_drc_recycle_q. Drcs are freed from tcp_drc_recycle_q. Every drc is
>> > either in
>> > the active drc queue or in the recycle queue.
>> >
>> > DRC Refcount and transition from active drc to recycle queue :
>> >
>> > Drc refcnt is initialized to 2. In dupreq_start, increment the drc
>> > refcount. In
>> > dupreq_rele, decrement the drc refcnt. Drc refcnt is also decremented in
>> > nfs_rpc_free_user_data. When drc refcnt goes to 0 and drc is found not
>> > in use
>> > for 10 minutes, pick it up and free the entries in iterations of 32
>> > items at a time.
>> > Once the dupreq entries goes to 0, remove the drc from tcp_drc_q and add
>> > it to
>> > tcp_drc_recycle_q. Today, entries added to tcp_drc_recycle_q are cleaned
>> > up
>> > periodically. Same logic should clean up these entries too.
>> >
>> > Thanks,
>> > Satya.
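
For readability, a condensed sketch of the lifecycle described above;
field, constant, and helper names are paraphrased from this mail, not
taken from the actual patch:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define DRC_IDLE_SECS  (10 * 60)    /* "not in use for 10 minutes" */
#define DRC_FREE_BATCH 32           /* "iterations of 32 items at a time" */

struct drc;
extern void drc_move_to_recycle_q(struct drc *drc);     /* assumed helper */
extern int drc_free_entries(struct drc *drc, int max);  /* returns entries left */

struct drc {
	int32_t refcnt;         /* starts at 2; +1 in dupreq_start, -1 in dupreq_rele */
	time_t last_used;
	int nentries;
	bool on_active_q;       /* on tcp_drc_q until fully drained */
};

/* Periodic reaper walking tcp_drc_q (the active queue). */
static void drc_try_retire(struct drc *drc, time_t now)
{
	if (drc->refcnt > 0 || now - drc->last_used < DRC_IDLE_SECS)
		return;

	drc->nentries = drc_free_entries(drc, DRC_FREE_BATCH);
	if (drc->nentries == 0 && drc->on_active_q) {
		drc->on_active_q = false;
		drc_move_to_recycle_q(drc);  /* existing recycle-q aging frees it */
	}
}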
>>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Multiprotocol support in ganesha

2018-03-06 Thread Matt Benjamin
Basically, this.  For the sake of discussion, is Daniel's take on SMB
integration directly in nfs-ganesha what you're thinking of, or were
you referring to co-export of Linux filesystems with Samba or some
other Linux-integrated SMB stack?

Matt

On Tue, Mar 6, 2018 at 12:29 PM, Daniel Gryniewicz <d...@redhat.com> wrote:
> Ganesha has multi-protocol (NFS3, NFS4, and 9P).  There are no plans to add
> CIFS, since that is an insanely complicated protocol, and has a userspace
> daemon implementation already (in the form of Samba).  I personally wouldn't
> reject such support if it was offered, but as far as I know, no one is even
> thinking about working on it.
>
> Daniel
>
>
> On 03/06/2018 12:20 PM, Pradeep wrote:
>>
>> Hello,
>>
>> Is there plans to implement multiprotocol (NFS and CIFS accessing same
>> export/share) in ganesha? I believe current FD cache will need changes to
>> support that.
>>
>> Thanks,
>> Pradeep
>>
>>
>>
>>
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-21 Thread Matt Benjamin
Hi Bill,

Let's just talk about the rpc-ping-mt/request minimal latency.

On Wed, Feb 21, 2018 at 8:24 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:

>>
>> It measures minimal request latency all the way to nfs-ganesha's, or
>> the Linux kernel's, rpcnull handler--the "top" of the request handling
>> stack, in the given client/server/network configuration.  Scaled up,
>> it can report the max such calls/s, which is a kind of best-possible
>> value for iops, taking FSAL ops to have 0 latency.
>>
> As I've mentioned elsewhere, this should be entirely dominated by the
> link speed and protocol.  We should see UDP as slowest, TCP in the
> middle, and RDMA as fastest.

I think the point is, the "should" assumes perfect behavior from the
dispatch and (minimal) request execution path within nfs-ganesha, does
it not?  In other words, profiling of rpc null is effectively
profiling the first stage of the request pipeline, and if it's not
fast, more interesting ops won't be.

>
> OTOH, the "max such calls/s" would be reported by using XDR_RAW, which
> is currently not working.
>

My intended meaning, and Daniel's when he articulated the same point
Friday, is that "max such calls/s" -is- meaningful for transports NFS
actually supports, and those are the ones we can compare with other
implementations.  I think the raw transport opens up some interesting
paths, but it seems like they would involve more development.

Matt


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-20 Thread Matt Benjamin
Hi,

On Tue, Feb 20, 2018 at 8:12 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 2/18/18 2:47 PM, Matt Benjamin wrote:
>>
>> On Fri, Feb 16, 2018 at 11:23 AM, William Allen Simpson
>>>
>>> But the planned 2.7 improvements are mostly throughput related, not IOPS.
>>
>>
>> Not at all, though I am trying to ensure that we get async FSAL ops
>> in.  There are people working on IOPs too.
>>
> async FSAL ops are not likely to further improve IOPS.
>
> As I've pointed out many times in the past, async only allows
> the same number of threads to handle more concurrent operations.
>
> But it's actually slower.  It basically doubles the number of
> system calls.  System calls are one of the reasons that Ganesha is
> slower than knfsd.  It also ruins CPU cache coherency.
>
> If we're CPU bound or network bandwidth constrained, it won't help.

Not everything being worked on is a direct answer to this problem.

>
> The effort that I've been waiting for *3* years -- add io vectors to
> the FSAL interface, so that we can zero-copy buffers -- is more
> likely to improve throughput.

Which, as you've noted, also is happening.

>
> Moreover, going through the code and removing locking other such
> bottlenecks is more likely to improve IOPS.

No one is disputing this.  It is not a new discovery, however.

>
>
>>> If Ganesha is adding 6 ms to every read operation, we have a serious
>>> problem, and need to profile immediately!
>>>
>>
>> That's kind of what our team is doing.  I look forward to your work
>> with rpc-ping-mt.
>>
> Well, you were able to send me a copy after 6 pm on a Friday, so I'm
> now taking a look at it.  Hopefully I'll have something by Friday.

It came up in a meeting on Friday, that's why I sent it Friday.  I got
busy in meetings and issues, that's why after 6 pm.

>
> But I really wish you'd posted it 3 years ago.  It doesn't really test
> IOPS, other than whatever bandwidth limit is imposed by the interface,
> but it does test the client call interface.

It measures minimal request latency all the way to nfs-ganesha's, or
the Linux kernel's, rpcnull handler--the "top" of the request handling
stack, in the given client/server/network configuration.  Scaled up,
it can report the max such calls/s, which is a kind of best-possible
value for iops, taking FSAL ops to have 0 latency.
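As a made-up illustration of the scaling: if the mean null-call latency
measured this way is 50 µs and the client keeps 8 calls in flight, the
ceiling is roughly 8 / 50 µs = 160,000 calls/s; real NFS operations can
only come in under that, since FSAL latency adds to the floor.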

It was posted to this list by Tigran, iirc, in 2011 or 2012.

>
> We've been using Jeff Layton's delegation callback work to test, and
> this test would have been better and easier.
>
> But a unit test is not what we need.  I wrote "profile".  We need to
> know where the CPU bottlenecks are in Ganesha itself.

You also wrote unit test.

Matt

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-19 Thread Matt Benjamin
On Fri, Feb 16, 2018 at 11:23 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:

>>
> Actually, 2.6 should handle as many concurrent client requests as you like.
> (Up to 250 of them.)  That's one of its features.
>
> The client is not sending concurrent requests.

This seems worth digging into.  That's my understanding of 2.6
dispatch too, anyway.

>
>>
> But the planned 2.7 improvements are mostly throughput related, not IOPS.

Not at all, though I am trying to ensure that we get async FSAL ops
in.  There are people working on IOPs too.

>
>
> If Ganesha is adding 6 ms to every read operation, we have a serious
> problem, and need to profile immediately!
>

That's kind of what our team is doing.  I look forward to your work
with rpc-ping-mt.

Matt


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] nfs ganesha vs nfs kernel performance

2018-02-12 Thread Matt Benjamin
Hi Deepak,

There's known knowns and unknown knowns related to nfs-ganesha
performance, including lots of ongoing work.

One of the first things I think we'd like to know is...what version of
the software you're testing.

Matt

On Mon, Feb 12, 2018 at 7:01 PM, Deepak Jagtap <deepak.jag...@maxta.com> wrote:
> Hey Guys,
>
>
> I ran few performance tests to compare nfs gansha and nfs kernel server and
> noticed significant difference.
>
>
> Please find my test result:
>
>
> SSD formatted with EXT3, exported using nfs-ganesha:   ~18K IOPS    Avg latency: ~8ms     Throughput: ~60MBPS
>
> Same directory exported using the nfs kernel server:   ~75K IOPS    Avg latency: ~0.8ms   Throughput: ~300MBPS
>
>
> nfs kernel and nfs ganesha both of them are configured with 128 worker
> threads. nfs ganesha is configured with VFS FSAL.
>
>
> Am I missing something major in nfs ganesha config or this is expected
> behavior.
>
> Appreciate any inputs as how the performance can be improved for nfs
> ganesha.
>
>
>
> Please find following ganesha config file that I am using:
>
>
> NFS_Core_Param
> {
> Nb_Worker = 128 ;
> }
>
> EXPORT
> {
> # Export Id (mandatory, each EXPORT must have a unique Export_Id)
>Export_Id = 77;
># Exported path (mandatory)
>Path = /host/test;
>Protocols = 3;
># Pseudo Path (required for NFS v4)
>Pseudo = /host/test;
># Required for access (default is None)
># Could use CLIENT blocks instead
>Access_Type = RW;
># Exporting FSAL
>FSAL {
> Name = VFS;
>}
>CLIENT
>{
> Clients = *;
> Squash = None;
> Access_Type = RW;
>}
> }
>
>
>
> Thanks & Regards,
>
> Deepak
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Correct initialization sequence

2018-01-30 Thread Matt Benjamin
reordering, I hope

Matt

On Tue, Jan 30, 2018 at 1:40 PM, Pradeep <pradeeptho...@gmail.com> wrote:
> Hello,
>
> It is possible to receive requests anytime after nfs_Init_svc() is
> completed. We initialize several things in nfs_Init() after this. This could
> lead to processing of incoming requests racing with the rest of
> initialization (ex: dupreq2_pkginit()). Is it possible to re-order
> nfs_Init_svc() so that the rest of ganesha is ready to process requests as soon
> as we start listening on the NFS port? Another way is to return NFS4ERR_DELAY
> until 'nfs_init.init_complete' is true. Any thoughts?
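
A sketch of that second option (the dispatcher hook here is invented;
only nfs_init.init_complete comes from the mail above):

#include <stdbool.h>

#define NFS4ERR_DELAY 10008     /* RFC 7530 value */

struct nfs_init_state {
	volatile bool init_complete;
};

extern struct nfs_init_state nfs_init;

static int nfs4_dispatch_guard(void)
{
	if (!nfs_init.init_complete)
		return NFS4ERR_DELAY;   /* client retries after a delay */
	return 0;                       /* NFS4_OK: proceed with the compound */
}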
>
>
> Thanks,
> Pradeep
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Is there a field in the SVCXPRT Ganesha can use

2018-01-30 Thread Matt Benjamin
Restating this, the code is unified for tcp v4 and v3.  It consults
xid and the checksum, which I think is a correct intent.  The
slot-reply cache in 4.1+ is, we hope, the improved mechanism.

Matt

On Tue, Jan 30, 2018 at 9:45 AM, Matt Benjamin <mbenj...@redhat.com> wrote:
> I don't think that's the case.  the behavior is distinct for v4 and v3
> (which does not have clientids).  DRC is bypassed for v4.1+
>
> Matt
>
> On Tue, Jan 30, 2018 at 9:36 AM, William Allen Simpson
> <william.allen.simp...@gmail.com> wrote:
>> On 1/30/18 9:22 AM, William Allen Simpson wrote:
>>>
>>> On 1/29/18 3:32 PM, Frank Filz wrote:
>>>>
>>>> I haven't looked at how the SVCXPRT structure has changed, but if there's
>>>> a
>>>> field in there we can attach a Ganesha structure to that would be cool,
>>>> or
>>>> if not, if we could add one.
>>>>
>>> There are two: xp_u1, and xp_u2.
>>>
>>> Right now, Ganesha is using xp_u2 for dup request cache pointers.
>>>
>>> But I've eliminated all old usage of xp_u1 in V2.6.
>>
>>
>> Looking at src/RPCAL/nfs_dupreq.c, I'm not sure why that doesn't already
>> have a client or export reference there.  It seems we'll return the
>> duplicate data to any client that happens to use the same xid.  Seems
>> like a bug
>>
>> But the code is obscure, so I could be missing something.
>>
>>
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Is there a field in the SVCXPRT Ganesha can use

2018-01-30 Thread Matt Benjamin
innovations?  ok.  there's other innovation needed in the gss impl,
that would be welcome as well

Matt

On Tue, Jan 30, 2018 at 9:50 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 1/30/18 9:36 AM, William Allen Simpson wrote:
>>
>> But the code is obscure, so I could be missing something.
>
>
> Also, it bears repeating that the dupreq cache wasn't working for
> secure connections.  Pre-V2.6 checksummed the ciphertext, which is by
> definition different on every request.  We'd never see duplicates.
>
> One of the innovations in V2.6 (ntirpc 1.6) is the checksum is of the
> plaintext.  So duplicate requests will be detected.
>
> I'm not sure how often we have duplicate requests, but it should be
> working now.
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Is there a field in the SVCXPRT Ganesha can use

2018-01-30 Thread Matt Benjamin
I don't think that's the case.  the behavior is distinct for v4 and v3
(which does not have clientids).  DRC is bypassed for v4.1+

Matt

On Tue, Jan 30, 2018 at 9:36 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 1/30/18 9:22 AM, William Allen Simpson wrote:
>>
>> On 1/29/18 3:32 PM, Frank Filz wrote:
>>>
>>> I haven't looked at how the SVCXPRT structure has changed, but if there's
>>> a
>>> field in there we can attach a Ganesha structure to that would be cool,
>>> or
>>> if not, if we could add one.
>>>
>> There are two: xp_u1, and xp_u2.
>>
>> Right now, Ganesha is using xp_u2 for dup request cache pointers.
>>
>> But I've eliminated all old usage of xp_u1 in V2.6.
>
>
> Looking at src/RPCAL/nfs_dupreq.c, I'm not sure why that doesn't already
> have a client or export reference there.  It seems we'll return the
> duplicate data to any client that happens to use the same xid.  Seems
> like a bug
>
> But the code is obscure, so I could be missing something.
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Is there a field in the SVCXPRT Ganesha can use

2018-01-30 Thread Matt Benjamin
That sounds right, I figured we could reclaim xp_u1.

Matt

On Tue, Jan 30, 2018 at 9:22 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 1/29/18 3:32 PM, Frank Filz wrote:
>>
>> I haven't looked at how the SVCXPRT structure has changed, but if there's
>> a
>> field in there we can attach a Ganesha structure to that would be cool, or
>> if not, if we could add one.
>>
> There are two: xp_u1, and xp_u2.
>
> Right now, Ganesha is using xp_u2 for dup request cache pointers.
>
> But I've eliminated all old usage of xp_u1 in V2.6.
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-26 Thread Matt Benjamin
Yes, I wasn't claiming there is anything missing.  Before 2.6, there
was a rearm method being called.

Matt

On Fri, Jan 26, 2018 at 9:20 AM, Daniel Gryniewicz <d...@redhat.com> wrote:
> I don't think you re-arm a FD in epoll.  You arm it once, and it fires until
> you disarm it, as far as I know.  You just call epoll_wait() to get new
> events.
>
> The thread model is a bit odd;  When the epoll fires, all the events are
> found, and a thread is submitted for each one except one.  That one is
> handled in the local thread (since it's expected that most epoll triggers
> will have one event on them, thus using the current hot thread).  In
> addition, a new thread is submitted to go back and wait for events, so
> there's no delay handling new events.  So EAGAIN is handled by just
> indicating this thread is done, and returning it to the thread pool.  When
> the socket is ready again, it will trigger a new event on the thread waiting
> on the epoll.
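
As a purely illustrative sketch of that dispatch pattern (pool_submit and
handle_xprt_event are assumed stand-ins, not ntirpc functions):

#include <sys/epoll.h>

#define MAX_EVENTS 64

extern void pool_submit(void (*fn)(void *), void *arg);  /* assumed thread pool */
extern void handle_xprt_event(void *xprt);

void epoll_loop(void *arg)
{
	int epfd = *(int *)arg;
	struct epoll_event ev[MAX_EVENTS];
	int n = epoll_wait(epfd, ev, MAX_EVENTS, -1);

	if (n <= 0)
		return;

	/* Keep another thread waiting on the epoll while we work. */
	pool_submit(epoll_loop, arg);

	/* Hand off all but one event; handle the last one right here,
	 * on the already-hot thread. */
	for (int i = 0; i < n - 1; i++)
		pool_submit(handle_xprt_event, ev[i].data.ptr);
	handle_xprt_event(ev[n - 1].data.ptr);
}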
>
> Bill, please correct me if I'm wrong.
>
> Daniel
>
>
> On 01/25/2018 09:13 PM, Matt Benjamin wrote:
>>
>> Hmm.  We used to handle that ;)
>>
>> Matt
>>
>> On Thu, Jan 25, 2018 at 9:11 PM, Pradeep <pradeeptho...@gmail.com> wrote:
>>>
>>> If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
>>> epoll_fd. How does it get back to svc_vc_recv() again?
>>>
>>> On Wed, Jan 24, 2018 at 9:26 PM, Pradeep <pradeeptho...@gmail.com> wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
>>>> respond to a RENEW request from 4.0 client. Enabled the debug logs and
>>>> noticed that NFS layer has not seen the RENEW request (I can see it in
>>>> tcpdump).
>>>>
>>>> I collected netstat output periodically and found that there is a time
>>>> window of ~60 sec where the receive buffer size remains the same. This
>>>> means
>>>> the RPC layer somehow missed a 'recv' call. Now if I enable debug on
>>>> TIRPC,
>>>> I can't reproduce the issue. Any pointers to potential races where I
>>>> could
>>>> enable selective prints would be helpful.
>>>>
>>>> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
>>>> another thread to svc_rqst_rearm_events()? In that case if
>>>> svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events
>>>> and
>>>> complete the current receive before the other thread could call
>>>> epoll_ctl(),
>>>> right?
>>>>
>>>> Thanks,
>>>> Pradeep
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> Nfs-ganesha-devel mailing list
>>> Nfs-ganesha-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>>
>>
>>
>>
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-25 Thread Matt Benjamin
Hmm.  We used to handle that ;)

Matt

On Thu, Jan 25, 2018 at 9:11 PM, Pradeep <pradeeptho...@gmail.com> wrote:
> If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
> epoll_fd. How does it get back to svc_vc_recv() again?
>
> On Wed, Jan 24, 2018 at 9:26 PM, Pradeep <pradeeptho...@gmail.com> wrote:
>>
>> Hello,
>>
>> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
>> respond to a RENEW request from 4.0 client. Enabled the debug logs and
>> noticed that NFS layer has not seen the RENEW request (I can see it in
>> tcpdump).
>>
>> I collected netstat output periodically and found that there is a time
>> window of ~60 sec where the receive buffer size remains the same. This means
>> the RPC layer somehow missed a 'recv' call. Now if I enable debug on TIRPC,
>> I can't reproduce the issue. Any pointers to potential races where I could
>> enable selective prints would be helpful.
>>
>> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
>> another thread to svc_rqst_rearm_events()? In that case if
>> svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events and
>> complete the current receive before the other thread could call epoll_ctl(),
>> right?
>>
>> Thanks,
>> Pradeep
>
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Patches not backported to V2.5-stable

2018-01-05 Thread Matt Benjamin
s to
>> use new
>> lease_op2 interface
>>  > Ia1f36cfaa8bd49474d7c3cf103af2515edea3fce nfs: fix error handling
>> in
>> nfs_rpc_v41_single
>>  > Ie9ff2d2181e184b2dfef7b6b2b793d1f40e7bb82 V2.6-dev.8
>>  > I808ec1051b2ef58309b14f3b1fdc42d866d5c196 nfs: use
>> OPEN_DELEGATE_NONE_EXT
>> when not granting a delegation on v4.1+
>>  > Ic0c0eca52182009289634a985c2e4d1e34dcb895 Take a reference to the
>> session
>> over a v4.1+ callback
>>  > I4c6407aacb4f119fa856d0857df60b7d19791e55 nfs: fix cleanup after
>> delegrecall
>>  > Ic6fac7c66d802b25dece9aa0c3d34ac2671b8f74 nfs:
>> nfs41_complete_single ->
>> nfs41_release_single
>>  > Ic774170db363f3782754672d5ae5675603ee1a41 FSAL_PROXY : code
>> cleaning,
>> remove useless comments
>>  > I80adb7c583c2c28e57d0f75686220cb474d65367 FSAL_PROXY : storing
>> stateid
>> from background NFS server
>>  > If675f156b253247d2d0ac1f5cb82aa38e9eb83bb FSAL_RGW: Remove obsolete
>> (non-support_ex) create method
>>  > I6b049361f205e662551d382750766dddfc090652 V2.6-dev.7
>>  > I9f24cc76405ba8ddede2395abadb04660b57840b Fix RPM Release tag when
>> no
>> GANESHA_EXTRA_VERSION
>>  > I8c6da75bb3f3266fe1871143bb97f01ba383066e nfs: make delegation
>> recalls use
>> new callback infrastructure
>>  > I517ff36a9276bbda75d550a96be0574a630bc783 nfs: make the single-op
>> callback
>> helper be minorversion neutral
>>  > Ie3341c897d4b9141c91b20f634fb8aa8d58202f3 nfs: remove "free_op"
>> argument
>> from nfs_rpc_v41_single
>>  > Ib7f6b8ec0a66e753cb40744123c136e7d8153fdb Remove non-support_ex
>> FSAL open,
>> reopen, and status methods
>>  > If40959b04cfb39ef56fabf082a8aa8367bb9c7ba Remove non-support_ex
>> FSAL
>> commit method
>>  > I042a4b3352926fc5484f2aac9fcc0c8b2ff0cf14 Remove non-support_ex
>> FSAL
>> share_op method
>>  > Ic1f43f41f970e48633006d02f3f7fdf48fa80717 Remove non-support_ex
>> lock_op
>> and fso_lock_support_owner
>>  > I01405fe6b788e16f7b24f89b9ddd72ed037e623a Remove non-support_ex
>> create
>> FSAL method
>>  > I7c986c444f17c83aecd730569f105f6e649b1ca3 Remove non-support_ex
>> setattrs
>> FSAL method
>>  > I24b8c9d292d2869540df0c22d420541283f16662 Remove the
>> non-support_ex read
>> and write methods
>>  > I09e17c305ba4c90e2699251940649a91882e8c9f Remove support_ex FSAL
>> method
>>  > I069b41f0d497bbbc79fbb2b9dea1149bb3a58475 Assume support_ex in
>> FSAL/fsal_helper.c and include/fsal.h
>>  > Id87f54ffe86911604bbcf270ae095c385a04fc25 Remove share counters
>> from SAL -
>>     WARNING - delegations have been broken
>>  > I54f5a2ad56ce6feb0bcbd52950c741174c1c4b93 Replace calls to
>> state_share_anonymous_io_start with state_deleg_conflict
>>  > I3a6f021d344ddafd95f1309c8189e91a2faf9aa1 Assume support_ex in
>> NLM_SHARE/NLM_UNSHARE handling
>>  > Iab3ec0848624757c0f93944aac2a781f7e1ca601 Strip legacy state_share
>> functions used by NFS4
>>  > I580d8615ba33e71488956960c9f4bd4f553d511e Assume support_ex in
>> SAL/state_lock.c
>>  > I97528bcf148c25d4fb7509c1cd02943e6f1dcc99 Disable do_lease_op -
>> FSAL
>> lock_op method is not implemented by anyone
>>  > Ic3cf431ccb02f30774ce7d402b50a3ce642f05da Assume support_ex in
>> SAL/nfs4_state.c
>>  > I0b6b8136f47ac47b03ab1f12436a5d4a428c5f02 Assume support_ex in NFS4
>> protocol functions
>>  > Id9fc3e0c6ce76b377a55ba96d086e825c3312685 Always assume
>> support_ex in NFS3
>> protocol functions
>>  > I1505e5e606d2a360e3c58833d692aa70883fe00f Always assume
>> support_ex in 9p
>>  > I45ed9dd89fab9dcad902340021b11ee31f17dbe6 (libntirpc) update pull
>> request
>> #70 & #71
>>  > I51517b281ffca7411a6a229e07088028ca336c44 V2.6-dev.6
>>  > Ib5ce7e184a2029ff36830e8b0d59d96df3f717fa Napalm nfs_worker_thread
>> NFS_REQUEST queue
>>  > Ie6a6d625cf114091bec2e6785602154b9f2df6dc DBUS interface for
>> purging
>> idmapper cache.
>>  > Ia53c8eec07a840425877e03ec58682c42d512b34 FSAL_PROXY : add
>> verbosity in
>> EXCHANGE_ID
>>  > Icb67b2df86a6060c68a8b2c16389d9b44c8aafe5 Remove libzfswrap
>>  > Id020de3fb0d91e31d3eb81c86a2ef9f3b9097ce7 Strip FSAL_ZFS out
>>  > I2611365be9f2f342760c35c6023ee4ab02766fa9 V2.6-dev.5
>>  > Ie2aec841612e3270f2a7904fa88eda39db93c190 V2.6-dev.4a
>>  > Ibc31cf3745a5a20a4236a9a37712572d1f0f87f4 V2.6-dev-4
>>  > I6f2dfd9dc4431df6372d30e3c6510b290ec9e8de CMake - Have 'make dist'
>> generate the correct tarball name
>>  > I0428d3c316a12fc1cab750f745640a50c03a34cc FSAL_MEM - fix UP thread
>> init/cleanup
>>  > I9c0810bfb211dd133b3f33e04036b57f69ef0c4f V2.6-dev-3
>>  > I81a5e85ad5eb9935a09937d1262b959fdc7cabb9 Napalm dispatch plus plus
>>  > Icc0c7d806e2e8dcaa715d72f26a98d5f7f71c77d Revert "CMake - Have
>> 'make dist'
>> generate the correct tarball name"
>>  > I6925dc73cf930bb8cfe747baab1642164045 V2.6-dev-2
>>  > I10dc7925db271eab2bcd3f9a035ffbdfb21e2450 CMake - Have 'make dist'
>> generate the correct tarball name
>>  > I195759fd0c1394651d9bf188eb19229ebcf46f68 V2.6-dev-1
>>
>>
>>
>>
>>
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>> <https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel>
>>
>>
>>
>>
>>
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>
>>
>>
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Implement a FSAL for S3-compatible storage

2018-01-04 Thread Matt Benjamin
That would work.

The logic that the RGW FSAL uses to do all of what Frank describes is in
the Ceph source rather than FSAL_RGW for the same reason.  The
strategy I take is to represent the RGW file handle as concatenated
hashes of S3/swift container name and full object name, which yields
stable handles.  For directories, cookies are also the hash of the
entry name.  Frank's whence-is-name and compute-readdir-cookie apis
were invented to support the RGW FSAL. Using them, you avoid the need
to keep an indexed representation of the S3 namespace in the FSAL (or
in my case, librgw).
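
A rough sketch of that handle scheme (illustrative only -- the FNV-1a hash,
struct layout, and helper names below are assumptions, not the actual librgw
code):

#include <stdint.h>
#include <string.h>

/* Stand-in 64-bit FNV-1a; the real librgw hash may differ. */
static uint64_t fnv64(const char *s, size_t len)
{
        uint64_t h = 14695981039346656037ULL;
        size_t i;

        for (i = 0; i < len; i++) {
                h ^= (unsigned char)s[i];
                h *= 1099511628211ULL;
        }
        return h;
}

/* Stable handle: a hash of the bucket/container name concatenated with a
 * hash of the full object key, so the same object always maps to the same
 * handle across restarts. */
struct s3_like_handle {
        uint64_t bucket_hash;
        uint64_t object_hash;
};

static void make_handle(struct s3_like_handle *h,
                        const char *bucket, const char *key)
{
        h->bucket_hash = fnv64(bucket, strlen(bucket));
        h->object_hash = fnv64(key, strlen(key));
}

/* Directory cookie for whence-is-name READDIR: a hash of the entry name,
 * so the client can resume listing by name instead of the FSAL keeping an
 * indexed copy of the namespace. */
static uint64_t dirent_cookie(const char *name)
{
        return fnv64(name, strlen(name));
}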

Matt

On Thu, Jan 4, 2018 at 7:18 AM, DENIEL Philippe <philippe.den...@cea.fr> wrote:
> Hi Aurélien,
>
> I can provide you an alternate solution, still nfs-ganesha based. For the
> need of a project, I developed an open-source library that emulate a POSIX
> namespace using a KVS (for metadata) and an object store (for data). For
> example, you can use REDIS and RADOS. I have written a FSAL for it (it is
> not pushed in the official branch) but with no compliancy to support_ex,
> it's still using the former FSAL semantics (so it should be ported to
> support_ex). If you are interested, I can give you some pointers (the code
> is on github). You could use S3 as data storage for example. In particular,
> I had to solve the same "inode" issue that you met. This solution as very
> few impact on nfs-ganesha code (it just adds a new FSAL).
>
>  Regards
>
> Philippe
>
> On 01/03/18 19:58, Aurelien RAINONE wrote:
>
> To follow up on the development on an FSAL for S3, I have some doubts and
> questions I'd like to share.
>
> Apart from its full path, S3 doesn't have the concept of file descriptor, I
> mean, there's nothing else
>
> than the full path that I can provide to S3 in order to get attribute of
> content of a specific object.
>
> I have some doubts regarding the implementation of the S3 fsal object handle
> (s3_fsal_obj_handle).
>
>
>
> Should s3_fsal_obj_handle be very simple, for example should it only contain
> a key that maps to the full S3 filename, in an key-value store.
>
> Or on the contrary, should the handle implement a tree like structure, like
> I saw in FSAL_MEM?
>
> Or something in between, but what?
>
> Having a very simple handle has some advantages but may require some more
> frequent network calls,
>
> for example readdir won't have any kind of information about the content of
> the directory.
>
> Having a whole tree-like structure in the handle would allow to have direct
> access to directory content,
>
> but isn't that the role of ganesha cache to do that?
>
> My questions probably show that I have trouble understanding the
> responsibility of my FSAL implementation
>
> regarding the cache. Who does what, what doesn't do what?
>
> Good evening,
>
> Aurélien
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
>
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] NTIRPC ENOMEM

2017-12-20 Thread Matt Benjamin
The established plan is to finish async and evaluate.

Matt

On Dec 20, 2017 5:02 AM, "William Allen Simpson" <
william.allen.simp...@gmail.com> wrote:

> DanG has raised an interesting issue about recovery from low memory.
> In Ganesha, we've been assiduously changing NULL checks to assert or
> segfault on alloc failures.  Just had a few more patches by Kaleb.
>
> Since 2013 or 2014, we've been doing the same to NTIRPC.  There are
> currently 105 mem_.*alloc locations, and almost all of them
> deliberately segfault.
>
> DanG argues that we should report the ENOMEM in an error return, or
> in the alternative return NULL in those cases, and let the caller
> decide what to do, to make the library more general.
>
> The current TI-RPC does return NULL in many cases.  Rarely reports
> ENOMEM.  And often segfaults.
>
> This would be a major reworking.  Should we do this?  If so, what is
> the target date?
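
For concreteness, the two allocation-failure styles being weighed look
roughly like this (wrapper names are hypothetical, not the actual ntirpc
mem_alloc API):

#include <errno.h>
#include <stdlib.h>

/* Current habit: treat allocation failure as fatal, so callers never see
 * NULL and never need an error path. */
static void *alloc_or_die(size_t n)
{
        void *p = malloc(n);

        if (p == NULL)
                abort();        /* deliberate crash on allocation failure */
        return p;
}

/* Proposed alternative: surface the failure and let the caller decide
 * (fail the RPC, shed load, retry later). */
static int alloc_or_enomem(size_t n, void **out)
{
        void *p = malloc(n);

        if (p == NULL)
                return ENOMEM;
        *out = p;
        return 0;
}

The cost of the second style is exactly the reworking described above:
every caller grows an error path.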
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] NFS-Ganesha v2.5.2 Ceph and RGW Performance

2017-12-14 Thread Matt Benjamin
Hi Supriti,

On Thu, Dec 14, 2017 at 4:26 PM, Supriti Singh <supriti.si...@suse.com> wrote:
> Hello all,
>
> At SUSE, we did a performance benchmarking for nfs-ganesha v2.5.2 for FSAL
> Ceph and RGW. For FSAL CEPH, we did a comparison for kernel cephfs client
> and nfs-ganesha. Please find report attached.
>
> Key points
> 1. Earlier, I shared results on ceph-devel mailing list as well. I have
> added comparison between those results and current results on page 9. In
> earlier test, I did not tune the parameters "Dispatch_Max_Reqs" and cephfs
> default pool size. That could be one reason, why ganesha performs better in
> the current test. Thanks to Malahal for pointing out the parameters.
>
> 2. For multiple clients and multiple jobs, single nfs-ganesha server
> performance degrades significantly. As we go ahead, active-active
> nfs-ganesha server or pnfs may improve performance. Any thoughts on this? I
> did stumble upon Jeff's blog:
> https://jtlayton.wordpress.com/2017/11/07/active-active-nfs-over-cephfs/ Is
> this something already in loop for 2.7?

I have reason to believe that the request queuing and throttling
behavior in versions before 2.6 is responsible for a lot of the
degradation, even with good tuning.  I'm very interested to see your
results with 2.6.  Looking forward, yes, I think pNFS has the
potential to help out a lot.  Also, I believe there are available
speedups in libcephfs, but I haven't worked with it in a long time.

>
> 3. For NFS-RGW, as it supports only sync write, I was not able to use fio
> for testing. Is there any other tool that someone else has used?

The semantics of FSAL_RGW were designed as those of upload, not full
NFS semantics, so the usual benchmarks don't really work.  I'm not sure
what to suggest, beyond less general benchmarking, like copying up
some large files.  I'm working on removing the restricted upload
semantics in Mimic.

>
> 4. For 2.6, I am aware that "Dispatch_Max_Reqs" goes away. But I have not
> looked into code, so please excuse if question is not to the point. Are
> those changes expected to improve performance?

Yes.  Fairness between clients should be improved.  You may want to
experiment with different (larger) values of nb_worker in 2.6.  Post-2.6
we're moving to async dispatch.
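
For anyone reproducing the test, the tuning discussed here lands in the
NFS_CORE_PARAM block of ganesha.conf; a sketch only -- the option names are
the ones mentioned in this thread, so check spelling and defaults against
the version in use, and note Dispatch_Max_Reqs is gone in 2.6:

NFS_CORE_PARAM
{
        # Worker thread count; try larger values when testing 2.6.
        Nb_Worker = 256;

        # 2.5.x only: per-server outstanding-request throttle.
        # The value below is purely illustrative.
        Dispatch_Max_Reqs = 5000;
}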

>
> 5. Also, I think it would be nice to have event messages for tunable
> parameters. So that if nfs-ganesha is slowing down, because some parameters
> have reached threshold values, users can understand for the log messages. I
> am only aware of health messages, but it does not explain what could be
> wrong.
>
> Please let me know your feedback, if I missed out on some optimization or
> analysis.
>
> Thanks,
> Supriti
>
> --
> Supriti Singh
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
>
>

Matt

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] XID missing in error path for RPC AUTH failure.

2017-12-12 Thread Matt Benjamin
That sounds right, I'm uncertain whether this has regressed in the
text, or maybe in the likelihood of inlining in the new dispatch
model.  Bill?
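
For reference, the five XDR words the proposed patch emits line up with the
rejected-reply layout in RFC 5531 (sketched below as a comment; the field
mapping is taken from the patch itself):

/*
 * MSG_DENIED / AUTH_ERROR reply on the wire, one XDR word each:
 *
 *   xid                          <- dmsg->rm_xid            (missing before)
 *   msg_type    = REPLY (1)      <- dmsg->rm_direction      (missing before)
 *   reply_stat  = MSG_DENIED (1) <- dmsg->rm_reply.rp_stat  (missing before)
 *   reject_stat = AUTH_ERROR (1) <- rr->rj_stat
 *   auth_stat                    <- rr->rj_why
 *
 * Without the first three words the client cannot match the reply to a
 * pending xid, which is consistent with the retry loop described below.
 */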

Matt

On Wed, Dec 13, 2017 at 9:38 AM, Pradeep <pradeeptho...@gmail.com> wrote:
> Hello,
>
> When using krb5 exports, I noticed that TIRPC does not send XID in response
> - see xdr_reply_encode() for MSG_DENIED case. Looks like Linux clients can't
> decode the message and go in to an infinite loop retrying the same NFS
> operation. I tried adding XID back (like it is done for normal case) and it
> seems to have fixed the problem. Is this the right thing to do?
>
> diff --git a/src/rpc_dplx_msg.c b/src/rpc_dplx_msg.c
> index 01e5a5c..a585e8a 100644
> --- a/src/rpc_dplx_msg.c
> +++ b/src/rpc_dplx_msg.c
> @@ -194,9 +194,12 @@ xdr_reply_encode(XDR *xdrs, struct rpc_msg *dmsg)
> __warnx(TIRPC_DEBUG_FLAG_RPC_MSG,
> "%s:%u DENIED AUTH",
> __func__, __LINE__);
> -   buf = XDR_INLINE(xdrs, 2 * BYTES_PER_XDR_UNIT);
> +   buf = XDR_INLINE(xdrs, 5 * BYTES_PER_XDR_UNIT);
>
> if (buf != NULL) {
> +   IXDR_PUT_INT32(buf, dmsg->rm_xid);
> +   IXDR_PUT_ENUM(buf, dmsg->rm_direction);
> +   IXDR_PUT_ENUM(buf, dmsg->rm_reply.rp_stat);
> IXDR_PUT_ENUM(buf, rr->rj_stat);
> IXDR_PUT_ENUM(buf, rr->rj_why);
> } else if (!xdr_putenum(xdrs, rr->rj_stat)) {
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] libntirpc thread local storage

2017-12-09 Thread Matt Benjamin
I've already proposed we remove this.  No one is invested in it, I don't
think.

Matt

On Dec 10, 2017 2:38 AM, "William Allen Simpson" <
william.allen.simp...@gmail.com> wrote:

> I've run into another TLS problem.  It's been there since tirpc.
>
> Apparently, once upon a time, rpc_createerr was a static global.
> It still says that in the man pages.
>
> When a client create function fails, they stash the error there,
> and return NULL for the CLIENT.  Basically, you check for NULL,
> and then check rpc_createerr
>
> This is also used extensively by the RPC bind code.
>
> Then, they made it a keyed thread local to try and salvage it
> without a major code re-write.
>
> With async threads, that's not working.  We've got memory leaks.
> AFAICT, only on errors.  But accessing them on a different
> thread gives the wrong error code (or none at all).  Not good.
>
> All the functions that use it are still not MT-safe, usually
> because they stash a string in global memory without locking.
> They need re-definition to alloc/free the string.
>
> Worse, it's not a good definition.
>
> rpc_createerr has both clnt_stat and rpc_err, but struct rpc_err
> also has a clnt_stat (since original tirpc).  clnt_stat is not
> consistently set properly, as it is in two places.  So the error
> checking code is often wrong.
>
> I'd like to get rid of the whole mess, but that means every client
> create would have new semantics.  Fortunately, there aren't many
> (in Ganesha).  Plus we already have new definitions -- all named
> *_ncreate with a tirpc_compat.h to munge them back.
>
> But should we do it now?  Or in 2.7?  We've been living with it for
> years, although recent code changes have made it worse.  Again, it
> only happens on errors.  Especially for RPC bind.
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Stacked FSALs and fsal_export parameters and op_ctx

2017-12-08 Thread Matt Benjamin
I'd like to see this use of TLS as a "hidden parameter" replaced
regardless.  It has been a source of bugs, and locks us into a
pthreads execution model I think needlessly.
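
A minimal sketch of the contrast (the types and helpers here are
illustrative, not the real FSAL method signatures):

struct fsal_export { int placeholder; };
struct req_op_context { struct fsal_export *fsal_export; };

/* Hidden-parameter style: the callee reaches into thread-local state, so
 * every call silently depends on op_ctx being set up on this thread (this
 * mirrors how op_ctx->fsal_export is consumed today). */
__thread struct req_op_context *op_ctx;

static struct fsal_export *export_from_tls(void)
{
        return op_ctx->fsal_export;
}

/* Explicit-parameter style: the same dependency made visible in the
 * signature, which is what passing fsal_export (or op_ctx itself) as an
 * argument buys, and it no longer assumes a pthreads execution model. */
static struct fsal_export *export_from_arg(struct req_op_context *ctx)
{
        return ctx->fsal_export;
}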

Matt

On Fri, Dec 8, 2017 at 10:07 AM, Frank Filz <ffilz...@mindspring.com> wrote:
>> On 12/7/17 7:54 PM, Frank Filz wrote:
>> > Stacked FSALs often depend on op_ctx->fsal_export being set.
>> >
>> > We also have lots of FSAL methods that take the fsal_export as a
>> parameter.
>> >
>> The latter sounds better.
>>
>> Now that we know every single thread local storage access involves a hidden
>> lock/unlock sequence in glibc "magically" invoked by the linker, it would be
>> better to remove as many TLS references as possible!
>>
>> After all, too many lock/unlock are a real performance issue.
>>
>> Perhaps we should pass op_ctx as the parameter instead.
>
> I thought the lock was only to create the TLS variable, and not on every 
> reference.
>
> Frank
>
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Ganesha 2.5 - mdc_readdir_chunk_object :INODE :CRIT :Collision while adding dirent for .nfsFD8E

2017-11-07 Thread Matt Benjamin
cookie == 7fff == 2^31 looks maybe just a bit suspicious?

Matt

On Tue, Nov 7, 2017 at 10:14 AM, Sachin Punadikar
<punadikar.sac...@gmail.com> wrote:
> Hello,
> During tests on Ganesha 2.5, we are getting below logs with the critical
> message:
> 2017-11-03 05:30:05 : epoch 000100d3 : c40abc1pn13.gpfs.net :
> ganesha.nfsd-36297[work-226] mdcache_avl_insert_ck :INODE :WARN :Already
> existent when inserting dirent 0x3ffbe8015a60 for .nfsFD8E on
> entry=0x3ffb08019ed0 FSAL cookie=7fff, duplicated directory cookies make
> READDIR unreliable.
> 2017-11-03 05:30:05 : epoch 000100d3 : c40abc1pn13.gpfs.net :
> ganesha.nfsd-36297[work-226] mdc_readdir_chunk_object :INODE :CRIT
> :Collision while adding dirent for .nfsFD8E
>
> Would like to understand what exactly mean by FSAL cookie collision ? Does
> it mean same operation has been done by UPCALL thread ? Is the message
> really CRIT ?
> If I compare with 2.3 code (I know there is lot of change related to
> caching), there we are not throwing any CRIT message.
>
> --
> with regards,
> Sachin Punadikar
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: CLNT_CALL with clnt_req

2017-11-03 Thread Matt Benjamin
oh, come on.  not sure what needs to be done to reduce log noise, but
I'm sure we can make a dent.

Matt

On Sat, Nov 4, 2017 at 1:38 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> We already discussed this on Tuesday, October 24th.  Malahal agreed
> that a half second was good, 3 seconds was OK, 5 seconds was long.
> And Matt agreed we'd log more than 10 seconds.
>
> Obviously, you have vastly more Internet experience than I, and
> therefore are much better able to decide Internet timing parameters.
>
> Also, your time is so much more valuable than mine, and you need to
> post the weekly maintenance updates late (eastern time) on Friday to
> avoid disrupting your thought processes -- so that I have to work
> weekends, and my commits are stalled by yet another week.
>
> 
>
> Note that I'm currently taking a lot of half-week holiday over the
> next few months, so hopefully this will make it in *early* next week.
> Then maybe, just maybe, my next patch will be ready the week before
> Thanksgiving
>
> On 11/3/17 2:04 PM, GerritHub wrote:
>>
>> Frank Filz *posted comments* on this change.
>>
>> View Change <https://review.gerrithub.io/385451>
>>
>> Patch set 1:
>>
>> I'd like a review by malahal before merging this one
>>
>> I'm really not sure about the 3 second timeouts
>>
>> Can anyone test this in something resembling a real customer environment?
>>
>> To view, visit change 385451 <https://review.gerrithub.io/385451>. To
>> unsubscribe, visit settings <https://review.gerrithub.io/settings>.
>>
>> Gerrit-Project: ffilz/nfs-ganesha
>> Gerrit-Branch: next
>> Gerrit-MessageType: comment
>> Gerrit-Change-Id: I92b02eca435f4b1f6104b740c6c5b3747c380840
>> Gerrit-Change-Number: 385451
>> Gerrit-PatchSet: 1
>> Gerrit-Owner: william.allen.simp...@gmail.com
>> Gerrit-Reviewer: CEA-HPC <gerrithub-...@cea.fr>
>> Gerrit-Reviewer: Daniel Gryniewicz <d...@redhat.com>
>> Gerrit-Reviewer: Frank Filz <ffilz...@mindspring.com>
>> Gerrit-Reviewer: Gluster Community Jenkins <gerrit...@gluster.org>
>> Gerrit-Reviewer: Malahal <mala...@gmail.com>
>> Gerrit-Reviewer: openstack-ci-service+rdo-ci-cen...@redhat.com
>> Gerrit-Comment-Date: Fri, 03 Nov 2017 18:04:27 +
>> Gerrit-HasComments: No
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nlm_async retries

2017-11-01 Thread Matt Benjamin
A few people know this code.  One is Frank.  I don't immediately see
the reason for strong concern, feel free to improve.

Matt

On Wed, Nov 1, 2017 at 5:20 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> I'm flummoxed.  Who knows this code?
>
> Problem 1: the timeout is set to 10 microseconds.  Holy heck?  And
> historically, that's the maximum total wait time, so it would try at
> least three (3) times within 10 *MICRO*seconds?
>
> Probably should be milliseconds.
>
> Problem 2: there's a retry loop that sets up and tears down a TCP
> (or UDP) connection 3 times.  On top of the 3 tries in RPC itself?
>
> This looks like a lot of self-flagellation, maybe because the
> timeout above was set too short?
>
> Problem 3: this isn't really async -- nlm_send_async() actually
> runs a pthread_cond_timedwait() before returning.  That's sync!
>
> But we already have a timedwait in RPC.  And a signal.  So this
> completely duplicates the underlying RPC library code.  Why?
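
The pattern being criticized reduces to something like this (a simplified
sketch, not the actual nlm_send_async() code):

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct nlm_async_wait {
        pthread_mutex_t mtx;
        pthread_cond_t  cv;
        bool            completed;      /* set by the reply/callback path */
};

/* "Async" send that immediately blocks: after submitting the RPC, the
 * caller's thread sits in pthread_cond_timedwait() until the reply or a
 * timeout -- synchronous in practice, and it duplicates the timed wait the
 * RPC library already performs. */
static int send_then_wait(struct nlm_async_wait *w,
                          const struct timespec *deadline)
{
        int rc = 0;

        /* ... build and submit the NLM callback RPC here ... */

        pthread_mutex_lock(&w->mtx);
        while (!w->completed && rc == 0)
                rc = pthread_cond_timedwait(&w->cv, &w->mtx, deadline);
        pthread_mutex_unlock(&w->mtx);

        return rc;      /* ETIMEDOUT if no reply arrived in time */
}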
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] The life of tcp drc

2017-10-13 Thread Matt Benjamin
Yes, nfs_rpc_free_user_data() is the secret :)
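
Pulling the pieces of this thread together, the intended ref pairing looks
roughly like this (a summary sketch; only the function and field names that
appear in this thread are real):

/*
 * tcp drc reference lifecycle (intended):
 *
 *   nfs_dupreq_get_drc()     alloc_tcp_drc(), or revive the drc from
 *                            tcp_drc_recycle_t on reconnect; take ONE xprt
 *                            ref via nfs_dupreq_ref_drc() and stash the drc
 *                            in req->rq_xprt->xp_u2 (the extra "refcnt = 1"
 *                            at lines 629-630 is the redundancy flagged
 *                            above)
 *
 *   per duplicate request    a ref held while the request is in flight,
 *                            dropped in nfs_dupreq_put_drc()
 *
 *   xprt teardown            nfs_rpc_free_user_data() clears xp_u2 and
 *                            drops the xprt ref
 *
 *   refcnt reaches zero      drc parked on tcp_drc_recycle_q; either
 *                            revived by a reconnect or freed later by
 *                            drc_free_expired() via free_tcp_drc()
 */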

Matt

On Fri, Oct 13, 2017 at 4:11 AM, Kinglong Mee <kinglong...@gmail.com> wrote:
> Hi Malahal,
>
> Thanks for your reply.
>
> #1. I'd like to post a patch to delete it, because I have some cleanups for 
> drc.
> #2/#3. Sorry for my missing of nfs_rpc_free_user_data(). With it, everything 
> is okay.
>
> thanks,
> Kinglong Mee
>
> On 10/13/2017 15:52, Malahal Naineni wrote:
>> #1. Looks like a bug! Lines 629 and 630 should be deleted
>> #2. See nfs_rpc_free_user_data(). It sets xp_u2 to NULL and drc ref is 
>> decremented there.
>> #3. Life time of drc should start when it is allocated in 
>> nfs_dupreq_get_drc() using alloc_tcp_drc().
>>   It can live beyond xprt's xp_u2 setting to NULL. It will live until we 
>> decide to free in drc_free_expired() using free_tcp_drc().
>>
>> Regards, Malahal.
>> PS: The comment "drc cache maintains a ref count." seems to imply that it 
>> will have a refcount for keeping it in the hash table itself. I may have 
>> kept those two lines because of that but It doesn't make sense as refcnt 
>> will never go to zero this way.
>>
>> On Thu, Oct 12, 2017 at 3:48 PM, Kinglong Mee <kinglong...@gmail.com 
>> <mailto:kinglong...@gmail.com>> wrote:
>>
>> Describes in src/RPCAL/nfs_dupreq.c,
>>
>>  * The life of tcp drc: it gets allocated when we process the first
>>  * request on the connection. It is put into rbtree (tcp_drc_recycle_t).
>>  * drc cache maintains a ref count. Every request as well as the xprt
>>  * holds a ref count. Its ref count should go to zero when the
>>  * connection's xprt gets freed (all requests should be completed on the
>>  * xprt by this time). When the ref count goes to zero, it is also put
>>  * into a recycle queue (tcp_drc_recycle_q). When a reconnection
>>  * happens, we hope to find the same drc that was used before, and the
>>  * ref count goes up again. At the same time, the drc will be removed
>>  * from the recycle queue. Only drc's with ref count zero end up in the
>>  * recycle queue. If a reconnection doesn't happen in time, the drc gets
>>  * freed by drc_free_expired() after some period of inactivety.
>>
>> Some questions about the life time of tcp drc,
>> 1. The are two references of drc for xprt in nfs_dupreq_get_drc().
>>629 /* xprt ref */
>>630 drc->refcnt = 1;
>>...
>>638 (void)nfs_dupreq_ref_drc(drc);  /* xprt 
>> ref */
>>...
>>653 req->rq_xprt->xp_u2 = (void *)drc;
>>
>>I think it's a bug. The first one needs remove. Right?
>>
>> 2. The is no place to decrease the reference of drc for xprt.
>>The xprt argument in nfs_dupreq_put_drc() is unused.
>>Should it be used to decrease the ref?
>>I think it's the right place to decrease the ref in 
>> nfs_dupreq_put_drc().
>>
>> 3. My doubts is that, the life time of drc stored in req->rq_xprt->xp_u2 
>> ?
>>Start at #1, end at #2 (req->rq_xprt->xp_u2 = NULL) ?
>>If that, the bad case is always lookup drc from tcp_drc_recycle_t.
>>
>>Otherwise, don't put the reference at #2, when to put it?
>>the bad case is the drc ref always be 1 forever, am I right?
>>
>> thanks,
>> Kinglong Mee
>>
>> 
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> ___
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net 
>> <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel 
>> <https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel>
>>
>>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Proposal to manage global file descriptors

2017-09-22 Thread Matt Benjamin
esource configurable independent
>> of
>> the ulimit for file descriptors, though if an FSAL is loaded that actually
>> uses file descriptors for open files should check that the ulimit is big
>> enough, it should also include the limit on state_t also. Of course it
>> will
>> be impossible to account for file descriptors used for sockets, log files,
>> config files, or random libraries that like to open files...
>
>
> Hmmm... I don't think we can do any kind of checking, if we're not going to
> use ulimit by default, since it depends on which FSALs are in use at any
> given time.  I say we either default the limits to ulimit, or just ignore
> ulimit entirely and log an appropriate error when EMFILE is returned.
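
Both options are cheap to sketch (illustrative only; getrlimit() and EMFILE
are plain POSIX, the surrounding names are made up):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>

/* Option 1: default the FD high-water mark from the process ulimit. */
static unsigned long default_fd_hiwat(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
                return (unsigned long)rl.rlim_cur;
        return 4096;            /* arbitrary fallback for the sketch */
}

/* Option 2: ignore ulimit and just report EMFILE clearly when it bites. */
static int open_with_report(const char *path, int flags)
{
        int fd = open(path, flags);

        if (fd < 0 && errno == EMFILE)
                fprintf(stderr,
                        "out of file descriptors (EMFILE) opening %s; "
                        "raise the fd ulimit or lower cache limits\n",
                        path);
        return fd;
}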
>
> Daniel
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] clnt_call callbacks

2017-09-20 Thread Matt Benjamin
The 4.1 call/reply logic definitely did work, but whatever.  The call
flow is completely changed.

As far as I know, there should be no reason to queue anything above
ntirpc, after refactoring.

Matt

On Wed, Sep 20, 2017 at 10:08 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> Currently, when clnt_call() is invoked, that thread waits for the
> result to come back over the network.  There are 250 or so "fridge"
> threads during startup for this alone.
>
> I've already changed the rpc_ctx_xfer_replymsg() to be transport
> agnostic (it was clnt_vc only).  And added a result XPRT_DISPATCH to
> both NTI-RPC and NFS-Ganesha to prepare for distinguishing CALL
> from REPLY dispatching.
>
> There's already a lot of mechanism in struct _rpc_call rpc_call_t
> handling a callback.  But the nfs_rpc_submit_call() flag
> NFS_RPC_CALL_INLINE skipping the nfs_rpc_enqueue_req() doesn't
> seem to work well (or at all).
>
> Ideally, we'd never queue, and an async callback would complete
> the transaction.  We need to discuss how to tie that together.
>
> My current expectation is that various fields of the Ganesha
> rpc_call_t should be merged/replaced by fields in the NTI_RPC
> struct svc_req so that async dispatch can handle the callback.
>
> But I don't really know the callback code as well as others, so
> really would like some discussion and planning.
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-17 Thread Matt Benjamin
I'm confused, napalm is 2.6-dev, not 2.5.x

On Thu, Aug 17, 2017 at 5:13 PM, Frank Filz <ffilz...@mindspring.com> wrote:

> Actually, the problem was Bill’s Napalm…
>
>
>
> I was able to make sense of the manual merge for
>
>
>
> ff98ea64b6d1228443a35b2f7ceb3c61c0a0c1d1 Build libntirpc package when not
> using system ntirpc
>
>
>
> But not for
>
>
>
> 6bd32da613e26a768ac1dc4db1001395bd10c295 CMake - Have 'make dist'
> generate the correct tarball name
>
>
>
> I pushed my results to
>
>
>
> https://github.com/ffilz/nfs-ganesha/commits/V2.5-stable
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Thursday, August 17, 2017 10:39 AM
> *To:* Frank Filz <ffilz...@mindspring.com>
> *Cc:* Matt Benjamin <mbenj...@redhat.com>; Soumya Koduri <
> skod...@redhat.com>; nfs-ganesha-devel <nfs-ganesha-devel@lists.
> sourceforge.net>
>
> *Subject:* Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
>
>
>
> I did but the failover/failback code re-org looked like contributed, but I
> am not positive.
>
>
>
> On Thu, Aug 17, 2017 at 7:40 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
>
> Hmm, did you cherry pick in the original order?
>
>
>
> I’ll take a look at this later today.
>
>
>
> Frank
>
>
>
> *From:* Malahal Naineni [mailto:mala...@gmail.com]
> *Sent:* Wednesday, August 16, 2017 11:34 PM
> *To:* Matt Benjamin <mbenj...@redhat.com>
> *Cc:* Frank Filz <ffilz...@mindspring.com>; Soumya Koduri <
> skod...@redhat.com>; nfs-ganesha-devel <nfs-ganesha-devel@lists.
> sourceforge.net>
>
>
> *Subject:* Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
>
>
>
> Dan, I backported everything that was needed except the following 2 as I
> don't want to mess with cmake! Can you please quickly send ported patches?
> Appreciate your help. The latest V2.5 code is at  my personal github branch
> V2.5-stable:
>
>
>
> https://github.com/malahal/nfs-ganesha/commits/V2.5-stable
>
>
>
> The following 2 commits failed to apply:
>
>
>
> 6bd32da613e26a768ac1dc4db1001395bd10c295 CMake - Have 'make dist'
> generate the correct tarball name
>
> ff98ea64b6d1228443a35b2f7ceb3c61c0a0c1d1 Build libntirpc package when not
> using system ntirpc
>
>
>
>
>
>
>
> On Wed, Aug 16, 2017 at 10:47 PM, Matt Benjamin <mbenj...@redhat.com>
> wrote:
>
> Hi Frank,
>
> On Wed, Aug 16, 2017 at 1:11 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
> > Oh, nice.
>
> >
> >
> > Matt, what about this one?
> >
> >
> >
> > 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
>
> RHCS doesn't officially support this, but I'd say it would be nice to have.
>
> Matt
>
>
> >
> >
> >
> > Frank
> >
> >
> >
> >
> >
> > From: Malahal Naineni [mailto:mala...@gmail.com]
> > Sent: Wednesday, August 16, 2017 9:28 AM
> > To: Soumya Koduri <skod...@redhat.com>
> > Cc: Frank Filz <ffilz...@mindspring.com>; d...@redhat.com; Matt Benjamin
> > <mbenj...@redhat.com>; nfs-ganesha-devel
> > <nfs-ganesha-devel@lists.sourceforge.net>
> > Subject: Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
> >
> >
> >
> > I pushed a notes branch "refs/notes/backport" which has a note saying
> > "backport to V2.5". You should be able to fetch this special branch with
> > "git fetch origin refs/notes/*:refs/notes/*". After fetching this special
> > branch, you should do "export GIT_NOTES_REF=refs/notes/backport" in your
> > SHELL and then run the usual "git log" to see if I missed any commits you
> > are interested in.
> >
> >
> >
> > Alternatively, the following are the commits that will NOT be back
> ported.
> > Let me know if you need any of these. I will cherry pick things tomorrow
> and
> > publish the branch, if there are no comments...
> >
> >
> >
> > 00b9e0798 Revert "CMake - Have 'make dist' generate the correct tarball
> > name"
> >
> > 1b60d5df2 FSAL_MEM - fix UP thread init/cleanup
> >
> > 39119aab0 FSAL_GLUSTER: Use glfs_xreaddirplus_r for readdir
> >
> > 4b4e21ed9 Manpage - Fix installing manpages in RPM
> >
> > 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
> >
> > b862fe360 SAL: extract fs logic from nfs4_recovery
> >
> > c29114162 Napalm dispatch plus plus
> >
> > c8bc40b69 CMake - Have 'make dist' generate the correct tarball name

Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-16 Thread Matt Benjamin
Hi Frank,

On Wed, Aug 16, 2017 at 1:11 PM, Frank Filz <ffilz...@mindspring.com> wrote:
> Oh, nice.

>
>
> Matt, what about this one?
>
>
>
> 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified

RHCS doesn't officially support this, but I'd say it would be nice to have.

Matt

>
>
>
> Frank
>
>
>
>
>
> From: Malahal Naineni [mailto:mala...@gmail.com]
> Sent: Wednesday, August 16, 2017 9:28 AM
> To: Soumya Koduri <skod...@redhat.com>
> Cc: Frank Filz <ffilz...@mindspring.com>; d...@redhat.com; Matt Benjamin
> <mbenj...@redhat.com>; nfs-ganesha-devel
> <nfs-ganesha-devel@lists.sourceforge.net>
> Subject: Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2
>
>
>
> I pushed a notes branch "refs/notes/backport" which has a note saying
> "backport to V2.5". You should be able to fetch this special branch with
> "git fetch origin refs/notes/*:refs/notes/*". After fetching this special
> branch, you should do "export GIT_NOTES_REF=refs/notes/backport" in your
> SHELL and then run the usual "git log" to see if I missed any commits you
> are interested in.
>
>
>
> Alternatively, the following are the commits that will NOT be back ported.
> Let me know if you need any of these. I will cherry pick things tomorrow and
> publish the branch, if there are no comments...
>
>
>
> 00b9e0798 Revert "CMake - Have 'make dist' generate the correct tarball
> name"
>
> 1b60d5df2 FSAL_MEM - fix UP thread init/cleanup
>
> 39119aab0 FSAL_GLUSTER: Use glfs_xreaddirplus_r for readdir
>
> 4b4e21ed9 Manpage - Fix installing manpages in RPM
>
> 814e9cd65 FSAL_RGW: adopt new rgw_mount2 with bucket specified
>
> b862fe360 SAL: extract fs logic from nfs4_recovery
>
> c29114162 Napalm dispatch plus plus
>
> c8bc40b69 CMake - Have 'make dist' generate the correct tarball name
>
> cb787a1cf SAL: introduce new recovery backend based on rados kv store
>
> eadfc762e New (empty) sample config
>
> eb4eea134 config: add new config options for rados_kv recovery backend
>
> fbc905015 cmake: make modulized recovery backends compile as modules
>
>
>
>
>
> On Fri, Aug 11, 2017 at 8:08 AM, Soumya Koduri <skod...@redhat.com> wrote:
>
>
>> commit 7f2d461277521301a417ca368d3c7656edbfc903
>>  FSAL_GLUSTER: Reset caller_garray to NULL upon free
>>
>
> Yes
>
> On 08/09/2017 08:57 PM, Frank Filz wrote:
>
> 39119aa Soumya Koduri FSAL_GLUSTER: Use glfs_xreaddirplus_r for
> readdir
>
> Yes? No? It's sort of a new feature, but may be critical for some use cases.
> I'd rather it go into stable than end up separately backported for
> downstream.
>
>
> Right..as it is more of a new feature, wrt upstream we wanted it to be part
> of only 2.6 on wards so as not to break stable branch (in case if there are
> nit issues).
>
> But yes we may end up back-porting to downstream if we do not rebase to 2.6
> by then.
>
> Thanks,
> Soumya
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>
>
>
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-15 Thread Matt Benjamin
Hi,

On Tue, Aug 15, 2017 at 8:42 AM, William Allen Simpson
 wrote:
> On 8/14/17 4:32 PM, Malahal Naineni wrote:
>>
>> Hi Matt and Bill, we were able to reproduce this crash very easily with a
>> sleep after closing "fd" . After my fix, things worked fine. The changes are
>> a lot but mostly trivial. Appreciate any high level review.
>>
> If a hackish band-aid is what you want, that's no skin off my nose.

Malahal's change is sensible and consistent with the intent of the
code.  It actually fixes the UDP nsm behavior.

>
>
>> ganesha changes (last but one commit at
>> https://github.com/ganltc/nfs-ganesha/commits/ibm2.3).
>>
>> Corresponding ntirpc commit (last commit)
>> https://github.com/ganltc/ntirpc/commits/ibm2.3
>>
>> On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni > > wrote:
>>
>> Unfortunately, I need a fix for this issue against ganesha2.3.
>>
>> Regards, Malahal.
>>
>> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson
>> >
>> wrote:
>>
>> I'm looking at the short-term fix I've mentioned earlier, that we
>> should
>> try TCP before UDP, but given our current code base doesn't even
>> compile,
>> I've given up until next week.
>>
> Well, my investigation should fix all versions.  After all, when the code
> already explicitly says it wants TCP:
>
> nsm_clnt = gsh_clnt_create("localhost", SM_PROG, SM_VERS, "tcp");
>
> #15 0x71d311 in nsm_connect
> /export/nfs-ganesha/src/Protocols/NLM/nsm.c:55:13
> #16 0x722ae9 in nsm_unmonitor_all
> /export/nfs-ganesha/src/Protocols/NLM/nsm.c:252:7
>
> But ntirpc executes UDP instead:
>
> #7 0x758e11c2 in clnt_dg_ncreate
> /export/nfs-ganesha/src/libntirpc/ntirpc/rpc/clnt.h:505:10
> #8 0x758e0e64 in clnt_tli_ncreate
> /export/nfs-ganesha/src/libntirpc/src/clnt_generic.c:383:8
>
> That surely looks like a serious bug to me!

Not respecting the caller's choice of transports is a bug.

Matt

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-14 Thread Matt Benjamin
Hi Malahal,

This looks like a clean solution for 2.3 and I guess 2.4, to me.

Matt

On Mon, Aug 14, 2017 at 4:32 PM, Malahal Naineni  wrote:
> Hi Matt and Bill, we were able to reproduce this crash very easily with a
> sleep after closing "fd" . After my fix, things worked fine. The changes are
> a lot but mostly trivial. Appreciate any high level review.
>
> ganesha changes (last but one commit at
> https://github.com/ganltc/nfs-ganesha/commits/ibm2.3).
>
> Corresponding ntirpc commit (last commit)
> https://github.com/ganltc/ntirpc/commits/ibm2.3
>
> On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni  wrote:
>>
>> Unfortunately, I need a fix for this issue against ganesha2.3.
>>
>> Regards, Malahal.
>>
>> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson
>>  wrote:
>>>
>>> On 8/13/17 11:50 PM, Malahal Naineni wrote:

  >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
 UDP. It's a mess, and has been a mess for a long time.

 We get a file descriptor fd and then create "rec", but while destroying
 things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
 clnt_dg_destroy() (and other places) might help fix this issue, but I am 
 not
 positive as I am not familiar with this code.

 I am also working on a blind replacement of "fd" by "struct gfd" where
 struct gfd has the "fd" as well as a "generation number". The generation
 number is incremented when ever such "fd" is created (e.g. accept() call or
 socket() call). The changes are many but they are trivial.

 Any thoughts?

>>> It's not really interesting for the current code base.  In V2.5, I've
>>> already eliminated all the various copies of fd, and every SVCXPRT is
>>> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
>>> tree (svc_rqst).  So there's no longer any possibility of multiple
>>> generations of fd.
>>>
>>> That said, the last remaining problem is clnt_dg clnt_call, where the
>>> fd can be passed to poll() at the same time as another copy is passed to
>>> (or being removed from) epoll().  Requires a complete re-write.
>>>
>>> I'd started doing the re-write long long ago, even made the rpc_ctx
>>> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
>>> But there are still many problems redesigning with async callbacks.
>>>
>>> I'm looking at the short-term fix I've mentioned earlier, that we should
>>> try TCP before UDP, but given our current code base doesn't even compile,
>>> I've given up until next week.
>>
>>
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-14 Thread Matt Benjamin
will do, thanks for posting

Matt

On Mon, Aug 14, 2017 at 4:32 PM, Malahal Naineni  wrote:
> Hi Matt and Bill, we were able to reproduce this crash very easily with a
> sleep after closing "fd" . After my fix, things worked fine. The changes are
> a lot but mostly trivial. Appreciate any high level review.
>
> ganesha changes (last but one commit at
> https://github.com/ganltc/nfs-ganesha/commits/ibm2.3).
>
> Corresponding ntirpc commit (last commit)
> https://github.com/ganltc/ntirpc/commits/ibm2.3
>
> On Mon, Aug 14, 2017 at 5:02 PM, Malahal Naineni  wrote:
>>
>> Unfortunately, I need a fix for this issue against ganesha2.3.
>>
>> Regards, Malahal.
>>
>> On Mon, Aug 14, 2017 at 4:18 PM, William Allen Simpson
>>  wrote:
>>>
>>> On 8/13/17 11:50 PM, Malahal Naineni wrote:

  >> That trace is the NSM clnt_dg clnt_call, the only use of outgoing
 UDP. It's a mess, and has been a mess for a long time.

 We get a file descriptor fd and then create "rec", but while destroying
 things, we close "fd" and then rpc_dplx_unref(). Re-arranging these in
 clnt_dg_destroy() (and other places) might help fix this issue, but I am 
 not
 positive as I am not familiar with this code.

 I am also working on a blind replacement of "fd" by "struct gfd" where
 struct gfd has the "fd" as well as a "generation number". The generation
 number is incremented when ever such "fd" is created (e.g. accept() call or
 socket() call). The changes are many but they are trivial.

 Any thoughts?

>>> It's not really interesting for the current code base.  In V2.5, I've
>>> already eliminated all the various copies of fd, and every SVCXPRT is
>>> wrapped inside a dplx_rec, and they all use xp_fd, and it's in only one
>>> tree (svc_rqst).  So there's no longer any possibility of multiple
>>> generations of fd.
>>>
>>> That said, the last remaining problem is clnt_dg clnt_call, where the
>>> fd can be passed to poll() at the same time as another copy is passed to
>>> (or being removed from) epoll().  Requires a complete re-write.
>>>
>>> I'd started doing the re-write long long ago, even made the rpc_ctx
>>> transport independent (committed in V2.6/v1.6 Napalm rendezvous patch).
>>> But there are still many problems redesigning with async callbacks.
>>>
>>> I'm looking at the short-term fix I've mentioned earlier, that we should
>>> try TCP before UDP, but given our current code base doesn't even compile,
>>> I've given up until next week.
>>
>>
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Matt Benjamin
It's not supposed to, as presently defined, right (scan resistance)?

Matt

On Fri, Aug 11, 2017 at 11:48 AM, Daniel Gryniewicz  wrote:
> On 08/11/2017 09:21 AM, Frank Filz wrote:
>>>
>>> That seems overkill to me.  How many strategies would we support (and
>>> test)?
>>>
>>> Part of the problem is that we've drastically changed how FDs are
>>> handled.
>>> We need to rethink how LRU should work in that context, I think.
>>
>>
>> I wonder also if taking pinning out of the equation (which moved cache
>> objects that had persistent state on them into an entirely separate queue)
>> has had an effect.
>
>
> Could be.
>
>> Hopefully those objects get quickly promoted to MRU of L1
>> (since they should have multiple NFS requests against them).
>
>
> Hmmm... This raises an interesting point.  Yes, more operations should
> happen, but the primary ref for the handle (taken by NFS4_OP_PUTFH) will be
> once per compound, not once per op.  So it would take multiple compounds to
> advance to the MRU of L1.  Not a problem for multiple reads or writes, but
> if a file is opened and read/written once, and then left alone, it won't
> advance to the MRU of L1.
>
> Daniel
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Matt Benjamin
initially, just a couple--but the strategizing step forces an internal
api to develop.

Matt

On Fri, Aug 11, 2017 at 8:49 AM, Daniel Gryniewicz <d...@redhat.com> wrote:
> That seems overkill to me.  How many strategies would we support (and test)?
>
> Part of the problem is that we've drastically changed how FDs are handled.
> We need to rethink how LRU should work in that context, I think.
>
> Daniel
>
>
> On 08/10/2017 07:59 PM, Matt Benjamin wrote:
>>
>> I think the particular thresholds of opens and inode count are
>> interacting in a way we'd like to change.  I think it might make sense
>> to delegate the various decision points to maybe a vector of strategy
>> functions, letting more varied approaches compete?
>>
>> Matt
>>
>> On Thu, Aug 10, 2017 at 7:12 PM, Pradeep <pradeep.tho...@gmail.com> wrote:
>>>
>>> Debugged this a little more. It appears that the entries that can be
>>> reaped
>>> are not at the LRU position (head) of the L1 queue. So those can be
>>> free'd
>>> later by lru_run(). I don't see it happening either for some reason.
>>>
>>> (gdb) p LRU[1].L1
>>> $29 = {q = {next = 0x7fb459e71960, prev = 0x7fb3ec3c0d30}, id =
>>> LRU_ENTRY_L1, size = 260379}
>>>
>>> head of the list is an entry with refcnt 2; but there are several entries
>>> with refcnt 1.
>>>
>>> (gdb) p *(mdcache_lru_t *)0x7fb459e71960
>>> $30 = {q = {next = 0x7fb43ddea8a0, prev = 0x7d68a0 <LRU+224>}, qid =
>>> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 1, cf = 2}
>>> (gdb) p *(mdcache_lru_t *)0x7fb43ddea8a0
>>> $31 = {q = {next = 0x7fb3f041f9a0, prev = 0x7fb459e71960}, qid =
>>> LRU_ENTRY_L1, refcnt = 1, flags = 0, lane = 1, cf = 0}
>>> (gdb) p *(mdcache_lru_t *)0x7fb3f041f9a0
>>> $32 = {q = {next = 0x7fb466960200, prev = 0x7fb43ddea8a0}, qid =
>>> LRU_ENTRY_L1, refcnt = 1, flags = 0, lane = 1, cf = 0}
>>> (gdb) p *(mdcache_lru_t *)0x7fb466960200
>>> $33 = {q = {next = 0x7fb451e20570, prev = 0x7fb3f041f9a0}, qid =
>>> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 1, cf = 1}
>>>
>>> The entries with refcnt 1 are moved to L2 by the background thread
>>> (lru_run). However it does it only if the open file count is greater than
>>> low water mark. In my case, the open_fd_count is not high; so lru_run()
>>> doesn't call lru_run_lane() to demote those entries to L2. What is the
>>> best
>>> approach to handle this scenario?
>>>
>>> Thanks,
>>> Pradeep
>>>
>>>
>>>
>>> On Mon, Aug 7, 2017 at 6:08 AM, Daniel Gryniewicz <d...@redhat.com>
>>> wrote:
>>>>
>>>>
>>>> It never has been.  In cache_inode, a pin-ref kept it from being
>>>> reaped, now any ref beyond 1 keeps it.
>>>>
>>>> On Fri, Aug 4, 2017 at 1:31 PM, Frank Filz <ffilz...@mindspring.com>
>>>> wrote:
>>>>>>
>>>>>> I'm hitting a case where mdcache keeps growing well beyond the high
>>>>>> water
>>>>>> mark. Here is a snapshot of the lru_state:
>>>>>>
>>>>>> 1 = {entries_hiwat = 10, entries_used = 2306063, chunks_hiwat =
>>>>>
>>>>> 10,
>>>>>>
>>>>>> chunks_used = 16462,
>>>>>>
>>>>>> It has grown to 2.3 million entries and each entry is ~1.6K.
>>>>>>
>>>>>> I looked at the first entry in lane 0, L1 queue:
>>>>>>
>>>>>> (gdb) p LRU[0].L1
>>>>>> $9 = {q = {next = 0x7fad64256f00, prev = 0x7faf21a1bc00}, id =
>>>>>> LRU_ENTRY_L1, size = 254628}
>>>>>> (gdb) p (mdcache_entry_t *)(0x7fad64256f00-1024)
>>>>>> $10 = (mdcache_entry_t *) 0x7fad64256b00
>>>>>> (gdb) p $10->lru
>>>>>> $11 = {q = {next = 0x7fad65ea0f00, prev = 0x7d67c0 }, qid =
>>>>>> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 0, cf = 0}
>>>>>> (gdb) p $10->fh_hk.inavl
>>>>>> $13 = true
>>>>>
>>>>>
>>>>> The refcount 2 prevents reaping.
>>>>>
>>>>> There could be a refcount leak.
>>>>>
>>>>> Hmm, though, I thought the entries_hwmark was a hard limit, guess
>>>>> not...
>>>>>
>>>>> Frank
>>>>>
>>>>>> Lane 1:
>>>>>> (gdb) 

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-11 Thread Matt Benjamin
I didn't recall this reached 2.5, independent of the current rework.
(offhand, what branch shows the tree consolidation in 2015?)  In any
case though, perhaps we should start from pulling up the ntirpc
experimentally.

Matt

On Fri, Aug 11, 2017 at 8:26 AM, William Allen Simpson
 wrote:
> On 8/11/17 2:29 AM, Malahal Naineni wrote:
>>
>> Following confirms that Thread1 (TCP) is trying to use the same "rec" as
>> Thread42 (UDP), it is easy to reproduce on the customer system!
>>
> There are 2 duplicated fd indexed trees, not well coordinated.  My 2015
> code to fix this went in Feb/Mar timeframe for Ganesha v2.5/ntirpc 1.5.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-10 Thread Matt Benjamin
I think the particular thresholds of opens and inode count are
interacting in a way we'd like to change.  I think it might make sense
to delegate the various decision points to maybe a vector of strategy
functions, letting more varied approaches compete?
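
As a purely illustrative sketch of that idea (none of these types or
callbacks exist in mdcache today):

#include <stdbool.h>
#include <stdint.h>

struct lru_snapshot {
        uint64_t entries_used;
        uint64_t entries_hiwat;
        uint64_t open_fd_count;
        uint64_t fds_lowat;
};

/* Each decision point that lru_run()/reaping currently hard-codes becomes
 * a pluggable callback, so different policies can compete. */
struct lru_strategy {
        /* should this pass demote idle refcnt==1 entries to L2? */
        bool (*should_demote)(const struct lru_snapshot *s);
        /* may reaping run even when FDs are not under pressure? */
        bool (*should_reap)(const struct lru_snapshot *s);
};

/* Example policy: act on entry count alone, covering the report in this
 * thread where open_fd_count stays low while entries keep growing. */
static bool demote_on_entries(const struct lru_snapshot *s)
{
        return s->entries_used > s->entries_hiwat;
}

static const struct lru_strategy entries_only = {
        .should_demote = demote_on_entries,
        .should_reap   = demote_on_entries,
};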

Matt

On Thu, Aug 10, 2017 at 7:12 PM, Pradeep  wrote:
> Debugged this a little more. It appears that the entries that can be reaped
> are not at the LRU position (head) of the L1 queue. So those can be free'd
> later by lru_run(). I don't see it happening either for some reason.
>
> (gdb) p LRU[1].L1
> $29 = {q = {next = 0x7fb459e71960, prev = 0x7fb3ec3c0d30}, id =
> LRU_ENTRY_L1, size = 260379}
>
> head of the list is an entry with refcnt 2; but there are several entries
> with refcnt 1.
>
> (gdb) p *(mdcache_lru_t *)0x7fb459e71960
> $30 = {q = {next = 0x7fb43ddea8a0, prev = 0x7d68a0 }, qid =
> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 1, cf = 2}
> (gdb) p *(mdcache_lru_t *)0x7fb43ddea8a0
> $31 = {q = {next = 0x7fb3f041f9a0, prev = 0x7fb459e71960}, qid =
> LRU_ENTRY_L1, refcnt = 1, flags = 0, lane = 1, cf = 0}
> (gdb) p *(mdcache_lru_t *)0x7fb3f041f9a0
> $32 = {q = {next = 0x7fb466960200, prev = 0x7fb43ddea8a0}, qid =
> LRU_ENTRY_L1, refcnt = 1, flags = 0, lane = 1, cf = 0}
> (gdb) p *(mdcache_lru_t *)0x7fb466960200
> $33 = {q = {next = 0x7fb451e20570, prev = 0x7fb3f041f9a0}, qid =
> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 1, cf = 1}
>
> The entries with refcnt 1 are moved to L2 by the background thread
> (lru_run). However it does it only if the open file count is greater than
> low water mark. In my case, the open_fd_count is not high; so lru_run()
> doesn't call lru_run_lane() to demote those entries to L2. What is the best
> approach to handle this scenario?
>
> Thanks,
> Pradeep
>
>
>
> On Mon, Aug 7, 2017 at 6:08 AM, Daniel Gryniewicz  wrote:
>>
>> It never has been.  In cache_inode, a pin-ref kept it from being
>> reaped, now any ref beyond 1 keeps it.
>>
>> On Fri, Aug 4, 2017 at 1:31 PM, Frank Filz 
>> wrote:
>> >> I'm hitting a case where mdcache keeps growing well beyond the high
>> >> water
>> >> mark. Here is a snapshot of the lru_state:
>> >>
>> >> 1 = {entries_hiwat = 10, entries_used = 2306063, chunks_hiwat =
>> > 10,
>> >> chunks_used = 16462,
>> >>
>> >> It has grown to 2.3 million entries and each entry is ~1.6K.
>> >>
>> >> I looked at the first entry in lane 0, L1 queue:
>> >>
>> >> (gdb) p LRU[0].L1
>> >> $9 = {q = {next = 0x7fad64256f00, prev = 0x7faf21a1bc00}, id =
>> >> LRU_ENTRY_L1, size = 254628}
>> >> (gdb) p (mdcache_entry_t *)(0x7fad64256f00-1024)
>> >> $10 = (mdcache_entry_t *) 0x7fad64256b00
>> >> (gdb) p $10->lru
>> >> $11 = {q = {next = 0x7fad65ea0f00, prev = 0x7d67c0 }, qid =
>> >> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 0, cf = 0}
>> >> (gdb) p $10->fh_hk.inavl
>> >> $13 = true
>> >
>> > The refcount 2 prevents reaping.
>> >
>> > There could be a refcount leak.
>> >
>> > Hmm, though, I thought the entries_hwmark was a hard limit, guess not...
>> >
>> > Frank
>> >
>> >> Lane 1:
>> >> (gdb) p LRU[1].L1
>> >> $18 = {q = {next = 0x7fad625c0300, prev = 0x7faec08c5100}, id =
>> >> LRU_ENTRY_L1, size = 253006}
>> >> (gdb) p (mdcache_entry_t *)(0x7fad625c0300 - 1024)
>> >> $21 = (mdcache_entry_t *) 0x7fad625bff00
>> >> (gdb) p $21->lru
>> >> $22 = {q = {next = 0x7fad66fce600, prev = 0x7d68a0 }, qid =
>> >> LRU_ENTRY_L1, refcnt = 2, flags = 0, lane = 1, cf = 1}
>> >>
>> >> (gdb) p $21->fh_hk.inavl
>> >> $24 = true
>> >>
>> >> As per LRU_ENTRY_RECLAIMABLE(), these entries should be reclaimable. Not
>> >> sure why it is not able to claim it. Any ideas?
>> >>
>> >> Thanks,
>> >> Pradeep
>> >>
>> >>
>> >
>> > 
>> >
>> >
>> >
>> >
>> >
>
>
>

Re: [Nfs-ganesha-devel] crash in makefd_xprt()

2017-08-10 Thread Matt Benjamin
discussion in #ganesha :)

On Thu, Aug 10, 2017 at 3:55 PM, Malahal Naineni  wrote:
> Hi All,
>
> One of our customers reported the following backtrace. The returned
> "rec" seems to be corrupted. Based on oflags, rpc_dplx_lookup_rec() didn't
> allocate the "rec" in this call path. Its refcount is 2. More importantly
> rec.hdl.xd is 0x51 (a bogus pointer) leading to the crash. GDB data is at
> the end of this email. Note that this crash is observed in latest ganesha2.3
> release.
>
> Looking at rpc_dplx_lookup_rec() and rpc_dplx_unref(), looks like rec's
> refcnt can go to 0 and then back up. Also, rpc_dplx_unref is releasing
> rec-lock and then acquires hash-lock to preserve the lock order. After
> dropping the lock at line 359 below, someone else could grab and change
> refcnt to 1. The second thread could call rpc_dplx_unref() after it is done
> beating the first thread and free the "rec". The first thread accessing
> ">node_k" at line 361 is in danger as it might be accessing freed
> memory. In any case, this is NOT our backtrace here. :-(
>
> Also, looking at the users of this "rec", they seem to close the file
> descriptor and then call rpc_dplx_unref(). This has very nasty side effects
> if my understanding is right. Say, thread one has fd 100, it closed it and
> is calling rpc_dplx_unref to free the "rec", but in the mean time another
> thread gets fd 100, and is calling rpc_dplx_lookup_rec(). At this point the
> second thread is going to use the same "rec" as the first thread, correct?
> Can it happen that a "rec" that belonged to UDP is now being given to a
> thread doing "TCP"? This is one way I can explain the backtrace! The first
> thread has to be UDP that doesn't need "xd" and the second thread should be
> "TCP" where it finds that the "xd" is uninitialized because the "rec" was
> allocated by a UDP thread. If you are still reading this email, kudos and a
> big thank you.
>
> 357 if (rec->refcnt == 0) {
> 358 t = rbtx_partition_of_scalar(&rpc_dplx_rec_set.xt,
> rec->fd_k);
> 359 REC_UNLOCK(rec);
> 360 rwlock_wrlock(&t->lock);
> 361 nv = opr_rbtree_lookup(&t->t, &rec->node_k);
> 362 rec = NULL;
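
One way to keep the refcount from dropping to zero and bouncing back up
while the rec lock is dropped is to retake the locks in the documented
order and re-check the count before tearing the rec down.  A generic
sketch of that shape, using plain pthread types rather than the real
ntirpc rec/partition structures (it does not address the fd-reuse race
described above, which needs the rec to stay keyed or held until it is
really gone):

#include <pthread.h>
#include <stdlib.h>

struct rec {
    int refcnt;                /* protected by mtx */
    pthread_mutex_t mtx;
    /* ... fd_k, node_k, etc. ... */
};

struct partition {
    pthread_rwlock_t lock;     /* protects the fd-indexed tree */
    /* ... tree root ... */
};

static void rec_unref(struct rec *rec, struct partition *t)
{
    pthread_mutex_lock(&rec->mtx);
    if (--rec->refcnt > 0) {
        pthread_mutex_unlock(&rec->mtx);
        return;
    }
    pthread_mutex_unlock(&rec->mtx);

    /* Window: another thread may find the rec in the tree and take a
     * new reference here.  So retake the locks (tree first, then rec)
     * and re-check before removing and freeing. */
    pthread_rwlock_wrlock(&t->lock);
    pthread_mutex_lock(&rec->mtx);
    if (rec->refcnt == 0) {
        /* remove rec from the tree here, still under t->lock */
        pthread_mutex_unlock(&rec->mtx);
        pthread_rwlock_unlock(&t->lock);
        free(rec);
        return;
    }
    /* Somebody re-referenced it; leave it in the tree. */
    pthread_mutex_unlock(&rec->mtx);
    pthread_rwlock_unlock(&t->lock);
}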
>
>
> BORING GDB STUFF:
>
> (gdb) bt
> #0  0x3fff7aaaceb0 in makefd_xprt (fd=166878, sendsz=262144,
> recvsz=262144, allocated=0x3ffab97fdb4c)
> at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:436
> #1  0x3fff7aaad224 in rendezvous_request (xprt=0x1000b125310,
> req=0x3ffa2c0008f0)
> at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/libntirpc/src/svc_vc.c:549
> #2  0x10065104 in thr_decode_rpc_request (context=0x0,
> xprt=0x1000b125310)
> at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1729
> #3  0x100657f4 in thr_decode_rpc_requests (thr_ctx=0x3ffedc001280)
> at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1853
> #4  0x10195744 in fridgethr_start_routine (arg=0x3ffedc001280)
> at
> /usr/src/debug/nfs-ganesha-2.3.2-ibm44-0.1.1-Source/support/fridgethr.c:561
>
> (gdb) p oflags
> $1 = 0
> (gdb) p rec->hdl.xd
> $2 = (struct x_vc_data *) 0x51
> (gdb) p *rec
> $3 = {fd_k = 166878, locktrace = {mtx = {__data = {__lock = 2, __count = 0,
> __owner = 92274, __nusers = 1, __kind = 3,
> __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>   __size =
> "\002\000\000\000\000\000\000\000rh\001\000\001\000\000\000\003", '\000'
> ,
>   __align = 2}, func = 0x3fff7aac6ca0 <__func__.8774> "rpc_dplx_ref",
> line = 89}, node_k = {left = 0x0,
> right = 0x0, parent = 0x3ff9c80034f0, red = 1, gen = 639163}, refcnt =
> 2, send = {lock = {we = {mtx = {__data = {
> __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 3,
> __spins = 0, __list = {__prev = 0x0,
>   __next = 0x0}}, __size = '\000' , "\003",
> '\000' , __align = 0},
> cv = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
> __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
> __nwaiters = 0, __broadcast_seq = 0}, __size = '\000'  47 times>, __align = 0}},
>   lock_flag_value = 0, locktrace = {func = 0x0, line = 0}}}, recv =
> {lock = {we = {mtx = {__data = {__lock = 0,
> __count = 0, __owner = 0, __nusers = 0, __kind = 3, __spins = 0,
> __list = {__prev = 0x0, __next = 0x0}},
>   __size = '\000' , "\003", '\000'  times>, __align = 0}, cv = {__data = {
> __lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
> __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
> __broadcast_seq = 0}, __size = '\000' ,
> __align = 0}}, lock_flag_value = 0, locktrace = {
> func = 0x3ffc00d8 "\300L\001", line = 0}}}, hdl = {xd = 0x51,
> xprt = 0x0}}
> (gdb)
>
>

Re: [Nfs-ganesha-devel] Proposed backports for 2.5.2

2017-08-09 Thread Matt Benjamin
does this include the fix for the readdir chunking config parsing issue?
I thought that affected 2.5.x.

Matt

On Wed, Aug 9, 2017 at 9:55 AM, Daniel Gryniewicz  wrote:
> Here's my proposed backports for 2.5.2:
>
> commit 7f2d461277521301a417ca368d3c7656edbfc903
> FSAL_GLUSTER: Reset caller_garray to NULL upon free
>
> commit 114c38ce9fcf20878ffce3b454e106089a34ab5d
> Decrement FD count in fsal_close even if obj_ops.close() fails
>
> commit 7f7c92363ba5c96e6fe5add81c81e65e90063cb2
> MDCACHE - Fix rename/getattrs deadlock
>
> commit ef48ab5bc4e3f1a0fd6879433d774975622e8383
> Dbus config: Allow only root users
>
> commit 2b88bcd16aaa653874079dc7401a49bb61421401
> nfs_init_complete() is called too soon!
>
> commit 4aec4811f1c83578df8ecaacf4c8adda58cc3393
> Fix deadlock in lru_reap_impl and simplify reapers
>
> commit d2b701662db9468c967ed6ecfd274625b1902a6c
> Add new Reaper_Work_Per_Lane option
>
>
> Daniel
>



Re: [Nfs-ganesha-devel] only use of UDP client is NSM

2017-08-08 Thread Matt Benjamin
I do not support dropping UDP.  Thanks.

Matt

On Tue, Aug 8, 2017 at 10:34 PM, William Allen Simpson
 wrote:
> On 8/8/17 1:58 PM, Daniel Gryniewicz wrote:
>>
>> On 08/08/2017 01:17 PM, William Allen Simpson wrote:
>>>
>>> NSM should be accessible by TCP.  Why are we using UDP?
>>>
>>> Is there a downstream need?
>>>
>>
>> Yes, there is a downstream need for NSM.
>>
> Would prefer folks answer the question asked.  I didn't ask about NSM.
> Note the lack of a question mark
>
> It very explicitly asked:
>
> "Why are we using UDP? Is there a downstream need?"
>
> AFAICT from grep'ing the NFS documents, NFSv3 NSM *MUST* support TCP.
> We do not support NFSv2.  We should be using TCP.
>
> Do we have a downstream need for NFSv2 support for NSM only?
>
> If not, I'm going to drop this unsupported and frankly kludgy code.
>
>



Re: [Nfs-ganesha-devel] only use of UDP client is NSM

2017-08-08 Thread Matt Benjamin
Hi Bill,

While NFSv3 supports TCP, UDP is also supported.

Matt

On Tue, Aug 8, 2017 at 1:17 PM, William Allen Simpson
 wrote:
> Frank, Dominique tracked it down:
>
> #0 0x4e2ea0 in calloc
> (/export/nfs-ganesha/build/MainNFSD/ganesha.nfsd+0x4e2ea0)
> #1 0x5d0447 in gsh_calloc__
> /export/nfs-ganesha/src/include/abstract_mem.h:145:12
> #2 0x758f7eb6 in svc_dg_xprt_zalloc
> /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:101:27
> #3 0x758f7abc in svc_dg_xprt_setup
> /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:119:28
> #4 0x759009eb in svc_xprt_lookup
> /export/nfs-ganesha/src/libntirpc/src/svc_xprt.c:165:4
> #5 0x758f7608 in svc_dg_ncreatef
> /export/nfs-ganesha/src/libntirpc/src/svc_dg.c:139:9
> #6 0x758de453 in clnt_dg_ncreatef
> /export/nfs-ganesha/src/libntirpc/src/clnt_dg.c:123:9
> #7 0x758e11c2 in clnt_dg_ncreate
> /export/nfs-ganesha/src/libntirpc/ntirpc/rpc/clnt.h:505:10
> #8 0x758e0e64 in clnt_tli_ncreate
> /export/nfs-ganesha/src/libntirpc/src/clnt_generic.c:383:8
> #9 0x758f32d7 in getclnthandle
> /export/nfs-ganesha/src/libntirpc/src/rpcb_clnt.c:378:7
> #10 0x758f2622 in __rpcb_findaddr_timed
> /export/nfs-ganesha/src/libntirpc/src/rpcb_clnt.c:651:13
> #11 0x758e0a0f in clnt_tp_ncreate_timed
> /export/nfs-ganesha/src/libntirpc/src/clnt_generic.c:287:3
> #12 0x758e08a2 in clnt_ncreate_timed
> /export/nfs-ganesha/src/libntirpc/src/clnt_generic.c:207:10
> #13 0x758e099d in clnt_ncreate
> /export/nfs-ganesha/src/libntirpc/src/clnt_generic.c:156:10
> #14 0x844a3a in gsh_clnt_create
> /export/nfs-ganesha/src/RPCAL/rpc_tools.c:487:9
> #15 0x71d311 in nsm_connect
> /export/nfs-ganesha/src/Protocols/NLM/nsm.c:55:13
> #16 0x722ae9 in nsm_unmonitor_all
> /export/nfs-ganesha/src/Protocols/NLM/nsm.c:252:7
> #17 0x5de964 in nfs_start
> /export/nfs-ganesha/src/MainNFSD/nfs_init.c:939:3
> #18 0x51c685 in main /export/nfs-ganesha/src/MainNFSD/nfs_main.c:489:2
> #19 0x74df2400 in __libc_start_main (/lib64/libc.so.6+0x20400)
>
> NSM should be accessible by TCP.  Why are we using UDP?
>
> Is there a downstream need?
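
For reference, a minimal sketch of what pointing the NSM client handle
at TCP could look like.  The gsh_clnt_create() prototype is assumed
from the backtrace above (host, program, version, netid) and a local
rpc.statd is assumed, so treat this as illustration rather than the
actual nsm.c change:

#include <stdbool.h>
#include <rpc/rpc.h>          /* CLIENT */

#define SM_PROG 100024        /* NSM (statd) program number */
#define SM_VERS 1

/* Prototype assumed from the backtrace (RPCAL/rpc_tools.c). */
CLIENT *gsh_clnt_create(char *host, unsigned long prog, unsigned long vers,
                        char *proto);

static CLIENT *nsm_clnt;

static bool nsm_connect_tcp(void)
{
    if (nsm_clnt == NULL)
        nsm_clnt = gsh_clnt_create("localhost", SM_PROG, SM_VERS, "tcp");

    return nsm_clnt != NULL;
}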
>



Re: [Nfs-ganesha-devel] address sanitizer defeats -O0

2017-08-03 Thread Matt Benjamin
Correct.

Matt

On Thu, Aug 3, 2017 at 10:44 AM, Daniel Gryniewicz  wrote:
> I do not believe it's reorganizing code.   I think the code in the first
> section is wrong, and the second version is needed.
>
> Once you've dec'd the refcount, you cannot access the pointer.  At that
> point, another thread could deref and free, and you'd be left with freed
> memory.  You *must* store that value in a local variable.
>
> Daniel
>
>
> On 08/03/2017 02:13 AM, William Allen Simpson wrote:
>>
>> It's improperly reorganizing code.
>>
>> ===
>>
>> Accessing freed memory at ==>
>>
>> Logically, this couldn't happen!
>>
>> int free_nfs_request(request_data_t *reqdata)
>> {
>>  atomic_dec_uint32_t(&reqdata->r_d_refs);
>>
>>  LogDebug(COMPONENT_DISPATCH,
>>   "%s: %p fd %d xp_refs %" PRIu32 " r_d_refs %" PRIu32,
>>   __func__,
>>   reqdata->r_u.req.svc.rq_xprt,
>>   reqdata->r_u.req.svc.rq_xprt->xp_fd,
>>   reqdata->r_u.req.svc.rq_xprt->xp_refs,
>>   reqdata->r_d_refs);
>>
>>  if (reqdata->r_d_refs)
>> ==> return reqdata->r_d_refs;
>>
>>  switch (reqdata->rtype) {
>>  case NFS_REQUEST:
>>  /* dispose RPC header */
>>  if (reqdata->r_u.req.svc.rq_auth)
>>  SVCAUTH_RELEASE(&(reqdata->r_u.req.svc));
>>  XDR_DESTROY(reqdata->r_u.req.svc.rq_xdrs);
>>  break;
>>  default:
>>  break;
>>  }
>>  SVC_RELEASE(reqdata->r_u.req.svc.rq_xprt, SVC_RELEASE_FLAG_NONE);
>>  pool_free(request_pool, reqdata);
>>  return 0;
>> }
>>
>> ===
>>
>> This works by using local variables.  Kinda pointlessly, when not
>> running log debug.
>>
>> int free_nfs_request(request_data_t *reqdata)
>> {
>>  SVCXPRT *xprt = reqdata->r_u.req.svc.rq_xprt;
>>  uint32_t refs = atomic_dec_uint32_t(&reqdata->r_d_refs);
>>
>>  LogDebug(COMPONENT_DISPATCH,
>>   "%s: %p fd %d xp_refs %" PRIu32 " r_d_refs %" PRIu32,
>>   __func__,
>>   xprt, xprt->xp_fd, xprt->xp_refs,
>>   refs);
>>
>>  if (refs)
>>  return refs;
>>
>>  switch (reqdata->rtype) {
>>  case NFS_REQUEST:
>>  /* dispose RPC header */
>>  if (reqdata->r_u.req.svc.rq_auth)
>>  SVCAUTH_RELEASE(&(reqdata->r_u.req.svc));
>>  XDR_DESTROY(reqdata->r_u.req.svc.rq_xdrs);
>>  break;
>>  default:
>>  break;
>>  }
>>  SVC_RELEASE(xprt, SVC_RELEASE_FLAG_NONE);
>>  pool_free(request_pool, reqdata);
>>  return 0;
>> }
>>
>>
>
>
>



Re: [Nfs-ganesha-devel] Announce Push of V2.6-dev-1

2017-07-28 Thread Matt Benjamin
Critical bugfixes need to go to 2.5-stable, sure, but this is fine for the week.

Matt

On Fri, Jul 28, 2017 at 8:01 PM, Frank Filz  wrote:
> So this really is just bug fixes, might as well fast forward V2.5-stable...
>
> Obviously not up to the V2.6-dev-1 CMakeLists.txt commit...
>
> What do folks think?
>
> Frank
>
>> -Original Message-
>> From: Frank Filz [mailto:ffilz...@mindspring.com]
>> Sent: Friday, July 28, 2017 4:53 PM
>> To: nfs-ganesha-devel@lists.sourceforge.net
>> Subject: [Nfs-ganesha-devel] Announce Push of V2.6-dev-1
>>
>> Branch next
>>
>> Tag:V2.6-dev-1
>>
>> Release Highlights
>>
>> * FSAL_GLUSTER: Reset caller_garray to NULL upon free
>>
>> * Decrement FD count in fsal_close even if obj_ops.close() fails
>>
>> * export: skip export entries that init_export_root fail
>>
>> * MDCACHE - Fix rename/getattrs deadlock
>>
>> * DEBUG_MDCACHE: disable by default
>>
>> * Dbus config: Allow only root users
>>
>> * nfs_init_complete() is called too soon!
>>
>> Signed-off-by: Frank S. Filz 
>>
>> Contents:
>>
>> 9ab9fcd Frank S. Filz V2.6-dev-1
>> 2b88bcd Malahal Naineni nfs_init_complete() is called too soon!
>> ef48ab5 Supriti Singh Dbus config: Allow only root users
>> 0c4dc5b Dominique Martinet DEBUG_MDCACHE: disable by default
>> 7f7c923 Daniel Gryniewicz MDCACHE - Fix rename/getattrs deadlock
>> 257bc75 Kinglong Mee export: skip export entries that init_export_root
> fail
>> 114c38c Madhu Thorat Decrement FD count in fsal_close even if
>> obj_ops.close() fails
>> 7f2d461 Soumya Koduri FSAL_GLUSTER: Reset caller_garray to NULL upon
>> free
>>
>>
>>
>>
>>
> 
>
>



Re: [Nfs-ganesha-devel] Balancing input workers

2017-07-24 Thread Matt Benjamin
On Sun, Jul 23, 2017 at 11:26 AM, William Allen Simpson
<william.allen.simp...@gmail.com> wrote:
> On 7/21/17 11:17 AM, Matt Benjamin wrote:
>>
>> As we discussed Wed., I'd like to see something like a msg counter and
>> byte counter that induced switching to the next handle.  This seems
>> consistent w/the front or back queuing idea Dan proposed.
>
>
> I don't understand this comment.  We already have the number of
> bytes in a message.  But I don't know that that has to do with
> switching to another transport handle.

With a msg counter, it could fit into various strategies for allowing
a handle to execute another request if available, allowing for better
pipelining of requests.
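
A rough sketch of that kind of accounting; the names are invented, but
the idea is that a handle keeps being serviced only while it is under
both a message budget and a byte budget, after which it is requeued so
other transports get a turn:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-xprt accounting, reset each time a worker picks the
 * handle up. */
struct xprt_budget {
    uint32_t msgs_decoded;
    uint64_t bytes_decoded;
    uint32_t msg_budget;      /* e.g. 8 requests */
    uint64_t byte_budget;     /* e.g. 1 MiB; read/write payloads count here */
};

/* Called after each request is decoded, with its length. */
static bool xprt_budget_spent(struct xprt_budget *b, uint32_t msg_len)
{
    b->msgs_decoded++;
    b->bytes_decoded += msg_len;

    return b->msgs_decoded >= b->msg_budget ||
           b->bytes_decoded >= b->byte_budget;
}

/*
 * Dispatch-loop shape (pseudocode):
 *
 *    while (another request is ready on this xprt) {
 *        decode it and hand it off;
 *        if (xprt_budget_spent(&budget, len))
 *            break;   // requeue the handle and let others run
 *    }
 */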

>
> Moreover, that information isn't really available at the time of
> deciding what worker to assign.  The worker is assigned, then the
> data is read.

Presumably it takes affect before requeuing the handle, not before assigning it.

>
>> The
>> existing lookahead logic knows when we have reads or writes, but
>> doesn't know how much we read, which would count towards the byte
>> counter.
>>
> Since we are now reading the entire incoming request before
> dispatching and processing the request, and holding onto the input
> data until after it is passed to the FSAL for possible zero-copy,
> I'm not sure what you mean by lookahead.

I'm referring to the "lookahead" construct maintained by our decoders.
It has information after decoders have executed.

>
>
>> On Fri, Jul 21, 2017 at 11:06 AM, William Allen Simpson
>> <william.allen.simp...@gmail.com> wrote:
>>>
>>> My current Napalm code essentially gives the following priorities:
>>>
>>> New UDP, TCP, RDMA, or 9P connections are the "same" priority, as
>>> they each have their own channel, and they each have a dedicated
>>> epoll thread.
>>>
>>> ...
>>>
>>> Right now, they're all treated as first tier, and it handles them
>>> expeditiously.  After all, missing an incoming connection is far
>>> worse (as viewed by the client) than slowing receipt of data.
>>>
> After discussion, I've simplified my 2 year old code that was
> written more like a device driver.  It counted the incoming events
> and one worker task thread per connection handled those events.
>
> Now, there's only 1 tier.  And it depends more on the epoll-only
> rearm to delay incoming requests.  (But not as badly as existing.)
>
> This gets rid of the problem that Matt identified where a piggy
> client could send a lot of requests and they'd all be handled
> sequentially (via the counter) before another client.
>
> Now every request has its own task thread and is always added to the
> tail of the worker queue.  Complete fairness.  More system resources.

I do not currently understand all of the components of this design.
I'm confused by the notion of requests having task threads.  My
impression was that xprt handles are what is queued, and that handles
wait in line for task threads;  also that a handle we are willing to
prioritize can be queued at the head rather than the tail of said
queue.
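
To make that last point concrete, a sketch of the head-or-tail requeue
being described, with a plain TAILQ standing in for whatever queue the
dispatcher actually uses (locking omitted):

#include <stdbool.h>
#include <sys/queue.h>

struct xprt_handle {
    TAILQ_ENTRY(xprt_handle) q;
    bool favored;    /* e.g. under its msg/byte budget, or latency-sensitive */
};

TAILQ_HEAD(xprt_queue, xprt_handle);

/* Handles we are willing to prioritize go to the head of the line for
 * the next free task thread; everyone else waits at the tail. */
static void requeue_handle(struct xprt_queue *rq, struct xprt_handle *xh)
{
    if (xh->favored)
        TAILQ_INSERT_HEAD(rq, xh, q);
    else
        TAILQ_INSERT_TAIL(rq, xh, q);
}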

Matt



Re: [Nfs-ganesha-devel] Balancing input workers

2017-07-21 Thread Matt Benjamin
As we discussed Wed., I'd like to see something like a msg counter and
byte counter that induced switching to the next handle.  This seems
consistent w/the front or back queuing idea Dan proposed.  The
existing lookahead logic knows when we have reads or writes, but
doesn't know how much we read, which would count towards the byte
counter.

Matt

On Fri, Jul 21, 2017 at 11:06 AM, William Allen Simpson
 wrote:
> My current Napalm code essentially gives the following priorities:
>
> New UDP, TCP, RDMA, or 9P connections are the "same" priority, as
> they each have their own channel, and they each have a dedicated
> epoll thread.
>
> The only limit is the OS runs out of file descriptors and rejects
> the connection attempt, hopefully before it tells us.
>
> We probably want a new configurable number of connections limit per
> type.  Currently, there's only 1 global configuration.  Temporarily,
> that could be used as the same for every type.
>
> But what should we do?  Accept the TCP connection and then close it?
> Receive the UDP data to get it out of the OS buffers, but then
> discard it?
>
> Right now, they're all treated as first tier, and it handles them
> expeditiously.  After all, missing an incoming connection is far
> worse (as viewed by the client) than slowing receipt of data.
>
> TCP or RDMA service requests are the second tier, vying for worker
> threads with each other.  I'm not entirely sure what 9p is doing.
>
> Service requests stay on the same thread as long as possible.  Each
> new request will be assigned a new worker thread.  That will have
> greatest client equality.
>
> An alternative that DanG and I discussed this morning would be to
> add some feedback from each FSAL that tells whether the request
> was fast or slow.  We'd need yet another configurable parameter for
> how many fast requests are allowed before the next request is
> assigned a worker at the tail of the queue.  (Once we get async
> FSALs going, slow requests always incur a task switch anyway, so
> they'd reset the counter.)
>



Re: [Nfs-ganesha-devel] About FSAL_GLUSTER performance

2017-07-21 Thread Matt Benjamin
Hi Mark,

As a matter of fact that is one of a bunch of perf and async/non-blocking
changes being worked on for 2.6 and later releases.

Soumya is the person currently working on gfapi async.

Matt





On Jul 21, 2017 3:43 AM, "gui mark"  wrote:

> Hi all (cc maintainers),
>
> We've tried a performance test comparing nfs-ganesha and gNFS from
> gluster, we found that gNFS outperforms nfs-ganesha by nearly 2 times on
> OPS.
>
> As I read throught the code FSAL_GLUSTER now adopts sync api from
> libgfapi, is there any plan to switch to the *_async ones, so we could
> possibly gain a performance boost ?
>
> Thanks,
> Mark
>
> 
>
>


Re: [Nfs-ganesha-devel] About dirent chunking

2017-07-12 Thread Matt Benjamin
Dan and Frank are the chunking experts, but iiuc you're right, dir_max
has no effect when chunking is enabled, and iiuc also yes, the plan is
to retire the old dirent cache in favor of chunking.  Frank also has
new commits for pruning the dirent cache via its LRU list.

regards,

Matt

On Wed, Jul 12, 2017 at 11:13 PM, gui mark  wrote:
> IIUC, the dirent chunking is a complete replacement of  the old-style dirent
> cache. As I tried, when I have 'Dir_Chunk' on, then 'Dir_Max' will not limit
> the number of cached dirents, so are we going to kick out the old direct
> cache completely ?
>
> On Tue, Jul 11, 2017 at 8:40 PM, Daniel Gryniewicz  wrote:
>>
>> On 07/11/2017 06:00 AM, gui mark wrote:
>>>
>>> Hi Frank (cc list)
>>>
>>> I started to grow interested with your dirent chunking these days, and
>>> I've got a few questions below:
>>>
>>> 1. Why we need chunking ? only for lru-style management for dirents ?
>>
>>
>> The big driver for it was the way that dirents were cached before.  When a
>> directory was cached, the entire thing needed to be cached before any
>> dirents were returned to the client.  This had 2 problems for large
>> directories: 1) It took a lot of time, and clients could time out before
>> receiving a reply; 2) It took a lot of memory, since we couldn't reap any
>> dirents, because we needed them all to return any.
>>
>> The chunking breaks the readdir up into chunks, so that only one chunk
>> must be read before results can be returned to clients, and so that chunks
>> can be reaped, freeing memory without breaking readdir.
>>
>>> 2. Is this big feature completed? If not, could you please share some
>>> blueprints on that, so we may be able to contribute some effort ?
>>
>>
>> The feature itself is complete; the last bit of the memory management is
>> still outstanding here: https://review.gerrithub.io/#/c/367446/
>>
>>> 3. Do the underlying FSALs have to add impl for any new interfaces (if
>>> any) to support chunking ?
>>
>>
>> Yes and no.  Everything works correctly without any changes.  However, if
>> the FSAL can compute dirent cookies directly from the name (ie, no
>> round-trip to the cluster) then it can implement compute_readdir_cookie(),
>> and be more efficient when handling dirents created from calls other than
>> readdir() (such as lookup() and rename()).  This is purely an optimization,
>> but one that can theoretically make a difference to some workloads.
>>
>>
>>> 4. Does the new-style dirent cache borrows ideas for the kernel dcache ?
>>
>>
>> Not really, no.  It mostly borrows from the Ganesha's object cache.
>>
>> Daniel
>>
>>
>
>



Re: [Nfs-ganesha-devel] About dirent chunking

2017-07-12 Thread Matt Benjamin
Hi All,

The compute_readdir_cookie feature was originally cooked up for RGW,
but Frank showed that you could implement it for ext4, and probably
other FSALs.  I've been remiss in pushing an implementation, but have
now remedied that (for RGW, of course, this function is very simple);
I didn't want to introduce any uncertainty atop the basic chunked
listing behavior while we stabilized our first push of this.  It does
appear to work, though :)
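
For anyone trying this in another FSAL, the contract is roughly: given
a dirent name, return exactly the cookie that a readdir of the parent
would report for that name, without a round trip to the backend.  A
hedged sketch (prototype paraphrased, not copied from fsal_api.h), for
a backend whose readdir order is simply a hash of the name:

#include <stdint.h>

/* Toy 64-bit FNV-1a, standing in for whatever ordering the backend
 * really uses for its directory listings. */
static uint64_t name_hash(const char *name)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    while (*name) {
        h ^= (unsigned char)*name++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Paraphrased hook: must agree exactly with the cookies the FSAL's own
 * readdir reports for the same names, and should stay clear of the low
 * cookie values reserved for the start of a listing and "."/"..". */
uint64_t example_compute_readdir_cookie(const char *name)
{
    uint64_t cookie = name_hash(name);

    if (cookie < 3)
        cookie = 3;
    return cookie;
}

If the backend cannot guarantee that its readdir and this function
agree, it should simply not implement the hook and fall back to the
whole-chunk invalidation described in the thread below.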

Matt

On Tue, Jul 11, 2017 at 6:00 AM, gui mark  wrote:
> Hi Frank (cc list)
>
> I started to grow interested with your dirent chunking these days, and I've
> got a few questions below:
>
> 1. Why we need chunking ? only for lru-style management for dirents ?
>
> 2. Is this big feature completed? If not, could you please share some
> blueprints on that, so we may be able to contribute some effort ?
>
> 3. Do the underlying FSALs have to add impl for any new interfaces (if any)
> to support chunking ?
>
> 4. Does the new-style dirent cache borrows ideas for the kernel dcache ?
>
> I'm reading your former mails and comments within codes, but I think a
> direct ask shall save me a lot of time :)
>
> Thanks,
> Mark
>
>



Re: [Nfs-ganesha-devel] commit test Comparisons

2017-06-29 Thread Matt Benjamin
I find this example of constant on the left less clear, to be honest.

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Wednesday, June 28, 2017 8:41:26 PM
> Subject: [Nfs-ganesha-devel] commit test Comparisons
> 
> This is a good programming practice of long-standing value.
> 
> Why of why do these evil commit tests keep creeping in?
> 
> bill@simpson91:~/rdma/nfs-ganesha$ git commit --amend -a
> WARNING: Comparisons should place the constant on the right side of the test
> #17: FILE: src/MainNFSD/nfs_rpc_dispatcher_thread.c:1777:
> + if (XPRT_DONE <= stat) {
> 
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] timed waits

2017-06-21 Thread Matt Benjamin
You're welcome to make it an ntirpc init parameter.

Matt

- Original Message -
> From: "Daniel Gryniewicz" <d...@redhat.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Wednesday, June 21, 2017 8:13:34 AM
> Subject: Re: [Nfs-ganesha-devel] timed waits
> 
> On 06/21/2017 05:26 AM, William Allen Simpson wrote:
> > I'd thought my Ganesha wasn't shutting down properly.  Turned out to
> > be sitting in several timed waits: fridge, epoll.
> >
> > The EPOLL timeout is 120 seconds, but has "XXX" next to it.  Apparently,
> > others are that long elsewhere, too.  Seems a long time.
> >
> > Is there a configuration parameter somewhere to use instead?
> 
> There doesn't seem to be, no.
> 
> Daniel
> 
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] cache and hash and partitions should be primes

2017-06-21 Thread Matt Benjamin
hi bill,

inline

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Wednesday, June 21, 2017 4:01:01 AM
> Subject: [Nfs-ganesha-devel] cache and hash and partitions should be primes
> 
> Was looking through ntirpc cache/hash sizes, and discovered that:
> svc_auth_des.c has 64 (not prime);
not used?

> authgss_hash.c has 255 (not prime).

good catch

> 
> Configurable number of partitions aren't checked for primality.
> 
> So began checking Ganesha as well.
> 
> src/support/export_mgr.c has a nice size of 769, appropriate for
> 10,000'ish exports by id with a hit rate of 0.130208.
> 
> src/support/ds.c has a fair size of 163, appropriate for 256'ish
> active servers with a hit rate of 0.520833.
> 
> (I chose those two a couple of years ago.)
> 
> The number 1009 is used several places.  Prime, but too large?
> src/support/netgroup_cache.c
> src/support/uid2grp_cache.c
> src/idmapper/idmapper_cache.c
> 
> DRC_TCP_CACHESZ 127 and DRC_UDP_CACHESZ 599, while prime, seem
> completely oddball (and reversed).  And allow configuration to 255
> and 2047 (not prime) respectively.

careful;  first is small on the theory it will retire fast, and note it is per 
connection

the UDP DRC is global, so larger
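
For the sizes flagged above as non-prime, a tiny helper that rounds a
configured table size up to the next prime is cheap to run once at init
time; a sketch:

#include <stdbool.h>
#include <stdint.h>

static bool is_prime(uint32_t n)
{
    uint32_t d;

    if (n < 2)
        return false;
    if (n % 2 == 0)
        return n == 2;
    for (d = 3; (uint64_t)d * d <= n; d += 2)
        if (n % d == 0)
            return false;
    return true;
}

/* Round a configured cache/hash/partition size up to a prime, so a
 * configured 255 or 2047 quietly becomes 257 or 2053. */
static uint32_t next_prime(uint32_t n)
{
    while (!is_prime(n))
        n++;
    return n;
}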

Matt

> 
> Are there others that I've missed?
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] async dispatch not good

2017-06-19 Thread Matt Benjamin
Hi,

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>
> Cc: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Monday, June 19, 2017 10:03:53 PM
> Subject: Re: [Nfs-ganesha-devel] async dispatch not good
> 
> On 6/19/17 3:41 PM, Matt Benjamin wrote:
> > it's not about memory, this is the problem we're trying to avoid
> > 
> > but, referring for context to our verbal discussion earlier today, your
> > suggestion to hybridize the existing output side (which depends on
> > blocking sockets) and an async input side using recv() seems work at least
> > exploring;  I assume you are proposing to use recv() with MSG_DONTWAIT?
> > Yes.  Linux does support MSG_DONTWAIT, and it should be possible to try
> the recv() writev() hybrid approach.  At least there's one Oracle
> article that says it works
> 
> The underlying problem is EPOLL reaallly isn't a good design.  What we
> need for speed is callbacks that tell us that the read/write is done,
> not signals that there might be more data pending -- which cause us to
> do more system calls to find out.  System calls are the problem.

They have latency, sure.

> 
> kqueue is a much better design.  We should try to get kqueue support in
> the Linux kernel.  That would aid portability, too.

You're welcome to try, seems political.

> 
> But what I'm doing right now is backing out my previous attempt.  Even
> after dumping the mass code, awful lot of hooks to undo

Sorry.

> 
> My thought now is it's better to get the big changes in, then work on
> TCP I-O re-write separately (as I was doing for UDP and RDMA).  Quick
> and dirty shims, but only temporarily.

One of the key goals I have is read-frags-ahead/non-blocking decode.  Has been 
at the top of the queue since our initial meetings.  Seems like your recv() 
technique should work.  

> 
> While I'm thinking about it, why does Ganesha call svc_reg()?  AFAICT,
> that's just filling in a tree that is never used anymore.
> 
> Can I remove that code in Ganesha?  It's a pain to maintain in ntirpc.

If it's no longer effective, then eventually, sure.  Is it a substantial help 
to your work?

Matt

> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] UDP VSOCK?

2017-06-19 Thread Matt Benjamin
There is no UDP VSOCK; it's always a stream socket.  This could be done
differently, as desired.

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Friday, June 16, 2017 4:38:09 PM
> Subject: [Nfs-ganesha-devel] UDP VSOCK?
> 
> Tried to talk to DanG today, but he went home earlier than usual.  So
> maybe somebody else knows:
> 
> void Create_SVCXPRTs(void)
> {
>   protos p;
> 
>   LogFullDebug(COMPONENT_DISPATCH, "Allocation of the SVCXPRT");
>   for (p = P_NFS; p < P_COUNT; p++)
>   if (nfs_protocol_enabled(p)) {
>   Create_udp(p);
>   Create_tcp(p);
>   }
> #ifdef RPC_VSOCK
>   if (vsock)
>   create_vsock();
> #endif /* RPC_VSOCK */
> }
> 
> This creates a UDP VSOCK fd, a TCP VSOCK fd, and then another TCP VSOCK
> fd.  I'm fairly sure the the current code won't work properly for the
> UDP VSOCK, and I'm fairly sure that two TCP VSOCKs won't be used.
> 
> Also, VSOCK only needs to support NFS v3 and v4, not the other programs?
> 
> But I could be wrong?
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Mount connection timeout

2017-06-19 Thread Matt Benjamin
yeah

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: "Supriti Singh" <supriti.si...@suse.com>, 
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, June 19, 2017 9:46:32 AM
> Subject: Re: [Nfs-ganesha-devel] Mount connection timeout
> 
> 
> 
> That’s just LRU running and not finding any work to do. I’m not sure those
> messages should be LogDebug, maybe they should be LogFullDebug.
> 
> 
> 
> Frank
> 
> 
> 
> 
> From: Supriti Singh [mailto:supriti.si...@suse.com]
> Sent: Monday, June 19, 2017 4:47 AM
> To: nfs-ganesha-devel@lists.sourceforge.net
> Subject: [Nfs-ganesha-devel] Mount connection timeout
> 
> 
> 
> 
> I am using nfs-ganesha v2.5-final + CephFS FSAL. I have noticed that when I
> mount, sometimes the first mount attempt fails with a connection time out. And it
> succeeds in later attempts.
> 
> In event of timeout, the log contains the following lines many times:
> 
> ganesha.nfsd-4274[cache_lru] lru_run :INODE LRU :DEBUG :After work,
> open_fd_count:0 count:5 fdrate:1 threadwait=90
> ganesha.nfsd-4274[cache_lru] lru_run :INODE LRU :DEBUG :FD count is 0 and low
> water mark is 2048: not reaping.
> 
> 
> Can someone please explain what could be possible reason?
> 
> Thanks,
> Supriti
> 
> 
> --
> 
> 
> Supriti Singh
> 
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> 
> 
> HRB 21284 (AG Nürnberg)
> 
> 
> 
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] async dispatch not good

2017-06-19 Thread Matt Benjamin
Hi Bill,

inline

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Monday, June 19, 2017 3:04:52 AM
> Subject: [Nfs-ganesha-devel] async dispatch not good
> 
> As folks may have noticed, I've been re-working my old 2015 dispatch
> patches that eliminate the network input-side queues in Ganesha.
> 
> Matt had wanted fully async non-blocking I-O.  I've been poking at it
> for a week, and now am sure that's the wrong way to go.

I don't think so, but, below

> 
> It might still be good for FSALs.  Remains to be seen.  DanG and
> Soumya are looking at that now.
> 
> The devil in userland network I-O is system calls.  Each epoll_wait
> is a system call.  Each read or write is a system call.  Each thread
> switch is a system call.
> 
> My code in Ganesha v2.5 (NTIRPC v1.5) gets the network output down to
> one system call per request on a very hot thread.  Cannot do better,
> as trying harder would just push the data into kernel buffers,
> possibly slowing our own output (for various reasons).
> 
> Trying to re-work that for async non-blocking calls instead means
> many more system calls.  Instead of one clean writev with the TCP
> fragment header and all ready buffers in one single call, we'd at
> minimum have a call, an epoll_wait, spawn another work thread, then
> another call and/or release the buffer, rinse and repeat.

the expensive part of this (spawn) is necessary only due to aspects of the old 
design, but, considering effort, ok, below

> 
> For a long buffer chain (the times we want more performance), we'd
> have much less performance -- roughly 2 + (3 * number of buffers)
> additional system calls.  For common short response chains, still
> have the extra overhead of the epoll system call, doubling calls.
> 
> Also, using writev minimizes buffer copies.  Eliminating data
> copying will usually give far better performance.
> 
> The only thing async output is saving is waiting threads.  But I've
> already got the output threads down to the minimum (per interface).
> No gain here!
> 
> On the input side, the truly optimum reduction in system calls would
> be one read to get the TCP fragment header and up to 1500 bytes of
> data, followed (only when needed) by another read to get the entire
> rest of long fragments in one fell swoop.

well, maybe, not considering blocking?  I think we really do want to avoid 
blocking in the paths that now can/do, but, below

> 
> With async input I've tried level triggered, and am getting spurious
> epoll read data signals.  Googling shows that's been a problem since
> at least 2014, but possible to program around.

ok

> 
> Still, this could be better, had it not been terrible for output-side.
> 
> Changing to edge triggered means that every good read would be
> followed by another read to make sure that we've gotten all the data.
> That is, common small reads turn into two (2) reads.  Doubling our
> system calls in the common case is not the way to go
> 
> In conclusion, with epoll we know when input data is available, so
> input threads aren't sitting around waiting anyway, and trying to
> minimize threads results in more system calls and poorer performance.
> 
> NTIRPC already defaults to 200 worker threads.  If we need more, we
> should allocate more.  Memory should not be an issue.

it's not about memory, this is the problem we're trying to avoid

but, referring for context to our verbal discussion earlier today, your 
suggestion to hybridize the existing output side (which depends on blocking 
sockets) and an async input side using recv() seems worth at least exploring;  I 
assume you are proposing to use recv() with MSG_DONTWAIT?
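
For reference, the non-blocking input side under discussion is roughly
this shape (a sketch only; buffer sizing and the hand-off to the XDR
decoder are elided):

#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Drain whatever is currently available on a socket without blocking.
 * Returns bytes read, 0 on orderly shutdown, or -1 with errno set to
 * EAGAIN/EWOULDBLOCK once the socket is drained and the epoll rearm
 * should take over again.
 */
static ssize_t read_some(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, MSG_DONTWAIT);
    } while (n < 0 && errno == EINTR);

    return n;
}

The output side can stay as it is today: a blocking writev() of the
fragment header plus the ready buffers from the same hot thread.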

Matt

> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Need a second opinion on some code

2017-06-14 Thread Matt Benjamin
don't blame RGW;  RGW's ability to provide the cookie/offset for any name isn't 
even exposed or in use, at this point; (but it's awesome, isn't it? :)

Matt

- Original Message -
> From: "Daniel Gryniewicz" <d...@redhat.com>
> To: "Frank Filz" <ffilz...@mindspring.com>
> Cc: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Wednesday, June 14, 2017 9:56:33 AM
> Subject: Re: [Nfs-ganesha-devel] Need a second opinion on some code
> 
> I think it's worse than that.  It will blow away all dirents in a
> directory on any rename, lookup, or link, if the FSAL is not RGW (or
> rather, if the FSAL doesn't support computing cookies).  I'm not sure
> how to handle this, though.  Just putting the dirent into a loose list
> breaks enumeration order, right?
> 
> Daniel
> 
> On Tue, Jun 13, 2017 at 6:08 PM, Frank Filz <ffilz...@mindspring.com> wrote:
> > Hmm, I think the following code blows our dirent cache if we are not able
> > to
> > add the dirent for a created file to a chunk (either because there isn't a
> > chunk to add it to, or the FSAL is not RGW):
> >
> > if (new_dir_entry == allocated_dir_entry &&
> > mdcache_param.dir.avl_chunk > 0) {
> > /* If chunking, try and add this entry to a chunk. */
> > bool chunked = add_dirent_to_chunk(parent, new_dir_entry);
> >
> > if (!chunked && *invalidate) {
> > /* If chunking and invalidating parent, and
> > chunking
> >  * this entry failed, invalidate parent.
> >  */
> > mdcache_dirent_invalidate_all(parent);
> > } else if (chunked && *invalidate) {
> > /* We succeeded in adding to chunk, don't
> > invalidate
> > the
> >  * parent directory.
> >  */
> > *invalidate = false;
> > }
> > }
> >
> > This means the only time we will actually have any loose dirents is due to
> > lookups...
> >
> > I don't think we should blow out the loose dirents in this case, though we
> > need to blow out any chunks since they are no longer valid.
> >
> > Frank
> >
> >
> >
> >
> >
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] intrusive I-O patches

2017-05-30 Thread Matt Benjamin
I approve this message.  For posterity, that doesn't mean I can dictate what is 
on v2.6, but I approve creating an advanced branch that advances these goals and 
have confidence that they are shared upstream goals.

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Tuesday, May 30, 2017 4:20:03 PM
> Subject: [Nfs-ganesha-devel] intrusive I-O patches
> 
> Memorializing today's conversation with Matt.  He's finally agreed
> that it will OK to completely redo the RPC parsing for V2.6, instead
> of the slicing and dicing I've been painfully doing to avoid massive
> interface changes.
> 
> So, all the UDP and TCP input will be redone use the ntirpc IOQ
> zero-copy code that I'd developed for RDMA 2 years ago, or a port of
> the routines as needed into NFS-Ganesha.  Also, the NFS-Ganesha
> output will be redone to rid ourselves of the switch statements for
> parsing every field, as was done years ago in ntirpc.  This will
> allow passing more information about the parsing state, so that RDMA
> will (finally) be able to handle direct writes (NFS Reads).
> 
> Champing at the bit
> 
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] RPC queue enqueued/dequeued counter size

2017-05-15 Thread Matt Benjamin
Hi Sachin,

I'm pretty sure no decisions are made based on these values; they were added 
for diagnostics.  I think you're probably right, however.  I'd defer to the 
folks working on refactoring this part of the code as to whether these should 
or will exist in the medium-term future.
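
Widening them is a one-line change apiece as long as the increment
stays atomic; sketched here with the GCC/Clang builtins rather than
Ganesha's atomic wrappers:

#include <stdint.h>

/* 64-bit, so the counters effectively never wrap in production. */
static uint64_t enqueued_reqs;
static uint64_t dequeued_reqs;

static inline void count_enqueue(void)
{
    (void)__atomic_add_fetch(&enqueued_reqs, 1, __ATOMIC_RELAXED);
}

static inline void count_dequeue(void)
{
    (void)__atomic_add_fetch(&dequeued_reqs, 1, __ATOMIC_RELAXED);
}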

Matt

- Original Message -
> From: "Sachin Punadikar" <punadikar.sac...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, May 15, 2017 3:46:45 AM
> Subject: [Nfs-ganesha-devel] RPC queue enqueued/dequeued counter size
> 
> Hi,
> Recently I came across below 2 counters for RPC queue.
> static uint32_t enqueued_reqs;
> static uint32_t dequeued_reqs;
> 
> Shouldn't the counter size be uint64_t ?
> Having size as uint32_t will allow the counters to grow until 4294967295.
> Increasing the size would help production environments.
> 
> --
> with regards,
> Sachin Punadikar
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] drc refcnt

2017-05-03 Thread Matt Benjamin
Hi Guys,

To get on the record here, the current retire strategy using new requests to 
retire old ones is an intrinsic good, particularly with TCP and related 
cots-ord transports where requests are totally ordered.  I don't think moving 
to a strictly time-based strategy is preferable.  Apparently the actually 
observed or theorized issue has to do with not disposing of requests in 
invalidated DRCs?  That seems to be a special case, no?
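
To make the special case concrete: a periodic reaper along the
following lines would cover the idle or disconnected-client DRC without
touching the normal retire-on-new-request path.  The structure and
function names are invented for the sketch, not the actual dupreq code:

#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>
#include <time.h>

struct dupreq {
    TAILQ_ENTRY(dupreq) q;
    time_t completed_at;
};

struct drc {
    pthread_mutex_t mtx;
    TAILQ_HEAD(dupreq_q, dupreq) dupreq_q;   /* oldest entries at the head */
    unsigned int refcnt;                     /* one ref held per cached dupreq */
};

/* Run from a timed thread (e.g. alongside drc_free_expired): free cached
 * replies older than 'ttl' and drop the DRC reference each one held, so
 * an idle or disconnected client's DRC can eventually drain and be freed. */
static void drc_reap_expired(struct drc *drc, time_t now, time_t ttl)
{
    struct dupreq *dv;

    pthread_mutex_lock(&drc->mtx);
    while ((dv = TAILQ_FIRST(&drc->dupreq_q)) != NULL &&
           now - dv->completed_at > ttl) {
        TAILQ_REMOVE(&drc->dupreq_q, dv, q);
        drc->refcnt--;        /* the reference this dupreq held */
        free(dv);
    }
    pthread_mutex_unlock(&drc->mtx);
}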

Matt

- Original Message -
> From: "Malahal Naineni" <mala...@gmail.com>
> To: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> Cc: "Matt Benjamin" <mbenja...@redhat.com>, 
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Tuesday, May 2, 2017 2:21:48 AM
> Subject: Re: [Nfs-ganesha-devel] drc refcnt
> 
> Sorry, every cacheable request holds a ref on its DRC as well as its
> DUPREQ. The ref on DUPREQ should be released when the request goes away
> (via nfs_dupreq_rele). The ref on DRC will be released when the
> corresponding DUPREQ request gets released. Since we release DUPREQs while
> processing other requests, you are right that the DRC won't be freed if
> there are no more requests that would use the same DRC.
> 
> I think we should be freeing dupreq periodically using a timed function,
> something like that drc_free_expired.
> 
> Regards, Malahal.
> 
> 
> 
> On Tue, May 2, 2017 at 10:38 AM, Satya Prakash GS <g.satyaprak...@gmail.com>
> wrote:
> 
> > > On Tue, May 2, 2017 at 7:58 AM, Malahal Naineni <mala...@gmail.com>
> > wrote:
> > > A dupreq will place a refcount on its DRC when it calls xxx_get_drc, so
> > we
> > > will release that DRC refcount when we free the dupreq.
> >
> > Ok, so every dupreq holds a ref on the drc. In case of drc cache hit,
> > a dupreq entry can ref the
> > drc more than once. This is still fine because unless the dupreq entry
> > ref goes to zero the drc isn't freed.
> >
> > > nfs_dupreq_finish() shouldn't free its own dupreq. When it does free some
> > > other dupreq, we will release DRC refcount corresponding to that dupreq.
> >
> > > When we free all dupreqs that belong to a DRC
> >
> > In the case of a disconnected client when are all the dupreqs freed ?
> >
> > When all the filesystem operations subside from a client (mount point
> > is no longer in use),
> > nfs_dupreq_finish doesn't get called anymore. This is the only place
> > where dupreq entries are removed from
> > the drc. If the entries aren't removed from drc, drc refcnt doesn't go to
> > 0.
> >
> > >, its refcount should go to
> > > zero (maybe another ref is held by the socket itself, so the socket has
> > to
> > > be closed as well).
> > >
> > >
> > > In fact, if we release DRC refcount without freeing the dupreq, that
> > would
> > > be a bug!
> > >
> > > Regards, Malahal.
> > >
> > Thanks,
> > Satya.
> >
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
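
To make the timed-function idea in the quoted message concrete, here is a
minimal sketch of a periodic reaper in the spirit of drc_free_expired.  All
type and field names below are stand-ins, not ganesha's actual structures;
the point is only that each freed dupreq returns the DRC reference it held,
so a DRC belonging to a disconnected client can eventually reach a refcount
of zero:

    #include <sys/queue.h>
    #include <pthread.h>
    #include <time.h>

    struct dupreq_sk {
            TAILQ_ENTRY(dupreq_sk) fifo_q;
            time_t stamp;                   /* when the reply was cached */
    };

    struct drc_sk {
            pthread_mutex_t mtx;
            TAILQ_HEAD(dupreq_q_sk, dupreq_sk) dupreq_q;    /* oldest first */
            unsigned size;
    };

    /* Detach expired dupreqs under the DRC lock, then free them (and drop
     * the DRC ref each one held) without holding the lock. */
    static void drc_free_expired_sk(struct drc_sk *drc, time_t ttl, time_t now,
                                    void (*free_dupreq)(struct dupreq_sk *),
                                    void (*put_drc_ref)(struct drc_sk *))
    {
            struct dupreq_q_sk expired = TAILQ_HEAD_INITIALIZER(expired);
            struct dupreq_sk *dk;

            pthread_mutex_lock(&drc->mtx);
            while ((dk = TAILQ_FIRST(&drc->dupreq_q)) && now - dk->stamp > ttl) {
                    TAILQ_REMOVE(&drc->dupreq_q, dk, fifo_q);
                    TAILQ_INSERT_TAIL(&expired, dk, fifo_q);
                    --drc->size;
            }
            pthread_mutex_unlock(&drc->mtx);

            while ((dk = TAILQ_FIRST(&expired))) {
                    TAILQ_REMOVE(&expired, dk, fifo_q);
                    free_dupreq(dk);        /* frees the cached reply */
                    put_drc_ref(drc);       /* returns the ref this dupreq held */
            }
    }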

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] reg. drc nested locks

2017-05-03 Thread Matt Benjamin
I don't see a complete design proposal here.  Can you clarify the fuller 
picture of what you're proposing to do?

Matt

- Original Message -
> From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>, "Malahal Naineni" 
> <mala...@gmail.com>,
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Wednesday, May 3, 2017 4:14:21 PM
> Subject: Re: [Nfs-ganesha-devel] reg. drc nested locks
> 
> Thank you for the quick reply.
> 
> In dupreq_finish, as part of retiring the drc quite a few locks are
> acquired and dropped (per entry). I want to fix a bug where drc retire
> will happen as part of a different function (this will be called from
> free_expired). The existing logic gets carried over to the new
> function and I was thinking that we may not have to acquire and
> release lock so many times.
> 
> Thanks,
> Satya.
> 
> On Thu, May 4, 2017 at 1:21 AM, Matt Benjamin <mbenja...@redhat.com> wrote:
> > Hi Satya,
> >
> > Sorry, my recommendation would be, we do not change locking to be more
> > coarse grained, and in general, should update it in response to an
> > indication that it is incorrect, not to improve readability in the first
> > instance.
> >
> > Regards,
> >
> > Matt
> >
> > - Original Message -
> >> From: "Matt Benjamin" <mbenja...@redhat.com>
> >> To: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> >> Cc: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni"
> >> <mala...@gmail.com>
> >> Sent: Wednesday, May 3, 2017 3:43:06 PM
> >> Subject: Re: [Nfs-ganesha-devel] reg. drc nested locks
> >>
> >> No?
> >>
> >> Matt
> >>
> >> - Original Message -
> >> > From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> >> > To: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni"
> >> > <mala...@gmail.com>
> >> > Sent: Wednesday, May 3, 2017 3:34:31 PM
> >> > Subject: [Nfs-ganesha-devel] reg. drc nested locks
> >> >
> >> > Hi,
> >> >
> >> > In nfs_dupreq_start and nfs_dupreq_finish when allocating/freeing a
> >> > dupreq_entry we are trying hard to keep both dupreq_q and the rbtree
> >> > in sync acquiring both the partition lock and the drc (t->mtx,
> >> > drc->mtx). This requires dropping and reacquiring locks at certain
> >> > places. Can these nested locks be changed to take locks one after the
> >> > other.
> >> >
> >> > For example at the time of allocation, we could choose to do this -
> >> >
> >> > PTHREAD_MUTEX_lock(&t->mtx); /* partition lock */
> >> > nv = rbtree_x_cached_lookup(&drc->xt, t, &dk->rbt_k, dk->hk);
> >> > if (!nv) {
> >> >         dk->refcnt = 2;
> >> >         (void)rbtree_x_cached_insert(&drc->xt, t,
> >> >                                      &dk->rbt_k, dk->hk);
> >> >         PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */
> >> >
> >> >         PTHREAD_MUTEX_lock(&drc->mtx);
> >> >         TAILQ_INSERT_TAIL(&drc->dupreq_q, dk, fifo_q);
> >> >         ++(drc->size);
> >> >         PTHREAD_MUTEX_unlock(&drc->mtx);
> >> > }
> >> >
> >> > I am assuming this would simplify the lock code a lot.
> >> > If there is a case where this would introduce a race please let me know.
> >> >
> >> > Thanks,
> >> > Satya.
> >> >
> >> > --
> >> > Check out the vibrant tech community on one of the world's most
> >> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> > ___
> >> > Nfs-ganesha-devel mailing list
> >> > Nfs-ganesha-devel@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> >> >
> >>
> >> --
> >> Matt Benjamin
> >> Red Hat, Inc.
> >> 315 West Huron Street, Suite 140A
> >> Ann Arbor, Michigan 48103
> >>
> >> http://www.redhat.com/en/technologies/storage
> >>
> >> tel.  734-821-5101
> >> fax.  734-769-8938
> >> cel.  734-216-5309
> >>
> >
> > --
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel.  734-821-5101
> > fax.  734-769-8938
> > cel.  734-216-5309
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] reg. drc nested locks

2017-05-03 Thread Matt Benjamin
Hi Satya,

Sorry, my recommendation would be, we do not change locking to be more coarse 
grained, and in general, should update it in response to an indication that it 
is incorrect, not to improve readability in the first instance.

Regards,

Matt

- Original Message -
> From: "Matt Benjamin" <mbenja...@redhat.com>
> To: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> Cc: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni" 
> <mala...@gmail.com>
> Sent: Wednesday, May 3, 2017 3:43:06 PM
> Subject: Re: [Nfs-ganesha-devel] reg. drc nested locks
> 
> No?
> 
> Matt
> 
> - Original Message -
> > From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> > To: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni"
> > <mala...@gmail.com>
> > Sent: Wednesday, May 3, 2017 3:34:31 PM
> > Subject: [Nfs-ganesha-devel] reg. drc nested locks
> > 
> > Hi,
> > 
> > In nfs_dupreq_start and nfs_dupreq_finish when allocating/freeing a
> > dupreq_entry we are trying hard to keep both dupreq_q and the rbtree
> > in sync acquiring both the partition lock and the drc (t->mtx,
> > drc->mtx). This requires dropping and reacquiring locks at certain
> > places. Can these nested locks be changed to take locks one after the
> > other.
> > 
> > For example at the time of allocation, we could choose to do this -
> > 
> > PTHREAD_MUTEX_lock(&t->mtx); /* partition lock */
> > nv = rbtree_x_cached_lookup(&drc->xt, t, &dk->rbt_k, dk->hk);
> > if (!nv) {
> >         dk->refcnt = 2;
> >         (void)rbtree_x_cached_insert(&drc->xt, t,
> >                                      &dk->rbt_k, dk->hk);
> >         PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */
> >
> >         PTHREAD_MUTEX_lock(&drc->mtx);
> >         TAILQ_INSERT_TAIL(&drc->dupreq_q, dk, fifo_q);
> >         ++(drc->size);
> >         PTHREAD_MUTEX_unlock(&drc->mtx);
> > }
> > 
> > I am assuming this would simplify the lock code a lot.
> > If there is a case where this would introduce a race please let me know.
> > 
> > Thanks,
> > Satya.
> > 
> > ------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> > 
> 
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] reg. drc nested locks

2017-05-03 Thread Matt Benjamin
No?

Matt

- Original Message -
> From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net, "Malahal Naineni" 
> <mala...@gmail.com>
> Sent: Wednesday, May 3, 2017 3:34:31 PM
> Subject: [Nfs-ganesha-devel] reg. drc nested locks
> 
> Hi,
> 
> In nfs_dupreq_start and nfs_dupreq_finish when allocating/freeing a
> dupreq_entry we are trying hard to keep both dupreq_q and the rbtree
> in sync acquiring both the partition lock and the drc (t->mtx,
> drc->mtx). This requires dropping and reacquiring locks at certain
> places. Can these nested locks be changed to take locks one after the
> other.
> 
> For example at the time of allocation, we could choose to do this -
> 
> PTHREAD_MUTEX_lock(&t->mtx); /* partition lock */
> nv = rbtree_x_cached_lookup(&drc->xt, t, &dk->rbt_k, dk->hk);
> if (!nv) {
>         dk->refcnt = 2;
>         (void)rbtree_x_cached_insert(&drc->xt, t,
>                                      &dk->rbt_k, dk->hk);
>         PTHREAD_MUTEX_unlock(&t->mtx); /* partition lock */
>
>         PTHREAD_MUTEX_lock(&drc->mtx);
>         TAILQ_INSERT_TAIL(&drc->dupreq_q, dk, fifo_q);
>         ++(drc->size);
>         PTHREAD_MUTEX_unlock(&drc->mtx);
> }
> 
> I am assuming this would simplify the lock code a lot.
> If there is a case where this would introduce a race please let me know.
> 
> Thanks,
> Satya.
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] drc refcnt

2017-05-01 Thread Matt Benjamin
Hi Satya,

I don't -think- that's the case (that DRCs are leaked).  If so, we would 
certainly wish to correct it.  Malahal has most recently updated these code 
paths.

Regards,

Matt

- Original Message -
> From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, May 1, 2017 11:08:48 AM
> Subject: [Nfs-ganesha-devel] drc refcnt
> 
> Hi,
> 
> DRC refcnt is incremented on every get_drc. However, every
> nfs_dupreq_finish doesn't call a put_drc. How is it ensured that the
> drc refcnt drops to zero. On doing an umount, is drc eventually
> cleaned up.
> 
> Thanks,
> Satya.
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] drc and non-cacheable ops

2017-05-01 Thread Matt Benjamin
Hi Satya,

That is expected, yes.  I'm not aware of all possible implications.  The issue 
of compound ops, specifically, is evidently only present in NFSv4.0 (in 4.1, 
the DRC is not used).

Matt

- Original Message -
> From: "Satya Prakash GS" <g.satyaprak...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, May 1, 2017 10:58:11 AM
> Subject: Re: [Nfs-ganesha-devel] drc and non-cacheable ops
> 
> Can somebody please reply to this.
> 
> Thanks,
> Satya.
> 
> On Wed, Apr 26, 2017 at 3:02 PM, Satya Prakash GS
> <g.satyaprak...@gmail.com> wrote:
> > Hi,
> >
> > I have been looking at the drc code, I see operations like READ,
> > READDIR, etc are not cached in drc. Can a compound operations have mix
> > of both cacheable and non-cacheable operations. For example, can
> > client send both SETATTR and READ as part of one compound operation
> > (if concurrent operations are going on). If there is a mix of
> > operations looks like DRC doesn't cache the operation. Is this ok ?
> >
> > Thanks,
> > Satya.
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
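
As a rough sketch of the behavior described above (purely illustrative, not
ganesha's actual DRC policy code): a v4.0 compound would only have its reply
cached if every op in it is one the server treats as cacheable, so a compound
mixing SETATTR with READ falls through uncached.  The op numbers are the
standard NFSv4 values; everything else here is made up:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    static bool op_is_cacheable_sk(uint32_t op)
    {
            switch (op) {
            case 25:        /* OP_READ */
            case 26:        /* OP_READDIR */
                    return false;   /* large replies, not worth replaying */
            default:
                    return true;
            }
    }

    /* Cache the compound's reply only if every op qualifies. */
    static bool compound_is_cacheable_sk(const uint32_t *ops, size_t nops)
    {
            size_t i;

            for (i = 0; i < nops; i++)
                    if (!op_is_cacheable_sk(ops[i]))
                            return false;
            return true;
    }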

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] 2.5rc2 and FSAL_RGW, no bueno

2017-04-24 Thread Matt Benjamin
Hi Frank,

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: d...@redhat.com, "Dominique Martinet" <dominique.marti...@cea.fr>
> Cc: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, April 24, 2017 12:53:44 PM
> Subject: Re: [Nfs-ganesha-devel] 2.5rc2 and FSAL_RGW, no bueno
> 
> > >> Our proposal is this:
> > >>
> > >> Branch when Ganesha goes into -rc, rather than on .0 release.  This
> > >> allows us to modify the -rc version of Ganesha to build and work



> 
> Typically while we are in rc, stuff that will not go into the release under
> rc is just held until we open dev again, so no double merges.
> 
> So the remaining issue is what is the best way to converge on stable code
> that will build with release versions of other packages.
> 
> What kinds of fixups are we talking here? Can we just revert the fixups when
> we open dev of the next release?
> 

The main issue is the changes that adapt FSAL_RGW to use an updated version of our 
librgw interface.  To date, the changes in any given release have been minor, 
usually adding or adjusting the arguments to a well-known operation or callback 
function signature.

Matt

> Frank
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] READDIR readahead

2017-04-03 Thread Matt Benjamin
inline

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, April 3, 2017 6:56:45 PM
> Subject: [Nfs-ganesha-devel] READDIR readahead
> 

Not a bug, really, I had not actually implemented the stop conditions, pending 
further changes.  Need to plumb them back into librgw, too, I'm pretty 
sure--thanks for updating here, though.

Thanks!

Matt

> Along the way, I also realized there was a bug in FSAL_RGW...
> 

> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] READDIR readahead

2017-04-03 Thread Matt Benjamin
Thanks, Frank, will explore w/RGW.

Matt

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, April 3, 2017 6:56:45 PM
> Subject: [Nfs-ganesha-devel] READDIR readahead
> 
> Matt had asked for an ability for readdir chunking to be able to absorb more
> than one chunks worth of entries if it made sense for the FSAL to read a
> larger number of entries in one go.
> 
> It turns out that it was very simple to support this.
> 
> And for gravy, there's an example of FSAL_VFS which increases the getdents
> buffer size and supports readahead.
> 
> Along the way, I also realized there was a bug in FSAL_RGW...
> 
> It may be worth asking if we should make the FSAL_VFS readdir buffer
> resizable and allow enabling readahead.
> 
> You can find it in my readahead branch:
> 
> https://github.com/ffilz/nfs-ganesha/commits/readahead
> 
> Or review on gerrithub:
> 
> https://review.gerrithub.io/#/c/32/
> https://review.gerrithub.io/#/c/355563/
> https://review.gerrithub.io/#/c/355564/
> https://review.gerrithub.io/#/c/355565/
> 
> You can also pull the whole branch from gerrithub with this command (which
> will create and checkout a branch called readahead in your repo):
> 
> git fetch https://ff...@review.gerrithub.io/a/ffilz/nfs-ganesha
> refs/changes/65/355565/1 && git checkout FETCH_HEAD -b readahead
> 
> This last is taken from the download menu in the final patch, you chose the
> checkout option, cut and paste that. I always add the -b {branchname} to the
> end of it to actually checkout a branch. This can be used to pull down
> anyone's patch set even if they don't have a github repo (or you don't know
> what it is, or they didn't push to it). I use this all the time for doing
> weekly merges if someone has a patchset with more than 2 or 3 patches.
> 
> Thanks
> 
> Frank
> 
> 
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
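
For a concrete picture of the buffer-size knob Frank describes, a
stripped-down sketch (not the actual FSAL_VFS code) of reading a directory
with getdents64 and a caller-chosen buffer; a larger buffer is what lets one
FSAL readdir call feed more than one mdcache chunk:

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Fills 'buf' with packed struct linux_dirent64 records; a bigger
     * 'bufsz' means more entries per syscall, i.e. readahead beyond a
     * single chunk.  The caller walks the records via each d_reclen. */
    static long read_dirents_sk(int dirfd, void *buf, size_t bufsz)
    {
            return syscall(SYS_getdents64, dirfd, buf, bufsz);
    }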

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] please all FSAL maintainers/experts respond - READIR and directory chunking and dirent insert

2017-03-23 Thread Matt Benjamin
Hi Frank,

I think you've well represented the RGW situation, with the caveat:  as I 
mentioned in IRC, although my mental habit is to think of the cookie associated 
w/an entry as "its address" (because, as it happens, the cookie is in fact a 
hash computed from the name of the object), I'm no longer clear on the 
distinction between that address and the address of some "next" thing...

The reason is, independent of the perfect cookie stability, what RGW actually 
does when asked for the next entries in the sequence at "foo" is return the 
next names in the alphabetic key sequence AFTER "foo" (that occur in this 
directory path)--so, for RGW, the cookie for "foo" in fact does refer to the 
next entry.

Conceptual help appreciated :)

Matt

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Tuesday, March 21, 2017 8:09:34 PM
> Subject: [Nfs-ganesha-devel] please all FSAL maintainers/experts respond -
> READIR and directory chunking and dirent
> insert
> 

> For FSAL_RGW, we would like the cookie to be the "address" of the entry
> rather than the next entry
> Which allows us to compute the cookie for an inserted dirent (from lookup,
> create, link, or rename)
> 
> For continuing to read a directory having read some number of chunks in, we
> would like to use a whence that will find the next directory entry after the
> last one in the previous chunk.
> 
> Now here is one problem, for FSAL_VFS if we use the d_off as the cookie,
> that is actually the "address" of the next entry AT THAT TIME. That means
> that if we do a lseek to the last cookie in a chunk, we may NOT find the
> actual next entry. There may also be an issue due to . and .. sorting
> somewhere in the middle of the directory (at least on my ext4 filesystem,
> the "address" of . is always 0x4c470ee8300a65ab (which means that will be
> the d_off for whichever entry precedes .) and .. is always
> 0x68ec4bc2e1982399.
> 
> If we aren't trying to insert dirents, that may be ok. If so, we can
> probably live with RGW cookies being the address of the entry while VFS
> cookie are the address of the current next entry, and so long as those FSALs
> which return cookie as the address of the entry, do indeed provide the NEXT
> entry when we provide that cookie as whence on readdir, everything should
> work.
> 
> But I'm also trying to test the dirent insert using FSAL_VFS, and it isn't
> working...
> 
> The problem is an insert that becomes the new first directory entry, or an
> insert that slips in just before the . or .. entries.
> 
> In order to make a workable ability to insert dirents, FSAL_VFS readdir
> COULD return the previous cookie as the cookie for an entry. In that case,
> after doing an lseek, it would just have to skip the first entry. For ext4
> it MIGHT work to actually lseek to whence+1...
> 
> FSAL_VFS compute_readdir_cookie would of course just return the d_off from
> the entry prior to finding the named entry.
> 
> Then one problem remains for FSAL_VFS. We can't get the actual "address" of
> the very first dirent. This could be handled by the following mechanism:
> 
> If we insert a new dirent, and compute_readdir_cookie returns 0 for it, we
> must then call compute_readdir_cookie on the previous first entry (which
> will return its actual address now that it no longer is the first entry in
> the directory), and move it in the AVL tree so we can now insert the new 0.
> 
> It would really help to understand how Gluster and Ceph readdir with a
> non-zero whence actually works, how do your cookies work?
> 
> How much of an issue do you feel chunking possibly missing new entries in a
> directory really is? Note that if we decide our current attributes are invalid,
> refresh them, and detect mtime changes, then we will flush the dirents, so
> this MAY not be that much of an issue. On the other hand, it also means that
> even if we dump the dirent cache, a client that doesn't give up, and sends a
> non-zero whence may miss entries that folks feel it should have found.
> 
> Thanks
> 
> Frank
> 

> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
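
A small sketch of the cookie convention being described for RGW, with purely
illustrative names: the cookie identifies the entry itself (a hash of its
name), and a readdir continuation whose whence equals that cookie returns the
entries that sort strictly after it, which is what lets the cookie of an
inserted dirent be computed from the name alone:

    #include <stddef.h>
    #include <stdint.h>

    struct dent_sk {
            const char *name;
            uint64_t cookie;        /* e.g. a hash of the name */
    };

    /* Given entries already in readdir order, return the index at which a
     * continuation with 'whence' resumes (0 means start from the top). */
    static size_t readdir_resume_sk(const struct dent_sk *ents, size_t n,
                                    uint64_t whence)
    {
            size_t i;

            if (whence == 0)
                    return 0;
            for (i = 0; i < n; i++)
                    if (ents[i].cookie == whence)
                            return i + 1;   /* cookie names the entry itself */
            /* cookie no longer present; a real implementation would fall
             * back to the next name in sort order */
            return n;
    }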

Re: [Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and ntirpc?

2017-03-10 Thread Matt Benjamin
Hi,

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>
> Cc: d...@redhat.com, nfs-ganesha-devel@lists.sourceforge.net
> Sent: Friday, March 10, 2017 2:21:41 PM
> Subject: Re: [Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and 
> ntirpc?
> 
> On 3/9/17 1:44 PM, Matt Benjamin wrote:
> > But, isn't su_cache...NULL?
> >
> Aha, I see that you are correct.  It is only set non-NULL in
> svc_dg_enablecache(), and that's never called.   Anywhere.
> 
> So we have this useless facility that I (and Malahal) have been
> trying to keep up-to-date with changes, and I've recently fixed
> the XXX !MT-SAFE (e89139b), and that's all for nought.
> 
> We don't cache TCP.  We don't cache RDMA.
> 
> This code is an anachronism, and needs to be purged with extreme
> prejudice.  It was badly written, and it's a shame to keep fixing.

Well, it is what it is.  Pretty durn old, and never used in nfs-ganesha, I 
don't believe.

> 
> If we get rid of it, we can use _ioq for output, and get rid of
> the extra locks.  Don't know how important to speed up UDP, but
> we could

This seems potentially a useful improvement, I would say, so that provides some 
positive motivation for gc'ing the legacy cache stuff, I guess.

Matt

> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] dispatch queues

2017-03-09 Thread Matt Benjamin
Hi Bill,

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Thursday, March 9, 2017 2:44:29 AM
> Subject: Re: [Nfs-ganesha-devel] dispatch queues
> 
> On 3/8/17 5:34 AM, William Allen Simpson wrote:
> > Ganesha currently has 2 phases of dispatch queuing: one for input and
> > decoding, then another for executing/encoding output.  (I've fixed the
> > third queue for later sending the output, where the thread should stay
> > hot as long as there's more to process.)
> >
> > On Monday, Matt told me we were having problems with sequential IO.
> > Wishing that somebody had mentioned this to me sooner.
> >
> > [...]
> >
> After a somewhat loud discussion with Matt, we've agreed on a
> different approach.  This will also be useful for fully async IO
> that is planned for V2.6.

Um, I don't think this statement represents either of the two internal meetings 
we had accurately, but ok.

> 
> The sequential IO reports were from specific customers.

For posterity, the feedback I've seen regarding sequential i/o was provided by 
upstream folks on our regular concall. nobody uses the term "customer" on this 
list, for obvious reasons.

> 
> I'm going to code something more like Weighted Fair Queuing that
> I've mentioned on this list back in June 2015.  The only weight is
> that we want any initial TCP connection to be handled as rapidly as
> possible to get the standard TCP slow start moving.
> 
> Are there other priorities that we should handle?
> 
> Otherwise, we really need a more even handed approach across large
> numbers of clients, keeping each client's requests in strict order,
> even though some of them could be "faster" than others.  The fair
> queuing should also help prevent traffic spikes.
> 
> I think I can have something coded by next week.  I'd already done
> some preliminary work in 2015.  But the time constraint means this
> will be pretty bare bones for V2.5.

Well, as I think we've all agreed, nothing like this is going into 2.5.  
Anything that DOES make it to the nfs-ganesha upstream is going to need to be 
well motivated, well measured, and matured.

> 
> To really do a good job, we need some kind of FSAL feedback API.
> I'm going to ask the Gluster folks for some help on designing it, so
> that we have a good use case and testing infrastructure.  But we'll
> post the design iterations here in the same fashion as an IETF
> Working Group, so that maybe we can get other FSAL feedback, too.
> 
> Is anybody specifically interested in helping design the API?

As the proposer of this idea, I'm interested in seeing experimental prototypes 
that help us establish and refine something that works.  Let's post running 
code, and then write specs.

That said, upstream participation is welcome.

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
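
For readers unfamiliar with the fair-queuing idea under discussion, a
bare-bones round-robin sketch; every name here is made up and this is not
proposed ganesha code.  One FIFO per client, serviced in rotation, keeps each
client's requests in order while preventing any single client from
monopolizing the dispatcher:

    #include <stddef.h>
    #include <sys/queue.h>

    struct req_sk {
            TAILQ_ENTRY(req_sk) q;
    };

    struct client_q_sk {
            TAILQ_ENTRY(client_q_sk) rr;    /* position in the round-robin ring */
            TAILQ_HEAD(, req_sk) reqs;      /* this client's requests, in order */
    };

    TAILQ_HEAD(rr_ring_sk, client_q_sk);

    /* Pop the next request, rotating clients so each gets a turn. */
    static struct req_sk *fq_next_sk(struct rr_ring_sk *ring)
    {
            struct client_q_sk *cq, *stop = NULL;
            struct req_sk *req = NULL;

            while ((cq = TAILQ_FIRST(ring)) && cq != stop) {
                    if (!stop)
                            stop = cq;      /* remember where this pass began */
                    TAILQ_REMOVE(ring, cq, rr);
                    TAILQ_INSERT_TAIL(ring, cq, rr);
                    req = TAILQ_FIRST(&cq->reqs);
                    if (req) {
                            TAILQ_REMOVE(&cq->reqs, req, q);
                            break;
                    }
            }
            return req;
    }

A priority hook for brand-new TCP connections, as mentioned above, would
amount to checking a separate "new connections" list before entering the
round-robin loop.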

--
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and ntirpc?

2017-03-09 Thread Matt Benjamin
Hi Bill,

As I explained in detail, this code path is used in svc_run, but nfs-ganesha 
doesn't use it.

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Thursday, March 9, 2017 9:21:38 AM
> Subject: [Nfs-ganesha-devel] UDP duplicate cache in both Ganesha and ntirpc?
> 
> Anybody have any objections to my removing the ntirpc version?
> 
> Clearly, this is done in RPCAL/nfs_dupreq.c, so why is it also in
> libntirpc/src/svc_dg.c?
> 
> According to blame, Matt, Malahal, and Frank have all worked on this,
> but not since early 2015.
> 
> --
> Announcing the Oxford Dictionaries API! The API offers world-renowned
> dictionary content that is easy and intuitive to access. Sign up for an
> account today to start using our lexical data to power your apps and
> projects. Get started today and enter our developer competition.
> http://sdm.link/oxford
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] pNFS with CephFS and RGW

2017-03-02 Thread Matt Benjamin
Hi Supriti,

Neither FSAL currently supports pNFS.  The Ceph fsal has vestigial bits of a 
pNFS files layout that aren't currently supported.

Matt

- Original Message -
> From: "Supriti Singh" <supriti.si...@suse.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Thursday, March 2, 2017 9:06:27 AM
> Subject: [Nfs-ganesha-devel] pNFS with CephFS and RGW
> 
> Hi,
> 
> Is it possible to use pNFS protocol for CephFS and RGW FSAL?
> If yes, how to specify it in the config file?
> 
> I could not find any documentation regarding the same.
> 
> Thanks,
> Supriti
> 
> --
> Supriti Singh
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Work in Progress Readdir Chunking Posted

2017-02-23 Thread Matt Benjamin
Hi,

- Original Message -

> 
> Another thought is to look at the nlinks in the directory and decide to
> cache or chunk.

Please do not (invariantly) assume that nlinks is accurate until readdir() :)

Matt

> 
> Thanks
> 
> Frank
> 
> 
> 
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Dirent invalidation on up call

2017-02-15 Thread Matt Benjamin
er, oh noes...

Matt

- Original Message -
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Wednesday, February 15, 2017 7:43:40 PM
> Subject: [Nfs-ganesha-devel] Dirent invalidation on up call
> 
> I'm looking at when dirents are invalidated, and it looks like
> MDCACHE_DIR_POPULATED is cleared on up call, but nothing actually cleans out
> the dirents...
> 
> Is this an omission?
> 
> Frank
> 
> 
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
> 
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] segv in mdc_up_invalidate (synchronous upcall)

2017-01-23 Thread Matt Benjamin
Ok.  Will try that out.

Matt

- Original Message -
> From: "Daniel Gryniewicz" <d...@redhat.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>, "NFS Ganesha Developers" 
> <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Monday, January 23, 2017 9:01:52 AM
> Subject: Re: segv in mdc_up_invalidate (synchronous upcall)
> 
> You're supposed to pass the up_export that was in the fsal_up_vector
> that was passed in during the create_export() call.  So in RGW's case,
> it should be export.up_ops->up_export that is passed.
> 
> Daniel
> 
> On 01/20/2017 11:06 PM, Matt Benjamin wrote:
> > Responding to myself, in part:
> >
> > Looks like fsal_export.super_export "works" but presumes there is one, or
> > at least would if I can safely decide whether to pass super_export if
> > present?  Or something.
> >
> > Matt
> >
> > - Original Message -
> >> From: "Matt Benjamin" <mbenja...@redhat.com>
> >> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> >> Cc: "Daniel Gryniewicz" <d...@redhat.com>
> >> Sent: Friday, January 20, 2017 10:41:50 PM
> >> Subject: segv in mdc_up_invalidate (synchronous upcall)
> >>
> >>
> >> try-expire ev:
> >> 

Re: [Nfs-ganesha-devel] segv in mdc_up_invalidate (synchronous upcall)

2017-01-20 Thread Matt Benjamin
Responding to myself, in part:

Looks like fsal_export.super_export "works" but presumes there is one, or at 
least would if I can safely decide whether to pass super_export if present?  Or 
something.

Matt

- Original Message -
> From: "Matt Benjamin" <mbenja...@redhat.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>
> Cc: "Daniel Gryniewicz" <d...@redhat.com>
> Sent: Friday, January 20, 2017 10:41:50 PM
> Subject: segv in mdc_up_invalidate (synchronous upcall)
> 
> 
> try-expire ev:
> 

[Nfs-ganesha-devel] segv in mdc_up_invalidate (synchronous upcall)

2017-01-20 Thread Matt Benjamin

try-expire ev: 

Re: [Nfs-ganesha-devel] nTI-RPC refcounting and locking

2017-01-16 Thread Matt Benjamin
It sounds like your change is a step in the direction of unifying CLNT and 
SVCXPRT handle structures.  As we've discussed off-list, if you take on the 
project of unifying the actual handle structures, you get the lock 
consolidation for free.  In any event, if rpc_dplx_rec contains a service 
handle expanded, it appears to need a client handle as well.

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "NFS Ganesha Developers" <nfs-ganesha-devel@lists.sourceforge.net>, "Swen 
> Schillig" <s...@vnet.ibm.com>
> Sent: Monday, January 16, 2017 1:53:24 PM
> Subject: [Nfs-ganesha-devel] nTI-RPC refcounting and locking
> 
> Swen, I've been looking at your patch, and it has some good ideas.
> For some odd reason, I awoke at 1:30 am thinking about it, and
> got up and wrote some code.
> 
> I've taken another patch of mine, and added the SVCXPRT into the
> rpc_dplx_rec, eliminating the refcnt entirely (using the SVCXPRT).
> 
> After all, there's no reason to zalloc them separately.  They
> always are created at the same time.
> 
> So I'm wondering about your thoughts on the locking.  They seem
> redundant.  I'm thinking about changing REC_LOCK to use the
> SVCXPRT xp_lock, instead.
> 
> There's a spot in the existing rpc_dplx_rec creation code where
> there's a timing hole in the code after an existing one is
> found so the extra refcount is decremented.  Another process
> could also decrement and free, and there could be a pointer into
> freed memory.  Unifying the lock would be one solution (better
> and faster than the usual solution with two locks).
> 
> The SVCXPRT lock code has a lot more debugging and testing, too.
> 
> Any other related ideas?
> 
> BTW, I got rid of the , too.  Changed it to a callback
> function ;)
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
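
The embedding Bill describes can be pictured with this hedged sketch (stand-in
names, not the real ntirpc structures): allocating the transport handle inside
the duplex record gives one allocation, one lock, and one refcount shared by
both views of the connection, which is where the REC_LOCK/xp_lock unification
comes from:

    #include <pthread.h>
    #include <stddef.h>

    struct svcxprt_sk {
            pthread_mutex_t xp_lock;        /* would also serve as REC_LOCK */
            int xp_refcnt;                  /* single shared refcount */
            /* ... transport fields ... */
    };

    struct rpc_dplx_rec_sk {
            struct svcxprt_sk xprt;         /* embedded, created with the record */
            /* ... duplex-record fields ... */
    };

    /* Recover the record from a transport pointer, container_of style. */
    #define rec_of_xprt_sk(p) \
            ((struct rpc_dplx_rec_sk *)((char *)(p) - \
                    offsetof(struct rpc_dplx_rec_sk, xprt)))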

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Readdir results

2017-01-11 Thread Matt Benjamin
Frank's result based on testing of fsal vfs can't be generalized to 
configurations which actually have latency, so, no?

Matt

- Original Message -
> From: "William Allen Simpson" <william.allen.simp...@gmail.com>
> To: "Frank Filz" <ffilz...@mindspring.com>, 
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Wednesday, January 11, 2017 5:19:46 AM
> Subject: Re: [Nfs-ganesha-devel] Readdir results
> 
> On 1/10/17 4:29 PM, Frank Filz wrote:
> > It looks like dirent caching does buy us something, though interestingly
> > the
> > initial populating tends to be the quickest run...
> >
> If the initial populating is the quickest run, there must be some
> interaction between the cache and the re-fetch that's causing thrashing.
> 
> I make no pretense of understanding this code.  But my gut feeling at this
> point is to toss this caching entirely.  Faster stores and faster networks
> mean fetching the entries should be fairly low latency.  Need to optimize
> moving the data through the system, with less examining it.
> 
> --
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] nfs over rgw - new bucket not been added to mdcache

2017-01-03 Thread Matt Benjamin
Hi Tao,

inline

- Original Message -
> From: "Tao CHEN" <sebastien.che...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, January 2, 2017 11:08:11 PM
> Subject: [Nfs-ganesha-devel] nfs over rgw - new bucket not been added to  
> mdcache
> 
> 
> 
> Hi all,
> I'm using nfs-ganesha v2.4 to build a nfs over rgw with Ceph v11.0.2. I did
> some experiments and I faced something that I can't understand:
> 
> 1. I tried to create a new bucket in the ceph cluster with s3cmd and the new
> bucket was successfully created. However, I couldn't see the new bucket
> in the mount point, though I can still access this bucket in the mount
> point.

There was a bug with this behavior in early Jewel, but it should be fixed, so
I wouldn't expect this; I'll try to verify on master.

> 
> 2. I mounted 2 points(nfs-client A and nfs-client B). When I created a new
> bucket in nfs-client A, the new bucket showed up in the nfs-client B with a
> short delay. And the new bucket had been successfully created in the
> backend(ceph cluster) too.

good...nb, we're not doing rgw-side invalidates of ganesha yet, which I think
is the root cause of the different issues you're seeing here.

The design goals for rgw + nfs to date are to have a more relaxed namespace
consistency than traditional nfs, but clearly we want to be "eventually
consistent." :)

You can potentially address this issue in the short run by shortening ganesha's
cache expiration time.

> 
> I checked the log file and the source code, and I found that, when we create a
> new bucket in the ceph cluster, nfs-ganesha syncs the FSAL but not MDCache. So in
> the mount point we cannot see the new bucket. However, if we create a new bucket
> in the nfs-client, nfs-ganesha will sync both FSAL and MDCache.
> 
> It doesn't seem like a bug; can somebody tell me why nfs-ganesha works this way?
> Would it cost too much to sync both FSAL and MDCache when we update the ceph
> cluster?
> 
> Thanks for your attention! --
> 
> 
> Tao CHEN
> 

Regards,

Matt

> 
> 
> 
> Élève ingénieur Système Réseaux et Télécommunications
> 
> 
> Université de Technologie de Troyes(UTT)
> 
> 
> Email: sebastien.che...@gmail.com
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Kerberos: Not working because response uses cached krb service from authgss_hash

2016-12-19 Thread Matt Benjamin
Hi Sriram,

Please send your change as a pull request against 
https://github.com/nfs-ganesha/ntirpc.  We need to take some care to ensure 
that we properly enforce service and QOP guarantees.  My understanding would 
have been that any request "still being processed" has been validated and 
unwrapped.  If that's the case, then I do suspect that any further use of the 
request version of the service value is valid.

Matt

- Original Message -
> From: "sriram patil" <spsrirampa...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, December 19, 2016 2:23:36 AM
> Subject: [Nfs-ganesha-devel] Kerberos: Not working because response uses  
> cached krb service from authgss_hash
> 
> 
> 
> Hi,
> 
> When handling kerberos requests ganesha fetches the cached svc_rpc_gss_data
> from authgss_hash. If the kerberos service (authentication, integrity or
> privacy) do not match with the one parsed from the request, ganesha changes
> the service value in the cache. And continues to use the cached object for
> all the further verification and when sending response to the client. Note
> that there is no local copy of the gss data in the request, it uses the
> cached object.
> 
> 
> 
> 
> Code snippet which does the above mentioned lookup:
> 
> 
> file: src/libntirpc/src/svc_auth_gss.c function: _svcauth_gss
> 
> /* Context lookup. */
> if ((gc->gc_proc == RPCSEC_GSS_DATA)
>     || (gc->gc_proc == RPCSEC_GSS_DESTROY)) {
>         /* XXX fix prototype, toss junk args */
>         gd = authgss_ctx_hash_get(gc);
>         if (!gd)
>                 svcauth_gss_return(AUTH_REJECTEDCRED);
>         gd_hashed = true;
>         if (gc->gc_svc != gd->sec.svc)
>                 gd->sec.svc = gc->gc_svc;
> }
> 
> 
> 
> 
> Now let’s assume that the cached gss service is set to privacy (3). Before
> the ongoing request can proceed, a new request comes in with OP_RENEW and
> gss service set to integrity (2). As specified in the above snippet, this
> will change the gss service value in the cache to integrity. This will
> affect all the requests which are still being processed and may respond to
> client with an incorrect gss service. Because of this the nfs client is
> unable to interpret the response and fails with EIO. I am using linux nfs
> client so it fails in method gss_unwrap_resp.
> 
> I am continuously hitting this issue in case of server restarts when mounted
> on the client with kerberos privacy. Is there any reason why we use the gss
> service from the cache, though we have a local copy parsed from the actual
> request stored in (rq_clntcred)?
> 
> 
> I have tried a fix to always use the gss service from the request
> (rq_clntcred). This is working as expected and no errors on the client side.
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> Sriram
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
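
To make the direction of the proposed fix concrete without guessing at the
actual patch, a minimal sketch with made-up names: the service parsed from
this request is kept with the request and consulted when building the reply,
while the shared gd->sec.svc that other in-flight requests depend on is left
untouched:

    /* Values mirror rpcsec_gss_svc_none/integrity/privacy; the type and
     * struct are illustrative stand-ins, not the ntirpc definitions. */
    typedef enum {
            GSS_SVC_SK_NONE = 1,
            GSS_SVC_SK_INTEGRITY = 2,
            GSS_SVC_SK_PRIVACY = 3
    } gss_svc_sk_t;

    struct gss_req_ctx_sk {
            gss_svc_sk_t svc;       /* copy of gc->gc_svc, per request */
    };

    /* Saved once when the request is verified... */
    static inline void gss_req_save_svc_sk(struct gss_req_ctx_sk *r,
                                           gss_svc_sk_t svc)
    {
            r->svc = svc;
    }

    /* ...and the wrap/reply path asks the request, not the cached context. */
    static inline gss_svc_sk_t gss_req_svc_sk(const struct gss_req_ctx_sk *r)
    {
            return r->svc;
    }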

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] [ntirpc] closed PRs

2016-12-12 Thread Matt Benjamin
More to the point, Swen, are you able to re-open any relevant PRs?

Thanks,

Matt

- Original Message -
> From: "Matt Benjamin" <mbenja...@redhat.com>
> To: "Daniel Gryniewicz" <d...@redhat.com>
> Cc: "Swen Schillig" <s...@vnet.ibm.com>, "NFS Ganesha Developers" 
> <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Monday, December 12, 2016 9:47:53 AM
> Subject: Re: [ntirpc] closed PRs
> 
> I did not intentionally close PRs.
> 
> Matt
> 
> - Original Message -
> > From: "Daniel Gryniewicz" <d...@redhat.com>
> > To: "Swen Schillig" <s...@vnet.ibm.com>
> > Cc: "Matt Benjamin" <mbenja...@redhat.com>, "NFS Ganesha Developers"
> > <nfs-ganesha-devel@lists.sourceforge.net>
> > Sent: Monday, December 12, 2016 9:25:59 AM
> > Subject: Re: [ntirpc] closed PRs
> > 
> > Maybe they were auto-closed when Bill re-organized the branches?
> > 
> > Daniel
> > 
> > On Mon, Dec 12, 2016 at 5:00 AM, Swen Schillig <s...@vnet.ibm.com> wrote:
> > > Dan, Matt
> > >
> > > Up until recently we had a bunch of PRs for NTIRPC
> > > being tested/investigated.
> > >
> > > A few weeks ago they were all closed without being merged or at least
> > > commented why they were closed.
> > >
> > > Could you please comment on this and maybe provide some info what your
> > > plans are for those PRs.
> > >
> > > Cheers Swen.
> > >
> > 
> 
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] [ntirpc] closed PRs

2016-12-12 Thread Matt Benjamin
I did not intentionally close PRs.

Matt

- Original Message -
> From: "Daniel Gryniewicz" <d...@redhat.com>
> To: "Swen Schillig" <s...@vnet.ibm.com>
> Cc: "Matt Benjamin" <mbenja...@redhat.com>, "NFS Ganesha Developers" 
> <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Monday, December 12, 2016 9:25:59 AM
> Subject: Re: [ntirpc] closed PRs
> 
> Maybe they were auto-closed when Bill re-organized the branches?
> 
> Daniel
> 
> On Mon, Dec 12, 2016 at 5:00 AM, Swen Schillig <s...@vnet.ibm.com> wrote:
> > Dan, Matt
> >
> > Up until recently we had a bunch of PRs for NTIRPC
> > being tested/investigated.
> >
> > A few weeks ago they were all closed without being merged or at least
> > commented why they were closed.
> >
> > Could you please comment on this and maybe provide some info what your
> > plans are for those PRs.
> >
> > Cheers Swen.
> >
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] gerrithub requesting additional github permissions (all of them)

2016-12-06 Thread Matt Benjamin
Hi Folks,

When I attempted my github signin to gerrithub just now, I was sent to a screen 
requesting full access to my personal github info, and read/write access to all 
my public repositories, not just nfs-ganesha.

I don't seem to be able to sign in without agreeing to these conditions, and 
I'm not certain doing so would be wise.

Am I missing something here?

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] Help!!! nfs ganesha can support rgw ACL?

2016-11-16 Thread Matt Benjamin
Hi,

Like Dan said, we don't implement NFSv4 ACL interfaces in the RGW FSAL.

Underneath nfs-ganesha, librgw is passing the S3 credentials used at mount time 
down to RGW and all operations use them for authorization, so the underlying 
permissions are those of that user.

Matt

- Original Message -
> From: "yiming xie" <plato...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Wednesday, November 16, 2016 8:41:01 AM
> Subject: Re: [Nfs-ganesha-devel]  Help!!!  nfs ganesha can support rgw 
> ACL?
> 
> nfs ganesha can support rgw ACL?
> If nfs supports rgw ACL, how to use it?
> Thanks!
> --
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

--
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


[Nfs-ganesha-devel] gerrithub new required permissions

2016-10-24 Thread Matt Benjamin
Hi All,

When I tried to sign into gerrit today, I got the following auth request from 
github:


Authorize application
GerritHub by @gerritforge-ltd would like to request additional permissions to 
access your account
@mattbenjamin

Specifically:

 Added permissions
Personal user data
Full access

This application will be able to read and write all user data. This includes 
the following:

Private email addresses
Profile information
Followers

I don't think this is acceptable.  Am I doing something wrong?

Matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309

--
The Command Line: Reinvented for Modern Developers
Did the resurgence of CLI tooling catch you by surprise?
Reconnect with the command line and become more productive. 
Learn the new .NET and ASP.NET CLI. Get your free copy!
http://sdm.link/telerik
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] [ntirpc] refcount issue ?

2016-10-10 Thread Matt Benjamin
Hi Swen,

Are you going to re-push your recent closed PR as 2 new PRs as planned?

Dan has made some time this week to review and test, so if you have time, it 
will be easier to get to merge.

Cheers,

Matt

- Original Message -
> From: "Matt Benjamin" <mbenja...@redhat.com>
> To: "Swen Schillig" <s...@vnet.ibm.com>
> Cc: "Daniel Gryniewicz" <d...@redhat.com>, "nfs-ganesha-devel" 
> <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Tuesday, September 6, 2016 4:16:39 PM
> Subject: Re: [ntirpc] refcount issue ?
> 
> Hi,
> 
> inline
> 
> - Original Message -
> > From: "Swen Schillig" <s...@vnet.ibm.com>
> > To: "Matt Benjamin" <mbenja...@redhat.com>, "Daniel Gryniewicz"
> > <d...@redhat.com>
> > Cc: "nfs-ganesha-devel" <nfs-ganesha-devel@lists.sourceforge.net>
> > Sent: Tuesday, September 6, 2016 4:10:32 PM
> > Subject: [ntirpc] refcount issue ?
> > 
> > Matt, Dan.
> > 
> > Could you please have a look at the following code areas and verify
> > what I think is a refcount issue.
> > 
> > clnt_vc_ncreate2()
> > {
> > ...
> > if ((oflags & RPC_DPLX_LKP_OFLAG_ALLOC) || (!rec->hdl.xd)) {
> > xd = rec->hdl.xd = alloc_x_vc_data();
> > ...
> > } else {
> > xd = rec->hdl.xd;
> > ++(xd->refcnt);     <=== this is not right. we're not taking an addtl' ref here.
> 
> Aren't we?  We now share a ref to previously allocated rec->hdl.xd.
> 
> > }
> > ...
> > clnt->cl_p1 = xd;           <=== but here we should increment the
> > refcount.
> > }
> > 
> > another code section with the same handling.
> > 
> > makefd_xprt()
> > {
> > ...
> > if ((oflags & RPC_DPLX_LKP_OFLAG_ALLOC) || (!rec->hdl.xd)) {
> > newxd = true;
> > xd = rec->hdl.xd = alloc_x_vc_data();
> > ...
> > } else {
> > xd = (struct x_vc_data *)rec->hdl.xd;
> > /* dont return destroyed xprts */
> > if (!(xd->flags & X_VC_DATA_FLAG_SVC_DESTROYED)) {
> > if (rec->hdl.xprt) {
> > xprt = rec->hdl.xprt;
> > /* inc xprt refcnt */
> > SVC_REF(xprt, SVC_REF_FLAG_NONE);
> > } else
> > ++(xd->refcnt);    < not right, no addtl' 
> > ref to xd taken.
> 
> so this looks more likely to be incorrect, need to review
> 
> > }
> > /* return extra ref */
> > rpc_dplx_unref(rec,
> >    RPC_DPLX_FLAG_LOCKED | RPC_DPLX_FLAG_UNLOCK);
> > *allocated = FALSE;
> > 
> > /* return ref'd xprt */
> > goto done_xprt;
> > }
> > ...
> > xprt->xp_p1 = xd;    < but here we should increment the refcount
> > 
> > ...
> > }
> > 
> > Both areas handle the refcount'ing wrong, but it might balance out
> > sometimes.
> > 
> > 
> > What do you think ?
> > 
> > Cheers Swen
> > 
> > 
> 
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-707-0660
> fax.  734-769-8938
> cel.  734-216-5309
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Help!!! nfs client can't see any buckets when client mount on nfs-ganesha rgw

2016-09-29 Thread Matt Benjamin
Hi Yiming,

I -did- reply to a basically similar question on nfs-ganesha-devel, I believe, 
as "m...@cohortfs.com".  I tried, anyway; maybe there's an issue with that list 
subscription.  Sorry if so.

Anyway, what I wrote was 1) I don't see anything wrong, offhand, but 2) the 
issue is most likely related to permissions.

If "mount" succeeds, then that should mean that the access_key and secret_key 
you provided in the FSAL block are valid.  It doesn't necessarily mean you 
would see anything, though, if the existing buckets and objects were created by 
different user(s).  That is, you should see buckets created by the NFS user 
you're mounting with for sure, and maybe others, depending on ACLs, iirc.

My next question is, can you create new directories (buckets!) in /home/ceph 
using NFS?  If you can, you should be able to see them over S3 using, again, 
the same S3 user credentials.
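
To make that check concrete, the tiny sketch below (run it on the client 
against the mount point from your report; the path is only an example) just 
calls mkdir(2) through the NFS mount, after which the new directory should 
show up as a bucket in `s3cmd ls` for the same S3 user:

/* Quick client-side check (sketch): create a top-level directory through
 * the NFS mount; it should appear as a new bucket for the same S3 user.
 * The path is the example mount point from this thread; adjust it. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
	const char *path = "/home/cep/xx/nfs_test_bucket";

	if (mkdir(path, 0755) != 0) {
		fprintf(stderr, "mkdir %s: %s\n", path, strerror(errno));
		return 1;
	}
	printf("created %s; it should now appear in s3cmd ls for the same user\n",
	       path);
	return 0;
}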

Regards,

Matt

- Original Message -
> From: "yiming xie" <plato...@gmail.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>
> Sent: Wednesday, September 28, 2016 10:56:21 PM
> Subject: Help!!!  nfs client can't see any buckets when client mount on 
> nfs-ganesha rgw
> 
> I sent this question to the ganesha-devel list, but got no replies,
> so I can only ask you for help.  I'm sorry to bother you.
> 
> env: centos7, nfs-ganesha 2.3, jewel
> nfs server: 192.168.77.61
> 1.cmake -DUSE_FSAL_RGW=ON  -DRGW_LIBRARY=/usr/lib64 ../src/ && make && make
> install
> 
> 2.vi /etc/ganesha/ganesha.conf
> EXPORT
> {
>   Export_ID=1;
> 
>   Path = "/";
> 
>   Pseudo = "/";
> 
>   Access_Type = RW;
> 
>   NFS_Protocols = 4;
> 
>   Transport_Protocols = TCP;
> 
>   FSAL {
>   Name = RGW;
>   User_Id = "testuid";
>   Access_Key_Id ="N6WENRWBZJWZ9ARS1UDD";
>   Secret_Access_Key = "testsecret";
>   }
> }
> 
> RGW {
> ceph_conf = "/etc/ceph/ceph.conf";
> }
> 
> 3. cp nfs-ganesha/src/scripts/systemd/*  /usr/lib/systemd/system/
> 
> 4. systemctl start nfs-ganesha.service
>    systemctl status nfs-ganesha
>   nfs-ganesha.service - NFS-Ganesha file server
>Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled;
>vendor preset: disabled)
>Active: active (running) since Wed 2016-09-28 14:02:02 CST; 4s ago
> 
> 
> 5. client host:
> s3cmd ls
> 2016-09-22 10:29  s3://foo1209_bucket 
> 2016-09-28 02:31  s3://nike_bucket 
> 2016-08-10 14:07  s3://test_bucket 
> 
> sudo mount -t nfs 192.168.77.61:/  /home/cep/xx
>   ls /home/cep/xx
>  xx is empty. Cannot see any bucket names.
> 
> Which step might be wrong? Awaiting your reply, thanks.

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] Help!!! nfs-ganesha FSAL_RGW cmake error.

2016-09-26 Thread Matt Benjamin
Hi,

The solution is to build against a more recent Ceph build.  The required API 
version is scheduled for backport to Ceph Jewel, so eventually Jewel will be 
the newest Ceph you need to use the RGW FSAL with nfs-ganesha 2.4.  A quick 
fix, though, is just to clone and build a more recent Ceph baseline (e.g., 
current master).
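
As an aside, the "." in that cmake message is the giveaway: the probe tries to 
read a file-API version out of rgw_file.h, and the Jewel header predates it.  
If you want to check the header you actually have installed, a small 
compile-time probe like the sketch below works; the macro names here are 
assumed from later Ceph headers and may differ on your tree:

/* Sketch: report the librgw file-API version advertised by the installed
 * rgw_file.h.  The LIBRGW_FILE_VER_* names are assumed from later Ceph
 * headers; if they are absent (as on Jewel 10.2.2), that is the problem. */
#include <rados/librgw.h>
#include <rados/rgw_file.h>
#include <stdio.h>

int main(void)
{
#if defined(LIBRGW_FILE_VER_MAJOR) && defined(LIBRGW_FILE_VER_MINOR)
	printf("librgw file API %d.%d\n",
	       LIBRGW_FILE_VER_MAJOR, LIBRGW_FILE_VER_MINOR);
#else
	printf("no LIBRGW_FILE_VER_* macros: header too old for this FSAL\n");
#endif
	return 0;
}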

Regards,

Matt

- Original Message -
> From: "yiming xie" <plato...@gmail.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Monday, September 26, 2016 10:40:55 PM
> Subject: [Nfs-ganesha-devel] Help!!!  nfs-ganesha FSAL_RGW cmake error.
> 
> I want to run nfs-ganesha based on Ceph RGW.
> I got an error when installing nfs-ganesha.
> How to solve this problem?
> 
> cmake -DUSE_FSAL_RGW=ON -DRGW_LIBRARY=/usr/lib64 ../src/
> 
> -- Looking for rgw_mount in rgw
> -- Looking for rgw_mount in rgw - found
> -- nike:rgw lib= /usr/lib64 rgwlib=1
> -- Found rgw libraries: /usr/lib64
> -- fie head=/usr/include/rados/rgw_file.h include dir=/usr
> -- Could NOT find RGW: Found unsuitable version ".", but required is at least
> "1.1" (found /usr)
> CMake Warning at CMakeLists.txt:571 (message):
> Cannot find supported RGW runtime. Disabling RGW fsal build
> 
> install env:
> centos7, librgw: librgw2.x86_64, librgw2-devel.x86_64
> ceph cluster version: jewel:10.2.2
> 
> nfs-ganesha version: 2.3 (stable) and 2.4 have same problem.
> 
> 
> I change
> 
> 
> 
> 
> 
> 
> 
> 
> --
> 
> ___
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] ntirpc-1.4.0 GA sooner, rather than later.

2016-09-20 Thread Matt Benjamin
They are the same now because I moved the tag; I can try to restore it so 
there is a difference, but that would be unfortunate.

- Original Message -
> From: "Kaleb S. KEITHLEY" <kkeit...@redhat.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>, 
> nfs-ganesha-devel@lists.sourceforge.net
> Sent: Tuesday, September 20, 2016 10:21:37 AM
> Subject: Re: ntirpc-1.4.0 GA sooner, rather than later.
> 
> On 09/20/2016 10:06 AM, Matt Benjamin wrote:
> > Sorry.  I had moved the 1.4.0 tag, and then decided I'd better create a new
> > one.
> >
> 
> Well, _release_ tags should never be moved.
> 
> But is there supposed to be a difference between 1.4.0 and 1.4.1,
> because there isn't, as far as I can tell. At least a `diff -r` of the
> two tar files (untarred) doesn't show any diff.
> 
> 
> > Matt
> >
> > - Original Message -
> >> From: "Kaleb S. KEITHLEY" <kkeit...@redhat.com>
> >> To: "Matt Benjamin" <mbenja...@redhat.com>
> >> Sent: Tuesday, September 20, 2016 7:53:12 AM
> >> Subject: Re: ntirpc-1.4.0 GA sooner, rather than later.
> >>
> >> On 09/19/2016 05:43 PM, Matt Benjamin wrote:
> >>> had to push a 1.4.1
> >>>
> >>
> >> Why? There's no diff between the two trees?
> >>
> >> Curiosity is killing the cat. ;-)
> >>
> >> --
> >>
> >> Kaleb
> >>
> >>
> >>
> >
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] [ntirpc] refcount issue ?

2016-09-06 Thread Matt Benjamin
Hi,

inline

- Original Message -
> From: "Swen Schillig" <s...@vnet.ibm.com>
> To: "Matt Benjamin" <mbenja...@redhat.com>, "Daniel Gryniewicz" 
> <d...@redhat.com>
> Cc: "nfs-ganesha-devel" <nfs-ganesha-devel@lists.sourceforge.net>
> Sent: Tuesday, September 6, 2016 4:10:32 PM
> Subject: [ntirpc] refcount issue ?
> 
> Matt, Dan.
> 
> Could you please have a look at the following code areas and verify
> what I think is a refcount issue.
> 
> clnt_vc_ncreate2()
> {
> ...
>   if ((oflags & RPC_DPLX_LKP_OFLAG_ALLOC) || (!rec->hdl.xd)) {
>   xd = rec->hdl.xd = alloc_x_vc_data();
>   ...
>   } else {
>   xd = rec->hdl.xd;
>   ++(xd->refcnt);     <=== this is not right. we're not taking an addtl' ref here.

Aren't we?  We now share a ref to previously allocated rec->hdl.xd.

>   }
> ...
>   clnt->cl_p1 = xd;           <=== but here we should increment the 
> refcount.
> }
> 
> another code section with the same handling.
> 
> makefd_xprt()
> {
> ...
>   if ((oflags & RPC_DPLX_LKP_OFLAG_ALLOC) || (!rec->hdl.xd)) {
>   newxd = true;
>   xd = rec->hdl.xd = alloc_x_vc_data();
>   ...
>   } else {
>   xd = (struct x_vc_data *)rec->hdl.xd;
>   /* dont return destroyed xprts */
>   if (!(xd->flags & X_VC_DATA_FLAG_SVC_DESTROYED)) {
>   if (rec->hdl.xprt) {
>   xprt = rec->hdl.xprt;
>   /* inc xprt refcnt */
>   SVC_REF(xprt, SVC_REF_FLAG_NONE);
>   } else
>   ++(xd->refcnt);    < not right, no addtl' 
> ref to xd taken.

so this looks more likely to be incorrect, need to review

>   }
>   /* return extra ref */
>   rpc_dplx_unref(rec,
>      RPC_DPLX_FLAG_LOCKED | RPC_DPLX_FLAG_UNLOCK);
>   *allocated = FALSE;
> 
>   /* return ref'd xprt */
>   goto done_xprt;
>   }
>   ...
>   xprt->xp_p1 = xd;    < but here we should increment the refcount
>   
>   ...
> }
> 
> Both areas handle the refcount'ing wrong, but it might balance out
> sometimes.
> 
> 
> What do you think ?
> 
> Cheers Swen
> 
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309


