Re: [Nfs-ganesha-devel] rpcping profile

2018-03-25 Thread Matt Benjamin
With N=10 and num_calls=100, on Lemon, test_rbt averages 2.8M
reqs/s.  That's about half the rate when N=1, which I think is
expected.  If that is really the achievable rbt in-order
search-remove-insert retire rate when N is 10, my intuition is that
it's fast enough not to be the bottleneck your result claims, and I
think it's necessary to understand why it appears to be one.
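For reference, the duty cycle in question can be sketched with a self-contained
stand-in: insert N ascending xids into a red-black tree, then search and remove
them in the same order.  This uses POSIX tsearch()/tfind()/tdelete() (glibc's
tree functions) purely for illustration, not the actual test_rbt or opr_rbtree
code, and N/CYCLES are made-up parameters:

/* Illustrative stand-in for an in-order search-remove-insert duty cycle.
 * Not the real test_rbt harness; tsearch() stands in for opr_rbtree. */
#include <search.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N 10            /* outstanding entries per cycle (illustrative) */
#define CYCLES 1000000  /* duty cycles to time (illustrative) */

static int cmp_xid(const void *a, const void *b)
{
    uint32_t ka = *(const uint32_t *)a, kb = *(const uint32_t *)b;

    return (ka > kb) - (ka < kb);
}

int main(void)
{
    void *root = NULL;          /* tsearch() tree root */
    uint32_t keys[N];
    uint32_t xid = 0;
    struct timespec t0, t1;
    double secs;
    long i, j;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < CYCLES; i++) {
        /* xids are assigned sequentially, so every insert lands at the
         * tree's right edge and every removal comes off its left edge */
        for (j = 0; j < N; j++) {
            keys[j] = ++xid;
            (void)tsearch(&keys[j], &root, cmp_xid);
        }
        for (j = 0; j < N; j++) {
            (void)tfind(&keys[j], &root, cmp_xid);   /* match */
            (void)tdelete(&keys[j], &root, cmp_xid); /* retire */
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.2fM reqs/s\n", (double)CYCLES * N / secs / 1e6);
    return 0;
}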

Matt

On Sun, Mar 25, 2018 at 6:17 PM, Matt Benjamin  wrote:
> 1 What is the peak number of outstanding calls?
>
> 1.1 If it is e.g. > 100k, is that correct?  As last week: why would a sensible
> client issue more than e.g. 1000 calls without seeing any replies?
>
> 1.3 If outstanding calls is <= 1, why can test_rbt retire millions of
> duty cycles/s in that scenario?
>
> 2 What does the search workload look like when replies are mixed with calls,
> i.e. the bidirectional RPC this is intended for?
>
> 2.2 Hint: the xid distribution is not generally sorted; a client defines only
> its own issue order, not the reply order nor the peer's xids; so why is it safe
> to base reply matching on xids arriving in sorted order?
>
> Matt
>
> On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson
>  wrote:
>>
>> On 3/24/18 7:50 AM, William Allen Simpson wrote:
>> > Noting that the top problem is exactly what I predicted from knowledge of
>> > the code:
>> >    clnt_req_callback() opr_rbtree_insert()
>> >
>> > The second is also exactly as expected:
>> >
>> >    svc_rqst_expire_insert() opr_rbtree_insert() svc_rqst_expire_cmpf()
>> >
>> > These are both inserted in ascending order, sorted in ascending order,
>> > and removed in ascending order.
>> >
>> > QED: rb_tree is a poor data structure for this purpose.
>>
>> I've replaced those two rbtrees with TAILQs, so that we are no longer
>> spending 49% of the time there, and am now seeing:
>>
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>>
>> This is probably good enough for now.  Time to move on to
>> more interesting things.



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] rpcping profile

2018-03-25 Thread Matt Benjamin
1 What is the peak number of outstanding calls?

1.1 If it is e.g. > 100k, is that correct?  As last week: why would a sensible
client issue more than e.g. 1000 calls without seeing any replies?

1.3 If outstanding calls is <= 1, why can test_rbt retire millions of
duty cycles/s in that scenario?

2 What does the search workload look like when replies are mixed with calls,
i.e. the bidirectional RPC this is intended for?

2.2 Hint: the xid distribution is not generally sorted; a client defines only
its own issue order, not the reply order nor the peer's xids; so why is it safe
to base reply matching on xids arriving in sorted order?
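A reply matcher only has to find the call whose xid the reply carries; it
cannot assume that xid is the lowest (or oldest) outstanding one.  A minimal
sketch of that, using the BSD <sys/queue.h> TAILQ macros; the struct and field
names here are hypothetical, not the actual ntirpc clnt_req types:

#include <sys/queue.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical outstanding-call record. */
struct out_call {
    uint32_t xid;               /* copied into the request header */
    TAILQ_ENTRY(out_call) q;    /* list linkage */
    /* ... request state, completion callback, etc. ... */
};

TAILQ_HEAD(out_call_q, out_call);   /* initialize with TAILQ_INIT() */

/* Issue side: calls are recorded in issue (ascending xid) order. */
void call_issued(struct out_call_q *oq, struct out_call *oc)
{
    TAILQ_INSERT_TAIL(oq, oc, q);
}

/* Reply side: the peer may answer in any order, so match by explicit
 * xid comparison rather than by position.  The walk is O(n) in the
 * number of outstanding calls, which a sensible client keeps small. */
struct out_call *reply_match(struct out_call_q *oq, uint32_t rxid)
{
    struct out_call *oc;

    TAILQ_FOREACH(oc, oq, q) {
        if (oc->xid == rxid) {
            TAILQ_REMOVE(oq, oc, q);
            return oc;
        }
    }
    return NULL;    /* stale or bogus reply */
}

The cost of the walk is bounded by the answer to 1.1: a client that caps its
outstanding calls keeps that list short no matter how replies are ordered.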

Matt

On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 3/24/18 7:50 AM, William Allen Simpson wrote:
> > Noting that the top problem is exactly what I predicted from knowledge of
> > the code:
> >    clnt_req_callback() opr_rbtree_insert()
> >
> > The second is also exactly as expected:
> >
> >    svc_rqst_expire_insert() opr_rbtree_insert() svc_rqst_expire_cmpf()
> >
> > These are both inserted in ascending order, sorted in ascending order,
> > and removed in ascending order.
> >
> > QED: rb_tree is a poor data structure for this purpose.
>
> I've replaced those two rbtrees with TAILQs, so that we are no longer
> spending 49% of the time there, and am now seeing:
>
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>
> This is probably good enough for now.  Time to move on to
> more interesting things.


Re: [Nfs-ganesha-devel] rpcping profile

2018-03-25 Thread William Allen Simpson

On 3/24/18 7:50 AM, William Allen Simpson wrote:

Noting that the top problem is exactly what I predicted from knowledge of
the code:
   clnt_req_callback() opr_rbtree_insert()

The second is also exactly as expected:

   svc_rqst_expire_insert() opr_rbtree_insert() svc_rqst_expire_cmpf()

These are both inserted in ascending order, sorted in ascending order,
and removed in ascending order.

QED: rb_tree is a poor data structure for this purpose.
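When keys only ever arrive in ascending order and leave in ascending order,
the tree is being used as an expensive FIFO: every insert walks to the
rightmost position and every removal takes the leftmost node, with rebalancing
on top.  A plain tail queue gives the same ordering in O(1) per operation.
A minimal sketch using the BSD <sys/queue.h> TAILQ macros, with hypothetical
names rather than the actual svc_rqst expire structures:

#include <sys/queue.h>
#include <stdint.h>

/* Hypothetical timeout record. */
struct expire_ent {
    uint64_t deadline;          /* monotonic expiry time */
    TAILQ_ENTRY(expire_ent) q;
};

TAILQ_HEAD(expire_q, expire_ent);   /* initialize with TAILQ_INIT() */

/* Deadlines are now + a (roughly) fixed timeout, so a new entry always
 * expires last: appending at the tail keeps the queue sorted, O(1). */
void expire_insert(struct expire_q *eq, struct expire_ent *ee)
{
    TAILQ_INSERT_TAIL(eq, ee, q);
}

/* The timer only ever needs the earliest deadline, i.e. the head. */
struct expire_ent *expire_first(struct expire_q *eq)
{
    return TAILQ_FIRST(eq);
}

/* Fire or cancel: removal is O(1) from anywhere in the queue. */
void expire_remove(struct expire_q *eq, struct expire_ent *ee)
{
    TAILQ_REMOVE(eq, ee, q);
}

The tail append only stays sorted while timeouts are effectively uniform; if
deadlines could be set out of order, a sorted insert (or a timer wheel) would
be needed again.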


I've replaced those two rbtrees with TAILQs, so that we are no longer
spending 49% of the time there, and am now seeing:

rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049 program=13 
version=3 procedure=0): mean 151800.6287, total 151800.6287
rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049 program=13 
version=3 procedure=0): mean 167828.8817, total 167828.8817

This is probably good enough for now.  Time to move on to
more interesting things.



[Nfs-ganesha-devel] rpcping profile

2018-03-24 Thread William Allen Simpson

Using local file tests/rpcping.
Using local file ../profile.
Total: 989 samples
 321  32.5%  32.5%  321  32.5% svc_rqst_expire_cmpf
 149  15.1%  47.5%  475  48.0% opr_rbtree_insert
 139  14.1%  61.6%  140  14.2% __writev
  56   5.7%  67.2%   66   6.7% __GI___pthread_mutex_lock
  32   3.2%  70.5%   32   3.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  32   3.2%  73.7%   32   3.2% futex_abstimed_wait_cancelable
  23   2.3%  76.0%   23   2.3% __libc_write
  21   2.1%  78.2%   23   2.3% __libc_recv
  18   1.8%  80.0%   18   1.8% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/misc/../sysdeps/unix/syscall-template.S:84
  17   1.7%  81.7%   17   1.7% __libc_read
  15   1.5%  83.2%   15   1.5% clnt_req_xid_cmpf
  14   1.4%  84.6%   14   1.4% _int_free
  14   1.4%  86.0%   14   1.4% futex_wake
  11   1.1%  87.2%   13   1.3% __GI_epoll_pwait
   8   0.8%  88.0%    9   0.9% _int_malloc
   8   0.8%  88.8%  494  49.9% svc_rqst_expire_insert
   7   0.7%  89.5%   20   2.0% __libc_calloc
   7   0.7%  90.2%    7   0.7% opr_rbtree_first
   6   0.6%  90.8%    6   0.6% __GI___libc_free
   5   0.5%  91.3%    5   0.5% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
   5   0.5%  91.8%  150  15.2% svc_vc_recv
   5   0.5%  92.3%  470  47.5% work_pool_thread
   4   0.4%  92.7%    4   0.4% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:133
   3   0.3%  93.0%    3   0.3% __condvar_dec_grefs
   3   0.3%  93.3%    3   0.3% atomic_add_uint64_t
   2   0.2%  93.5%    2   0.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/inet/../sysdeps/x86_64/htonl.S:29
   2   0.2%  93.7%    2   0.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:67
   2   0.2%  93.9%   34   3.4% __pthread_mutex_unlock_usercnt
   2   0.2%  94.1%    2   0.2% _seterr_reply
   2   0.2%  94.3%    2   0.2% atomic_add_uint32_t@2665d
   2   0.2%  94.5%    9   0.9% clnt_req_release
   2   0.2%  94.7%    2   0.2% malloc_consolidate
   2   0.2%  94.9%  142  14.4% svc_ioq_flushv
   2   0.2%  95.1%    3   0.3% svc_ref_it@401421
   2   0.2%  95.3%  255  25.8% svc_rqst_epoll_loop
   2   0.2%  95.6%   12   1.2% xdr_ioq_release
   2   0.2%  95.8%    6   0.6% xdr_ioq_setup
   1   0.1%  95.9%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/misc/../sysdeps/unix/syscall-template.S:84
   1   0.1%  96.0%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:59
   1   0.1%  96.1%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:60
   1   0.1%  96.2%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:67
   1   0.1%  96.3%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:296
   1   0.1%  96.4%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:158
   1   0.1%  96.5%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:189
   1   0.1%  96.6%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:193
   1   0.1%  96.7%    1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:138
   1   0.1%  96.8%    1   0.1% 0x7fffde7caa20
   1   0.1%  96.9%    1   0.1% 0x7fffde7caa25
   1   0.1%  97.0%    1   0.1% 0x7fffde7caaa1
   1   0.1%  97.1%    1   0.1% __GI___libc_malloc
   1   0.1%  97.2%    1   0.1% __condvar_get_private
   1   0.1%  97.3%    1   0.1% __pthread_cond_destroy
   1   0.1%  97.4%    1   0.1% __pthread_cond_signal
   1   0.1%  97.5%    1   0.1% __pthread_cond_wait_common
   1   0.1%  97.6%    1   0.1% _init
   1   0.1%  97.7%    1   0.1% alloc_perturb
   1   0.1%  97.8%    1   0.1% atomic_add_uint32_t@401377
   1   0.1%  97.9%    1   0.1% atomic_dec_uint32_t@e3d8
   1   0.1%  98.0%    2   0.2% atomic_inc_uint32_t@401394
   1   0.1%  98.1%    4   0.4% atomic_inc_uint64_t
   1   0.1%  98.2%    1   0.1% atomic_postclear_uint32_t_bits
   1   0.1%  98.3%    1   0.1% atomic_postset_uint16_t_bits
   1   0.1%  98.4%    1   0.1% atomic_sub_uint32_t
   1