Re: [Nfs-ganesha-devel] rpcping profile
With N=10 and num_calls=100, on Lemon, test_rbt averages 2.8M reqs/s.
That's about half the rate at N=1, which I think is expected. If this is
really the achievable rbt in-order search-remove-insert retire rate at
N=10, my intuition is that it's fast enough not to be the bottleneck
your result claims, and I think we need to understand why.

Matt

On Sun, Mar 25, 2018 at 6:17 PM, Matt Benjamin wrote:
> 1 What is the peak number of outstanding calls?
>
> 1.1 If it is e.g. > 100k, is that correct? As last week: why would a
> sensible client issue more than e.g. 1000 calls without seeing replies?
>
> 1.3 If outstanding calls is <= 1, why can test_rbt retire millions of
> duty cycles/s in that scenario?
>
> 2 What does the search workload look like when replies are mixed with
> calls, i.e. the bidirectional RPC this is intended for?
>
> 2.2 Hint: the xid distribution is not generally sorted; the client
> defines only its own issue order, not the reply order nor peer xids.
> Why is it safe to base reply matching on xids arriving in sorted order?
>
> Matt
>
> On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson wrote:
>>
>> On 3/24/18 7:50 AM, William Allen Simpson wrote:
>> > Noting that the top problem is exactly what I predicted from
>> > knowledge of the code:
>> >
>> >     clnt_req_callback()
>> >       opr_rbtree_insert()
>> >
>> > The second is also exactly as expected:
>> >
>> >     svc_rqst_expire_insert()
>> >       opr_rbtree_insert()
>> >         svc_rqst_expire_cmpf()
>> >
>> > These are both inserted in ascending order, sorted in ascending
>> > order, and removed in ascending order.
>> >
>> > QED: rb_tree is a poor data structure for this purpose.
>>
>> I've replaced those 2 rbtrees with TAILQ, so that we are not
>> spending 49% of the time there anymore, and am now seeing:
>>
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
>> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
>> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>>
>> This is probably good enough for now. Time to move on to
>> more interesting things.

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
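The duty cycle that test_rbt times can be approximated outside the
ntirpc tree. The sketch below is a rough stand-in only: it uses glibc's
tsearch()/tdelete() (which glibc implements as a red-black tree) rather
than opr_rbtree itself, and N and the iteration count are illustrative.
It primes N outstanding keys, then repeatedly retires the oldest
(search + delete) and inserts the next, all in ascending key order,
matching the in-order retire pattern under discussion:

    #include <search.h>
    #include <stdio.h>
    #include <time.h>

    static int
    cmp(const void *a, const void *b)
    {
            long la = *(const long *)a, lb = *(const long *)b;

            return (la > lb) - (la < lb);
    }

    int
    main(void)
    {
            enum { N = 10, ITERS = 1000000 };  /* outstanding calls, cycles */
            static long keys[ITERS + N];       /* ascending xid stand-ins */
            void *root = NULL;
            struct timespec t0, t1;
            double sec;
            long i;

            for (i = 0; i < ITERS + N; i++)
                    keys[i] = i;
            for (i = 0; i < N; i++)            /* prime N outstanding */
                    tsearch(&keys[i], &root, cmp);

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (i = 0; i < ITERS; i++) {
                    tdelete(&keys[i], &root, cmp);      /* retire oldest */
                    tsearch(&keys[i + N], &root, cmp);  /* issue next */
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("%.2fM duty cycles/s\n", ITERS / sec / 1e6);
            return 0;
    }

The absolute rate will differ from test_rbt on Lemon, but millions of
cycles per second for this shape on commodity hardware is plausible,
which is the intuition behind questioning the bottleneck claim.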
Re: [Nfs-ganesha-devel] rpcping profile
1 What is the peak number of outstanding calls?

1.1 If it is e.g. > 100k, is that correct? As last week: why would a
sensible client issue more than e.g. 1000 calls without seeing replies?

1.3 If outstanding calls is <= 1, why can test_rbt retire millions of
duty cycles/s in that scenario?

2 What does the search workload look like when replies are mixed with
calls, i.e. the bidirectional RPC this is intended for?

2.2 Hint: the xid distribution is not generally sorted; the client
defines only its own issue order, not the reply order nor peer xids.
Why is it safe to base reply matching on xids arriving in sorted order?

Matt

On Sun, Mar 25, 2018, 1:40 PM William Allen Simpson
<william.allen.simp...@gmail.com> wrote:

> On 3/24/18 7:50 AM, William Allen Simpson wrote:
> > Noting that the top problem is exactly what I predicted from
> > knowledge of the code:
> >
> >     clnt_req_callback()
> >       opr_rbtree_insert()
> >
> > The second is also exactly as expected:
> >
> >     svc_rqst_expire_insert()
> >       opr_rbtree_insert()
> >         svc_rqst_expire_cmpf()
> >
> > These are both inserted in ascending order, sorted in ascending
> > order, and removed in ascending order.
> >
> > QED: rb_tree is a poor data structure for this purpose.
>
> I've replaced those 2 rbtrees with TAILQ, so that we are not
> spending 49% of the time there anymore, and am now seeing:
>
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
> rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
> program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817
>
> This is probably good enough for now. Time to move on to
> more interesting things.
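Point 2.2 is the crux: reply matching has to be correct for any reply
order. A minimal sketch of an order-independent match structure,
assuming a simple chained hash keyed by xid (the names, bucket sizing,
and absence of locking here are illustrative, not the ntirpc
implementation):

    #include <stddef.h>
    #include <stdint.h>

    #define XID_BUCKETS 1024        /* power of two; sizing is illustrative */

    struct out_call {
            uint32_t xid;
            struct out_call *next;  /* bucket chain */
    };

    static struct out_call *buckets[XID_BUCKETS];

    static size_t
    xid_bucket(uint32_t xid)
    {
            return xid & (XID_BUCKETS - 1);
    }

    /* Insert on call issue: O(1) regardless of eventual reply order. */
    static void
    out_insert(struct out_call *oc)
    {
            size_t b = xid_bucket(oc->xid);

            oc->next = buckets[b];
            buckets[b] = oc;
    }

    /* Retire on reply: expected O(1); correctness never depends on
     * the peer delivering replies in xid order. */
    static struct out_call *
    out_retire(uint32_t xid)
    {
            struct out_call **pp;

            for (pp = &buckets[xid_bucket(xid)]; *pp != NULL; pp = &(*pp)->next) {
                    if ((*pp)->xid == xid) {
                            struct out_call *oc = *pp;

                            *pp = oc->next;
                            oc->next = NULL;
                            return oc;
                    }
            }
            return NULL;    /* unknown or already-retired xid */
    }

With this shape, insert on call issue and retire on reply stay expected
O(1) whether replies arrive in issue order, reversed, or interleaved
with peer-initiated calls, which is the bidirectional case point 2
raises.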
Re: [Nfs-ganesha-devel] rpcping profile
On 3/24/18 7:50 AM, William Allen Simpson wrote:
> Noting that the top problem is exactly what I predicted from
> knowledge of the code:
>
>     clnt_req_callback()
>       opr_rbtree_insert()
>
> The second is also exactly as expected:
>
>     svc_rqst_expire_insert()
>       opr_rbtree_insert()
>         svc_rqst_expire_cmpf()
>
> These are both inserted in ascending order, sorted in ascending
> order, and removed in ascending order.
>
> QED: rb_tree is a poor data structure for this purpose.

I've replaced those 2 rbtrees with TAILQ, so that we are not
spending 49% of the time there anymore, and am now seeing:

rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
program=13 version=3 procedure=0): mean 151800.6287, total 151800.6287
rpcping tcp localhost count=1000 threads=1 workers=5 (port=2049
program=13 version=3 procedure=0): mean 167828.8817, total 167828.8817

This is probably good enough for now. Time to move on to
more interesting things.
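For readers following the thread, here is a minimal sketch of the TAILQ
replacement described above, using <sys/queue.h> with illustrative
struct and field names rather than the actual clnt_req/svc_rqst types:

    #include <stdint.h>
    #include <sys/queue.h>

    /* Illustrative stand-in for a pending-call entry; this is not
     * the actual ntirpc structure. */
    struct pending_call {
            uint32_t xid;
            TAILQ_ENTRY(pending_call) q;    /* embedded list linkage */
    };

    TAILQ_HEAD(pending_head, pending_call);

    /* Calls are issued with monotonically increasing xids, so
     * appending at the tail keeps the queue in xid order with O(1)
     * work, where an rbtree pays O(log n) plus rebalancing. */
    static void
    pending_insert(struct pending_head *head, struct pending_call *pc)
    {
            TAILQ_INSERT_TAIL(head, pc, q);
    }

    /* If replies retire in (near) issue order, the match is at or
     * near the head, so removal is also O(1). */
    static struct pending_call *
    pending_retire(struct pending_head *head, uint32_t xid)
    {
            struct pending_call *pc;

            TAILQ_FOREACH(pc, head, q) {
                    if (pc->xid == xid) {
                            TAILQ_REMOVE(head, pc, q);
                            return pc;
                    }
            }
            return NULL;    /* unknown xid */
    }

Appending monotonically increasing xids at the tail gives O(1) insert
with no rebalancing, and retiring in (near) issue order stays O(1) at
the head. A reply arriving far out of order degrades the walk to O(n),
whereas an rbtree bounds the search at O(log n); that trade-off is what
Matt questions elsewhere in this thread.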
[Nfs-ganesha-devel] rpcping profile
Using local file tests/rpcping.
Using local file ../profile.
Total: 989 samples
     321  32.5%  32.5%      321  32.5% svc_rqst_expire_cmpf
     149  15.1%  47.5%      475  48.0% opr_rbtree_insert
     139  14.1%  61.6%      140  14.2% __writev
      56   5.7%  67.2%       66   6.7% __GI___pthread_mutex_lock
      32   3.2%  70.5%       32   3.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
      32   3.2%  73.7%       32   3.2% futex_abstimed_wait_cancelable
      23   2.3%  76.0%       23   2.3% __libc_write
      21   2.1%  78.2%       23   2.3% __libc_recv
      18   1.8%  80.0%       18   1.8% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/misc/../sysdeps/unix/syscall-template.S:84
      17   1.7%  81.7%       17   1.7% __libc_read
      15   1.5%  83.2%       15   1.5% clnt_req_xid_cmpf
      14   1.4%  84.6%       14   1.4% _int_free
      14   1.4%  86.0%       14   1.4% futex_wake
      11   1.1%  87.2%       13   1.3% __GI_epoll_pwait
       8   0.8%  88.0%        9   0.9% _int_malloc
       8   0.8%  88.8%      494  49.9% svc_rqst_expire_insert
       7   0.7%  89.5%       20   2.0% __libc_calloc
       7   0.7%  90.2%        7   0.7% opr_rbtree_first
       6   0.6%  90.8%        6   0.6% __GI___libc_free
       5   0.5%  91.3%        5   0.5% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
       5   0.5%  91.8%      150  15.2% svc_vc_recv
       5   0.5%  92.3%      470  47.5% work_pool_thread
       4   0.4%  92.7%        4   0.4% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:133
       3   0.3%  93.0%        3   0.3% __condvar_dec_grefs
       3   0.3%  93.3%        3   0.3% atomic_add_uint64_t
       2   0.2%  93.5%        2   0.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/inet/../sysdeps/x86_64/htonl.S:29
       2   0.2%  93.7%        2   0.2% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:67
       2   0.2%  93.9%       34   3.4% __pthread_mutex_unlock_usercnt
       2   0.2%  94.1%        2   0.2% _seterr_reply
       2   0.2%  94.3%        2   0.2% atomic_add_uint32_t@2665d
       2   0.2%  94.5%        9   0.9% clnt_req_release
       2   0.2%  94.7%        2   0.2% malloc_consolidate
       2   0.2%  94.9%      142  14.4% svc_ioq_flushv
       2   0.2%  95.1%        3   0.3% svc_ref_it@401421
       2   0.2%  95.3%      255  25.8% svc_rqst_epoll_loop
       2   0.2%  95.6%       12   1.2% xdr_ioq_release
       2   0.2%  95.8%        6   0.6% xdr_ioq_setup
       1   0.1%  95.9%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/misc/../sysdeps/unix/syscall-template.S:84
       1   0.1%  96.0%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:59
       1   0.1%  96.1%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:60
       1   0.1%  96.2%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/cancellation.S:67
       1   0.1%  96.3%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:296
       1   0.1%  96.4%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:158
       1   0.1%  96.5%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:189
       1   0.1%  96.6%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:193
       1   0.1%  96.7%        1   0.1% ?? /usr/src/debug/glibc-2.26-137-g247c1ddd30/nptl/../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:138
       1   0.1%  96.8%        1   0.1% 0x7fffde7caa20
       1   0.1%  96.9%        1   0.1% 0x7fffde7caa25
       1   0.1%  97.0%        1   0.1% 0x7fffde7caaa1
       1   0.1%  97.1%        1   0.1% __GI___libc_malloc
       1   0.1%  97.2%        1   0.1% __condvar_get_private
       1   0.1%  97.3%        1   0.1% __pthread_cond_destroy
       1   0.1%  97.4%        1   0.1% __pthread_cond_signal
       1   0.1%  97.5%        1   0.1% __pthread_cond_wait_common
       1   0.1%  97.6%        1   0.1% _init
       1   0.1%  97.7%        1   0.1% alloc_perturb
       1   0.1%  97.8%        1   0.1% atomic_add_uint32_t@401377
       1   0.1%  97.9%        1   0.1% atomic_dec_uint32_t@e3d8
       1   0.1%  98.0%        2   0.2% atomic_inc_uint32_t@401394
       1   0.1%  98.1%        4   0.4% atomic_inc_uint64_t
       1   0.1%  98.2%        1   0.1% atomic_postclear_uint32_t_bits
       1   0.1%  98.3%        1   0.1% atomic_postset_uint16_t_bits
       1   0.1%  98.4%        1   0.1% atomic_sub_uint32_t
       1