[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: CMake: remove USE_TSAN, use -DSANITIZE_THREAD instead
From Dominique Martinet:

Dominique Martinet has uploaded this change for review. ( https://review.gerrithub.io/403728 )

Change subject: CMake: remove USE_TSAN, use -DSANITIZE_THREAD instead
......................................................................

CMake: remove USE_TSAN, use -DSANITIZE_THREAD instead

We currently have two ways of enabling TSAN, and this one does not work.
Instead of trying to debug why, just remove it.

Change-Id: I7e925f319821162c1d0446ad1146e7fff693c973
Signed-off-by: Dominique Martinet
---
M src/CMakeLists.txt
D src/cmake/tsan.cmake
2 files changed, 0 insertions(+), 48 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/28/403728/1

To view, visit https://review.gerrithub.io/403728

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7e925f319821162c1d0446ad1146e7fff693c973
Gerrit-Change-Number: 403728
Gerrit-PatchSet: 1
Gerrit-Owner: Dominique Martinet
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: CMake sanitizers: s/saitizer/sanitizer/
From Dominique Martinet:

Dominique Martinet has uploaded this change for review. ( https://review.gerrithub.io/403729 )

Change subject: CMake sanitizers: s/saitizer/sanitizer/
......................................................................

CMake sanitizers: s/saitizer/sanitizer/

Annoying typos in function names are evil.

Change-Id: Icc2ca77720960fec3b13b1473f14f6e8ac72a666
Signed-off-by: Dominique Martinet
---
M src/cmake/modules/FindASan.cmake
M src/cmake/modules/FindMSan.cmake
M src/cmake/modules/FindTSan.cmake
M src/cmake/modules/FindUBSan.cmake
M src/cmake/modules/sanitize-helpers.cmake
5 files changed, 5 insertions(+), 5 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/29/403729/1

To view, visit https://review.gerrithub.io/403729

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icc2ca77720960fec3b13b1473f14f6e8ac72a666
Gerrit-Change-Number: 403729
Gerrit-PatchSet: 1
Gerrit-Owner: Dominique Martinet
Re: [Nfs-ganesha-devel] rpcping
rpcping was not thread safe. I have fixes for it incoming.

Daniel

On 03/13/2018 12:13 PM, William Allen Simpson wrote:
Re: [Nfs-ganesha-devel] rpcping
On 3/13/18 2:38 AM, William Allen Simpson wrote:
> In my measurements, using the new CLNT_CALL_BACK(), the client thread
> starts sending a stream of pings. In every case, it peaks at a
> relatively stable rate.

DanG suggested that timing was dominated by the system time calls. The previous numbers were switched to a finer-grained timer than the original code. JeffL says that clock_gettime() should have had negligible overhead.

But just to make sure, I've eliminated the per-thread timers and substituted one before and one after. Unlike previously, this will include the overhead of setting up the client, in addition to completing all the callback returns.

Same result. More calls ::= slower times.

rpcping tcp localhost threads=1 count=1000 (port=2049 program=13 version=3 procedure=0): average 36012.0254, total 36012.0254
rpcping tcp localhost threads=1 count=1500 (port=2049 program=13 version=3 procedure=0): average 33720.9125, total 33720.9125
rpcping tcp localhost threads=1 count=2000 (port=2049 program=13 version=3 procedure=0): average 25604.7542, total 25604.7542
rpcping tcp localhost threads=1 count=3000 (port=2049 program=13 version=3 procedure=0): average 21170.0836, total 21170.0836
rpcping tcp localhost threads=1 count=5000 (port=2049 program=13 version=3 procedure=0): average 18163.2451, total 18163.2451

Including the 3-way handshake time for setting up the clients does affect the overall throughput numbers.

rpcping tcp localhost threads=2 count=1500 (port=2049 program=13 version=3 procedure=0): average 10379.3976, total 20758.7951
rpcping tcp localhost threads=2 count=1500 (port=2049 program=13 version=3 procedure=0): average 10746.9395, total 21493.8790
rpcping tcp localhost threads=3 count=1500 (port=2049 program=13 version=3 procedure=0): average 5473.3780, total 16420.1339
rpcping tcp localhost threads=3 count=1500 (port=2049 program=13 version=3 procedure=0): average 5886.5549, total 17659.6646
rpcping tcp localhost threads=5 count=1500 (port=2049 program=13 version=3 procedure=0): average 3396.9438, total 16984.7190
rpcping tcp localhost threads=5 count=1500 (port=2049 program=13 version=3 procedure=0): average 3455.3026, total 17276.5131
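For anyone who wants to reproduce the measurement style rather than the numbers: the "one timer before, one timer after" approach described above boils down to something like the sketch below. This is not the rpcping source; do_one_ping() is a hypothetical stand-in for issuing a CLNT_CALL_BACK() ping and waiting for its completion, and COUNT is an arbitrary batch size.

/*
 * Minimal sketch of timing a whole batch with a single pair of
 * clock_gettime() calls, instead of per-call or per-thread timers.
 * Not the actual rpcping code; do_one_ping() is a placeholder.
 */
#include <stdio.h>
#include <time.h>

#define COUNT 5000

static void do_one_ping(void)
{
        /* placeholder: issue one ping and wait for its callback */
}

int main(void)
{
        struct timespec start, stop;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < COUNT; i++)
                do_one_ping();
        clock_gettime(CLOCK_MONOTONIC, &stop);

        double secs = (stop.tv_sec - start.tv_sec) +
                      (stop.tv_nsec - start.tv_nsec) / 1e9;

        printf("%d calls in %.6f s => %.1f calls/s\n",
               COUNT, secs, COUNT / secs);
        return 0;
}

Compiling with -O2 and varying COUNT is enough to see whether the measured rate itself depends on the batch size, independent of any timer overhead.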
[Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: Adding empty file const_strcuts.checkpatch
From: supriti.si...@suse.com has uploaded this change for review. ( https://review.gerrithub.io/403704 )

Change subject: Adding empty file const_strcuts.checkpatch
......................................................................

Adding empty file const_strcuts.checkpatch

In the absence of this file, checkpatch.pl shows an error:
"No structs that should be const will be found, file missing"

Change-Id: Iab141bf7bf5aa40a4c19f4994cbcbb7b896e469b
Signed-off-by: Supriti Singh
---
A src/scripts/const_structs.checkpatch
1 file changed, 0 insertions(+), 0 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/04/403704/1

To view, visit https://review.gerrithub.io/403704

Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iab141bf7bf5aa40a4c19f4994cbcbb7b896e469b
Gerrit-Change-Number: 403704
Gerrit-PatchSet: 1
Gerrit-Owner: supriti.si...@suse.com
Re: [Nfs-ganesha-devel] Better late than never - US Daylight Savings Time has started and that means weekly conference call is an hour earlier
> > Time has started and that means weekly conference call is an hour earlier
>
> An hour later...

No, an hour earlier. The time for the meeting is based on current Pacific time, not UT/GMT. So when the US enters Daylight Saving Time, the meeting switches to an hour earlier, and when we leave Daylight Saving Time, the meeting switches to an hour later.

In most of the US, the clock time of the meeting stays the same. In much of Europe, the clock time is out of sync for a few weeks due to different dates for entering/leaving Daylight Saving Time. In parts of the world that don't observe Daylight Saving Time (most notably for this project, India), the clock time changes as well as the absolute time.

For those south of the Equator that observe Daylight Saving Time, the clock time ultimately shifts two hours, though it is probably staged in two steps, since they leave Daylight Saving Time in their fall on a different date than the US enters it in the spring.

Frank
Re: [Nfs-ganesha-devel] Better late than never - US Daylight Savings Time has started and that means weekly conference call is an hour earlier
Daniel Gryniewicz wrote on Tue, Mar 13, 2018 at 10:07:33AM -0400:
> An hour later...

Nope, it is an hour earlier for us :)

--
Dominique
Re: [Nfs-ganesha-devel] Better late than never - US Daylight Savings Time has started and that means weekly conference call is an hour earlier
An hour later...

Daniel

On 03/13/2018 10:02 AM, Frank Filz wrote:
[Nfs-ganesha-devel] Better late than never - US Daylight Savings Time has started and that means weekly conference call is an hour earlier
Re: [Nfs-ganesha-devel] intermittent malloc list corruption on shutdown in -dev.3
Hi Jeff,

The CEA bot has hit this twice in the past two or so weeks, so you're definitely not the only one seeing it -- unfortunately it has only ever hit on the runs without ASAN, so the traces are pretty much the same as what you get.

This kind of message means we're messing about with internal glibc malloc headers, and I'm very surprised ASAN/valgrind don't catch it. If it's a race, maybe helgrind would? But I think that reports quite a bit, and it would need more time than I have to check...

Anyway, you're not alone, but I don't have much clue either.. Good luck! :P

--
Dominique
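For anyone unfamiliar with this failure mode, the toy program below -- purely illustrative, unrelated to ganesha's code -- shows why these glibc malloc/free consistency messages tend to fire far away from the real bug: a write past the end of one allocation scribbles on the chunk header of the next one, and the abort only happens at a later free() or malloc(). Built with -fsanitize=address, the same program is normally reported as a heap-buffer-overflow at the faulting write, which is why it is surprising that ASAN stays quiet on the real crash.

/*
 * Illustration only -- not ganesha code.  Overflowing one heap
 * allocation clobbers glibc's chunk header for the next one, so a
 * "malloc(): ..." / "free(): invalid next size" style abort happens
 * later, at an unrelated free() or malloc(), far from the real bug.
 */
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char *a = malloc(24);
        char *b = malloc(24);

        if (!a || !b)
                return 1;

        memset(a, 'x', 40);     /* write past the end of a's chunk */

        free(b);                /* glibc typically aborts around here */
        free(a);
        return 0;
}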
Re: [Nfs-ganesha-devel] rpcping
On Tue, Mar 13, 2018 at 2:38 AM, William Allen Simpson wrote:
> On 3/12/18 6:25 PM, Matt Benjamin wrote:
>> If I understand correctly, we always insert records in xid order, and
>> xid is monotonically increasing by 1. I guess pings might come back
>> in any order,
>
> No, they always come back in order. This is TCP. I've gone to some
> lengths to fix the problem that operations were being executed in
> arbitrary order. (As was reported in the past.)

We're aware of the issues with former req queuing. It was one of my top priorities to fix in napalm, and we did it.

> For UDP, there is always the possibility of loss or re-ordering of
> datagrams, one of the reasons for switching to TCP in NFSv3 (and
> eliminating UDP in NFSv4).
>
> Threads can still block in apparently random order, because of
> timing variances inside FSAL calls. Should not be an issue here.
>
>> but if we assume xids retire in xid order also,
>
> They do. Should be no variance. Eliminating the dupreq caching --
> also using the rbtree -- significantly improved the timing.

It's certainly correct not to cache, but it's also a special case that arises from... benchmarking with rpcping, not NFS. Same goes for retire order. Who said, let's assume the rpcping requests retire in order? Oh yes, me above. Do you think NFS requests in general are required to retire in arrival order? No, of course not. What workload is the general case for the DRC? NFS.

> Apparently picked the worst tree choice for this data, according to
> computer science. If all you have is a hammer

What motivates you to write this stuff? Here are two facts you may have overlooked:

1. The DRC has a constant insert-delete workload, and for this application, IIRC, I put the last inserted entries directly into the cache. This both applies standard art on trees (rbtree vs avl performance on insert/delete heavy workloads), and ostensibly avoids searching the tree in the common case (I measured hit rate informally; it looked to be working).

2. The key in the DRC caches is hk, not xid.

>> and keep a window of 1 records in-tree, that seems maybe like a
>> reasonable starting point for measuring this?
>
> I've not tried 10,000 or 100,000 recently. (The original code
> default sent 100,000.)
>
> I've not recorded how many remain in-tree during the run.
>
> In my measurements, using the new CLNT_CALL_BACK(), the client thread
> starts sending a stream of pings. In every case, it peaks at a
> relatively stable rate.
>
> For 1,000, <4,000/s. For 100, 40,000/s. Fairly linear relationship.
>
> By running multiple threads, I showed that each individual thread ran
> roughly the same (on average). But there is some variance per run.
>
> I only posted the 5 thread results, lowest and highest achieved.
>
> My original message had up to 200 threads and 4 results, but I decided
> such a long series was overkill, so removed them before sending.
>
> That 4,000 and 40,000 per client thread was stable across all runs.
>
>> I wrote a gtest program (gerrit) that I think does the above in a
>> single thread, no locks, for 1M cycles (search, remove, insert). On
>> lemon, compiled at O2, the gtest profiling says the test finishes in
>> less than 150ms (I saw as low as 124). That's over 6M cycles/s, I
>> think.
>
> What have you compared it to? Need a gtest of avl and tailq with the
> same data. That's what the papers I looked at do

The point is, that is very low latency, a lot less than I expected. It's probably minimized from CPU caching and so forth, but it tries to address the more basic question: is expected or unexpected latency from searching the rb tree a likely contributor to overall latency? If we get 2M retires per sec (let alone 6-7), is that a likely supposition?

The rb tree either is, or isn't, a major contributor to latency. We'll ditch it if it is. Substituting a tailq (linear search) seems an unlikely choice, but if you can prove your case with the numbers, no one's going to object.

Matt

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
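To put those cycle numbers in context, here is a rough single-threaded approximation of the search/remove/insert loop over a sliding window of xid-like keys. It is an assumption-heavy sketch, not the gtest on gerrit: it uses POSIX tsearch()/tfind()/tdelete() -- glibc implements these as a red-black tree -- rather than the ntirpc rbtree, and WINDOW and CYCLES are made-up parameters.

/*
 * Rough approximation of the cycle described above: keep a fixed
 * window of keys in a tree, then repeatedly search the oldest key,
 * remove it, and insert a fresh one, timing the whole run.
 */
#include <search.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WINDOW  1000
#define CYCLES  1000000

static int cmp_xid(const void *a, const void *b)
{
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

        return (x > y) - (x < y);
}

int main(void)
{
        static uint64_t slot[WINDOW];   /* stable storage for the keys */
        void *root = NULL;
        uint64_t next_xid = 0;
        struct timespec start, stop;

        /* pre-fill the window */
        for (int i = 0; i < WINDOW; i++) {
                slot[i] = next_xid++;
                tsearch(&slot[i], &root, cmp_xid);
        }

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long c = 0; c < CYCLES; c++) {
                int i = c % WINDOW;                     /* oldest retires first */

                if (!tfind(&slot[i], &root, cmp_xid))   /* search */
                        abort();
                tdelete(&slot[i], &root, cmp_xid);      /* remove */
                slot[i] = next_xid++;
                tsearch(&slot[i], &root, cmp_xid);      /* insert */
        }
        clock_gettime(CLOCK_MONOTONIC, &stop);

        double secs = (stop.tv_sec - start.tv_sec) +
                      (stop.tv_nsec - start.tv_nsec) / 1e9;

        printf("%d cycles in %.3f s => %.0f cycles/s\n",
               CYCLES, secs, CYCLES / secs);
        return 0;
}

Re-timing the same loop with a different container, e.g. a tailq-style linear scan or an AVL tree, is essentially the comparison being asked for above.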
Re: [Nfs-ganesha-devel] rpcping
On 3/12/18 6:25 PM, Matt Benjamin wrote:
> If I understand correctly, we always insert records in xid order, and
> xid is monotonically increasing by 1. I guess pings might come back
> in any order,

No, they always come back in order. This is TCP. I've gone to some lengths to fix the problem that operations were being executed in arbitrary order. (As was reported in the past.)

For UDP, there is always the possibility of loss or re-ordering of datagrams, one of the reasons for switching to TCP in NFSv3 (and eliminating UDP in NFSv4).

Threads can still block in apparently random order, because of timing variances inside FSAL calls. Should not be an issue here.

> but if we assume xids retire in xid order also,

They do. Should be no variance. Eliminating the dupreq caching -- also using the rbtree -- significantly improved the timing.

Apparently picked the worst tree choice for this data, according to computer science. If all you have is a hammer

> and keep a window of 1 records in-tree, that seems maybe like a
> reasonable starting point for measuring this?

I've not tried 10,000 or 100,000 recently. (The original code default sent 100,000.)

I've not recorded how many remain in-tree during the run.

In my measurements, using the new CLNT_CALL_BACK(), the client thread starts sending a stream of pings. In every case, it peaks at a relatively stable rate.

For 1,000, <4,000/s. For 100, 40,000/s. Fairly linear relationship.

By running multiple threads, I showed that each individual thread ran roughly the same (on average). But there is some variance per run.

I only posted the 5 thread results, lowest and highest achieved.

My original message had up to 200 threads and 4 results, but I decided such a long series was overkill, so removed them before sending.

That 4,000 and 40,000 per client thread was stable across all runs.

> I wrote a gtest program (gerrit) that I think does the above in a
> single thread, no locks, for 1M cycles (search, remove, insert). On
> lemon, compiled at O2, the gtest profiling says the test finishes in
> less than 150ms (I saw as low as 124). That's over 6M cycles/s, I
> think.

What have you compared it to? Need a gtest of avl and tailq with the same data. That's what the papers I looked at do