[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 9: > (1 comment) > > > (1 comment) > > > > I realized that, the only proper way to add a 2-way load barrier > is > > with > > std::atomic_thread_fence(std::memory_order_acquire), which is > more > > strict, than > > the 1-way std::atomic::load(std::memory_order_acquire); > > > > I made some performance test. I used TestScanPerformance from > > cbtree-test and > > TestMemRowSetUpdatePerformance from memrowset-test. I also added > 2 > > more > > performance tests, one using basic arena, one using > > ThreadSafeMemoryTrackingArena (we only use the 2nd in > production). > > I will add g++ tests too (as soon as it compiles). > > See the branches sheet for name resolution. > > For some reason, I assumed you were going to add those tests into > this patch or as a separate patch, but it seems by 'adding > > > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing > > > (1 comment) > > > > I realized that, the only proper way to add a 2-way load barrier > is > > with > > std::atomic_thread_fence(std::memory_order_acquire), which is > more > > strict, than > > the 1-way std::atomic::load(std::memory_order_acquire); > > > > I made some performance test. I used TestScanPerformance from > > cbtree-test and > > TestMemRowSetUpdatePerformance from memrowset-test. I also added > 2 > > more > > performance tests, one using basic arena, one using > > ThreadSafeMemoryTrackingArena (we only use the 2nd in > production). > > I will add g++ tests too (as soon as it compiles). > > See the branches sheet for name resolution. > > For some reason, I assumed you were going to add your new tests as > a part of this changelist as well, but it seems you just meant > adding results from running those tests into the spreadsheet :) > > OK, maybe it makes sense to add your new tests into Kudu as well? > Feel free to post that as a separate changelist. > > Also, it would be great if you could address other feedback items > (nits, questions of other reviewers, etc.) and post a follow-up > revision, if necessary. > > Thanks! > > > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing I run the arm tests on a c6g.2xlarge arm machine, and the x86 tests on a physical pc in the bp office (I can include the spec if you want). I thought only the changes matter (not the comparison of x86 and ARM), however I can try to rerun the tests on a similar x86 than the arm machine (c6i.2xlarge?)/ All times are in ms, the value that the google test prints (TestMemRowSetUpdatePerformance included). The (same platform) tests were run on the same machine. One at a time through ssh. No other task was taking place (I even closed VS Code to prevent it from doing any background task). x86_master_clang_2 and x86_master_clang are the same, only 3 hours apart (And I have no idea why the second run is a bit slower). My new tests are nearly the same as "TestConcurrentInsert", just using the ThreadSafeMemoryTrackingArena and the sizes that production actually uses. I could add a test with less inserts using this parameters. (If the tree works while it is small, it will work, when it is big, because threads will "collide" less often). I will post a separate gerrit for this test (Same as TestPerformanceThreadSafe, just runs <1s). -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 07 May 2024 13:14:22 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 10: (1 comment) > (1 comment) > > I realized that, the only proper way to add a 2-way load barrier is > with > std::atomic_thread_fence(std::memory_order_acquire), which is more > strict, than > the 1-way std::atomic::load(std::memory_order_acquire); > > I made some performance test. I used TestScanPerformance from > cbtree-test and > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 > more > performance tests, one using basic arena, one using > ThreadSafeMemoryTrackingArena (we only use the 2nd in production). > I will add g++ tests too (as soon as it compiles). > See the branches sheet for name resolution. For some reason, I assumed you were going to add those tests into this patch or as a separate patch, but it seems by 'adding > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing > (1 comment) > > I realized that, the only proper way to add a 2-way load barrier is > with > std::atomic_thread_fence(std::memory_order_acquire), which is more > strict, than > the 1-way std::atomic::load(std::memory_order_acquire); > > I made some performance test. I used TestScanPerformance from > cbtree-test and > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 > more > performance tests, one using basic arena, one using > ThreadSafeMemoryTrackingArena (we only use the 2nd in production). > I will add g++ tests too (as soon as it compiles). > See the branches sheet for name resolution. For some reason, I assumed you were going to add your new tests as a part of this changelist as well, but it seems you just meant adding results from running those tests into the spreadsheet :) OK, maybe it makes sense to add your new tests into Kudu as well? Feel free to post that as a separate changelist. Also, it would be great if you could address other feedback items (nits, questions of other reviewers, etc.) and post a follow-up revision, if necessary. Thanks! > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33 PS6, Line 33: The following 6 barriers need to be more strict: > See the google sheet. Thanks for posting the numbers. A few questions: * What were the machines that x86_64 and ARM tests were run at? How close they are in terms of processing power (frequency, number of CPU cores)? * Do I understand correctly that both the x86_64 and ARM machines were the same for all the tests, just different binaries were run? * I'm not sure I understand what 'x86_master_clang_2' at the Summary tab is for? How is it different from 'x86_master_clang'? * The numbers posted at the Summary tab: are those seconds of test runtime or that's some completely different (e.g., number scanned/inserted keys), or for different tests they are of different units? * The numbers posted at the Summary tab for the 'TestMemRowSetUpdatePerformance' test: what are those? Are those just total runtime or some particular timing report part (inserting, counting, updating) of the test? Also, could you run existing corresponding Kudu tests (you can run you new tests as well, of course) prior and post your gerrit patch under the 'perf' utility (say, 10 runs) and report the results. No need to put them into a spreadsheet, just present the output from the 'perf stat' utility. You can browse 'git log' and search for the examples how running those tests is usually done. You might also be interested in perf-mem and may be in perf-lock, but reports just from generic 'perf' would be great already. Thanks a lot! https://man7.org/linux/man-pages/man1/perf.1.html -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Sat, 04 May 2024 03:06:59 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 10: Verified-1 Build Failed http://jenkins.kudu.apache.org/job/pre_commit/38/ : FAILURE -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 16:06:26 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#10). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 6 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the c++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for removing PACKED: Although the padding of the Nodes was turn of, the following happened (both on ARM and x86): + NodeBase was 16 byte, so there was no padding after the base class. + LeafNode and InternalNode was 249, 244 bytes (used in memrowset and deltamemstore) + When we reserved a new node, we used arena_->AllocateBytesAligned(...), with 8 bytes alignment. So the end result was exactly the same, as if there were no "PACKED" attribute. Also compiler kept warning about " might be unaligned", but it were never unaligned. By removing the packed PACKED we change nothing in the memory layout. (neither on ARM or x86). It just gets rid of the old compiler warning. (std::atomic disable packing anyway). Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/cbtree-test.cc M src/kudu/tablet/concurrent_btree.h 2 files changed, 101 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/10 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 10: Build Started http://jenkins.kudu.apache.org/job/pre_commit/38/ -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 15:15:37 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 9: Verified-1 Build Failed http://jenkins.kudu.apache.org/job/pre_commit/35/ : FAILURE -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 13:04:30 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: (1 comment) I realized that, the only proper way to add a 2-way load barrier is with std::atomic_thread_fence(std::memory_order_acquire), which is more strict, than the 1-way std::atomic::load(std::memory_order_acquire); I made some performance test. I used TestScanPerformance from cbtree-test and TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 more performance tests, one using basic arena, one using ThreadSafeMemoryTrackingArena (we only use the 2nd in production). I will add g++ tests too (as soon as it compiles). See the branches sheet for name resolution. https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33 PS6, Line 33: The following 6 barriers need to be more strict: > Did you run the TestCBTree.TestScanPerformance scenario from the cbtree-tes See the google sheet. -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 13:00:06 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#9). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 7 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to these 7 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the C++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for __builtin_assume_aligned and reinterpret_cast: Although the nodes are PACKED, We guarantee that they will be aligned to 8 bytes. They can have unaligned values put between them in the same arena, so packing does save memory. If you put std::atomic into a struct, g++ refuses to pack it, so we need a placeholder. Reason for renaming IsLocked to IsLockedUnsafe: In the original version, it was an unsafe memory access. Currently, it is only in places where we hold the lock (which means we have atomic access to the version before and after in the same thread). Also, it is only used in DCHECK() macros. However, I found it disturbing to have an unsafe function similarly named as all the other proper atomic functions around it. I also fixed TestRacyConcurrentInsert, so it won't freez after failing. Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/cbtree-test.cc M src/kudu/tablet/concurrent_btree.h 2 files changed, 100 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/9 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#8). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 7 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to this 7 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the C++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for __builtin_assume_aligned and reinterpret_cast: Although the nodes are PACKED, We guarantee that they will be aligned to 8 bytes. They can have unaligned values put between them in the same arena, so packing does save memory. If you put std::atomic into a struct, g++ refuses to pack it, so we need a placeholder. Reason for renaming IsLocked to IsLockedUnsafe: In the original version, it was an unsafe memory access. Currently, it is only in places where we hold the lock (which means we have atomic access to the version before and after in the same thread). Also, it is only used in DCHECK() macros. However, I found it disturbing to have an unsafe function similarly named as all the other proper atomic functions around it. I also fixed TestRacyConcurrentInsert, so it won't freez after failing. Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/cbtree-test.cc M src/kudu/tablet/concurrent_btree.h 2 files changed, 100 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/8 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 8: Build Aborted http://jenkins.kudu.apache.org/job/pre_commit/34/ : ABORTED -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:59:19 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h File src/kudu/tablet/concurrent_btree.h: http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@390 PS6, Line 390: AtomicVersionValue VersionAcqRel() { : return version_.load(std::memory_order_acq_rel); : } > I'm curios how this make sense if by the specification the behavior of std: No, I missed it. There is actually no right flag for what we wan't to do here. We need to use std::atomic_thread_fence(std::memory_order_acquire); (because it is 2-way). To my best knowledge, g++ compiled the undefined behaviour as seq_cst, so it worked, but also slowed down the execution. -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:57:02 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 9: Build Started http://jenkins.kudu.apache.org/job/pre_commit/35/ -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:59:26 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 8: Build Started http://jenkins.kudu.apache.org/job/pre_commit/34/ -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:58:01 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 7: Build Aborted http://jenkins.kudu.apache.org/job/pre_commit/33/ : ABORTED -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:57:53 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#7). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 7 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to this 7 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the c++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for __builtin_assume_aligned and reinterpret_cast: Although the nodes are PACKED, We guarantee that they will be aligned to 8 bytes. They can have unaligned values put between them in the same arena, so packing does save memory. If you put std::atomic into a struct, g++ refuses to pack it, so we need a placeholder. Reason for renaming IsLocked to IsLockedUnsafe: In the original version, it was an unsafe memory access. Currently, it is only in places where we hold the lock (which means we have atomic access to the version before and after in the same thread). Also, it is only used in DCHECK() macros. However, I found it disturbing to have an unsafe function similarly named as all the other proper atomic functions around it. I also fixed TestRacyConcurrentInsert, so it won't freez after failing. Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/cbtree-test.cc M src/kudu/tablet/concurrent_btree.h 2 files changed, 100 insertions(+), 74 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/7 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Kudu Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 7: Build Started http://jenkins.kudu.apache.org/job/pre_commit/33/ -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 23 Apr 2024 12:54:52 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@46 PS6, Line 46: would resolve the issue > Have you actually tried that or this is just a hypothesis? I actually tried it out. Wanted to ship it originally. It is version 2 of this review (compare version 2 with base). -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Wed, 10 Apr 2024 12:37:27 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33 PS6, Line 33: The following 6 barriers need to be more strict: Did you run the TestCBTree.TestScanPerformance scenario from the cbtree-test before and after this patch on x86_64? What were the results? http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@45 PS6, Line 45: this nit: these http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@46 PS6, Line 46: would resolve the issue Have you actually tried that or this is just a hypothesis? http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@47 PS6, Line 47: c++ nit: C++ http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@72 PS6, Line 72: By removing the packed PACKED we change nothing in the memory layout How so? What's the size of LeafNode and InternalNode after removing the packed attribute? http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h File src/kudu/tablet/concurrent_btree.h: http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@390 PS6, Line 390: AtomicVersionValue VersionAcqRel() { : return version_.load(std::memory_order_acq_rel); : } I'm curios how this make sense if by the specification the behavior of std::atomic::load(std::memory_order_acq_rel) is undefined? Am I missing something? https://en.cppreference.com/w/cpp/atomic/atomic/load -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Mon, 08 Apr 2024 19:06:55 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: Do you think a performance analysis is required to ensure there is no significant regression? -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Fri, 29 Mar 2024 14:35:57 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h File src/kudu/tablet/concurrent_btree.h: http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@139 PS6, Line 139: * nit: add space http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@378 PS6, Line 378: StableVersionAcquire Just for my understanding, what is the difference between StableVersionAcquire and VersionAcquire? -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Fri, 29 Mar 2024 13:47:34 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Tidy Bot, Zoltan Chovan, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#6). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 6 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the c++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for removing PACKED: Although the padding of the Nodes was turn of, the following happened (both on ARM and x86): + NodeBase was 16 byte, so there was no padding after the base class. + LeafNode and InternalNode was 249, 244 bytes (used in memrowset and deltamemstore) + When we reserved a new node, we used arena_->AllocateBytesAligned(...), with 8 bytes alignment. So the end result was exactly the same, as if there were no "PACKED" attribute. Also compiler kept warning about " might be unaligned", but it were never unaligned. By removing the packed PACKED we change nothing in the memory layout. (neither on ARM or x86). It just gets rid of the old compiler warning. (std::atomic disable packing anyway). Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/concurrent_btree.h 1 file changed, 76 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/6 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h File src/kudu/tablet/concurrent_btree.h: http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h@382 PS5, Line 382: AtomicVersionValue StableVersionAcqRel() { I can rename it (and the other 3) to AcqRelStableVersion. I find this word order better, but it is just my subjective opinion. -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 26 Mar 2024 12:16:44 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h File src/kudu/tablet/concurrent_btree.h: http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h@125 PS5, Line 125: typedef std::uint64_t AtomicVersionValue; // Local value we loaded/will store This is the way std::atomic works (load returns T, store expects T, atomic is not copyable etc.). I think it is much cleaner than the old solution (not that we have a choice if we use std::atomic). -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 26 Mar 2024 12:10:54 + Gerrit-HasComments: Yes
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/21127 ) Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. Patch Set 5: I have nothing to do with the Tidy bot warning about the TODO-s -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 26 Mar 2024 12:06:13 + Gerrit-HasComments: No
[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.
Hello Zoltan Chovan, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763), I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21127 to look at the new patch set (#5). Change subject: [ARM] Concurrent binary tree memory barriers fixed. .. [ARM] Concurrent binary tree memory barriers fixed. TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM. The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart from incorrect order, there are no other errors in the tree. All inserted elements are present, but CBTree::GetCopy cannot find some due to the incorrect ordering. This is not a unit test error, but a real error in the CBTree implementation. Modifying the test to only do the inserts first, and only start checking when the tree is finalized does not prevent the bug. So calling only the tree->Insert function from multiple threads is sufficient to reproduce the issue (only on ARM). Root cause: Some memory order restrictions are not strict enough. It does not cause a problem on x86 (There were no real changes for 11 years), but it causes problems on ARM. x86 guarantees a globally consistent order for stores (TSO). ARM, in contrast, allows stores to different memory locations to be observed differently from the program order. More info: https://www.sciencedirect.com/science/article/pii/S1383762124000390 Solution: The following 6 barriers need to be more strict: 1. When we set the Splitting/Inserting flag on a node, then it is not enough to flush all previous changes. It is also very important not to reorder any write before it. So instead of Release_Store (which is practically equivalent to std::memory_order_release), we need a 2 way barrier. 2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we check if the version is unchanged, then a load_acquire (equivalent to std::memory_order_acquire) is not sufficient. We have to also make sure no read is reordered in the other direction. Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places would resolve the issue. However, replacing the current function from atomicops-internals-arm64.h to the c++ standard functions will make the code more consistent. Reason for changing to std::atomics: atomicops-internals-arm64.h was picked from chromium source and the missing functions was reimplemented. The header is gone since, and even chromium uses a solution based on std::atomic (see base/atomicops_internals_portable.h in chromium source). I see no reason to update the header from chromium and implement the missing functions, just to have one more abstraction layer, that is basically just "function aliases" at this point. Reason for removing PACKED: Although the padding of the Nodes was turn of, the following happened (both on ARM and x86): + NodeBase was 16 byte, so there was no padding after the base class. + LeafNode and InternalNode was 249, 244 bytes (used in memrowset and deltamemstore) + When we reserved a new node, we used arena_->AllocateBytesAligned(...), with 8 bytes alignment. So the end result was exactly the same, as if there were no "PACKED" attribute. Also compiler kept warning about " might be unaligned", but it were never unaligned. By removing the packed PACKED we change nothing in the memory layout. (neither on ARM or x86). It just gets rid of the old compiler warning. (std::atomic disable packing anyway). Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 --- M src/kudu/tablet/concurrent_btree.h 1 file changed, 76 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/5 -- To view, visit http://gerrit.cloudera.org:8080/21127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382 Gerrit-Change-Number: 21127 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Anonymous Coward (763) Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan Gerrit-Reviewer: Zoltan Martonka