[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-05-07 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 9:

> (1 comment)
 >
 > > (1 comment)
 > >
 > > I realized that, the only proper way to add a 2-way load barrier
 > is
 > > with
 > > std::atomic_thread_fence(std::memory_order_acquire), which is
 > more
 > > strict, than
 > > the 1-way std::atomic::load(std::memory_order_acquire);
 > >
 > > I made some performance test. I used TestScanPerformance from
 > > cbtree-test and
 > > TestMemRowSetUpdatePerformance from memrowset-test. I also added
 > 2
 > > more
 > > performance tests, one using basic arena, one using
 > > ThreadSafeMemoryTrackingArena (we only use the 2nd in
 > production).
 > > I will add g++ tests too (as soon as it compiles).
 > > See the branches sheet for name resolution.
 >
 > For some reason, I assumed you were going to add those tests into
 > this patch or as a separate patch, but it seems by 'adding
 >
 > > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing
 >
 > > (1 comment)
 > >
 > > I realized that, the only proper way to add a 2-way load barrier
 > is
 > > with
 > > std::atomic_thread_fence(std::memory_order_acquire), which is
 > more
 > > strict, than
 > > the 1-way std::atomic::load(std::memory_order_acquire);
 > >
 > > I made some performance test. I used TestScanPerformance from
 > > cbtree-test and
 > > TestMemRowSetUpdatePerformance from memrowset-test. I also added
 > 2
 > > more
 > > performance tests, one using basic arena, one using
 > > ThreadSafeMemoryTrackingArena (we only use the 2nd in
 > production).
 > > I will add g++ tests too (as soon as it compiles).
 > > See the branches sheet for name resolution.
 >
 > For some reason, I assumed you were going to add your new tests as
 > a part of this changelist as well, but it seems you just meant
 > adding results from running those tests into the spreadsheet :)
 >
 > OK, maybe it makes sense to add your new tests into Kudu as well?
 > Feel free to post that as a separate changelist.
 >
 > Also, it would be great if you could address other feedback items
 > (nits, questions of other reviewers, etc.) and post a follow-up
 > revision, if necessary.
 >
 > Thanks!
 >
 > > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing

I run the arm tests on a c6g.2xlarge arm machine, and the x86 tests on a 
physical pc in the bp office (I can include the spec if you want). I thought 
only the changes matter (not the comparison of x86 and ARM), however I can try 
to rerun the tests on a similar x86 than the arm machine (c6i.2xlarge?)/

All times are in ms, the value that the google test prints 
(TestMemRowSetUpdatePerformance included).
The (same platform) tests were run on the same machine. One at a time through 
ssh. No other task was taking place (I even closed VS Code to prevent it from 
doing any background task).
x86_master_clang_2 and x86_master_clang are the same, only 3 hours apart (And I 
have no idea why the second run is a bit slower).

My new tests are nearly the same as "TestConcurrentInsert", just using the 
ThreadSafeMemoryTrackingArena and the sizes that production actually uses.
I could add a test with less inserts using this parameters. (If the tree works 
while it is small, it will work, when it is big, because threads will "collide" 
less often). I will post a separate gerrit for this test (Same as 
TestPerformanceThreadSafe, just runs <1s).


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 07 May 2024 13:14:22 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-05-03 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 10:

(1 comment)

> (1 comment)
 >
 > I realized that, the only proper way to add a 2-way load barrier is
 > with
 > std::atomic_thread_fence(std::memory_order_acquire), which is more
 > strict, than
 > the 1-way std::atomic::load(std::memory_order_acquire);
 >
 > I made some performance test. I used TestScanPerformance from
 > cbtree-test and
 > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2
 > more
 > performance tests, one using basic arena, one using
 > ThreadSafeMemoryTrackingArena (we only use the 2nd in production).
 > I will add g++ tests too (as soon as it compiles).
 > See the branches sheet for name resolution.

For some reason, I assumed you were going to add those tests into this patch or 
as a separate patch, but it seems by 'adding

 > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing

 > (1 comment)
 >
 > I realized that, the only proper way to add a 2-way load barrier is
 > with
 > std::atomic_thread_fence(std::memory_order_acquire), which is more
 > strict, than
 > the 1-way std::atomic::load(std::memory_order_acquire);
 >
 > I made some performance test. I used TestScanPerformance from
 > cbtree-test and
 > TestMemRowSetUpdatePerformance from memrowset-test. I also added 2
 > more
 > performance tests, one using basic arena, one using
 > ThreadSafeMemoryTrackingArena (we only use the 2nd in production).
 > I will add g++ tests too (as soon as it compiles).
 > See the branches sheet for name resolution.

For some reason, I assumed you were going to add your new tests as a part of 
this changelist as well, but it seems you just meant adding results from 
running those tests into the spreadsheet :)

OK, maybe it makes sense to add your new tests into Kudu as well?  Feel free to 
post that as a separate changelist.

Also, it would be great if you could address other feedback items (nits, 
questions of other reviewers, etc.) and post a follow-up revision, if necessary.

Thanks!

 > https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33
PS6, Line 33: The following 6 barriers need to be more strict:
> See the google sheet.
Thanks for posting the numbers.  A few questions:
  * What were the machines that x86_64 and ARM tests were run at?  How close 
they are in terms of processing power (frequency, number of CPU cores)?
  * Do I understand correctly that both the x86_64 and ARM machines were the 
same for all the tests, just different binaries were run?
  * I'm not sure I understand what 'x86_master_clang_2' at the Summary tab is 
for?  How is it different from 'x86_master_clang'?
  * The numbers posted at the Summary tab: are those seconds of test runtime or 
that's some completely different (e.g., number scanned/inserted keys), or for 
different tests they are of different units?
  * The numbers posted at the Summary tab for the 
'TestMemRowSetUpdatePerformance' test: what are those?  Are those just total 
runtime or some particular timing report part (inserting, counting, updating) 
of the test?

Also, could you run existing corresponding Kudu tests (you can run you new 
tests as well, of course) prior and post your gerrit patch under the 'perf' 
utility (say, 10 runs) and report the results.  No need to put them into a 
spreadsheet, just present the output from the 'perf stat' utility.  You can 
browse 'git log' and search for the examples how running those tests is usually 
done.  You might also be interested in perf-mem and may be in perf-lock, but 
reports just from generic 'perf' would be great already.

Thanks a lot!

https://man7.org/linux/man-pages/man1/perf.1.html



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Sat, 04 May 2024 03:06:59 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 10: Verified-1

Build Failed

http://jenkins.kudu.apache.org/job/pre_commit/38/ : FAILURE


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 16:06:26 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, 
Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#10).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart
from incorrect order, there are no other errors in the tree. All inserted
elements are present, but CBTree::GetCopy cannot find some due to the incorrect
ordering.

This is not a unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue (only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores (TSO).
ARM, in contrast, allows stores to different memory locations to be observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 6 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not enough
  to flush all previous changes. It is also very important not to reorder any
  write before it. So instead of Release_Store (which is practically equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
  check if the version is unchanged, then a load_acquire (equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the c++ standard functions will make the code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the missing
functions was reimplemented. The header is gone since, and even chromium uses a
solution based on std::atomic (see base/atomicops_internals_portable.h in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction layer, that
is basically just "function aliases" at this point.

Reason for removing PACKED:

Although the padding of the Nodes was turn of, the following happened (both on
ARM and x86):
+ NodeBase was 16 byte, so there was no padding after the base class.
+ LeafNode and InternalNode was 249, 244 bytes (used in memrowset and
  deltamemstore)
+ When we reserved a new node, we used arena_->AllocateBytesAligned(...), with
  8 bytes alignment.

So the end result was exactly the same, as if there were no "PACKED" attribute.
Also compiler kept warning about " might be unaligned", but it were
never unaligned.
By removing the packed PACKED we change nothing in the memory layout. (neither
on ARM or x86). It just gets rid of the old compiler warning. (std::atomic
disable packing anyway).

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/cbtree-test.cc
M src/kudu/tablet/concurrent_btree.h
2 files changed, 101 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/10
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 10:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/38/


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 15:15:37 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 9: Verified-1

Build Failed

http://jenkins.kudu.apache.org/job/pre_commit/35/ : FAILURE


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 13:04:30 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

(1 comment)

I realized that, the only proper way to add a 2-way load barrier is with
std::atomic_thread_fence(std::memory_order_acquire), which is more strict, than
the 1-way std::atomic::load(std::memory_order_acquire);

I made some performance test. I used TestScanPerformance from cbtree-test and
TestMemRowSetUpdatePerformance from memrowset-test. I also added 2 more
performance tests, one using basic arena, one using
ThreadSafeMemoryTrackingArena (we only use the 2nd in production).
I will add g++ tests too (as soon as it compiles).
See the branches sheet for name resolution.

https://docs.google.com/spreadsheets/d/1whl8h7S0DcjVYp0jksR87P9v4U8TQH9nS1C9N0D-lUo/edit?usp=sharing

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33
PS6, Line 33: The following 6 barriers need to be more strict:
> Did you run the TestCBTree.TestScanPerformance scenario from the cbtree-tes
See the google sheet.



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 13:00:06 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, 
Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#9).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered
nodes. Apart
from incorrect order, there are no other errors in the tree. All
inserted
elements are present, but CBTree::GetCopy cannot find some due to the
incorrect
ordering.

This is not a unit test error, but a real error in the CBTree
implementation.
Modifying the test to only do the inserts first, and only start checking
when
the tree is finalized does not prevent the bug. So calling only the
tree->Insert
function from multiple threads is sufficient to reproduce the issue
(only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause
a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores
(TSO).
ARM, in contrast, allows stores to different memory locations to be
observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 7 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not
enough
  to flush all previous changes. It is also very important not to
reorder any
  write before it. So instead of Release_Store (which is practically
equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in
SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf,
  and we check if the version is unchanged, then a load_acquire
(equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make
sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to these 7
places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the C++ standard functions will make the
code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the
missing
functions was reimplemented. The header is gone since, and even chromium
uses a
solution based on std::atomic (see base/atomicops_internals_portable.h
in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction
layer, that
is basically just "function aliases" at this point.

Reason for __builtin_assume_aligned and reinterpret_cast:

Although the nodes are PACKED, We guarantee that they will be aligned to
8
bytes. They can have unaligned values put between them in the same
arena, so
packing does save memory. If you put std::atomic into a struct, g++
refuses to
pack it, so we need a placeholder.

Reason for renaming IsLocked to IsLockedUnsafe:
In the original version, it was an unsafe memory access. Currently, it
is only
in places where we hold the lock (which means we have atomic access to
the
version before and after in the same thread). Also, it is only used in
DCHECK()
macros. However, I found it disturbing to have an unsafe function
similarly
named as all the other proper atomic functions around it.

I also fixed TestRacyConcurrentInsert, so it won't freez after failing.

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/cbtree-test.cc
M src/kudu/tablet/concurrent_btree.h
2 files changed, 100 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/9
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, 
Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#8).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered
nodes. Apart
from incorrect order, there are no other errors in the tree. All
inserted
elements are present, but CBTree::GetCopy cannot find some due to the
incorrect
ordering.

This is not a unit test error, but a real error in the CBTree
implementation.
Modifying the test to only do the inserts first, and only start checking
when
the tree is finalized does not prevent the bug. So calling only the
tree->Insert
function from multiple threads is sufficient to reproduce the issue
(only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause
a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores
(TSO).
ARM, in contrast, allows stores to different memory locations to be
observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 7 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not
enough
  to flush all previous changes. It is also very important not to
reorder any
  write before it. So instead of Release_Store (which is practically
equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in
SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf,
  and we check if the version is unchanged, then a load_acquire
(equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make
sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 7
places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the C++ standard functions will make the
code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the
missing
functions was reimplemented. The header is gone since, and even chromium
uses a
solution based on std::atomic (see base/atomicops_internals_portable.h
in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction
layer, that
is basically just "function aliases" at this point.

Reason for __builtin_assume_aligned and reinterpret_cast:

Although the nodes are PACKED, We guarantee that they will be aligned to
8
bytes. They can have unaligned values put between them in the same
arena, so
packing does save memory. If you put std::atomic into a struct, g++
refuses to
pack it, so we need a placeholder.

Reason for renaming IsLocked to IsLockedUnsafe:
In the original version, it was an unsafe memory access. Currently, it
is only
in places where we hold the lock (which means we have atomic access to
the
version before and after in the same thread). Also, it is only used in
DCHECK()
macros. However, I found it disturbing to have an unsafe function
similarly
named as all the other proper atomic functions around it.

I also fixed TestRacyConcurrentInsert, so it won't freez after failing.

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/cbtree-test.cc
M src/kudu/tablet/concurrent_btree.h
2 files changed, 100 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/8
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 8:

Build Aborted

http://jenkins.kudu.apache.org/job/pre_commit/34/ : ABORTED


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:59:19 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h
File src/kudu/tablet/concurrent_btree.h:

http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@390
PS6, Line 390:   AtomicVersionValue VersionAcqRel() {
 : return version_.load(std::memory_order_acq_rel);
 :   }
> I'm curios how this make sense if by the specification the behavior of std:
No, I missed it. There is actually no right flag for what we wan't to do here.
We need to use std::atomic_thread_fence(std::memory_order_acquire); (because it 
is 2-way).

To my best knowledge, g++ compiled the undefined behaviour as seq_cst, so it 
worked, but also  slowed down the execution.



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:57:02 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 9:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/35/


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:59:26 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 8:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/34/


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:58:01 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 7:

Build Aborted

http://jenkins.kudu.apache.org/job/pre_commit/33/ : ABORTED


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:57:53 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Zoltan Martonka (Code Review)
Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins, 
Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#7).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered
nodes. Apart
from incorrect order, there are no other errors in the tree. All
inserted
elements are present, but CBTree::GetCopy cannot find some due to the
incorrect
ordering.

This is not a unit test error, but a real error in the CBTree
implementation.
Modifying the test to only do the inserts first, and only start checking
when
the tree is finalized does not prevent the bug. So calling only the
tree->Insert
function from multiple threads is sufficient to reproduce the issue
(only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause
a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores
(TSO).
ARM, in contrast, allows stores to different memory locations to be
observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 7 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not
enough
  to flush all previous changes. It is also very important not to
reorder any
  write before it. So instead of Release_Store (which is practically
equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in
SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf,
  and we check if the version is unchanged, then a load_acquire
(equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make
sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 7
places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the c++ standard functions will make the
code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the
missing
functions was reimplemented. The header is gone since, and even chromium
uses a
solution based on std::atomic (see base/atomicops_internals_portable.h
in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction
layer, that
is basically just "function aliases" at this point.

Reason for __builtin_assume_aligned and reinterpret_cast:

Although the nodes are PACKED, We guarantee that they will be aligned to
8
bytes. They can have unaligned values put between them in the same
arena, so
packing does save memory. If you put std::atomic into a struct, g++
refuses to
pack it, so we need a placeholder.

Reason for renaming IsLocked to IsLockedUnsafe:
In the original version, it was an unsafe memory access. Currently, it
is only
in places where we hold the lock (which means we have atomic access to
the
version before and after in the same thread). Also, it is only used in
DCHECK()
macros. However, I found it disturbing to have an unsafe function
similarly
named as all the other proper atomic functions around it.

I also fixed TestRacyConcurrentInsert, so it won't freez after failing.

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/cbtree-test.cc
M src/kudu/tablet/concurrent_btree.h
2 files changed, 100 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/7
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-23 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 7:

Build Started http://jenkins.kudu.apache.org/job/pre_commit/33/


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 23 Apr 2024 12:54:52 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-10 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@46
PS6, Line 46: would resolve the issue
> Have you actually tried that or this is just a hypothesis?
I actually tried it out. Wanted to ship it originally. It is version 2 of this 
review (compare version 2 with base).



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Wed, 10 Apr 2024 12:37:27 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-04-08 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@33
PS6, Line 33: The following 6 barriers need to be more strict:
Did you run the TestCBTree.TestScanPerformance scenario from the cbtree-test 
before and after this patch on x86_64?  What were the results?


http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@45
PS6, Line 45: this
nit: these


http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@46
PS6, Line 46: would resolve the issue
Have you actually tried that or this is just a hypothesis?


http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@47
PS6, Line 47: c++
nit: C++


http://gerrit.cloudera.org:8080/#/c/21127/6//COMMIT_MSG@72
PS6, Line 72: By removing the packed PACKED we change nothing in the memory 
layout
How so?  What's the size of LeafNode and InternalNode after removing the packed 
attribute?


http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h
File src/kudu/tablet/concurrent_btree.h:

http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@390
PS6, Line 390:   AtomicVersionValue VersionAcqRel() {
 : return version_.load(std::memory_order_acq_rel);
 :   }
I'm curios how this make sense if by the specification the behavior of 
std::atomic::load(std::memory_order_acq_rel) is undefined?  Am I missing 
something?

https://en.cppreference.com/w/cpp/atomic/atomic/load



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Mon, 08 Apr 2024 19:06:55 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-29 Thread Ashwani Raina (Code Review)
Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

Do you think a performance analysis is required to ensure there is no 
significant regression?


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Fri, 29 Mar 2024 14:35:57 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-29 Thread Ashwani Raina (Code Review)
Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h
File src/kudu/tablet/concurrent_btree.h:

http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@139
PS6, Line 139: *
nit: add space


http://gerrit.cloudera.org:8080/#/c/21127/6/src/kudu/tablet/concurrent_btree.h@378
PS6, Line 378: StableVersionAcquire
Just for my understanding, what is the difference between StableVersionAcquire 
and VersionAcquire?



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Fri, 29 Mar 2024 13:47:34 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-26 Thread Zoltan Martonka (Code Review)
Hello Tidy Bot, Zoltan Chovan, Ashwani Raina, Kudu Jenkins, Anonymous Coward 
(763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#6).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart
from incorrect order, there are no other errors in the tree. All inserted
elements are present, but CBTree::GetCopy cannot find some due to the incorrect
ordering.

This is not a unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue (only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores (TSO).
ARM, in contrast, allows stores to different memory locations to be observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 6 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not enough
  to flush all previous changes. It is also very important not to reorder any
  write before it. So instead of Release_Store (which is practically equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
  check if the version is unchanged, then a load_acquire (equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the c++ standard functions will make the code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the missing
functions was reimplemented. The header is gone since, and even chromium uses a
solution based on std::atomic (see base/atomicops_internals_portable.h in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction layer, that
is basically just "function aliases" at this point.

Reason for removing PACKED:

Although the padding of the Nodes was turn of, the following happened (both on
ARM and x86):
+ NodeBase was 16 byte, so there was no padding after the base class.
+ LeafNode and InternalNode was 249, 244 bytes (used in memrowset and
  deltamemstore)
+ When we reserved a new node, we used arena_->AllocateBytesAligned(...), with
  8 bytes alignment.

So the end result was exactly the same, as if there were no "PACKED" attribute.
Also compiler kept warning about " might be unaligned", but it were
never unaligned.
By removing the packed PACKED we change nothing in the memory layout. (neither
on ARM or x86). It just gets rid of the old compiler warning. (std::atomic
disable packing anyway).

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/concurrent_btree.h
1 file changed, 76 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/6
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-26 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h
File src/kudu/tablet/concurrent_btree.h:

http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h@382
PS5, Line 382:   AtomicVersionValue StableVersionAcqRel() {
I can rename it (and the other 3) to AcqRelStableVersion. I find this word 
order better, but it is just my subjective opinion.



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 26 Mar 2024 12:16:44 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-26 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h
File src/kudu/tablet/concurrent_btree.h:

http://gerrit.cloudera.org:8080/#/c/21127/5/src/kudu/tablet/concurrent_btree.h@125
PS5, Line 125: typedef std::uint64_t AtomicVersionValue; // Local value we 
loaded/will store
This is the way std::atomic works (load returns T, store expects T, 
atomic is not copyable etc.). I think it is much cleaner than the old 
solution (not that we have a choice if we use std::atomic).



--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 26 Mar 2024 12:10:54 +
Gerrit-HasComments: Yes


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-26 Thread Zoltan Martonka (Code Review)
Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21127 )

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..


Patch Set 5:

I have nothing to do with the Tidy bot warning about the TODO-s


--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka 
Gerrit-Comment-Date: Tue, 26 Mar 2024 12:06:13 +
Gerrit-HasComments: No


[kudu-CR] [ARM] Concurrent binary tree memory barriers fixed.

2024-03-26 Thread Zoltan Martonka (Code Review)
Hello Zoltan Chovan, Ashwani Raina, Kudu Jenkins, Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#5).

Change subject: [ARM] Concurrent binary tree memory barriers fixed.
..

[ARM] Concurrent binary tree memory barriers fixed.

TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart
from incorrect order, there are no other errors in the tree. All inserted
elements are present, but CBTree::GetCopy cannot find some due to the incorrect
ordering.

This is not a unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue (only on
ARM).

Root cause:

Some memory order restrictions are not strict enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores (TSO).
ARM, in contrast, allows stores to different memory locations to be observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390

Solution:

The following 6 barriers need to be more strict:

1. When we set the Splitting/Inserting flag on a node, then it is not enough
  to flush all previous changes. It is also very important not to reorder any
  write before it. So instead of Release_Store (which is practically equivalent
  to std::memory_order_release), we need a 2 way barrier.

2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
  check if the version is unchanged, then a load_acquire (equivalent to
  std::memory_order_acquire) is not sufficient. We have to also make sure no
  read is reordered in the other direction.

Putting the appropriate std::atomic_thread_fence(...) calls to this 6 places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the c++ standard functions will make the code
more consistent.

Reason for changing to std::atomics:

atomicops-internals-arm64.h was picked from chromium source and the missing
functions was reimplemented. The header is gone since, and even chromium uses a
solution based on std::atomic (see base/atomicops_internals_portable.h in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction layer, that
is basically just "function aliases" at this point.

Reason for removing PACKED:

Although the padding of the Nodes was turn of, the following happened (both on
ARM and x86):
+ NodeBase was 16 byte, so there was no padding after the base class.
+ LeafNode and InternalNode was 249, 244 bytes (used in memrowset and
  deltamemstore)
+ When we reserved a new node, we used arena_->AllocateBytesAligned(...), with
  8 bytes alignment.

So the end result was exactly the same, as if there were no "PACKED" attribute.
Also compiler kept warning about " might be unaligned", but it were
never unaligned.
By removing the packed PACKED we change nothing in the memory layout. (neither
on ARM or x86). It just gets rid of the old compiler warning. (std::atomic
disable packing anyway).

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/concurrent_btree.h
1 file changed, 76 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/5
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka 
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Chovan 
Gerrit-Reviewer: Zoltan Martonka