Hello Andrew Wong,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/16863
to review the following change.
Change subject: [client] make metacache reset safe
......................................................................
[client] make metacache reset safe
I noticed that the newly added TxnManagerTest.BeginManyTransactions
test scenario started failing with ASAN heap-use-after-free warnings.
After looking a that, it turned out that the original code was assuming
the cache wouldn't be ever reset before calling the MetaCache's
destructor. However, changelist 232474a51 introduced a new method
MetaCache::ClearCache() and since then the method is being called upon
altering a table if the partitioning scheme has been updated.
This patch resolves the issue by introducing so-called tablet server
registry that's never reset indeed, where entries in the tablet server
cache are just references to the entries in the registry (they are
raw pointers, actually).
The newly added test scenario was reliably producing AddressSanitizer's
heap-use-after-free warnings every time I ran it using ASAN build.
Below is a snapshot of the relevant traces captured when running the
new test scenario without the changes in the client metacache.
AddressSanitizer: heap-use-after-free on address 0x608000129e20 at pc
0x00000078bd54 bp 0x7fa731d0b240 sp 0x7fa731d0b238
READ of size 4 at 0x608000129e20 thread T149 (rpc reactor-146)
#0 0x78bd53 in base::subtle::NoBarrier_Load(int const volatile*)
src/kudu/gutil/atomicops-internals-x86.h:200:10
#1 0x7fa79520e227 in base::SpinLock::SpinLoop(long, int*)
src/kudu/gutil/spinlock.cc:86:10
#2 0x7fa79520e38b in base::SpinLock::SlowLock()
src/kudu/gutil/spinlock.cc:104:25
#3 0x7fa7a099aab0 in std::unique_lock<kudu::simple_spinlock>::lock()
../../../include/c++/8/bits/std_mutex.h:267:17
#4 0x7fa7a0991e3e in
std::unique_lock<kudu::simple_spinlock>::unique_lock(kudu::simple_spinlock&)
../../../include/c++/8/bits/std_mutex.h:197:2
#5 0x7fa7a0abfda1 in
kudu::client::internal::RemoteTabletServer::InitProxy(kudu::client::KuduClient*,
std::function<void (kudu::Status const&)> const&)
src/kudu/client/meta_cache.cc:145:39
#6 0x7fa7a0ac60f5 in
kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void
(kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&,
kudu::MonoTime const&) src/kudu/client/meta_cache.cc:524:11
#7 0x7fa7a09b2dcf in
kudu::rpc::RetriableRpc<kudu::client::internal::RemoteTabletServer,
kudu::tserver::WriteRequestPB, kudu::tserver::WriteResponsePB>::SendRpc()
src/kudu/rpc/retriable_rpc.h:163:19
#8 0x7fa7a09ac6cc in
kudu::client::internal::Batcher::FlushBuffer(kudu::client::internal::RemoteTablet*,
std::vector<kudu::client::internal::InFlightOp*,
std::allocator<kudu::client::internal::InFlightOp*> > const&)
src/kudu/client/batcher.cc:911:8
#9 0x7fa7a09a9e38 in
kudu::client::internal::Batcher::FlushBuffersIfReady()
src/kudu/client/batcher.cc:884:5
#10 0x7fa7a09abd2d in
kudu::client::internal::Batcher::TabletLookupFinished(kudu::client::internal::InFlightOp*,
kudu::Status const&) src/kudu/client/batcher.cc:851:3
#11 0x7fa7a0acec66 in
kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&)
src/kudu/client/meta_cache.cc:923:3
#12 0x7fa7a0ab8800 in
kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()()
const src/kudu/client/master_proxy_rpc.cc:130:26
0x608000129e20 is located 0 bytes inside of 96-byte region
[0x608000129e20,0x608000129e80)
freed by thread T163 here:
#0 0x65c650 in operator delete(void*)
thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:160:399
#1 0x7fa7a0aec859 in void
STLDeleteContainerPairSecondPointers<std::__detail::_Node_iterator<std::pair<std::string
const, kudu::client::internal::RemoteTabletServer*>, false, true>
>(std::__detail::_Node_iterator<std::pair<std::string const,
kudu::client::internal::RemoteTabletServer*>, false, true>,
std::__detail::_Node_iterator<std::pair<std::string const,
kudu::client::internal::RemoteTabletServer*>, false, true>)
src/kudu/gutil/stl_util.h:199:5
#2 0x7fa7a0ad96d1 in void STLDeleteValues<std::unordered_map<std::string,
kudu::client::internal::RemoteTabletServer*, std::hash<std::string>,
std::equal_to<std::string>, std::allocator<std::pair<std::string const,
kudu::client::internal::RemoteTabletServer*> > >
>(std::unordered_map<std::string, kudu::client::internal::RemoteTabletServer*,
std::hash<std::string>, std::equal_to<std::string>,
std::allocator<std::pair<std::string const,
kudu::client::internal::RemoteTabletServer*> > >*)
src/kudu/gutil/stl_util.h:400:3
#3 0x7fa7a0ad4188 in kudu::client::internal::MetaCache::ClearCache()
src/kudu/client/meta_cache.cc:1257:3
previously allocated by thread T149 (rpc reactor-146) here:
#0 0x65bc98 in operator new(unsigned long)
thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:99:386
#1 0x7fa7a0ac7bcb in
kudu::client::internal::MetaCache::UpdateTabletServerUnlocked(kudu::master::TSInfoPB
const&) src/kudu/client/meta_cache.cc:596:48
#2 0x7fa7a0ad0802 in
kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse(kudu::client::KuduTable
const*, std::string const&, bool, kudu::master::GetTableLocationsResponsePB
const&, kudu::client::internal::MetaCacheEntry*, int)
src/kudu/client/meta_cache.cc:1030:7
#3 0x7fa7a0acf9c0 in
kudu::client::internal::MetaCache::ProcessLookupResponse(kudu::client::internal::LookupRpc
const&, kudu::client::internal::MetaCacheEntry*, int)
src/kudu/client/meta_cache.cc:941:10
#4 0x7fa7a0ace64e in
kudu::client::internal::LookupRpc::SendRpcCb(kudu::Status const&)
src/kudu/client/meta_cache.cc:911:31
#5 0x7fa7a0ab8800 in
kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
kudu::master::GetTableLocationsResponsePB>::SendRpc()::'lambda'()::operator()()
const src/kudu/client/master_proxy_rpc.cc:130:26
#6 0x7fa79b2c1620 in kudu::rpc::OutboundCall::CallCallback()
src/kudu/rpc/outbound_call.cc:274:5
#7 0x7fa79b2c1af0 in
kudu::rpc::OutboundCall::SetResponse(std::unique_ptr<kudu::rpc::CallResponse,
std::default_delete<kudu::rpc::CallResponse> >)
src/kudu/rpc/outbound_call.cc:306:5
#8 0x7fa79b26ef5e in
kudu::rpc::Connection::HandleCallResponse(std::unique_ptr<kudu::rpc::InboundTransfer,
std::default_delete<kudu::rpc::InboundTransfer> >)
src/kudu/rpc/connection.cc:735:14
#9 0x7fa79b26e0d6 in kudu::rpc::Connection::ReadHandler(ev::io&, int)
src/kudu/rpc/connection.cc:673:7
SUMMARY: AddressSanitizer: heap-use-after-free
src/kudu/gutil/atomicops-internals-x86.h:200:10 in
base::subtle::NoBarrier_Load(int const volatile*)
Shadow bytes around the buggy address:
0x0c108001d370: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c108001d380: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c108001d390: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c108001d3a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c108001d3b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
=>0x0c108001d3c0: fa fa fa fa[fd]fd fd fd fd fd fd fd fd fd fd fd
0x0c108001d3d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c108001d3e0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c108001d3f0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c108001d400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c108001d410: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Reviewed-on: http://gerrit.cloudera.org:8080/16839
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Andrew Wong <[email protected]>
(cherry picked from commit ae45cd15c2703e3cfaf62aabb20ba79b961b3135)
Conflicts:
src/kudu/client/client-test.cc
src/kudu/client/client.h
src/kudu/client/meta_cache.cc
src/kudu/client/meta_cache.h
(cherry picked from commit b845d2fcbfd3e2c7ac1ae22d5d7fe75ebefb63dc)
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
4 files changed, 67 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/16863/1
--
To view, visit http://gerrit.cloudera.org:8080/16863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: branch-1.12.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I03ec9318526fbfc2da9b068eb3bbd9cd996efbca
Gerrit-Change-Number: 16863
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>