[jira] [Updated] (KUDU-3275) RaftConsensusITest.TestSlowFollower is Flaky
[ https://issues.apache.org/jira/browse/KUDU-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-3275:
------------------------------
    Attachment: raft_consensus-itest.2.txt.gz

> RaftConsensusITest.TestSlowFollower is Flaky
> --------------------------------------------
>
>                 Key: KUDU-3275
>                 URL: https://issues.apache.org/jira/browse/KUDU-3275
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Grant Henke
>            Priority: Major
>         Attachments: raft_consensus-itest.2.txt.gz
>
> I have seen RaftConsensusITest.TestSlowFollower fail quite a few times with the following:
> {code:java}
> I0411 07:15:34.772778 3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746] Dedup: 1.18746->[]
> I0411 07:15:34.772778 3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746] Dedup: 1.18746->[]
> W0411 07:15:34.798368 3632 log.cc:502] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d: Injecting 1096ms of latency in SegmentAllocator::Sync()
> W0411 07:15:34.834003 3206 consensus_peers.cc:480] T a82c506675e74d62a75e53acd05e86e8 P 5698d4de566c4adbae64b6f234c0561d -> Peer 8110dc17824943a68d3051e0df96ae5d (127.28.159.66:42271): Couldn't send request to peer 8110dc17824943a68d3051e0df96ae5d. Status: Timed out: UpdateConsensus RPC to 127.28.159.66:42271 timed out after 0.050s (SENT). This is attempt 1: this message will repeat every 5th retry.
> W0411 07:15:35.333277 3366 tablet_service.cc:2874] Error setting up scanner with request: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Tablet is lagging too much to be able to serve snapshot scan. Lagging by: 45366 ms, (max is 3 ms):: new_scan_request { tablet_id: "a82c506675e74d62a75e53acd05e86e8" projected_columns { name: "key" type: INT32 is_key: true is_nullable: false } projected_columns { name: "int_val" type: INT32 is_key: false is_nullable: false } projected_columns { name: "string_val" type: STRING is_key: false is_nullable: true } read_mode: READ_AT_SNAPSHOT propagated_timestamp: 6627841138429693952 cache_blocks: true order_mode: ORDERED row_format_flags: 0 authz_token { token_data: "\010\210\306\312\203\006\"3\n\005slave\022*\n 4b6d5a12476d4dfcb5a4ab8a6f1190cb\020\001\030\001 \001(\001" signature: """\036\312x4\323\300\350\177k\357\226a signing_key_seq_num: 0 } } call_seq_id: 0
> W0411 07:15:35.635779 3366 tablet_service.cc:2874] Error setting up scanner with request: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Tablet is lagging too much to be able to serve snapshot scan. Lagging by: 45668 ms, (max is 3 ms):: new_scan_request { tablet_id: "a82c506675e74d62a75e53acd05e86e8" projected_columns { name: "key" type: INT32 is_key: true is_nullable: false } projected_columns { name: "int_val" type: INT32 is_key: false is_nullable: false } projected_columns { name: "string_val" type: STRING is_key: false is_nullable: true } read_mode: READ_AT_SNAPSHOT propagated_timestamp: 6627841138425311232 cache_blocks: true order_mode: ORDERED row_format_flags: 0 authz_token { token_data: "\010\210\306\312\203\006\"3\n\005slave\022*\n 4b6d5a12476d4dfcb5a4ab8a6f1190cb\020\001\030\001 \001(\001" signature: """\036\312x4\323\300\350\177k\357\226a signing_key_seq_num: 0 } } call_seq_id: 0
> W0411 07:15:35.636209 3626 scanner-internal.cc:406] Time spent opening tablet: real 57.698s user 0.002s sys 0.000s
> F0411 07:15:35.636267 3626 test_workload.cc:255] Check failed: _s.ok() Bad status: Timed out: exceeded configured scan timeout of 60.000s: after 20 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: P: 1618125277942117 usec, L: 0 to be safe (mode: NON-LEADER). Current safe time: P: 1618125277252906 usec, L: 0 Physical time difference: 0.689s
> *** Check failure stack trace: ***
> *** Aborted at 1618125335 (unix time) try "date -d @1618125335" if you are using GNU date ***
> PC: @ 0x7fe2e72bafb7 gsignal
> *** SIGABRT (@0x3e8727d) received by PID 29309 (TID 0x7fe2d979a700) from PID 29309; stack trace: ***
>     @ 0x7fe2e97881f1 google::(anonymous namespace)::FailureSignalHandler() at ??:0
>     @ 0x7fe2eb20e980 (unknown) at ??:0
>     @ 0x7fe2e72bafb7 gsignal at ??:0
>     @ 0x7fe2e72bc921 abort at ??:0
>     @ 0x7fe2e9778439 google::logging_fail() at ??:0
>     @
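The repeated "Deduplicated request from leader" lines in the log come from the follower trimming entries it has already appended from an incoming batch of Raft operations. A minimal sketch of that dedup step, in illustrative Python (the function and tuple shapes here are made up for clarity; this is not Kudu's C++ implementation):

```python
def dedup_request(last_received, preceding_id, entries):
    """Trim already-received entries from a leader's append request.

    last_received: (term, index) of the follower's last appended op.
    preceding_id:  (term, index) the leader says precedes `entries`.
    entries:       list of (term, index) pairs in ascending index order.
    Returns the deduplicated (preceding_id, entries) pair.
    """
    _, last_idx = last_received
    if last_idx < preceding_id[1]:
        # Follower is behind the whole batch: nothing to trim.
        return preceding_id, list(entries)
    # Drop every entry the follower already has; the request now
    # logically starts right after the follower's last received op.
    kept = [(term, idx) for (term, idx) in entries if idx > last_idx]
    return last_received, kept
```

With the values from the log above, a follower whose last received op is 1.18746 dedups the request 1.18320->[1.18321-1.18746] down to 1.18746->[], i.e. an empty batch that acts as a heartbeat.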
[jira] [Created] (KUDU-3275) RaftConsensusITest.TestSlowFollower is Flaky
Grant Henke created KUDU-3275:
---------------------------------

             Summary: RaftConsensusITest.TestSlowFollower is Flaky
                 Key: KUDU-3275
                 URL: https://issues.apache.org/jira/browse/KUDU-3275
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.14.0
            Reporter: Grant Henke

I have seen RaftConsensusITest.TestSlowFollower fail quite a few times with the following:
{code:java}
I0411 07:15:34.772778 3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746] Dedup: 1.18746->[]
I0411 07:15:34.772778 3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746] Dedup: 1.18746->[]
W0411 07:15:34.798368 3632 log.cc:502] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d: Injecting 1096ms of latency in SegmentAllocator::Sync()
W0411 07:15:34.834003 3206 consensus_peers.cc:480] T a82c506675e74d62a75e53acd05e86e8 P 5698d4de566c4adbae64b6f234c0561d -> Peer 8110dc17824943a68d3051e0df96ae5d (127.28.159.66:42271): Couldn't send request to peer 8110dc17824943a68d3051e0df96ae5d. Status: Timed out: UpdateConsensus RPC to 127.28.159.66:42271 timed out after 0.050s (SENT). This is attempt 1: this message will repeat every 5th retry.
W0411 07:15:35.333277 3366 tablet_service.cc:2874] Error setting up scanner with request: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Tablet is lagging too much to be able to serve snapshot scan. Lagging by: 45366 ms, (max is 3 ms):: new_scan_request { tablet_id: "a82c506675e74d62a75e53acd05e86e8" projected_columns { name: "key" type: INT32 is_key: true is_nullable: false } projected_columns { name: "int_val" type: INT32 is_key: false is_nullable: false } projected_columns { name: "string_val" type: STRING is_key: false is_nullable: true } read_mode: READ_AT_SNAPSHOT propagated_timestamp: 6627841138429693952 cache_blocks: true order_mode: ORDERED row_format_flags: 0 authz_token { token_data: "\010\210\306\312\203\006\"3\n\005slave\022*\n 4b6d5a12476d4dfcb5a4ab8a6f1190cb\020\001\030\001 \001(\001" signature: """\036\312x4\323\300\350\177k\357\226a
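The "could not wait for desired snapshot timestamp" failures above are a safe-time gate: a READ_AT_SNAPSHOT scan may only proceed once the replica's safe time covers the snapshot timestamp, and it is rejected outright when the replica lags beyond a bound. A rough sketch of that gate, in illustrative Python (the function name, millisecond units, and the 3 ms default are taken from the log's wording, but this is a simplification, not Kudu's actual code):

```python
def check_snapshot_scan(snapshot_ts_ms, safe_time_ms, max_lag_ms=3):
    """Decide whether a snapshot scan can be served right now.

    Returns "OK" to proceed, "WAIT" to block until safe time advances,
    or a Service-unavailable style error string if the replica lags
    too far behind the requested snapshot timestamp.
    """
    lag_ms = snapshot_ts_ms - safe_time_ms
    if lag_ms <= 0:
        return "OK"  # safe time already covers the snapshot
    if lag_ms > max_lag_ms:
        # Mirrors the log: "Tablet is lagging too much to be able to
        # serve snapshot scan. Lagging by: 45366 ms, (max is 3 ms)"
        return (f"Service unavailable: lagging by {lag_ms} ms "
                f"(max is {max_lag_ms} ms)")
    return "WAIT"  # small lag: wait for safe time to catch up
```

In the failing run, the follower's safe time had fallen roughly 45 seconds behind (thanks to the injected SegmentAllocator::Sync() latency), so every scan attempt hit the rejection branch until the overall 60 s scan deadline expired.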
[jira] [Commented] (KUDU-3274) Buffer overflow in SASL
[ https://issues.apache.org/jira/browse/KUDU-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322189#comment-17322189 ]

ASF subversion and git services commented on KUDU-3274:
-------------------------------------------------------

Commit 5cd8d574c020925e8257dc6d11af4ee516f329b7 in kudu's branch refs/heads/master from Attila Bukor
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=5cd8d57 ]

KUDU-3274 Ignore buffer overflow in libsasl

We recently added a few test cases where the client negotiation fails with this error (which is what we expect):

GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server kudu/127.6.40@krbtest.com not found in Kerberos database)

Apparently SASL doesn't allocate enough memory for this error message in some cases, which causes these tests to be flaky with a ~20% error rate with AddressSanitizer enabled:

==9298==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60e3e2d6 at pc 0x00530bf4 bp 0x7f8eb50ad0f0 sp 0x7f8eb50ac8a0
READ of size 151 at 0x60e3e2d6 thread T88 (client-negotiat)
    #0 0x530bf3 in __interceptor_strlen.part.35 sanitizer_common/sanitizer_common_interceptors.inc:365:5
    #1 0x7f8ee6ad9ee8 in std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x113ee8)
    #2 0x7f8eeb7c9c9b in kudu::rpc::SaslLogCallback(void*, int, char const*) ../src/kudu/rpc/sasl_common.cc:102:29
    #3 0x7f8eeb30241c in sasl_seterror (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x1441c)
    #4 0x7f8edd8f143d in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x243d)
    #5 0x7f8edd8f2452 in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x3452)
    #6 0x7f8eeb2f7844 in sasl_client_step (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9844)
    #7 0x7f8eeb2f7bc5 in sasl_client_start (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9bc5)
    #8 0x7f8eeb678679 in kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1::operator()() const ../src/kudu/rpc/client_negotiation.cc:594:14
    #9 0x7f8eeb67831c in std::_Function_handler::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:282:9
    #10 0x7f8ef3b28220 in std::function::operator()() const ../../../include/c++/8/bits/std_function.h:687:14
    #11 0x7f8eeb7c5840 in kudu::rpc::WrapSaslCall(sasl_conn*, std::function const&, char const*) ../src/kudu/rpc/sasl_common.cc:341:12
    #12 0x7f8eeb67363b in kudu::rpc::ClientNegotiation::SendSaslInitiate() ../src/kudu/rpc/client_negotiation.cc:593:20
    #13 0x7f8eeb66e0c7 in kudu::rpc::ClientNegotiation::AuthenticateBySasl(kudu::faststring*, std::unique_ptr >*) ../src/kudu/rpc/client_negotiation.cc:523:14
    #14 0x7f8eeb667b99 in kudu::rpc::ClientNegotiation::Negotiate(std::unique_ptr >*) ../src/kudu/rpc/client_negotiation.cc:220:7
    #15 0x7f8eeb715027 in kudu::rpc::DoClientNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime, std::unique_ptr >*) ../src/kudu/rpc/negotiation.cc:218:3
    #16 0x7f8eeb712095 in kudu::rpc::Negotiation::RunNegotiation(scoped_refptr const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ../src/kudu/rpc/negotiation.cc:295:9
    #17 0x7f8eeb74d4ad in kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr const&)::$_1::operator()() const ../src/kudu/rpc/reactor.cc:614:3
    #18 0x7f8eeb74d06c in std::_Function_handler const&)::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2
    #19 0x71b760 in std::function::operator()() const ../../../include/c++/8/bits/std_function.h:687:14
    #20 0x7f8ee917d03d in kudu::ThreadPool::DispatchThread() ../src/kudu/util/threadpool.cc:669:7
    #21 0x7f8ee91817dc in kudu::ThreadPool::CreateThread()::$_1::operator()() const ../src/kudu/util/threadpool.cc:742:48
    #22 0x7f8ee918162c in std::_Function_handler::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2
    #23 0x71b760 in std::function::operator()() const ../../../include/c++/8/bits/std_function.h:687:14
    #24 0x7f8ee915660a in kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:674:3
    #25 0x7f8eec6106da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #26 0x7f8ee64de71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)

0x60e3e2d6 is located 0 bytes to the right of 150-byte region [0x60e3e240,0x60e3e2d6)
allocated by thread T88 (client-negotiat) here:
    #0 0x5a4bb8 in malloc /home/abukor/src/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3
    #1 0x7f8eeb2fa1df in _buf_alloc (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0xc1df)

This patch suppresses address sanitizer errors in
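Since the overflowing read happens inside libsasl rather than Kudu's own code, the commit title ("Ignore buffer overflow in libsasl") points at suppressing the report rather than patching the library. For reference, AddressSanitizer supports exactly this via a suppression file; the fragment below is a generic illustration of that mechanism under the assumption that the report is matched by its `sasl_seterror` frame, not a reproduction of the actual patch:

```shell
# sasl_suppressions.txt: ignore heap-buffer-overflow reports whose
# stack passes through libsasl's error-formatting path (frame #3,
# sasl_seterror, in the ASan report above).
echo 'interceptor_via_fun:sasl_seterror' > sasl_suppressions.txt

# Point ASan at the suppression file when running the flaky tests.
export ASAN_OPTIONS="suppressions=$PWD/sasl_suppressions.txt"
```

`interceptor_via_fun` matches reports raised by an intercepted libc call (here `strlen`) made from the named function, which is why it fits a read that ASan's interceptor, not Kudu, actually performs.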
[jira] [Commented] (KUDU-2612) Implement multi-row transactions
[ https://issues.apache.org/jira/browse/KUDU-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321924#comment-17321924 ]

ASF subversion and git services commented on KUDU-2612:
-------------------------------------------------------

Commit ee79cdfa9906d14a63d4ec4b487c6eece77cc50f in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ee79cdf ]

KUDU-2612: an extra test for txn keepalive failover in Java client

This is a follow-up to 096f1ddf09047ea11d78a661010dd549ffa9af51.

This patch adds an extra test scenario similar to the one added in the prior changelist, but with the additional twist of "rolling" unavailability of leader masters. In addition, it verifies that RPC error responses from TxnManager due to the unavailability of TxnStatusManager are properly handled by the Java client.

Change-Id: Ib278d402bee85fb0442cbce98b2b4ab09eb4
Reviewed-on: http://gerrit.cloudera.org:8080/17321
Reviewed-by: Andrew Wong
Tested-by: Kudu Jenkins

> Implement multi-row transactions
> --------------------------------
>
>                 Key: KUDU-2612
>                 URL: https://issues.apache.org/jira/browse/KUDU-2612
>             Project: Kudu
>          Issue Type: Task
>            Reporter: Mike Percy
>            Priority: Major
>              Labels: roadmap-candidate
>
> Tracking Jira to implement multi-row / multi-table transactions in Kudu.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
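The failover behavior the test above exercises, transaction keepalive heartbeats surviving "rolling" unavailability of leader masters, reduces to a retry loop that moves on to the next master when one is down. A simplified sketch of that client-side behavior in illustrative Python (the names and shapes here are invented for clarity; the real logic lives in the Kudu Java client):

```python
class Unavailable(Exception):
    """Raised by a master that is currently down."""

def send_keepalive(masters, txn_id, max_rounds=2):
    """Try each master in turn until one accepts the txn keepalive.

    masters: list of callables master(txn_id) -> ack string, raising
             Unavailable while that master is down (rolling outage).
    Raises the last Unavailable if no master answers in any round.
    """
    last_err = None
    for _ in range(max_rounds):
        for master in masters:
            try:
                return master(txn_id)
            except Unavailable as err:
                last_err = err  # fail over to the next master
    raise last_err

# Usage: the first two masters are down, the third answers.
def down(txn_id):
    raise Unavailable("leader master down")

def up(txn_id):
    return f"keepalive-ack:{txn_id}"
```

The test's extra wrinkle is that which master is "down" changes over time, so the loop must keep cycling rather than pinning a single leader.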