[jira] [Updated] (KUDU-3275) RaftConsensusITest.TestSlowFollower is Flaky

2021-04-15 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3275:
--
Attachment: raft_consensus-itest.2.txt.gz

> RaftConsensusITest.TestSlowFollower is Flaky
> 
>
> Key: KUDU-3275
> URL: https://issues.apache.org/jira/browse/KUDU-3275
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Grant Henke
>Priority: Major
> Attachments: raft_consensus-itest.2.txt.gz
>
>
> I have seen RaftConsensusITest.TestSlowFollower fail quite a few times with 
> the following: 
> {code:java}
> I0411 07:15:34.772778  3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746]   Dedup: 1.18746->[]
> I0411 07:15:34.772778  3404 raft_consensus.cc:1200] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d [term 1 FOLLOWER]: Deduplicated request from leader. Original: 1.18320->[1.18321-1.18746]   Dedup: 1.18746->[]
> W0411 07:15:34.798368  3632 log.cc:502] T a82c506675e74d62a75e53acd05e86e8 P 8110dc17824943a68d3051e0df96ae5d: Injecting 1096ms of latency in SegmentAllocator::Sync()
> W0411 07:15:34.834003  3206 consensus_peers.cc:480] T a82c506675e74d62a75e53acd05e86e8 P 5698d4de566c4adbae64b6f234c0561d -> Peer 8110dc17824943a68d3051e0df96ae5d (127.28.159.66:42271): Couldn't send request to peer 8110dc17824943a68d3051e0df96ae5d. Status: Timed out: UpdateConsensus RPC to 127.28.159.66:42271 timed out after 0.050s (SENT). This is attempt 1: this message will repeat every 5th retry.
> W0411 07:15:35.333277  3366 tablet_service.cc:2874] Error setting up scanner with request: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Tablet is lagging too much to be able to serve snapshot scan. Lagging by: 45366 ms, (max is 3 ms):: new_scan_request { tablet_id: "a82c506675e74d62a75e53acd05e86e8" projected_columns { name: "key" type: INT32 is_key: true is_nullable: false } projected_columns { name: "int_val" type: INT32 is_key: false is_nullable: false } projected_columns { name: "string_val" type: STRING is_key: false is_nullable: true } read_mode: READ_AT_SNAPSHOT propagated_timestamp: 6627841138429693952 cache_blocks: true order_mode: ORDERED row_format_flags: 0 authz_token { token_data: "\010\210\306\312\203\006\"3\n\005slave\022*\n 4b6d5a12476d4dfcb5a4ab8a6f1190cb\020\001\030\001 \001(\001" signature: """\036\312x4\323\300\350\177k\357\226a  signing_key_seq_num: 0 } } call_seq_id: 0
> W0411 07:15:35.635779  3366 tablet_service.cc:2874] Error setting up scanner with request: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Tablet is lagging too much to be able to serve snapshot scan. Lagging by: 45668 ms, (max is 3 ms):: new_scan_request { tablet_id: "a82c506675e74d62a75e53acd05e86e8" projected_columns { name: "key" type: INT32 is_key: true is_nullable: false } projected_columns { name: "int_val" type: INT32 is_key: false is_nullable: false } projected_columns { name: "string_val" type: STRING is_key: false is_nullable: true } read_mode: READ_AT_SNAPSHOT propagated_timestamp: 6627841138425311232 cache_blocks: true order_mode: ORDERED row_format_flags: 0 authz_token { token_data: "\010\210\306\312\203\006\"3\n\005slave\022*\n 4b6d5a12476d4dfcb5a4ab8a6f1190cb\020\001\030\001 \001(\001" signature: """\036\312x4\323\300\350\177k\357\226a  signing_key_seq_num: 0 } } call_seq_id: 0
> W0411 07:15:35.636209  3626 scanner-internal.cc:406] Time spent opening tablet: real 57.698s user 0.002s sys 0.000s
> F0411 07:15:35.636267  3626 test_workload.cc:255] Check failed: _s.ok() Bad status: Timed out: exceeded configured scan timeout of 60.000s: after 20 scan attempts: unable to retry before timeout: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: P: 1618125277942117 usec, L: 0 to be safe (mode: NON-LEADER). Current safe time: P: 1618125277252906 usec, L: 0 Physical time difference: 0.689s
> *** Check failure stack trace: ***
> *** Aborted at 1618125335 (unix time) try "date -d @1618125335" if you are using GNU date ***
> PC: @     0x7fe2e72bafb7 gsignal
> *** SIGABRT (@0x3e8727d) received by PID 29309 (TID 0x7fe2d979a700) from PID 29309; stack trace: ***
>     @     0x7fe2e97881f1 google::(anonymous namespace)::FailureSignalHandler() at ??:0
>     @     0x7fe2eb20e980 (unknown) at ??:0
>     @     0x7fe2e72bafb7 gsignal at ??:0
>     @     0x7fe2e72bc921 abort at ??:0
>     @     0x7fe2e9778439 google::logging_fail() at ??:0

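For context on the failure above: the test injects latency into the follower's log sync (the "Injecting 1096ms of latency in SegmentAllocator::Sync()" lines), the replica's safe time falls behind the requested snapshot timestamp (lagging by roughly 45 s against a 3 ms bound), so the workload's READ_AT_SNAPSHOT scans keep failing until the 60 s scan deadline expires and the CHECK in test_workload.cc aborts the process. Below is a minimal sketch of a deadline-bounded retry loop of that general shape; the names, constants, and the TryScanOnce helper are made up for illustration and are not Kudu's actual test_workload.cc code.

{code:cpp}
// Illustrative sketch of a deadline-bounded scan-retry loop of the general
// shape that produced the fatal check above; names and constants are made
// up, this is NOT Kudu's test_workload.cc.
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Stand-in for one READ_AT_SNAPSHOT scan attempt. Here it always fails the
// way a lagging replica does; a real client would issue the scan RPC.
static bool TryScanOnce(std::string* error) {
  *error = "Service unavailable: could not wait for desired snapshot "
           "timestamp to be consistent";
  return false;
}

int main() {
  const auto timeout = std::chrono::seconds(3);   // 60 s in the real test
  const auto deadline = Clock::now() + timeout;
  int attempts = 0;
  std::string last_error;
  while (Clock::now() < deadline) {
    ++attempts;
    if (TryScanOnce(&last_error)) {
      std::cout << "scan OK after " << attempts << " attempts\n";
      return 0;
    }
    // Back off briefly before the next attempt; the overall deadline, not
    // the per-attempt error, decides when to give up.
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
  }
  // The real workload thread CHECK-fails on the timed-out status here,
  // which is the SIGABRT and stack trace seen in the log.
  std::cerr << "Timed out: exceeded configured scan timeout after "
            << attempts << " attempts: " << last_error << "\n";
  return 1;
}
{code}

In this sketch the overall deadline decides when to stop retrying, which matches the "after 20 scan attempts: unable to retry before timeout" wording in the log.
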
[jira] [Created] (KUDU-3275) RaftConsensusITest.TestSlowFollower is Flaky

2021-04-15 Thread Grant Henke (Jira)
Grant Henke created KUDU-3275:
-

 Summary: RaftConsensusITest.TestSlowFollower is Flaky
 Key: KUDU-3275
 URL: https://issues.apache.org/jira/browse/KUDU-3275
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Grant Henke



[jira] [Commented] (KUDU-3274) Buffer overflow in SASL

2021-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322189#comment-17322189
 ] 

ASF subversion and git services commented on KUDU-3274:
---

Commit 5cd8d574c020925e8257dc6d11af4ee516f329b7 in kudu's branch 
refs/heads/master from Attila Bukor
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=5cd8d57 ]

KUDU-3274 Ignore buffer overflow in libsasl

We recently added a few test cases where the client negotiation fails
with this error (which is what we expect):

GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information 
(Server kudu/127.6.40@krbtest.com not found in Kerberos database)

Apparently SASL doesn't allocate enough memory for this error message in
some cases, which causes these tests to be flaky with a ~20% error rate
when AddressSanitizer is enabled:

==9298==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60e3e2d6 
at pc 0x00530bf4 bp 0x7f8eb50ad0f0 sp 0x7f8eb50ac8a0
READ of size 151 at 0x60e3e2d6 thread T88 (client-negotiat)
#0 0x530bf3 in __interceptor_strlen.part.35 
sanitizer_common/sanitizer_common_interceptors.inc:365:5
#1 0x7f8ee6ad9ee8 in std::basic_ostream >& 
std::operator<< >(std::basic_ostream >&, char const*) 
(/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x113ee8)
#2 0x7f8eeb7c9c9b in kudu::rpc::SaslLogCallback(void*, int, char const*) 
../src/kudu/rpc/sasl_common.cc:102:29
#3 0x7f8eeb30241c in sasl_seterror 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x1441c)
#4 0x7f8edd8f143d in _init 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x243d)
#5 0x7f8edd8f2452 in _init 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x3452)
#6 0x7f8eeb2f7844 in sasl_client_step 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9844)
#7 0x7f8eeb2f7bc5 in sasl_client_start 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9bc5)
#8 0x7f8eeb678679 in 
kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1::operator()() const 
../src/kudu/rpc/client_negotiation.cc:594:14
#9 0x7f8eeb67831c in std::_Function_handler::_M_invoke(std::_Any_data
 const&) ../../../include/c++/8/bits/std_function.h:282:9
#10 0x7f8ef3b28220 in std::function::operator()() const 
../../../include/c++/8/bits/std_function.h:687:14
#11 0x7f8eeb7c5840 in kudu::rpc::WrapSaslCall(sasl_conn*, std::function const&, char const*) ../src/kudu/rpc/sasl_common.cc:341:12
#12 0x7f8eeb67363b in kudu::rpc::ClientNegotiation::SendSaslInitiate() 
../src/kudu/rpc/client_negotiation.cc:593:20
#13 0x7f8eeb66e0c7 in 
kudu::rpc::ClientNegotiation::AuthenticateBySasl(kudu::faststring*, 
std::unique_ptr >*) 
../src/kudu/rpc/client_negotiation.cc:523:14
#14 0x7f8eeb667b99 in 
kudu::rpc::ClientNegotiation::Negotiate(std::unique_ptr >*) 
../src/kudu/rpc/client_negotiation.cc:220:7
#15 0x7f8eeb715027 in 
kudu::rpc::DoClientNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, 
kudu::TriStateFlag, kudu::MonoTime, std::unique_ptr >*) 
../src/kudu/rpc/negotiation.cc:218:3
#16 0x7f8eeb712095 in 
kudu::rpc::Negotiation::RunNegotiation(scoped_refptr 
const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) 
../src/kudu/rpc/negotiation.cc:295:9
#17 0x7f8eeb74d4ad in 
kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr
 const&)::$_1::operator()() const ../src/kudu/rpc/reactor.cc:614:3
#18 0x7f8eeb74d06c in std::_Function_handler
 const&)::$_1>::_M_invoke(std::_Any_data const&) 
../../../include/c++/8/bits/std_function.h:297:2
#19 0x71b760 in std::function::operator()() const 
../../../include/c++/8/bits/std_function.h:687:14
#20 0x7f8ee917d03d in kudu::ThreadPool::DispatchThread() 
../src/kudu/util/threadpool.cc:669:7
#21 0x7f8ee91817dc in kudu::ThreadPool::CreateThread()::$_1::operator()() 
const ../src/kudu/util/threadpool.cc:742:48
#22 0x7f8ee918162c in std::_Function_handler::_M_invoke(std::_Any_data const&) 
../../../include/c++/8/bits/std_function.h:297:2
#23 0x71b760 in std::function::operator()() const 
../../../include/c++/8/bits/std_function.h:687:14
#24 0x7f8ee915660a in kudu::Thread::SuperviseThread(void*) 
../src/kudu/util/thread.cc:674:3
#25 0x7f8eec6106da in start_thread 
(/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#26 0x7f8ee64de71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)

0x60e3e2d6 is located 0 bytes to the right of 150-byte region 
[0x60e3e240,0x60e3e2d6)
allocated by thread T88 (client-negotiat) here:
#0 0x5a4bb8 in malloc 
/home/abukor/src/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3
#1 0x7f8eeb2fa1df in _buf_alloc 
(/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0xc1df)

This patch suppresses address sanitizer errors in 
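
The shape of the report (a READ of size 151 at an address located 0 bytes to the right of a 150-byte region) is the classic signature of a buffer sized for the message characters but not the terminating NUL, so that a later strlen() walks one byte past the allocation. The following is a minimal stand-alone reproduction of that bug class under AddressSanitizer; buggy_copy and the sample message are hypothetical, and this illustrates the error class only, not libsasl's actual code.

{code:cpp}
// Illustrative reproduction of the bug class described above; this is NOT
// libsasl's actual code. Build with: clang++ -fsanitize=address -g repro.cc
#include <cstdio>
#include <cstdlib>
#include <cstring>

// BUG: the copy is sized for the characters but not the terminating '\0',
// and the terminator is never written, so the result is not NUL-terminated.
static char* buggy_copy(const char* src) {
  const size_t len = std::strlen(src);
  char* dst = static_cast<char*>(std::malloc(len));  // should be len + 1
  std::memcpy(dst, src, len);                        // '\0' never copied
  return dst;
}

int main() {
  const char* msg = "GSSAPI Error: Unspecified GSS failure.";
  char* copy = buggy_copy(msg);
  // strlen() scans past the end of the undersized allocation looking for
  // '\0'; AddressSanitizer reports a heap-buffer-overflow READ located
  // "0 bytes to the right of" the region, just like the trace above.
  std::printf("%zu\n", std::strlen(copy));
  std::free(copy);
  return 0;
}
{code}

Built with -fsanitize=address, the strlen() call in this sketch produces a heap-buffer-overflow READ report of the same shape as the trace above.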

[jira] [Commented] (KUDU-2612) Implement multi-row transactions

2021-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321924#comment-17321924
 ] 

ASF subversion and git services commented on KUDU-2612:
---

Commit ee79cdfa9906d14a63d4ec4b487c6eece77cc50f in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ee79cdf ]

KUDU-2612: an extra test for txn keepalive failover in Java client

This is a follow-up to 096f1ddf09047ea11d78a661010dd549ffa9af51.

This patch adds an extra test scenario similar to the one added
in the prior changelist, but with the additional twist of "rolling"
unavailability of leader masters.  In addition, it verifies that
RPC error responses from TxnManager due to the unavailability
of TxnStatusManager are properly handled by the Java client.

Change-Id: Ib278d402bee85fb0442cbce98b2b4ab09eb4
Reviewed-on: http://gerrit.cloudera.org:8080/17321
Reviewed-by: Andrew Wong 
Tested-by: Kudu Jenkins


> Implement multi-row transactions
> 
>
> Key: KUDU-2612
> URL: https://issues.apache.org/jira/browse/KUDU-2612
> Project: Kudu
>  Issue Type: Task
>Reporter: Mike Percy
>Priority: Major
>  Labels: roadmap-candidate
>
> Tracking Jira to implement multi-row / multi-table transactions in Kudu.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)