[jira] [Commented] (KUDU-3093) DebugUtilTest.TestSignalStackTrace of debug-util-test failed on aarch64

2020-03-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071470#comment-17071470
 ] 

ASF subversion and git services commented on KUDU-3093:
---

Commit 263c3aa894c087691ef2c4463d46a52a94f12c2b in kudu's branch 
refs/heads/master from Adar Dembo
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=263c3aa ]

KUDU-3093: another band-aid for this DebugUtilTest.TestSignalStackTrace

A TSAN build yielded a stack trace like:

@   0x444a88  __tsan::ProcessPendingSignals()
@   0x4541c1  __interceptor_pthread_mutex_trylock
@ 0x7fbca26124e1  kudu::Mutex::TryAcquire()
@ 0x7fbca2612893  kudu::Mutex::Acquire()
@   0x4fe036  kudu::MutexLock::MutexLock()
@   0x504abd  kudu::CountDownLatch::WaitUntil()
@   0x504a5f  kudu::CountDownLatch::WaitFor()
@   0x4f04a9  kudu::(anonymous namespace)::SleeperThread()
...

Rather than find the synchronization primitive frame least likely to be
inlined, let's take a more comprehensive approach and search for multiple
candidate frames, including SleeperThread.

I tested this locally in DEBUG, RELEASE, ASAN, and TSAN modes.

Change-Id: Ia4ca0f48ba1d7ad4cea40b70af271d7948f78a57
Reviewed-on: http://gerrit.cloudera.org:8080/15605
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
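
The "multiple candidate frames" idea in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from the commit: ContainsAnyFrame and SleeperCandidateFrames are hypothetical names, and the candidate list is an assumption drawn from the frames in the TSAN trace quoted in the message.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not taken from the Kudu source tree: returns true if
// the symbolized stack trace mentions at least one of the candidate frames.
bool ContainsAnyFrame(const std::string& stack,
                      const std::vector<std::string>& candidates) {
  for (const std::string& frame : candidates) {
    if (stack.find(frame) != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Assumed candidate list, built from the frames in the TSAN trace. Which of
// these survive inlining depends on the build type, hence more than one.
std::vector<std::string> SleeperCandidateFrames() {
  return {"SleeperThread",
          "kudu::CountDownLatch::WaitFor",
          "kudu::CountDownLatch::WaitUntil",
          "kudu::Mutex::Acquire"};
}
```

A test could then assert on ContainsAnyFrame(DumpThreadStack(t->tid()), SleeperCandidateFrames()) rather than on a single substring, so that the check would still pass when an individual frame is inlined away.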


> DebugUtilTest.TestSignalStackTrace of debug-util-test failed on aarch64
> ---
>
>                 Key: KUDU-3093
>                 URL: https://issues.apache.org/jira/browse/KUDU-3093
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: huangtianhua
>            Assignee: Adar Dembo
>            Priority: Major
>             Fix For: 1.12.0
>
>
> I tested Kudu on an aarch64 server based on
> https://gerrit.cloudera.org/#/c/14964/ and debug-util-test failed. With the
> DEBUG build type the test fails intermittently; with the RELEASE build type
> it hangs in the TestSignalStackTrace test case. Debugging with gdb gave the
> error below:
> /home/jenkins/workspace/kudu/src/kudu/util/debug-util-test.cc:123: Failure
> Value of: DumpThreadStack(t->tid())
> Expected: has substring "SleeperThread"
>   Actual: "@ 0x7fa1c688  ([vdso]+0x687)\n@ 0x7f73d548  __pthread_cond_timedwait\n@ 0xb5f41e7c  kudu::ConditionVariable::WaitUntil()\n@ 0xb5ef05a0  _ZNSt17_Function_handlerIFvvEZN4kudu39DebugUtilTest_TestSignalStackTrace_Test8TestBodyEvEUlvE_E9_M_invokeERKSt9_Any_data\n@ 0xb5f95120  kudu::Thread::SuperviseThread()\n@ 0x7f737088  start_thread\n@ 0x7f737088  start_thread\n"
> /home/jenkins/workspace/kudu/src/kudu/util/test_util.cc:348: Failure
> Failed
> Timed out waiting for assertion to pass.
> I0327 07:46:04.605425 17375 test_util.cc:146] 
> ---
> I0327 07:46:04.605443 17375 test_util.cc:147] Had fatal failures, leaving
> test files at
> /tmp/kudutest-0/debug-util-test.DebugUtilTest.TestSignalStackTrace.1585295134509818-17375
> [  FAILED  ] DebugUtilTest.TestSignalStackTrace (30094 ms)
> [----------] 1 test from DebugUtilTest (30094 ms total)
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (30094 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] DebugUtilTest.TestSignalStackTrace
>  1 FAILED TEST
> I don't know why the result differs between DEBUG and RELEASE. The only
> thing I could find is
> https://github.com/apache/kudu/blob/master/src/kudu/gutil/once.cc#L21-L30
> but it seems to have no effect, right? Could someone help us take a look at
> this? Thanks very much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-2059) Data race in DnsResolver

2020-03-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071469#comment-17071469
 ] 

ASF subversion and git services commented on KUDU-2059:
---

Commit ce82af1171099e00606c67a351362a0b68549141 in kudu's branch 
refs/heads/master from Adar Dembo
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ce82af1 ]

KUDU-2059: add a TSAN suppression

No one is actively working on fixing this, so let's at least suppress it so
that precommits aren't flaky.

Change-Id: I86979e4c511bd4cbf027c629c867378cd0b8cd32
Reviewed-on: http://gerrit.cloudera.org:8080/15603
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
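
For context, ThreadSanitizer reads suppressions from a file named in TSAN_OPTIONS (suppressions=<path>), with one type:pattern rule per line. The rule below is only an illustration built from the DnsResolver destructor frame in the report quoted below; the actual pattern and file touched by the commit are not shown in this message.

```
# Illustrative only: suppress data-race reports whose stack contains this frame.
race:kudu::DnsResolver::~DnsResolver
```

A pattern this narrow hides only the reports that pass through that frame, which fits the stated goal of keeping precommits stable until the race itself is fixed.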


> Data race in DnsResolver
> 
>
>                 Key: KUDU-2059
>                 URL: https://issues.apache.org/jira/browse/KUDU-2059
>             Project: Kudu
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.4.0
>            Reporter: Mike Percy
>            Priority: Major
>              Labels: tsan
>         Attachments: raft_consensus-itest.txt, raft_consensus-itest.txt.gz,
> raft_consensus_election-itest-threadpool-race.txt.xz
>
>
> I got a TSAN failure in a Jenkins run of 
> RaftConsensusITest.MultiThreadedInsertWithFailovers:
> http://dist-test.cloudera.org/job?job_id=jenkins-slave.1498799877.11199
> {code}
> WARNING: ThreadSanitizer: data race (pid=14861)
>   Write of size 8 at 0x7b506af0 by main thread:
>     #0 pthread_cond_destroy /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1102 (raft_consensus-itest+0x4a198c)
>     #1 kudu::ConditionVariable::~ConditionVariable() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/condition_variable.cc:57:12 (libkudu_util.so+0xf107e)
>     #2 kudu::ThreadPool::~ThreadPool() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/threadpool.cc:339:1 (libkudu_util.so+0x1c1b93)
>     #3 kudu::DefaultDeleter<kudu::ThreadPool>::operator()(kudu::ThreadPool*) const /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:145:5 (libmaster.so+0xc0bbe)
>     #4 kudu::internal::gscoped_ptr_impl<kudu::ThreadPool, kudu::DefaultDeleter<kudu::ThreadPool> >::~gscoped_ptr_impl() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:228:7 (libmaster.so+0xc0b89)
>     #5 gscoped_ptr<kudu::ThreadPool, kudu::DefaultDeleter<kudu::ThreadPool> >::~gscoped_ptr() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:318:7 (libmaster.so+0xb1b79)
>     #6 kudu::DnsResolver::~DnsResolver() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/net/dns_resolver.cc:45:1 (libkudu_util.so+0x1863fa)
>     #7 kudu::DefaultDeleter<kudu::DnsResolver>::operator()(kudu::DnsResolver*) const /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:145:5 (libkudu_client.so+0xd0dee)
>     #8 kudu::internal::gscoped_ptr_impl<kudu::DnsResolver, kudu::DefaultDeleter<kudu::DnsResolver> >::reset(kudu::DnsResolver*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:254:7 (libkudu_client.so+0xd0da4)
>     #9 gscoped_ptr<kudu::DnsResolver, kudu::DefaultDeleter<kudu::DnsResolver> >::reset(kudu::DnsResolver*) /home/jenkins-slave/workspace/kudu-master/3/src/kudu/gutil/gscoped_ptr.h:375:46 (libkudu_client.so+0xc5fd0)
>     #10 kudu::client::KuduClient::Data::~Data() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/client/client-internal.cc:342:17 (libkudu_client.so+0xd6b1d)
>     #11 kudu::client::KuduClient::~KuduClient() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/client/client.cc:334:3 (libkudu_client.so+0xbbfec)
>     #12 std::__1::default_delete<kudu::client::KuduClient>::operator()(kudu::client::KuduClient*) const /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/tsan/include/c++/v1/memory:2397:13 (libkudu_client.so+0xd06bb)
>     #13 std::__1::__shared_ptr_pointer<kudu::client::KuduClient*, std::__1::default_delete<kudu::client::KuduClient>, std::__1::allocator<kudu::client::KuduClient> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/tsan/include/c++/v1/memory:3795 (libkudu_client.so+0xd06bb)
>     #14 __release_shared /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/libcxx/src/memory.cpp:67:9 (libc++.so.1+0xc095d)
>     #15 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/libcxx/src/memory.cpp:92 (libc++.so.1+0xc095d)
>     #16 std::__1::shared_ptr<kudu::client::KuduClient>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/3/thirdparty/installed/tsan/include/c++/v1/memory:4626:19 (raft_consensus-itest+0x548818)
>     #17 kudu::tserver::TabletServerIntegrationTestBase::~TabletServerIntegrationTestBase() /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_itest-base.h:69:7 (raft_consensus-itest+0x54890f)
> #18 kudu::tserver::RaftConsensusITest::~RaftConsensusITest() 
> 

[jira] [Assigned] (KUDU-3068) MemTracker CHECK failed on aarch64 when run cache-bench test

2020-03-30 Thread huangtianhua (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangtianhua reassigned KUDU-3068:
--

Assignee: huangtianhua

> MemTracker CHECK failed on aarch64 when run cache-bench test
> 
>
>                 Key: KUDU-3068
>                 URL: https://issues.apache.org/jira/browse/KUDU-3068
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: huangtianhua
>            Assignee: huangtianhua
>            Priority: Major
>
> I tested Kudu on an aarch64 server based on
> https://gerrit.cloudera.org/#/c/14964/ and the cache-bench test failed with
> the error below:
> root@ubuntu:/home/workspace/kudu/build/debug/bin# ./cache-bench
> [==========] Running 4 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 4 tests from Patterns/CacheBench
> [ RUN      ] Patterns/CacheBench.RunBench/0
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0304 09:04:19.007313 19049 cache-bench.cc:172] Warming up...
> I0304 09:04:20.008733 19049 cache-bench.cc:175] Running benchmark...
> I0304 09:04:22.009366 19049 cache-bench.cc:183] ZIPFIAN ratio=1.00x 
> n_unique=262144: 3.28M lookups/sec
> I0304 09:04:22.009399 19049 cache-bench.cc:184] ZIPFIAN ratio=1.00x 
> n_unique=262144: 99.0% hit rate
> F0304 09:04:22.048135 19049 mem_tracker.cc:89] Check failed: consumption() == 
> 0 Memory tracker test-cache-sharded_lru_cache->root has unreleased 
> consumption 463171584
> *** Check failure stack trace: ***
> *** Aborted at 1583312662 (unix time) try "date -d @1583312662" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x4a69) received by PID 19049 (TID 0xad19e010) from PID 
> 19049; stack trace: ***
>@ 0xae3f8688 ([vdso]+0x687)
>@ 0xad7384d8 raise
>@ 0xad738464 raise
> Aborted (core dumped)
> I ran this test many times and it succeeded only once. I have no idea why
> the consumption is not zero.
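
The failed CHECK in the log above (consumption() == 0) expresses an accounting invariant: every byte charged to a tracker should have been released again by the time the check runs, presumably when the tracker is torn down. A minimal sketch of that invariant follows, using a hypothetical SimpleMemTracker class that is not Kudu's MemTracker implementation and says nothing about the cause of the aarch64 failure:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, simplified tracker used only to illustrate the invariant.
class SimpleMemTracker {
 public:
  // Charge bytes to the tracker when memory is allocated.
  void Consume(int64_t bytes) { consumption_.fetch_add(bytes); }

  // Give bytes back when the memory is freed.
  void Release(int64_t bytes) { consumption_.fetch_sub(bytes); }

  int64_t consumption() const { return consumption_.load(); }

  // The condition the CHECK asserts: nothing is still accounted for.
  bool FullyReleased() const { return consumption() == 0; }

 private:
  std::atomic<int64_t> consumption_{0};
};
```

Under this model, a nonzero value such as the 463171584 bytes reported in the log would mean that some Consume calls had no matching Release at the moment of the check, or that the counter updates were not observed consistently.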





[jira] [Assigned] (KUDU-3078) Ranger integration testing

2020-03-30 Thread Attila Bukor (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Bukor reassigned KUDU-3078:
--

Assignee: Attila Bukor

> Ranger integration testing
> --
>
>                 Key: KUDU-3078
>                 URL: https://issues.apache.org/jira/browse/KUDU-3078
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: Attila Bukor
>            Assignee: Attila Bukor
>            Priority: Major
>
> The Ranger integration should be properly tested before we can remove the 
> experimental flag.





[jira] [Assigned] (KUDU-3081) Add Kerberos support to MiniRanger

2020-03-30 Thread Attila Bukor (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Bukor reassigned KUDU-3081:
--

Assignee: Attila Bukor

> Add Kerberos support to MiniRanger
> --
>
>                 Key: KUDU-3081
>                 URL: https://issues.apache.org/jira/browse/KUDU-3081
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: Attila Bukor
>            Assignee: Attila Bukor
>            Priority: Major
>






[jira] [Updated] (KUDU-3079) Add MiniRanger for integration tests

2020-03-30 Thread Attila Bukor (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Bukor updated KUDU-3079:
---
Fix Version/s: 1.12.0
   Resolution: Fixed
   Status: Resolved  (was: In Review)

> Add MiniRanger for integration tests
> 
>
>                 Key: KUDU-3079
>                 URL: https://issues.apache.org/jira/browse/KUDU-3079
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: Attila Bukor
>            Assignee: Attila Bukor
>            Priority: Major
>             Fix For: 1.12.0
>
>
> To write full integration tests, we need a bundled Ranger service.





[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-03-30 Thread Adar Dembo (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071181#comment-17071181
 ] 

Adar Dembo commented on KUDU-3096:
--

[~tlipcon] any ideas here? I remember the stacktrace collection code has a 
bunch of gnarly optimizations built-in; are any x86-specific?

> debug-util-test failed sometimes on aarch64: Segmentation fault
> ---
>
>                 Key: KUDU-3096
>                 URL: https://issues.apache.org/jira/browse/KUDU-3096
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: huangtianhua
>            Priority: Major
>
> I tested Kudu on an aarch64 server based on
> https://gerrit.cloudera.org/#/c/14964/ and debug-util-test fails
> intermittently. The gdb output for the core dump file is at
> http://paste.openstack.org/show/791306/
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# ./bin/debug-util-test
> ..
> W0330 07:30:44.317989 27980 debug-util.cc:405] Leaking SignalData structure 
> 0xf89ed260 after lost signal to thread 28015
> W0330 07:30:44.319747 27980 debug-util.cc:405] Leaking SignalData structure 
> 0xf89ed280 after lost signal to thread 28015
> W0330 07:30:44.319774 27980 debug-util.cc:405] Leaking SignalData structure 
> 0xf89ed2a0 after lost signal to thread 28015
> W0330 07:30:44.326023 27980 debug-util.cc:405] Leaking SignalData structure 
> 0xf89ed2c0 after lost signal to thread 28015
> I0330 07:30:44.336513 27980 debug-util-test.cc:463] Timed out 1410 times
> I0330 07:30:44.336531 27980 debug-util-test.cc:464] Succeeded 13591 times
> [       OK ] DebugUtilTest.TestTimeouts (1002 ms)
> [----------] 9 tests from DebugUtilTest (3049 ms total)
> [----------] 4 tests from DifferentRaces/RaceTest
> [ RUN      ] DifferentRaces/RaceTest.TestStackTraceRaces/0
> Segmentation fault (core dumped)
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test 
> core.27980
> GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "aarch64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from bin/debug-util-test...done.
> [New LWP 28016]
> [New LWP 27980]
> [New LWP 27981]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> Core was generated by `./bin/debug-util-test'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  tcmalloc::Sampler::RecordAllocation (k=<optimized out>, this=<optimized out>)
> at 
> /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
> 166   if (static_cast(bytes_until_sample_) < k) {
> [Current thread is 1 (Thread 0x86a9b090 (LWP 28016))]
> Sometimes other tests, such as TestTimeouts, also hit a segmentation fault
> with the same gdb output. I don't know whether this is related to
> gperftools. Could someone help us fix this? Thanks very much.





[jira] [Created] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-03-30 Thread huangtianhua (Jira)
huangtianhua created KUDU-3096:
--

 Summary: debug-util-test failed sometimes on aarch64: Segmentation 
fault
 Key: KUDU-3096
 URL: https://issues.apache.org/jira/browse/KUDU-3096
 Project: Kudu
  Issue Type: Sub-task
Reporter: huangtianhua


I tested Kudu on an aarch64 server based on
https://gerrit.cloudera.org/#/c/14964/ and debug-util-test fails
intermittently. The gdb output for the core dump file is at
http://paste.openstack.org/show/791306/

root@ubuntu:/home/jenkins/workspace/kudu/build/debug# ./bin/debug-util-test
..
W0330 07:30:44.317989 27980 debug-util.cc:405] Leaking SignalData structure 
0xf89ed260 after lost signal to thread 28015
W0330 07:30:44.319747 27980 debug-util.cc:405] Leaking SignalData structure 
0xf89ed280 after lost signal to thread 28015
W0330 07:30:44.319774 27980 debug-util.cc:405] Leaking SignalData structure 
0xf89ed2a0 after lost signal to thread 28015
W0330 07:30:44.326023 27980 debug-util.cc:405] Leaking SignalData structure 
0xf89ed2c0 after lost signal to thread 28015
I0330 07:30:44.336513 27980 debug-util-test.cc:463] Timed out 1410 times
I0330 07:30:44.336531 27980 debug-util-test.cc:464] Succeeded 13591 times
[       OK ] DebugUtilTest.TestTimeouts (1002 ms)
[----------] 9 tests from DebugUtilTest (3049 ms total)

[----------] 4 tests from DifferentRaces/RaceTest
[ RUN      ] DifferentRaces/RaceTest.TestStackTraceRaces/0
Segmentation fault (core dumped)


root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test 
core.27980
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/debug-util-test...done.
[New LWP 28016]
[New LWP 27980]
[New LWP 27981]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `./bin/debug-util-test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  tcmalloc::Sampler::RecordAllocation (k=<optimized out>, this=<optimized out>)
at 
/home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
166   if (static_cast(bytes_until_sample_) < k) {
[Current thread is 1 (Thread 0x86a9b090 (LWP 28016))]

Sometimes other tests, such as TestTimeouts, also hit a segmentation fault
with the same gdb output. I don't know whether this is related to
gperftools. Could someone help us fix this? Thanks very much.





[jira] [Created] (KUDU-3095) RaftConsensusNonVoterITest.PromotedReplicaCanVote sometimes fails

2020-03-30 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-3095:
---

 Summary: RaftConsensusNonVoterITest.PromotedReplicaCanVote  
sometimes fails
 Key: KUDU-3095
 URL: https://issues.apache.org/jira/browse/KUDU-3095
 Project: Kudu
  Issue Type: Bug
  Components: test
Affects Versions: 1.12.0
Reporter: Alexey Serbin
 Attachments: raft_consensus_nonvoter-itest.txt.xz

The {{RaftConsensusNonVoterITest.PromotedReplicaCanVote}} scenario sometimes 
fails with an error:

{noformat}
I0327 00:44:00.297801  4401 raft_consensus.cc:2810] T 
c2378cfec6604e0e813f43775107f2e6 P 4f6b943b18a649fabbd6cfb8d06ed20f [term 3 
FOLLOWER]: CHANGE_CONFIG_OP replication failed: Aborted: Transaction aborted by 
new leader
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:1079:
 Failure
Failed 
{noformat}

The log is attached.


