[jira] [Created] (KUDU-2447) TS crashed with "NONE predicate can not be pushed into key"
Xu Yao created KUDU-2447:
----------------------------

Summary: TS crashed with "NONE predicate can not be pushed into key"
Key: KUDU-2447
URL: https://issues.apache.org/jira/browse/KUDU-2447
Project: Kudu
Issue Type: Bug
Reporter: Xu Yao

The tserver crashes on a Scan when the primary key bounds [lowerPrimaryKey, upperPrimaryKey) and the predicates on a primary key column do not overlap.

Example, using the table from the TestScannerMultiTablet unit test. The table looks like this:
{code:java}
key1 (string) | key2 (string) | value (string){code}
The data layout ends up like this:
{code:java}
tablet '', '1': no rows
tablet '1', '2': '111', '122', '133'
tablet '2', '3': '211', '222', '233'
tablet '3', '': '311', '322', '333'{code}
Add Scan PrimaryKeyBounds: ['12', '13')
Add Scan Predicate: key2 <= '1'

Run the example and the tserver crashes, printing "NONE predicate can not be pushed into key".

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
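The scenario in this report can be modeled outside Kudu to see why the scan should come back empty rather than crash. The sketch below is not Kudu code. It assumes that a row string such as '122' stands for key1='1', key2='2', value='2', and that the bounds ['12', '13') mean [(key1='1', key2='2'), (key1='1', key2='3')). The names `Key` and `ScanRows` are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a compound primary key (key1, key2); not a Kudu type.
using Key = std::pair<std::string, std::string>;

// Returns the keys that fall inside the half-open primary key bounds
// [lower, upper) and also satisfy the column predicate key2 <= key2_max.
std::vector<Key> ScanRows(const std::vector<Key>& rows,
                          const Key& lower, const Key& upper,
                          const std::string& key2_max) {
  std::vector<Key> out;
  for (const Key& k : rows) {
    // std::pair compares lexicographically: key1 first, then key2.
    const bool in_bounds = !(k < lower) && k < upper;
    const bool matches_predicate = k.second <= key2_max;
    if (in_bounds && matches_predicate) {
      out.push_back(k);
    }
  }
  return out;
}
```

Under these assumptions, tablet ['1', '2') holds (1,1), (1,2), (1,3). The bounds keep only (1,2) and the predicate keeps only (1,1), so `ScanRows(rows, {"1", "2"}, {"1", "3"}, "1")` returns no rows. The two conditions are disjoint, which is the case the report says triggers the crash.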
[jira] [Commented] (KUDU-2447) TS crashed with "NONE predicate can not be pushed into key"
[ https://issues.apache.org/jira/browse/KUDU-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482506#comment-16482506 ]

Xu Yao commented on KUDU-2447:
------------------------------
https://gerrit.cloudera.org/#/c/10463/

> TS crashed with "NONE predicate can not be pushed into key"
> ------------------------------------------------------------
>
> Key: KUDU-2447
> URL: https://issues.apache.org/jira/browse/KUDU-2447
> Project: Kudu
> Issue Type: Bug
> Reporter: Xu Yao
> Priority: Major
> Labels: scan
[jira] [Updated] (KUDU-2443) Moving single-replica tablets does not work in kudu CLI
[ https://issues.apache.org/jira/browse/KUDU-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2443:
--------------------------------
    Resolution: Fixed
    Fix Version/s: 1.8.0
    Status: Resolved (was: In Review)

Fixed with cb20372 and b73dec1.

> Moving single-replica tablets does not work in kudu CLI
> -------------------------------------------------------
>
> Key: KUDU-2443
> URL: https://issues.apache.org/jira/browse/KUDU-2443
> Project: Kudu
> Issue Type: Bug
> Components: CLI, consensus
> Affects Versions: 1.7.0, 1.8.0, 1.7.1
> Reporter: Alexey Serbin
> Assignee: Alexey Serbin
> Priority: Major
> Fix For: 1.8.0
>
> When trying to move a single replica of a non-replicated tablet using
> {{kudu tablet change_config move_replica}}, the system adds a non-voter,
> promotes it, and then evicts the newly added replica, doing so over and
> over again.
> {noformat}
> Tablet 01701f0725e644d39b627045266f60bb of table 'impala::default.test1' is recovering: 1 on-going tablet copies
>   70f7ee61ead54b1885d819f354eb3405 (vc1316.halxg.cloudera.com:7050): RUNNING [LEADER]
>   72fcec63e96f4248ae39d114eb3cd7c9 (vc1318.halxg.cloudera.com:7050): not running [NONVOTER]
>     State: INITIALIZED
>     Data state: TABLET_DATA_COPYING
>     Last status: Tablet Copy: Downloading block 4611686003499827794 (45/2464)
> All reported replicas are:
>   A = 70f7ee61ead54b1885d819f354eb3405
>   B = 72fcec63e96f4248ae39d114eb3cd7c9
> The consensus matrix is:
>  Config source | Replicas | Current term | Config index | Committed?
> ---------------+----------+--------------+--------------+------------
>  master        | A*  B~   |              |              | Yes
>  A             | A*  B~   | 15           | 143335       | Yes
>  B             | A   B~   | 15           | 143335       | Yes
> {noformat}
[jira] [Created] (KUDU-2448) Document how to temporarily decommission/shutdown a tablet server
Grant Henke created KUDU-2448:
---------------------------------

Summary: Document how to temporarily decommission/shutdown a tablet server
Key: KUDU-2448
URL: https://issues.apache.org/jira/browse/KUDU-2448
Project: Kudu
Issue Type: Task
Components: documentation
Reporter: Grant Henke

I have been asked a few times about the "best" way to shut down a tablet server or master server for common maintenance such as an OS upgrade or a disk swap. Even if it's straightforward, adding a section to the documentation [here|https://kudu.apache.org/docs/administration.html#_common_kudu_workflows] would make users more confident about their process.

I have heard of increasing the [kudu-master_follower_unavailable_considered_failed_sec|https://kudu.apache.org/docs/configuration_reference.html#kudu-master_follower_unavailable_considered_failed_sec] configuration to avoid re-replication during these short planned outages. Perhaps that tip could be added if appropriate.
[jira] [Created] (KUDU-2449) Document best practices for adding a new tablet server
Grant Henke created KUDU-2449:
---------------------------------

Summary: Document best practices for adding a new tablet server
Key: KUDU-2449
URL: https://issues.apache.org/jira/browse/KUDU-2449
Project: Kudu
Issue Type: Task
Components: documentation
Reporter: Grant Henke

Though adding a new tablet server is straightforward, I have been asked questions on best practices a few times. The main follow-up question to "how do I add new tablet servers?" is "Do I need to rebalance the data? How do I do that?"

Perhaps a brief best practices guide [here|https://kudu.apache.org/docs/administration.html#_common_kudu_workflows] in the docs would make sense.
[jira] [Updated] (KUDU-2449) Document best practices for adding a new tablet server
[ https://issues.apache.org/jira/browse/KUDU-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2449:
------------------------------
    Labels: beginner (was: )

> Document best practices for adding a new tablet server
> ------------------------------------------------------
>
> Key: KUDU-2449
> URL: https://issues.apache.org/jira/browse/KUDU-2449
> Project: Kudu
> Issue Type: Task
> Components: documentation
> Reporter: Grant Henke
> Priority: Major
> Labels: beginner
[jira] [Resolved] (KUDU-1889) Support OpenSSL 1.1.0
[ https://issues.apache.org/jira/browse/KUDU-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo resolved KUDU-1889.
------------------------------
    Resolution: Fixed
    Fix Version/s: 1.8.0

Fixed in 14080bb.

> Support OpenSSL 1.1.0
> ---------------------
>
> Key: KUDU-1889
> URL: https://issues.apache.org/jira/browse/KUDU-1889
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Dan Burkert
> Assignee: Adar Dembo
> Priority: Minor
> Fix For: 1.8.0
>
> We currently can't compile against OpenSSL 1.1.0. Probably low priority
> right now, but eventually distros are going to start shipping with 1.1.0 by
> default.
> {code}
> [121/323] Building CXX object src/kudu/security/CMakeFiles/security.dir/cert.cc.o
> FAILED: src/kudu/security/CMakeFiles/security.dir/cert.cc.o
> /usr/local/opt/ccache/libexec/c++ -DKUDU_HEADERS_NO_STUBS=1 -DKUDU_HEADERS_USE_RICH_SLICE=1 -DKUDU_HEADERS_USE_SHORT_STATUS_MACROS=1 -DKUDU_STATIC_DEFINE -DTCMALLOC_ENABLED -D__STDC_FORMAT_MACROS -Dsecurity_EXPORTS -Isrc -I../../src -isystem ../../thirdparty/installed/common/include -isystem ../../thirdparty/installed/uninstrumented/include -I/usr/local/opt/openssl@1.1/include -I/System/Library/Frameworks/Kerberos.framework/Headers -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -fno-strict-aliasing -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -ggdb -Qunused-arguments -Wno-ambiguous-member-template -Wdocumentation-deprecated-sync -std=c++11 -g -fPIC -fPIC -MD -MT src/kudu/security/CMakeFiles/security.dir/cert.cc.o -MF src/kudu/security/CMakeFiles/security.dir/cert.cc.o.d -o src/kudu/security/CMakeFiles/security.dir/cert.cc.o -c /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc
> /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc:158:29: error: member access into incomplete type 'X509_req_st'
>   CHECK_GT(CRYPTO_add(&data_->references, 1, CRYPTO_LOCK_X509_REQ), 1)
>                             ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:770:48: note: expanded from macro 'CHECK_GT'
> #define CHECK_GT(val1, val2) CHECK_OP(_GT, > , val1, val2)
>                                                ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:741:26: note: expanded from macro 'CHECK_OP'
>   CHECK_OP_LOG(name, op, val1, val2, google::LogMessageFatal)
>                          ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:722:44: note: expanded from macro 'CHECK_OP_LOG'
>              google::GetReferenceableValue(val1),\
>                                            ^
> /usr/local/opt/openssl@1.1/include/openssl/x509.h:91:16: note: forward declaration of 'X509_req_st'
> typedef struct X509_req_st X509_REQ;
>                ^
> /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc:158:46: error: use of undeclared identifier 'CRYPTO_LOCK_X509_REQ'
>   CHECK_GT(CRYPTO_add(&data_->references, 1, CRYPTO_LOCK_X509_REQ), 1)
>                                              ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:770:48: note: expanded from macro 'CHECK_GT'
> #define CHECK_GT(val1, val2) CHECK_OP(_GT, > , val1, val2)
>                                                ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:741:26: note: expanded from macro 'CHECK_OP'
>   CHECK_OP_LOG(name, op, val1, val2, google::LogMessageFatal)
>                          ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:722:44: note: expanded from macro 'CHECK_OP_LOG'
>              google::GetReferenceableValue(val1),\
>                                            ^
> 2 errors generated.
> [122/323] Building CXX object src/kudu/security/CMakeFiles/security.dir/crypto.cc.o
> FAILED: src/kudu/security/CMakeFiles/security.dir/crypto.cc.o
> /usr/local/opt/ccache/libexec/c++ -DKUDU_HEADERS_NO_STUBS=1 -DKUDU_HEADERS_USE_RICH_SLICE=1 -DKUDU_HEADERS_USE_SHORT_STATUS_MACROS=1 -DKUDU_STATIC_DEFINE -DTCMALLOC_ENABLED -D__STDC_FORMAT_MACROS -Dsecurity_EXPORTS -Isrc -I../../src -isystem ../../thirdparty/installed/common/include -isystem ../../thirdparty/installed/uninstrumented/include -I/usr/local/opt/openssl@1.1/include -I/System/Library/Frameworks/Kerberos.framework/Headers -msse4.2 -Wall -Wno-sign-compare -Wno-deprecated -pthread -fno-strict-aliasing -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -ggdb -Qunused-arguments -Wno-ambiguous-member-template -Wdocumentation-deprecated-sync -std=c++11 -g -fPIC -fPIC -MD -MT src/kudu/security/CMakeFiles/security.dir/crypto.cc.o -MF src/kudu/security/CMakeFiles/security.dir/crypto.cc.o.d -o src/kudu/security/CMakeFiles/security.dir/crypto.cc.o -c /Users/dan/src/cloudera/kudu/src/kudu/security/crypto
[jira] [Assigned] (KUDU-1889) Support OpenSSL 1.1.0
[ https://issues.apache.org/jira/browse/KUDU-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo reassigned KUDU-1889:
--------------------------------
    Assignee: Adar Dembo

> Support OpenSSL 1.1.0
> ---------------------
>
> Key: KUDU-1889
> URL: https://issues.apache.org/jira/browse/KUDU-1889
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Dan Burkert
> Assignee: Adar Dembo
> Priority: Minor
[jira] [Assigned] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message
[ https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengling Wang reassigned KUDU-1867:
-----------------------------------
    Assignee: Fengling Wang

> Improve the "Could not lock .../block_manager_instance" error message
> ---------------------------------------------------------------------
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
> Issue Type: Bug
> Components: fs
> Affects Versions: 1.2.0
> Reporter: Jean-Daniel Cryans
> Assignee: Fengling Wang
> Priority: Major
> Labels: newbie
>
> It's possible for users to encounter a rather cryptic error when trying to
> run Kudu while it's already running or with a different user than what was
> previously used:
> {code}
> Check failed: _s.ok() Bad status: IO error: Failed to load FS layout: Could not lock /path/to/data/block_manager_instance: Could not lock /path/to/data/block_manager_instance: lock /path/to/data/block_manager_instance: Resource temporarily unavailable (error 11)
> {code}
> This is the log line that we FATAL with, so unless you already know what it
> means you're left to your own guessing and log digging. Instead, the error
> message could be more prescriptive.
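One way to make such a failure more prescriptive is to translate the errno from the failed lock attempt into a hint about the likely cause. The sketch below is a hypothetical helper, not the change that was made in Kudu. The function name `DescribeLockFailure` and the message wording are assumptions, and it only treats EAGAIN (error 11 in the log line above) and EACCES specially.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: builds a more actionable message for a failed lock
// on a block_manager_instance file. The wording is illustrative only.
std::string DescribeLockFailure(const std::string& path, int err) {
  std::string msg = "Could not lock " + path + ": " + std::strerror(err) +
                    " (error " + std::to_string(err) + ")";
  if (err == EAGAIN) {
    // The lock is held elsewhere, e.g. by a server that is already running.
    msg += ". Another process may hold the lock; check whether a Kudu "
           "server is already running on this directory";
  } else if (err == EACCES) {
    // The file may be owned by the user that previously ran the server.
    msg += ". Check that the current user may access the file";
  }
  return msg;
}
```

The two branches correspond to the two causes named in the report: a server that is already running, and a different user than the one previously used.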
[jira] [Resolved] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message
[ https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengling Wang resolved KUDU-1867.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 1.2.0

> Improve the "Could not lock .../block_manager_instance" error message
> ---------------------------------------------------------------------
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
> Issue Type: Bug
> Components: fs
> Affects Versions: 1.2.0
> Reporter: Jean-Daniel Cryans
> Assignee: Fengling Wang
> Priority: Major
> Labels: newbie
> Fix For: 1.2.0
[jira] [Created] (KUDU-2450) Handle mutex init / destroy returning EAGAIN
Mike Percy created KUDU-2450:
--------------------------------

Summary: Handle mutex init / destroy returning EAGAIN
Key: KUDU-2450
URL: https://issues.apache.org/jira/browse/KUDU-2450
Project: Kudu
Issue Type: Improvement
Components: util
Affects Versions: 1.7.0
Reporter: Mike Percy

I saw the following in an rpc_server-test unit test failure:
{code:java}
F0519 15:03:51.615512 16049 mutex.cc:77] Check failed: 0 == rv (0 vs. 16) . Device or resource busy
*** Check failure stack trace: ***
*** Aborted at 1526767431 (unix time) try "date -d @1526767431" if you are using GNU date ***
PC: @ 0x373b032625 __GI_raise
*** SIGABRT (@0x4523eb1) received by PID 16049 (TID 0x7f6bf0f84980) from PID 16049; stack trace: ***
    @ 0x373b40f710 (unknown) at ??:0
    @ 0x373b032625 __GI_raise at ??:0
    @ 0x373b033e05 __GI_abort at ??:0
    @ 0x7f6bf1458a29 google::logging_fail() at ??:0
    @ 0x7f6bf145a31d google::LogMessage::Fail() at ??:0
    @ 0x7f6bf145c1dd google::LogMessage::SendToLog() at ??:0
    @ 0x7f6bf1459e59 google::LogMessage::Flush() at ??:0
    @ 0x7f6bf145cc7f google::LogMessageFatal::~LogMessageFatal() at ??:0
    @ 0x7f6bf397773a kudu::Mutex::~Mutex() at ??:0
    @ 0x7f6bf39b87bb kudu::ThreadMgr::~ThreadMgr() at ??:0
    @ 0x7f6bf39be504 std::_Sp_counted_ptr<>::_M_dispose() at ??:0
    @ 0x412a14 std::_Sp_counted_base<>::_M_release() at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:163
    @ 0x4121f3 std::__shared_count<>::~__shared_count() at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:667
    @ 0x7f6bf39b86ca std::__shared_ptr<>::~__shared_ptr() at ??:0
    @ 0x7f6bf39b86e4 std::shared_ptr<>::~shared_ptr() at ??:0
    @ 0x373b035ebd __cxa_finalize at ??:0
    @ 0x7f6bf38b60a3 (unknown) at ??:0
    @ 0x373ac0ec3c _dl_fini at ??:0
    @ 0x373b035b22 __GI_exit at ??:0
    @ 0x373b01ed64 __libc_start_main at ??:0
    @ 0x40e739 (unknown) at ??:0{code}
[jira] [Commented] (KUDU-2450) Handle mutex init / destroy returning EAGAIN
[ https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483260#comment-16483260 ]

Mike Percy commented on KUDU-2450:
----------------------------------
Apparently pthread_mutex_init() and pthread_mutex_destroy() can return EAGAIN, requiring a retry. I haven't dug into the kernel source to determine what can cause this return code.

> Handle mutex init / destroy returning EAGAIN
> --------------------------------------------
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Mike Percy
> Priority: Major
[jira] [Commented] (KUDU-2450) Handle mutex init / destroy returning EAGAIN
[ https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483397#comment-16483397 ]

Todd Lipcon commented on KUDU-2450:
-----------------------------------
EAGAIN on destroy indicates an attempt to destroy a locked mutex, so retrying is not appropriate.

> Handle mutex init / destroy returning EAGAIN
> --------------------------------------------
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Mike Percy
> Priority: Major
[jira] [Updated] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure
[ https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-2450:
-----------------------------
    Summary: pthread_mutex_destroy returns EBUSY in rpc_server-test failure (was: Handle mutex init / destroy returning EAGAIN)

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --------------------------------------------------------------
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Mike Percy
> Priority: Major
[jira] [Commented] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure
[ https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483412#comment-16483412 ]

Mike Percy commented on KUDU-2450:
----------------------------------
Whoops, that is EBUSY not EAGAIN as Adar pointed out in Gerrit. I'm changing the subject line.

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --------------------------------------------------------------
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Mike Percy
> Priority: Major
[jira] [Comment Edited] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure
[ https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483260#comment-16483260 ]

Mike Percy edited comment on KUDU-2450 at 5/22/18 2:19 AM:
-----------------------------------------------------------
Edit: leaving the below for historicity but 16 is EBUSY which indicates a locked mutex. The man page indicates EAGAIN is allowed but does not say when.

Original comment: Apparently pthread_mutex_init() and pthread_mutex_destroy() can return EAGAIN, requiring a retry. I haven't dug into the kernel source to determine what can cause this return code.

was (Author: mpercy):
Apparently pthread_mutex_init() and pthread_mutex_destroy() can return EAGAIN, requiring a retry. I haven't dug into the kernel source to determine what can cause this return code.

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --------------------------------------------------------------
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.7.0
> Reporter: Mike Percy
> Priority: Major
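The EAGAIN/EBUSY mix-up in this thread comes from the failed CHECK printing only the numeric return value ("0 vs. 16"). A minimal sketch of decoding such a value follows. The helper name `MutexErrorName` is hypothetical, it relies only on the constants from <cerrno>, and it covers just the codes discussed here.

```cpp
#include <cerrno>
#include <string>

// Hypothetical helper: maps a pthread_mutex_* return value to its symbolic
// errno name, for the few codes discussed in this thread.
std::string MutexErrorName(int rv) {
  switch (rv) {
    case 0:      return "OK";
    case EBUSY:  return "EBUSY";   // e.g. destroying a mutex that is still locked
    case EAGAIN: return "EAGAIN";
    case EINVAL: return "EINVAL";
    default:     return "errno " + std::to_string(rv);
  }
}
```

On Linux, where EBUSY is 16 and EAGAIN is 11, `MutexErrorName(16)` yields "EBUSY", which matches the "Device or resource busy" text in the log.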
[jira] [Commented] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message
[ https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483456#comment-16483456 ] Fengling Wang commented on KUDU-1867: - [~jdcryans] Yes? https://gerrit.cloudera.org/#/c/10419/ > Improve the "Could not lock .../block_manager_instance" error message > - > > Key: KUDU-1867 > URL: https://issues.apache.org/jira/browse/KUDU-1867 > Project: Kudu > Issue Type: Bug > Components: fs >Affects Versions: 1.2.0 >Reporter: Jean-Daniel Cryans >Assignee: Fengling Wang >Priority: Major > Labels: newbie > Fix For: 1.2.0 > > > It's possible for users to encounter a rather cryptic error when trying to > run Kudu while it's already running or with a different user than what was > previously used: > {code} > Check failed: _s.ok() Bad status: IO error: Failed to load FS layout: Could > not lock /path/to/data/block_manager_instance: Could not lock > /path/to/data/block_manager_instance: lock > /path/to/data/block_manager_instance: Resource temporarily unavailable (error > 11) > {code} > This is the log line that we FATAL with, so unless you already know what it > means you're left to your own guessing and log digging. Instead, the error > message could be more prescriptive.
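To illustrate what a "more prescriptive" message could look like, here is a rough sketch in Python rather than Kudu's C++. It assumes a non-blocking flock() on the instance file, which may not be the locking call Kudu actually uses. The function name lock_instance_file and the message wording are invented for this example and are not taken from the Gerrit change linked above.

```python
import fcntl
import os


def lock_instance_file(path):
    """Take an exclusive, non-blocking lock on path and return the open fd.

    Illustrative only: translates the two failure modes described in the
    issue into messages that say what the operator should check.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    except PermissionError as e:
        # Typical when the directory was created by a different user.
        raise RuntimeError(
            f"Could not open {path}: permission denied. Was this directory "
            "previously used by a different user?"
        ) from e
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError as e:
        # EAGAIN/EWOULDBLOCK: another open file description holds the lock.
        os.close(fd)
        raise RuntimeError(
            f"Could not lock {path}: it is held by another process. Is the "
            "server already running against this directory?"
        ) from e
    return fd
```

Calling lock_instance_file() twice on the same path without closing the first descriptor raises the "already running" error, which mirrors starting a second server on a directory that is in use.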
[jira] [Updated] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message
[ https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengling Wang updated KUDU-1867: Fix Version/s: (was: 1.2.0) 1.8.0
[jira] [Commented] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message
[ https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483424#comment-16483424 ] Jean-Daniel Cryans commented on KUDU-1867: -- [~fwang29] did you mean to resolve this as Fix Version 1.8.0?