[jira] [Created] (KUDU-2447) TS crashed with "NONE predicate can not be pushed into key"

2018-05-21 Thread Xu Yao (JIRA)
Xu Yao created KUDU-2447:


 Summary: TS crashed with "NONE predicate can not be pushed into 
key"
 Key: KUDU-2447
 URL: https://issues.apache.org/jira/browse/KUDU-2447
 Project: Kudu
  Issue Type: Bug
Reporter: Xu Yao


The tserver crashes on a scan when the primary key range [lowerPrimaryKey, upperPrimaryKey) 
and the predicates on primary key columns do not overlap.

As an example, take the table from the TestScannerMultiTablet unit test, which looks like this:
{code:java}
key1 (string) | key2 (string) | value (string){code}
The data layout ends up like this:
{code:java}
tablet '', '1': no rows
tablet '1', '2': '111', '122', '133'
tablet '2', '3': '211', '222', '233'
tablet '3', '': '311', '322', '333'{code}
Add scan PrimaryKeyBounds: ['12', '13')

Add scan predicate: key2 <= '1'

When the example runs, the tserver crashes and prints "NONE predicate can not be pushed 
into key"
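To make the failing case concrete: assuming the bound '12' encodes (key1, key2) = ('1', '2') and '13' encodes ('1', '3'), as the data layout above suggests, the bounds pin key1 to '1' and limit key2 to ['2', '3'). That range cannot overlap key2 <= '1', so the correct result is an empty scan rather than a crash. The C++ sketch below models only this reasoning; `RangePredicate`, `AtMost`, `Intersect`, and `ScanIsProvablyEmpty` are hypothetical names, not Kudu's predicate classes, and an empty intersection stands in for a NONE predicate.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified range predicate on a single string column.
// lower is inclusive, upper is exclusive; std::nullopt means unbounded.
struct RangePredicate {
  std::optional<std::string> lower;
  std::optional<std::string> upper;
};

// Builds the exclusive upper bound for "column <= value": the smallest
// string strictly greater than value is value followed by a NUL byte.
RangePredicate AtMost(const std::string& value) {
  return RangePredicate{std::nullopt, value + '\0'};
}

// Intersects two ranges. Returns std::nullopt when the result is empty,
// which plays the role of a NONE predicate in this model.
std::optional<RangePredicate> Intersect(const RangePredicate& a,
                                        const RangePredicate& b) {
  RangePredicate r = a;
  if (b.lower && (!r.lower || *b.lower > *r.lower)) r.lower = b.lower;
  if (b.upper && (!r.upper || *b.upper < *r.upper)) r.upper = b.upper;
  if (r.lower && r.upper && *r.lower >= *r.upper) return std::nullopt;
  return r;
}

// A planner built on this model checks for the empty case first and
// returns zero rows, instead of assuming every merged predicate can be
// pushed into the primary key bounds.
bool ScanIsProvablyEmpty(const RangePredicate& from_key_bounds,
                         const RangePredicate& user_predicate) {
  return !Intersect(from_key_bounds, user_predicate).has_value();
}
```

With the values from this report, a key2 range of ['2', '3') combined with `AtMost("1")` is provably empty, while the same range combined with `AtMost("2")` is not. The point of the sketch is the ordering: detect the empty case before attempting to encode predicates into key bounds.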

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2447) TS crashed with "NONE predicate can not be pushed into key"

2018-05-21 Thread Xu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482506#comment-16482506
 ] 

Xu Yao commented on KUDU-2447:
--

https://gerrit.cloudera.org/#/c/10463/

> TS crashed with "NONE predicate can not be pushed into key"
> ---
>
> Key: KUDU-2447
> URL: https://issues.apache.org/jira/browse/KUDU-2447
> Project: Kudu
>  Issue Type: Bug
>Reporter: Xu Yao
>Priority: Major
>  Labels: scan
>





[jira] [Updated] (KUDU-2443) Moving single-replica tablets does not work in kudu CLI

2018-05-21 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2443:

   Resolution: Fixed
Fix Version/s: 1.8.0
   Status: Resolved  (was: In Review)

Fixed with cb20372 and b73dec1.

> Moving single-replica tablets does not work in kudu CLI
> ---
>
> Key: KUDU-2443
> URL: https://issues.apache.org/jira/browse/KUDU-2443
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, consensus
>Affects Versions: 1.7.0, 1.8.0, 1.7.1
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
> Fix For: 1.8.0
>
>
> When trying to move the single replica of a non-replicated tablet using
> {{kudu tablet change_config move_replica}}, the system adds a non-voter,
> promotes it, and then evicts the newly added replica, repeating this cycle
> over and over.
> {noformat}
> Tablet 01701f0725e644d39b627045266f60bb of table 'impala::default.test1' is recovering: 1 on-going tablet copies
>   70f7ee61ead54b1885d819f354eb3405 (vc1316.halxg.cloudera.com:7050): RUNNING [LEADER]
>   72fcec63e96f4248ae39d114eb3cd7c9 (vc1318.halxg.cloudera.com:7050): not running [NONVOTER]
> State:   INITIALIZED
> Data state:  TABLET_DATA_COPYING
> Last status: Tablet Copy: Downloading block 4611686003499827794 (45/2464)
> All reported replicas are:
>   A = 70f7ee61ead54b1885d819f354eb3405
>   B = 72fcec63e96f4248ae39d114eb3cd7c9
> The consensus matrix is:
>  Config source | Replicas | Current term | Config index | Committed?
> ---------------+----------+--------------+--------------+------------
>  master        | A*  B~   |              |              | Yes
>  A             | A*  B~   | 15           | 143335       | Yes
>  B             | A   B~   | 15           | 143335       | Yes
> {noformat}





[jira] [Created] (KUDU-2448) Document how to temporarily decommission/shutdown a tablet server

2018-05-21 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2448:
-

 Summary: Document how to temporarily decommission/shutdown a 
tablet server
 Key: KUDU-2448
 URL: https://issues.apache.org/jira/browse/KUDU-2448
 Project: Kudu
  Issue Type: Task
  Components: documentation
Reporter: Grant Henke


I have been asked a few times for the "best" way to shut down a tablet server or 
master server for common maintenance like an OS upgrade, disk swap, etc. Even 
if it's straightforward, adding a section to the documentation 
[here|https://kudu.apache.org/docs/administration.html#_common_kudu_workflows] 
would make users confident about their process.

I have heard of increasing the 
[kudu-master_follower_unavailable_considered_failed_sec|https://kudu.apache.org/docs/configuration_reference.html#kudu-master_follower_unavailable_considered_failed_sec]
 configuration to avoid re-replication during these short planned outages. 
Perhaps that tip could be added if appropriate.





[jira] [Created] (KUDU-2449) Document best practices for adding a new tablet server

2018-05-21 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2449:
-

 Summary: Document best practices for adding a new tablet server
 Key: KUDU-2449
 URL: https://issues.apache.org/jira/browse/KUDU-2449
 Project: Kudu
  Issue Type: Task
  Components: documentation
Reporter: Grant Henke


Though adding a new tablet server is straightforward, I have been asked 
questions about best practices a few times. The main follow-up question to "how do 
I add new tablet servers?" is "Do I need to rebalance the data? How do I do 
that?" 

Perhaps a brief best practices guide 
[here|https://kudu.apache.org/docs/administration.html#_common_kudu_workflows] 
in the docs would make sense. 





[jira] [Updated] (KUDU-2449) Document best practices for adding a new tablet server

2018-05-21 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2449:
--
Labels: beginner  (was: )

> Document best practices for adding a new tablet server
> --
>
> Key: KUDU-2449
> URL: https://issues.apache.org/jira/browse/KUDU-2449
> Project: Kudu
>  Issue Type: Task
>  Components: documentation
>Reporter: Grant Henke
>Priority: Major
>  Labels: beginner
>





[jira] [Resolved] (KUDU-1889) Support OpenSSL 1.1.0

2018-05-21 Thread Adar Dembo (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo resolved KUDU-1889.
--
   Resolution: Fixed
Fix Version/s: 1.8.0

Fixed in 14080bb.

> Support OpenSSL 1.1.0
> -
>
> Key: KUDU-1889
> URL: https://issues.apache.org/jira/browse/KUDU-1889
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: Dan Burkert
>Assignee: Adar Dembo
>Priority: Minor
> Fix For: 1.8.0
>
>

[jira] [Assigned] (KUDU-1889) Support OpenSSL 1.1.0

2018-05-21 Thread Adar Dembo (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo reassigned KUDU-1889:


Assignee: Adar Dembo

> Support OpenSSL 1.1.0
> -
>
> Key: KUDU-1889
> URL: https://issues.apache.org/jira/browse/KUDU-1889
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: Dan Burkert
>Assignee: Adar Dembo
>Priority: Minor
>
> We currently can't compile against OpenSSL 1.1.0.  Probably low priority 
> right now, but eventually distros are going to start shipping with 1.1.0 by 
> default.
> {code}
> [121/323] Building CXX object 
> src/kudu/security/CMakeFiles/security.dir/cert.cc.o
> FAILED: src/kudu/security/CMakeFiles/security.dir/cert.cc.o
> /usr/local/opt/ccache/libexec/c++   -DKUDU_HEADERS_NO_STUBS=1 
> -DKUDU_HEADERS_USE_RICH_SLICE=1 -DKUDU_HEADERS_USE_SHORT_STATUS_MACROS=1 
> -DKUDU_STATIC_DEFINE -DTCMALLOC_ENABLED -D__STDC_FORMAT_MACROS 
> -Dsecurity_EXPORTS -Isrc -I../../src -isystem 
> ../../thirdparty/installed/common/include -isystem 
> ../../thirdparty/installed/uninstrumented/include 
> -I/usr/local/opt/openssl@1.1/include 
> -I/System/Library/Frameworks/Kerberos.framework/Headers -msse4.2 -Wall 
> -Wno-sign-compare -Wno-deprecated -pthread -fno-strict-aliasing 
> -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -ggdb -Qunused-arguments 
> -Wno-ambiguous-member-template -Wdocumentation-deprecated-sync -std=c++11 -g 
> -fPIC   -fPIC -MD -MT src/kudu/security/CMakeFiles/security.dir/cert.cc.o -MF 
> src/kudu/security/CMakeFiles/security.dir/cert.cc.o.d -o 
> src/kudu/security/CMakeFiles/security.dir/cert.cc.o -c 
> /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc
> /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc:158:29: error: member 
> access into incomplete type 'X509_req_st'
>   CHECK_GT(CRYPTO_add(&data_->references, 1, CRYPTO_LOCK_X509_REQ), 1)
> ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:770:48: 
> note: expanded from macro 'CHECK_GT'
> #define CHECK_GT(val1, val2) CHECK_OP(_GT, > , val1, val2)
>^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:741:26: 
> note: expanded from macro 'CHECK_OP'
>   CHECK_OP_LOG(name, op, val1, val2, google::LogMessageFatal)
>  ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:722:44: 
> note: expanded from macro 'CHECK_OP_LOG'
>  google::GetReferenceableValue(val1),\
>^
> /usr/local/opt/openssl@1.1/include/openssl/x509.h:91:16: note: forward 
> declaration of 'X509_req_st'
> typedef struct X509_req_st X509_REQ;
>^
> /Users/dan/src/cloudera/kudu/src/kudu/security/cert.cc:158:46: error: use of 
> undeclared identifier 'CRYPTO_LOCK_X509_REQ'
>   CHECK_GT(CRYPTO_add(&data_->references, 1, CRYPTO_LOCK_X509_REQ), 1)
>  ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:770:48: 
> note: expanded from macro 'CHECK_GT'
> #define CHECK_GT(val1, val2) CHECK_OP(_GT, > , val1, val2)
>^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:741:26: 
> note: expanded from macro 'CHECK_OP'
>   CHECK_OP_LOG(name, op, val1, val2, google::LogMessageFatal)
>  ^
> ../../thirdparty/installed/uninstrumented/include/glog/logging.h:722:44: 
> note: expanded from macro 'CHECK_OP_LOG'
>  google::GetReferenceableValue(val1),\
>^
> 2 errors generated.
> [122/323] Building CXX object 
> src/kudu/security/CMakeFiles/security.dir/crypto.cc.o
> FAILED: src/kudu/security/CMakeFiles/security.dir/crypto.cc.o
> /usr/local/opt/ccache/libexec/c++   -DKUDU_HEADERS_NO_STUBS=1 
> -DKUDU_HEADERS_USE_RICH_SLICE=1 -DKUDU_HEADERS_USE_SHORT_STATUS_MACROS=1 
> -DKUDU_STATIC_DEFINE -DTCMALLOC_ENABLED -D__STDC_FORMAT_MACROS 
> -Dsecurity_EXPORTS -Isrc -I../../src -isystem 
> ../../thirdparty/installed/common/include -isystem 
> ../../thirdparty/installed/uninstrumented/include 
> -I/usr/local/opt/openssl@1.1/include 
> -I/System/Library/Frameworks/Kerberos.framework/Headers -msse4.2 -Wall 
> -Wno-sign-compare -Wno-deprecated -pthread -fno-strict-aliasing 
> -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -ggdb -Qunused-arguments 
> -Wno-ambiguous-member-template -Wdocumentation-deprecated-sync -std=c++11 -g 
> -fPIC   -fPIC -MD -MT src/kudu/security/CMakeFiles/security.dir/crypto.cc.o 
> -MF src/kudu/security/CMakeFiles/security.dir/crypto.cc.o.d-o 
> src/kudu/security/CMakeFiles/security.dir/crypto.cc.o -c 
> /Users/dan/src/cloudera/kudu/src/kudu/security/crypto.cc
> /Users/dan/src/cloudera/kudu/src/kudu/security/crypto.cc:82:33: 

[jira] [Assigned] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message

2018-05-21 Thread Fengling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengling Wang reassigned KUDU-1867:
---

Assignee: Fengling Wang

> Improve the "Could not lock .../block_manager_instance" error message
> -
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Fengling Wang
>Priority: Major
>  Labels: newbie
>
> It's possible for users to encounter a rather cryptic error when trying to 
> run Kudu while it's already running or with a different user than what was 
> previously used:
> {code}
> Check failed: _s.ok() Bad status: IO error: Failed to load FS layout: Could 
> not lock /path/to/data/block_manager_instance: Could not lock 
> /path/to/data/block_manager_instance: lock 
> /path/to/data/block_manager_instance: Resource temporarily unavailable (error 
> 11)
> {code}
> This is the log line that we FATAL with, so unless you already know what it 
> means you're left to your own guessing and log digging. Instead, the error 
> message could be more prescriptive.
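One way to make the message prescriptive is to translate the errno from the failed lock into a hint about the likely cause. The sketch below is hypothetical: `LockInstanceFile` is not a Kudu function, and it uses `flock(2)` with `LOCK_EX | LOCK_NB` purely for illustration (Kudu's own file-locking code may use a different mechanism, such as `fcntl`). On Linux, a non-blocking lock conflict reports `EWOULDBLOCK`, which equals `EAGAIN` (error 11), the code shown in the log above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper: opens the instance file and takes an exclusive,
// non-blocking lock on it. Returns the locked fd (>= 0) on success, or -1
// with *error set to a message that suggests a likely cause.
int LockInstanceFile(const std::string& path, std::string* error) {
  int fd = open(path.c_str(), O_RDWR);
  if (fd < 0) {
    int err = errno;
    *error = "Could not open " + path + ": " + std::strerror(err);
    if (err == EACCES) {
      *error += ". Check that Kudu is running as the same user that "
                "created this data directory.";
    }
    return -1;
  }
  if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
    int err = errno;
    close(fd);
    *error = "Could not lock " + path + ": " + std::strerror(err);
    if (err == EWOULDBLOCK) {
      *error += ". Another Kudu process may already be using this data "
                "directory; stop it before starting a new one.";
    }
    return -1;
  }
  return fd;  // The caller keeps this fd open for as long as it holds the lock.
}
```

Locking the same file twice through two separate descriptors makes the second call fail with the "Another Kudu process" hint, because `flock(2)` treats locks taken through different open file descriptions independently, even within one process.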





[jira] [Resolved] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message

2018-05-21 Thread Fengling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengling Wang resolved KUDU-1867.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

> Improve the "Could not lock .../block_manager_instance" error message
> -
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Fengling Wang
>Priority: Major
>  Labels: newbie
> Fix For: 1.2.0
>
>





[jira] [Created] (KUDU-2450) Handle mutex init / destroy returning EAGAIN

2018-05-21 Thread Mike Percy (JIRA)
Mike Percy created KUDU-2450:


 Summary: Handle mutex init / destroy returning EAGAIN
 Key: KUDU-2450
 URL: https://issues.apache.org/jira/browse/KUDU-2450
 Project: Kudu
  Issue Type: Improvement
  Components: util
Affects Versions: 1.7.0
Reporter: Mike Percy


I saw the following in an rpc_server-test unit test failure:
{code:java}
F0519 15:03:51.615512 16049 mutex.cc:77] Check failed: 0 == rv (0 vs. 16) . 
Device or resource busy
*** Check failure stack trace: ***
*** Aborted at 1526767431 (unix time) try "date -d @1526767431" if you are 
using GNU date ***
PC: @ 0x373b032625 __GI_raise
*** SIGABRT (@0x4523eb1) received by PID 16049 (TID 0x7f6bf0f84980) from 
PID 16049; stack trace: ***
@ 0x373b40f710 (unknown) at ??:0
@ 0x373b032625 __GI_raise at ??:0
@ 0x373b033e05 __GI_abort at ??:0
@ 0x7f6bf1458a29 google::logging_fail() at ??:0
@ 0x7f6bf145a31d google::LogMessage::Fail() at ??:0
@ 0x7f6bf145c1dd google::LogMessage::SendToLog() at ??:0
@ 0x7f6bf1459e59 google::LogMessage::Flush() at ??:0
@ 0x7f6bf145cc7f google::LogMessageFatal::~LogMessageFatal() at ??:0
@ 0x7f6bf397773a kudu::Mutex::~Mutex() at ??:0
@ 0x7f6bf39b87bb kudu::ThreadMgr::~ThreadMgr() at ??:0
@ 0x7f6bf39be504 std::_Sp_counted_ptr<>::_M_dispose() at ??:0
@ 0x412a14 std::_Sp_counted_base<>::_M_release() at 
/opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:163
@ 0x4121f3 std::__shared_count<>::~__shared_count() at 
/opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/bits/shared_ptr_base.h:667
@ 0x7f6bf39b86ca std::__shared_ptr<>::~__shared_ptr() at ??:0
@ 0x7f6bf39b86e4 std::shared_ptr<>::~shared_ptr() at ??:0
@ 0x373b035ebd __cxa_finalize at ??:0
@ 0x7f6bf38b60a3 (unknown) at ??:0
@ 0x373ac0ec3c _dl_fini at ??:0
@ 0x373b035b22 __GI_exit at ??:0
@ 0x373b01ed64 __libc_start_main at ??:0
@ 0x40e739 (unknown) at ??:0{code}
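The check in this trace fails with rv = 16, which is EBUSY on Linux (its strerror text is the "Device or resource busy" printed above), and the stack shows `kudu::Mutex::~Mutex()` failing during process exit (`exit` -> `__cxa_finalize` -> `~shared_ptr` -> `~ThreadMgr` -> `~Mutex`). POSIX leaves destroying a locked mutex undefined, so an EBUSY result points to a lifetime or shutdown-ordering problem rather than a transient condition worth retrying. The sketch below is a hypothetical, simplified wrapper (not Kudu's actual `kudu::Mutex`) whose destructor reports the error with that interpretation and aborts instead of retrying.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

#include <pthread.h>

// Hypothetical helper: explains a pthread_mutex_destroy() return code.
// Destroying a locked mutex is undefined behavior in POSIX; as the trace in
// this report shows, glibc can report it as EBUSY (16), but callers must
// not rely on that.
std::string DescribeMutexDestroyResult(int rv) {
  switch (rv) {
    case 0:
      return "ok";
    case EBUSY:
      return "mutex destroyed while still locked or in use; this is a "
             "lifetime bug, and retrying the destroy will not fix it";
    default:
      return std::string("unexpected error: ") + std::strerror(rv);
  }
}

// Hypothetical simplified mutex wrapper: any failure from init or destroy
// is reported once and treated as fatal, with no retry loop.
class SimpleMutex {
 public:
  SimpleMutex() {
    int rv = pthread_mutex_init(&m_, nullptr);
    if (rv != 0) {
      std::fprintf(stderr, "pthread_mutex_init failed: %s\n",
                   std::strerror(rv));
      std::abort();
    }
  }
  ~SimpleMutex() {
    int rv = pthread_mutex_destroy(&m_);
    if (rv != 0) {
      std::fprintf(stderr, "%s\n", DescribeMutexDestroyResult(rv).c_str());
      std::abort();
    }
  }
  SimpleMutex(const SimpleMutex&) = delete;
  SimpleMutex& operator=(const SimpleMutex&) = delete;

  // Return values are ignored here to keep the sketch short.
  void Lock() { pthread_mutex_lock(&m_); }
  void Unlock() { pthread_mutex_unlock(&m_); }

 private:
  pthread_mutex_t m_;
};
```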





[jira] [Commented] (KUDU-2450) Handle mutex init / destroy returning EAGAIN

2018-05-21 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483260#comment-16483260
 ] 

Mike Percy commented on KUDU-2450:
--

Apparently pthread_mutex_init() and pthread_mutex_destroy() can return EAGAIN, 
requiring a retry.

I haven't dug into the kernel source to determine what can cause this return 
code.

> Handle mutex init / destroy returning EAGAIN
> 
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 1.7.0
>Reporter: Mike Percy
>Priority: Major
>





[jira] [Commented] (KUDU-2450) Handle mutex init / destroy returning EAGAIN

2018-05-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483397#comment-16483397
 ] 

Todd Lipcon commented on KUDU-2450:
---

EAGAIN on destroy indicates an attempt to destroy a locked mutex, so retrying 
is not appropriate.

> Handle mutex init / destroy returning EAGAIN
> 
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 1.7.0
>Reporter: Mike Percy
>Priority: Major
>





[jira] [Updated] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure

2018-05-21 Thread Mike Percy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Percy updated KUDU-2450:
-
Summary: pthread_mutex_destroy returns EBUSY in rpc_server-test failure  
(was: Handle mutex init / destroy returning EAGAIN)

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 1.7.0
>Reporter: Mike Percy
>Priority: Major
>





[jira] [Commented] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure

2018-05-21 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483412#comment-16483412
 ] 

Mike Percy commented on KUDU-2450:
--

Whoops, that is EBUSY, not EAGAIN, as Adar pointed out in Gerrit. I'm changing 
the subject line.

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 1.7.0
>Reporter: Mike Percy
>Priority: Major
>





[jira] [Comment Edited] (KUDU-2450) pthread_mutex_destroy returns EBUSY in rpc_server-test failure

2018-05-21 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483260#comment-16483260
 ] 

Mike Percy edited comment on KUDU-2450 at 5/22/18 2:19 AM:
---

Edit: leaving the text below for the historical record, but 16 is EBUSY, which 
indicates a locked mutex.

The man page indicates EAGAIN is allowed but does not say when.

Original comment: Apparently pthread_mutex_init() and pthread_mutex_destroy() 
can return EAGAIN, requiring a retry. I haven't dug into the kernel source to 
determine what can cause this return code.


was (Author: mpercy):
Apparently pthread_mutex_init() and pthread_mutex_destroy() can return EAGAIN, 
requiring a retry.

I haven't dug into the kernel source to determine what can cause this return 
code.

> pthread_mutex_destroy returns EBUSY in rpc_server-test failure
> --
>
> Key: KUDU-2450
> URL: https://issues.apache.org/jira/browse/KUDU-2450
> Project: Kudu
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 1.7.0
>Reporter: Mike Percy
>Priority: Major
>





[jira] [Commented] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message

2018-05-21 Thread Fengling Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483456#comment-16483456
 ] 

Fengling Wang commented on KUDU-1867:
-

[~jdcryans] Yes? https://gerrit.cloudera.org/#/c/10419/

> Improve the "Could not lock .../block_manager_instance" error message
> -
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Fengling Wang
>Priority: Major
>  Labels: newbie
> Fix For: 1.2.0
>
>
> It's possible for users to encounter a rather cryptic error when trying to 
> run Kudu while it's already running, or as a different user than the one 
> used previously:
> {code}
> Check failed: _s.ok() Bad status: IO error: Failed to load FS layout: Could 
> not lock /path/to/data/block_manager_instance: Could not lock 
> /path/to/data/block_manager_instance: lock 
> /path/to/data/block_manager_instance: Resource temporarily unavailable (error 
> 11)
> {code}
> This is the log line that we FATAL with, so unless you already know what it 
> means, you're left to guess and dig through the logs. Instead, the error 
> message could be more prescriptive.
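On Linux, "error 11" is EAGAIN, which a non-blocking lock attempt returns when another process already holds the lock. The C sketch below illustrates one way a more prescriptive message could be built; it is not the change under review in the linked Gerrit patch, whose contents are not shown here. It assumes the instance file is locked with a non-blocking POSIX fcntl() F_SETLK record lock (POSIX allows either EAGAIN or EACCES when the lock is held), and the helpers describe_lock_failure() and try_lock_instance_file() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: build a prescriptive message for a failed lock attempt.
 * 'opening' is nonzero when open() failed, zero when fcntl() failed. */
static void describe_lock_failure(int err, int opening, const char *path,
                                  char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    const char *hint = "";
    if (opening && err == EACCES) {
        hint = " Check that Kudu runs as the same user that created this directory.";
    } else if (!opening && (err == EAGAIN || err == EACCES)) {
        hint = " Another Kudu process is probably already using this directory.";
    }
    snprintf(buf, len, "Could not lock %s: %s.%s", path, strerror(err), hint);
}

/* Hypothetical: take a non-blocking exclusive lock on the whole file.
 * Returns an fd that holds the lock, or -1 with an explanation in buf. */
static int try_lock_instance_file(const char *path, char *buf, size_t len) {
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        describe_lock_failure(errno, 1, path, buf, len);
        return -1;
    }
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;  /* l_start = 0 and l_len = 0 cover the whole file */
    if (fcntl(fd, F_SETLK, &fl) != 0) {
        int err = errno;  /* save errno before close() can overwrite it */
        close(fd);
        describe_lock_failure(err, 0, path, buf, len);
        return -1;
    }
    return fd;  /* keep open: closing the fd releases the lock */
}
```

Because fcntl() record locks are owned per process, a second lock attempt from the same process succeeds; reproducing the EAGAIN case requires a second process holding the lock.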





[jira] [Updated] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message

2018-05-21 Thread Fengling Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengling Wang updated KUDU-1867:

Fix Version/s: (was: 1.2.0)
   1.8.0

> Improve the "Could not lock .../block_manager_instance" error message
> -
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Fengling Wang
>Priority: Major
>  Labels: newbie
> Fix For: 1.8.0
>
>





[jira] [Commented] (KUDU-1867) Improve the "Could not lock .../block_manager_instance" error message

2018-05-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483424#comment-16483424
 ] 

Jean-Daniel Cryans commented on KUDU-1867:
--

[~fwang29] did you mean to resolve this as Fix Version 1.8.0?

> Improve the "Could not lock .../block_manager_instance" error message
> -
>
> Key: KUDU-1867
> URL: https://issues.apache.org/jira/browse/KUDU-1867
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Fengling Wang
>Priority: Major
>  Labels: newbie
> Fix For: 1.2.0
>
>


