[jira] [Created] (KUDU-1862) Add a ToolTest to test all the action help strings

2017-02-02 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1862:
-

 Summary: Add a ToolTest to test all the action help strings
 Key: KUDU-1862
 URL: https://issues.apache.org/jira/browse/KUDU-1862
 Project: Kudu
  Issue Type: Improvement
Reporter: Dinesh Bhat
Priority: Minor


We want a test to cover the help strings displayed by the actions of the CLI 
tools. For example:
{noformat}
TEST_F(ToolTest, TestActionHelp) {
  const vector<string> kFormatActionRegexes = {
  "-fs_wal_dir \\(Directory",
  "-fs_data_dirs \\(Comma-separated list",
  "-uuid \\(The uuid"
  };
  NO_FATALS(RunTestHelp("fs format --help", kFormatActionRegexes));
  NO_FATALS(RunTestHelp("fs format extra_arg", kFormatActionRegexes,
  Status::InvalidArgument("too many arguments: 'extra_arg'")));
}
{noformat}
which tests the action help string of the "kudu fs format" command. We need a 
test/method that covers all of the action help strings, and we could make it 
generic enough to pick up future additions too; see the sketch below.
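
For illustration, a more generic version might iterate over a list of 
mode/action command strings; a minimal sketch, assuming RunTestHelp() accepts 
a command string plus a vector of expected regexes as in the example above 
(the command list here is illustrative, not exhaustive):
{noformat}
TEST_F(ToolTest, TestAllActionHelp) {
  // Illustrative list only; a real implementation would enumerate the
  // mode/action tree programmatically so that new actions are picked up.
  const vector<string> kHelpCommands = {
    "fs format --help",
    "local_replica delete --help",
    "tablet leader_step_down --help",
  };
  for (const string& cmd : kHelpCommands) {
    // Every action's help output is expected to contain a usage line.
    NO_FATALS(RunTestHelp(cmd, { "Usage:" }));
  }
}
{noformat}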



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KUDU-1504) Add a tool to force a Raft config change

2017-01-10 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816818#comment-15816818
 ] 

Dinesh Bhat commented on KUDU-1504:
---

Great!! Thanks for clarifying.

> Add a tool to force a Raft config change
> 
>
> Key: KUDU-1504
> URL: https://issues.apache.org/jira/browse/KUDU-1504
> Project: Kudu
>  Issue Type: New Feature
>  Components: consensus, ops-tooling
>Affects Versions: 0.9.0
>Reporter: Mike Percy
> Fix For: n/a
>
>
> It would be useful to implement a tool that allowed for an 
> administrator-controlled "forced" configuration change. This type of thing is 
> useful as a "parachute" if, for example, 2 nodes of a 3-node configuration 
> were permanently offline. An administrator may wish to accept whatever 
> potential data loss occurred and force the configuration to be composed of a 
> single node so that it could come back online and then grown back to a larger 
> configuration size again.
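
For context, the tooling that eventually grew out of this line of work (via 
KUDU-1330) exposes an unsafe config-change action on the CLI; a hypothetical 
invocation, with placeholder arguments and no guarantee that the exact action 
name or argument order matches what shipped:
{noformat}
$ kudu remote_replica unsafe_change_config <tserver_address> <tablet_id> <peer_uuid>
{noformat}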



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1504) Add a tool to force a Raft config change

2017-01-09 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812824#comment-15812824
 ] 

Dinesh Bhat commented on KUDU-1504:
---

Isn't KUDU-1721 meant to address this as a 'feature'? Do you see KUDU-1721, 
KUDU-1330, and this JIRA as 3 independent work items?

> Add a tool to force a Raft config change
> 
>
> Key: KUDU-1504
> URL: https://issues.apache.org/jira/browse/KUDU-1504
> Project: Kudu
>  Issue Type: New Feature
>  Components: consensus, ops-tooling
>Affects Versions: 0.9.0
>Reporter: Mike Percy
>
> It would be useful to implement a tool that allowed for an 
> administrator-controlled "forced" configuration change. This type of thing is 
> useful as a "parachute" if, for example, 2 nodes of a 3-node configuration 
> were permanently offline. An administrator may wish to accept whatever 
> potential data loss occurred and force the configuration to be composed of a 
> single node so that it could come back online and then grown back to a larger 
> configuration size again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KUDU-1618) Add local_replica tool to delete a replica

2017-01-05 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat resolved KUDU-1618.
---
  Resolution: Fixed
   Fix Version/s: 1.2.0
Target Version/s: 1.2.0  (was: 1.3.0)

> Add local_replica tool to delete a replica
> --
>
> Key: KUDU-1618
> URL: https://issues.apache.org/jira/browse/KUDU-1618
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
> Fix For: 1.2.0
>
>
> Occasionally we've hit cases where a tablet is corrupt in such a way that the 
> tserver fails to start or crashes soon after starting. Typically we'd prefer 
> the tablet just get marked FAILED but in the worst case it causes the whole 
> tserver to fail.
> For these cases we should add a 'local_replica' subtool to fully remove a 
> local tablet. Related, it might be useful to have a 'local_replica archive' 
> which would create a tarball from the data in this tablet for later 
> examination by developers.
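
For reference, the resulting delete action runs against the local filesystem 
while the tserver is stopped; a sketch of the invocation, with placeholder 
flag values:
{noformat}
$ kudu local_replica delete <tablet_id> --fs_wal_dir=<wal_dir> --fs_data_dirs=<data_dirs>
{noformat}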



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1504) Add a tool to force a Raft config change

2017-01-05 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802434#comment-15802434
 ] 

Dinesh Bhat commented on KUDU-1504:
---

[~mpercy] I am going to mark this as a duplicate of KUDU-1330, since they 
address the same issue and some discussion has already happened in that JIRA.

> Add a tool to force a Raft config change
> 
>
> Key: KUDU-1504
> URL: https://issues.apache.org/jira/browse/KUDU-1504
> Project: Kudu
>  Issue Type: New Feature
>  Components: consensus, ops-tooling
>Affects Versions: 0.9.0
>Reporter: Mike Percy
>
> It would be useful to implement a tool that allowed for an 
> administrator-controlled "forced" configuration change. This type of thing is 
> useful as a "parachute" if, for example, 2 nodes of a 3-node configuration 
> were permanently offline. An administrator may wish to accept whatever 
> potential data loss occurred and force the configuration to be composed of a 
> single node so that it could come back online and then grown back to a larger 
> configuration size again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1741) Make MiniCluster and ExternalMiniCluster follow one semantics for Restart

2017-01-04 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1741:
--
Description: Triaging this so that we can attend to it later. The MiniCluster 
and ExternalMiniCluster utility classes follow opposite semantics for restart 
today, which is confusing. For example, ExternalMiniCluster::Restart() expects 
all the nodes to be shut down before the restart, whereas 
MiniCluster::Restart() expects all servers to be up and running so that it can 
internally execute shutdown and start, in that order. We should settle on one 
semantics to make this less confusing. Whichever approach we end up choosing, 
we may have to change a bunch of existing tests, because the tests are also 
written against these semantics.  (was: Triaging this so that we can attend to 
it later. The MiniCluster and ExternalMiniCluster utility classes follow 
opposite semantics today, which is confusing. For example, 
ExternalMiniCluster::Restart() expects all the nodes to be shut down before 
the restart, whereas MiniCluster::Restart() expects all servers to be up and 
running so that it can internally execute shutdown and start, in that order. 
We should settle on one semantics to make this less confusing. Whichever 
approach we end up choosing, we may have to change a bunch of existing tests, 
because the tests are also written against these semantics.)
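
To make the mismatch concrete, a sketch of the two call patterns as described 
above (the API shapes are assumed from the description, not verified against 
the code):
{noformat}
// ExternalMiniCluster: Restart() requires that the cluster was shut down first.
cluster->Shutdown();
ASSERT_OK(cluster->Restart());

// MiniCluster: Restart() expects the servers to still be running; it
// internally performs the shutdown and then the start.
ASSERT_OK(cluster->Restart());
{noformat}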

> Make MiniCluster and ExternalMiniCluster follow one semantics for Restart
> -
>
> Key: KUDU-1741
> URL: https://issues.apache.org/jira/browse/KUDU-1741
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Trivial
>
> Triaging this so that we can attend to it later. The MiniCluster and 
> ExternalMiniCluster utility classes follow opposite semantics for restart 
> today, which is confusing. For example, ExternalMiniCluster::Restart() 
> expects all the nodes to be shut down before the restart, whereas 
> MiniCluster::Restart() expects all servers to be up and running so that it 
> can internally execute shutdown and start, in that order. We should settle 
> on one semantics to make this less confusing. Whichever approach we end up 
> choosing, we may have to change a bunch of existing tests, because the 
> tests are also written against these semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KUDU-1608) Catalog Manager DeleteTablet retry logic is broken

2016-12-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reassigned KUDU-1608:
-

Assignee: Dinesh Bhat

> Catalog Manager DeleteTablet retry logic is broken
> --
>
> Key: KUDU-1608
> URL: https://issues.apache.org/jira/browse/KUDU-1608
> Project: Kudu
>  Issue Type: Bug
>  Components: master
>Reporter: Dan Burkert
>Assignee: Dinesh Bhat
>
> There are a couple of issues with the Catalog Manager's retry logic for 
> DeleteTablet requests:
> 1. The retries loop indefinitely
> 2. The RPC response is checked against a whitelist of fatal errors, instead 
> of a list of retriable errors.  Additionally, we are missing many fatal 
> errors on this list such as WRONG_SERVER_UUID and UNKNOWN_ERROR.  I think we 
> should instead only retry on errors which we know we can recover from.
> 3. The catalog manager aggressively sends out DeleteTablet requests to tablet 
> servers when tablets are ejected from the group.  Arguably this should only 
> be done lazily when the dead tablets report in, since most of the time the 
> tablet will be ejected due to failure (and will never be seen again).
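
To illustrate points 1 and 2, a self-contained sketch (illustrative names, 
not Kudu's actual code) of bounded retries driven by a whitelist of 
known-retriable errors:
{noformat}
#include <chrono>
#include <thread>

enum class RpcError { OK, TABLET_NOT_RUNNING, SERVER_TOO_BUSY,
                      WRONG_SERVER_UUID, UNKNOWN_ERROR };

// Whitelist of errors we know are transient and safe to retry; anything
// else (WRONG_SERVER_UUID, UNKNOWN_ERROR, ...) is treated as fatal.
bool IsRetriable(RpcError e) {
  switch (e) {
    case RpcError::TABLET_NOT_RUNNING:
    case RpcError::SERVER_TOO_BUSY:
      return true;
    default:
      return false;
  }
}

template <typename SendFn>
bool DeleteTabletWithRetries(SendFn send, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    RpcError e = send();
    if (e == RpcError::OK) return true;
    if (!IsRetriable(e)) return false;  // fail fast on fatal errors
    std::this_thread::sleep_for(std::chrono::milliseconds(100 * (attempt + 1)));
  }
  return false;  // bounded, unlike the current indefinite loop
}
{noformat}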



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1798) File manager is broken on OS X

2016-12-09 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1798:
--
Priority: Major  (was: Blocker)

> File manager is broken on OS X
> --
>
> Key: KUDU-1798
> URL: https://issues.apache.org/jira/browse/KUDU-1798
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> {noformat}
> I1209 14:43:32.670500 1985855488 env_posix.cc:1263] Raising process file 
> limit from 256 to 9223372036854775807
> F1209 14:43:32.670521 1985855488 env_posix.cc:1266] Check failed: 
> setrlimit(RLIMIT_NOFILE, &l) == 0 : Invalid argument [22]
> *** Check failure stack trace: ***
> @0x1068648bd  google::LogMessage::Flush()
> @0x106864743  google::LogMessage::~LogMessage()
> @0x106865685  google::ErrnoLogMessage::~ErrnoLogMessage()
> @0x106333a57  kudu::(anonymous 
> namespace)::PosixEnv::IncreaseOpenFileLimit()
> @0x1058b642f  
> kudu::fs::GetFileCacheCapacityForBlockManager()::$_0::operator()()
> @0x1058b63fb  
> _ZNSt3__117__call_once_proxyINS_5tupleIJOZN4kudu2fs35GetFileCacheCapacityForBlockManagerEPNS2_3EnvEE3$_0EvPv
> @ 0x7fff921cb6e9  std::__1::__call_once()
> @0x1058b5acb  kudu::fs::GetFileCacheCapacityForBlockManager()
> @0x1058f12ee  kudu::fs::FileBlockManager::FileBlockManager()
> @0x1058f1b85  kudu::fs::FileBlockManager::FileBlockManager()
> @0x1058ff046  kudu::FsManager::InitBlockManager()
> @0x1058fe6fc  kudu::FsManager::Init()
> @0x1058ff9fc  kudu::FsManager::Open()
> @0x10408d737  kudu::server::ServerBase::Init()
> @0x103759d2c  kudu::master::Master::Init()
> @0x1036a6837  kudu::master::MasterMain()
> @0x1036a6132  main
> @ 0x7fff86a7e5ad  start
> @   0x13  (unknown)
> ../../src/kudu/integration-tests/master-stress-test.cc:124: Failure
> Failed
> Bad status: Runtime error: Failed to add distributed masters: Unable to start 
> Master at index 0: 
> /Users/dinesh/Documents/kudu_exp2/kudu/build/debug/./bin/kudu-master: process 
> exited on signal 6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1798) File manager is broken on OS X

2016-12-09 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1798:
-

 Summary: File manager is broken on OS X
 Key: KUDU-1798
 URL: https://issues.apache.org/jira/browse/KUDU-1798
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat
Priority: Blocker


{noformat}
I1209 14:43:32.670500 1985855488 env_posix.cc:1263] Raising process file limit 
from 256 to 9223372036854775807
F1209 14:43:32.670521 1985855488 env_posix.cc:1266] Check failed: 
setrlimit(RLIMIT_NOFILE, &l) == 0 : Invalid argument [22]
*** Check failure stack trace: ***
@0x1068648bd  google::LogMessage::Flush()
@0x106864743  google::LogMessage::~LogMessage()
@0x106865685  google::ErrnoLogMessage::~ErrnoLogMessage()
@0x106333a57  kudu::(anonymous 
namespace)::PosixEnv::IncreaseOpenFileLimit()
@0x1058b642f  
kudu::fs::GetFileCacheCapacityForBlockManager()::$_0::operator()()
@0x1058b63fb  
_ZNSt3__117__call_once_proxyINS_5tupleIJOZN4kudu2fs35GetFileCacheCapacityForBlockManagerEPNS2_3EnvEE3$_0EvPv
@ 0x7fff921cb6e9  std::__1::__call_once()
@0x1058b5acb  kudu::fs::GetFileCacheCapacityForBlockManager()
@0x1058f12ee  kudu::fs::FileBlockManager::FileBlockManager()
@0x1058f1b85  kudu::fs::FileBlockManager::FileBlockManager()
@0x1058ff046  kudu::FsManager::InitBlockManager()
@0x1058fe6fc  kudu::FsManager::Init()
@0x1058ff9fc  kudu::FsManager::Open()
@0x10408d737  kudu::server::ServerBase::Init()
@0x103759d2c  kudu::master::Master::Init()
@0x1036a6837  kudu::master::MasterMain()
@0x1036a6132  main
@ 0x7fff86a7e5ad  start
@   0x13  (unknown)
../../src/kudu/integration-tests/master-stress-test.cc:124: Failure
Failed
Bad status: Runtime error: Failed to add distributed masters: Unable to start 
Master at index 0: 
/Users/dinesh/Documents/kudu_exp2/kudu/build/debug/./bin/kudu-master: process 
exited on signal 6
{noformat}
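
The crash is a known OS X quirk: setrlimit() rejects an rlim_cur above 
OPEN_MAX with EINVAL, even when rlim_max reports RLIM_INFINITY. A minimal 
sketch of the usual portable workaround (not necessarily the fix that landed; 
PCHECK is the glog macro seen in the stack trace):
{noformat}
#include <limits.h>
#include <sys/resource.h>
#include <algorithm>
#include <glog/logging.h>  // for PCHECK

void IncreaseOpenFileLimit() {
  struct rlimit l;
  PCHECK(getrlimit(RLIMIT_NOFILE, &l) == 0);
  rlim_t target = l.rlim_max;
#if defined(__APPLE__)
  // On OS X, setrlimit() fails with EINVAL if rlim_cur exceeds OPEN_MAX,
  // even when rlim_max is RLIM_INFINITY.
  target = std::min<rlim_t>(target, OPEN_MAX);
#endif
  l.rlim_cur = target;
  PCHECK(setrlimit(RLIMIT_NOFILE, &l) == 0);
}
{noformat}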



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1796) ToolTest.TestLocalReplicaTombstoneDelete fails sporadically

2016-12-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1796:
--
Description: 
The flaky test dashboard shows the following assertion failure quite frequently:
{noformat}
/data1/jenkins-workspace/kudu-workspace/src/kudu/tools/kudu-tool-test.cc:1203: 
Failure
Value of: tombstoned_opid.index()
  Actual: 205
Expected: last_logged_opid.index()
Which is: 206
{noformat}

It is not clear whether this is a test bug or whether it is catching a genuine 
bug; for now, labeling this with the test component.


  was:
The flaky test dashboard shows the following assertion failure quite frequently:
{noformat}
/data1/jenkins-workspace/kudu-workspace/src/kudu/tools/kudu-tool-test.cc:1203: 
Failure
Value of: tombstoned_opid.index()
  Actual: 205
Expected: last_logged_opid.index()
Which is: 206
{noformat}




> ToolTest.TestLocalReplicaTombstoneDelete fails sporadically
> ---
>
> Key: KUDU-1796
> URL: https://issues.apache.org/jira/browse/KUDU-1796
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.1
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>  Labels: test
>
> The flaky test dashboard shows the following assertion failure quite frequently:
> {noformat}
> /data1/jenkins-workspace/kudu-workspace/src/kudu/tools/kudu-tool-test.cc:1203:
>  Failure
> Value of: tombstoned_opid.index()
>   Actual: 205
> Expected: last_logged_opid.index()
> Which is: 206
> {noformat}
> It is not clear whether this is a test bug or whether it is catching a genuine 
> bug; for now, labeling this with the test component.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1796) ToolTest.TestLocalReplicaTombstoneDelete fails sporadically

2016-12-07 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1796:
-

 Summary: ToolTest.TestLocalReplicaTombstoneDelete fails 
sporadically
 Key: KUDU-1796
 URL: https://issues.apache.org/jira/browse/KUDU-1796
 Project: Kudu
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.1
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat


The flaky test dashboard shows the following assertion failure quite frequently:
{noformat}
/data1/jenkins-workspace/kudu-workspace/src/kudu/tools/kudu-tool-test.cc:1203: 
Failure
Value of: tombstoned_opid.index()
  Actual: 205
Expected: last_logged_opid.index()
Which is: 206
{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-658) Investigate a checksum error that happened on the ITBLL cluster

2016-12-06 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726240#comment-15726240
 ] 

Dinesh Bhat commented on KUDU-658:
--

Saw a failure on Jenkins with my patch 
(http://104.196.14.100/job/kudu-gerrit/5247/); it looks like this issue.

> Investigate a checksum error that happened on the ITBLL cluster
> ---
>
> Key: KUDU-658
> URL: https://issues.apache.org/jira/browse/KUDU-658
> Project: Kudu
>  Issue Type: Bug
>  Components: fs
>Affects Versions: M5
>Reporter: Jean-Daniel Cryans
>
> I'm logging this to dump the information I have about the problem, which we 
> might just not be handling.
> {noformat}
> F0313 20:21:04.065847  4036 tablet_server_main.cc:39] Check failed: _s.ok() 
> Bad status: Corruption: Failed to load FS layout: /data/10/kudu-jenkins/data: 
> Incorrect checksum of file 
> /data/10/kudu-jenkins/data/cf28b919d37d4cb3acd6a8fb1b3d8870.metadata: 
> actually 1214729159, expected 0
> {noformat}
> The files are located in a2414.halxg.cloudera.com:/home/jdcryans/KUDU-658



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1769) Add a tool to garbage collect orphan data blocks

2016-11-29 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1769:
-

 Summary: Add a tool to garbage collect orphan data blocks
 Key: KUDU-1769
 URL: https://issues.apache.org/jira/browse/KUDU-1769
 Project: Kudu
  Issue Type: Task
  Components: tablet
Reporter: Dinesh Bhat


There could be cases where we end up with orphan data blocks that are not 
referenced by any tablet metadata:
- A crash during a flush/compaction
- A crash during a tablet copy, which could orphan the data blocks copied so far
- Bugs

In such situations, it is useful to have a manual tool that deletes all the 
orphan data blocks for which there is no metadata; see the sketch below.
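
The core of such a tool is a set difference between the blocks present on 
disk and the blocks referenced by tablet metadata; a self-contained sketch of 
that pass (inputs are assumed to be gathered elsewhere, block IDs modeled as 
strings):
{noformat}
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the block IDs present on disk that no tablet metadata references.
std::vector<std::string> FindOrphanBlocks(
    const std::set<std::string>& blocks_on_disk,
    const std::set<std::string>& referenced_blocks) {
  std::vector<std::string> orphans;
  std::set_difference(blocks_on_disk.begin(), blocks_on_disk.end(),
                      referenced_blocks.begin(), referenced_blocks.end(),
                      std::back_inserter(orphans));
  return orphans;
}
{noformat}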




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1741) Make MiniCluster and ExternalMiniCluster follow one semantics for Start/Shutdown/Restart

2016-11-04 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1741:
-

 Summary: Make MiniCluster and ExternalMiniCluster follow one 
semantics for Start/Shutdown/Restart
 Key: KUDU-1741
 URL: https://issues.apache.org/jira/browse/KUDU-1741
 Project: Kudu
  Issue Type: Task
Reporter: Dinesh Bhat
Priority: Trivial


Triaging this so that we can attend to it later. The MiniCluster and 
ExternalMiniCluster utility classes follow opposite semantics today, which is 
confusing. For example, ExternalMiniCluster::Restart() expects all the nodes 
to be shut down before the restart, whereas MiniCluster::Restart() expects 
all servers to be up and running so that it can internally execute shutdown 
and start, in that order. We should settle on one semantics to make this less 
confusing. Whichever approach we end up choosing, we may have to change a 
bunch of existing tests, because the tests are also written against these 
semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KUDU-1613) Under certain circumstances, tablet leader does not evict failed replica

2016-11-04 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reassigned KUDU-1613:
-

Assignee: Dinesh Bhat

> Under certain circumstances, tablet leader does not evict failed replica
> 
>
> Key: KUDU-1613
> URL: https://issues.apache.org/jira/browse/KUDU-1613
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus, tablet
>Affects Versions: 1.0.0
>Reporter: Adar Dembo
>Assignee: Dinesh Bhat
>Priority: Critical
>
> Dan found this while working on Kudu training material.
> Suppose you have a three node cluster and a table with a singleton tablet 
> (replicated three times). Now suppose you stopped one tserver, deleted all of 
> its on-disk data, then restarted it.
> You would expect the following:
> # The tablet's leader replica can no longer reach the replica on the 
> reformatted tserver.
> # The leader will evict that replica.
> # The master will notice the tablet's under-replication and ask the leader to 
> add a new replica, probably on the reformatted node.
> Instead, there's no eviction at all. The leader replica keeps spewing 
> messages like this in its log:
> {noformat}
> W0913 14:13:18.411238 22597 consensus_peers.cc:332] T 
> 89dfba0c0a714259acf69d9f611e1e92 P 1540ac6e6cb44c2c9f9c6c6c98fd61f7 -> Peer 
> cc2ef23f1c2c42b7a6a02d7183d92884 (dan-test-g-2.gce.cloudera.com:7050): 
> Couldn't send request to peer cc2ef23f1c2c42b7a6a02d7183d92884 for tablet 
> 89dfba0c0a714259acf69d9f611e1e92. Error code: WRONG_SERVER_UUID (16). Status: 
> Invalid argument: UpdateConsensus: Wrong destination UUID requested. Local 
> UUID: ef3ea81d59fc4a91b754cfe63b21e6ee. Requested UUID: 
> cc2ef23f1c2c42b7a6a02d7183d92884. Retrying in the next heartbeat period. 
> Already tried 5821 times.
> {noformat}
> Having looked at the code responsible for starting replica eviction 
> (PeerMessageQueue::RequestForPeer) and the code spewing that error 
> (Peer::ProcessResponseError), I think I see what's going on. The eviction 
> code in RequestForPeer() checks the peer's "last successful communication 
> time" to decide whether to evict or not. Intuitively you'd expect that time 
> to be updated only when the peer responds successfully, but there are a 
> couple cases in Peer::ProcessResponseError where we update the last 
> communication time anyway. Notably:
> # If the RPC controller yielded a RemoteError, or
> # If the RPC controller had no error but the response itself contained an 
> error, and the error's code was not TABLET_NOT_FOUND, or
> # If the RPC controller and the response had no error, but the response's 
> status had an error, and that error's code was CANNOT_PREPARE.
> I think we're hitting case #2, because there should be no RPC controller 
> error (the reformatted tserver did respond to the leader replica), but the 
> response does contain a WRONG_SERVER_UUID error.
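
To make case #2 concrete, a self-contained sketch of the suspected pattern 
(illustrative types, not Kudu's actual code): the error path refreshes the 
last-communication timestamp that the eviction check reads, so a peer that 
keeps responding with WRONG_SERVER_UUID never looks failed.
{noformat}
#include <chrono>

enum class ErrorCode { NONE, TABLET_NOT_FOUND, WRONG_SERVER_UUID };

struct Peer {
  // Eviction compares against this timestamp...
  std::chrono::steady_clock::time_point last_successful_comm_time;

  void ProcessResponseError(ErrorCode code) {
    // ...but this error path refreshes it for every response error other
    // than TABLET_NOT_FOUND, so the peer is never considered failed.
    if (code != ErrorCode::TABLET_NOT_FOUND) {
      last_successful_comm_time = std::chrono::steady_clock::now();
    }
  }
};
{noformat}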



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1740) Split the kudu-tool-test into multiple binaries

2016-11-03 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1740:
-

 Summary: Split the kudu-tool-test into multiple binaries
 Key: KUDU-1740
 URL: https://issues.apache.org/jira/browse/KUDU-1740
 Project: Kudu
  Issue Type: Test
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat
Priority: Trivial


Per Mike's comment in the review linked below, it would be good to split the 
existing kudu-tool-test into multiple files so that we can run some of these 
tests in parallel; as the suite keeps growing, this also reduces the overall 
test run time. We can leverage this cleanup to migrate all the tool-related 
tests we currently have under kudu-admin-test.
https://gerrit.cloudera.org/#/c/4834/6/src/kudu/tools/kudu-tool-test.cc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Comment: was deleted

(was: commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec
)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Comment: was deleted

(was: commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec
)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Comment: was deleted

(was: commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec
)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reopened KUDU-1566:
---

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621472#comment-15621472
 ] 

Dinesh Bhat commented on KUDU-1566:
---

commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec


> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat resolved KUDU-1566.
---
Resolution: Duplicate

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Fix Version/s: 1.1.0

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621453#comment-15621453
 ] 

Dinesh Bhat commented on KUDU-1566:
---

commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec


> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621451#comment-15621451
 ] 

Dinesh Bhat commented on KUDU-1566:
---

commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec


> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621447#comment-15621447
 ] 

Dinesh Bhat commented on KUDU-1566:
---

commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec


> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621446#comment-15621446
 ] 

Dinesh Bhat commented on KUDU-1566:
---

commit c105fefde20798e9f3671419ca9638a0affea1e5
Author: Dinesh Bhat 
Date:   Tue Oct 25 23:49:29 2016 -0700

[scripts] KUDU-1566: Update jira fields automatically

This script is intended to be merged with our existing tools - like
pre-push or post-commit, for all commits which start with a JIRA id
in their COMMIT_MSG. This helps to track the review patches to JIRA
domain and vice versa thereby making it easy to navigate between the
two worlds. Currently it comments on the JIRA issue by parsing the
commit log for the presence of 'KUDU-[1-9]+' string, and it also
updates a review link against 'Code Review' field in jira.

Also, it can be leveraged to automatically resolve a JIRA if optional
'--resolve=True' is provided.

Change-Id: I4519ee0b83f9af03ba55f0eacc0553e86a3f13ec


> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-31 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Code Review: https://gerrit.cloudera.org/#/c/4852/  (was: 
https://example_gerrit_link)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Fix Version/s: (was: 1.1.0)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reopened KUDU-1566:
---

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Resolution: Fixed
Status: Resolved  (was: In Review)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Status: In Review  (was: In Progress)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
   Resolution: Fixed
Fix Version/s: 1.1.0
   Status: Resolved  (was: In Review)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reopened KUDU-1566:
---

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
> Fix For: 1.1.0
>
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-30 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Status: In Review  (was: In Progress)

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could apply 
> to whichever gerrit patch carries one in its commit message. I have come 
> across such scripts before, so spinning one up shouldn't be that hard. 
> Though not as fancy as the plugin linked below, we could just add a gerrit 
> link to the JIRA comment section whenever a change is submitted (or perhaps 
> posted for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-10-25 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1566:
--
Code Review: https://example_gerrit_link

> JIRA updater script linking a gerrit patch to JIRA automatically
> 
>
> Key: KUDU-1566
> URL: https://issues.apache.org/jira/browse/KUDU-1566
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>Priority: Minor
>
> At times, I have found it hard to track a particular JIRA to a gerrit patch 
> and vice versa to gain more context on a submitted change, code review 
> discussions, etc. I am hoping this will bridge the gap between the review 
> system and JIRA tracking.
> Currently, not all of our commits carry JIRA numbers, but this could be 
> applicable to whichever gerrit patch carries one in its commit message. I 
> have come across such scripts before, so spinning one up shouldn't be that 
> hard. Though not as fancy as the below link, we could just add a gerrit link 
> to the JIRA comment section whenever a change is submitted (or perhaps posted 
> for review).
> https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1618) Add local_replica tool to delete a replica

2016-10-25 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1618:
--
Code Review: https://gerrit.cloudera.org/#/c/4834/

> Add local_replica tool to delete a replica
> --
>
> Key: KUDU-1618
> URL: https://issues.apache.org/jira/browse/KUDU-1618
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>
> Occasionally we've hit cases where a tablet is corrupt in such a way that the 
> tserver fails to start or crashes soon after starting. Typically we'd prefer 
> the tablet just get marked FAILED but in the worst case it causes the whole 
> tserver to fail.
> For these cases we should add a 'local_replica' subtool to fully remove a 
> local tablet. Related, it might be useful to have a 'local_replica archive' 
> which would create a tarball from the data in this tablet for later 
> examination by developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1483) in some cases, followers cannot promote to leader.

2016-10-25 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606404#comment-15606404
 ] 

Dinesh Bhat commented on KUDU-1483:
---

'/bin/kudu tablet change_config|leader_step_down' has some variant of that 
functionality, but I must also say that I have used these tools in slightly 
different scenarios than the one indicated above, and I am not sure if they help 
at all. If possible, we could simulate some of these errors from tests and see 
whether recovery is possible via these tools.
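For reference, the invocations would look roughly like this (the master address, 
tablet id, and peer uuid below are placeholders, not values from this issue):
{noformat}
# Ask the current leader replica of a tablet to step down.
kudu tablet leader_step_down master-1:7051 <tablet_id>

# Remove a (possibly failed) peer from the tablet's Raft config.
kudu tablet change_config remove_replica master-1:7051 <tablet_id> <peer_uuid>
{noformat}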

> in some cases, followers cannot promote to leader.
> --
>
> Key: KUDU-1483
> URL: https://issues.apache.org/jira/browse/KUDU-1483
> Project: Kudu
>  Issue Type: Bug
>Reporter: zhangsong
>
> In my env, a tablet only has two followers on the master's webui, and that 
> situation lasts forever.
> Some logs about the tablet from the two followers' logs:
> follower1:
>  I0613 11:16:33.244365 26846 leader_election.cc:223] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e 
> [CANDIDATE]: Term 31717 election: Requesting vote from
>  peer 8cf59ddd6d154ae99d3b23da840169e0W0613 11:16:33.247150 26016 
> leader_election.cc:281] T 87588b06c65d4898a5b8c29d08b3528d P 
> eded59517b14432ab9022cd50d160b8e [CANDIDATE]: Term 31717 election: Tablet 
> error from VoteRequest() call to peer 8cf59ddd6d154ae99d3b23da840169e0: 
> Illegal state: Tablet not RUN
> NING: FAILED: Not found: Can't find block: 1363326557009763249I0613 
> 11:16:33.247463 26016 leader_election.cc:248] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e 
> [CANDIDATE]: Term 31717 election: Election decided. Re
> sult: candidate lost.I0613 11:16:33.248205 17534 raft_consensus.cc:1942] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Snoozing failure detection for election timeout plus an 
> additional 15.536s
> I0613 11:16:33.248245 17534 raft_consensus.cc:1795] T 
> 87588b06c65d4898a5b8c29d08b3528d P
>  eded59517b14432ab9022cd50d160b8e [term 31717 FOLLOWER]: Leader election lost 
> for term 3
> 1717. Reason: None given
> sult: candidate lost.I0613 11:16:33.248205 17534 raft_consensus.cc:1942] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Snoozing failure detection for election timeout plus an 
> additional 15.536sI0613 11:16:33.248245 17534 raft_consensus.cc:1795] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election lost for term 31717. Reason: None given
> I0613 11:16:34.288436 26137 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:34.288633 26137 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31666. Current term is 
> 31717.
> I0613 11:16:41.506261 26127 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:41.506325 26127 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31667. Current term is 
> 31717.
> I0613 11:16:45.440551 26135 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:45.440625 26135 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31668. Current term is 
> 31717.
> It seems that there are three followers/voters and one of them has a tablet in 
> "not running" state.
> On the other follower:
> W0613 11:16:45.437863 18782 leader_election.cc:281] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 
> [CANDIDATE]: Term 31668 election: Tablet error from VoteRequest() call to 
> peer 8cf59ddd6d154ae99d3b23da840169e0: Illegal state: Tablet not RUNNING: 
> FAILED: Not found: Can't find block: 1363326557009763249
> W0613 11:16:45.438611 18782 leader_election.cc:333] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 
> [CANDIDATE]: Term 31668 election: Vote denied by peer 
> eded59517b14432ab9022cd50d160b8e 

[jira] [Commented] (KUDU-1618) Add local_replica tool to delete a replica

2016-10-25 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606212#comment-15606212
 ] 

Dinesh Bhat commented on KUDU-1618:
---

Thanks [~tlipcon], agreed with your points above that this is not a bug. I 
confirmed that ksck as of now doesn't know about this spurious replica 
resulting from the tool's action. I wonder if it's even possible to surface this 
info via ksck, because I guess these reports are not sent to the master?
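For reference, the check I ran was the cluster-wide ksck, roughly as follows 
(the master address is a placeholder):
{noformat}
# Cluster-wide consistency check; the spurious replica did not show up here.
kudu cluster ksck master-1:7051
{noformat}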

> Add local_replica tool to delete a replica
> --
>
> Key: KUDU-1618
> URL: https://issues.apache.org/jira/browse/KUDU-1618
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>
> Occasionally we've hit cases where a tablet is corrupt in such a way that the 
> tserver fails to start or crashes soon after starting. Typically we'd prefer 
> the tablet just get marked FAILED but in the worst case it causes the whole 
> tserver to fail.
> For these cases we should add a 'local_replica' subtool to fully remove a 
> local tablet. Related, it might be useful to have a 'local_replica archive' 
> which would create a tarball from the data in this tablet for later 
> examination by developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1705) [Python] - PIP Install is not installing Cython dependency automatically

2016-10-24 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603033#comment-15603033
 ] 

Dinesh Bhat commented on KUDU-1705:
---

Not sure if this is useful info for this bug, but I had to upgrade setuptools 
via pip before installing kudu-python on one of the CentOS 6.6 machines because 
of the following error:

 - yum install -y gcc-c++ python-devel python-pip
 - easy_install pip
 - pip install --upgrade setuptools
 - pip install Cython
 - pip install six
 - pip install kudu-python

{noformat}
virtualbox-iso: Using /usr/lib/python2.6/site-packages
virtualbox-iso: Processing dependencies for pip
virtualbox-iso: Finished processing dependencies for pip
virtualbox-iso: + pip install Cython
virtualbox-iso: 
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. For more information, see 
https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
virtualbox-iso: InsecurePlatformWarning
virtualbox-iso: You are using pip version 7.1.0, however version 8.1.2 is 
available.
virtualbox-iso: You should consider upgrading via the 'pip install 
--upgrade pip' command.
virtualbox-iso: Collecting Cython
virtualbox-iso: 
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. For more information, see 
https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
virtualbox-iso: InsecurePlatformWarning
virtualbox-iso: Downloading Cython-0.24.1.tar.gz (1.7MB)
virtualbox-iso: Installing collected packages: Cython
virtualbox-iso: Running setup.py install for Cython
virtualbox-iso: Successfully installed Cython-0.24.1
virtualbox-iso: + pip install six
virtualbox-iso: You are using pip version 7.1.0, however version 8.1.2 is 
available.
virtualbox-iso: You should consider upgrading via the 'pip install 
--upgrade pip' command.
virtualbox-iso: Collecting six
virtualbox-iso: 
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. For more information, see 
https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
virtualbox-iso: InsecurePlatformWarning
virtualbox-iso: Downloading six-1.10.0-py2.py3-none-any.whl
virtualbox-iso: Installing collected packages: six
virtualbox-iso: Successfully installed six-1.10.0
virtualbox-iso: + pip install kudu-python
virtualbox-iso: You are using pip version 7.1.0, however version 8.1.2 is 
available.
virtualbox-iso: You should consider upgrading via the 'pip install 
--upgrade pip' command.
virtualbox-iso: Collecting kudu-python
virtualbox-iso: 
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. For more information, see 
https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
virtualbox-iso: InsecurePlatformWarning
virtualbox-iso: Downloading kudu-python-0.3.0.tar.gz (212kB)
virtualbox-iso: Complete output from command python setup.py egg_info:
virtualbox-iso: Building from system prefix /usr
virtualbox-iso: Command "python setup.py egg_info" failed with error code 1 
in /tmp/pip-build-JlF4dw/kudu-python
virtualbox-iso:
virtualbox-iso: Installed 
/tmp/easy_install-gkEbZX/pytest-runner-2.9/setuptools_scm-1.15.0-py2.6.egg
virtualbox-iso: your setuptools is too old (<12)
virtualbox-iso: setuptools_scm functionality is degraded
virtualbox-iso: zip_safe flag not set; analyzing archive contents...
virtualbox-iso:
virtualbox-iso: Installed 
/tmp/pip-build-JlF4dw/kudu-python/pytest_runner-2.9-py2.6.egg
virtualbox-iso: running egg_info
virtualbox-iso: creating pip-egg-info/kudu_python.egg-info
virtualbox-iso: writing requirements to 
pip-egg-info/kudu_python.egg-info/requires.txt
virtualbox-iso: writing pip-egg-info/kudu_python.egg-info/PKG-INFO
virtualbox-iso: writing top-level names to 
pip-egg-info/kudu_python.egg-info/top_level.txt
virtualbox-iso: writing dependency_links to 
pip-egg-info/kudu_python.egg-info/dependency_links.txt
virtualbox-iso: writing manifest file 

[jira] [Commented] (KUDU-1718) Leader is unable to evict a FAILED replica

2016-10-21 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595908#comment-15595908
 ] 

Dinesh Bhat commented on KUDU-1718:
---

[~mpercy] could this be a manifestation of KUDU-1613? We could keep this as 
'related' if we think this issue is not WAL-specific.

> Leader is unable to evict a FAILED replica
> --
>
> Key: KUDU-1718
> URL: https://issues.apache.org/jira/browse/KUDU-1718
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus
>Affects Versions: 1.0.0
>Reporter: Mike Percy
>
> Investigate reported issue where a tablet comes up as FAILED for some reason 
> (maybe WAL corruption) and the leader cannot evict it / delete it from the 
> Raft config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1573) In corner cases, tablet could not recover successfully from node crash.

2016-10-19 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590278#comment-15590278
 ] 

Dinesh Bhat commented on KUDU-1573:
---

[~bruceSz] we pre-allocate the log segments for performance reasons, which 
means fallocate() would have zeroed them out.
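As a generic illustration of that behavior (not Kudu's actual WAL code; the file 
name and size below are arbitrary):
{noformat}
# Preallocate a file the way a WAL segment is preallocated; the
# allocated-but-unwritten region reads back as zero bytes.
fallocate -l 8M wal-sample
hexdump -C wal-sample | head -3
{noformat}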

> In corner cases, tablet could not recover successfully from node crash.
> ---
>
> Key: KUDU-1573
> URL: https://issues.apache.org/jira/browse/KUDU-1573
> Project: Kudu
>  Issue Type: Bug
>Reporter: zhangsong
> Attachments: wal_recovery_dir.zip
>
>
> Last Friday, one of the nodes of my Kudu cluster crashed, and a tablet could 
> not recover successfully after restarting kudu-tserver; I observed error 
> messages in the log:
> (TABLET_DATA_READY): Corruption: Could not open LogReader. Reason: Unable to 
> initialize log reader: Segment sequence numbers are not consecutive. Previous 
> segment: seqno 0, path 
> /export/servers/kudu/tserver_wal_data_7052/wals/ed0d8b3a835e4c27afe695252ad0b8f5.recovery/wal-00018;
>  Current segment: seqno 17, path 
> /export/servers/kudu/tserver_wal_data_7052/wals/ed0d8b3a835e4c27afe695252ad0b8f5.recovery/wal-00017



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1618) Add local_replica tool to delete a replica

2016-10-14 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576389#comment-15576389
 ] 

Dinesh Bhat commented on KUDU-1618:
---

[~tlipcon] thanks for the quick reply. By 'shouldn't have a replica' in the 
above comment, you meant that the current tablet server, where we are trying to 
bring up the replica, is no longer part of the Raft config for that tablet, 
right? It has other tservers as replicas at this point. That makes sense. I 
believe the tserver keeps retrying until some future change_config brings this 
tserver back in as a replica for that tablet.
One follow-up question: what state should the replica be in after step 6? I see 
it in RUNNING state, which was slightly confusing, because this replica isn't 
an active replica at this point.
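For context, the copy in question (step 5 of my repro steps) was done with the 
local_replica tool, along these lines (the tablet id, source address, and FS 
directories are placeholders):
{noformat}
# Copy a tablet replica from a remote tserver onto the local FS layout.
kudu local_replica copy_from_remote <tablet_id> <src_tserver_host:port> \
  --fs_wal_dir=/data/kudu/tserver/wal \
  --fs_data_dirs=/data/kudu/tserver/data
{noformat}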

> Add local_replica tool to delete a replica
> --
>
> Key: KUDU-1618
> URL: https://issues.apache.org/jira/browse/KUDU-1618
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>
> Occasionally we've hit cases where a tablet is corrupt in such a way that the 
> tserver fails to start or crashes soon after starting. Typically we'd prefer 
> the tablet just get marked FAILED but in the worst case it causes the whole 
> tserver to fail.
> For these cases we should add a 'local_replica' subtool to fully remove a 
> local tablet. Related, it might be useful to have a 'local_replica archive' 
> which would create a tarball from the data in this tablet for later 
> examination by developers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KUDU-1618) Add local_replica tool to delete a replica

2016-10-14 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576176#comment-15576176
 ] 

Dinesh Bhat edited comment on KUDU-1618 at 10/14/16 7:29 PM:
-

I was trying to repro an issue where I was not able to do a remote tablet copy 
onto a local replica if the tablet was DELETE_TOMBSTONED (but had its metadata 
file present). However, along with the issue reproduction, I saw one state of 
the replica which was confusing. Here are the steps I executed:
1. Bring up a cluster with 1 master and 3 tablet servers hosting 3 tablets, 
each tablet with 3 replicas.
2. There was a standby tserver which was added later.
3. KILL one tserver; after 5 mins, all replicas on that tserver fail over to the 
new standby with a change_config.
{noformat}
I1013 16:31:48.183486 26604 raft_consensus_state.cc:533] T 
048c7d202da3469eb1b1973df9510007 P b11d2af1457b4542808407b4d4d1bd29 [term 5 
FOLLOWER]: Committing config change with OpId 5.5: config changed from index 4 
to 5, VOTER 19acc272821d425582d3dfb9ed2ab7cd (127.61.33.8) added. New config: { 
opid_index: 5 OBSOLETE_local: false peers { permanent_uuid: 
"9acfc108d9b446c1be783b6d6e7b49ef" member_type: VOTER last_known_addr { host: 
"127.95.58.0" port: 33932 } } peers { permanent_uuid: 
"b11d2af1457b4542808407b4d4d1bd29" member_type: VOTER last_known_addr { host: 
"127.95.58.2" port: 40670 } } peers { permanent_uuid: 
"19acc272821d425582d3dfb9ed2ab7cd" member_type: VOTER last_known_addr { host: 
"127.61.33.8" port: 63532 } } }
I1013 16:31:48.184077 26143 catalog_manager.cc:2800] AddServer ChangeConfig RPC 
for tablet 048c7d202da3469eb1b1973df9510007 on TS 
9acfc108d9b446c1be783b6d6e7b49ef (127.95.58.0:33932) with cas_config_opid_index 
4: Change config succeeded
{noformat}
4. Use 'local_replica copy_from_remote' to copy one tablet replica before 
bringing the tserver back up; the command fails:
{noformat}
I1013 16:43:41.523896 30948 tablet_copy_service.cc:124] Beginning new tablet 
copy session on tablet 048c7d202da3469eb1b1973df9510007 from peer 
bb2517bc5f2b4980bb07c06019b5a8e9 at {real_user=dinesh, eff_user=} at 
127.61.33.8:40240: session id = 
bb2517bc5f2b4980bb07c06019b5a8e9-048c7d202da3469eb1b1973df9510007
I1013 16:43:41.524291 30948 tablet_copy_session.cc:142] T 
048c7d202da3469eb1b1973df9510007 P 19acc272821d425582d3dfb9ed2ab7cd: Tablet 
Copy: opened 0 blocks and 1 log segments
Already present: Tablet already exists: 048c7d202da3469eb1b1973df9510007
{noformat}
5. Remove the metadata file and WAL log for that tablet, and the 
copy_from_remote succeeds at this point (expected).
6. Bring up the killed tserver; now all replicas on it are tombstoned except the 
one tablet for which we did a copy_from_remote in step 5. The master, which was 
incessantly trying to TOMBSTONE the evicted replicas on the tserver that was 
down earlier, throws some interesting log:
{noformat}
[dinesh@ve0518 debug]$ I1013 16:55:54.551717 26141 catalog_manager.cc:2591] 
Sending DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
048c7d202da3469eb1b1973df9510007 on bb2517bc5f2b4980bb07c06019b5a8e9 
(127.95.58.1:40867) (TS bb2517bc5f2b4980bb07c06019b5a8e9 not found in new 
config with opid_index 4)
W1013 16:55:54.552803 26141 catalog_manager.cc:2552] TS 
bb2517bc5f2b4980bb07c06019b5a8e9 (127.95.58.1:40867): delete failed for tablet 
048c7d202da3469eb1b1973df9510007 due to a CAS failure. No further retry: 
Illegal state: Request specified cas_config_opid_index_less_or_equal of -1 but 
the committed config has opid_index of 5
I1013 16:55:54.884133 26141 catalog_manager.cc:2591] Sending 
DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
e9481b695d34483488af07dfb94a8557 on bb2517bc5f2b4980bb07c06019b5a8e9 
(127.95.58.1:40867) (TS bb2517bc5f2b4980bb07c06019b5a8e9 not found in new 
config with opid_index 3)
I1013 16:55:54.885964 26141 catalog_manager.cc:2567] TS 
bb2517bc5f2b4980bb07c06019b5a8e9 (127.95.58.1:40867): tablet 
e9481b695d34483488af07dfb94a8557 (table test-table 
[id=ca8f507e47684ddfa147e2cd232ed773]) successfully deleted
I1013 16:55:54.915202 26141 catalog_manager.cc:2591] Sending 
DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
e3ff6a1529cf46c5b9787fe322a749e6 on bb2517bc5f2b4980bb07c06019b5a8e9 
(127.95.58.1:40867) (TS bb2517bc5f2b4980bb07c06019b5a8e9 not found in new 
config with opid_index 3)
I1013 16:55:54.916774 26141 catalog_manager.cc:2567] TS 
bb2517bc5f2b4980bb07c06019b5a8e9 (127.95.58.1:40867): tablet 
e3ff6a1529cf46c5b9787fe322a749e6 (table test-table 
[id=ca8f507e47684ddfa147e2cd232ed773]) successfully deleted
{noformat}
7. It continuously spews log messages like this now:
{noformat}
[dinesh@ve0518 debug]$ W1013 16:55:36.608486  6519 raft_consensus.cc:461] T 
048c7d202da3469eb1b1973df9510007 P bb2517bc5f2b4980bb07c06019b5a8e9 [term 5 
NON_PARTICIPANT]: Failed to trigger leader election: Illegal state: Not 
starting election: Node is currently a non-participant in the raft config: 
opid_index: 5 

[jira] [Updated] (KUDU-1699) Make the tests less failure-prone for 'Timed out waiting for Table Creation'

2016-10-13 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1699:
--
Attachment: delete_table-test.txt
create-table-itest.txt

643ab5a4a0500d3bd819ebb7d07e431da2bea7ff_161.0.delete_table_test_2.zip

a2d6375b7fe1fa1e483dce49157a53dde714d372_40.0.create_table_itest_0.zip

> Make the tests less failure-prone for 'Timed out waiting for Table Creation'
> 
>
> Key: KUDU-1699
> URL: https://issues.apache.org/jira/browse/KUDU-1699
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
> Attachments: 
> 643ab5a4a0500d3bd819ebb7d07e431da2bea7ff_161.0.delete_table_test_2.zip, 
> a2d6375b7fe1fa1e483dce49157a53dde714d372_40.0.create_table_itest_0.zip, 
> create-table-itest.txt, delete_table-test.txt
>
>
> Filing for tracking purposes: triaging one issue (which I believe is not 
> related to network flakiness) I found while testing the 1.0.1 RC release 
> bits. I noticed that the failure pattern for some of these tests is quite 
> common. Here are the sequences of operations from the logs just before failure:
>  - ExternalMiniClusterITestBase::StartCluster() succeeds in creating the tablet 
> servers to hold tablet replicas
>  - TestTable is created either directly from the test or via 
> TestWorkload.Setup(), so tablet replicas are spread across the tablet servers.
>  - Meanwhile the tablet replicas haven't elected a leader successfully (only 
> terms are advanced for about 30 secs), and eventually table creation fails.
> It is not clear to me if this is a bug; I need to dig into this a little more 
> than just the logs. If this is not a bug, I wonder if we have some room to make 
> this less failure-prone. Attached are 2 logs I have from the test run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1298) Make pip install of Kudu Python client work with C++11 on OS X

2016-10-13 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572733#comment-15572733
 ] 

Dinesh Bhat commented on KUDU-1298:
---

Tested on OS X 10.11.5.

> Make pip install of Kudu Python client work with C++11 on OS X
> --
>
> Key: KUDU-1298
> URL: https://issues.apache.org/jira/browse/KUDU-1298
> Project: Kudu
>  Issue Type: Bug
>  Components: python
>Reporter: Jeff Hammerbacher
>Assignee: David Alves
> Fix For: NA
>
>
> Pip install of kudu-python on Homebrew's Python 3.5.1 and the most recent 
> Xcode is failing for me: 
> https://getkudu.slack.com/archives/kudu-general/p1453163444003189.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1674) SEGV in SubProcess::Call if it tries to capture stderr alone

2016-10-02 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1674:
-

 Summary: SEGV in SubProcess::Call if it tries to capture stderr 
alone
 Key: KUDU-1674
 URL: https://issues.apache.org/jira/browse/KUDU-1674
 Project: Kudu
  Issue Type: Bug
  Components: util
Affects Versions: 1.0.0
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat
 Fix For: 1.0.1


One of the Jenkins runs captured this when we called
SubProcess::Call(argv[0], nullptr, );

{noformat}

WARNING: ThreadSanitizer: heap-use-after-free (pid=28482)
  Read of size 8 at 0x7d085e38 by main thread:
#0 memcpy 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:598
 (kudu-admin-test+0x0046850b)

#1 std::__1::basic_string::__move_assign(std::__1::basic_string&, 
std::__1::integral_constant) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/installed/tsan/include/c++/v1/string:2525:18
 (libkudu_util.so+0x0016ca51)

#2 std::__1::basic_string::operator=(std::__1::basic_string&&) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/installed/tsan/include/c++/v1/string:2536
 (libkudu_util.so+0x0016ca51)

#3 kudu::Subprocess::Call(std::__1::vector, 
std::__1::allocator > > const&, std::__1::basic_string*, 
std::__1::basic_string*) 
/home/jenkins-slave/workspace/kudu-2/src/kudu/util/subprocess.cc:503 
(libkudu_util.so+0x0016ca51)

#4 kudu::tools::AdminCliTest_TestLeaderStepDown_Test::TestBody() 
/home/jenkins-slave/workspace/kudu-2/src/kudu/tools/kudu-admin-test.cc:199:14 
(kudu-admin-test+0x004cc43b)

#5 void 
testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2078:10
 (libgmock.so+0x00048243)
#6 void 
testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2114
 (libgmock.so+0x00048243)
#7 testing::Test::Run() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2150:5
 (libgmock.so+0x0002ce6f)
#8 testing::TestInfo::Run() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2326:11
 (libgmock.so+0x0002dea7)
#9 testing::TestCase::Run() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2444:28
 (libgmock.so+0x0002eaf8)
#10 testing::internal::UnitTestImpl::RunAllTests() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:4315:43
 (libgmock.so+0x00038f51)
#11 bool 
testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool 
(testing::internal::UnitTestImpl::*)(), char const*) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2078:10
 (libgmock.so+0x00048df3)
#12 bool 
testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool 
(testing::internal::UnitTestImpl::*)(), char const*) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:2114
 (libgmock.so+0x00048df3)
#13 testing::UnitTest::Run() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/gmock-1.7.0/gtest/src/gtest.cc:3926:10
 (libgmock.so+0x00038988)
#14 RUN_ALL_TESTS() 
/home/jenkins-slave/workspace/kudu-2/thirdparty/installed/tsan/include/gtest/gtest.h:2288:46
 (libkudu_test_main.so+0x2acb)
#15 main 
/home/jenkins-slave/workspace/kudu-2/src/kudu/util/test_main.cc:75:13 
(libkudu_test_main.so+0x255a)

  Previous write of size 8 at 0x7d085e38 by main thread:
#0 operator delete[](void*) 
/home/jenkins-slave/workspace/kudu-2/thirdparty/src/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cc:79
 (kudu-admin-test+0x004c7a41)
#1 void 
google::protobuf::internal::RepeatedPtrFieldBase::Destroy()
 
/home/jenkins-slave/workspace/kudu-2/thirdparty/installed/tsan/include/google/protobuf/repeated_field.h:871:3
 (libksck.so+0x0003fa53)
#2 

[jira] [Updated] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1596:
--
Description: 
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- On way of doing it in Black Box mode:
#version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=
# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
# Run test ./bin/run-version-itest

Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.

  was:
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

#version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.


> Automate upgrade/downgrade RC tests
> ---
>
> Key: KUDU-1596
> URL: https://issues.apache.org/jira/browse/KUDU-1596
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> In discussion with [~mpercy] last week, we want to do the following to 
> alleviate the  tedious/manual release upgrade/downgrade tests.
> - White box test cases, basic test to begin with.
> 1.  Create a table via external minicluster, say with 1.0 binaries
> 2. Load data and shutdown the cluster
> 3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
> 4. Restart the cluster with same parameters, key thing is to stash the 
> RPC/webutil addresses so that we can restart the respective servers on same 
> ports again.
> 5. Add both negative and positive tests. Also need to see if we can induce an 
> incompatible change by some means and test for negative cases.
> - On way of doing it in Black Box mode:
> #version_test.py --release 0.9.1 --dir= --release 0.10.0 
> --dir=
> # Download 0.9.1
> # Unpack/Build 0.9.1
> # Run test ./bin/run-version-itest 
> # Download 0.10.0
> # Unpack/Build 0.10.0
> # Run test ./bin/run-version-itest
> Need to think how to pass around RPC/web endpoints, and how to introduce 
> negative tests for incompatible test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1596:
--
Description: 
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- One way of doing it in Black Box mode:
#version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=
# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
# Run test ./bin/run-version-itest

Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.

  was:
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- On way of doing it in Black Box mode:
#version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=
# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
# Run test ./bin/run-version-itest

Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.


> Automate upgrade/downgrade RC tests
> ---
>
> Key: KUDU-1596
> URL: https://issues.apache.org/jira/browse/KUDU-1596
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> In discussion with [~mpercy] last week, we want to do the following to 
> alleviate the  tedious/manual release upgrade/downgrade tests.
> - White box test cases, basic test to begin with.
> 1.  Create a table via external minicluster, say with 1.0 binaries
> 2. Load data and shutdown the cluster
> 3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
> 4. Restart the cluster with same parameters, key thing is to stash the 
> RPC/webutil addresses so that we can restart the respective servers on same 
> ports again.
> 5. Add both negative and positive tests. Also need to see if we can induce an 
> incompatible change by some means and test for negative cases.
> - One way of doing it in Black Box mode:
> #version_test.py --release 0.9.1 --dir= --release 0.10.0 
> --dir=
> # Download 0.9.1
> # Unpack/Build 0.9.1
> # Run test ./bin/run-version-itest 
> # Download 0.10.0
> # Unpack/Build 0.10.0
> # Run test ./bin/run-version-itest
> Need to think how to pass around RPC/web endpoints, and how to introduce 
> negative tests for incompatible test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1596:
--
Description: 
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

#version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.

  was:
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.


> Automate upgrade/downgrade RC tests
> ---
>
> Key: KUDU-1596
> URL: https://issues.apache.org/jira/browse/KUDU-1596
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> In discussion with [~mpercy] last week, we want to do the following to 
> alleviate the  tedious/manual release upgrade/downgrade tests.
> - White box test cases, basic test to begin with.
> 1.  Create a table via external minicluster, say with 1.0 binaries
> 2. Load data and shutdown the cluster
> 3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
> 4. Restart the cluster with same parameters, key thing is to stash the 
> RPC/webutil addresses so that we can restart the respective servers on same 
> ports again.
> 5. Add both negative and positive tests. Also need to see if we can induce an 
> incompatible change by some means and test for negative cases.
> - Black Box way of doing it:
> #version_test.py --release 0.9.1 --dir= --release 0.10.0 
> --dir=
> # Download 0.9.1
> # Unpack/Build 0.9.1
> # Run test ./bin/run-version-itest 
> # Download 0.10.0
> # Unpack/Build 0.10.0
> #Run test ./bin/run-version-itest
> Need to think how to pass around RPC/web endpoints, and how to introduce 
> negative tests for incompatible test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1596:
-

 Summary: Automate upgrade/downgrade RC tests
 Key: KUDU-1596
 URL: https://issues.apache.org/jira/browse/KUDU-1596
 Project: Kudu
  Issue Type: Task
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat


In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

Black Box way of doing it:

./version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.
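A rough sketch of what the black-box driver could look like (the download 
location, archive names, and build step below are placeholders, not decided 
yet):
{noformat}
#!/bin/bash -e
# For each release under test: download, unpack/build, then run the version itest.
MIRROR="https://example.org/kudu-releases"      # placeholder download location
for rel in 0.9.1 0.10.0; do
  curl -LO "${MIRROR}/kudu-${rel}.tar.gz"       # Download
  tar xzf "kudu-${rel}.tar.gz"                  # Unpack (build step goes here)
  (cd "kudu-${rel}" && ./bin/run-version-itest) # Run test
done
{noformat}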



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1596:
--
Description: 
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.

  was:
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

./version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.


> Automate upgrade/downgrade RC tests
> ---
>
> Key: KUDU-1596
> URL: https://issues.apache.org/jira/browse/KUDU-1596
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> In discussion with [~mpercy] last week, we want to do the following to 
> alleviate the  tedious/manual release upgrade/downgrade tests.
> - White box test cases, basic test to begin with.
> 1.  Create a table via external minicluster, say with 1.0 binaries
> 2. Load data and shutdown the cluster
> 3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
> 4. Restart the cluster with same parameters, key thing is to stash the 
> RPC/webutil addresses so that we can restart the respective servers on same 
> ports again.
> 5. Add both negative and positive tests. Also need to see if we can induce an 
> incompatible change by some means and test for negative cases.
> - Black Box way of doing it:
> version_test.py --release 0.9.1 --dir= --release 0.10.0 
> --dir=
> # Download 0.9.1
> # Unpack/Build 0.9.1
> # Run test ./bin/run-version-itest 
> # Download 0.10.0
> # Unpack/Build 0.10.0
> #Run test ./bin/run-version-itest
> Need to think how to pass around RPC/web endpoints, and how to introduce 
> negative tests for incompatible test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1596) Automate upgrade/downgrade RC tests

2016-09-07 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1596:
--
Description: 
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

- Black Box way of doing it:

./version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.

  was:
In discussion with [~mpercy] last week, we want to do the following to 
alleviate the  tedious/manual release upgrade/downgrade tests.

- White box test cases, basic test to begin with.
1.  Create a table via external minicluster, say with 1.0 binaries
2. Load data and shutdown the cluster
3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
4. Restart the cluster with same parameters, key thing is to stash the 
RPC/webutil addresses so that we can restart the respective servers on same 
ports again.
5. Add both negative and positive tests. Also need to see if we can induce an 
incompatible change by some means and test for negative cases.

Black Box way of doing it:

./version_test.py --release 0.9.1 --dir= --release 0.10.0 
--dir=

# Download 0.9.1
# Unpack/Build 0.9.1
# Run test ./bin/run-version-itest 
# Download 0.10.0
# Unpack/Build 0.10.0
#Run test ./bin/run-version-itest
Need to think how to pass around RPC/web endpoints, and how to introduce 
negative tests for incompatible test.


> Automate upgrade/downgrade RC tests
> ---
>
> Key: KUDU-1596
> URL: https://issues.apache.org/jira/browse/KUDU-1596
> Project: Kudu
>  Issue Type: Task
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
>
> In discussion with [~mpercy] last week, we want to do the following to 
> alleviate the  tedious/manual release upgrade/downgrade tests.
> - White box test cases, basic test to begin with.
> 1.  Create a table via external minicluster, say with 1.0 binaries
> 2. Load data and shutdown the cluster
> 3. SetBinarypath to new binaries(pre-existing location) carrying 1.1 version
> 4. Restart the cluster with same parameters, key thing is to stash the 
> RPC/webutil addresses so that we can restart the respective servers on same 
> ports again.
> 5. Add both negative and positive tests. Also need to see if we can induce an 
> incompatible change by some means and test for negative cases.
> - Black Box way of doing it:
> ./version_test.py --release 0.9.1 --dir= --release 0.10.0 
> --dir=
> # Download 0.9.1
> # Unpack/Build 0.9.1
> # Run test ./bin/run-version-itest 
> # Download 0.10.0
> # Unpack/Build 0.10.0
> #Run test ./bin/run-version-itest
> Need to think how to pass around RPC/web endpoints, and how to introduce 
> negative tests for incompatible test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KUDU-1534) expose software version in ListMaster RPC response

2016-08-25 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438290#comment-15438290
 ] 

Dinesh Bhat edited comment on KUDU-1534 at 8/26/16 2:24 AM:


Cool, thanks. Also clarifying: when you say 'trunk', you mean top-of-the-tree, 
right? We currently have '0.10.0-SNAPSHOT' in trunk, so when you said 0.10.0, 
you probably meant our RC release bits? I guess there shouldn't be any 
difference between 0.9.1 and 0.10.0 as far as this test is 
concerned (additionally, 0.9.1 has the tserver software version missing too).


was (Author: dineshabbi):
Cool, thanks. also clarifying: when you say 'trunk', you mean top-of-the-tree 
right ? We currently have '0.10.0-SNAPSHOT' in version_defines.h so when you 
said 0.10.0, you probably meant our RC release bits ? 

> expose software version in ListMaster RPC response
> --
>
> Key: KUDU-1534
> URL: https://issues.apache.org/jira/browse/KUDU-1534
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Dan Burkert
>Assignee: Dinesh Bhat
>Priority: Minor
>  Labels: newbie
> Attachments: cluster-downgrade.log, cluster-upgrade.log
>
>
> KUDU-1490 exposed the software version of tablet servers in the 
> GetTabletServers RPC response, but an equivalent doesn't exist for 
> ListMasters response.  This will become more important as multi-master setups 
> get more common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KUDU-1534) expose software version in ListMaster RPC response

2016-08-25 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438290#comment-15438290
 ] 

Dinesh Bhat edited comment on KUDU-1534 at 8/26/16 1:51 AM:


Cool, thanks. also clarifying: when you say 'trunk', you mean top-of-the-tree 
right ? We currently have '0.10.0-SNAPSHOT' in version_defines.h so when you 
said 0.10.0, you probably meant our RC release bits ? 


was (Author: dineshabbi):
Also clarifying: when you say 'trunk', you mean top-of-the-tree right ? We 
currently have '0.10.0-SNAPSHOT' in version_defines.h so when you said 0.10.0, 
you probably meant our RC release bits ? 

> expose software version in ListMaster RPC response
> --
>
> Key: KUDU-1534
> URL: https://issues.apache.org/jira/browse/KUDU-1534
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Dan Burkert
>Assignee: Dinesh Bhat
>Priority: Minor
>  Labels: newbie
> Attachments: cluster-downgrade.log, cluster-upgrade.log
>
>
> KUDU-1490 exposed the software version of tablet servers in the 
> GetTabletServers RPC response, but an equivalent doesn't exist for 
> ListMasters response.  This will become more important as multi-master setups 
> get more common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1534) expose software version in ListMaster RPC response

2016-08-25 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438194#comment-15438194
 ] 

Dinesh Bhat commented on KUDU-1534:
---

Code reviews are posted at: https://gerrit.cloudera.org/#/c/4099/
I also performed downgrade/upgrade of clusters as per [~adar]'s suggestion in 
the review; here are the steps followed:

Start the cluster with the 0.10.0 release:
  - Picked a random test which creates a table and distributes the replicas 
evenly; the test elects the master as leader
  - Leave the FS layout files intact even after the test is done (using the 
flag leave_test_files=always)
  - Finish the test

Downgrade the cluster to the 0.9.1 release (a rough command sketch follows the 
steps):
  - Start the cluster using the same FS layout used earlier
  - Started the master first and then the tservers, and also reversed the order 
of bring-up.
  - Logs are attached as cluster-downgrade.log
  - Kill the master/tserver instances

Upgrade back to the 0.10.0 release:
  - Start the cluster using the same FS layout used earlier
  - Started the master first and then the tservers, and tried 
permutations/combinations of bring-ups in different orders.
  - Logs are in cluster-upgrade.log
  - Kill master/tservers
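For concreteness, the downgrade restart amounted to pointing the 0.9.1 binaries 
at the FS layout left behind by the 0.10.0 run, roughly like this (binary and 
FS paths below are placeholders):
{noformat}
# Restart a master and a tserver from the older release on the same FS layout.
./kudu-0.9.1/bin/kudu-master \
  --fs_wal_dir=/tmp/kudutest/master-0/wal \
  --fs_data_dirs=/tmp/kudutest/master-0/data &
./kudu-0.9.1/bin/kudu-tserver \
  --fs_wal_dir=/tmp/kudutest/ts-0/wal \
  --fs_data_dirs=/tmp/kudutest/ts-0/data &
{noformat}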



> expose software version in ListMaster RPC response
> --
>
> Key: KUDU-1534
> URL: https://issues.apache.org/jira/browse/KUDU-1534
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Dan Burkert
>Assignee: Dinesh Bhat
>Priority: Minor
>  Labels: newbie
>
> KUDU-1490 exposed the software version of tablet servers in the 
> GetTabletServers RPC response, but an equivalent doesn't exist for 
> ListMasters response.  This will become more important as multi-master setups 
> get more common.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KUDU-1574) Bring all the toolsets under new CLI framework

2016-08-23 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat resolved KUDU-1574.
---
   Resolution: Duplicate
Fix Version/s: 1.0.0

> Bring all the toolsets under new CLI framework
> --
>
> Key: KUDU-1574
> URL: https://issues.apache.org/jira/browse/KUDU-1574
> Project: Kudu
>  Issue Type: Bug
>Reporter: Dinesh Bhat
>Assignee: Dinesh Bhat
> Fix For: 1.0.0
>
>
> Adar's commit to handle the new tool is here:
> https://gerrit.cloudera.org/#/c/4013/
> Todd has a port of pb-dump in progress here:
> http://gerrit.cloudera.org:8080/4037
> Also see KUDU-619 for rest of the tool list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically

2016-08-18 Thread Dinesh Bhat (JIRA)
Dinesh Bhat created KUDU-1566:
-

 Summary: JIRA updater script linking a gerrit patch to JIRA 
automatically
 Key: KUDU-1566
 URL: https://issues.apache.org/jira/browse/KUDU-1566
 Project: Kudu
  Issue Type: Task
Reporter: Dinesh Bhat
Assignee: Dinesh Bhat
Priority: Minor


At times, I have found it hard to track a particular JIRA to a gerrit patch 
and vice versa to gain more context on a submitted change, discussions, etc. I 
am hoping this will bridge the gap between the review system and JIRA tracking.

Currently, not all of our commits carry JIRA numbers, but this could be 
applicable to whichever gerrit patch carries one in its commit message. I have 
come across such scripts before, so spinning one up shouldn't be that hard. 
Though not as fancy as the below link, we could just add a gerrit link to the 
JIRA comment section whenever a change is submitted (or perhaps posted for 
review).

https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview
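A minimal sketch of what such an updater could do once it extracts a KUDU-NNNN 
key from a gerrit change's commit message (the credentials and change number 
are placeholders; this uses JIRA's standard REST comment endpoint):
{noformat}
# Post a comment linking the gerrit change onto the matching JIRA issue.
curl -u "$JIRA_USER:$JIRA_PASS" \
  -X POST -H "Content-Type: application/json" \
  -d '{"body": "Patch posted for review: https://gerrit.cloudera.org/#/c/NNNN/"}' \
  "https://issues.apache.org/jira/rest/api/2/issue/KUDU-NNNN/comment"
{noformat}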



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KUDU-1500) TSAN race in ListTablets vs tablet metadata loading

2016-08-12 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1500:
--
Attachment: tablet_copy-itest.txt

Attached is a logfile generated from a new test added to exercise this data 
race (and its fix).

> TSAN race in ListTablets vs tablet metadata loading
> ---
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, tserver
>Affects Versions: 0.9.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>  Labels: newbie
> Attachments: tablet_copy-itest.txt
>
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=20066)
>   Write of size 8 at 0x7d4ccd90 by thread T71 (mutexes: write M4355):
> #0 std::vector std::allocator 
> >::_M_erase_at_end(kudu::PartitionSchema::HashBucketSchema*) 
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1439:26
>  (libkudu_common.so+0x00158983)
> #1 std::vector std::allocator >::clear() 
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1212:9
>  (libkudu_common.so+0x0013d721)
> #2 kudu::PartitionSchema::Clear() 
> /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:876:3 
> (libkudu_common.so+0x0012f8fc)
> #3 kudu::PartitionSchema::FromPB(kudu::PartitionSchemaPB const&, 
> kudu::Schema const&, kudu::PartitionSchema*) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:129:3 
> (libkudu_common.so+0x0012f4e6)
> #4 
> kudu::tablet::TabletMetadata::LoadFromSuperBlock(kudu::tablet::TabletSuperBlockPB
>  const&) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:298:7 
> (libtablet.so+0x003a4561)
> #5 
> kudu::tablet::TabletMetadata::ReplaceSuperBlock(kudu::tablet::TabletSuperBlockPB
>  const&) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:494:3 
> (libtablet.so+0x003a7805)
> #6 kudu::tserver::RemoteBootstrapClient::Start(std::string const&, 
> kudu::HostPort const&, scoped_refptr*) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/remote_bootstrap_client.cc:222:5
>  (libtserver.so+0x001148ca)
> #7 
> kudu::tserver::TSTabletManager::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB
>  const&, boost::optional*) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/ts_tablet_manager.cc:423:3
>  (libtserver.so+0x001aa273)
> #8 
> kudu::tserver::ConsensusServiceImpl::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB
>  const*, kudu::consensus::StartRemoteBootstrapResponsePB*, 
> kudu::rpc::RpcContext*) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/tablet_service.cc:982:14 
> (libtserver.so+0x00175d03)
> #9 
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr
>  const&)::$_8::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/1/todd/kudu/build/tsan/src/kudu/consensus/consensus.service.cc:188:7 
> (libconsensus_proto.so+0x0009e457)
> #10 std::_Function_handler google::protobuf::Message*, kudu::rpc::RpcContext*), 
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr
>  const&)::$_8>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
> const*, google::protobuf::Message*, kudu::rpc::RpcContext*) 
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2039:2
>  (libconsensus_proto.so+0x0009e17a)
> #11 std::function google::protobuf::Message*, 
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2439:14
>  (libkrpc.so+0x00177997)
> #12 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_if.cc:94:3 
> (libkrpc.so+0x0017744c)
> #13 kudu::rpc::ServicePool::RunThread() 
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_pool.cc:206:5 
> (libkrpc.so+0x0017a650)
> #14 boost::_mfi::mf0 kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const 
> /usr/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0x0017d55b)
> #15 void boost::_bi::list1 
> >::operator(), 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0 kudu::rpc::ServicePool>&, boost::_bi::list0&, int) 
> /usr/include/boost/bind/bind.hpp:246:9 (libkrpc.so+0x0017d438)
> #16 

[jira] [Commented] (KUDU-1500) TSAN race in ListTablets vs tablet metadata loading

2016-08-10 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416166#comment-15416166
 ] 

Dinesh Bhat commented on KUDU-1500:
---

So, wherever I mentioned 'schema' earlier, I meant PartitionSchema and not the 
Schema :)

> TSAN race in ListTablets vs tablet metadata loading
> ---
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, tserver
>Affects Versions: 0.9.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>  Labels: newbie
>
> {code}
> [TSAN stack trace snipped; identical to the trace quoted above]
> {code}

[jira] [Commented] (KUDU-1500) TSAN race in ListTablets vs tablet metadata loading

2016-08-10 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415934#comment-15415934
 ] 

Dinesh Bhat commented on KUDU-1500:
---

[~danburkert], I had an offline discussion with [~mpercy] yesterday; summarizing 
the details here:
With the current solution, we intend to keep the partition/schema as immutable 
objects after they are created. That means remote bootstrapping (mode: 
REPLACING_PEER) does not overwrite the partition/schema if the tablet metadata 
was already initialized (state_ == initialized) as part of the initial 
bootstrap (mode: NEW_PEER), and we never replace the contents of the 
partition/schema after that.

This looks cleaner and simpler than the earlier locking fixes, and it also 
avoids synchronization for these immutable objects. However, it comes with the 
assumption that these fields are never mucked up at run time. Essentially, we 
are ignoring the contents of the superblock given by a healthy remote peer, and 
instead keeping what we had previously, which in theory may or may not have 
been corrupted. I think we have room for improvement here, but until I get a 
better grasp on the overall workflow of the ConsensusMeta/WAL/TabletPeer/on-disk 
layout, I am wary of introducing an invasive change or regression.

Looking further, the code below in TSTabletManager::StartTabletCopy() is the 
root cause of this race: TabletPeer::meta_ holds the pointer to the old, 
tombstoned metadata object, and the new PB contents are scratched into it in 
place.

{noformat}
if (LookupTabletUnlocked(tablet_id, &old_tablet_peer)) {
  meta = old_tablet_peer->tablet_metadata();
  replacing_tablet = true;
}
{noformat}

A cleaner approach could be to keep an atomically swappable pointer to the 
TabletMetadata object inside TabletPeer: when a remote bootstrap occurs, copy 
the existing metadata onto a new object, overlay everything that arrived via 
the PB from the remote peer onto that new object, then point TabletPeer::meta_ 
at the new metadata object and delete the old one (or keep an on-disk copy for 
corruption debugging). We have enough validations along the way to check that 
what came in over the wire is healthy, so the contents of the PB are 
trustworthy. Rollback story: if we don't succeed in resurrecting the new 
metadata, we fall back to the tombstoned state, because we would swap the 
pointer only at the final stage.
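To make that concrete, here is a minimal, hypothetical C++ sketch of the 
swap-at-the-end idea. It uses std::shared_ptr with the atomic free functions 
in place of the real Kudu classes; TabletMetadataLike, TabletPeerLike, 
ReplaceMetadata, and the toy validation step are all stand-ins, not the actual 
TabletPeer/TabletMetadata API:

{code}
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for kudu::tablet::TabletMetadata; the real class and
// its fields differ, this just models "metadata with immutable-ish parts".
struct TabletMetadataLike {
  std::string partition_schema;      // stays immutable in this sketch
  int64_t last_durable_mrs_id = -1;  // overlaid from the remote superblock
};

class TabletPeerLike {
 public:
  // Read side (e.g. a ListTablets handler): grab a consistent snapshot.
  // Readers see either the old or the new object, never a half-written one.
  std::shared_ptr<const TabletMetadataLike> metadata() const {
    return std::atomic_load(&meta_);
  }

  // Tablet-copy side: build and validate a complete replacement object,
  // then publish it with a single atomic pointer swap at the very end.
  bool ReplaceMetadata(const TabletMetadataLike& from_remote_superblock) {
    // 1. Copy the existing metadata onto a fresh object.
    auto replacement =
        std::make_shared<TabletMetadataLike>(*std::atomic_load(&meta_));
    // 2. Overlay what arrived via the PB from the remote peer.
    replacement->last_durable_mrs_id =
        from_remote_superblock.last_durable_mrs_id;
    // 3. Toy validation before publishing; on failure the old (tombstoned)
    //    metadata stays visible, which is the rollback story.
    if (replacement->last_durable_mrs_id < 0) {
      return false;
    }
    // 4. Publish the new object; the old one is dropped once the last
    //    reader releases its snapshot.
    std::atomic_store(&meta_,
                      std::shared_ptr<const TabletMetadataLike>(replacement));
    return true;
  }

 private:
  std::shared_ptr<const TabletMetadataLike> meta_ =
      std::make_shared<const TabletMetadataLike>();
};
{code}

The design point is that all copying, overlaying, and validation happen on a 
private replacement object, so a failed copy leaves the old (tombstoned) 
metadata untouched and readers never observe a half-written state.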

> TSAN race in ListTablets vs tablet metadata loading
> ---
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, tserver
>Affects Versions: 0.9.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>  Labels: newbie
>
> {code}
> [TSAN stack trace snipped; identical to the trace quoted above]
> {code}

[jira] [Commented] (KUDU-1500) TSAN race in ListTablets vs tablet metadata loading

2016-08-05 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410343#comment-15410343
 ] 

Dinesh Bhat commented on KUDU-1500:
---

[~tlipcon] [~mpercy] [~d...@danburkert.com], please see the history of 
discussions on KUDU-1264 as well. Our initial approach was to take the least 
invasive path, i.e., copying the required attributes (under lock) in the read 
path, which races with the metadata resurrection. The review is posted at 
http://gerrit.cloudera.org:8080/3823

Approach 1 (current): guard the read path with the same lock taken by the 
resurrection path. However, this somewhat assumes that accessors in the hot 
path, like Tablet::CheckRowInTablet, are not active, because we had quiesced 
the tablet earlier during the 'delete after corruption' path. Although that 
seems like a safe assumption, it may open up room for future worms.

Approach 2: now I am wondering why we can't take the same approach as when 
adding a new tablet replica, i.e., since the tablet is in the NOT_RUNNING 
state, we need not serve the ListTablets RPC for this tablet alone. I am not 
sure about the consequences of this filtering on the cluster.
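For illustration, here is a minimal sketch of that filtering, assuming a 
simplified replica representation (TabletState and ReplicaLike are stand-ins, 
not the actual tserver types):

{code}
#include <string>
#include <vector>

// Stand-ins for the tserver's replica bookkeeping; the real enum and
// replica types in Kudu differ.
enum class TabletState { NOT_RUNNING, BOOTSTRAPPING, RUNNING };

struct ReplicaLike {
  std::string tablet_id;
  TabletState state;
};

// Answer a ListTablets-style request while skipping replicas whose
// metadata may still be under resurrection: anything that is not yet
// RUNNING is simply left out of the response.
std::vector<std::string> ListRunningTabletIds(
    const std::vector<ReplicaLike>& replicas) {
  std::vector<std::string> ids;
  for (const ReplicaLike& r : replicas) {
    if (r.state != TabletState::RUNNING) {
      continue;  // filtered: its metadata may be mid-rewrite
    }
    ids.push_back(r.tablet_id);
  }
  return ids;
}
{code}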


> TSAN race in ListTablets vs tablet metadata loading
> ---
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, tserver
>Affects Versions: 0.9.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>  Labels: newbie
>
> {code}
> [TSAN stack trace snipped; identical to the trace quoted above]
> {code}

[jira] [Commented] (KUDU-1500) TSAN race in ListTablets vs tablet metadata loading

2016-08-05 Thread Dinesh Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410308#comment-15410308
 ] 

Dinesh Bhat commented on KUDU-1500:
---

This is a duplicate of KUDU-1264, which I closed and linked to this bug.

> TSAN race in ListTablets vs tablet metadata loading
> ---
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet, tserver
>Affects Versions: 0.9.0
>Reporter: Todd Lipcon
>Assignee: Dinesh Bhat
>  Labels: newbie
>
> {code}
> [TSAN stack trace snipped; identical to the trace quoted above]
> {code}

[jira] [Resolved] (KUDU-1264) TSAN data race between Tablet Bootstrap and ListTablets

2016-08-05 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat resolved KUDU-1264.
---
   Resolution: Duplicate
Fix Version/s: 1.0.0

> TSAN data race between Tablet Bootstrap and ListTablets
> ---
>
> Key: KUDU-1264
> URL: https://issues.apache.org/jira/browse/KUDU-1264
> Project: Kudu
>  Issue Type: Bug
>Reporter: Mike Percy
>Assignee: Dinesh Bhat
> Fix For: 1.0.0
>
>
> This came up in a dist-test run of a TSAN build on 
> TabletReplacementITest.TestRemoteBoostrapWithPendingConfigChangeCommits as 
> part of KUDU-1233. I am somewhat skeptical that it's a real bug, but it could 
> just be somehow misreported by TSAN and the real error is elsewhere.
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=20565)
>   Write of size 8 at 0x7d1400027e88 by thread T124:
> #0 operator delete(void*)  (kudu-tserver+0x000494e6)
> #1 std::string::assign(std::string const&)  
> (libstdc++.so.6+0x000bb4e8)
> #2 
> kudu::tablet::TabletBootstrap::Bootstrap(std::tr1::shared_ptr*,
>  scoped_refptr*, kudu::consensus::ConsensusBootstrapInfo*) 
> /home/todd/git/kudu/src/kudu/tablet/tablet_bootstrap.cc:439 
> (libtablet.so+0x00240bd8)
> #3 
> kudu::tablet::BootstrapTablet(scoped_refptr 
> const&, scoped_refptr const&, 
> std::tr1::shared_ptr const&, kudu::MetricRegistry*, 
> kudu::tablet::TabletStatusListener*, 
> std::tr1::shared_ptr*, scoped_refptr*, 
> scoped_refptr const&, 
> kudu::consensus::ConsensusBootstrapInfo*) 
> /home/todd/git/kudu/src/kudu/tablet/tablet_bootstrap.cc:376 
> (libtablet.so+0x00240952)
> #4 
> kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr
>  const&, scoped_refptr const&) 
> /home/todd/git/kudu/src/kudu/tserver/ts_tablet_manager.cc:607 
> (libtserver.so+0x00161d4e)
> #5 boost::_mfi::mf2 scoped_refptr const&, 
> scoped_refptr 
> const&>::operator()(kudu::tserver::TSTabletManager*, 
> scoped_refptr const&, 
> scoped_refptr const&) const 
> /usr/include/boost/bind/mem_fn_template.hpp:280 (libtserver.so+0x0016e05e)
> #6 void 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value 
> >::operator() scoped_refptr const&, 
> scoped_refptr const&>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2 kudu::tserver::TSTabletManager, scoped_refptr 
> const&, scoped_refptr const&>&, 
> boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:392 
> (libtserver.so+0x0016dfb3)
> #7 boost::_bi::bind_t kudu::tserver::TSTabletManager, scoped_refptr 
> const&, scoped_refptr const&>, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value 
> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20 
> (libtserver.so+0x0016df33)
> #8 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf2 scoped_refptr const&, 
> scoped_refptr const&>, 
> boost::_bi::list3, 
> boost::_bi::value, 
> boost::_bi::value 
> > >, void>::invoke(boost::detail::function::function_buffer&) 
> /usr/include/boost/function/function_template.hpp:153 
> (libtserver.so+0x0016dcf1)
> #9 boost::function0::operator()() const 
> /usr/include/boost/function/function_template.hpp:766 
> (libkrpc.so+0x00096a61)
> #10 kudu::FunctionRunnable::Run() 
> /home/todd/git/kudu/src/kudu/util/threadpool.cc:46 
> (libkudu_util.so+0x0021099d)
> #11 kudu::ThreadPool::DispatchThread(bool) 
> /home/todd/git/kudu/src/kudu/util/threadpool.cc:317 
> (libkudu_util.so+0x0020f4b6)
> #12 boost::_mfi::mf1 bool>::operator()(kudu::ThreadPool*, bool) const 
> /usr/include/boost/bind/mem_fn_template.hpp:165 
> (libkudu_util.so+0x00212845)
> #13 void boost::_bi::list2, 
> boost::_bi::value >::operator() kudu::ThreadPool, bool>, boost::_bi::list0>(boost::_bi::type, 
> boost::_mfi::mf1&, boost::_bi::list0&, int) 
> /usr/include/boost/bind/bind.hpp:313 (libkudu_util.so+0x002127ab)
> #14 boost::_bi::bind_t bool>, boost::_bi::list2, 
> boost::_bi::value > >::operator()() 
> /usr/include/boost/bind/bind_template.hpp:20 (libkudu_util.so+0x00212733)
> #15 
> 

[jira] [Assigned] (KUDU-1264) TSAN data race between Tablet Bootstrap and ListTablets

2016-08-05 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat reassigned KUDU-1264:
-

Assignee: Dinesh Bhat

> TSAN data race between Tablet Bootstrap and ListTablets
> ---
>
> Key: KUDU-1264
> URL: https://issues.apache.org/jira/browse/KUDU-1264
> Project: Kudu
>  Issue Type: Bug
>Reporter: Mike Percy
>Assignee: Dinesh Bhat
>
> This came up in a dist-test run of a TSAN build on 
> TabletReplacementITest.TestRemoteBoostrapWithPendingConfigChangeCommits as 
> part of KUDU-1233. I am somewhat skeptical that it's a real bug, but it could 
> just be somehow misreported by TSAN and the real error is elsewhere.
> {noformat}
> [TSAN stack trace snipped; identical to the trace quoted above]
> {noformat}

[jira] [Updated] (KUDU-1548) Looping raft_consensus-itest hitting some failures

2016-08-05 Thread Dinesh Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Bhat updated KUDU-1548:
--
Assignee: (was: Dinesh Bhat)

> Looping raft_consensus-itest hitting some failures
> --
>
> Key: KUDU-1548
> URL: https://issues.apache.org/jira/browse/KUDU-1548
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Dinesh Bhat
> Attachments: dist-test-results-1548.tar.gz
>
>
> Find the detailed test results/logs under the attached tarfile 
> dist-test-results-1548.tar.gz
> [~mpercy] is already addressing one test's flakiness here: 
> https://gerrit.cloudera.org/#/c/3819
> {noformat}
> The test log looked something like this:
> I0729 18:59:47.834403 11544 raft_consensus.cc:370] T 
> e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
> FOLLOWER]: No leader contacted us within the election timeout. Triggering 
> leader election
> I0729 18:59:47.834686 11544 raft_consensus.cc:2019] T 
> e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 [term 1 
> FOLLOWER]: Advancing to term 2
> I0729 18:59:47.840427 11544 leader_election.cc:223] T 
> e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
> [CANDIDATE]: Term 2 election: Requesting vote from peer 
> 54197053abab4b6cb1b1632c9d1062dc
> I0729 18:59:47.840860 11544 leader_election.cc:223] T 
> e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
> [CANDIDATE]: Term 2 election: Requesting vote from peer 
> 3522a8de8170476dba0beb58cb2150d4
> I0729 18:59:47.872720 11669 raft_consensus.cc:869] T 
> e3503c47a21649ca931234999cd0bb45 P 3522a8de8170476dba0beb58cb2150d4 [term 1 
> FOLLOWER]: Refusing update from remote peer 54197053abab4b6cb1b1632c9d1062dc: 
> Log matching property violated. Preceding OpId in replica: term: 1 index: 1. 
> Preceding OpId from leader: term: 1 index: 2. (index mismatch)
> I0729 18:59:47.874522 11454 consensus_queue.cc:578] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [LEADER]: 
> Connected to new peer: Peer: 3522a8de8170476dba0beb58cb2150d4, Is new: false, 
> Last received: 1.1, Next index: 2, Last known committed idx: 1, Last exchange 
> result: ERROR, Needs remote bootstrap: false
> I0729 18:59:47.878105 11150 raft_consensus.cc:1324] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
> LEADER]: Handling vote request from an unknown peer 
> d4f64819170a4cf78fe4c9e9a72ec4b9
> I0729 18:59:47.878290 11150 raft_consensus.cc:2014] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
> LEADER]: Stepping down as leader of term 1
> I0729 18:59:47.878451 11150 raft_consensus.cc:499] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
> LEADER]: Becoming Follower/Learner. State: Replica: 
> 54197053abab4b6cb1b1632c9d1062dc, State: 1, Role: LEADER
> Watermarks: {Received: term: 1 index: 2 Committed: term: 1 index: 1}
> I0729 18:59:47.878968 11150 consensus_queue.cc:162] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc 
> [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, 
> Majority replicated op: 1.1, Committed index: 1.1, Last appended: 1.2, 
> Current term: 1, Majority size: -1, State: 1, Mode: NON_LEADER
> I0729 18:59:47.879871 11150 consensus_peers.cc:358] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc -> Peer 
> 3522a8de8170476dba0beb58cb2150d4 (127.37.56.2:53243): Closing peer: 
> 3522a8de8170476dba0beb58cb2150d4
> I0729 18:59:47.882057 11150 raft_consensus.cc:2019] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 1 
> FOLLOWER]: Advancing to term 2
> I0729 18:59:47.885711 11150 raft_consensus.cc:1626] T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
> FOLLOWER]: Leader election vote request: Denying vote to candidate 
> d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
> OpId of term: 1 index: 2, which is greater than that of the candidate, which 
> has last-logged OpId of term: 1 index: 1.
> I0729 18:59:47.892060 11477 leader_election.cc:361] T 
> e3503c47a21649ca931234999cd0bb45 P d4f64819170a4cf78fe4c9e9a72ec4b9 
> [CANDIDATE]: Term 2 election: Vote denied by peer 
> 54197053abab4b6cb1b1632c9d1062dc. Message: Invalid argument: T 
> e3503c47a21649ca931234999cd0bb45 P 54197053abab4b6cb1b1632c9d1062dc [term 2 
> FOLLOWER]: Leader election vote request: Denying vote to candidate 
> d4f64819170a4cf78fe4c9e9a72ec4b9 for term 2 because replica has last-logged 
> OpId of term: 1 index: 2, which is greater than that of the candidate, which 
> has last-logged OpId of term: 1 index: 1.
> I0729 18:59:47.894548 11669 raft_consensus.cc:1324] T 
>