[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-09-09 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
after it is loaded for the very first time. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Reviewed-on: http://gerrit.cloudera.org:8080/3823
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 129 insertions(+), 16 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-09-09 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 17: Code-Review+2

OK let's commit this :)

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-09-09 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 17:

> Do you guys think we should pull this in for 1.0 even though we
 > don't have an upgrade test yet? Maybe it's worth filing a JIRA and
 > committing this, since I think Adar saw it in a test environment.

I think so, given that this change doesn't break forward compat(i.e, 0.10 
should be able to accept remote-copy from 1.0). If we intro partition changes 
in 1.0, we anyways need to handle backward-compat in the future code. I had 
filed JIRA for upgrade/downgrade test.

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-09-08 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 17:

Do you guys think we should pull this in for 1.0 even though we don't have an 
upgrade test yet? Maybe it's worth filing a JIRA and committing this, since I 
think Adar saw it in a test environment.

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-30 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 17:

(2 comments)

TFTRs Mike/Dan/Adar, updated the patch to reflect the discussions we have had 
on this. Please take a look.

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   CHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchema partition_schema;
  :   
RETURN_NOT_OK(PartitionSchema::FromPB(superblock.partition_schema(),
  : *schema_, 
&partition_schema));
  :   CHECK(partition_schema_.Equals(partition_schema));
  : 
  :   Partition partition;
  :   Partition::FromPB(superblock.partition(), &partition);
  :   CHECK(partition_.Equals(partition));
> Providing equals on partition should be easy, it just needs to check that t
also checked for hash_buckets equality Dan, assuming that's immutable as well.


PS15, Line 313:   CHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchema partition_schema;
  :   
RETURN_NOT_OK(PartitionSchema::FromPB(superblock.partition_schema(),
  : *schema_, 
&partition_schema));
  :   CHECK(partition_schema_.Equals(partition_schema));
  : 
  :   Partition partition;
  :   Partition::FromPB(superblock.partition(), &partition);
  :   CHECK(partition_.Equals(partition));
> I suppose we could change this to CHECK and just rely on upgrade / downgrad
Cool, thanks Mike, updated the code with Equals() for both partition and 
partition_schema, and moving to CHECK to keep the code same for all build 
types. I agree with catching the compat issues with white/black box automated 
tests for upgrade/downgrade.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-30 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 17:

Build Started http://104.196.14.100/job/kudu-gerrit/3152/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-30 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#17).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
after it is loaded for the very first time. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 129 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/17
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-30 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> I suppose we could change this to CHECK and just rely on upgrade / downgrad
Providing equals on partition should be easy, it just needs to check that the 
lower and upper bound partition key strings are equal.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-29 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> I liked this idea, but it still doesn't prevent compat break if the wire fo
I suppose we could change this to CHECK and just rely on upgrade / downgrade 
tests for coverage on this stuff, if we don't want release and debug builds to 
act differently. After all, this isn't a hot path.

Dinesh, to your point: providing an Equals() method for the non-PB objects 
doesn't prevent forward-compatibility problems, you are right. These are just 
inherent issues we have to deal with and as far as I can tell, having upgrade 
tests are really the only way to avoid unforeseen compatibility problems.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-29 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313: } else {
  :   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAs
> Well, it looks like we already have PartitionSchema::Equals() so maybe we j
I liked this idea, but it still doesn't prevent compat break if the wire format 
differs from the local copy if the destination version is ahead of source 
during tablet-copy. In general, it seems this is a question of : if the node is 
accepting a remote copy, is it a good idea to accept a copy from a server with 
release (N-1) ?


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-29 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> I understand that this code isn't executed in release mode, but I feel fair
Well, it looks like we already have PartitionSchema::Equals() so maybe we just 
need to implement Partition::Equals() to do this without comparing protobuf 
encodings. We could have something like:

 PartitionNSchema s;
 RETURN_NOT_OK(PartitionSchema::FromPB(superblock.partition_schema(), &s));
 DCHECK(partition_schema_.Equals(s));

And something similar for Partition.

In general I'm mostly concerned with forward and backward compatibility in 
release mode, although also maintaining it for DEBUG builds clearly would be 
nice. I think writing automated upgrade / downgrade tests would also help 
detect problems before release.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-28 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> #1 is a good idea
I understand that this code isn't executed in release mode, but I feel fairly 
strongly that forward (or backward, for that matter) compatibility behavior 
shouldn't vary between Kudu build types. That's really unexpected, and it's the 
kind of thing that may show up when a user encounters a real bug and tries to 
reproduce it with a debug build to get more information.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-28 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> 1. I actually thought about NDEBUG, but forgot to materialize in between th
#1 is a good idea

Regarding #2, since it's a DCHECK we ignore it in release mode.

Below is what the PBs look like. I suppose we could create equality functions 
for them to try to future proof them from PB changes but I'm not sure it's 
worth it for a DCHECK.

message PartitionPB {
  // The hash buckets of the partition. The number of hash buckets must match
  // the number of hash bucket components in the partition's schema.
  repeated int32 hash_buckets = 1 [packed = true];
  // The encoded start partition key (inclusive).
  optional bytes partition_key_start = 2;
  // The encoded end partition key (exclusive).
  optional bytes partition_key_end = 3;
}

// The serialized format of a Kudu table partition schema.
message PartitionSchemaPB {

  // A column identifier for partition schemas. In general, the name will be
  // used when a client creates the table since column IDs are assigned by the
  // master. All other uses of partition schemas will use the numeric column ID.
  message ColumnIdentifierPB {
oneof identifier {
  int32 id = 1;
  string name = 2;
}
  }

  message RangeSchemaPB {
// Column identifiers of columns included in the range. All columns must be
// a component of the primary key.
repeated ColumnIdentifierPB columns = 1;
  }

  message HashBucketSchemaPB {
// Column identifiers of columns included in the hash. Every column must be
// a component of the primary key.
repeated ColumnIdentifierPB columns = 1;

// Number of buckets into which columns will be hashed. Must be at least 2.
required int32 num_buckets = 2;

// Seed value for hash calculation. Administrators may set a seed value
// on a per-table basis in order to randomize the mapping of rows to
// buckets. Setting a seed provides some amount of protection against denial
// of service attacks when the hash bucket columns contain user provided
// input.
optional uint32 seed = 3;

enum HashAlgorithm {
  UNKNOWN = 0;
  MURMUR_HASH_2 = 1;
}

// The hash algorithm to use for calculating the hash bucket.
optional HashAlgorithm hash_algorithm = 4;
  }

  repeated HashBucketSchemaPB hash_bucket_schemas = 1;
  optional RangeSchemaPB range_schema = 2;
}


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-28 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 16:

Build Started http://104.196.14.100/job/kudu-gerrit/3118/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-28 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#16).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
after it is loaded for the very first time. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
2 files changed, 120 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/16
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-27 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(4 comments)

TFTR Adar, I will update the new patch in a while, please see responses inline.

http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS15, Line 271: fix(KUDU-1500)
> Nit: "fix (KUDU-1500)" (separate 'fix' from the parenthetical block with a 
Done


PS15, Line 325: scoped_refptr thread;
  : ASSERT_OK(kudu::Thread::Create(CURRENT_TEST_NAME(),
  :
Substitute("list-tablet-thread-$0", i),
  :&ListTabletsDuringTabletCopy,
  :follower_ts, tablet_id,
  :&barrier, &finish, &thread));
  : threads.push_back(thread);
> Now that we can use C++11 we prefer std::thread for test threads. The advan
TBH I need to read up about std::thread to understand what you are saying about 
these subtle differences. I will post an update once I convert this into 
std::thread.


http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 304: // We treat few fields in the metadata as immutable objects,
  : // so they are loaded from protobuf only when tablet replica
  : // is being instantiated on this peer(KUDU-1500).
> Nit: this comment is a little confusing. How about rewording like so: "Some
Done.


PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
> I have some questions about this logic:
1. I actually thought about NDEBUG, but forgot to materialize in between the 
test experiments, good that you reminded :), consider it done.

2. Hmmm, good Qn. I believe we may land into similar issue if we change the 
superblock version(or format). i.e, destination version differs from source in 
the tablet-copy workflow. My hunch is that(not very sure) we probably have to 
worry about that compatibility at several other places more than here. However 
Mike may have more clearer picture here to answer this Qn.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

(5 comments)

Came to +2, ended up leaving some comments. :/

http://gerrit.cloudera.org:8080/#/c/3823/15//COMMIT_MSG
Commit Message:

PS15, Line 15: once after
Nit: seems like you only need one of these words, but not both.


http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS15, Line 271: fix(KUDU-1500)
Nit: "fix (KUDU-1500)" (separate 'fix' from the parenthetical block with a 
space).


PS15, Line 325: scoped_refptr thread;
  : ASSERT_OK(kudu::Thread::Create(CURRENT_TEST_NAME(),
  :
Substitute("list-tablet-thread-$0", i),
  :&ListTabletsDuringTabletCopy,
  :follower_ts, tablet_id,
  :&barrier, &finish, &thread));
  : threads.push_back(thread);
Now that we can use C++11 we prefer std::thread for test threads. The advantage 
is less boilerplate code, both in starting up threads and in the use of lambdas 
for the thread functions themselves, with lambda capture reducing parameter 
passing.

See master-test.cc:TestConcurrentCreateOfSameTable for an example.


http://gerrit.cloudera.org:8080/#/c/3823/15/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

PS15, Line 304: // We treat few fields in the metadata as immutable objects,
  : // so they are loaded from protobuf only when tablet replica
  : // is being instantiated on this peer(KUDU-1500).
Nit: this comment is a little confusing. How about rewording like so: "Some 
metadata fields are assumed to be immutable and thus are only read from the 
protobuf when the tablet metadata is loaded for the very first time. See 
KUDU-1500 for more details."


PS15, Line 313:   DCHECK_EQ(table_id_, superblock.table_id());
  :   PartitionSchemaPB partition_schema_pb;
  :   partition_schema_.ToPB(&partition_schema_pb);
  :   
DCHECK_EQ(superblock.partition_schema().SerializeAsString(),
  : partition_schema_pb.SerializeAsString());
  :   PartitionPB partition_pb;
  :   partition_.ToPB(&partition_pb);
  :   DCHECK_EQ(superblock.partition().SerializeAsString(),
  : partition_pb.SerializeAsString());
I have some questions about this logic:
1. Since this is just sanity checks that are enabled in debug builds, the 
entire block should be conditioned on #ifndef NDEBUG (and then the DCHECK_EQs 
should be converted into CHECK_EQ). This is to avoid the ToPB() calls in 
non-debug builds, where their results aren't used.
2. I'm worried about impacting forward compatibility by comparing serialized 
protobufs like this. In commit 43d2749, Mike removed an equivalent check in 
tserver registration which effectively prevented us from adding new fields to 
the registration protobuf. Would this lead to the same problem? Or is it 
different?


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 15:

Build Started http://104.196.14.100/job/kudu-gerrit/3107/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#15).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
2 files changed, 132 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/15
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#14).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
2 files changed, 131 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/14
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 14:

Build Started http://104.196.14.100/job/kudu-gerrit/3101/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#13).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
3 files changed, 133 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/13
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 13:

Build Started http://104.196.14.100/job/kudu-gerrit/3100/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 12:

(1 comment)

Thanks again, updated along with std:: prefixes to keep jenkins happy.

http://gerrit.cloudera.org:8080/#/c/3823/11/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 268: 
> Can get rid of this
Nnice ! yeah that was a useless return, updated all of above.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 12:

Build Started http://104.196.14.100/job/kudu-gerrit/3099/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#12).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which
caught the race in raft_consensus-itest.TestCorruptReplicaMetadata.

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
3 files changed, 133 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/12
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3823/11/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 258: Status ListTabletsDuringTabletCopy(const TServerDetails* ts,
Shouldn't this return void because it's a main thread function?


Line 265: ListTablets(ts, MonoDelta::FromSeconds(30), &tablets);
CHECK_OK() ?


Line 268:   return Status::OK();
Can get rid of this


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 10:

(10 comments)

TFTR Mike, updated new patch, and also responses inline below.

http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 327:   // Threads to issue ListTablets RPCs to follower.
> 4 spaces is the minimum.
ok cool, i may have missed 'minimum' part.


http://gerrit.cloudera.org:8080/#/c/3823/10/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 23: #include 
> nit: alphabetical order for includes would put this at the top
Fixed.


Line 254: Status ListTabletsDuringTabletCopy(const TServerDetails* ts,
> Mind adding a quick doc comment to this function?
Done.


Line 276: // This is only optimistic in that, we hope to catch the
> Worth mentioning that we rely on TSAN to "catch" it.
added some more verbiage, the line above had a tsan mention though.


Line 322:   std::atomic transition(false);
> I don't think we use the transition variable anymore.
good catch, removed.


Line 324:   int num_threads = FLAGS_test_num_threads;
> why bother having this extra variable num_threads?
good point, removed. that also makes the actual ListTablet thread lot thinner :)


Line 355:   finish = true;
> It's only a test, but we can get away with finish.store(true, memory_order_
my thinking was since this is a test we could get away with the memory barrier 
:), but I see your point and agree with you, done.


http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 283: last_durable_mrs_id_ = superblock.last_durable_mrs_id();
> Are you sure table_name_ is not accessed by ListTablets()?
I was speaking from immutability point of view. To answer your Qn, table_name_ 
is accessed all over including ListTablets, but in safe way. The same lock 
which is held during SuperblockLoading is held across reads for table_name_ too.
string TabletMetadata::table_name() const {
  std::lock_guard l(data_lock_);
  DCHECK_NE(state_, kNotLoadedYet);
  return table_name_;
}


http://gerrit.cloudera.org:8080/#/c/3823/10/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 317:   DCHECK_EQ(
> Although, another thing to consider is whether there is always a guaranteed
H, good point. Tests on localhost passing, I will run various tests on 
different platforms exercising this routine to make sure assumption holds true.


Line 323:   DCHECK_EQ(0, table_id_.compare(superblock.table_id()));
> DCHECK_EQ(table_id_, superblock.table_id())
Ha ! didn't realize DCHECK_CHECK actually acts on string. thank you. Took care 
of all the above comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 11:

Build Started http://104.196.14.100/job/kudu-gerrit/3096/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#11).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during tablet-copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.
DCHECKs are added to make sure the oncoming protobuf fields
match the immutable fields not updated during tablet-copy.

Testing: Spun a new test TabletCopyITest.TestDeleteTabletDuringTabletCopy
which exercises this data race window, also ran original test which caught
the race in raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
3 files changed, 135 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/11
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/10/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 317:   DCHECK_EQ(
> I don't think this is valid. The easiest thing to do is DCHECK_EQ(superbloc
Although, another thing to consider is whether there is always a guaranteed 
order of the components. I'm not exactly sure


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-26 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 10:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 327:   // Threads to issue ListTablets RPCs to follower.
> thought of following 'function call' formatting guideline. Shouldn't next l
4 spaces is the minimum.


http://gerrit.cloudera.org:8080/#/c/3823/10/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 23: #include 
nit: alphabetical order for includes would put this at the top


Line 254: Status ListTabletsDuringTabletCopy(const TServerDetails* ts,
Mind adding a quick doc comment to this function?


Line 276: // This is only optimistic in that, we hope to catch the
Worth mentioning that we rely on TSAN to "catch" it.


Line 322:   std::atomic transition(false);
I don't think we use the transition variable anymore.


Line 324:   int num_threads = FLAGS_test_num_threads;
why bother having this extra variable num_threads?


Line 355:   finish = true;
It's only a test, but we can get away with finish.store(true, 
memory_order_relaxed) here since we don't need a memory barrier.


http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 283: last_durable_mrs_id_ = superblock.last_durable_mrs_id();
> table_name_ is being set by AlterSchema, so even though it's not supported,
Are you sure table_name_ is not accessed by ListTablets()?


http://gerrit.cloudera.org:8080/#/c/3823/10/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 312: } else {
Can you do the validation in the else-block in the same order that you do 
assignment in the if-block?


Line 313:   PartitionPB partitionPB;
nit: camelCaps not allowed for variable names in C++ by the style guide. Also 
below. Name this partition_pb and the below one partition_schema_pb (also note 
there is a typo in the variable name "partion_schemaPB").


Line 317:   DCHECK_EQ(
I don't think this is valid. The easiest thing to do is 
DCHECK_EQ(superblock.partition().SerializeAsString(), 
partition_pb.SerializeAsString());


Line 320:   DCHECK_EQ(
Also here


Line 323:   DCHECK_EQ(0, table_id_.compare(superblock.table_id()));
DCHECK_EQ(table_id_, superblock.table_id())


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-24 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#10).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
3 files changed, 140 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/10
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-24 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 10:

Build Started http://104.196.14.100/job/kudu-gerrit/3061/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-24 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 9:

Build Started http://104.196.14.100/job/kudu-gerrit/3053/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-24 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 8:

(12 comments)

TFTR MIke, addressed all the comments, and also re-ran the test suite. Please 
take a look at responses inline.

http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 42: DEFINE_int32(num_test_threads, 16,
> Let's follow the convention of the other params and name this test_num_thre
Done.


Line 262: // Step 4: Verify that at least one thread caught the state 
transition.
> Let's just remove step 4.
Done.


Line 287:   for (int i = 0; i < cluster_->num_tablet_servers(); i++) {
> This check is not needed. WaitForServersToAgree() already effectively achie
Yeah, bad copy paste skills :), done.


Line 298:   // Find out who is leader so that we can tombstone a follower
> It doesn't matter whether you kill a leader or a follower, does it? We only
Sounds good, that's true - that was the original intention of choosing a 
follower to tombstone to keep this test specifically geared towards kudu-1500.


Line 306:   AtomicBool transition(false);
> These days, prefer std::atomic_bool or std::atomic (the former is a t
Done.


Line 327:   << " on TS " << follower_ts->uuid();
> nit: it's prettier / more readable to align the << with the one on the prev
thought of following 'function call' formatting guideline. Shouldn't next line 
be offset by 4 spaces ? Took your suggestion anyways.


Line 347:   if (!transition.Load())
> Hmm. I think we should remove this since it's not really doing anything, an
Done. Agreed.


Line 351:   ASSERT_TRUE(cluster_->tablet_server(kTsIndex)->IsProcessAlive());
> This can be replaced with NO_FATALS(cluster_->AssertNoCrashes());
Done, thanks.


http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 283: table_id_ = superblock.table_id();
> What about table_id_? Shouldn't this also be inside the "if" statement belo
table_name_ is being set by AlterSchema, so even though it's not supported, we 
could keep it mutable(as-is) as long as it is not racy. I liked idea of doing 
DCHECK on all other 3 objects inside an else{} that they didn't differ from our 
assumption, added code and also tested them. Please take a look again. I am 
also getting this warning for the memcmp usage: 
"/Users/dinesh/Documents/kudu_exp/kudu/src/kudu/tablet/tablet_metadata.cc:317:140:
 warning: first operand of this 'memcmp' call is a pointer to dynamic class 
'PartitionPB'; vtable pointer will be compared [-Wdynamic-class-memaccess]"


Line 298:   LOG_WITH_PREFIX(WARNING) << "Upgrading from Kudu 0.5.0 directly 
to this"
> This seems like it should be a FATAL error to me. Dan?
Hmmm, I kept it as warning for now, but I guess I was supposed to return error 
from here, just fixed that now.


Line 300:   << " moving to a higher version." << std::endl;
> std::endl is not required here and should be removed
Done.


Line 303: if (state_ == kNotLoadedYet) {
> Please add a comment here explaining why we only do this when the TabletMet
Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-24 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#9).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
3 files changed, 139 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-22 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 8:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 242: Status ListTabletsDuringTabletCopy(const TServerDetails* ts,
I don't think this utility is generic enough to put here. It is very specific. 
Let's move it to the test.


http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 42: DEFINE_int32(num_test_threads, 16,
Let's follow the convention of the other params and name this test_num_threads


Line 262: // Step 4: Verify that at least one thread caught the state 
transition.
Let's just remove step 4.


Line 287:   for (int i = 0; i < cluster_->num_tablet_servers(); i++) {
This check is not needed. WaitForServersToAgree() already effectively achieves 
this.


Line 298:   // Find out who is leader so that we can tombstone a follower
It doesn't matter whether you kill a leader or a follower, does it? We only 
care that one of the servers eventually becomes leader and starts a tablet copy 
on our tombstoned replica.

That said, even though this isn't necessary for correctness, it makes average 
test time lower since we don't have to wait for an election to get the tablet 
copy kicked off. Let's just document that as the reason for doing this.


Line 306:   AtomicBool transition(false);
These days, prefer std::atomic_bool or std::atomic (the former is a 
typedef of the latter) from C++11 to this older C++03 compatible atomics 
library.


Line 327:   << " on TS " << follower_ts->uuid();
nit: it's prettier / more readable to align the << with the one on the previous 
line


Line 347:   if (!transition.Load())
Hmm. I think we should remove this since it's not really doing anything, and 
trying to rely on it would make this test racy. No one will ever check this 
test log output and if they did it wouldn't tell them very much.


Line 351:   ASSERT_TRUE(cluster_->tablet_server(kTsIndex)->IsProcessAlive());
This can be replaced with NO_FATALS(cluster_->AssertNoCrashes());


http://gerrit.cloudera.org:8080/#/c/3823/8/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 283: table_id_ = superblock.table_id();
What about table_id_? Shouldn't this also be inside the "if" statement below? 
Also the table_name_. Right now, we don't support renaming tables. I think we 
should add a DHECK that these things didn't change and then put them inside the 
"if".


Line 298:   LOG_WITH_PREFIX(WARNING) << "Upgrading from Kudu 0.5.0 directly 
to this"
This seems like it should be a FATAL error to me. Dan?


Line 300:   << " moving to a higher version." << std::endl;
std::endl is not required here and should be removed


Line 303: if (state_ == kNotLoadedYet) {
Please add a comment here explaining why we only do this when the 
TabletMetadata is uninitialized.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-13 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 8:

(2 comments)

TFTR Dan, addressed both comments with responses inline...

http://gerrit.cloudera.org:8080/#/c/3823/7/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 323:   barrier.Wait();
> Could this be done with a countdown latch or similar?  5 seconds is a very 
Just stumbled across a cute Barrier implementation in util/, I am beginning to 
appreciate the power of C++ util libraries at the moment :)


http://gerrit.cloudera.org:8080/#/c/3823/7/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 298:   LOG_WITH_PREFIX(WARNING) << "Upgrading from Kudu 0.5.0 directly 
to this"
> I think this is a little hard to follow, perhaps a better way to state it i
Done, although I wonder if there are any other situations which meet 
!superblock.has_partition() in which case this message may become  misleading.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 8:

Build Started http://104.196.14.100/job/kudu-gerrit/2866/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-13 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#8).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 138 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3823/7/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 323:   SleepFor(MonoDelta::FromSeconds(5));
Could this be done with a countdown latch or similar?  5 seconds is a very long 
time to wait.


http://gerrit.cloudera.org:8080/#/c/3823/7/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

Line 298:   LOG_WITH_PREFIX(WARNING) << "Upgrade from release <= 0.5 to > 
0.6 "
I think this is a little hard to follow, perhaps a better way to state it is:

Upgrading from Kudu 0.5.0 directly to this version is not supported. Please 
upgrade to 0.6.0 before moving to a higher version.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has abandoned this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Abandoned

abandoning due to duplicate links. original review is under 
http://gerrit.cloudera.org:8080/3823

-- 
To view, visit http://gerrit.cloudera.org:8080/3955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I0a7a62b7c5aa62e9888b4b454790861df9fe2ad5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has abandoned this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Abandoned

abandoning due to duplicate links. original review is under 
http://gerrit.cloudera.org:8080/3823

-- 
To view, visit http://gerrit.cloudera.org:8080/3957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I204e9ef326b95b4d6216778a35b0438760ff
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has abandoned this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Abandoned

abandoning due to duplicate links. original review is under 
http://gerrit.cloudera.org:8080/3823

-- 
To view, visit http://gerrit.cloudera.org:8080/3956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I0cfea3ea1b8e308fe05a0b3e94c93b7b1369ac24
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#7).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 136 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/2840/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3957

to review the following change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: I0a7a62b7c5aa62e9888b4b454790861df9fe2ad5

Change-Id: I204e9ef326b95b4d6216778a35b0438760ff
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 136 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/3957/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I204e9ef326b95b4d6216778a35b0438760ff
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2836/

-- 
To view, visit http://gerrit.cloudera.org:8080/3957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I204e9ef326b95b4d6216778a35b0438760ff
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 6:

(2 comments)

TFTR mpercy/dan, I have posted the new diffs under 
https://gerrit.cloudera.org/#/c/3955/, and decided to abandon this patch. 

PS: I tried to use same Change-Id as this one in my commit message to keep 
everything in one thread, but failed to do so because I got the new link when I 
published the new set of diffs.

http://gerrit.cloudera.org:8080/#/c/3823/6/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

Line 109:   const Partition partition_locked() const {
> Is this the right return type?  Seems weird to return const by value.
Yeah, please ignore this change altogether, I have posted a new review with the 
new fix we had discussed earlier this week, and I am gonna abandon this change.


http://gerrit.cloudera.org:8080/#/c/3823/6/src/kudu/tserver/tserver-path-handlers.cc
File src/kudu/tserver/tserver-path-handlers.cc:

Line 211:   peer->tablet_metadata()->schema());
> Note: schema_ is also modified in TabletMetadata::LoadFromSuperBlock()
Yes, but schema_ is not race-prone in that it has atomic pointer swap to new 
schema_ and also keeps history of stale schema_.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3955

to review the following change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019

Change-Id: I0a7a62b7c5aa62e9888b4b454790861df9fe2ad5
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 136 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/3955/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a7a62b7c5aa62e9888b4b454790861df9fe2ad5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2835/

-- 
To view, visit http://gerrit.cloudera.org:8080/3956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0cfea3ea1b8e308fe05a0b3e94c93b7b1369ac24
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Dinesh Bhat (Code Review)
Hello Dan Burkert, Mike Percy,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3956

to review the following change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the Partition and PartitionSchema on that tserver is accessed using
TabletPeer in an unguarded manner via ListTablets RPCs.

The proposed fix here keeps the Partition and PartitionSchema immutable
once after it is loaded during replica bootstrap. These fields are never
overwritten again during Tablet Copy or any other workflow.
Also a unit test is added to exercise this data race window. i.e,
the unit test aims to issue ListTablets RPCs during a follower's
tablet copy stage and provides an optimistic coverage for the fix.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019

Change-Id: I0cfea3ea1b8e308fe05a0b3e94c93b7b1369ac24
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.cc
4 files changed, 136 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/3956/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0cfea3ea1b8e308fe05a0b3e94c93b7b1369ac24
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-12 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2834/

-- 
To view, visit http://gerrit.cloudera.org:8080/3955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a7a62b7c5aa62e9888b4b454790861df9fe2ad5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3823/6/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

Line 109:   const Partition partition_locked() const {
Is this the right return type?  Seems weird to return const by value.


Line 142:   const PartitionSchema partition_schema_locked() const {
same here.


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3823/6/src/kudu/tserver/tserver-path-handlers.cc
File src/kudu/tserver/tserver-path-handlers.cc:

Line 211:   peer->tablet_metadata()->schema());
Note: schema_ is also modified in TabletMetadata::LoadFromSuperBlock()


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Dinesh Bhat (Code Review)
Dinesh Bhat has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 6:

(1 comment)

TFTR MIke, this is still WIP, I uploaded one missing piece from my previous 
patchset to confirm jenkins test suite pass. I will upload another patchset 
today.

http://gerrit.cloudera.org:8080/#/c/3823/4/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

Line 142:   const PartitionSchema partition_schema_locked() const {
> I don't think this is needed anymore if we make PartitionSchema::PartitionD
I think I had missed out one important callsite in ListTablets(), while 
cleaning up the changes. I am inclined to keep this routine and 
PartitionDebugString as it is since this routine is called by 2 other callers 
now. This is still WIP as I uploaded the patch only to see if kudu jenkins test 
suite is happy,


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Dinesh Bhat (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#6).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the partition schema on that tserver is accessed in an unguarded manner
from a thread servicing ListTablets API.

The proposed fix here accesses the metadata partition resources via
copy by value mechanism now in slow paths, where the copy itself
happens under the same lock which makes the metadata resurrection
operation idempotent. Fast paths are untouched since the parition
schema is bound to be accessed in fast paths only after tablet metadata
is fully resurrected and tablet is in RUNNING state.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tools/fs_tool.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver-path-handlers.cc
8 files changed, 20 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/2760/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-09 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/2758/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-08 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3823/4/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

Line 104:   const Partition& partition() const {
Typically when we create functions like this we name them like:

GetPartitionUnlocked()
GetPartitionCopy()

And add thread-safety comments to them.

I'd even suggest going one step further. Have Tablet keep a copy of the 
Partition field locally when it's constructed and never access it from the 
TabletMetadata after that. That way all the calls that looks like 
tablet->metadata()->partition_locked() etc will just look like 
tablet->partition() and they don't have to worry about taking locks or copying.


Line 142:   const PartitionSchema partition_schema_locked() const {
I don't think this is needed anymore if we make 
PartitionSchema::PartitionDebugString() static


http://gerrit.cloudera.org:8080/#/c/3823/4/src/kudu/tserver/tserver-path-handlers.cc
File src/kudu/tserver/tserver-path-handlers.cc:

Line 209: string partition = 
peer->tablet_metadata()->partition_schema_locked()
Can we just make PartitionSchema::PartitionDebugString() a static method?


-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-08 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/2746/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-08 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/2740/

-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

2016-08-08 Thread Dinesh Bhat (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3823

to look at the new patch set (#3).

Change subject: KUDU-1500: Data race in 
RaftConsensusITest.TestCorruptReplicaMetadata
..

KUDU-1500: Data race in RaftConsensusITest.TestCorruptReplicaMetadata

The test intends to corrupt the metadata of one of the tserver tablets.
While the cluster is in the process of resurrecting the failed tserver,
the partition schema on that tserver is accessed in an unguarded manner
from a thread servicing ListTablets API.

The proposed fix here accesses the metadata partition resources via
copy by value mechanism now in slow paths, where the copy itself
happens under the same lock which makes the metadata resurrection
operation idempotent. Fast paths are untouched since the parition
schema is bound to be accessed in fast paths only after tablet metadata
is fully resurrected and tablet is in RUNNING state.

Testing: Passed about 2000 iteration of the failing test
raft_consensus-itest.TestCorruptReplicaMetadata

Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_bootstrap.h
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tools/fs_tool.cc
M src/kudu/tserver/tserver-path-handlers.cc
7 files changed, 18 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/3823/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3823
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If57617e22b41296b8d4e8ad131220f1ebb235019
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dinesh Bhat 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon