[kudu-CR] [thirdparty] fix gperftools build on Mac OS X

2018-02-24 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9444 )

Change subject: [thirdparty] fix gperftools build on Mac OS X
..


Patch Set 1: Code-Review+2

Mind dropping a comment on the upstream pull request for this as well? I'll 
update it there (on my phone at the moment)


-- 
To view, visit http://gerrit.cloudera.org:8080/9444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff3e3a5dd50541c8eba48e7e6aa917d649e54931
Gerrit-Change-Number: 9444
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Sat, 24 Feb 2018 18:15:26 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2312: Scan predicate application ordering is non-deterministic

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9440 )

Change subject: KUDU-2312: Scan predicate application ordering is 
non-deterministic
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9440/1/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/9440/1/src/kudu/common/generic_iterators-test.cc@402
PS1, Line 402: CHECK
> That requires the types to be printable, unfortunately.
ASSERT_EQ does but not ASSERT_TRUE



--
To view, visit http://gerrit.cloudera.org:8080/9440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I99b2cabecd8626cad7e11fbdd492af7276e08348
Gerrit-Change-Number: 9440
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 24 Feb 2018 01:48:21 +
Gerrit-HasComments: Yes


[kudu-CR] maintenance manager: log the reason for scheduling each operation

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9172 )

Change subject: maintenance_manager: log the reason for scheduling each 
operation
..


Patch Set 1:

OK, I made things a bit more concise, let me know how this looks


--
To view, visit http://gerrit.cloudera.org:8080/9172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dcdb863a7a0b0fc2a72757801d5c057fa725c34
Gerrit-Change-Number: 9172
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Sat, 24 Feb 2018 01:47:06 +
Gerrit-HasComments: No


[kudu-CR] maintenance manager: log the reason for scheduling each operation

2018-02-23 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Jean-Daniel Cryans, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9172

to look at the new patch set (#2).

Change subject: maintenance_manager: log the reason for scheduling each 
operation
..

maintenance_manager: log the reason for scheduling each operation

This makes it easier to troubleshoot when the maintenance manager
appears to be scheduling the "wrong" operation.

Example output from a long run of full_stack_insert_scan-test:

...: Scheduling FlushMRSOp(93ded54b4dfb4e1586ff7fe700184f53): under memory 
pressure (60.30% used, can flush 641875735 bytes)
...: Scheduling LogGCOp(93ded54b4dfb4e1586ff7fe700184f53): free 386079935 bytes 
of WAL
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 29857788 
bytes on disk
...: Scheduling FlushMRSOp(93ded54b4dfb4e1586ff7fe700184f53): under memory 
pressure (60.01% used, can flush 637714199 bytes)
...: Scheduling LogGCOp(93ded54b4dfb4e1586ff7fe700184f53): free 394543122 bytes 
of WAL
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.281697
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.280992
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.280256
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.060532
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.060298
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 56855045 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.054961
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 7202893 
bytes on disk
...: Scheduling FlushMRSOp(93ded54b4dfb4e1586ff7fe700184f53): under memory 
pressure (60.25% used, can flush 633552663 bytes)
...: Scheduling LogGCOp(93ded54b4dfb4e1586ff7fe700184f53): free 394836440 bytes 
of WAL
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.192819
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.184820
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.184674
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 70881575 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.127476
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 18119656 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.127334
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.111677
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 31714242 
bytes on disk
...: Scheduling FlushMRSOp(93ded54b4dfb4e1586ff7fe700184f53): under memory 
pressure (60.06% used, can flush 624189207 bytes)
...: Scheduling LogGCOp(93ded54b4dfb4e1586ff7fe700184f53): free 377818508 bytes 
of WAL
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 29069301 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.144540
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.138843
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 36867458 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.138827
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.122173
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 31717417 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.121637
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.121104
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.088980
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.087296
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 54371475 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.055906
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 9063001 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.031908
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 6840843 
bytes on disk
...: Scheduling FlushMRSOp(93ded54b4dfb4e1586ff7fe700184f53): under memory 
pressure (60.08% used, can flush 614825751 bytes)
...: Scheduling LogGCOp(93ded54b4dfb4e1586ff7fe700184f53): free 377913596 bytes 
of WAL
...: Scheduling UndoDeltaBlockGCOp(93ded54b4dfb4e1586ff7fe700184f53): 28612520 
bytes on disk
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 
score=0.113092
...: Scheduling CompactRowSetsOp(93ded54b4dfb4e1586ff7fe700184f53): perf 

[kudu-CR] KUDU-2312: Scan predicate application ordering is non-deterministic

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9440 )

Change subject: KUDU-2312: Scan predicate application ordering is 
non-deterministic
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9440/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9440/1//COMMIT_MSG@10
PS1, Line 10: orders by selectivity (as before), but breaks ties by column 
index.
do you think determinism should be independent of the order in which the 
predicates are specified? ie should we leave it open for Impala to do things 
like order the predicates based on their selectivity estimates? I _think_ we 
maintain the order that the client specified all the way through, so using a 
std::stable_sort might be preferable to breaking ties by column index? It would 
still be deterministic, just client-specified.


http://gerrit.cloudera.org:8080/#/c/9440/1/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/9440/1/src/kudu/common/generic_iterators-test.cc@402
PS1, Line 402: CHECK
ASSERT_TRUE?


http://gerrit.cloudera.org:8080/#/c/9440/1/src/kudu/common/generic_iterators-test.cc@416
PS1, Line 416: CHECK
same



--
To view, visit http://gerrit.cloudera.org:8080/9440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I99b2cabecd8626cad7e11fbdd492af7276e08348
Gerrit-Change-Number: 9440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 24 Feb 2018 01:29:05 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2290: Tool to re-create a tablet

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9393 )

Change subject: KUDU-2290: Tool to re-create a tablet
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577
PS2, Line 577: , causing all of its data
 :   // to be permanently lost.
Would it be feasible to turn the replaced tablet into a new single-tablet 
table, so that if you eventually were able to repair it, you could at least get 
the data back out of it?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249
PS2, Line 4249:   
replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED,
if we make it DELETED it will also get removed off of the tablet servers. we 
might want to keep it there for some investigation or attempts to recover the 
data. Is there a way we can accomplish that?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto
File src/kudu/master/master.proto:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792
PS2, Line 792: option (kudu.rpc.authz_method) = "AuthorizeService";
we don't want AuthorizeSuperUser here?


http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517
PS2, Line 517:   LOG(INFO) << "ReplaceTablet: received request to replace 
tablet " << req->tablet_id();
worth logging the requestor info too



--
To view, visit http://gerrit.cloudera.org:8080/9393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c
Gerrit-Change-Number: 9393
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 23:19:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9329 )

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..

KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

Previously a bunch of logic to collect all the stacks from the process
was in the /stacks path handler. This logic is relatively generic and
shouldn't be intermingled with the formatting code. In particular I'd
like to use it in the diagnostics log, where a different output format
is desirable.

Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Reviewed-on: http://gerrit.cloudera.org:8080/9329
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/server/diagnostics_log.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 190 insertions(+), 112 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9330 )

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..

KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Reviewed-on: http://gerrit.cloudera.org:8080/9330
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/rolling_log-test.cc
M src/kudu/util/rolling_log.cc
M src/kudu/util/rolling_log.h
7 files changed, 235 insertions(+), 41 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 10
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Don't perform compactions when clean time has not been advanced

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9436 )

Change subject: Don't perform compactions when clean time has not been advanced
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a
Gerrit-Change-Number: 9436
Gerrit-PatchSet: 1
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 22:19:03 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8692 )

Change subject: KUDU-2191 (5/n): Add Kerberos SASL support to the HMS client
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8692
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f217ae05fd36c8ee88fe20eeccd73d49233a345
Gerrit-Change-Number: 8692
Gerrit-PatchSet: 11
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 21:04:24 +
Gerrit-HasComments: No


[kudu-CR] [doc] Document the new decimal column type

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9432 )

Change subject: [doc] Document the new decimal column type
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9432/2/docs/developing.adoc
File docs/developing.adoc:

http://gerrit.cloudera.org:8080/#/c/9432/2/docs/developing.adoc@183
PS2, Line 183: ,
nit: remove this comma (even if you are a fan of the oxford comma, you only get 
to use it in lists of three or more)


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/known_issues.adoc
File docs/known_issues.adoc:

http://gerrit.cloudera.org:8080/#/c/9432/2/docs/known_issues.adoc@56
PS2, Line 56: * Type and nullability of existing columns cannot be changed by 
altering the table.
maybe worth adding a note here that the precision and scale of DECIMAL columns 
can also not be changed


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/release_notes.adoc
File docs/release_notes.adoc:

http://gerrit.cloudera.org:8080/#/c/9432/2/docs/release_notes.adoc@53
PS2, Line 53: * Kudu now supports a decimal column type. The decimal type is a 
numeric data type
it's probably worth some note in compat section on what happens if you try to 
access a table with a DECIMAL column from an old client which doesn't 
understand it. Does it fail to access the table entirely, or only fail to 
access if you try to use that column, etc.


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc
File docs/schema_design.adoc:

http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@99
PS2, Line 99: float
should we format 'float' and 'double' with `...` so it shows up more like a 
keyword? Or if you want to use the english name I would say "floating point 
types" rather than the specific keywords.

Same with references to "The decimal type", perhaps should be "The `decimal` 
type" or somesuch?


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@100
PS2, Line 100: int64
same


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@103
PS2, Line 103: a precision and scale
"a" seems misplaced


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@108
PS2, Line 108: For example, representing
 : integer values up to , and fractional values up to 99.99, 
both require a
 : precision of 4
I think this would read a little more clearly written as:

For example, a precision of 4 is required to represent integer values up to 
, or to represent values up to 99.99 with two fractional digits.


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@116
PS2, Line 116: If precision and scale are equal, all the digits come after the 
decimal point,
 : making all the values between -0.999... and 0.999...
This sentence is a bit confusing. Perhaps:

If precision and scale are equal, all of the digits come after the decimal 
point. For example, a decimal with precision and scale equal to 3 can represent 
values between -0.999 and 0.999.


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@129
PS2, Line 129: * Decimal values with precision greater than 18 are stored in 16 
bytes.
Is it worth adding a note that we don't currently support BITSHUFFLE encoding 
for this last case and thus there is likely to be more than double the storage 
cost when moving from precision=18 to precision=19?

It might also be worth adding some guidance reminding the user that precision 
and scale cannot be altered on existing data here, but warning them that "just 
use the highest precision possible" has some cost in performance, memory, and 
storage.


http://gerrit.cloudera.org:8080/#/c/9432/2/docs/schema_design.adoc@143
PS2, Line 143: bitshuffle
is bitshuffle the default for int128? I thought it wasnt supported but maybe 
you got that in while I wasn't paying attention :)



--
To view, visit http://gerrit.cloudera.org:8080/9432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9489613d35daad708648ea04d49e472d3149b33d
Gerrit-Change-Number: 9432
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 21:01:17 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2259: add real user to AuthenticationCredentialsPB

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9374 )

Change subject: KUDU-2259: add real user to AuthenticationCredentialsPB
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/main/java/org/apache/kudu/client/SecurityContext.java
File java/kudu-client/src/main/java/org/apache/kudu/client/SecurityContext.java:

http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/main/java/org/apache/kudu/client/SecurityContext.java@61
PS2, Line 61:   private String realUser;
probably worth a comment.

Why is this nullable, considering it's initialized with the system property 
user.name? Is it possible that is null? If it were null would other stuff just 
crash later anyway with an NPE?


http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestSecurityContextRealUser.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestSecurityContextRealUser.java:

http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestSecurityContextRealUser.java@73
PS2, Line 73: try (KuduClient client = new 
KuduClient.KuduClientBuilder(miniCluster.getMasterAddresses()).build()) {
wrap


http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestSecurityContextRealUser.java@77
PS2, Line 77: NotAuthorized
yea, probably guess it's worth a JIRA


http://gerrit.cloudera.org:8080/#/c/9374/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestSecurityContextRealUser.java@82
PS2, Line 82: try (KuduClient client = new 
KuduClient.KuduClientBuilder(miniCluster.getMasterAddresses()).build()) {
mind wrapping this?


http://gerrit.cloudera.org:8080/#/c/9374/2/src/kudu/client/client-internal.h
File src/kudu/client/client-internal.h:

http://gerrit.cloudera.org:8080/#/c/9374/2/src/kudu/client/client-internal.h@233
PS2, Line 233:   rpc::UserCredentials user_credentials_;
can you comment that this never changes after construction? ie it's effectively 
const? was a bit nervous about thread-safety but that appears to be the case, 
right?


http://gerrit.cloudera.org:8080/#/c/9374/2/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/9374/2/src/kudu/client/client-test.cc@5362
PS2, Line 5362:   // TODO(dan): should this be NotAuthorized?
yea seems like probably should.  maybe put this in the same JIRA as the other 
one in the Java side?



--
To view, visit http://gerrit.cloudera.org:8080/9374
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2d901d42501ecfc0f6372f68cf7335eb188b45
Gerrit-Change-Number: 9374
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 20:49:17 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9409 )

Change subject: KUDU-2291 (part 8): fix a TSAN issue with libunwind 
initialization
..

KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

libunwind uses double-checked locking for initialization, which isn't
technically safe. We previously tried to work around this by calling into the
stack trace library before starting any kudu::Threads, but that still left us
open to races in unit tests like rw_mutex-test which uses std::thread.

This patch changes the forced single-threaded initialization to use GoogleOnce
instead.

Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times.
With the patch it passed 1000/1000.

Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Reviewed-on: http://gerrit.cloudera.org:8080/9409
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M src/kudu/util/debug-util.cc
M src/kudu/util/thread.cc
2 files changed, 26 insertions(+), 5 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Gerrit-Change-Number: 9409
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2309: /masters can show the wrong list of masters

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9378 )

Change subject: KUDU-2309: /masters can show the wrong list of masters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9378/3/www/masters.mustache
File www/masters.mustache:

http://gerrit.cloudera.org:8080/#/c/9378/3/www/masters.mustache@40
PS3, Line 40:   Errors
Does it make sense to show this if there are no errors?



-- 
To view, visit http://gerrit.cloudera.org:8080/9378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I603ebc22e998bac9bd00edc939577ae339587f26
Gerrit-Change-Number: 9378
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 20:34:11 +
Gerrit-HasComments: Yes


[kudu-CR] [util] fix build on OS X

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9433 )

Change subject: [util] fix build on OS X
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8905e412a036ef4ca9fbedc589d01a8d79eb0c7
Gerrit-Change-Number: 9433
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 20:28:18 +
Gerrit-HasComments: No


[kudu-CR] webserver-stress-itest: fix flakiness

2018-02-23 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9414

to look at the new patch set (#3).

Change subject: webserver-stress-itest: fix flakiness
..

webserver-stress-itest: fix flakiness

This fixes a source of flakiness I found on the flaky dashboard. In some runs
of this test, we'd hit the following interleaving:

- we start the master with webserver_port=0 and it picks some port (eg 35000)
- we stop the master
- the curl threads are still running, and one of them picks port 35000 as the
  local side of its TCP connection. It then tries to connect to 35000 and hits
  the dreaded "tcp loop connect" phenomenon[1] in which it actually connects
  to _itself_. Thus it just hangs there occupying the port
- we try to start the master again, and it fails to bind
- we now time out trying to Join() on the curl thread, which is waiting forever
  for itself to respond to an HTTP request.

The fix is to use non-ephemeral ports for the webserver as we already do
for the RPC server. I additionally added timeouts to the curl calls.

[1] http://www.rampa.sk/static/tcpLoopConnect.html

Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
---
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/util/curl_util.cc
M src/kudu/util/curl_util.h
4 files changed, 32 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/9414/3
--
To view, visit http://gerrit.cloudera.org:8080/9414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
Gerrit-Change-Number: 9414
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-2325. Fix CHECK failure in RevokeSigData()

2018-02-23 Thread Todd Lipcon (Code Review)
Hello Mike Percy,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9430

to review the following change.


Change subject: KUDU-2325. Fix CHECK failure in RevokeSigData()
..

KUDU-2325. Fix CHECK failure in RevokeSigData()

This fixes a race in the stack trace signaling mechanism:

Target thread in HandleStackTraceSignal():
-
- stack->Collect()
- queued_to_tid = kNotInUse
  

Tracer thread in RevokeSigData:
---
- see that the flag is not complete
- exchange out queued_to_tid
- assertions checked that queued_to_tid was either
  equal to the target tid or kDumpInProgress (because
  it was "not complete yet" it didn't expect
  kNotInUse)

- mark complete

This was fairly rare since RevokeSigData had to run in between
the setting of 'queued_to_tid = kNotInUse' and the signalling
of the completion flag. A new test added in this commit managed
to trigger it pretty reliably by measuring the usual time taken for
stack trace collection and setting timeouts to very close to that value.

The fix is to simplify the state machine a bit so that the target thread
doesn't ever transition back to 'kNotInUse'. I added a more detailed
diagram of the state machine to clarify this.

Change-Id: I419ffa462821c1e6d8f9f158f028b86d60f4fd54
---
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
2 files changed, 111 insertions(+), 38 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9430/1
--
To view, visit http://gerrit.cloudera.org:8080/9430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I419ffa462821c1e6d8f9f158f028b86d60f4fd54
Gerrit-Change-Number: 9430
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] java: improve error messages when tokens are not used

2018-02-23 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9070 )

Change subject: java: improve error messages when tokens are not used
..


Patch Set 1:

yep, this is on my list today to revise


--
To view, visit http://gerrit.cloudera.org:8080/9070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f5160a79e973c411c16a784dc105444f5dd2a6f
Gerrit-Change-Number: 9070
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 19:22:49 +
Gerrit-HasComments: No


[kudu-CR] webserver-stress-itest: fix flakiness

2018-02-23 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9414

to look at the new patch set (#2).

Change subject: webserver-stress-itest: fix flakiness
..

webserver-stress-itest: fix flakiness

This fixes a source of flakiness I found on the flaky dashboard. In some runs
of this test, we'd hit the following interleaving:

- we start the master with webserver_port=0 and it picks some port (eg 35000)
- we stop the master
- the curl threads are still running, and one of them picks port 35000 as the
  local side of its TCP connection. It then tries to connect to 35000 and hits
  the dreaded "tcp loop connect" phenomenon[1] in which it actually connects
  to _itself_. Thus it just hangs there occupying the port
- we try to start the master again, and it fails to bind
- we now time out trying to Join() on the curl thread, which is waiting forever
  for itself to respond to an HTTP request.

The fix is to use non-ephemeral ports for the webserver as we already do
for the RPC server. I additionally added timeouts to the curl calls.

[1] http://www.rampa.sk/static/tcpLoopConnect.html

Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
---
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/util/curl_util.cc
M src/kudu/util/curl_util.h
4 files changed, 31 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/9414/2
--
To view, visit http://gerrit.cloudera.org:8080/9414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
Gerrit-Change-Number: 9414
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9409

to look at the new patch set (#3).

Change subject: KUDU-2291 (part 8): fix a TSAN issue with libunwind 
initialization
..

KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

libunwind uses double-checked locking for initialization, which isn't
technically safe. We previously tried to work around this by calling into the
stack trace library before starting any kudu::Threads, but that still left us
open to races in unit tests like rw_mutex-test which uses std::thread.

This patch changes the forced single-threaded initialization to use GoogleOnce
instead.

Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times.
With the patch it passed 1000/1000.

Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
---
M src/kudu/util/debug-util.cc
M src/kudu/util/thread.cc
2 files changed, 26 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/9409/3
--
To view, visit http://gerrit.cloudera.org:8080/9409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Gerrit-Change-Number: 9409
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9409

to look at the new patch set (#2).

Change subject: KUDU-2291 (part 8): fix a TSAN issue with libunwind 
initialization
..

KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

libunwind uses double-checked locking for initialization, which isn't
technically safe. We previously tried to work around this by calling into the
stack trace library before starting any kudu::Threads, but that still left us
open to races in unit tests like rw_mutex-test which uses std::thread.

This patch changes the forced single-threaded initialization to use GoogleOnce
instead.

Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times.
With the patch it passed 1000/1000.

Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
---
M src/kudu/util/debug-util.cc
M src/kudu/util/thread.cc
2 files changed, 26 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/9409/2
--
To view, visit http://gerrit.cloudera.org:8080/9409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Gerrit-Change-Number: 9409
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] webserver-stress-itest: fix flakiness

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Mike Percy,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9414

to review the following change.


Change subject: webserver-stress-itest: fix flakiness
..

webserver-stress-itest: fix flakiness

This fixes a source of flakiness I found on the flaky dashboard. In some runs
of this test, we'd hit the following interleaving:

- we start the master with webserver_port=0 and it picks some port (eg 35000)
- we stop the master
- the curl threads are still running, and one of them picks port 35000 as the
  local side of its TCP connection. It then tries to connect to 35000 and hits
  the dreaded "tcp loop connect" phenomenon[1] in which it actually connects
  to _itself_. Thus it just hangs there occupying the port
- we try to start the master again, and it fails to bind
- we now time out trying to Join() on the curl thread, which is waiting forever
  for itself to respond to an HTTP request.

The fix is to use non-ephemeral ports for the webserver as we already do
for the RPC server. I additionally added timeouts to the curl calls.

[1] http://www.rampa.sk/static/tcpLoopConnect.html

Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
---
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/webserver-stress-itest.cc
M src/kudu/util/curl_util.cc
M src/kudu/util/curl_util.h
4 files changed, 31 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/9414/1
--
To view, visit http://gerrit.cloudera.org:8080/9414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If754d7f47a4c9c04bae3e9ef31acad801dd4db9b
Gerrit-Change-Number: 9414
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9409 )

Change subject: KUDU-2291 (part 8): fix a TSAN issue with libunwind 
initialization
..


Patch Set 1: Verified-1

This can cause a deadlock in some rare scenario, working on it


--
To view, visit http://gerrit.cloudera.org:8080/9409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Gerrit-Change-Number: 9409
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 02:59:28 +
Gerrit-HasComments: No


[kudu-CR] rw semaphore: dont include debug-util.h when not necessary

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9328 )

Change subject: rw_semaphore: dont include debug-util.h when not necessary
..

rw_semaphore: dont include debug-util.h when not necessary

Just avoids an unnecessary include which was making my compilation slow
down every time I modified debug-util.h.

Change-Id: I8a194bbffbe413eaffe0449639e3b4aa35a89900
Reviewed-on: http://gerrit.cloudera.org:8080/9328
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/util/rw_semaphore.h
1 file changed, 8 insertions(+), 5 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8a194bbffbe413eaffe0449639e3b4aa35a89900
Gerrit-Change-Number: 9328
Gerrit-PatchSet: 10
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9329 )

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9329/7/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9329/7/src/kudu/util/debug-util.cc@715
PS7, Line 715:   CHECK(!infos_[i].stack.HasCollected()) << 
infos_[i].status.ToString();
> Did you figure this out?
yea, I put the fix in the separate patch "KUDU-2291 (part 7): fix race in stack 
trace collection"



--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 02:38:41 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 7): fix race in stack trace collection

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9406 )

Change subject: KUDU-2291 (part 7): fix race in stack trace collection
..


Patch Set 1: Verified+1

Going to address the IWYU issue separately to avoid having to rebase the world


--
To view, visit http://gerrit.cloudera.org:8080/9406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
Gerrit-Change-Number: 9406
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 02:37:09 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9326 )

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..

KUDU-2297 (part 1): refactor metrics log out of ServerBase

This is preparing for putting more information into the metrics log
instead of just metrics snapshots. The logging code will get more
complex as it gains features, so this makes a new class for it.

This is slightly incompatible since I also changed the name on disk to
'diagnostics' instead of 'metrics' and updated the documentation, but I
am not aware of people using this in the wild. So long as we
release-note it, I think it's reasonable to expect any sophisticated
operators to adjust their scripting accordingly.

Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Reviewed-on: http://gerrit.cloudera.org:8080/9326
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 
---
M docs/administration.adoc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/diagnostics_log.cc
A src/kudu/server/diagnostics_log.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
6 files changed, 241 insertions(+), 73 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 2). Convert diagnostics log to a format closer to glog

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9327 )

Change subject: KUDU-2297 (part 2). Convert diagnostics log to a format closer 
to glog
..

KUDU-2297 (part 2). Convert diagnostics log to a format closer to glog

Previously the metrics/diagnostics log used a custom log format designed
to be readable by scripts. The only timestamps were unix microtimes,
which are convenient for computers but not for people. In order to make
the log more easily greppable, this patch converts it over to use a
glog-compatible prefix on each log line.

The aim is that, if an admin sees some issues at 10/23 15:03 they can simply
grep the diagnostics log for '1021 15:0[0-6]' rather than having to go
figure out the appropriate range of numeric timestamps.

This patch also updates the docs accordingly.

Change-Id: I10d93c84f8c738c26850ae3d1ae6afbb5519c7ad
Reviewed-on: http://gerrit.cloudera.org:8080/9327
Tested-by: Todd Lipcon 
Reviewed-by: Mike Percy 
---
M docs/administration.adoc
M src/kudu/server/diagnostics_log.cc
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/trace.cc
5 files changed, 46 insertions(+), 29 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I10d93c84f8c738c26850ae3d1ae6afbb5519c7ad
Gerrit-Change-Number: 9327
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 7): fix race in stack trace collection

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2291 (part 7): fix race in stack trace collection
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
Gerrit-Change-Number: 9406
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2291 (part 7): fix race in stack trace collection

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9406 )

Change subject: KUDU-2291 (part 7): fix race in stack trace collection
..

KUDU-2291 (part 7): fix race in stack trace collection

This fixes a narrow race in which StackTraceCollector could return
'TimedOut' but in fact the stack trace would have been collected.

See the comments for details.

Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
Reviewed-on: http://gerrit.cloudera.org:8080/9406
Reviewed-by: Mike Percy 
Tested-by: Todd Lipcon 
---
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
2 files changed, 19 insertions(+), 8 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
Gerrit-Change-Number: 9406
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rw semaphore: dont include debug-util.h when not necessary

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9328

to look at the new patch set (#9).

Change subject: rw_semaphore: dont include debug-util.h when not necessary
..

rw_semaphore: dont include debug-util.h when not necessary

Just avoids an unnecessary include which was making my compilation slow
down every time I modified debug-util.h.

Change-Id: I8a194bbffbe413eaffe0449639e3b4aa35a89900
---
M src/kudu/util/rw_semaphore.h
1 file changed, 8 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/9328/9
--
To view, visit http://gerrit.cloudera.org:8080/9328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8a194bbffbe413eaffe0449639e3b4aa35a89900
Gerrit-Change-Number: 9328
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] rw semaphore: dont include debug-util.h when not necessary

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9328 )

Change subject: rw_semaphore: dont include debug-util.h when not necessary
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9328/6/src/kudu/util/rw_semaphore.h
File src/kudu/util/rw_semaphore.h:

http://gerrit.cloudera.org:8080/#/c/9328/6/src/kudu/util/rw_semaphore.h@54
PS6, Line 54: //
: // In order to support easier debugging of leaked locks, this 
class can track
: // the stack trace of the last thread to lock it in write mode. 
To do so,
: // uncomment the following define:
: //   #define RW_SEMAPHORE_TRACK_HOLDER 1
> this is out of order from the define above and wouldn't actually work this
Done



--
To view, visit http://gerrit.cloudera.org:8080/9328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a194bbffbe413eaffe0449639e3b4aa35a89900
Gerrit-Change-Number: 9328
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 01:24:41 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Mike Percy,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9409

to review the following change.


Change subject: KUDU-2291 (part 8): fix a TSAN issue with libunwind 
initialization
..

KUDU-2291 (part 8): fix a TSAN issue with libunwind initialization

libunwind uses double-checked locking for initialization, which isn't
technically safe. We previously tried to work around this by calling into the
stack trace library before starting any kudu::Threads, but that still left us
open to races in unit tests like rw_mutex-test which uses std::thread.

This patch changes the forced single-threaded initialization to use GoogleOnce
instead.

Prior to this patch, looping rw_mutex-test on TSAN failed 12/1000 times.
With the patch it passed 1000/1000.

Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
---
M src/kudu/util/debug-util.cc
M src/kudu/util/thread.cc
2 files changed, 17 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/9409/1
--
To view, visit http://gerrit.cloudera.org:8080/9409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I522b6553e9cb9a30d7106ff55ad119f7df1f949c
Gerrit-Change-Number: 9409
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Mike Percy 


[kudu-CR] tablet: don't store row count in delta tracker

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9216 )

Change subject: tablet: don't store row count in delta tracker
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I084e0944f388c22e838b017663a812b0ba77245d
Gerrit-Change-Number: 9216
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:21:57 +
Gerrit-HasComments: No


[kudu-CR] tablet: don't store row count in delta tracker

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9216 )

Change subject: tablet: don't store row count in delta tracker
..

tablet: don't store row count in delta tracker

DeltaTracker's constructor previously required a row-count that can
only be obtained by first reading from disk. While useful as a sanity
check while tracking updates, it's not important for this count to be
known at instantiation-time.

This patch is helpful in reducing the amount of data required from disk
at startup, as one of the reasons we need to fully open some CFiles is
to get this row count. This patch itself doesn't defer any reads; it
just defers the requirement of the row count.

This patch has no functional changes.

Change-Id: I084e0944f388c22e838b017663a812b0ba77245d
Reviewed-on: http://gerrit.cloudera.org:8080/9216
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
5 files changed, 49 insertions(+), 37 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I084e0944f388c22e838b017663a812b0ba77245d
Gerrit-Change-Number: 9216
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] row: optimize copying of MRS rows into the Arena

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9404 )

Change subject: row: optimize copying of MRS rows into the Arena
..

row: optimize copying of MRS rows into the Arena

I tried a stress workload using YCSB with 100 columns, each a 10-byte
string. I expected this to be roughly the same performance as 10 columns
containing 100-byte strings, but in fact it was about 3x as slow. A
profile showed most of the CPU consumed in MemRowSet::Insert,
specifically in the inlined Arena::AllocateBytes call. Apparently with
many threads trying to allocate each cell of each row separately from
the arena, this became a point of contention.

This patch batches the allocation to do a single allocation for all of
the strings to be copied.

I didn't do a full run to measure throughput, but roughly it seems about
20% faster and cluster-wide CPU usage is down about 50%. The
MemRowSet::Insert call went from about 50% of the cycles down to <2%.

Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Reviewed-on: http://gerrit.cloudera.org:8080/9404
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves 
---
M src/kudu/common/row.h
1 file changed, 23 insertions(+), 5 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  David Ribeiro Alves: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Gerrit-Change-Number: 9404
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9330 )

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..


Patch Set 9: Verified+1

IWYU failure from ancestor patch


--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:12:13 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 2). Convert diagnostics log to a format closer to glog

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2297 (part 2). Convert diagnostics log to a format closer 
to glog
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I10d93c84f8c738c26850ae3d1ae6afbb5519c7ad
Gerrit-Change-Number: 9327
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9329 )

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..


Patch Set 8: Verified+1

IWYU failure from ancestor patch


--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:11:45 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 2). Convert diagnostics log to a format closer to glog

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9327 )

Change subject: KUDU-2297 (part 2). Convert diagnostics log to a format closer 
to glog
..


Patch Set 8: Verified+1

IWYU failure from parent patch


--
To view, visit http://gerrit.cloudera.org:8080/9327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10d93c84f8c738c26850ae3d1ae6afbb5519c7ad
Gerrit-Change-Number: 9327
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:11:11 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9326 )

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..


Patch Set 8: Verified+1

IWYU failure from parent patch


--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:10:43 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR](branch-1.5.x) Don't perform compactions when clean time has not been advanced

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9397 )

Change subject: Don't perform compactions when clean time has not been advanced
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a
Gerrit-Change-Number: 9397
Gerrit-PatchSet: 3
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 23 Feb 2018 00:08:43 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9330 )

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9330/8/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9330/8/src/kudu/server/diagnostics_log.cc@202
PS8, Line 202: static_cast(addr) - 1
> Mind adding a comment to explain the pointer address change by -1?
Done



--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 22 Feb 2018 22:07:08 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9330

to look at the new patch set (#9).

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..

KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/rolling_log-test.cc
M src/kudu/util/rolling_log.cc
M src/kudu/util/rolling_log.h
7 files changed, 235 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9330/9
--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 7): fix race in stack trace collection

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9406


Change subject: KUDU-2291 (part 7): fix race in stack trace collection
..

KUDU-2291 (part 7): fix race in stack trace collection

This fixes a narrow race in which StackTraceCollector could return
'TimedOut' but in fact the stack trace would have been collected.

See the comments for details.

Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
---
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
2 files changed, 19 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/9406/1
--
To view, visit http://gerrit.cloudera.org:8080/9406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id1af0ffa02efb8151f1f3f84dd142d91066cc1f8
Gerrit-Change-Number: 9406
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9326

to look at the new patch set (#8).

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..

KUDU-2297 (part 1): refactor metrics log out of ServerBase

This is preparing for putting more information into the metrics log
instead of just metrics snapshots. The logging code will get more
complex as it gains features, so this makes a new class for it.

This is slightly incompatible since I also changed the name on disk to
'diagnostics' instead of 'metrics' and updated the documentation, but I
am not aware of people using this in the wild. So long as we
release-note it, I think it's reasonable to expect any sophisticated
operators to adjust their scripting accordingly.

Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
---
M docs/administration.adoc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/diagnostics_log.cc
A src/kudu/server/diagnostics_log.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
6 files changed, 241 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/9326/8
--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR](branch-1.5.x) Don't perform compactions when clean time has not been advanced

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9397 )

Change subject: Don't perform compactions when clean time has not been advanced
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9397/2/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9397/2/src/kudu/tablet/tablet.cc@1563
PS2, Line 1563: LOG_WITH_PREFIX(WARNING) << "Can't schedule compaction. 
Clean time has not been advanced "
Do we risk flooding the logs with this error when we hit this case? Seems it 
would log the warning every time the MM scheduler runs



--
To view, visit http://gerrit.cloudera.org:8080/9397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a
Gerrit-Change-Number: 9397
Gerrit-PatchSet: 2
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 21:52:39 +
Gerrit-HasComments: Yes


[kudu-CR] row: optimize copying of MRS rows into the Arena

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9404 )

Change subject: row: optimize copying of MRS rows into the Arena
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9404/1/src/kudu/common/row.h
File src/kudu/common/row.h:

http://gerrit.cloudera.org:8080/#/c/9404/1/src/kudu/common/row.h@348
PS1, Line 348:   for (int i = 0; i < schema->num_columns(); i++) {
 : typename RowType::Cell cell = row->cell(i);
 : if (cell.typeinfo()->physical_type() == BINARY) {
 :   if (cell.is_nullable() && cell.is_null()) {
 : continue;
 :   }
 :
 :   const Slice *slice = reinterpret_cast(cell.ptr());
 :   size += slice->size();
 : }
 :   }
> would it make sense to have an "GetIndirectDataSize()" in row? we might cac
the problem is Row is more of a "concept" than a specific class (ie it can be 
ConstContiguousRow or RowBlockRow or MRSRow, etc). Given that I don't think 
there's an easy way to cache it, and sharing this code across them would 
require duplicating it.

It seems outside of tests that we only call this function from MemRowSet so I 
considered moving it, but then I'd need to update the tests a bit to do 
something else, and didn't want to yak-shave too far here



--
To view, visit http://gerrit.cloudera.org:8080/9404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Gerrit-Change-Number: 9404
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 21:50:01 +
Gerrit-HasComments: Yes


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9398 )

Change subject: java: provide our own preferred cipher suite list
..

java: provide our own preferred cipher suite list

This updates the Java client to use a cipher suite list which matches
the server side. This has a very big performance benefit since it
prioritizes GCM-based ciphers which are significantly faster than those
that use SHA256 or SHA384 HMAC for integrity.

In particular, a YCSB in-memory random-read workload I was running got
~60% faster. The CPU profile on the server previously showed most of its
time in SHA hash calculation, and now shows only relatively little CPU
in GCM-related calls.

Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Reviewed-on: http://gerrit.cloudera.org:8080/9398
Reviewed-by: Dan Burkert 
Tested-by: Kudu Jenkins
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
1 file changed, 39 insertions(+), 0 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Improve tablet status messages during startup

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9400 )

Change subject: Improve tablet status messages during startup
..

Improve tablet status messages during startup

Currently while the tablet server is starting, all of the tablets that
are waiting in the 'tablet-open' threadpool have the status 'Tablet
initializing...'. This can be confusing to admins since it sounds like
it is stuck rather than waiting in a queue.

This amends the status messages accordingly.

Change-Id: I72337860e3c4c0a0992c3fb8578e94857529805e
Reviewed-on: http://gerrit.cloudera.org:8080/9400
Reviewed-by: Will Berkeley 
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/tablet_replica.cc
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Will Berkeley: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I72337860e3c4c0a0992c3fb8578e94857529805e
Gerrit-Change-Number: 9400
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Make tcmalloc heap sampling more useful

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9263 )

Change subject: Make tcmalloc heap sampling more useful
..

Make tcmalloc heap sampling more useful

Previously our /pprof/heap path required that the process be started
with lifetime heap profiling enabled. This mode of operation is very
slow so we would never be able to use it in production. Even in tests
it's of limited utility.

The alternative is tcmalloc heap _sampling_ which only records periodic
allocations. According to recent mailing list threads, Google runs most
of their binaries with this enabled in production, so it's designed to
be very low overhead and potentially useful for us in the field.

This doesn't go so far as to enable it by default, but it does adjust
the /pprof/heap endpoint to dump the sampled allocations instead of
starting a profile. Sampling can be enabled using either the tcmalloc
env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag
--heap-sample-every-n-bytes.

I tested this manually by running a kudu tserver with loadgen and
sampling set to 500kb. I then used pprof to grab the heap sample
information which looked like I expected it to (most of the memory from
MRS)

Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255
Reviewed-on: http://gerrit.cloudera.org:8080/9263
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley 
---
M docs/troubleshooting.adoc
M src/kudu/server/pprof_path_handlers.cc
M src/kudu/util/flags.cc
3 files changed, 95 insertions(+), 26 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255
Gerrit-Change-Number: 9263
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] row: optimize copying of MRS rows into the Arena

2018-02-22 Thread Todd Lipcon (Code Review)
Hello David Ribeiro Alves,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9404

to review the following change.


Change subject: row: optimize copying of MRS rows into the Arena
..

row: optimize copying of MRS rows into the Arena

I tried a stress workload using YCSB with 100 columns, each a 10-byte
string. I expected this to be roughly the same performance as 10 columns
containing 100-byte strings, but in fact it was about 3x as slow. A
profile showed most of the CPU consumed in MemRowSet::Insert,
specifically in the inlined Arena::AllocateBytes call. Apparently with
many threads trying to allocate each cell of each row separately from
the arena, this became a point of contention.

This patch batches the allocation to do a single allocation for all of
the strings to be copied.

I didn't do a full run to measure throughput, but roughly it seems about
20% faster and cluster-wide CPU usage is down about 50%. The
MemRowSet::Insert call went from about 50% of the cycles down to <2%.

Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
---
M src/kudu/common/row.h
1 file changed, 23 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/9404/1
--
To view, visit http://gerrit.cloudera.org:8080/9404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6eea882d1d9a7355fb0bbad12c388908ec399a39
Gerrit-Change-Number: 9404
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: David Ribeiro Alves 


[kudu-CR] thirdparty: patch tcmalloc to improve AllocLarge performance

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9392 )

Change subject: thirdparty: patch tcmalloc to improve AllocLarge performance
..


Patch Set 2:

I stress tested this with a read heavy workload and write-heavy workload for 
several hours each, so I'm pretty sure it's stable. It also passes tcmalloc 
unit tests. Hopefully it'll get reviewed upstream soon, but let's get this in 
for 1.7.


--
To view, visit http://gerrit.cloudera.org:8080/9392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Gerrit-Change-Number: 9392
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 20:15:50 +
Gerrit-HasComments: No


[kudu-CR] thirdparty: patch tcmalloc to improve AllocLarge performance

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9392 )

Change subject: thirdparty: patch tcmalloc to improve AllocLarge performance
..

thirdparty: patch tcmalloc to improve AllocLarge performance

This pulls in an upstream tcmalloc pull request[1] which addresses O(n)
behavior in the large allocation path. Without this patch, large
allocations (>=1MB) do a linear scan of all large spans of free heap
to find the best fit, which is very expensive especially as the heap
grows more fragmented over time.

I tested this using YCSB workload C (100% random read) on an in-memory dataset
(10M rows on 6 nodes). As noted in KUDU-1465, scanners currently always
allocate 1MB buffers as a starting point even if the scan will only return a
small amount of data. These 1M allocations aggravate the AllocLarge path
and without this patch, AllocLarge was the top CPU consumer.

Prior to this fix, the workload managed about 20k reads/second. With the
fix, the workload averaged around 85k reads/second.

We should still fix KUDU-1465 to allocate smaller buffers, since small
allocations will always be faster (and less wasteful), but with this tcmalloc
fix, the performance issue is much less pronounced. The fix will also likely
help any other places that we might be making large allocations.

[1] https://github.com/gperftools/gperftools/pull/960

Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Reviewed-on: http://gerrit.cloudera.org:8080/9392
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert 
---
M thirdparty/download-thirdparty.sh
A 
thirdparty/patches/gperftools-Implemented-O-log-n-searching-among-large-spans.patch
M 
thirdparty/patches/gperftools-Replace-namespace-base-with-namespace-tcmalloc.patch
3 files changed, 571 insertions(+), 51 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Dan Burkert: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Gerrit-Change-Number: 9392
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Improve tablet status messages during startup

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Will Berkeley,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9400

to review the following change.


Change subject: Improve tablet status messages during startup
..

Improve tablet status messages during startup

Currently while the tablet server is starting, all of the tablets that
are waiting in the 'tablet-open' threadpool have the status 'Tablet
initializing...'. This can be confusing to admins since it sounds like
it is stuck rather than waiting in a queue.

This amends the status messages accordingly.

Change-Id: I72337860e3c4c0a0992c3fb8578e94857529805e
---
M src/kudu/tablet/tablet_replica.cc
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/9400/1
--
To view, visit http://gerrit.cloudera.org:8080/9400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I72337860e3c4c0a0992c3fb8578e94857529805e
Gerrit-Change-Number: 9400
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Dan Burkert, Kudu Jenkins, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9398

to look at the new patch set (#3).

Change subject: java: provide our own preferred cipher suite list
..

java: provide our own preferred cipher suite list

This updates the Java client to use a cipher suite list which matches
the server side. This has a very big performance benefit since it
prioritizes GCM-based ciphers which are significantly faster than those
that use SHA256 or SHA384 HMAC for integrity.

In particular, a YCSB in-memory random-read workload I was running got
~60% faster. The CPU profile on the server previously showed most of its
time in SHA hash calculation, and now shows only relatively little CPU
in GCM-related calls.

Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
1 file changed, 39 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/9398/3
--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9398 )

Change subject: java: provide our own preferred cipher suite list
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9398/2/java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java:

http://gerrit.cloudera.org:8080/#/c/9398/2/java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java@520
PS2, Line 520: engine.setEnabledCipherSuites(toEnable.toArray(new 
String[0]));
> I would fail with an exception. IMHO falling back to defaults which may not
Done



--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:27:13 +
Gerrit-HasComments: Yes


[kudu-CR](branch-1.5.x) Don't perform compactions when clean time has not been advanced

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9397 )

Change subject: Don't perform compactions when clean time has not been advanced
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9397/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/9397/1/src/kudu/tablet/tablet.cc@1333
PS1, Line 1333: LOG(WARNING) << "Not performing compaction. Clean time has 
not been advanced past its "
LOG_WITH_PREFIX here?

Also, could we do this check where we gather the compaction stats, so it gets 
effectively disabled? I'm afraid otherwise we'll get in a tight loop with the 
MM issuing a compaction against this tablet, and the compaction being skipped.



--
To view, visit http://gerrit.cloudera.org:8080/9397
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia74abdf7d806efc4239dc9cff4a5da28621d331a
Gerrit-Change-Number: 9397
Gerrit-PatchSet: 1
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:23:29 +
Gerrit-HasComments: Yes


[kudu-CR](branch-1.5.x) Optionally advance safe time with non-write operations

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9396 )

Change subject: Optionally advance safe time with non-write operations
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I99de2284ba1626581c6e0770c823fd1ba714c1d7
Gerrit-Change-Number: 9396
Gerrit-PatchSet: 2
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:22:10 +
Gerrit-HasComments: No


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9398 )

Change subject: java: provide our own preferred cipher suite list
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9398/2/java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java:

http://gerrit.cloudera.org:8080/#/c/9398/2/java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java@520
PS2, Line 520: engine.setEnabledCipherSuites(toEnable.toArray(new 
String[0]));
> Maybe add an assert that the toEnable list isn't empty, or is that sufficie
sure You think I should just assert or should I throw a RuntimeException or 
somesuch? Or should I continue on with a warning that we are falling back to 
system default?



--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:21:25 +
Gerrit-HasComments: Yes


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9398

to look at the new patch set (#2).

Change subject: java: provide our own preferred cipher suite list
..

java: provide our own preferred cipher suite list

This updates the Java client to use a cipher suite list which matches
the server side. This has a very big performance benefit since it
prioritizes GCM-based ciphers which are significantly faster than those
that use SHA256 or SHA384 HMAC for integrity.

In particular, a YCSB in-memory random-read workload I was running got
~60% faster. The CPU profile on the server previously showed most of its
time in SHA hash calculation, and now shows only relatively little CPU
in GCM-related calls.

Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
1 file changed, 32 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/9398/2
--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] java: provide our own preferred cipher suite list

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Dan Burkert,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9398

to review the following change.


Change subject: java: provide our own preferred cipher suite list
..

java: provide our own preferred cipher suite list

This updates the Java client to use a cipher suite list which matches
the server side. This has a very big performance benefit since it
prioritizes GCM-based ciphers which are significantly faster than those
that use SHA256 or SHA384 HMAC for integrity.

In particular, a YCSB in-memory random-read workload I was running got
~60% faster. The CPU profile on the server previously showed most of its
time in SHA hash calculation, and now shows only relatively little CPU
in GCM-related calls.

Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Negotiator.java
1 file changed, 29 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/98/9398/1
--
To view, visit http://gerrit.cloudera.org:8080/9398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4b2853dfd8d330bdf003924a5574fde865f54249
Gerrit-Change-Number: 9398
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 


[kudu-CR] KUDU-2301: (Part-1) Add instrumentation on a per connection level

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9343 )

Change subject: KUDU-2301: (Part-1) Add instrumentation on a per connection 
level
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae1a5fe0066adf644a9cac41ad6696e1bbf00465
Gerrit-Change-Number: 9343
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 18:06:20 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2301: (Part-1) Add instrumentation on a per connection level

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9343 )

Change subject: KUDU-2301: (Part-1) Add instrumentation on a per connection 
level
..

KUDU-2301: (Part-1) Add instrumentation on a per connection level

This patch returns the OutboundTransfer queue size on a per
connection level and makes them accessible via the
DumpRunningRpcs() call.

A test is added in rpc-test to ensure that this metric works as
expected.

A future patch will add more metrics.

Change-Id: Iae1a5fe0066adf644a9cac41ad6696e1bbf00465
Reviewed-on: http://gerrit.cloudera.org:8080/9343
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/rpc/connection.cc
M src/kudu/rpc/connection.h
M src/kudu/rpc/messenger.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_introspection.proto
5 files changed, 68 insertions(+), 3 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iae1a5fe0066adf644a9cac41ad6696e1bbf00465
Gerrit-Change-Number: 9343
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR](branch-1.5.x) Optionally advance safe time with non-write operations

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9396 )

Change subject: Optionally advance safe time with non-write operations
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9396/1/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

http://gerrit.cloudera.org:8080/#/c/9396/1/src/kudu/tablet/tablet_bootstrap.cc@91
PS1, Line 91: DEFINE_bool(advance_safe_time_with_non_write_ops, false,
probably should be marked unsafe or experimental or something if we plan to 
remove it soon



--
To view, visit http://gerrit.cloudera.org:8080/9396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.5.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I99de2284ba1626581c6e0770c823fd1ba714c1d7
Gerrit-Change-Number: 9396
Gerrit-PatchSet: 1
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 17:24:30 +
Gerrit-HasComments: Yes


[kudu-CR] thirdparty: patch tcmalloc to improve AllocLarge performance

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9392

to look at the new patch set (#2).

Change subject: thirdparty: patch tcmalloc to improve AllocLarge performance
..

thirdparty: patch tcmalloc to improve AllocLarge performance

This pulls in an upstream tcmalloc pull request[1] which addresses O(n)
behavior in the large allocation path. Without this patch, large
allocations (>=1MB) do a linear scan of all large spans of free heap
to find the best fit, which is very expensive especially as the heap
grows more fragmented over time.

I tested this using YCSB workload C (100% random read) on an in-memory dataset
(10M rows on 6 nodes). As noted in KUDU-1465, scanners currently always
allocate 1MB buffers as a starting point even if the scan will only return a
small amount of data. These 1M allocations aggravate the AllocLarge path
and without this patch, AllocLarge was the top CPU consumer.

Prior to this fix, the workload managed about 20k reads/second. With the
fix, the workload averaged around 85k reads/second.

We should still fix KUDU-1465 to allocate smaller buffers, since small
allocations will always be faster (and less wasteful), but with this tcmalloc
fix, the performance issue is much less pronounced. The fix will also likely
help any other places that we might be making large allocations.

[1] https://github.com/gperftools/gperftools/pull/960

Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
---
M thirdparty/download-thirdparty.sh
A 
thirdparty/patches/gperftools-Implemented-O-log-n-searching-among-large-spans.patch
M 
thirdparty/patches/gperftools-Replace-namespace-base-with-namespace-tcmalloc.patch
3 files changed, 571 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/9392/2
--
To view, visit http://gerrit.cloudera.org:8080/9392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Gerrit-Change-Number: 9392
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] thirdparty: patch tcmalloc to improve AllocLarge performance

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Dan Burkert,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9392

to review the following change.


Change subject: thirdparty: patch tcmalloc to improve AllocLarge performance
..

thirdparty: patch tcmalloc to improve AllocLarge performance

This pulls in an upstream tcmalloc pull request[1] which addresses O(n)
behavior in the large allocation path. Without this patch, large
allocations (>=1MB) do a linear scan of all large spans of free heap
to find the best fit, which is very expensive especially as the heap
grows more fragmented over time.

I tested this using YCSB workload C (100% random read) on an in-memory dataset
(10M rows on 6 nodes). As noted in KUDU-1465, scanners currently always
allocate 1MB buffers as a starting point even if the scan will only return a
small amount of data. These 1M allocations aggravate the AllocLarge path
and without this patch, AllocLarge was the top CPU consumer.

Prior to this fix, the workload managed about 20k reads/second. With the
fix, the workload averaged around 85k reads/second.

We should still fix KUDU-1465 to allocate smaller buffers, since small
allocations will always be faster (and less wasteful), but with this tcmalloc
fix, the performance issue is much less pronounced. The fix will also likely
help any other places that we might be making large allocations.

[1] https://github.com/gperftools/gperftools/pull/960

Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
---
M thirdparty/download-thirdparty.sh
A 
thirdparty/patches/gperftools-Implemented-O-log-n-searching-among-large-spans.patch
2 files changed, 496 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/9392/1
--
To view, visit http://gerrit.cloudera.org:8080/9392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Gerrit-Change-Number: 9392
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 


[kudu-CR] Fix move constructors to be noexcept

2018-02-22 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9382

to look at the new patch set (#2).

Change subject: Fix move constructors to be noexcept
..

Fix move constructors to be noexcept

I noticed while looking at a profile of a tablet server starting up that
I saw a lot of CPU used in stacks like:

  kudu::subtle::RefCountedThreadSafeBase::AddRef() const
  void std::vector::
   _M_emplace_back_aux

Digging into the source of std::vector, I saw that the emplace_back_aux
function is used when emplacing requires resizing the underlying array.
Since scoped_refptr has a move constructor I was surprised to see it
using AddRef() here during the array expansion. In fact it turns out
that std::vector<> will only use the move constructor when it is marked
'noexcept', which ours was not.

I verified this with a simple test program as well[1]. If the move
constructor is marked 'noexcept', it is called. Otherwise, it is not.

This patch corrects scoped_refptr as well as a few other spots where we
had move constructors without noexcept.

[1] https://gist.github.com/toddlipcon/c9a807c6a67037ccd1b63caa93f74c67

Change-Id: Idd7487766be3261e3506ff9613a5d733013018b0
---
M src/kudu/cfile/block_cache.h
M src/kudu/cfile/block_handle.h
M src/kudu/gutil/ref_counted.h
M src/kudu/tablet/lock_manager.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/util/cow_object.h
M src/kudu/util/status.h
7 files changed, 17 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/9382/2
--
To view, visit http://gerrit.cloudera.org:8080/9382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd7487766be3261e3506ff9613a5d733013018b0
Gerrit-Change-Number: 9382
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Fix move constructors to be noexcept

2018-02-22 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9382 )

Change subject: Fix move constructors to be noexcept
..


Patch Set 1:

nope, I just didn't grep well enough to find those. Will fix


--
To view, visit http://gerrit.cloudera.org:8080/9382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd7487766be3261e3506ff9613a5d733013018b0
Gerrit-Change-Number: 9382
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 08:15:06 +
Gerrit-HasComments: No


[kudu-CR] [consensus peers] micro-optimization on controller status

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9387 )

Change subject: [consensus_peers] micro-optimization on controller status
..


Patch Set 2:

sure, seems fine, but for 0.03% of the CPU in an error-injected workload I'm 
not sure it's that much of a win :)


--
To view, visit http://gerrit.cloudera.org:8080/9387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25ac21db191a21f3bdfb3378e32fbe366d98297c
Gerrit-Change-Number: 9387
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 05:05:48 +
Gerrit-HasComments: No


[kudu-CR] [consensus peers] micro-optimization on controller status

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9387 )

Change subject: [consensus_peers] micro-optimization on controller status
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25ac21db191a21f3bdfb3378e32fbe366d98297c
Gerrit-Change-Number: 9387
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 22 Feb 2018 05:05:06 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9326 )

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9326/7/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9326/7/src/kudu/server/diagnostics_log.cc@76
PS7, Line 76:   return Status::OK();
> return s;
nice catch


http://gerrit.cloudera.org:8080/#/c/9326/7/src/kudu/server/diagnostics_log.cc@96
PS7, Line 96: MonoDelta::FromSeconds(60)
> should this be configurable?
eh I think it's rare enough that it's not really worth the flag (even a hidden 
one)



--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 22 Feb 2018 00:06:17 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9319 )

Change subject: KUDU-2291 (part 6) stacks: use libunwind directly for stack 
tracing
..

KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

Previously we used glog's wrapper around libunwind for stack tracing.
However that has a deficiency that it assumes that, process wide, only
one thread can be inside libunwind at a time[1]

It appears that this is left over from some very old versions of
libunwind, or was already unnecessarily conservative. libunwind is meant
to be thread safe, and we have tests that will trigger if it is not.

This just extracts the function body of the glog function we were using
and does the same work manually.

Without this fix, the "collect from all the threads at the same time"
code path resulted in most of the threads collecting an empty trace
since they tried to call libunwind at the same time.

[1] https://github.com/google/glog/issues/298
Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Reviewed-on: http://gerrit.cloudera.org:8080/9319
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M build-support/iwyu/iwyu.sh
A build-support/iwyu/mappings/libunwind.imp
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 81 insertions(+), 17 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Gerrit-Change-Number: 9319
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
..

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Reviewed-on: http://gerrit.cloudera.org:8080/9318
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

2018-02-21 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9319

to look at the new patch set (#8).

Change subject: KUDU-2291 (part 6) stacks: use libunwind directly for stack 
tracing
..

KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

Previously we used glog's wrapper around libunwind for stack tracing.
However that has a deficiency that it assumes that, process wide, only
one thread can be inside libunwind at a time[1]

It appears that this is left over from some very old versions of
libunwind, or was already unnecessarily conservative. libunwind is meant
to be thread safe, and we have tests that will trigger if it is not.

This just extracts the function body of the glog function we were using
and does the same work manually.

Without this fix, the "collect from all the threads at the same time"
code path resulted in most of the threads collecting an empty trace
since they tried to call libunwind at the same time.

[1] https://github.com/google/glog/issues/298
Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
---
M build-support/iwyu/iwyu.sh
A build-support/iwyu/mappings/libunwind.imp
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 81 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/9319/8
--
To view, visit http://gerrit.cloudera.org:8080/9319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Gerrit-Change-Number: 9319
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9329 )

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9329/7/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9329/7/src/kudu/util/debug-util.cc@715
PS7, Line 715:   CHECK(!infos_[i].stack.HasCollected()) << 
infos_[i].status.ToString();
this assertion failed for me under load, it appears that somehow we are 
sometimes racing against the other thread collecting stacks here



--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 20:39:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@213
PS5, Line 213: sig
> nit: name this sig_data for consistency when tracing both sides of the comm
Done


http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@350
PS5, Line 350: DLOG(WARNING) << "Leaking SignalData structure " << 
sig_data_ << " after lost signal "
> I think we could at least make the signalled thread try to delete the sig_d
Deleting inside the signal handler is not allowed since free() is not an 
async-safe function.

I was actually hoping to eventually change this to leak into a freelist so the 
"leaked" data blocks can be reused by queueing them to another tid. Will add a 
TODO for that.



--
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 18:49:30 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

2018-02-21 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#7).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
..

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/7
--
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Fix move constructors to be noexcept

2018-02-21 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9382

to review the following change.


Change subject: Fix move constructors to be noexcept
..

Fix move constructors to be noexcept

I noticed while looking at a profile of a tablet server starting up that
I saw a lot of CPU used in stacks like:

  kudu::subtle::RefCountedThreadSafeBase::AddRef() const
  void std::vector::
   _M_emplace_back_aux

Digging into the source of std::vector, I saw that the emplace_back_aux
function is used when emplacing requires resizing the underlying array.
Since scoped_refptr has a move constructor I was surprised to see it
using AddRef() here during the array expansion. In fact it turns out
that std::vector<> will only use the move constructor when it is marked
'noexcept', which ours was not.

I verified this with a simple test program as well[1]. If the move
constructor is marked 'noexcept', it is called. Otherwise, it is not.

This patch corrects scoped_refptr as well as a few other spots where we
had move constructors without noexcept.

[1] https://gist.github.com/toddlipcon/c9a807c6a67037ccd1b63caa93f74c67

Change-Id: Idd7487766be3261e3506ff9613a5d733013018b0
---
M src/kudu/cfile/block_handle.h
M src/kudu/gutil/ref_counted.h
M src/kudu/tablet/lock_manager.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/util/status.h
5 files changed, 12 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/9382/1
--
To view, visit http://gerrit.cloudera.org:8080/9382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd7487766be3261e3506ff9613a5d733013018b0
Gerrit-Change-Number: 9382
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-21 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9330

to look at the new patch set (#8).

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..

KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/rolling_log-test.cc
M src/kudu/util/rolling_log.cc
M src/kudu/util/rolling_log.h
7 files changed, 232 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9330/8
--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

2018-02-21 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9319 )

Change subject: KUDU-2291 (part 6) stacks: use libunwind directly for stack 
tracing
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9319/7/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9319/7/src/kudu/util/debug-util.cc@544
PS7, Line 544:   skip_frames++; // Do not include the "Collect" frame
> Since we are doing this here now, should this patch also modify all existin
I believe it did, didn't it?



--
To view, visit http://gerrit.cloudera.org:8080/9319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Gerrit-Change-Number: 9319
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 09:19:50 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9319

to look at the new patch set (#7).

Change subject: KUDU-2291 (part 6) stacks: use libunwind directly for stack 
tracing
..

KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

Previously we used glog's wrapper around libunwind for stack tracing.
However that has a deficiency that it assumes that, process wide, only
one thread can be inside libunwind at a time[1]

It appears that this is left over from some very old versions of
libunwind, or was already unnecessarily conservative. libunwind is meant
to be thread safe, and we have tests that will trigger if it is not.

This just extracts the function body of the glog function we were using
and does the same work manually.

Without this fix, the "collect from all the threads at the same time"
code path resulted in most of the threads collecting an empty trace
since they tried to call libunwind at the same time.

[1] https://github.com/google/glog/issues/298
Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
---
M build-support/iwyu/iwyu.sh
A build-support/iwyu/mappings/libunwind.imp
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 81 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/9319/7
--
To view, visit http://gerrit.cloudera.org:8080/9319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Gerrit-Change-Number: 9319
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#6).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
..

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/6
--
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9330 )

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9330/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9330/5/src/kudu/server/diagnostics_log.cc@27
PS5, Line 27: #include 
> warning: #includes are not sorted properly [llvm-include-order]
Done



--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 07:14:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
..


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h
File src/kudu/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h@208
PS4, Line 208: Trigger
> TriggerAsync
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@141
PS4, Line 141:   void Signal() {
> add: // Wakes all waiters, notifying them of completion.
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@151
PS4, Line 151:   bool WaitUntil(MonoTime deadline) {
> add: // Returns false if reached deadline before completion, true otherwise
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@216
PS4, Line 216: maybe
> nit: capitalize
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@219
PS4, Line 219: ANNOTATE_HAPPENS_AFTER(sig);
> why is this needed?
Because we're using the signal data as a sort of queue to move the 'info' 
pointer between two threads, we need this to ensure that TSAN sees there is an 
enforced ordering between the two. Normally we get this ordering implied when 
thread A acquires a mutex or spinlock previously released by thread B, but in 
this case there is no such mutex.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@230
PS4, Line 230: 2
> nit: /*skip_frames=*/
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@312
PS4, Line 312: }
> nit: move up one line
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@395
PS4, Line 395: SYS_rt_tgsigqueueinfo
> Did you look into using pthread_sigqueue() ? I suppose a minor benefit of t
yea, the problem is that for pthread_sigqueue we need to have pthread_t for all 
the threads, and best I can tell there is no way to go from a tid to a 
pthread_t. Additionally in the past I looked into how to list all running 
threads using pthreads APIs and came up with nothing particularly nice. There's 
some 'thread_db' API but it seems meant for debuggers and not easy to use at 
all.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@396
PS4, Line 396: )
> , ErrnoToString(errno), errno) ?
this ends up printed in the /stacks page and stuff, so I think it's better to 
just have the nice message here.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@427
PS4, Line 427: Status StackTraceCollector::TriggerAsync(int64_t tid_, 
StackTrace* stack) {
> missing void StackTraceCollector::RevokeSigData() for non-linux
Done



--
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 07:13:38 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 4): avoid potential for deadlocks in stack trace collection

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9262 )

Change subject: KUDU-2291 (part 4): avoid potential for deadlocks in stack 
trace collection
..


Patch Set 6: Code-Review+2

(1 comment)

Carrying forward +2

http://gerrit.cloudera.org:8080/#/c/9262/5/src/kudu/util/debug/unwind_safeness.cc
File src/kudu/util/debug/unwind_safeness.cc:

http://gerrit.cloudera.org:8080/#/c/9262/5/src/kudu/util/debug/unwind_safeness.cc@23
PS5, Line 23: // Based on public domain code by Aliaksey Kandratsenka at
> Do we need to add this to LICENSE?
It's only super loosely based on it, so I don't think so (eg his original was 
ANSI C). Plus public domain has no reporting requirements by ASF policy AFAIK



--
To view, visit http://gerrit.cloudera.org:8080/9262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I414718a26c4810de86a10397bb9803dc5e386d1a
Gerrit-Change-Number: 9262
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 06:08:00 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2291 (part 4): avoid potential for deadlocks in stack trace collection

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9262 )

Change subject: KUDU-2291 (part 4): avoid potential for deadlocks in stack 
trace collection
..

KUDU-2291 (part 4): avoid potential for deadlocks in stack trace collection

This follows some recommendations from [1] to avoid the potential for
deadlocks if we attempt to gather a stack trace from a signal handler
context while the thread is inside calls to libdl.

The approach is somewhat ugly: we have to override the sensitive libdl
calls and use them to increment a thread-local counter indicating that
it would be unsafe to collect a stack. We then check this counter before
capturing a stack: if we see that we are inside libdl we just return an
fake stack trace indicating why it could not be captured.

The new test included deadlocks reliably without the fix.

[1] 
https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues
Change-Id: I414718a26c4810de86a10397bb9803dc5e386d1a
Reviewed-on: http://gerrit.cloudera.org:8080/9262
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/util/CMakeLists.txt
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
A src/kudu/util/debug/unwind_safeness.cc
A src/kudu/util/debug/unwind_safeness.h
6 files changed, 291 insertions(+), 10 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I414718a26c4810de86a10397bb9803dc5e386d1a
Gerrit-Change-Number: 9262
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 3): use futex to speed up stack collection

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9254 )

Change subject: KUDU-2291 (part 3): use futex to speed up stack collection
..

KUDU-2291 (part 3): use futex to speed up stack collection

Previously the thread which requests a stack trace would poll on an
atomic variable waiting for the stack to be collected. Of course
spinning is too CPU-hungry so it would actually sleep for 10ms in
each iteration. This limited the performance of stack collection to
about 100/second, which is problematic for the /stacks web page on a
server that may have thousands of threads.

This switches to using the futex system call to allow the dumper thread
to sleep and the dumpee thread to wake it back up when the results are
ready.

Why do we use the low level futex and not something like a condition
variable or CountdownLatch? The issue is that the dumpee thread is
running in a signal handler context, and most things are not safe to do
in that context. Particularly, mutexes may block, which is very naughty,
and condition variables and latches are currently implemented on top of
mutexes. pthread semaphores are another option, but those don't support
proper timeout semantics, which is a feature we use in this use case.

Whether the futex syscall itself is async-signal-safe is up for some debate.
Clearly the "wait" mode is not, since it may block, but is the "wake" mode
that we use here allowed? Technically I don't think so. In practice,
the pthread sem_post() API _is_ documented to be async-signal-safe and
I verified that in libc source it's implemented using futex to wake the
waiter. So, if we are doing something illegal, at least we're doing
the same illegal thing as libc, so we're very unlikely to ever break.

Obviously futex is not portable to macOS, but this code was already
Linux-specific so we haven't lost much.

This speeds up the benchmark for stack trace collection (without symbolization)
by 585x:

Before:
  I0207 19:33:20.960477 30799 debug-util-test.cc:175] Throughput: 99 
dumps/second (symbolize=0)
  I0207 19:33:21.963946 30799 debug-util-test.cc:175] Throughput: 84 
dumps/second (symbolize=1)

After:
  I0207 19:38:32.505213 32039 debug-util-test.cc:175] Throughput: 57916 
dumps/second (symbolize=0)
  I0207 19:38:33.506417 32039 debug-util-test.cc:175] Throughput: 595 
dumps/second (symbolize=1)

Change-Id: Icb772f0607275ec340c7f38be05347fc14820cbd
Reviewed-on: http://gerrit.cloudera.org:8080/9254
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/util/debug-util.cc
1 file changed, 70 insertions(+), 16 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icb772f0607275ec340c7f38be05347fc14820cbd
Gerrit-Change-Number: 9254
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] WIP: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9375


Change subject: WIP: KUDU-2297 (part 5): dump stacks into diagnostics log on 
service queue overflow
..

WIP: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue 
overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

WIP because I'd like to add some kind of test for this, though I tested
manually and it appears to work.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
8 files changed, 96 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/1
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 1): refactor metrics log out of ServerBase

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9326

to look at the new patch set (#5).

Change subject: KUDU-2297 (part 1): refactor metrics log out of ServerBase
..

KUDU-2297 (part 1): refactor metrics log out of ServerBase

This is preparing for putting more information into the metrics log
instead of just metrics snapshots. The logging code will get more
complex as it gains features, so this makes a new class for it.

This is slightly incompatible since I also changed the name on disk to
'diagnostics' instead of 'metrics' and updated the documentation, but I
am not aware of people using this in the wild. So long as we
release-note it, I think it's reasonable to expect any sophisticated
operators to adjust their scripting accordingly.

Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
---
M docs/administration.adoc
M src/kudu/server/CMakeLists.txt
A src/kudu/server/diagnostics_log.cc
A src/kudu/server/diagnostics_log.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
6 files changed, 241 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/9326/5
--
To view, visit http://gerrit.cloudera.org:8080/9326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f7bc7335b1f4c2a5c62fc5fe672289d5027a10d
Gerrit-Change-Number: 9326
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 3): use futex to speed up stack collection

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9254

to look at the new patch set (#6).

Change subject: KUDU-2291 (part 3): use futex to speed up stack collection
..

KUDU-2291 (part 3): use futex to speed up stack collection

Previously the thread which requests a stack trace would poll on an
atomic variable waiting for the stack to be collected. Of course
spinning is too CPU-hungry so it would actually sleep for 10ms in
each iteration. This limited the performance of stack collection to
about 100/second, which is problematic for the /stacks web page on a
server that may have thousands of threads.

This switches to using the futex system call to allow the dumper thread
to sleep and the dumpee thread to wake it back up when the results are
ready.

Why do we use the low level futex and not something like a condition
variable or CountdownLatch? The issue is that the dumpee thread is
running in a signal handler context, and most things are not safe to do
in that context. Particularly, mutexes may block, which is very naughty,
and condition variables and latches are currently implemented on top of
mutexes. pthread semaphores are another option, but those don't support
proper timeout semantics, which is a feature we use in this use case.

Whether the futex syscall itself is async-signal-safe is up for some debate.
Clearly the "wait" mode is not, since it may block, but is the "wake" mode
that we use here allowed? Technically I don't think so. In practice,
the pthread sem_post() API _is_ documented to be async-signal-safe and
I verified that in libc source it's implemented using futex to wake the
waiter. So, if we are doing something illegal, at least we're doing
the same illegal thing as libc, so we're very unlikely to ever break.

Obviously futex is not portable to macOS, but this code was already
Linux-specific so we haven't lost much.

This speeds up the benchmark for stack trace collection (without symbolization)
by 585x:

Before:
  I0207 19:33:20.960477 30799 debug-util-test.cc:175] Throughput: 99 
dumps/second (symbolize=0)
  I0207 19:33:21.963946 30799 debug-util-test.cc:175] Throughput: 84 
dumps/second (symbolize=1)

After:
  I0207 19:38:32.505213 32039 debug-util-test.cc:175] Throughput: 57916 
dumps/second (symbolize=0)
  I0207 19:38:33.506417 32039 debug-util-test.cc:175] Throughput: 595 
dumps/second (symbolize=1)

Change-Id: Icb772f0607275ec340c7f38be05347fc14820cbd
---
M src/kudu/util/debug-util.cc
1 file changed, 70 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/9254/6
--
To view, visit http://gerrit.cloudera.org:8080/9254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb772f0607275ec340c7f38be05347fc14820cbd
Gerrit-Change-Number: 9254
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9330

to look at the new patch set (#6).

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..

KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/rolling_log-test.cc
M src/kudu/util/rolling_log.cc
M src/kudu/util/rolling_log.h
7 files changed, 227 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9330/6
--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] gutil: properly hook up ANNOTATE HAPPENS BEFORE/AFTER

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9325 )

Change subject: gutil: properly hook up ANNOTATE_HAPPENS_BEFORE/AFTER
..

gutil: properly hook up ANNOTATE_HAPPENS_BEFORE/AFTER

This updates dynamic_annotations to hook up ANNOTATE_HAPPENS_BEFORE and
ANNOTATE_HAPPENS_AFTER to the appropriate annotations in the TSAN
runtime library. According to [1] (committed in 2011) the functions have
been implemented in TSAN for quite some time, and it was an error that
we weren't using them.

The previous implementation of the function called some
condition-variable annotations in the TSAN runtime library instead.
Those functions actually turn out to be implemented as no-ops. So, I
looked for existing call sites to ANNOTATE_HAPPENS_BEFORE and AFTER and
removed them. This cleaned up atomic_refcount pretty substantially.
Again I found a matching change[2] in Chromium, from which this code is
derived.

[1] https://codereview.chromium.org/6982022
[2] https://codereview.chromium.org/580813002

Change-Id: Ida27aff6b9771c0009fba5e31ec7a0c7c53caa59
Reviewed-on: http://gerrit.cloudera.org:8080/9325
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M src/kudu/gutil/atomic_refcount.h
M src/kudu/gutil/dynamic_annotations.c
M src/kudu/gutil/dynamic_annotations.h
M src/kudu/gutil/once.cc
M src/kudu/gutil/once.h
5 files changed, 22 insertions(+), 59 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ida27aff6b9771c0009fba5e31ec7a0c7c53caa59
Gerrit-Change-Number: 9325
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 3): use futex to speed up stack collection

2018-02-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9254 )

Change subject: KUDU-2291 (part 3): use futex to speed up stack collection
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9254/5/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9254/5/src/kudu/util/debug-util.cc@144
PS5, Line 144: 0
> how about: nullptr /* ignored */
Done


http://gerrit.cloudera.org:8080/#/c/9254/5/src/kudu/util/debug-util.cc@197
PS5, Line 197: spins waiting
> s/spins waiting/waits/
Done



--
To view, visit http://gerrit.cloudera.org:8080/9254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb772f0607275ec340c7f38be05347fc14820cbd
Gerrit-Change-Number: 9254
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 21 Feb 2018 00:49:59 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 4): periodically dump stacks to diagnostics log

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9330

to look at the new patch set (#5).

Change subject: KUDU-2297 (part 4): periodically dump stacks to diagnostics log
..

KUDU-2297 (part 4): periodically dump stacks to diagnostics log

This modifies the diagnostics log to periodically dump stack traces.
This is slightly complicated by the fact that symbolized stack traces
can be relatively large. So, we separate the logging of symbols and
stack traces. When an address first appears in a log file, it is logged
as part of a symbol line. Later logs of the same address do not need
to re-log the symbol.

With this, a typical stack trace dump of an idle tserver is about 4KB
pre-compression, and a 'symbols' dump is about 6KB. So logging stacks
reasonably often should not use much disk space or IO.

Currently this is enabled on the same interval as the metrics log, but
only if a new experimental flag --diagnostics-log-stack-traces is
enabled. I'm planning to move it to a different flag in a later commit,
but wanted to keep this one simple and incremental.

Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/rolling_log-test.cc
M src/kudu/util/rolling_log.cc
M src/kudu/util/rolling_log.h
7 files changed, 227 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/9330/5
--
To view, visit http://gerrit.cloudera.org:8080/9330
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic32abf2c48ac6a5f3c384e58838b34671bcaf147
Gerrit-Change-Number: 9330
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9319

to look at the new patch set (#5).

Change subject: KUDU-2291 (part 6) stacks: use libunwind directly for stack 
tracing
..

KUDU-2291 (part 6) stacks: use libunwind directly for stack tracing

Previously we used glog's wrapper around libunwind for stack tracing.
However that has a deficiency that it assumes that, process wide, only
one thread can be inside libunwind at a time[1]

It appears that this is left over from some very old versions of
libunwind, or was already unnecessarily conservative. libunwind is meant
to be thread safe, and we have tests that will trigger if it is not.

This just extracts the function body of the glog function we were using
and does the same work manually.

Without this fix, the "collect from all the threads at the same time"
code path resulted in most of the threads collecting an empty trace
since they tried to call libunwind at the same time.

[1] https://github.com/google/glog/issues/298
Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
---
M build-support/iwyu/iwyu.sh
A build-support/iwyu/mappings/libunwind.imp
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 81 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/9319/5
--
To view, visit http://gerrit.cloudera.org:8080/9319
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a53e55d7c4e7ee50bcac5b1e81267df56383634
Gerrit-Change-Number: 9319
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

2018-02-20 Thread Todd Lipcon (Code Review)
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9329

to look at the new patch set (#4).

Change subject: KUDU-2297 (part 3): refactor process-wide stack collection out 
of /stacks
..

KUDU-2297 (part 3): refactor process-wide stack collection out of /stacks

Previously a bunch of logic to collect all the stacks from the process
was in the /stacks path handler. This logic is relatively generic and
shouldn't be intermingled with the formatting code. In particular I'd
like to use it in the diagnostics log, where a different output format
is desirable.

Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/server/diagnostics_log.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
5 files changed, 190 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/9329/4
--
To view, visit http://gerrit.cloudera.org:8080/9329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb7c6edd31254f3d7e0cbef1eaf575bde3570df6
Gerrit-Change-Number: 9329
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


  1   2   3   4   5   6   7   8   9   10   >