[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners

2018-03-30 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Dan Burkert, David Ribeiro Alves, Kudu Jenkins, Adar Dembo, 
Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9783

to look at the new patch set (#11).

Change subject: KUDU-16 pt 1: add server-side limits on scanners
..

KUDU-16 pt 1: add server-side limits on scanners

This patch adds the functionality to specify limits to the number of
rows returned per tablet server scanner. Internally, each server-side
scanner now tracks the number of rows it has already returned and
adjusts the number of rows to serialize per batch based on this limit.
Setting a non-positive limit is grounds for short circuiting the scan.

Note: Use of this functionality by the client will come in a later
patch.

Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
---
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
11 files changed, 182 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9783/11
--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9871 )

Change subject: docs: add troubleshooting info on detecting ext2 and ext3 
filesystems
..

docs: add troubleshooting info on detecting ext2 and ext3 filesystems

Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Reviewed-on: http://gerrit.cloudera.org:8080/9871
Reviewed-by: Todd Lipcon 
Tested-by: Kudu Jenkins
---
M docs/troubleshooting.adoc
1 file changed, 13 insertions(+), 6 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Gerrit-Change-Number: 9871
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9871 )

Change subject: docs: add troubleshooting info on detecting ext2 and ext3 
filesystems
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Gerrit-Change-Number: 9871
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 21:37:20 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9835 )

Change subject: KUDU-2384. Don't collect thread stacks when running under 
debugger
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b
Gerrit-Change-Number: 9835
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 21:23:27 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger

2018-03-30 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9835

to look at the new patch set (#4).

Change subject: KUDU-2384. Don't collect thread stacks when running under 
debugger
..

KUDU-2384. Don't collect thread stacks when running under debugger

This changes the stack watchdog and the 'collect all stacks' utility
code to not run when it appears that the process is running under a
tracer such as gdb or strace. This makes debugging Kudu processes
significantly easier because you don't need to know the magical
incantation 'handle SIGUSR2 nostop noprint'.

I tested manually by running a master with
--diagnostics-log-stack-traces-interval-ms=10 and attaching to it with
gdb. With this patch, I didn't see any signals sent.

This patch also changes the Env EIO fault injection to never inject
failures on /proc/. This is to avoid a TSAN warning since
KernelStackWatchdog would otherwise access the env_inject_eio_globs flag
concurrent to tests modifying it.

Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b
---
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
M src/kudu/util/env_posix.cc
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
6 files changed, 82 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/9835/4
--
To view, visit http://gerrit.cloudera.org:8080/9835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b
Gerrit-Change-Number: 9835
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9871 )

Change subject: docs: add troubleshooting info on detecting ext2 and ext3 
filesystems
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc
File docs/troubleshooting.adoc:

http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc@59
PS1, Line 59: ext3, or ext4 formatted device; see 
link:https://unix.stackexchange.com/q/60723[this Stack
> why don't we put the file and blkid examples here instead of linking to sta
Well, if we're going to do that, I'd like to include the tune2fs example too 
(because I don't know exactly what UNIX tools someone has installed). And 
that's a lot of text. I figured this is rare enough that going to SE isn't such 
a big deal.



--
To view, visit http://gerrit.cloudera.org:8080/9871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Gerrit-Change-Number: 9871
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 21:11:47 +
Gerrit-HasComments: Yes


[kudu-CR] meta cache: improve multi-threaded scalability

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9760 )

Change subject: meta_cache: improve multi-threaded scalability
..

meta_cache: improve multi-threaded scalability

When profiling a multi-threaded 'kudu perf loadgen' I found the
bottleneck was on the rw_spinlock acquisition in
MetaCache::LookupTabletByKeyFastPath as well as some contention in
RemoteTablet::stale().

This fixes the first by switching to a percpu_rwlock and the second by
using an atomic bool instead of a spinlock-protected bool.

This moved LookupTabletByKeyFastPath from the first line of the profile
(14% of CPU) down to the 8th line (3% of CPU).

Change-Id: I38c8a94ea177bfa4f4e2048355464f76a5daa2ba
Reviewed-on: http://gerrit.cloudera.org:8080/9760
Reviewed-by: Adar Dembo 
Tested-by: Todd Lipcon 
---
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/util/locks.h
3 files changed, 25 insertions(+), 17 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I38c8a94ea177bfa4f4e2048355464f76a5daa2ba
Gerrit-Change-Number: 9760
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..

KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

This test was flaky in TSAN builds for two reasons:

(1) even though taking the stack trace is always pretty fast, symbolization can
sometimes take more than a second. If that was the case, the assertion that
looked at the diagnostics log content would fail just because the expected log
was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test
failure rate with 4 stress threads down from about 60% to about 0.3%.

(2) with periodic stack dumping enabled, it was possible that the periodic dump
was already in progress at the time that the overflow-triggered dump was
triggered. In that case, the overflow-triggered dump would be ignored. That
was expected behavior, but the test would fail due to not seeing the overflow
listed as the "reason". This patch fixes this issue by disabling periodic
dumping in the test.

In order to disable periodic dumping but leave triggered dumping enabled,
I changed the semantics of the configuration slightly. Previously, setting
it to '0' would disable all dumping. Now, setting it to '0' disables
periodic dumping but still leaves triggered dumps enabled. Setting it to
'-1' disables all dumping.

With both these changes I was able to loop the test 300 times with 4 stress
threads in TSAN mode with no failures.

Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Reviewed-on: http://gerrit.cloudera.org:8080/9867
Reviewed-by: Adar Dembo 
Tested-by: Todd Lipcon 
---
M src/kudu/master/master-test.cc
M src/kudu/server/diagnostics_log.cc
2 files changed, 26 insertions(+), 10 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 21:06:19 +
Gerrit-HasComments: No


[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems

2018-03-30 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9871 )

Change subject: docs: add troubleshooting info on detecting ext2 and ext3 
filesystems
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc
File docs/troubleshooting.adoc:

http://gerrit.cloudera.org:8080/#/c/9871/1/docs/troubleshooting.adoc@59
PS1, Line 59: ext3, or ext4 formatted device; see 
link:https://unix.stackexchange.com/q/60723[this Stack
why don't we put the file and blkid examples here instead of linking to stack 
exchange?



--
To view, visit http://gerrit.cloudera.org:8080/9871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Gerrit-Change-Number: 9871
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 19:54:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners

2018-03-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9783 )

Change subject: KUDU-16 pt 1: add server-side limits on scanners
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc@805
PS10, Line 805:int64_t max_num_rows_to_return,
> was thinking about this from a performance angle a bit more this morning. I
Hmmm interesting, so the idea would be to resize the selection vector 
post-scan, eh? Makes sense; I'll try that out.



--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 19:53:26 +
Gerrit-HasComments: Yes


[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9870 )

Change subject: thirdparty: add patch to fix printing large span stats in /memz
..

thirdparty: add patch to fix printing large span stats in /memz

This fixes a bug that I introduced upstream in gperftools by
importing the upstream patch that fixed it. The issue was mismatched
arguments to printf() which resulted in incorrect printing of stats in
/memz.

Without this fix, memz always claimed to have 128 large spans:

>128   large *128 spans ~0.0 MiB;2.7 MiB cum; unmapped: >0.0 MiB;   
> 0.0 MiB cum

With the fix it properly reports 0 spans on startup of an empty master:

>128   large *  0 spans ~0.0 MiB;3.6 MiB cum; unmapped: >0.0 MiB;   
> 0.0 MiB cum

Change-Id: I144830c069ec13017fd95b24aecf1489755b336f
Reviewed-on: http://gerrit.cloudera.org:8080/9870
Reviewed-by: Will Berkeley 
Tested-by: Kudu Jenkins
---
M thirdparty/download-thirdparty.sh
A thirdparty/patches/gperftools-unbreak-memz.patch
2 files changed, 30 insertions(+), 1 deletion(-)

Approvals:
  Will Berkeley: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f
Gerrit-Change-Number: 9870
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] docs: add troubleshooting info on detecting ext2 and ext3 filesystems

2018-03-30 Thread Adar Dembo (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9871

to review the following change.


Change subject: docs: add troubleshooting info on detecting ext2 and ext3 
filesystems
..

docs: add troubleshooting info on detecting ext2 and ext3 filesystems

Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
---
M docs/troubleshooting.adoc
1 file changed, 13 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/9871/1
--
To view, visit http://gerrit.cloudera.org:8080/9871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic4917e144690943e43bf5aa380b59379669846d9
Gerrit-Change-Number: 9871
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9783 )

Change subject: KUDU-16 pt 1: add server-side limits on scanners
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/9783/10/src/kudu/common/wire_protocol.cc@805
PS10, Line 805:int64_t max_num_rows_to_return,
was thinking about this from a performance angle a bit more this morning. I 
think it would be faster to actually do the limiting on the selection vector of 
the RowBlock rather than adding new conditionals into the CopyColumn code. That 
way the branches happen once up front instead of on every column, and the stuff 
like 'num_rows' calculation would just be adjusted automatically. Thoughts?



--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 18:56:03 +
Gerrit-HasComments: Yes


[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz

2018-03-30 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9870 )

Change subject: thirdparty: add patch to fix printing large span stats in /memz
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f
Gerrit-Change-Number: 9870
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Fri, 30 Mar 2018 18:50:08 +
Gerrit-HasComments: No


[kudu-CR] thirdparty: add patch to fix printing large span stats in /memz

2018-03-30 Thread Todd Lipcon (Code Review)
Hello Will Berkeley,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9870

to review the following change.


Change subject: thirdparty: add patch to fix printing large span stats in /memz
..

thirdparty: add patch to fix printing large span stats in /memz

This fixes a bug that I introduced upstream in gperftools by
importing the upstream patch that fixed it. The issue was mismatched
arguments to printf() which resulted in incorrect printing of stats in
/memz.

Without this fix, memz always claimed to have 128 large spans:

>128   large *128 spans ~0.0 MiB;2.7 MiB cum; unmapped: >0.0 MiB;   
> 0.0 MiB cum

With the fix it properly reports 0 spans on startup of an empty master:

>128   large *  0 spans ~0.0 MiB;3.6 MiB cum; unmapped: >0.0 MiB;   
> 0.0 MiB cum

Change-Id: I144830c069ec13017fd95b24aecf1489755b336f
---
M thirdparty/download-thirdparty.sh
A thirdparty/patches/gperftools-unbreak-memz.patch
2 files changed, 30 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/9870/1
--
To view, visit http://gerrit.cloudera.org:8080/9870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I144830c069ec13017fd95b24aecf1489755b336f
Gerrit-Change-Number: 9870
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2191 (8/n): Integrate HmsCatalog into CatalogManager

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9863 )

Change subject: KUDU-2191 (8/n): Integrate HmsCatalog into CatalogManager
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/integration-tests/master_hms-itest.cc@215
PS2, Line 215:   LOG(INFO) << "STOPPING HMS!!!";
Still want this?


http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9863/2/src/kudu/master/catalog_manager.cc@2280
PS2, Line 2280:   // 7. Update sys-catalog with the new table schema and 
tablets to add/drop.
Need to renumber the rest of the steps.



--
To view, visit http://gerrit.cloudera.org:8080/9863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie68e143c3c317c7690af097e6485934feb1010b4
Gerrit-Change-Number: 9863
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 17:40:46 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners

2018-03-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9783 )

Change subject: KUDU-16 pt 1: add server-side limits on scanners
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@502
PS9, Line 502:  Returns number of
> It's necessary for the scanned rows metrics. AFAICT, scanners (and their in
I meant, scanners can *span* multiple *batches*, and the row collectors don't, 
and so we currently update metrics once per batch.



--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 17:37:59 +
Gerrit-HasComments: Yes


[kudu-CR] rowset metadata: cache min/max encoded keys

2018-03-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9372 )

Change subject: rowset_metadata: cache min/max encoded keys
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9372/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9372/6//COMMIT_MSG@29
PS6, Line 29: tablet. I repeated this, configuring Kudu to read the keys from 
the
> This seems to be missing the punch line. Is this implying that, with the pa
We could cut out the 15s difference (these two rows are time spent 
bootstrapping when reading keys from the rowset meta vs reading from the data 
blocks. Updated to be a bit more clear.



--
To view, visit http://gerrit.cloudera.org:8080/9372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d
Gerrit-Change-Number: 9372
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 17:35:33 +
Gerrit-HasComments: Yes


[kudu-CR] rowset metadata: cache min/max encoded keys

2018-03-30 Thread Andrew Wong (Code Review)
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9372

to look at the new patch set (#7).

Change subject: rowset_metadata: cache min/max encoded keys
..

rowset_metadata: cache min/max encoded keys

This patch adds a new flag rowset_metadata_store_keys that, when true,
will indicate that Kudu should duplicate diskrowset min/max keys into
the rowset metadata. Doing so allows Kudu to read the keys from tablet
metadata and bootstrap tablets without having to fully initializing the
CFileReaders for the key columns of each rowset.

A small test is added to tablet_server-test that ensures we don't read
any extraneous bytes when starting up the tablet server if we're reading
keys from the rowset metadata.

I benchmarked this with ~50GB of flushed YCSB data (92 tablets of
varying sizes) on a single node with 4 data directories and a separate
WAL/metadata directory. To set up, I let the server flush/compact for a
while so bootstrap times wouldn't be dominated by reading WAL segments,
and set rowset_metadata_store_keys to true so the tserver had the option
of reading the cached keys from the rowset metadata at startup.

With the above setup, I started the tserver with a disabled maintenance
manager (to avoid further IO) and waited for the tablets to get to a
RUNNING state, recording the sum of the logged bootstrap times of each
tablet. I repeated this, configuring Kudu to read the keys from the
rowset metadata, and to read the keys from the data blocks, dropping OS
caches in between runs. The results are below.

Run number:   1   2   3   Avg
Reading cached keys (s):  26.430  24.143  20.826  23.800
Not reading cached keys (s):  40.578  38.428  37.093  38.700

Based on this, ~15 seconds worth of bootstrapping time was spent on
initializing the key index readers, that could be avoided by reading the
keys from the rowset metadata instead.

Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d
---
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tserver/tablet_server-test.cc
7 files changed, 124 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/9372/7
--
To view, visit http://gerrit.cloudera.org:8080/9372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37d6f7160e3a7188753684e063963110f70e9b8d
Gerrit-Change-Number: 9372
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2191 (7/n): HmsCatalog

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9862 )

Change subject: KUDU-2191 (7/n): HmsCatalog
..


Patch Set 2:

(14 comments)

The IWYU failures are weird; could you bug Alexey?

http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/CMakeLists.txt
File src/kudu/hms/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/CMakeLists.txt@85
PS2, Line 85:   ADD_KUDU_TEST(hms_catalog-test RUN_SERIAL true NUM_SHARDS 4)
Nit: should precede hms_client-test alphabetically.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc
File src/kudu/hms/hms_catalog-test.cc:

http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@168
PS2, Line 168: ASSERT_OK(hms_->Stop());
 : ASSERT_OK(hms_client_->Stop());
Don't the destructors do this automatically?


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@242
PS2, Line 242:   string table_id = "table-id";
 :   string hms_database_name = "default";
 :   string hms_table_name = "table_name";
 :   string table_name = Substitute("$0.$1", hms_database_name, 
hms_table_name);
 :   string hms_altered_table_name = "altered_table_name";
 :   string altered_table_name = Substitute("$0.$1", 
hms_database_name, hms_altered_table_name);
Add 'const' and use kVariableName notation, since these are constants.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@263
PS2, Line 263:   ASSERT_OK(hms_catalog_->DropTable(table_id, 
altered_table_name));
And then verify that the table is gone?


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@272
PS2, Line 272: CHECK_OK
Could ASSERT_OK here too.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@280
PS2, Line 280:   ASSERT_OK(hms_client_->GetTable("default", "foo", ));
To make it clearer that AlterTable will create the entry, could you assert that 
it doesn't exist before the alter?

Below as well.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@287
PS2, Line 287:   // Drop a table containing a non Hive-compatible character, 
and ensure it doesn't fail.
And verify that the tables are gone? Or is that a given?


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@297
PS2, Line 297:   string table_id = "abc123";
 :   string hms_database = "default";
 :
 :   string hms_external_table = "external_table";
 :   string external_table_name = Substitute("$0.$1", hms_database, 
hms_external_table);
 :
 :   string hms_kudu_table = "kudu_table";
 :   string kudu_table_name = Substitute("$0.$1", hms_database, 
hms_kudu_table);
const and kName.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@358
PS2, Line 358:   // Drop a Kudu table with no corresponding HMS entry.
Verify that the table is gone?


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog-test.cc@383
PS2, Line 383: operations
Nit: and ensure that the same operations succeed

It's a bit pedantic, but it's important for these to be constant to prove that 
only one thing has changed: the status of the HMS.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc
File src/kudu/hms/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@65
PS2, Line 65: // Validated in master.cc.
Nit: if this applies to hive_metastore_sasl_enabled, please move it to just 
before the definition rather than just after.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@70
PS2, Line 70: TAG_FLAG(hive_metastore_retry_count, advanced);
: TAG_FLAG(hive_metastore_retry_count, experimental);
FWIW, I think experimental implies advanced so I don't think you need to tag 
with both.


http://gerrit.cloudera.org:8080/#/c/9862/2/src/kudu/hms/hms_catalog.cc@74
PS2, Line 74: DEFINE_int32(hive_metastore_send_timeout, 60,
:  "Configures the socket send timeout, in seconds, for 
Thrift "
:  "connections to the Hive Metastore.");
: TAG_FLAG(hive_metastore_send_timeout, advanced);
: TAG_FLAG(hive_metastore_send_timeout, experimental);
: TAG_FLAG(hive_metastore_send_timeout, runtime);
:
: DEFINE_int32(hive_metastore_recv_timeout, 60,
:  "Configures the socket receive timeout, in seconds, 
for Thrift "
:  "connections to the Hive Metastore.");
: TAG_FLAG(hive_metastore_recv_timeout, advanced);
: TAG_FLAG(hive_metastore_recv_timeout, experimental);
: TAG_FLAG(hive_metastore_recv_timeout, runtime);
:
: 

[kudu-CR] KUDU-2191 (6/n): Fixups to the C++ HMS client

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9861 )

Change subject: KUDU-2191 (6/n): Fixups to the C++ HMS client
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client-test.cc
File src/kudu/hms/hms_client-test.cc:

http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client-test.cc@20
PS1, Line 20: #include 
So IWYU wants you to remove all of these new includes. Are all of them 
necessary to compile this particular patch (as opposed to others in the 
series)? If so, could you annotate each one with the appropriate IWYU pragma?


http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h
File src/kudu/hms/hms_client.h:

http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h@22
PS1, Line 22: #include 
My guess is that this is already included through ThriftHiveMetastore.h, which 
is why IWYU wants you to remove it.


http://gerrit.cloudera.org:8080/#/c/9861/1/src/kudu/hms/hms_client.h@33
PS1, Line 33: class NotificationEvent;
: class Partition;
What happens if you drop these? Perhaps they're declared/defined in the 
ThriftHiveMetastore.h chain?



--
To view, visit http://gerrit.cloudera.org:8080/9861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a8d984e67fe6b4c004725828b4296476c2389e
Gerrit-Change-Number: 9861
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:59:09 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9835 )

Change subject: KUDU-2384. Don't collect thread stacks when running under 
debugger
..


Patch Set 3:

> > That said, the test in question (TestCreateWithFailedDirs) already resets 
> > env_inject_eio to 0 when it finishes; if it didn't, we'd also see a warning 
> > that we couldn't clean up the test's temp dir. But I guess here the issue 
> > isn't with env_inject_eio, but with env_inject_eio_globs? We don't 
> > typically reset that one.
>
> Yea, the issue is with env_inject_eio_globs which is a string and thus 
> particularly unsafe to read. The odd thing, though, is that we only read 
> env_inject_eio_globs when env_inject_eio is non-zero, but it appears we're 
> still reading it. So perhaps we aren't resetting env_inject_eio properly in 
> all cases or somesuch.

I think it's a real data race: right before the main thread (T1) resets 
env_inject_eio, the kernel stack watchdog thread (T2) gets scheduled and reads 
the non-zero gflag value. Then T1 is rescheduled and resets to 0, but it's too 
late: when T2 is rescheduled, it will read env_inject_eio_globs.


--
To view, visit http://gerrit.cloudera.org:8080/9835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b
Gerrit-Change-Number: 9835
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:51:06 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2191 (6/n): Hive Metastore catalog manager integration

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8312 )

Change subject: KUDU-2191 (6/n): Hive Metastore catalog manager integration
..


Patch Set 10:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc
File src/kudu/hms/hms_client.cc:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/hms/hms_client.cc@202
PS8, Line 202:   // operations properly.
> Yes, I believe it's taken directly from the hive-site.xml so it's possible
Gotcha. Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@162
PS8, Line 162:   }
 :
 :   // Checks that a table does not exist in the Kudu and HMS 
catalogs.
 :   void CheckCatalogsNegative(const string& database_n
> I've tried to reduce the number of tests cases since spinning up an HMS is
No good ideas here short of changing the MiniHMS lifecycle to outlive its 
minicluster (so that it's reset once per test suite rather than once per test). 
But that's a really bad idea for other reasons.

If you have the time, it might be fruitful to profile the HMS startup and 
figure out why it takes so long. If that time is spent searching an enormous 
classpath, we may be able to prune it to improve startup time.


http://gerrit.cloudera.org:8080/#/c/8312/8/src/kudu/integration-tests/master_hms-itest.cc@268
PS8, Line 268:   ASSERT_TRUE(s.IsIllegalState()) << s.ToString();
 :   ASSERT_STR_CONTAINS(s.ToString(), "
> You may be reading too much into this.  This is just setting up a hypotheti
But isn't the CheckCatalogs() call on L277 ensuring that the Kudu table 
(renamed_table_name) and the HMS table (hms_renamed_table_name) have the same 
names?

I was still confused so I went through the entire test and wrote down the 
actions taking place:

  // Create the Kudu table.
  Create Kudu table rename_db.kudu_table
  - Implicit create of same HMS table

  // Create a non-Kudu HMS table entry.
  Create non-Kudu HMS table rename_db.external_table

  // Drop the HMS table entry and rename the table. This tests that the
  // HmsCatalog will create a new entry when necessary.
  Drop HMS table rename_db.kudu_table
  Rename Kudu table from rename_db.kudu_table to rename_db.renamed_table
  - Implicit create of Kudu HMS table rename_db.renamed_table

  // Rename the table back to the original name. This tests the happy path.
  Rename Kudu table from rename_db.renamed_table to rename_db.kudu_table
  - Implicit rename of Kudu HMS table rename_db.renamed_table to 
rename_db.kudu_table

  // Drop the HMS table entry, then create a non-Kudu table entry in it's place,
  // and attempt to rename the table.
  Drop Kudu HMS table rename_db.kudu_table
  Create non-Kudu HMS table rename_db.kudu_table
  Rename Kudu table from rename_db.kudu_table to rename_db.renamed_table
  - No implicit rename because rename_db.kudu_table is non-Kudu HMS entry
  - Implicit create of Kudu HMS table rename_db.renamed_table

What I was missing was that in the last Kudu rename, not only does Kudu NOT 
rename the existing HMS entry (because it's a non-Kudu entry), but it'll 
implicitly create a new HMS entry for the destination table name.

A couple things that would make this easier to understand:
1. hms_client_->CreateTable() and CreateTable() are similar looking; rename 
CreateTable() to something else.
2. The table name variables are also pretty similar looking; rename them so 
that they stand out from one another more.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/catalog_manager.cc@1755
PS9, Line 1755: Schema schema;
> It's a hack that allows using the RETURN_NOT_OK_LOG macro inside the lambda
Hmm, I do think it's somewhat unintuitive; I could see someone else reading 
this and thinking that they've misunderstood how ScopedCleanup works.

I wouldn't mind it if there was a comment saying the return value is ignored. 
Or if you wrote it out manually.


http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc
File src/kudu/master/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8312/9/src/kudu/master/hms_catalog.cc@201
PS9, Line 201: HMS table entry: "
 : << s.ToString();
 : return create_table();
> Well you're right to expect there are TOCTOU races going on all over the pl
I don't understand your explanation.

On L172 we retrieve the original HMS table entry. On L207, if that entry's name 
matches the new name, we short-circuit. What's stopping another HMS client 
(perhaps another Kudu cluster?) from renaming the entry from the original 

[kudu-CR] KUDU-2384. Don't collect thread stacks when running under debugger

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9835 )

Change subject: KUDU-2384. Don't collect thread stacks when running under 
debugger
..


Patch Set 3:

> Patch Set 3:
> That said, the test in question (TestCreateWithFailedDirs) already resets 
> env_inject_eio to 0 when it finishes; if it didn't, we'd also see a warning 
> that we couldn't clean up the test's temp dir. But I guess here the issue 
> isn't with env_inject_eio, but with env_inject_eio_globs? We don't typically 
> reset that one.

Yea, the issue is with env_inject_eio_globs which is a string and thus 
particularly unsafe to read. The odd thing, though, is that we only read 
env_inject_eio_globs when env_inject_eio is non-zero, but it appears we're 
still reading it. So perhaps we aren't resetting env_inject_eio properly in all 
cases or somesuch.

Anyway I'll do option (c) since I think it makes sense anyway - we don't want 
to inject failures reading from proc in general.


--
To view, visit http://gerrit.cloudera.org:8080/9835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib17ffb9514e9b440b8d97843374d2a60913a398b
Gerrit-Change-Number: 9835
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:37:19 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:26:19 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106
PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms);
> Nit: alphabetical order (at least switch with master_inject_latency_on_tabl
Done



--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:09:42 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Todd Lipcon (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9867

to look at the new patch set (#2).

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..

KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

This test was flaky in TSAN builds for two reasons:

(1) even though taking the stack trace is always pretty fast, symbolization can
sometimes take more than a second. If that was the case, the assertion that
looked at the diagnostics log content would fail just because the expected log
was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test
failure rate with 4 stress threads down from about 60% to about 0.3%.

(2) with periodic stack dumping enabled, it was possible that the periodic dump
was already in progress at the time that the overflow-triggered dump was
triggered. In that case, the overflow-triggered dump would be ignored. That
was expected behavior, but the test would fail due to not seeing the overflow
listed as the "reason". This patch fixes this issue by disabling periodic
dumping in the test.

In order to disable periodic dumping but leave triggered dumping enabled,
I changed the semantics of the configuration slightly. Previously, setting
it to '0' would disable all dumping. Now, setting it to '0' disables
periodic dumping but still leaves triggered dumps enabled. Setting it to
'-1' disables all dumping.

With both these changes I was able to loop the test 300 times with 4 stress
threads in TSAN mode with no failures.

Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
---
M src/kudu/master/master-test.cc
M src/kudu/server/diagnostics_log.cc
2 files changed, 26 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/2
--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
..


Patch Set 3: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:07:37 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
..

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exiting just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Reviewed-on: http://gerrit.cloudera.org:8080/9864
Reviewed-by: Todd Lipcon 
Tested-by: Todd Lipcon 
---
M src/kudu/util/debug-util-test.cc
1 file changed, 30 insertions(+), 6 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

2018-03-30 Thread Todd Lipcon (Code Review)
Hello Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9864

to look at the new patch set (#3).

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
..

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exiting just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
---
M src/kudu/util/debug-util-test.cc
1 file changed, 30 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/9864/3
--
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17
PS2, Line 17: exitinng
> exiting
I bet you can't wait for my new laptop to arrive



--
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:46 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
..

KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

This fixes the following TSAN race:

WARNING: ThreadSanitizer: data race (pid=17822)  Read of size 1 at 
0x7b4c54e8 by thread T59 (mutexes: write M1750):
...
#3 
strings::internal::SubstituteArg::SubstituteArg(std::__1::basic_string const&) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/gutil/strings/substitute.h:76
 (libtserver.so+0x9edb0)
#4 kudu::MaintenanceManager::LogPrefix() const 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:545:31
 (libkudu_util.so+0x167791)
#5 kudu::MaintenanceManager::UnregisterOp(kudu::MaintenanceOp*) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:235:3
 (libkudu_util.so+0x165963)
#6 kudu::MaintenanceOp::Unregister() 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:123:13
 (libkudu_util.so+0x1654fe)
#7 kudu::tablet::Tablet::UnregisterMaintenanceOps() 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet.cc:1405:9
 (libtablet.so+0xfb5af)
#8 kudu::tablet::TabletReplica::Stop() 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tablet/tablet_replica.cc:271:25
 (libtablet.so+0x146e66)
#9 
kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string c

  Previous write of size 8 at 0x7b4c54e8 by main thread:
#0 memcpy 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:655
 (kudu-tserver+0x449e4c)
#1 std::__1::basic_string::__move_assign(std::__1::basic_string&, 
std::__1::integral_constant) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2044:18
 (libkudu_util.so+0x16664d)
#2 std::__1::basic_string::operator=(std::__1::basic_string&&) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/tsan/include/c++/v1/string:2055
 (libkudu_util.so+0x16664d)
#3 kudu::MaintenanceManager::Init(std::__1::basic_string) 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/maintenance_manager.cc:169
 (libkudu_util.so+0x16664d)
...

The race is on the 'server_uuid_' field in the MaintenanceManager. This is set
during startup, but was being set _after_ calls such as UnregisterOp could be
made as seen above. That means the UnregisterOp call could either see an empty
UUID or even crash due to the above race.

This simply rejiggers the MaintenanceManager startup to take the UUID in as a
constructor parameter instead, and to instantiate the object slightly later
during startup.

Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Reviewed-on: http://gerrit.cloudera.org:8080/9866
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/master/master.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
5 files changed, 24 insertions(+), 16 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

2018-03-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97
PS1, Line 97:   maintenance_manager_.reset(new MaintenanceManager(
> I think this should move up above the path handler registration, because th
yea, it doesn't start actually accepting connections until you call Start() 
which happens in ServerBase::Start(), called by TabletServer::Start(), which 
happens after Init()



--
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:12 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106
PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms);
Nit: alphabetical order (at least switch with 
master_inject_latency_on_tablet_lookups_ms).



--
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Fri, 30 Mar 2018 15:02:09 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9866 )

Change subject: KUDU-2388. Fix TSAN race in MaintenanceManager::UnregisterOp
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/9866/1/src/kudu/tserver/tablet_server.cc@97
PS1, Line 97:   maintenance_manager_.reset(new MaintenanceManager(
I think this should move up above the path handler registration, because the MM 
can be dereferenced by a handler.

Unless we're not actually listening on any ports yet, in which case it's fine.



--
To view, visit http://gerrit.cloudera.org:8080/9866
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id06731f56eb98146f7b88541b936c0026b781b16
Gerrit-Change-Number: 9866
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Fri, 30 Mar 2018 14:59:45 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

2018-03-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17
PS2, Line 17: exitinng
exiting



--
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 14:42:26 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2293 fix cleanup after failed copies

2018-03-30 Thread Andrew Wong (Code Review)
Hello Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9452

to look at the new patch set (#5).

Change subject: KUDU-2293 fix cleanup after failed copies
..

KUDU-2293 fix cleanup after failed copies

Before, when a tablet server failed to tablet copy, Kudu would perform a
best-effort cleanup of the partially-copied replica and leave the tablet
tombstoned. If this tombstoning were to fail due to disk issues (e.g.
out of space), Kudu would allow this and tablet would remain in
TABLET_DATA_COPYING both in-memory and on-disk. This would lead to a
CHECK failure if another tablet copy were started for the same replica,
as the server would attempt to copy over a replica with data already
marked TABLET_DATA_COPYING.

This behavior arose from trying to balance two invariants:
- keep on-disk state consistent with in-memory state when possible
- when a tablet copy fails, leave it in as dead of a state as we can
  (i.e. TABLET_DATA_TOMBSTONED with no transitions in progress)

This patch updates the Abort() logic to lean towards the latter
invariant: if a tablet copy fails, at least its in-memory state will be
set as TABLET_DATA_TOMBSTONED. This may not be true on-disk, but that's
okay because either 1) the tablet server will eventually overwrite it
via another tablet copy (at which time its data must _not_ be in the
TABLET_DATA_COPYING state), or 2) the server will be restarted and the
tablet will be tombstoned upon seeing a non-TABLET_DATA_READY state
anyway.

Additionally, we previously used the tablet copy internal state machine
to determine whether to call Abort(). This wasn't always right, since
the state machine updates when various stages complete, rather than when
various stages start, and w.r.t. cleanup, it's useful to know when we
have already begun to update state on disk, and when we have finished.

So in addition to the above changes to Abort(), this patch also
registers a function to be called by TabletCopyClient's destructor that
calls Abort() if cleanup is necessary (i.e. if a tablet copy has started
and not completed).

A test is added to tablet_copy-itest that tests failures to copy,
ensuring that the tablet is left in such an state that further copies
are possible without crashing. The test uses EIO injection to fail the
copies, but the logic is the same as if full disks were used instead.

Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04
---
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/tablet_copy_client.h
4 files changed, 157 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/9452/5
--
To view, visit http://gerrit.cloudera.org:8080/9452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0459d484a3064aa2de392328e2910c9f6741be04
Gerrit-Change-Number: 9452
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-16 pt 1: add server-side limits on scanners

2018-03-30 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9783 )

Change subject: KUDU-16 pt 1: add server-side limits on scanners
..


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol-test.cc@323
PS9, Line 323: boost::none /* max_num_rows_to_return */, 
true /* pad timestamps */);
> if you write this as: /* max_num_rows_to_return= */boost::none then clang-t
Done


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@762
PS9, Line 762: max_num_rows_to_return
> we can get rid of some branching here by doing this once up front:
Done


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@771
PS9, Line 771: (*max_num_rows_to_return - rows_returned)
> can also avoid this by updating a 'max_rows' above incrementally
Done


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/common/wire_protocol.cc@838
PS9, Line 838: num_rows
> what's the purpose of allowing this to be negative in this function? can't
Done


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_server-test.cc@2072
PS9, Line 2072: lmits
> typo
Done


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@485
PS9, Line 485: optional
> maybe this can be non-optional and simplify code paths by just setting it t
Nice, I like it.


http://gerrit.cloudera.org:8080/#/c/9783/9/src/kudu/tserver/tablet_service.cc@502
PS9, Line 502: num_rows_returned_
> do we still need this member if it's redundant with the one inside the scan
It's necessary for the scanned rows metrics. AFAICT, scanners (and their 
internal count) can batches multiple scans, whereas the metrics get updated 
after each HandleRowBlock() (per each batch's result collector). I'm not sure 
there's elegant way to refactor this out.



--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 30 Mar 2018 07:52:49 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-16 pt 2: add client-side limits on scanners

2018-03-30 Thread Andrew Wong (Code Review)
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9790

to look at the new patch set (#5).

Change subject: KUDU-16 pt 2: add client-side limits on scanners
..

KUDU-16 pt 2: add client-side limits on scanners

This patch adds a public API to allow the specification of
per-client-side-scanner limits on the number of rows returned. Each
scanner will maintain a count of the number of rows already read, and
adjust the server-side limit upon sending the next scan request.

A couple of tests are included to verify that the limits act as
expected. I also verified that lowering the limit reduces the number of
bytes read on disk (at the granularity of a single scan batch at a
time).

Note: this patch does not implement the behavior for scan tokens.

Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
7 files changed, 184 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9790/5
--
To view, visit http://gerrit.cloudera.org:8080/9790
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib2d40e3d14e36f3bf1d09a4bfdb3e17a745d244d
Gerrit-Change-Number: 9790
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon