[kudu-CR](branch-1.2.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8532 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Reviewed-on: http://gerrit.cloudera.org:8080/8532
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley 
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8532
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR](branch-1.2.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8532 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8532
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 14 Nov 2017 07:41:49 +
Gerrit-HasComments: No


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8531 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Patch Set 1: Verified+1 Code-Review+2

Jenkins failure due to KUDU-1736


--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 14 Nov 2017 07:41:01 +
Gerrit-HasComments: No


[kudu-CR] tool: new action for adding to the set of data directories

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8352 )

Change subject: tool: new action for adding to the set of data directories
..

tool: new action for adding to the set of data directories

This patch includes support for adding to the set of data directories in
an existing Kudu filesystem. The only user-facing bit is a new FsManager
option that, when set, augments Open() to also look for missing fs
roots. If any are found, they will be created, and the existing data
directory instance files updated to recognize these new roots. Also
included is a new tool action that opens an FsManager with this option
set.

Updating the set of data directories is a complex, multi-step operation,
and a single error could leave the filesystem in a difficult-to-repair
state. As such, there is some fairly gnarly rollback code that attempts
to undo the changes made in the event of an error.

This logic can be extended to remove data directories. This patch only
addresses adding.

Change-Id: I6ddbdd6cc6231996e7802a622a8b4691527a0643
Reviewed-on: http://gerrit.cloudera.org:8080/8352
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/fs/block_manager_util-test.cc
M src/kudu/fs/block_manager_util.cc
M src/kudu/fs/block_manager_util.h
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/fs_manager-test.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_fs.cc
11 files changed, 730 insertions(+), 169 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6ddbdd6cc6231996e7802a622a8b4691527a0643
Gerrit-Change-Number: 8352
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8531 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Reviewed-on: http://gerrit.cloudera.org:8080/8531
Reviewed-by: Will Berkeley 
Tested-by: Will Berkeley 
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)

Approvals:
  Will Berkeley: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has removed a vote on this change.

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] tool: new action for adding to the set of data directories

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8352 )

Change subject: tool: new action for adding to the set of data directories
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ddbdd6cc6231996e7802a622a8b4691527a0643
Gerrit-Change-Number: 8352
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 07:40:32 +
Gerrit-HasComments: No


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

> Patch Set 5:
>
> (2 comments)

Thanks for the proposal! I've been thinking through this a bit, about potential 
issues/benefits.

I agree it would be fewer calls to error handling. My first impression is that 
it treats all files the same, when maybe we shouldn't. That would assume that 
disk failure handling should be the same for all kinds of files, which might 
not be true (e.g. how would we handle EIOs in cmeta/tmeta vs for a block?). I 
suppose that could be sorted out in the callback (matching the file to the sort 
of handling or somesuch).

A large, maybe subtle change that I think you're proposing is that the error 
manager would be owned by the Env. This feels weird, but I could see it. Wiring 
could be done by the tablet_server as it is now, going through 
`fs_manager()->env()->RegisterCallback()` or somesuch. In this case, though, I 
think it'd be weird to handle any errors outside of the Env layer (e.g. the 
aforementioned tablet data corruptions, checksums, etc.), since most things 
only know about the FsManager, not the Env (again, could be remedied with the 
`fs_manager()->env()`). Of course, _all_ of the error-handling plumbing would 
move from the block manager to the Env, which is a pretty significant change.

This also doesn't address the issue in the commit message, where errors may be 
indirectly caused by disk failures. It's a "tablet" error, so I suppose it will 
have to reach into the Env and trigger error-handling explicitly. That sort of 
feels weird too, since now we're conflating the layers that we're handling 
these errors (not a dealbreaker, and not not saying that the current impl is 
free of this problem, but worth bringing up).

Overall the (biased!) impression I get is that it seems a bit less flexible, 
since most things are ignorant of the Env, but a lot of things know about the 
FsManager (easily worked around though). I'll have to think about it a bit 
more. There may also some more intricacies to think about regarding the 
lifecycle of the block manager, dir manager, etc.


--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 04:11:50 +
Gerrit-HasComments: No


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h@96
PS5, Line 96:   FsErrorManager() :
I find the error handling semantics confusing and complicated and I think we 
can avoid so many call sites to the error handling. I wrote this out but it was 
too much for Gerrit, so please see this Gist for how I think this could work: 
https://gist.github.com/mpercy/eed419cc4bd2da23cd93ddf9ec51e5d9

The idea behind the above is that the lowest level possible reports the errors 
to the central error reporter, while we decide what to do at the highest levels.

I have only addressed disk handling in that gist, but this generic approach 
also works for so-called "tablet" errors. Really I think they are fundamentally 
different:

1. actual disk failures - these are detected at a very low level when they are 
reported from POSIX calls.
2. invariant violations - this is corruption / mismatched checksums, as well as 
missing files, etc. These are not so much reported as actively checked for.

I think the above are different enough that we probably need 2 different types 
of callbacks but I think it makes sense to invoke them very differently, where 
we actively invoke "tablet" callbacks when we do these kinds of sanity checks 
but block manager code never reports the disk errors because it's all taken 
care of in env_posix.cc

LMK what you think.


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@80
PS4, Line 80:
> Ah, I see. Yeah, maybe putting it in util/status.h was the right call then
I see. Personally I find it confusing for two reasons: we call it 
RETURN_NOT_OK_CALL() when maybe we really mean RETURN_NOT_OK_EVAL(). So 
semantically it's a little tricky. 2nd, there is a danger of someone doing 
something like: RETURN_NOT_OK_CALL(DoSomething(), [&] { 
my_obj->invoke_err_callback(); }) because it will compile and run but the 
callback will never get called. We will probably get a compiler warning, but 
that's it. I think it would be less surprising to require the 2nd argument to 
be callable and make the usage I just noted the required idiom.



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 03:07:57 +
Gerrit-HasComments: Yes


[kudu-CR] Fix warnings from clang-6.0

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8538 )

Change subject: Fix warnings from clang-6.0
..

Fix warnings from clang-6.0

Running a nightly clang popped out a few new warnings that seemed like
true code smells. None seems to be a real bug, but may as well fix them.

Change-Id: Idac7fafa4d4b090566c359df297663ec53d43a6b
Reviewed-on: http://gerrit.cloudera.org:8080/8538
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley 
---
M src/kudu/client/client-test.cc
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/server/webserver-test.cc
3 files changed, 10 insertions(+), 10 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idac7fafa4d4b090566c359df297663ec53d43a6b
Gerrit-Change-Number: 8538
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Fix warnings from clang-6.0

2017-11-13 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8538 )

Change subject: Fix warnings from clang-6.0
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac7fafa4d4b090566c359df297663ec53d43a6b
Gerrit-Change-Number: 8538
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:57:43 +
Gerrit-HasComments: No


[kudu-CR] Add the ability to enable xray instrumentation

2017-11-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8537 )

Change subject: Add the ability to enable xray instrumentation
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt@333
PS1, Line 333:   endif()
 :   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address 
-DADDRESS_SANITIZER")
 : endif()
> i think if you set this, it's your own fault if the compiler barfs with "I
sgtm



--
To view, visit http://gerrit.cloudera.org:8080/8537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a8d5a27912e3b8d4db97b05906cbe7e87ba2cc8
Gerrit-Change-Number: 8537
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:49:31 +
Gerrit-HasComments: Yes


[kudu-CR] Add the ability to enable xray instrumentation

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8537 )

Change subject: Add the ability to enable xray instrumentation
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt@333
PS1, Line 333: if (${KUDU_USE_XRAY})
 :   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fxray-instrument")
 : endif()
> Should this depend on the type and version of the compiler used (i.e., don'
i think if you set this, it's your own fault if the compiler barfs with "I dont 
understand -fxray-instrument" rather than us trying to do version-based 
checking here



--
To view, visit http://gerrit.cloudera.org:8080/8537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a8d5a27912e3b8d4db97b05906cbe7e87ba2cc8
Gerrit-Change-Number: 8537
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:37:56 +
Gerrit-HasComments: Yes


[kudu-CR] Add the ability to enable xray instrumentation

2017-11-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8537 )

Change subject: Add the ability to enable xray instrumentation
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt
File CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8537/1/CMakeLists.txt@333
PS1, Line 333: if (${KUDU_USE_XRAY})
 :   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fxray-instrument")
 : endif()
Should this depend on the type and version of the compiler used (i.e., don't 
enable that for gcc and clang < v3.9?)



--
To view, visit http://gerrit.cloudera.org:8080/8537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a8d5a27912e3b8d4db97b05906cbe7e87ba2cc8
Gerrit-Change-Number: 8537
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:28:44 +
Gerrit-HasComments: Yes


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h@107
PS5, Line 107: // disk failure handling of the failed disks to return.
> The main gripe was that it would be confusing to add such a Wait() call since 
> developers in this layer of code would have to think something along the 
> lines of "Could this indirectly fail due to some error that should be 
> handled?" and if so, place in a Wait().

hmm.. isn't the only call site in my suggestion the 'CreateBlock' code where it 
determines there are no available disks? ie it doesn't have to change all the 
call sites in Tablet, etc, just the one place where we try to pick a disk to 
put a new block on?



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:23:05 +
Gerrit-HasComments: Yes


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h@107
PS5, Line 107: // disk failure handling of the failed disks to return.
> One thought on a potentially simpler approach:
A previous rev had this implemented (see rev 3), but Adar convinced me that 
simplicity at the call-sites is perhaps more important.

The main gripe was that it would be confusing to add such a Wait() call since 
developers in this layer of code would have to think something along the lines 
of "Could this indirectly fail due to some error that should be handled?" and 
if so, place in a Wait().

This approach, while adding slightly more complexity to the error manager, is 
simpler to use. If the block manager's gotten an error while doing something 
with a block (creating, writing to, etc.), it must be error-handled by one of 
the two callbacks. If a data dir UUID is available, call the disk-handling 
callback. Else, call the tablet-handling one.

Also, considering the error manager could in the future be used for other kinds 
of errors (e.g. corruptions, space errors, etc), separate classes of error 
types seems not unreasonable.



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:20:51 +
Gerrit-HasComments: Yes


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/5/src/kudu/fs/error_manager.h@107
PS5, Line 107: // disk failure handling of the failed disks to return.
One thought on a potentially simpler approach:

what if CreateNewBlock detected the case when all of the disks in the disk 
group are failed, and in that case, do something like the following before 
returning:

foreach disk {
  wait for error callback to be complete on this disk
}

In the case that the disk error happened long ago, this "wait" would be 
instantaneous. In the case that the callbacks are still in progress, it would 
block until the error handling had finished?

Then we don't need two separate types of callbacks? Or am I missing something 
about the issue at hand?



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 01:11:05 +
Gerrit-HasComments: Yes


[kudu-CR] Fix warnings from clang-6.0

2017-11-13 Thread Todd Lipcon (Code Review)
Hello Will Berkeley,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8538

to review the following change.


Change subject: Fix warnings from clang-6.0
..

Fix warnings from clang-6.0

Running a nightly clang popped out a few new warnings that seemed like
true code smells. None seems to be a real bug, but may as well fix them.

Change-Id: Idac7fafa4d4b090566c359df297663ec53d43a6b
---
M src/kudu/client/client-test.cc
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/server/webserver-test.cc
3 files changed, 10 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/38/8538/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idac7fafa4d4b090566c359df297663ec53d43a6b
Gerrit-Change-Number: 8538
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] tablet: check for stopped in drivers of IO

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8441 )

Change subject: tablet: check for stopped in drivers of IO
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8441/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8441/4//COMMIT_MSG@15
PS4, Line 15: before returning, it
: must again check that the tablet has been stopped
per comment in the code, not sure why you have to check on the way out in the 
success case?


http://gerrit.cloudera.org:8080/#/c/8441/4/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8441/4/src/kudu/tablet/tablet.cc@293
PS4, Line 293:   return CheckActiveState();
it strikes me as odd here (and a few places below) that you are calling 
CheckActiveState on the way out of the success-path of these functions. I would 
expect that if I got an error here that it wouldn't have transitioned to a new 
state


http://gerrit.cloudera.org:8080/#/c/8441/4/src/kudu/tablet/tablet.cc@369
PS4, Line 369:   return CheckActiveState();
same here - this check is already racy, right?


http://gerrit.cloudera.org:8080/#/c/8441/4/src/kudu/tablet/tablet.cc@957
PS4, Line 957: return CheckActiveState();
same, not sure of the point of doing these on the way out and not just on the 
way in



--
To view, visit http://gerrit.cloudera.org:8080/8441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
Gerrit-Change-Number: 8441
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:42:36 +
Gerrit-HasComments: Yes


[kudu-CR] Add the ability to enable xray instrumentation

2017-11-13 Thread Todd Lipcon (Code Review)
Hello Dan Burkert,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8537

to review the following change.


Change subject: Add the ability to enable xray instrumentation
..

Add the ability to enable xray instrumentation

Change-Id: I2a8d5a27912e3b8d4db97b05906cbe7e87ba2cc8
---
M CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/8537/1
--
To view, visit http://gerrit.cloudera.org:8080/8537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2a8d5a27912e3b8d4db97b05906cbe7e87ba2cc8
Gerrit-Change-Number: 8537
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Dan Burkert 


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8531 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Patch Set 1: Verified+1

Seems I hit KUDU-1863 (deadlock while shutting down master) in the precommit, 
but unrelated to this.


--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:25:13 +
Gerrit-HasComments: No


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Removed Verified+1 by Todd Lipcon 
--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] mvcc: allow tablet shutdown without completing txs

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7439 )

Change subject: mvcc: allow tablet shutdown without completing txs
..


Patch Set 23:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/7439/23//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7439/23//COMMIT_MSG@49
PS23, Line 49: a new test stop_tablet-itest is added to ensure stopped leaders 
don't
 :   complete writes and stopped followers do
having trouble parsing this. why do followers complete writes?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/integration-tests/stop_tablet-itest.cc
File src/kudu/integration-tests/stop_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/integration-tests/stop_tablet-itest.cc@82
PS23, Line 82:  *ret = *ret %
nit: %=


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/integration-tests/stop_tablet-itest.cc@90
PS23, Line 90: T $0 P $1
nit: usually we do P ... T ... (peer before tablet id)
Probably better to be consistent for easier grepping


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/integration-tests/stop_tablet-itest.cc@102
PS23, Line 102: NO_PENDING_FATALS();
I don't think this has any effect - it would just return if there is a pending 
fatal. it needs to be after any _calls_ to this function


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/integration-tests/stop_tablet-itest.cc@129
PS23, Line 129:   // or -1 if it can't be found.
sort of odd to return -1 instead of bad status here. Up above 
GetTserverNumToFail would end up returning -1 in the case this returns -1 which 
seems not right (may lead to crash?)


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@181
PS23, Line 181:  If this returns an error, the Tablet must
  :   // have been stopped.
Similar to some comments on earlier revs, I find this doc hard to interpret.

Is this a guarantee that: if this returns an error, then the Tablet state will 
have already transitioned to stopped state? In that case I think it is better 
to say "If this returns an error, it is guaranteed that the tablet will be in 
stopped state upon return."

Or, if, say, we have some storage problem in our compaction code or something, 
this could return an error but _not_ have the tablet in stopped state, in which 
case the caller should probably be CHECKing?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@190
PS23, Line 190:   // have been stopped.
same as above


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@408
PS23, Line 408:   // This method is not thread safe.
nit: it seems this is just copying from above, but would be nice if this 
clarified the thread safety a bit better. Should it be called while holding 
some other lock? When's it safe to call?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@465
PS23, Line 465:   void set_state(State s) {
maybe name set_state_unlocked


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@466
PS23, Line 466: state_ = s;
can you add a DCHECK assert on state_lock_ here?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@594
PS23, Line 594:   // ones specified by 'expected_states'.
it is surprising to me that passing an empty set here, then, has the effect 
that it does. Maybe this should be two separate methods?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@597
PS23, Line 597:   Status CheckActiveState(const std::set& 
expected_states = {}) const;
given that we expect the set to be very small, I think a vector is probably 
more efficient here than std::set


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.h@683
PS23, Line 683: potecting
typo


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.cc@299
PS23, Line 299: &&
you mean ||?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.cc@879
PS23, Line 879: expected_states.empty()
is this necessary?


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tablet/tablet.cc@1120
PS23, Line 1120: kOpen, kBootstrapping
how is this different than checking for {}? see other comment about maybe 
splitting this into separate methods that are a little easier to understand


http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/7439/23/src/kudu/tserver/tablet_service.cc@1761
PS23, Line 1761: LOG(WARNING) << "Error setting up scanner with request " 
<< SecureShortDebugString(*req);
this should probably include s.ToString()



--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit 

[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@80
PS4, Line 80:
> Hmm. If err_handler is a closure or a std::function then no, this will not
Ah, I see. Yeah, maybe putting it in util/status.h was the right call then for 
clarity, since it follows similar calls (e.g. RETURN_NOT_OK_RET). Also maybe 
renaming 'err_handler' to 'on_error' helps there too.

Sample usage:
 RETURN_NOT_OK_CALL(s, error_manager->RunErrorCallback());
or in your example:
 RETURN_NOT_OK_CALL(s, f());



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 23:56:56 +
Gerrit-HasComments: Yes


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@80
PS4, Line 80: (err_handler)
> Not sure I fully understand, I thought () would invoke, even if it's not re
Hmm. If err_handler is a closure or a std::function then no, this will not 
invoke it.

Example:

  #include 

  int main()
  {
auto f = [] { std::cout << "Invoked" << std::endl; };
f();
(f);
  }

The above only prints "Invoked" one time.



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 23:53:00 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8536 )

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
..


Patch Set 1:

(4 comments)

Injection looks good, mostly nits here

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h
File src/kudu/util/kernel_stack_watchdog.h:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h@176
PS1, Line 176: static void ThreadExiting(void* tls_void);
nit: maybe doc here that this is used internally by CreateTLS to create 
thread-local destructors?


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc
File src/kudu/util/kernel_stack_watchdog.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc@151
PS1, Line 151:  vector to_delete;
 : {
 :   lock_guard l(tls_lock_);
 :   to_delete.swap(pending_delete_);
 :   tls_map_copy = tls_by_tid_;
 : }
 : to_delete.clear();
Is it important to document somewhere that the pending TLS instances only get 
d'ted in RunThread? Or is that more of an implementation detail.


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc
File src/kudu/util/stack_watchdog-test.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: std::t
nit: drop std::


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: i
nit: could replace with `started % threads.size()`? One fewer variable to think 
about, as trivial as it is



--
To view, visit http://gerrit.cloudera.org:8080/8536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 23:45:13 +
Gerrit-HasComments: Yes


[kudu-CR] Strip Hadoop and Hive tarballs of unecessary lib jars

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8535 )

Change subject: Strip Hadoop and Hive tarballs of unecessary lib jars
..

Strip Hadoop and Hive tarballs of unecessary lib jars

The huge number of jars being added to the HMS classpath is causing the
HMS to take 60+ seconds to initialize on under-provisioned test boxes.
I've removed the unecessary lib jars using the included scripts and
uploaded them to the dependency S3 bucket with a '-stripped' suffix.

Change-Id: I9e0607259c5a21c65f0e1418481b1d100864d742
Reviewed-on: http://gerrit.cloudera.org:8080/8535
Reviewed-by: Todd Lipcon 
Tested-by: Kudu Jenkins
---
M thirdparty/download-thirdparty.sh
A thirdparty/package-hadoop.sh
A thirdparty/package-hive.sh
M thirdparty/package-llvm.sh
M thirdparty/vars.sh
5 files changed, 106 insertions(+), 9 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9e0607259c5a21c65f0e1418481b1d100864d742
Gerrit-Change-Number: 8535
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] MiniHms: log HMS thread stacks when startup times out

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8534 )

Change subject: MiniHms: log HMS thread stacks when startup times out
..

MiniHms: log HMS thread stacks when startup times out

We're seeing instances where the HMS can take more than 60 seconds to
startup on test machines. This commit changes MiniHms to send SIGQUIT to
the HMS on timeout, in order to capture thread stack traces of the slow
process.

Change-Id: Ifefb3b16ef097412604d555d5c2118aca3d998a9
Reviewed-on: http://gerrit.cloudera.org:8080/8534
Reviewed-by: Todd Lipcon 
Tested-by: Kudu Jenkins
---
M src/kudu/hms/mini_hms.cc
1 file changed, 7 insertions(+), 5 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifefb3b16ef097412604d555d5c2118aca3d998a9
Gerrit-Change-Number: 8534
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8312 )

Change subject: KUDU-2191 (4/n): WIP: Hive Metastore catalog manager integration
..


Patch Set 3:

(14 comments)

This is going to be significantly reworked in the near future.  Leaving open 
for now because I do want to integrate the feedback.

http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc
File src/kudu/integration-tests/master_hms-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@58
PS2, Line 58: using hms::HmsClient;
> warning: using decl 'KuduTable' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@65
PS2, Line 65: class MasterHmsTest : public KuduTest {
> warning: using decl 'MiniHms' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@67
PS2, Line 67:  public:
> warning: using decl 'set' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@71
PS2, Line 71:   void SetUp() override {
> warning: using decl 'Split' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@72
PS2, Line 72: KuduTest::SetUp();
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/integration-tests/master_hms-itest.cc@112
PS2, Line 112:
> warning: passing result of std::move() as a const reference argument; no mo
???


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc
File src/kudu/master/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@43
PS2, Line 43: using strings::CharSet;
> warning: using decl 'function' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@47
PS2, Line 47:
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@54
PS2, Line 54:
> warning: initializer for member 'lock_' is redundant [readability-redundant
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@161
PS2, Line 161:   rpc.callback = s.AsStdStatusCallback();
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@182
PS2, Line 182:   table->parameters = {
> warning: parameter 'schema' is unused [misc-unused-parameters]
this will be used in a later rev once were handling columns correctly.


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/master/hms_catalog.cc@250
PS2, Line 250:   }
> warning: the parameter 's' is copied for each invocation but only used as a
Done


http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc
File src/kudu/master/hms_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8312/3/src/kudu/master/hms_catalog.cc@203
PS3, Line 203:   for (int idx = 0; idx < table.size(); idx++) {
> I feel like this code could be written more concisely using some of the stu
perhaps we should just do the split on '.', and send the two parts to hive to 
validate.  We would probably just screw it up if we tried to recreate their 
rules.


http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/util/net/net_util.h
File src/kudu/util/net/net_util.h:

http://gerrit.cloudera.org:8080/#/c/8312/2/src/kudu/util/net/net_util.h@133
PS2, Line 133: bool ValidateAddressListFlag(const char* flag_name, const 
std::string& addr_list);
> warning: function 'kudu::ValidateAddressListFlag' has a definition with dif
Done



--
To view, visit http://gerrit.cloudera.org:8080/8312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8f360408b05d6ac4f7359a8ac0dd889b196baf
Gerrit-Change-Number: 8312
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:54:33 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2191 (5/n): WIP: Hive Metastore notification log event listener

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8313 )

Change subject: KUDU-2191 (5/n): WIP: Hive Metastore notification log event 
listener
..


Patch Set 3:

This is going to be significantly reworked in the near future.  Leaving open 
for now because I do want to integrate the feedback.


--
To view, visit http://gerrit.cloudera.org:8080/8313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ed099c44a593ffe514152135957018f21ed775
Gerrit-Change-Number: 8313
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:54:25 +
Gerrit-HasComments: No


[kudu-CR] mvcc: allow tablet shutdown without completing txs

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7439 )

Change subject: mvcc: allow tablet shutdown without completing txs
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7439/22/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/22/src/kudu/tablet/tablet.h@562
PS22, Line 562:
> Between Stopped() and CheckNotStopped() I think we are introducing a new li
Done



--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 23
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:51:55 +
Gerrit-HasComments: Yes


[kudu-CR] tablet: check for stopped in drivers of IO

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8441 )

Change subject: tablet: check for stopped in drivers of IO
..


Patch Set 4:

I've changed the part of this patch that tries to enforce consistency with a 
higher-level check that the Tablet has been stopped.


--
To view, visit http://gerrit.cloudera.org:8080/8441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
Gerrit-Change-Number: 8441
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:50:59 +
Gerrit-HasComments: No


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8395

to look at the new patch set (#5).

Change subject: error_manager: synchronize/serialize handling
..

error_manager: synchronize/serialize handling

The FsErrorManager is helper infrasturcture that other classes can use to
help provide API contracts related to error handling.

Here is an example error-handling contract provided to the TSTabletManager
in a future patch:
If a disk failure Status is returned during tablet operation, either:
1) the tablet server will crash and we can rely on Kudu's crash-consistency
   mechanisms for safety, or
2) any affected Tablet will transition to the 'kStopped' state via an error
   manager callback. Most components will simply pass the non-OK Status up
   the call chain. However, if a method expects an IO operation to
   succeed for correctness purposes, and it receives an error Status,
   then it should check that the Tablet is stopped. If so, it can be
   assumed that the error was handled. Otherwise, Kudu must rely on the
   crash-consistency mechanisms and crash.

Given the above contract, The state of a tablet server post-disk-failure
depends significantly on the completion of disk-failure-handling
callbacks. Error-handling _must_ finish before anything is propagated
back to the offending caller.

To ensure completion of error-handling even in a concurrent setting,
this patch extends the error manager with a lock and a second
error-handling callback type. This ensures that errors indirectly caused
by disk failures can be handled by non-disk-specific handling,
serializing failure-handling in the same fashion.

As an example of where this is necessary, say a tablet has data in a
single directory and hits a bad disk. That directory is immediately
marked failed and handling starts to fail all tablets in the directory.
Before, if the tablet were to create a new block before being failed, it
would fail immediately, complaining that no directories are available,
and would eventually fail a CHECK that translates roughly to: "Has error
handling for this tablet completed?"

By wrapping block creation with tablet-specific error handling and with
serialized error-handling, this CHECK will pass.

Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
---
M src/kudu/fs/CMakeLists.txt
M src/kudu/fs/data_dirs-test.cc
A src/kudu/fs/error_manager-test.cc
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/tserver/tablet_server.cc
M src/kudu/util/status.h
11 files changed, 323 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/8395/5
--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Updated known limitations to clarify column and row maximum size recommendations.

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8514 )

Change subject: Updated known limitations to clarify column and row maximum 
size recommendations.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8514/2/docs/known_issues.adoc
File docs/known_issues.adoc:

http://gerrit.cloudera.org:8080/#/c/8514/2/docs/known_issues.adoc@73
PS2, Line 73: single column
would this be clearer to say "single cell" instead of column?



--
To view, visit http://gerrit.cloudera.org:8080/8514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I29d94d475f072b667d8fdb28f48c8fc129185f20
Gerrit-Change-Number: 8514
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Ebert
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:48:36 +
Gerrit-HasComments: Yes


[kudu-CR] mvcc: allow tablet shutdown without completing txs

2017-11-13 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Mike Percy, David Ribeiro Alves, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7439

to look at the new patch set (#23).

Change subject: mvcc: allow tablet shutdown without completing txs
..

mvcc: allow tablet shutdown without completing txs

Currently, the only way to stop an Applying transaction is to wait for
it to finish and Commit it. This constraint was put in place to
guarantee on-disk correctness, but is sometimes too strict. E.g. if the
tablet is shutting down, the Apply doesn't need to finish.

This patch adds a new state to the MvccManager in which it is closed
for transactions. Once in this closed state:
1. New Applies will return and not move to the Commit phase, and any
   methods waiting for the tablet's Applies to Commit (e.g. new snapshot
   scans, FlushMRS) will respond with an error immediately. This allows
   an escape from the existing invariant that Applies _must_ Commit,
   provided the MvccManager is in this closed state.
2. Applies that are already underway may still Commit, but will return
   early on a best-effort basis. These non-Committed operations are
   inconsequential w.r.t. consistency; having some in-flight
   transactions Commit and others not is consistent with the server
   shutting down in between the Commits of two transactions.
3. New transactions drivers will abort immediately before even reaching
   the Prepare phase, ensuring no more writes to the tablet are made
   durable.

The Tablet class uses this closed MVCC state in a new "stopped" state of
its own. A Tablet that has been stopped will avoid further activity:
its MvccManager is closed to prevent further writes, and its maintenance
ops are cancelled to prevent further scheduling.

This patch includes these new behaviors when shutting down a tablet,
with the assumption that a tablet will only be shut down when it's being
deleted and we don't care too much about its in-flight transactions
Committing or its further maintenance ops. Code paths that previously
crashed if Applies did not succeed (e.g. TransactionDriver::ApplyTask,
MvccManager::AbortTransaction, etc.) or that waited for Applies to
finish (e.g. Tablet:: FlushUnlocked) will now _not_ crash if the Tablet
has been stopped and will log a warning instead.

Testing is done by adding the following:
- a test in mvcc-test to shut down MVCC and delete an Applying
  transaction, ensuring that there are no errors when it leaves scope.
- a test in mvcc-test to wait on an Applying transaction, shut down
  MVCC, and ensure that any waiters will return with an error.
- a new test stop_tablet-itest is added to ensure stopped leaders don't
  complete writes and stopped followers do, and stopped tablets don't
  prevent fault-tolerant scans

Change-Id: I983620f27e7226806a2cca253db7619731914d42
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/stop_tablet-itest.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tserver/tablet_service.cc
16 files changed, 631 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/23
--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 23
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager-test.cc
File src/kudu/fs/error_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager-test.cc@20
PS4, Line 20: #include 
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager-test.cc@27
PS4, Line 27: #include "kudu/gutil/map-util.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager-test.cc@40
PS4, Line 40: using strings::Substitute;
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@80
PS4, Line 80: (err_handler)
> This doesn't appear to actually invoke err_handler
Not sure I fully understand, I thought () would invoke, even if it's not 
returning the resulting value?
Anyway, I'm moving this to util/status.h. Usage is in 
{log,file}_block_manager.cc.


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@96
PS4, Line 96: etesbefore
> nit: space
Done


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@101
PS4, Line 101: class FsErrorManager {
> Here's how I think about the Error Manager and how I think we should docume
Sure, I'll put this in the commit message for now.


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@126
PS4, Line 126: InsertOrUpdate(_map_, e, std::move(cb));
> warning: passing result of std::move() as a const reference argument; no mo
Done


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@152
PS4, Line 152:   std::map cb_map_;
> Given that there are only two callback types, I don't think there's a need
Done



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:48:50 +
Gerrit-HasComments: Yes


[kudu-CR] tablet: check for stopped in drivers of IO

2017-11-13 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Mike Percy, Kudu Jenkins, Adar Dembo, Todd 
Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8441

to look at the new patch set (#4).

Change subject: tablet: check for stopped in drivers of IO
..

tablet: check for stopped in drivers of IO

This patch adds the invariant that if a tablet function drives I/O, it
must first check that the tablet has not been stopped, and, before
returning, it must again check that the tablet has not been stopped.

E.g. Tablet::FlushUnlocked() is a function that drives IO and it is
drives the operation of flush MRSs. Thus, before doing anything, it must
check that the tablet has not been stopped, and before returning, it
must again check that the tablet has been stopped. The calling
maintenance manager thread must be able to handle errors returned in
this way.

For now, this is an optimization to allow tablet shutdown without
having to wait on maintenance ops. In the future, this can be used to
allow errors that leave in-memory state inconsistent to not crash,
provided they first stop the tablet. This would guarantee that nothing
can successfully return with inconsistent results.

In preparation for this, there are various fatal safety checks
surrounding such driving functions that are updated to simply return
errors. Their call-sites have been updated to check the state of the
tablet, ensuring these failures are safe.

This patch also refactors some void methods to return Statuses where
these checks may be possible, and vise versa when Statuses are not
needed.

Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
---
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
12 files changed, 215 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/8441/4
--
To view, visit http://gerrit.cloudera.org:8080/8441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
Gerrit-Change-Number: 8441
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add initial internal INT128/ int128 support

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8533 )

Change subject: Add initial internal INT128/__int128 support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc
File src/kudu/util/int128.cc:

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@30
PS2, Line 30: v = -v;
This is wrong for the minimum value. FastInt128ToBufferLeft does it right.


http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@37
PS2, Line 37: std::ostream& operator<<(std::ostream& os, const unsigned 
__int128& val) {
> The main reason is that this implementation is very heavily borrowed from i
I think it'd be better to call into that implementation here in order to define 
operator<<, so we don't have two implementations of the same algorithm in the 
codebase.  The google impl is also a bit simpler (no loop).



--
To view, visit http://gerrit.cloudera.org:8080/8533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
Gerrit-Change-Number: 8533
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:36:29 +
Gerrit-HasComments: Yes


[kudu-CR] Strip Hadoop and Hive tarballs of unecessary lib jars

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8535 )

Change subject: Strip Hadoop and Hive tarballs of unecessary lib jars
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e0607259c5a21c65f0e1418481b1d100864d742
Gerrit-Change-Number: 8535
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:34:16 +
Gerrit-HasComments: No


[kudu-CR] MiniHms: log HMS thread stacks when startup times out

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8534 )

Change subject: MiniHms: log HMS thread stacks when startup times out
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefb3b16ef097412604d555d5c2118aca3d998a9
Gerrit-Change-Number: 8534
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:32:53 +
Gerrit-HasComments: No


[kudu-CR] Add initial internal INT128/ int128 support

2017-11-13 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8533 )

Change subject: Add initial internal INT128/__int128 support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc
File src/kudu/util/int128.cc:

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@25
PS2, Line 25: // Print the value in base 10 by converting v into parts that are 
base
> I think this docstring is referring to the unsigned version.
Right, will fix. This was a result of modifying similar Impala functionality.


http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@37
PS2, Line 37: std::ostream& operator<<(std::ostream& os, const unsigned 
__int128& val) {
> Why isn't this implemented using FastInt128ToBufferLeft and a stack-allocat
The main reason is that this implementation is very heavily borrowed from 
impala here: 
https://github.com/apache/incubator-impala/blob/2e63752858d71cc745534367a686980e060a8180/be/src/runtime/multi-precision.cc



--
To view, visit http://gerrit.cloudera.org:8080/8533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
Gerrit-Change-Number: 8533
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:27:39 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

2017-11-13 Thread Todd Lipcon (Code Review)
Hello Andrew Wong,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8536

to review the following change.


Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
..

KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit

This changes the stack watchdog so that thread unregistration no longer
blocks if the watchdog thread is in the middle of dumping a stack.

This is to try to avoid cases where a user thread is waiting to join on
another thread, but that thread is blocked due to watchdog interference.

A new stress-test/benchmark verifies the improvement. It simulates slow
stack trace collection by injecting latency into the watchdog thread,
and then starts and joins threads in a loop for 5 seconds. Without the
fix, it was only able to start about 1000 threads/second, whereas with
the fix it's able to start 10,000 threads/second.

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 139 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/8536/1
--
To view, visit http://gerrit.cloudera.org:8080/8536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Andrew Wong 


[kudu-CR] Add initial internal INT128/ int128 support

2017-11-13 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8533 )

Change subject: Add initial internal INT128/__int128 support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc
File src/kudu/util/int128.cc:

http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@25
PS2, Line 25: // Print the value in base 10 by converting v into parts that are 
base
I think this docstring is referring to the unsigned version.


http://gerrit.cloudera.org:8080/#/c/8533/2/src/kudu/util/int128.cc@37
PS2, Line 37: std::ostream& operator<<(std::ostream& os, const unsigned 
__int128& val) {
Why isn't this implemented using FastInt128ToBufferLeft and a stack-allocated 
buffer?



--
To view, visit http://gerrit.cloudera.org:8080/8533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
Gerrit-Change-Number: 8533
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 22:16:17 +
Gerrit-HasComments: Yes


[kudu-CR] Strip Hadoop and Hive tarballs of unecessary lib jars

2017-11-13 Thread Dan Burkert (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8535

to review the following change.


Change subject: Strip Hadoop and Hive tarballs of unecessary lib jars
..

Strip Hadoop and Hive tarballs of unecessary lib jars

The huge number of jars being added to the HMS classpath is causing the
HMS to take 60+ seconds to initialize on under-provisioned test boxes.
I've removed the unecessary lib jars using the included scripts and
uploaded them to the dependency S3 bucket with a '-stripped' suffix.

Change-Id: I9e0607259c5a21c65f0e1418481b1d100864d742
---
M thirdparty/download-thirdparty.sh
A thirdparty/package-hadoop.sh
A thirdparty/package-hive.sh
M thirdparty/package-llvm.sh
M thirdparty/vars.sh
5 files changed, 106 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/8535/1
--
To view, visit http://gerrit.cloudera.org:8080/8535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e0607259c5a21c65f0e1418481b1d100864d742
Gerrit-Change-Number: 8535
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] MiniHms: log HMS thread stacks when startup times out

2017-11-13 Thread Dan Burkert (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8534

to review the following change.


Change subject: MiniHms: log HMS thread stacks when startup times out
..

MiniHms: log HMS thread stacks when startup times out

We're seeing instances where the HMS can take more than 60 seconds to
startup on test machines. This commit changes MiniHms to send SIGQUIT to
the HMS on timeout, in order to capture thread stack traces of the slow
process.

Change-Id: Ifefb3b16ef097412604d555d5c2118aca3d998a9
---
M src/kudu/hms/mini_hms.cc
1 file changed, 7 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/8534/1
--
To view, visit http://gerrit.cloudera.org:8080/8534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifefb3b16ef097412604d555d5c2118aca3d998a9
Gerrit-Change-Number: 8534
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Burkert 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add initial internal INT128/ int128 support

2017-11-13 Thread Grant Henke (Code Review)
Hello Tidy Bot, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8533

to look at the new patch set (#2).

Change subject: Add initial internal INT128/__int128 support
..

Add initial internal INT128/__int128 support

This patch adds internal support for INT128/__int128 in Kudu.
This preliminary functionality is required to support the DECIMAL
type in KUDU-721.

Follow up patches will be added to review and discus making the
INT128 column type publicly exposed.

Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
---
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/encoding-test.cc
M src/kudu/cfile/type_encodings.cc
M src/kudu/common/common.proto
M src/kudu/common/types.cc
M src/kudu/common/types.h
M src/kudu/gutil/CMakeLists.txt
M src/kudu/gutil/endian.h
M src/kudu/gutil/integral_types.h
M src/kudu/gutil/mathlimits.cc
M src/kudu/gutil/mathlimits.h
A src/kudu/gutil/strings/numbers-test.cc
M src/kudu/gutil/strings/numbers.cc
M src/kudu/gutil/strings/numbers.h
M src/kudu/gutil/strings/substitute.h
M src/kudu/gutil/type_traits.h
M src/kudu/util/CMakeLists.txt
A src/kudu/util/int128-test.cc
A src/kudu/util/int128.cc
A src/kudu/util/int128.h
21 files changed, 327 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/8533/2
--
To view, visit http://gerrit.cloudera.org:8080/8533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
Gerrit-Change-Number: 8533
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot


[kudu-CR] error manager: synchronize/serialize handling

2017-11-13 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@80
PS4, Line 80: (err_handler)
This doesn't appear to actually invoke err_handler


http://gerrit.cloudera.org:8080/#/c/8395/4/src/kudu/fs/error_manager.h@101
PS4, Line 101: class FsErrorManager {
Here's how I think about the Error Manager and how I think we should document 
it.

Document that the Error Manger is helper infrastructure that other classes can 
use to provide API contracts related to error handling.

Here is an example error handling contract provided by TSTabletManager in a 
follow-up patch:

> If an IO error occurs during tablet operation, either: (1) the tablet server
> will crash, or (2) any affected Tablet will transition to STOPPED state by
> means of an ErrorManager callback. Most components can ignore this, and
> should simply use RETURN_NOT_OK() to pass any non-OK Status back up the call
> chain. However, if a component expects an IO operation to succeed for
> correctness purposes, and it receives an error Status from that operation,
> then it should check whether the Tablet is STOPPED. If the Tablet is still
> RUNNING, that component may still need to crash the process. However, if the
> affected Tablet is STOPPED, it can be assumed that the error was handled and
> the component can return or exit, relying on the correctness guarantees
> provided by Kudu's crash-consistency mechanisms.

Maybe we document this as an example until we merge the follow-up that 
implements it, at which time we point to the header file that documents that 
contract instead of providing it here.

I agree that it's important to indicate the technical limitations and 
guarantees of the FsErrorManager API itself as well.



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 13 Nov 2017 20:58:01 +
Gerrit-HasComments: Yes


[kudu-CR] Add initial internal INT128/ int128 support

2017-11-13 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8533


Change subject: Add initial internal INT128/__int128 support
..

Add initial internal INT128/__int128 support

This patch adds internal support for INT128/__int128 in Kudu.
This preliminary functionality is required to support the DECIMAL
type in KUDU-721.

Follow up patches will be added to review and discus making the
INT128 column type publicly exposed.

Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
---
M src/kudu/cfile/bshuf_block.h
M src/kudu/cfile/cfile-test-base.h
M src/kudu/cfile/encoding-test.cc
M src/kudu/cfile/type_encodings.cc
M src/kudu/common/common.proto
M src/kudu/common/types.cc
M src/kudu/common/types.h
M src/kudu/gutil/CMakeLists.txt
M src/kudu/gutil/endian.h
M src/kudu/gutil/integral_types.h
M src/kudu/gutil/mathlimits.cc
M src/kudu/gutil/mathlimits.h
A src/kudu/gutil/strings/numbers-test.cc
M src/kudu/gutil/strings/numbers.cc
M src/kudu/gutil/strings/numbers.h
M src/kudu/gutil/strings/substitute.h
M src/kudu/gutil/type_traits.h
M src/kudu/util/CMakeLists.txt
A src/kudu/util/int128-test.cc
A src/kudu/util/int128.cc
A src/kudu/util/int128.h
21 files changed, 327 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/8533/1
--
To view, visit http://gerrit.cloudera.org:8080/8533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I36458a54dfdd28be03f80d83688c0d658944e8e1
Gerrit-Change-Number: 8533
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 


[kudu-CR](branch-1.4.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8530 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Reviewed-on: http://gerrit.cloudera.org:8080/8530
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8530
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR](branch-1.4.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8530 )

Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8530
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Mon, 13 Nov 2017 20:15:58 +
Gerrit-HasComments: No


[kudu-CR](branch-1.2.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8532


Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/8532/1
--
To view, visit http://gerrit.cloudera.org:8080/8532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8532
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 


[kudu-CR](branch-1.3.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8531


Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/8531/1
--
To view, visit http://gerrit.cloudera.org:8080/8531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8531
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 


[kudu-CR](branch-1.4.x) KUDU-2209. HybridClock doesn't handle changes in STA NANO flag

2017-11-13 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8530

to review the following change.


Change subject: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
..

KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

--
Backport notes
---
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin 
Tested-by: Alexey Serbin 
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
---
M src/kudu/server/hybrid_clock.cc
M src/kudu/server/hybrid_clock.h
2 files changed, 23 insertions(+), 51 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/8530/1
--
To view, visit http://gerrit.cloudera.org:8080/8530
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Gerrit-Change-Number: 8530
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin