Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11052 )

Change subject: IMPALA-6644: Add last heartbeat timestamp into Statestore metric
......................................................................


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.h@444
PS5, Line 444:     int64_t last_heartbeat_ts_;
nit: technically this should be atomic, right? The "monitor" thread is 
accessing this variable only under 'subscribers_lock_', but this variable gets 
*set* from the heartbeat-sending thread which doesn't hold that lock, if I read 
the code correctly.

In practice it probably doesn't matter since it's just an int and x86 has a 
strong enough memory model, but I think it would be better to use a 
std::atomic<int64_t> here so that tools like TSAN don't complain.


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.h@687
PS5, Line 687:   [[noreturn]] void MonitorSubscriberHeartbeat();
the 'MainLoop' call seems to indicate that it would run until some exit flag is 
set, but I don't see the exit flag. So, i'm not 100% sure: is it possible that 
a StateStore object ever destructs? If it did, I think this thread would keep 
on running and then crash as it accessed the freed memory of the StateStore 
object.

If we believe that StateStore can never destruct, can we add a destructor which 
does something like a LOG(FATAL) << "Cannot shut down StateStore object once 
started" in the case that it sees that Init() was called? That way if someone 
tries to destruct a StateStore we'll get an obvious crash instead of a 
tricky-to-debug one


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@78
PS5, Line 78: hearbeats
nit: typo


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@315
PS5, Line 315:     last_heartbeat_ts_(0) {
is there any race here when a subscriber first registers before it sends a 
heartbeat where it will report as something like 48 years since heartbeat? 
Perhaps this should be set to the current time instead?


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@389
PS5, Line 389:   DCHECK_GT(timestamp_ms, last_heartbeat_ts_);
if these are wall times, it's possible for them to go backwards if the host 
clock resets. Perhaps we should be using monotonic times here to avoid this 
issue?


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@542
PS5, Line 542:     Value last_heartbeat_timestamp(
for the purposes of the user understanding this data, I'd think it would be 
more usable to output the number of seconds since the heartbeat, instead of the 
last heartbeat time. It's much easier to scan down a list of 100 subscribers 
and find the one with a big value, instead of trying to scan down a list of 
timestamps to find the one that is farthest in the past.


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@984
PS5, Line 984:   int num_subscribers, num_active_subscribers;
You don't gain any real performance by declaring these outside of the loop, and 
I think you lose some clarity. For the ints, I don't think it makes a 
performance difference at all, since the compiler will just allocate stack 
slots for them either way, and there is no constructor/destructor. For the 
vector, in theory there might be some gain, but given this loop only runs once 
a minute I don't think saving potentially 1 tiny allocation matters.

Instead, could you move these down to inside the for loop so it's clear they 
don't cross from one loop iteration to the next? I think doing so will remove 
some lines of code and make it more obvious that only 'previous_timestamp' is 
carried from one iteration to the next.


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@990
PS5, Line 990:       num_subscribers = subscribers_.size();
maybe it would be clearer to not track this incrementally, but instead, just 
before the log statement, do:

int num_active_subscrbers = num_subscribers - inactive_subscribers.size();

since it would reduce some extra branching and mental overhead


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@992
PS5, Line 992: SubscriberMap::value_type
nit: i think using 'const auto&' is just as good here


http://gerrit.cloudera.org:8080/#/c/11052/5/be/src/statestore/statestore.cc@998
PS5, Line 998:     LOG(INFO) << num_active_subscribers << "/" << num_subscribers
is it worth escalating this to WARN level if there are any inactive?



--
To view, visit http://gerrit.cloudera.org:8080/11052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I754adccc4569e8219d5d01500cccdfc8782953f7
Gerrit-Change-Number: 11052
Gerrit-PatchSet: 5
Gerrit-Owner: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Wed, 08 Aug 2018 00:48:47 +0000
Gerrit-HasComments: Yes

Reply via email to