Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8758 )
Change subject: IMPALA-6190/6246: Add instances tab and event sequence ...................................................................... Patch Set 4: (18 comments) http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/coordinator.h@187 PS4, Line 187: instances instances' stats http://gerrit.cloudera.org:8080/#/c/8758/3/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/8758/3/be/src/runtime/fragment-instance-state.cc@241 PS3, Line 241: UpdateState(StateEvent::CODEGEN_END); > Yes, these overlap. I thought it'd still be nice to show it in the event se Please see comments in Open(). This state doesn't seem necessary anymore if you take the recommendation there. http://gerrit.cloudera.org:8080/#/c/8758/3/be/src/runtime/fragment-instance-state.cc@302 PS3, Line 302: UpdateState(StateEvent::WAITING_FOR_BATCH); : SCOPED_TIMER(plan_exec_timer); : RETURN_IF_ERROR( : exec_tree_->GetNext(runtime_state_, row_batch_.get(), &exec_tree_complete)); : UpdateState(StateEvent::BATCH_RECEIVED); : } : if (VLOG_ROW_IS_ON) row_batch_->VLogRows("FragmentInstanceState::ExecInternal()"); : COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows()); : RETURN_IF_ERROR(sink_->Send(runtime_state_, row_batch_.get())); : UpdateState(StateEvent::BATCH_SENT); > I think it's helpful to track "first batch received/sent" separately since I think it's definitely worth doing a perf run to make sure things don't regress. I suppose it should be mostly okay if it's done once per row batch. That said, it's a bit unfortunate that we have to do call UpdateState() in the loop. I wonder if there is a chance that the compiler will be smart enough to constant fold the argument (e.g. StateEvent::WAITING_FOR_BATCH) into the function and avoid the switch at runtime. http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@63 PS4, Line 63: static const string INSTANCE_EVENT_TIMELINE_NAME = : "Fragment Instance Lifecycle Event Timeline"; : static const string CODEGEN_EVENT_NAME = "Codegen Finished"; : static const string PREPARE_EVENT_NAME = "Prepare Finished"; : static const string OPEN_EVENT_NAME = "Open Finished"; : static const string FIRST_BATCH_RECEIVED_EVENT_NAME = "First Batch Received"; : static const string FIRST_BATCH_SENT_EVENT_NAME = "First Batch Sent"; : static const string EXEC_INTERNAL_EVENT_NAME = "ExecInternal Finished"; Please provide comments about these names and how they may correlate with the current state of the state machine. http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@73 PS4, Line 73: FINSTANCE_STATE_LABELS[] Is there any static assert to ensure that number of entries in this array match the declaration in the thrift definition ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@79 PS4, Line 79: received Will "produced" be more accurate as there is usually some extra processing on top of the raw data received from either data source or another fragment ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@81 PS4, Line 81: Receiving Data "Producing row batches" seems more consistent with the rest of the state names. http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@283 PS4, Line 283: RETURN_IF_ERROR(codegen->FinalizeModule()); This is actually the majority of time codegen spent on. May want to extend the tracking the StateEvent::CODEGEN to this point and start StateEvent::OPEN_START below. In that way, you don't need StateEvent::CODEGEN_END either. http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@302 PS4, Line 302: UpdateState(StateEvent::WAITING_FOR_BATCH); Can this be moved out of the loop to line 297 and simplify the state machine to jump directly to RECEIVING_DATA once the first batch has been sent ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@306 PS4, Line 306: UpdateState(StateEvent::BATCH_RECEIVED); Should we not count the time to execute UpdateState() towards "plan_exec_timer" ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@410 PS4, Line 410: void FragmentInstanceState::UpdateState(StateEvent event) const StateEvent http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@418 PS4, Line 418: DCHECK( DCHECK_EQ(). Same below. http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@483 PS4, Line 483: default: nit: wrong indent http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@484 PS4, Line 484: DCHECK(false) << "Unexpected Event"; Does it help to print "event" too ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/runtime/fragment-instance-state.cc@487 PS4, Line 487: current_state_.Store(next_state); Can there be multiple threads updating the state here ? If so, does the above code need to be thread safe too ? If it's single threaded only, do we still need AtomicEnum ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/util/runtime-profile-counters.h@305 PS4, Line 305: boost::lock_guard<SpinLock> event_lock(lock_); : Event event = make_pair(move(label), sw_.ElapsedTime()); Why the change in order ? Do you mean to count the spin lock wait time towards elapsed time too ? http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/util/runtime-profile-counters.h@333 PS4, Line 333: DCHECK( DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/8758/4/be/src/util/runtime-profile-counters.h@337 PS4, Line 337: events_.back().second Not sure if the compiler is smart enough to hoist this out of the loop as it may assume that events_ can be changed during the loop iteration. -- To view, visit http://gerrit.cloudera.org:8080/8758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I626456b6afa9101eeeeffd5cda10c4096d63d7f9 Gerrit-Change-Number: 8758 Gerrit-PatchSet: 4 Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Comment-Date: Tue, 12 Dec 2017 01:35:21 +0000 Gerrit-HasComments: Yes
