Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8758 )
Change subject: IMPALA-6190/6246: Add instances tab and event sequence ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/coordinator-backend-state.cc@512 PS2, Line 512: // We send 'done' explicitly so we don't have to infer it by comparison with a string : // constant in the debug page JS code. : value->AddMember("done", done_, document->GetAllocator()); : : Value state_val( : FragmentInstanceState::GetFInstanceStateDescription(current_state_) > is 'done' completely redundant with 'current_state', or can it not always b There's a subtle difference: 'done' is set in the final report after the reporting thread has been stopped. Between the final state of 'current_state' and reporting 'done', we release the thread token and stop the reporting thread. With the state machine I added in the next PS the state should always be FINISHED when a FI ends in error though. With abort_on_error=1, the coordinator will not accept status updates once the query has been cancelled. I couldn't come up with a query that runs for a while before aborting an instance and thus the resulting profiles I looked at all had empty instance sub-profiles. Sending 'done' explicitly also makes handling in the debug page's javascript code easier since we don't have to perform a string comparison with a constant to determine whether the fragment is done. I left a comment, but I'm happy to remove it if you feel strongly about it and instead replace it with a comparison in JS. http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/fragment-instance-state.h@169 PS2, Line 169: /// 'current_state_'. Events are issued throughout the execution by calling > Should document why this is atomic -- which threads read and which write it Done http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/fragment-instance-state.cc@296 PS2, Line 296: SCOPED_THREAD_COUNTER_MEASUREMENT(runtime_state_->total_thread_statistics()); : bool exec_tree_complete = false; : do { : Status status; : row_batch_->Reset(); : { : UpdateState(StateEvent::WAITING_FOR_BATCH); : SCOPED_TIMER(plan_exec_timer); : RETURN_IF_ERROR( : exec_tree_->GetNext(runtime_state_, row_batch_.get(), &exec_tree_complete)); : UpdateState(StateEvent::BATCH_RECEIVED); : } : if (VLOG_ROW_IS_ON) row_batch_->VLogRows("FragmentInstanceState::ExecInternal()"); : COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows()); : RETURN_IF_ERROR(sink_->Send(runtime_state_, row_batch_.get())); : UpdateState(StateEvent::BATCH_SENT); : } while (!exec_tree_complete); : : UpdateState(StateEvent::END_OF_STREAM); : // Flush the sink *before* stopping the report thread. Flush may need to add some : // important information to the last report that gets sent. (e.g. table sinks record the : // files they have written to in this method) : RETURN_IF_ERROR(sink_->FlushFinal(runtime_state())); : return Status::OK(); : } : : void FragmentInstanceState::Close() { : DCHECK(!report_thread_active_); : DCHECK(runtime_state_ != nullptr); > To be clear, the goal is to remove all the if-stmts and state variables fro Done http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/runtime/fragment-instance-state.cc@443 PS2, Line 443: next_state = TFInstanceState::RECEIVING_DATA; : } else { > this would be easier to understand if you explicitly list out all cases. th Done http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/8758/2/be/src/util/runtime-profile.cc@310 PS2, Line 310: } > what's the summary for why this change is needed? There was a bug, I shouldn't have removed L296. In general we need to merge the event sequence to get the instance events into the query profile - something we just had not done so far. http://gerrit.cloudera.org:8080/#/c/8758/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8758/2/common/thrift/ImpalaInternalService.thrift@639 PS2, Line 639: } > given this interacts with IMPALA-2990 work, we may want to split this chang I had talked to Sailesh about this change before pushing it and we agreed to keep it in a single commit for now. I'm also happy to pull it apart if we think it'll make things a lot easier. -- To view, visit http://gerrit.cloudera.org:8080/8758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I626456b6afa9101eeeeffd5cda10c4096d63d7f9 Gerrit-Change-Number: 8758 Gerrit-PatchSet: 3 Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Comment-Date: Fri, 08 Dec 2017 00:39:42 +0000 Gerrit-HasComments: Yes
