Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16351 )
Change subject: IMPALA-10065: Fix DCHECK when retrying a query in FINISHED state ...................................................................... Patch Set 2: (1 comment) Hi Sahil, sorry that I have to give a long reply to discuss each question inline. Thanks for your patience in reading my long reply! http://gerrit.cloudera.org:8080/#/c/16351/1/be/src/runtime/query-driver.cc File be/src/runtime/query-driver.cc: http://gerrit.cloudera.org:8080/#/c/16351/1/be/src/runtime/query-driver.cc@99 PS1, Line 99: If fetch() is called but nothing was fetched (e.g. due to : // ClientRequestState::FetchRows() timeout > > I think we need this. WaitForResults() returns after the coordinator > > fragment instance finishes Open(). It doens't mean some results are ready > > to be fetched. > IIRC when Open() returns the first RowBatch is actually available to to be > fetched. This confused me as well, and I was able to track down the change > that did this, but I can't seem to find it now. I verify this by logging fetched_rows_ and num_rows_fetched_ at the end of ClientRequestState::FetchRows(). Then running a query like "select * from functional.alltypes where bool_col = sleep(600)" which needs more than 10s to produce the first batch (1024 rows). I can see when the first FetchRows() returns, fetched_rows_ is true and num_rows_fetched_ is 0. Let me write down my understanding on the codes. Please correct me if anything is wrong. The call path for a beeswax fetch: ImpalaServer::fetch() -> ImpalaServer::FetchInternal() ImpalaServer::WaitForResults() ImpalaServer::BlockOnWait() -> ClientRequestState::BlockOnWait() Wait for ClientRequestState::block_on_wait_cv_ ClientRequestState::set_fetched_rows() to update fetched_rows_ ClientRequestState::FetchRows() -> FetchRowsInternal() Coordinator::GetNext() and update num_rows_fetched_ The only code path that notifies block_on_wait_cv_: ClientRequestState::Wait() -> WaitInternal() Coordinator::Wait() FragmentInstanceState::WaitForOpen() Wait for opened_promise_.Get() of the coordinator fragment instance block_on_wait_cv_.NotifyAll(); The code path that sets opened_promise_: QueryState::ExecFInstance() FragmentInstanceState::Exec() FragmentInstanceState::Open() calling exec_tree_->Open() and sink_->Open(). Set opened_promise_ So when the coordinator fragment instance finishes its Open() call, it's opened_promise_ will be set, which unblocks the thread waiting in ClientRequestState::Wait(). Then block_on_wait_cv_ is notified, which unblocks the fetch thread, and then it sets the fetched_rows_ flag. However, when the Open() call finishes, we still need to wait some time for the first row batch to be produced in Coordinator::GetNext(). > The code comments for ImpalaServer::WaitForResults, > ImpalaServer::BlockOnWait, ClientRequestState::Wait, Coordinator::Wait all > confirm this behavior (e.g. that Wait() "Blocks until result rows are ready > to be retrieved via GetNext()"). The ExecState is also set to FINISHED by the > time WaitForResults returns. Not sure if some recent changes make those comments stale, or I have some misunderstanding on the codes. Maybe the current codes just guarentee that the first row is ready, but if the row batch size > 1, materializing the remaining rows may still need some waiting. > > The real timeout can happen in the following > > ClientRequestState::FetchRows(). Then the client still fetches nothing and > > will trigger another fetch() request. > This timeout was actually added as a timeout on the time taken to materialize > rows (e.g. convert from RowBatch to client side format). See IMPALA-7312. I'm confused on this. Isn't the timeout includes the waiting time spent in WaitForResults()? The timeout limit we use in ImpalaServer::BlockOnWait() is the fetch_rows_timeout_us: https://github.com/apache/impala/blob/e63bf9d6c1b2a67f68057f2b8bf077aa7be27256/be/src/service/impala-server.cc#L1581-L1583 We then get the actual time blocked on the wait as the returned block_on_wait_time_us. We then pass it into Coordinator::GetNext() and use (fetch_rows_timeout_us - block_on_wait_time_us) as the timeout for the GetNext call: https://github.com/apache/impala/blob/c413f9b558d51de877f497590baf14139ad5cf99/be/src/runtime/coordinator.cc#L846 So I think fetch_rows_timeout_us covers the time spent in WaitForResults() and the time we actually materialize the results. Or maybe I miss something? > > As the comment of fetched_rows_ said, it means whether a fetch was > > attempted by a client, regardless of whether a result set (or error) was > > returned to the client. > Right, but it is only set after WaitForResults is called. Maybe its worth > clarifying that in the description of fetched_rows_. I think we should clarify that if WaitForResults times out, fetched_rows_ won't be set. > Does the test still pass without this change? Yes, this line is not required by the test. But I think it's more accurate since num_rows_fetched_ only get updated when client does fetch some rows. -- To view, visit http://gerrit.cloudera.org:8080/16351 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11d82bf80640760a47325833463def8a3791bdda Gerrit-Change-Number: 16351 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Tue, 25 Aug 2020 06:35:21 +0000 Gerrit-HasComments: Yes
