Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 )
Change subject: IMPALA-12533: Support row materialization outside of fetch lock ...................................................................... Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query > Just brain dump, I don't think that the patch needs to be changed: The hive JDBC driver ramps up threads slowly after multiple batches are fetch which avoids this case. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query > Thanks for the info, I agree that 2 is solved by gradual ramping on client The maximum number of streams is enforced by the driver. The highest useful stream count that we have observed so far is around 5. Being CPU is a compressible resource, there does not seem to be much of an issue absorbing this additional parallelism. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24 PS11, Line 24: It is expected that the default : threshold will never be reached with a single fetch stream. > >There are no parallel fetches as they are serialized by the fetch lock Added counter for number of streams as suggested. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@186 PS13, Line 186: Initialize > This is only done in BufferedPlanRootSink, but not in class BlockingPlanRoo Added support to BlockingPlanRootSync. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@187 PS13, Line 187: . > nit: . Done http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@188 PS13, Line 188: RETURN_IF_ERROR(results->InitDelayedMaterialization(*row_desc_, > If I understand correctly this will create a new RowBatchContext on each fe It might be possible to attach a cache to the QueryDriver. I agree let's defer this. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h@295 PS13, Line 295: /// Move 'num_rows' rows from 'src' to 'dest' within the batch. Useful for exec > Should the description be updated too? Done http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc@251 PS13, Line 251: fetch_results->__set_hasMoreRows(!query_handle->eos()); > If I understand correctly at this point the coordinator may have already hi The query_handle reference keeps the QueryDriver/RunTimeState/MemPool around that the RowBachContext is using. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc@477 PS13, Line 477: c > nit: extra space Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 14 Gerrit-Owner: Kurt Deschler <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Fri, 15 Mar 2024 21:12:32 +0000 Gerrit-HasComments: Yes
