Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 )
Change subject: IMPALA-12533: Support row materialization outside of fetch lock ...................................................................... Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21 PS11, Line 21: (default 0.2). As there is only minimal testing in the patch, it may better to disable this by default. If someone can test this properly than they could also enable it in their environment. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query Just brain dump, I don't think that the patch needs to be changed: It would be nice to have a more sophisticated heuristic in the future. I can imagine 2 scenarios where the current one leads to regression: 1. the coordinator already employs all cores in the CPU 2. while the client starts with parallel fetches, later it becomes practically single threaded In both cases the overhead of the extra copy is added without real gains. 2. seems typical with a client that does something expensive with the fetched data, like writing to file. At the beginning it starts with many threads, but it won't be able to process the results returned by all these threads http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24 PS11, Line 24: It is expected that the default : threshold will never be reached with a single fetch stream. It seems safer to guarantee this by detecting parallel fetches and only enable delayed materialization for a query if there were parallel fetches at least once. Theoretically unlucky GC or CPU scheduling could lead to hitting the threshold, especially if row materialization time is very small. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@31 PS11, Line 31: -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% A follow up Jira would be nice about creating some simple multi thread client that can be executed in the tests. Automatic tests would be great as they would be also exercised with ASAN and TSAN builds. http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h@329 PS11, Line 329: ror typo http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254 PS11, Line 254: RETURN_IF_ERROR(result_set->Flush()); : result_flush_timer.Stop(); : query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime()); the timer is not stopped in case line 254 returns an error http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc@56 PS11, Line 56: class RowBatchContext { Can you add some comment about the role of this class? -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Wed, 31 Jan 2024 17:51:14 +0000 Gerrit-HasComments: Yes
