Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15408 )
Change subject: IMPALA-4080: Codegen once per fragment ...................................................................... Patch Set 8: (16 comments) http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@150 PS7, Line 150: codegen_status_msgs_.emplace_back(FragmentState::GenerateCodegenMsg( > I'm not sure that the move does anything if GenerateCodegenMsg() is returne Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@227 PS7, Line 227: null > switch to nullptr while you're changing it in this file? Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/exec/exec-node.cc@228 PS7, Line 228: for (PlanNode* child : children_) { > Maybe change to range-based for? Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h File be/src/runtime/fragment-state.h: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@19 PS7, Line 19: > Use a #pragma once instead of the old-style included guards Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@26 PS7, Line 26: > This doesn't look like it's needed. Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@32 PS7, Line 32: > This doesn't look like it's needed. Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.h@56 PS7, Line 56: ig* sink_config() const { retur > Might be better to use references, since they should be non-nullable. Not a cpp doesn't allow the use of references in a vector http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.cc File be/src/runtime/fragment-state.cc: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/fragment-state.cc@97 PS7, Line 97: codegen()->runtime_profile()->total_time_counter()); > Should use the same fragment ID format (F01, etc) as elsewhere in the profi Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.h@387 PS7, Line 387: /// StartFInstances(); > This Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@267 PS7, Line 267: int fragment_idx = -1; > Does this need to be done in this patchset? the TODO was an outdated comment, removed it. http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@608 PS7, Line 608: DescriptorTbl::Create(&obj_pool_, query_ctx().desc_tbl_serialized, &desc_tbl_); > nit: probably don't need this vertical whitespace. Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@612 PS7, Line 612: > It would be be better to use the usual fragment ID (F01, etc), since that's Done http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@621 PS7, Line 621: FragmentInstanceState* fis = > This has the potential to change the timing of some queries significantly, Like we discussed offline, I tried implementing a solution where in the fragment instance threads waited for codegen to complete. It worked fine but it changed the timing of the execution pipeline since the fragment instance threads were waiting for InitAndCodegen before it calls Prepare(). Since the query state waits for Prepare before sending runtime filters, what ended up happening was that even if the builder finished building runtime filters, it had to wait for the slowest fragment on the same host to complete codegen before query state marked Prepare as complete and sent out the filters. Because of this the filters were being sent out very slowly. So, instead I switched the implementation to closely mimic the timing of current behavior and now codegen is done in FIS:open() and the first fragment instance to call codegen does the actual work while other instances wait. This change in timing also fixed one of the tests which was failing earlier (namely test_exchange_mem_usage_scaling) confirming our suspicion that the failure was due to a timing issue. http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/runtime/query-state.cc@622 PS7, Line 622: obj_pool_.Add(new FragmentInstanceState(this, fragment_state, *instance_ctx)); > I wonder about whether using a thread pool would be a good idea here, just with my latest change, there is no need for spawning more threads http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h File be/src/udf/udf-internal.h: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h@174 PS7, Line 174: static int GetConstFnAttr(bool uses_decimal_v2, > Maybe pass in the query options struct? I decided to just pass this bool here (just renamed it to uses_decimal_v2) to avoid adding adhoc class if IMPALA_UDF_SDK_BUILD is true http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf.cc File be/src/udf/udf.cc: http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf.cc@122 PS7, Line 122: #include "common/names.h" > Does this need addressing? This would be required if I pass a impala specific class to any of these methods here, since I decided to go with just passing the uses_decimal_v2 of type bool in FunctionContextImpl::GetConstFnAttr(), I wont need to define a new class when IMPALA_UDF_SDK_BUILD is true; -- To view, visit http://gerrit.cloudera.org:8080/15408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452 Gerrit-Change-Number: 15408 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 27 Mar 2020 01:31:40 +0000 Gerrit-HasComments: Yes
