Michael Ho has posted comments on this change. Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen ......................................................................
Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/5732/4//COMMIT_MSG Commit Message: Line 50: defined in the boost library. > is there some verification code we can add to ensure that this problem won' Not sure if there is a easy way to verify it without parsing the global ctor IR code. I manually inspected the global ctor after this change and the remaining logic seems to be doing: 1. setting up some global variables which aren't really referenced anywhere in the IR code: _ZN5boost6systemL14posix_categoryE, _ZN5boost6systemL10errno_ecatE and _ZN5boost6systemL11native_ecatE 2. Calling std::ios_base::Init() which is not necessary for IR code to function. http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS4, Line 326: which are also defined in the main module > this doesn't seem to match what the code is doing. Isn't it materializing They are not necessarily just function declarations in the new module. They could also be duplicated function definition in both the main module and new module. In an earlier attempt, this patch eagerly materializes the new module and creates a map of functions referenced by the global variables and functions in the new module. The problem is that for duplicated definition, the linker will pick one of the definition. So after linking, it's unclear which map to use: the function could be from the main module or it could be from the new module. So, to be safe, we need to call MaterializeFunctionHelper with the maps of the main module and the new module. As the function from new module may not linked in and its referenced functions can also become dead code after linking, it leads to some confusing behavior: it's hard to distinguish between an error (i.e. a referenced function is not found in the module after linking) and eliminated dead code. Ideally, we should record the functions in the new module before linking and compute their referenced functions only linking. The problem is that the "uses" of lvm::Function are only populated after the module has been completely materialized (there is an assert in LLVM for that) so it's not quite feasible. This patch implements a middle ground approach: materialize any functions which are defined in the main module and also exist either as a declaration or definition in the new module. http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: PS4, Line 85: their callee functions > what is a "callee" for a global variable? do you mean reference? Done PS4, Line 86: CalleeMap > again, callee sounds to me like only functions would be keys in this map. FnRefsMap PS4, Line 160: f 'eager' is false, the : /// functions in the module are not materialized. > this seems to contradict the class header comment that says "llvm::Function Done. I moved the materialization logic out of CreateFromFile/Memory() and removed the 'eager' argument. This function always materializes the bitcode lazily. PS4, Line 617: which also need to be materialized if : /// the input global variable or function is materialized > is this really specific to materialization? wouldn't it be enough to just s Done PS4, Line 625: fns_to_materialize_ > why this name change? does this now store functions that need to be materi The previous name is not quite self-explanatory. How about fns_to_always_materialize_; ? http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: > since this takes so long to execute, should we either move expr-test.cc to I don't feel comfortable testing the codegen path only in the exhaustive builds. The reasoning is that codegen is enabled in Impala by default so our default testing configuration should also be testing the codegen path. GVO is also not run in exhaustive mode so I am worried that bugs may slip through and we won't notice until much later. http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/timestamp-functions.h File be/src/exprs/timestamp-functions.h: Line 44: /// with its defined max year. date(max_date_time).year() will give 9999 but testing > I missed updating MAX_YEAR to 9999 in my patch 69859bddfb058d742cece7d7c2e Done. Also added a DCHECK for that. -- To view, visit http://gerrit.cloudera.org:8080/5732 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
