Michael Ho has posted comments on this change.

Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with 
codegen
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5732/4//COMMIT_MSG
Commit Message:

Line 50: defined in the boost library.
> is there some verification code we can add to ensure that this problem won'
Not sure if there is a easy way to verify it without parsing the global ctor IR 
code. I manually inspected the global ctor after this change and the remaining 
logic seems to be doing:

1. setting up some global variables which aren't really referenced anywhere in 
the IR code:

_ZN5boost6systemL14posix_categoryE, _ZN5boost6systemL10errno_ecatE and 
_ZN5boost6systemL11native_ecatE

2. Calling std::ios_base::Init() which is not necessary for IR code to function.


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS4, Line 326: which are also defined in the main module
> this doesn't seem to match what the code is doing.  Isn't it materializing 
They are not necessarily just function declarations in the new module. They 
could also be duplicated function definition in both the main module and new 
module.

In an earlier attempt, this patch eagerly materializes the new module and 
creates a map of functions referenced by the global variables and functions in 
the new module. The problem is that for duplicated definition, the linker will 
pick one of the definition. So after linking, it's unclear which map to use: 
the function could be from the main module or it could be from the new module. 
So, to be safe, we need to call MaterializeFunctionHelper with the maps of the 
main module and the new module. As the function from new module may not linked 
in and its referenced functions can also become dead code after linking, it 
leads to some confusing behavior: it's hard to distinguish between an error 
(i.e. a referenced function is not found in the module after linking) and 
eliminated dead code.

Ideally, we should record the functions in the new module before linking and 
compute their referenced functions only linking. The problem is that the "uses" 
of lvm::Function are only populated after the module has been completely 
materialized (there is an assert in LLVM for that) so it's not quite feasible.

This patch implements a middle ground approach: materialize any functions which 
are defined in the main module and also exist either as a declaration or 
definition in the new module.


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS4, Line 85: their callee functions
> what is a "callee" for a global variable? do you mean reference?
Done


PS4, Line 86: CalleeMap
> again, callee sounds to me like only functions would be keys in this map.  
FnRefsMap


PS4, Line 160: f 'eager' is false, the
             :   /// functions in the module are not materialized.
> this seems to contradict the class header comment that says "llvm::Function
Done. I moved the materialization logic out of CreateFromFile/Memory() and 
removed the 'eager' argument. This function always materializes the bitcode 
lazily.


PS4, Line 617:  which also need to be materialized if
             :   /// the input global variable or function is materialized
> is this really specific to materialization? wouldn't it be enough to just s
Done


PS4, Line 625: fns_to_materialize_
> why this name change?  does this now store functions that need to be materi
The previous name is not quite self-explanatory. How about 
fns_to_always_materialize_; ?


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

> since this takes so long to execute, should we either move expr-test.cc to 
I don't feel comfortable testing the codegen path only in the exhaustive 
builds. The reasoning is that codegen is enabled in Impala by default so our 
default testing configuration should also be testing the codegen path. GVO is 
also not run in exhaustive mode so I am worried that bugs may slip through and 
we won't notice until much later.


http://gerrit.cloudera.org:8080/#/c/5732/4/be/src/exprs/timestamp-functions.h
File be/src/exprs/timestamp-functions.h:

Line 44:   /// with its defined max year. date(max_date_time).year() will give 
9999 but testing
> I missed updating MAX_YEAR to 9999 in my patch  69859bddfb058d742cece7d7c2e
Done. Also added a DCHECK for that.


-- 
To view, visit http://gerrit.cloudera.org:8080/5732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to