Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4192: Disentangle Expr and ExprContext
......................................................................


Patch Set 17:

(62 comments)

http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

Line 79:   DCHECK_EQ(intermediate_tuple_desc_->slots().size(), 
output_tuple_desc_->slots().size());
move to c'tor


Line 124:   DCHECK_EQ(agg_fns_.size(), agg_fn_evaluators_.size());
nice, the separation is making the code a lot crisper.


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/aggregation-node.h
File be/src/exec/aggregation-node.h:

Line 53: class AggregationNode : public ExecNode {
it's sad that this still needs to be maintained. please apply my comments to 
the partitioined-*.{cc,h} as well, where it makes sense


Line 80:   /// FunctionContexts objects are stored in ObjectPool of 
RuntimeState.
so what is agg_fn_pool_ for then?


Line 98:   boost::scoped_ptr<RowDescriptor> intermediate_row_desc_;
any reason this can't go in state->obj_pool()?


Line 150:   /// Cross-compiled accessor for &agg_fn_evaluators_[0]. Used by the 
codegen'ed code.
"for agg_fn_evaluators_.data()" is more idiomatic.

are you going to do a rename of "evaluator[s]" suffixes to "evl[s]" or 
"eval[s]" at the end?


Line 165:   /// slot_idx is the idx into aggregate_exprs_ (does not include 
grouping exprs).
update comment


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 892:   AggFn::Close(analytic_fns_);
having an expr::close() and having to call that here is a bit counterintuitive 
(and manual, since we need to do that for every expr). would be nice to move 
that to a utility function in runtimestate (track the cache entries in some 
ancillary structure and then free them all at once instead of piecemeal). that 
would reduce the likelihood of forgetting this somewhere.

not necessary to do it as part of this patch, but a todo would be nice.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

Line 210:   ScalarExpr* partition_by_eq_expr_;
initialize inline


Line 215:   ScalarExpr* order_by_eq_expr_;
same


Line 216:   ScalarExprEvaluator* order_by_eq_expr_evaluator_;
same


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 100:   RETURN_IF_ERROR((*sink)->Init(state, thrift_sink, 
thrift_output_exprs));
why not just call create() directly here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 87:   static Status Create(ObjectPool* pool, RuntimeState* state,
state already contains a pool


Line 123:   boost::scoped_ptr<MemPool> expr_mem_pool_;
do we really need a separate mempool for that?


Line 131:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
move in/out param to back


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:

Line 131: Status ExchangeNode::QueryMaintenance(RuntimeState* state) {
why did we not need this before?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 460:     if (!EvalConjunct(evaluators[i], row)) return false;
can't you just inline the call here?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 149:   static bool EvalConjunct(ScalarExprEvaluator* evaluator, TupleRow* 
row);
evalpredicate

a predicate is a conjunct if it's part of a conjunctive clause, ie, a predicate 
on its own can't be a conjunct.


Line 151:   /// Evaluate the conjuncts in 'evaluators' over 'row'.
add convenience function for a vector?


Line 263:   /// object pool.
why?


Line 290:   boost::scoped_ptr<MemPool> expr_mem_pool_;
what mem pool did we use for that before? in general, we want to get away from 
having mempools all over the place


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 102:   Status CloneFrom(ObjectPool* pool, RuntimeState* state, MemPool* 
mem_pool,
state contains pool


Line 103:       const FilterContext& from);
move const args to front


Line 114:   /// Codegen Eval() by codegen'ing the expression evaluations and 
replacing the type
update comment


Line 117:   static Status CodegenEval(LlvmCodeGen* codegen, ScalarExpr* 
filter_expr,
does filter_expr get updated in this function?


http://gerrit.cloudera.org:8080/#/c/5483/14/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 129:   void FreeBuildLocalAllocations();
FreeLocal<>Allocations()?


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-scan-node.cc
File be/src/exec/hbase-scan-node.cc:

Line 249:     DCHECK_EQ(conjunct_evaluators_.size(), conjuncts_.size());
this would benefit from a convenience function for a vector


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hbase-table-writer.h
File be/src/exec/hbase-table-writer.h:

Line 96:   /// The evaluators are owned by sink which owns this table writer.
"the sink"


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-avro-table-writer.h
File be/src/exec/hdfs-avro-table-writer.h:

Line 68:                       const std::vector<ScalarExprEvaluator*>& 
output_expr_evaluators);
do these come from the sink? you're already passing in the sink.


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 57:       const std::vector<ScalarExprEvaluator*>& output_expr_evaluators);
get those out of sink


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 387:   /// the per-scanner ScannerContext. Correspond with exprs in 
'filter_exprs_'.
"Corresponds to"


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 122:   // Setup the scan node's dictionary filtering conjuncts map.
set up


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 166:   friend class DataSink;
or make everything that's private in datasink protected?


Line 168:   virtual Status Init(RuntimeState* state, const TDataSink& tsink,
param order


Line 202:       const std::vector<ScalarExprEvaluator*>& evaluators, 
std::string* key);
param order


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 431:   if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], 
conjunct_evaluators_.size(), row)) {
uniformly switch to vector<>.data() instead of &vector<>[0], the latter is 
undefined for empty vectors. jim is in the process of fixing all existing 
instances up.


Line 482:     if (ExecNode::EvalConjuncts(&conjunct_evaluators_[0], 
conjuncts_.size(), row)) {
same


Line 675:   ht_ctx_.reset();
why?

we want to stop tearing down control structures


Line 1394: // void UpdateSlot(FunctionContext* agg_fn_ctx, ScalarExprEvaluator* 
agg_expr_evaluator,
isn't this for the input expr?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exec/partitioned-aggregation-node.h
File be/src/exec/partitioned-aggregation-node.h:

Line 448:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& 
agg_fn_evaluators,
why non-const?


Line 453:   Tuple* ConstructIntermediateTuple(std::vector<AggFnEvaluator*>& 
agg_fn_evaluators,
same here


Line 479:   void UpdateTuple(AggFnEvaluator** agg_fn_evaluators, Tuple* tuple, 
TupleRow* row,
the signatures don't conform to our standard. leave todo in class header


Line 491:   Tuple* GetOutputTuple(vector<AggFnEvaluator*>& agg_fn_evaluators,
missing std; why non-const?

there are several more non-const vector<>& here


Line 527:   /// Accessor for the expression contexts of an AggFnEvaluator. 
Returns an array of
update comment

also, what is i?


Line 641:   /// Assumes is_merge = false;
update comment to describe params


http://gerrit.cloudera.org:8080/#/c/5483/17/be/src/exec/select-node.cc
File be/src/exec/select-node.cc:

Line 97:   ScalarExprEvaluator* const* conjunct_evaluators = 
&conjunct_evaluators_[0];
.data()


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

Line 80:   /// TODO: Move the evaluation of constant input arguments to AggFn 
setup.
once that's done, can open() go away?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 266:   const std::vector<ScalarExprEvaluator*>& 
partition_key_value_evaluators() const {
value_evaluators feels redundant. partition_key_evls?


Line 290:   /// This evaluator is safe to be shared by all fragment instances 
as all expressions
plural


Line 497:   static Status CreatePartKeyExprs(ObjectPool* pool, const 
HdfsTableDescriptor* hdfs_tbl)
move pool to back
why const* instead of const&, can this really be null?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

Line 187:   RETURN_IF_ERROR(DataSink::Create(obj_pool(), runtime_state_, 
fragment_ctx_,
runtime_state_ already has an obj_pool, remove first param


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

Line 276:   Status status = DescriptorTbl::Create(
single line?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 92:   /// 'sort_materialize_exprs' are the slot exprs used to materialize 
the tuples to be
i find this harder to follow than with the original name. these are the exprs 
that materialize the individual slots of the sort tuple, no? 
sort_tuple_input_exprs or something like that, analogous to agg?


Line 96:   Sorter(TupleRowComparator& compare_less_than,
if you need non-const, then switch to *


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 77: // 'type' indicates the type of the expression value.
it feels like this should live somewhere else (in /runtime/raw-value?)


Line 80:   if (value == nullptr) return;
dcheck col_val != null


Line 246:     exprs[i]->Close();
evl->root()->Close() and then iterate over collection


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.cc
File be/src/util/tuple-row-compare.cc:

Line 51:   opened_ = false;
meaning you can call open() again, which in turns re-creates the evaluators? is 
that intentional? or should the create() call move into an init() function?


http://gerrit.cloudera.org:8080/#/c/5483/16/be/src/util/tuple-row-compare.h
File be/src/util/tuple-row-compare.h:

Line 99:         (*codegend_compare_fn_)(&ordering_expr_evaluators_lhs_[0],
.data()


Line 135:   bool opened_;
initialize inline, that way you don't need to change the c'tor if you ever move 
this around


Line 146:   std::vector<bool> is_asc_;
make const, while you're here


Line 147:   std::vector<int8_t> nulls_first_;
same


-- 
To view, visit http://gerrit.cloudera.org:8080/5483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to