Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................


Patch Set 5:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 65:     void* value, bool copy_strings = true);
const void* value, to indicate it's not an output param


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 64:   row_.reset(table->schema().NewRow());
can this not return an error?


Line 71:     int col = tkudu_partition_expr_.referenced_columns[i];
move to where it's being used

leave todo somewhere that the schema/our metadata should record the partition 
cols, this should really be part of the table descriptor.


Line 75:       // the KuduTableSink generate the error message.
what's the point of round-robin instead of always returning, say, 0?


Line 79:     Status s = WriteKuduRowValue(row_.get(), col, type, val, false);
where is this declared?

it might be nice to require some form of namespace qualification for functions 
that aren't in the impala namespace.

i wish kudu hadn't introduced these cumbersome nested namespaces.


Line 82:     DCHECK(s.ok());
add some context output (print called function, type and col name?).


Line 89:   DCHECK(s.ok());
same here


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h
File be/src/exprs/kudu-partition-expr.h:

Line 24: #include "runtime/descriptors.h"
do you need this include, as opposed to forward decls?


Line 46:   TKuduPartitionExpr tkudu_partition_expr_;
forward declare?


Line 50:   /// Used to call into Kudu to determine partitions.
it's nice to indicate what gets set in prepare(), because that tells me it 
doesn't change during execution (despite the fact that it's not const). same 
for the other member vars.


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 350:   kudu_ = sink.output_partition.type == TPartitionType::KUDU;
have the partition type as a member var, rather than 3 bools?


Line 425:   // to simplify this.
i don't think that's practical, for broadcast and random we have different/more 
efficient logic that doesn't require evaluating exprs.


Line 456:     // hash-partition batch's rows across channels
i'd leave a todo here to coalesce this with the kudu case, which would require 
the hash computation to be done via an expr (which probably requires codegen 
support, which would be worth having anyway).


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 109:   bool kudu_;
comment missing


http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 115:   private List<Integer> partitionKeyIdxs_ = Lists.newArrayList();
> The primary keys are always the first n columns, but the partition keys can
that would be good to explain in the code, it's a bit subtle


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 643:     Set<String> kuduPartitionByColumnNames = null;
get rid of 'partitionby' here as well


Line 682:     // declaration, and store their indexes.  We need those exprs in 
the original order to
'store their indexes' is a bit mysterious, 'column indexes' or even column 
positions is more informative.


http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java:

Line 90:   protected String toSqlImpl() { return "KuduPartition()"; }
> Yes, it is included in the explain output. See the planner test files for e
i think so, as it is it's pretty useless.


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java:

Line 44:   // Maps from this Epxrs children to column positions in the table, 
i.e. children_[i]
since these are column positions, call the variable that?

i have to keep reminding myself what 'key indexes' are.


Line 48:   private List<Type> partitionKeyTypes_;
isn't this available in the table descriptor?


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

Line 82:     return new DataPartition(TPartitionType.KUDU, expr);
why not just call the list<expr> c'tor and create a list here? (and drop the 
new non-list c'tor)


http://gerrit.cloudera.org:8080/#/c/6559/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 56: 01:EXCHANGE [KUDU(KuduPartition())]
that's not very informative


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to