Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables ......................................................................
Patch Set 5: (22 comments) http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 65: void* value, bool copy_strings = true); const void* value, to indicate it's not an output param http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 64: row_.reset(table->schema().NewRow()); can this not return an error? Line 71: int col = tkudu_partition_expr_.referenced_columns[i]; move to where it's being used leave todo somewhere that the schema/our metadata should record the partition cols, this should really be part of the table descriptor. Line 75: // the KuduTableSink generate the error message. what's the point of round-robin instead of always returning, say, 0? Line 79: Status s = WriteKuduRowValue(row_.get(), col, type, val, false); where is this declared? it might be nice to require some form of namespace qualification for functions that aren't in the impala namespace. i wish kudu hadn't introduced these cumbersome nested namespaces. Line 82: DCHECK(s.ok()); add some context output (print called function, type and col name?). Line 89: DCHECK(s.ok()); same here http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h File be/src/exprs/kudu-partition-expr.h: Line 24: #include "runtime/descriptors.h" do you need this include, as opposed to forward decls? Line 46: TKuduPartitionExpr tkudu_partition_expr_; forward declare? Line 50: /// Used to call into Kudu to determine partitions. it's nice to indicate what gets set in prepare(), because that tells me it doesn't change during execution (despite the fact that it's not const). same for the other member vars. http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: Line 350: kudu_ = sink.output_partition.type == TPartitionType::KUDU; have the partition type as a member var, rather than 3 bools? Line 425: // to simplify this. i don't think that's practical, for broadcast and random we have different/more efficient logic that doesn't require evaluating exprs. Line 456: // hash-partition batch's rows across channels i'd leave a todo here to coalesce this with the kudu case, which would require the hash computation to be done via an expr (which probably requires codegen support, which would be worth having anyway). http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 109: bool kudu_; comment missing http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 115: private List<Integer> partitionKeyIdxs_ = Lists.newArrayList(); > The primary keys are always the first n columns, but the partition keys can that would be good to explain in the code, it's a bit subtle http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 643: Set<String> kuduPartitionByColumnNames = null; get rid of 'partitionby' here as well Line 682: // declaration, and store their indexes. We need those exprs in the original order to 'store their indexes' is a bit mysterious, 'column indexes' or even column positions is more informative. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 90: protected String toSqlImpl() { return "KuduPartition()"; } > Yes, it is included in the explain output. See the planner test files for e i think so, as it is it's pretty useless. http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 44: // Maps from this Epxrs children to column positions in the table, i.e. children_[i] since these are column positions, call the variable that? i have to keep reminding myself what 'key indexes' are. Line 48: private List<Type> partitionKeyTypes_; isn't this available in the table descriptor? http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 82: return new DataPartition(TPartitionType.KUDU, expr); why not just call the list<expr> c'tor and create a list here? (and drop the new non-list c'tor) http://gerrit.cloudera.org:8080/#/c/6559/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 56: 01:EXCHANGE [KUDU(KuduPartition())] that's not very informative -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
