Tim Armstrong has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input ......................................................................
Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/4863/5/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 277: RETURN_IF_ERROR(FinalizePartitionFile(state, output_partition)); Can avoid a level of nesting with if (!new_file) break; Line 294: GetHashTblKey(dynamic_partition_key_expr_ctxs_, &key); Oh man, I didn't notice this 'current_row_' thing earlier. It looks like for some reason the original author of the code didn't want to pass an extra argument into functions so instead did it implicitly via 'current_row_' Can you refactor it so that it's passed as an actual argument? Line 302: RETURN_IF_ERROR(WriteRowsOfPartition(state, batch, current_partition_pair)); I think we should also remove the entry in partition_keys_to_output_partitions_ here so that we use O(1) memory. http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 476: > Done. It takes 30 seconds on my machine. Too much for core? Maybe a little much given core already takes too long - I think it's useful coverage but we're also probably pretty unlikely to break it. Seems fine for exhaustive. -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
