Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4863/5/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 277:       RETURN_IF_ERROR(FinalizePartitionFile(state, output_partition));
Can avoid a level of nesting with if (!new_file) break;


Line 294:     GetHashTblKey(dynamic_partition_key_expr_ctxs_, &key);
Oh man, I didn't notice this 'current_row_' thing earlier. It looks like for 
some reason the original author of the code didn't want to pass an extra 
argument into functions so instead did it implicitly via 'current_row_'

Can you refactor it so that it's passed as an actual argument?


Line 302:         RETURN_IF_ERROR(WriteRowsOfPartition(state, batch, 
current_partition_pair));
I think we should also remove the entry in partition_keys_to_output_partitions_ 
here so that we use O(1) memory.


http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 476: 
> Done. It takes 30 seconds on my machine. Too much for core?
Maybe a little much given core already takes too long - I think it's useful 
coverage but we're also probably pretty unlikely to break it. Seems fine for 
exhaustive.


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to