Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
......................................................................


Patch Set 8: Code-Review+1

(3 comments)

+1 on the backend part.

http://gerrit.cloudera.org:8080/#/c/4863/6/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 530:   DCHECK(current_row != NULL || key == ROOT_PARTITION_KEY);
> My assumption was that InitOutputPartition wouldn't be called if key==ROOT_
Makes sense, thanks.


http://gerrit.cloudera.org:8080/#/c/4863/6/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

Line 912: partition (year, month) /*+ clustered,noshuffle */
> Done. I added a test and a DCHECK in HdfsTableSink::WriteClusteredRowBatch(
That's a good point. I misunderstood that aspect of the feature; I was
thinking of the "SORT BY" clause, which isn't implemented yet.


http://gerrit.cloudera.org:8080/#/c/4863/7/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 554:                l_returnflag
> What would be good values here? Should we leave it and see if things break 
I agree with Alex. I think we mainly want to check that we're not losing test 
coverage. So I'd be ok with something really loose like >= 3 and <= 30, since 
that would only fail if something drastically changed.
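
To make the suggestion concrete, here is a minimal sketch of such a loose check. The helper name `assert_loosely_bounded` and its parameter names are hypothetical and not part of the patch. Only the bounds 3 and 30 come from the comment above, and the actual test may count or compare something differently.

```python
def assert_loosely_bounded(value, low=3, high=30):
    """Hypothetical helper: fail only if value leaves the wide range [low, high].

    The range is deliberately loose so the check guards against losing test
    coverage without breaking on small, expected variations.
    """
    assert low <= value <= high, (
        "expected a value between %d and %d, got %d" % (low, high, value))
    return value


# Example: a measured count of 12 passes; 2 or 31 would raise AssertionError.
assert_loosely_bounded(12)
```

A check like this would only fail if the measured value changed drastically, which is the intent here.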


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
