Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
......................................................................

Patch Set 8: Code-Review+1

(3 comments)

+1 on the backend part.

http://gerrit.cloudera.org:8080/#/c/4863/6/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 530: DCHECK(current_row != NULL || key == ROOT_PARTITION_KEY);
> My assumption was that InitOutputPartition wouldn't be called if key==ROOT_
Makes sense, thanks.

http://gerrit.cloudera.org:8080/#/c/4863/6/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

Line 912: partition (year, month) /*+ clustered,noshuffle */
> Done. I added a test and a DCHECK in HdfsTableSink::WriteClusteredRowBatch(
That's a good point, I misunderstood that aspect of the feature - I was thinking of the "SORT BY" clause that isn't implemented yet.

http://gerrit.cloudera.org:8080/#/c/4863/7/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 554: l_returnflag
> What would be good values here? Should we leave it and see if things break
I agree with Alex. I think we mainly want to check that we're not losing test coverage. So I'd be ok with something really loose like >= 3 and <= 30, since that would only fail if something drastically changed.

--
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
