Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................


Patch Set 4:

As requested, I ran some tests with tpcds_1000_text.store_sales:

with patch: 3755.68s
with patch and noshuffle: 3778.62s
without patch: 3571.33s

The first two are averages over three runs, for the third it only passed twice 
out of 10 runs. I also tried doing "with patch and noclustered" but it timed 
out every time I ran it.

So again we see here a ~10% perf cost in exchange for consistently running to 
completion.

These results also suggest that its the pre-sorting, not the pre-partitioning, 
that's making the biggest difference in whether or not Kudu can handle the 
load. We may be able to get more impact from the partitioning if we start 
partitioning the rows to the impalad collocated with the corresponding tserver, 
though that's going to take some changes to the scheduler and we're leaving it 
as future work.

As far as adding the ability to turn this off: the current version of the patch 
just disables insert hints for Kudu tables, as none of the existing hints seem 
to make sense (would people need to specify both 'noshuffle' and 'noclustered' 
to turn this off?).

If anyone feels that its important to have an off switch, I'm happy to include 
one (possibly with a new insert hint?), but I'm not sure how much of a benefit 
that would be vs. the added complexity of more knobs, given the above perf 
numbers and the fact that the partitioning and sorting has to happen either 
way, its just a question of whether its happing in Impala or in Kudu.

Its also already the case that the exchange and sort won't happen for single 
node plans as determined by exec_single_node_rows_threshold, so for example 
small VALUES inserts will be unaffected, and we have as future work to examine 
and improve this decision, such as by not adding the exchange if the input is 
already partitioned correctly.

-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: No

Reply via email to