Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables ......................................................................
Patch Set 4: > As requested, I ran some tests with tpcds_1000_text.store_sales: > > with patch: 3755.68s > with patch and noshuffle: 3778.62s > without patch: 3571.33s > > The first two are averages over three runs, for the third it only > passed twice out of 10 runs. I also tried doing "with patch and > noclustered" but it timed out every time I ran it. > > So again we see here a ~10% perf cost in exchange for consistently > running to completion. > > These results also suggest that its the pre-sorting, not the > pre-partitioning, that's making the biggest difference in whether > or not Kudu can handle the load. We may be able to get more impact > from the partitioning if we start partitioning the rows to the > impalad collocated with the corresponding tserver, though that's > going to take some changes to the scheduler and we're leaving it as > future work. > > As far as adding the ability to turn this off: the current version > of the patch just disables insert hints for Kudu tables, as none of > the existing hints seem to make sense (would people need to specify > both 'noshuffle' and 'noclustered' to turn this off?). > > If anyone feels that its important to have an off switch, I'm happy > to include one (possibly with a new insert hint?), but I'm not sure > how much of a benefit that would be vs. the added complexity of > more knobs, given the above perf numbers and the fact that the > partitioning and sorting has to happen either way, its just a > question of whether its happing in Impala or in Kudu. > > Its also already the case that the exchange and sort won't happen > for single node plans as determined by exec_single_node_rows_threshold, > so for example small VALUES inserts will be unaffected, and we have > as future work to examine and improve this decision, such as by not > adding the exchange if the input is already partitioned correctly. Thanks, Thomas. I think these results are consistent with our thinking that this patch is beneficial (runs actually completed) and at an acceptable cost (10% which can be reclaimed later). I agree that an off switch should be considered, but let's not try to squeeze it into this patch which we should get in. We can address that as a follow-up task, and we can give it the attention it deserves to make sure it's done in a way that won't overcomplicate the knobs we already have. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: No
