Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 )
Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 14:

> Patch Set 13:
> 
> > Patch Set 13:
> > 
> > (1 comment)
> > 
> > > Patch Set 13:
> > > 
> > > (1 comment)
> 
> The problem is that it would not be practical to check the block locations
> for potential relocations when doing the query planning. Given N blocks
> in a bucket for one table and M blocks for the second table, it would be
> O(N+M) time to decide which distribution method to use. This would add up
> depending on the number of joins in the query. We really want to 'pin' the
> location but AFAIK HDFS does not allow us to do that. Other systems such as
> MemSQL that do bucket join don't have to worry about this since the data is
> memory resident.

Given N blocks in a bucket for one table and M blocks for the second table, we can use the least common divisor of N and M as the number of buckets for the two tables temporarily.

I'm not sure I understand what you mean.


-- 
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 28 Feb 2023 09:59:00 +0000
Gerrit-HasComments: No
