Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 )
Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 13:

> Patch Set 13:
>
> (1 comment)
>
> > Patch Set 13:
> >
> > (1 comment)

The problem is that it would not be practical to check the block locations for potential relocations during query planning. Given N blocks in a bucket for one table and M blocks for the second table, it would take O(N+M) time to decide which distribution method to use for that join. This cost adds up with the number of joins in the query. We really want to 'pin' the block locations, but AFAIK HDFS does not allow us to do that. Other systems that do bucket joins, such as MemSQL, don't have to worry about this since their data is memory resident.


--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 22 Feb 2023 08:21:43 +0000
Gerrit-HasComments: No
