Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 14:

> Patch Set 13:
>
> > Patch Set 13:
> >
> > (1 comment)
> >
> > > Patch Set 13:
> > >
> > > (1 comment)
>
> The problem is that it would not be practical to check the block locations 
> for potential relocations when doing the query planning.  Given N blocks 
> in a bucket for one table and M blocks for the second table, it would be 
> O(N+M) time to decide which distribution method to use. This would add up 
> depending on the number of joins in the query. We really want to 'pin' the 
> location but AFAIK HDFS does not allow us to do that. Other systems such as 
> MemSQL that do bucket join don't have to worry about this since the data is 
> memory resident.

Given N blocks in a bucket for one table and M blocks for the second table, we 
can use the greatest common divisor of N and M as the number of buckets for the 
two tables temporarily.
I'm not sure I understand what you mean.
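
To make the divisor idea concrete, here is a minimal sketch in Python. It is 
not Impala code. It reads N and M as the bucket counts of the two tables and 
assumes a row lands in bucket hash(key) mod bucket_count, which this thread 
does not confirm. The function names are hypothetical.

```python
from math import gcd


def bucket_of(key_hash: int, num_buckets: int) -> int:
    # Assumed bucketing rule: bucket id is the key hash modulo the bucket count.
    return key_hash % num_buckets


def coalesce(bucket_id: int, common: int) -> int:
    # Map an original bucket id onto one of the `common` temporary buckets.
    return bucket_id % common


def plan_common_buckets(n: int, m: int):
    # Use gcd(n, m) as the temporary bucket count for both tables and
    # list which original buckets of each table fall into each temporary bucket.
    common = gcd(n, m)
    left = {c: [b for b in range(n) if coalesce(b, common) == c]
            for c in range(common)}
    right = {c: [b for b in range(m) if coalesce(b, common) == c]
             for c in range(common)}
    return common, left, right


common, left, right = plan_common_buckets(4, 6)
print(common)  # temporary bucket count shared by both tables
print(left)    # original buckets of the 4-bucket table per temporary bucket
print(right)   # original buckets of the 6-bucket table per temporary bucket
```

Because the common count divides both N and M, (h mod N) mod g equals h mod g, 
so rows with equal keys from both tables end up in the same temporary bucket. 
This says nothing about where the underlying HDFS blocks live, which is the 
concern raised in the quoted comment.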


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 28 Feb 2023 09:59:00 +0000
Gerrit-HasComments: No
