Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG@25
PS13, Line 25: based on hdfs storage are supported.
Thanks for the detailed patch. I have a high-level question about the physical
co-location of bucketed tables. HDFS supports rebalancing, which moves data
across the nodes of a cluster for a more uniform distribution [1]. If a table
is bucketed and rebalancing occurs after the table was created, the bucket
shuffle hash join will produce incorrect results. In such cases, we would want
to avoid this join method and fall back to the regular distributed join. Does
this patch account for such a scenario?

[1] https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-balancing-data-across-hdfs-cluster.html
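To make the question concrete, here is a minimal sketch of a co-location check that a planner could run before choosing the bucket shuffle join. It assumes a hypothetical input that maps each bucket id to the replica hosts of each HDFS block in that bucket. None of these names come from the patch or from Impala. In this sketch, a bucket counts as co-located only if at least one host holds a replica of every block in it. If any bucket fails the check, the planner would fall back to the regular distributed join.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Impala: checks whether each bucket of a
// bucketed table can still be read entirely on a single host.
class BucketColocationCheck {

  // Input (assumed shape): bucket id -> list of blocks, each block given as
  // the set of hosts holding a replica of it.
  static boolean allBucketsColocated(Map<Integer, List<Set<String>>> bucketBlockHosts) {
    for (List<Set<String>> blocks : bucketBlockHosts.values()) {
      if (blocks.isEmpty()) continue; // empty bucket: nothing to co-locate
      // Intersect the replica hosts of all blocks in this bucket.
      Set<String> common = new HashSet<>(blocks.get(0));
      for (Set<String> replicas : blocks.subList(1, blocks.size())) {
        common.retainAll(replicas);
      }
      // No single host holds the whole bucket, e.g. after HDFS rebalancing.
      if (common.isEmpty()) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Map<Integer, List<Set<String>>> colocated = Map.of(
        0, List.of(Set.of("h1", "h2"), Set.of("h1")),
        1, List.of(Set.of("h3")));
    Map<Integer, List<Set<String>>> rebalanced = Map.of(
        0, List.of(Set.of("h1"), Set.of("h2")),
        1, List.of(Set.of("h3")));
    // Bucket shuffle join is safe only when the check passes; otherwise a
    // planner would fall back to the regular distributed join.
    System.out.println(allBucketsColocated(colocated));  // true
    System.out.println(allBucketsColocated(rebalanced)); // false
  }
}
```

Intersecting replica host sets is only one possible definition of co-location. A real check would depend on how the patch assigns bucket scan ranges to executors.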



--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 08 Feb 2023 02:51:20 +0000
Gerrit-HasComments: Yes
