Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13
PS9, Line 13:
> I don't think I understand what you mean, Can you explain that again?
I think my concern is similar to Aman's.

Let's say that a bucket has 2 Parquet files, each with 1 block and 3 replicas.
What happens if no node holds a replica of both files? In this case, if we
want a single node to process the bucket, we cannot avoid remote reads for
the block that is not present on that node.
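
To make the scenario concrete, here is a minimal sketch (not Impala code) that
checks whether any node could read a whole bucket locally. The function name,
file names and node names are hypothetical, and the replica locations are
assumed to be available as a plain mapping from file to the set of nodes
holding a replica of its single block.

```python
from typing import Dict, Set


def nodes_with_all_replicas(file_replicas: Dict[str, Set[str]]) -> Set[str]:
    """Return the nodes that hold a replica of every file in the bucket."""
    replica_sets = list(file_replicas.values())
    if not replica_sets:
        return set()
    # A node can process the bucket without remote reads only if it
    # appears in the replica set of every file.
    return set.intersection(*replica_sets)


# Hypothetical bucket: 2 Parquet files, 1 block each, 3 replicas per block,
# with no node in common between the two replica sets.
disjoint_bucket = {
    "file_a.parquet": {"node1", "node2", "node3"},
    "file_b.parquet": {"node4", "node5", "node6"},
}

# Same bucket, but node3 happens to hold a replica of both files.
overlapping_bucket = {
    "file_a.parquet": {"node1", "node2", "node3"},
    "file_b.parquet": {"node3", "node5", "node6"},
}

no_local_node = nodes_with_all_replicas(disjoint_bucket)
one_local_node = nodes_with_all_replicas(overlapping_bucket)
```

In the first case the result is empty, so whichever node is chosen has to
read at least one block remotely.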

My guess is that if bucketing is enforced during writing too (so a single node
writes all files for a given bucket), then there is a good chance that the node
will contain a replica of all files. But it is still possible that HDFS
rebalancing will move one of the replicas to another node, leading to a state
where not all files of a bucket have a replica on the node.

If the bucket join optimization causes remote reads, then it is possible that
it would be faster to fall back to the non-bucketed way.



--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 21 Feb 2023 09:17:18 +0000
Gerrit-HasComments: Yes
