[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Csaba Ringhofer (Code Review) Thu, 02 Feb 2023 04:53:20 -0800

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )


Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
......................................................................


Patch Set 13:

(5 comments)

Thanks for the changes!

I still couldn't fully process the patch: I don't understand the backend part 
yet, and probably not some of the optimizations either.

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10
PS9, Line 10: performance for some Join queries. Th
> Yes, I can.
I still don't get the non-partitioned sort case. Can you give an example query 
and explain how it is optimized?


http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13
PS9, Line 13:
> The executor is assigned to the node where the bucket is located.
Is there a node where the whole bucket is located? If a bucket consists of 
several files, or of several blocks of one large file, then nothing guarantees 
that some node holds a replica of every block. Or should there generally be a 
node like that, because Hive always writes a bucket on a specific node?


http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h@149
PS13, Line 149:   /// Define locks to ensure thread safety when replenishing reserved memory.
              :   std::mutex increase_memory_reservation_mtx_;
              :
              :   /// Configure a semaphore to control FragmentInstanceState::Exec
              :   /// for each fragment instance that is executed in a bucket.
              :   /// To save memory, only one concurrency is supported in the open phase and beyond,
              :   /// after the completion of prepare.
              :   std::unordered_map<TFragmentIdx, sem_t> bucket_fragment_sem_;
              :
              :   /// Configure a counter for each fragment instance to count the number of fragment
              :   /// instances that have not yet completed execution, to prevent invalid
              :   /// increase_memory_reservation, and to destroy the semaphore after the execution of
              :   /// all instances of the fragment in the bucket has completed.
              :   std::unordered_map<TFragmentIdx, int> bucket_fragment_un_finished_instances_;
I couldn't grasp the changes in query life-cycle yet. Can you give some 
explanation about the big picture?

My naive way of imagining bucketing in the backend was that:
- there would be a 1 to N mapping between fragment instances (or if mt_dop = 0, 
hosts) and buckets, so each fragment instance would get a set of buckets
- each fragment that sends data to a bucket fragment needs to look up the 
right fragment instance for each row in KrpcDataStreamSender
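
To make that naive model concrete, here is a minimal sketch. All names are made 
up for illustration and are not from the patch or from Impala. The round-robin 
assignment and the plain "hash % num_buckets" step are assumptions too; the 
real bucket number would have to come from the Hive-compatible hash.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical planner-side step: give every bucket exactly one owning
// fragment instance, so each instance ends up with a set of buckets (1 to N).
// Round-robin is only a placeholder policy.
std::vector<int> AssignBucketsToInstances(int num_buckets, int num_instances) {
  assert(num_buckets >= 0 && num_instances > 0);
  std::vector<int> bucket_to_instance(num_buckets);
  for (int b = 0; b < num_buckets; ++b) {
    bucket_to_instance[b] = b % num_instances;
  }
  return bucket_to_instance;
}

// Hypothetical sender-side step: map a row's hash to its bucket, then look up
// the instance that owns that bucket. "row_hash % num_buckets" stands in for
// whatever bucket function the table actually uses.
int InstanceForRow(uint32_t row_hash,
                   const std::vector<int>& bucket_to_instance) {
  assert(!bucket_to_instance.empty());
  const uint32_t num_buckets =
      static_cast<uint32_t>(bucket_to_instance.size());
  return bucket_to_instance[row_hash % num_buckets];
}
```

With 8 buckets and 3 instances, instance 0 would own buckets 0, 3 and 6, and a 
sender would only need the bucket_to_instance table to route each row.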

If I understand correctly, you are creating one fragment instance for each 
bucket and limiting how many of them can run at the same time?
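
If that reading is right, I imagine the gating works roughly like the sketch 
below. This is only my guess at the idea, not the patch's code: 
BucketExecLimiter is a hypothetical stand-in for the per-fragment sem_t, built 
on a mutex and condition variable so that it is self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical counting limiter: at most `max_running` bucket fragment
// instances may hold a slot (i.e. be past prepare) at the same time.
class BucketExecLimiter {
 public:
  explicit BucketExecLimiter(int max_running) : free_slots_(max_running) {
    assert(max_running > 0);
  }

  // Blocks until a slot is free, then takes it (comparable to sem_wait).
  void Acquire() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return free_slots_ > 0; });
    --free_slots_;
  }

  // Non-blocking variant (comparable to sem_trywait).
  // Returns false if no slot is currently free.
  bool TryAcquire() {
    std::lock_guard<std::mutex> l(mu_);
    if (free_slots_ == 0) return false;
    --free_slots_;
    return true;
  }

  // Gives the slot back and wakes one waiter (comparable to sem_post).
  void Release() {
    {
      std::lock_guard<std::mutex> l(mu_);
      ++free_slots_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int free_slots_;
};
```

Each instance would call Acquire() after prepare and Release() when it 
finishes, so with max_running = 1 only one bucket instance is in the open 
phase or later at any time.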


http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h@287
PS13, Line 287: {
Can you add some tests for this in 
https://github.com/apache/impala/blob/master/be/src/kudu/util/hash_util-test.cc 
?

I didn't look into the Hive code, but it would be best if we could somehow 
verify that this function returns the same hash as Hive.


http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java@1045
PS13, Line 1045: TBucketType.NONE
This is not from this patch, but I saw that the other value of TBucketType is 
hash. Can you make the name more specific, e.g. HIVE_HASH, or 
HIVE_BUCKET_V2_HASH?

The reason is that other hash types are possible, for example bucketing could 
also be used for Iceberg BUCKET(N) partitioned columns.



--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia <[email protected]>
Gerrit-Reviewer: Baike Xia <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 02 Feb 2023 12:48:04 +0000
Gerrit-HasComments: Yes
