Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 )
Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table ...................................................................... Patch Set 13: (5 comments) Thanks for the changes! I still couldn't process the patch 100%, I don't understand the backend part at the moment + probably some of the optimizations. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10 PS9, Line 10: performance for some Join queries. Th > Yes, I can. I still don't get the non-partitoned sort case. Can you give an example query and explain how it is optimized? http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13 PS9, Line 13: > The executor is assigned to the node where the bucket is located. Is there a node where the whole bucket is located? I mean that if there are several files or several blocks for a large file, then nothing guarantees that there is a node that has a replica for each block. Or generally there should be an node like that, as Hive always writes a bucket with a specific node? http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h@149 PS13, Line 149: /// Define locks to ensure thread safety when replenishing reserved memory. : std::mutex increase_memory_reservation_mtx_; : : /// Configure a semaphore to control FragmentInstanceState::Exec : /// for each fragment instance that is executed in a bucket. : /// To save memory, only one concurrency is supported in the open phase and beyond, : /// after the completion of prepare. : std::unordered_map<TFragmentIdx, sem_t> bucket_fragment_sem_; : : /// Configure a counter for each fragment instance to count the number of fragment : /// instances that have not yet completed execution, to prevent invalid : /// increase_memory_reservation, and to destroy the semaphore after the execution of : /// all instances of the fragment in the bucket has completed. : std::unordered_map<TFragmentIdx, int> bucket_fragment_un_finished_instances_; I couldn't grasp the changes in query life-cycle yet. Can you give some explanation about the big picture? My naive way of imagining bucketing in the backend was that: - there would be a 1 to N mapping between fragment instances (or if mt_dop = 0, hosts) and buckets, so each fragment instance would get a set of buckets - each fragment that sends data to a bucket fragment needs to look up the the right fragment instance for each row in KrpcDataStreamSender If I understand correctly you are creating 1 fragment instance for each bucket, and try to control them to run only a limited number of them at the same time? http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h File be/src/util/hash-util.h: http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h@287 PS13, Line 287: { Can you add some tests for this in https://github.com/apache/impala/blob/master/be/src/kudu/util/hash_util-test.cc ? I didn't look into the Hive code, but it would be the best if somehow we could verify that this functions returns the same hash as Hive. http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java@1045 PS13, Line 1045: TBucketType.NONE This is not from this patch, but I saw that the other value of TBucketType is hash. Can you make the name more specific, e.g. HIVE_HASH, or HIVE_BUCKET_V2_HASH? The reason is that other hash types are possible, for example bucketing could be also used for Iceberg BUCKET(N) partitioned columns. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia <[email protected]> Gerrit-Reviewer: Baike Xia <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Thu, 02 Feb 2023 12:48:04 +0000 Gerrit-HasComments: Yes
