Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
wuxueyang96 closed pull request #61184: [opt][optimizer] optimize union all for colocated table. URL: https://github.com/apache/doris/pull/61184 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
924060929 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4035966483 @wuxueyang96 In the future we will also add an optimization that ensures random exchange happens only on the local machine. In that case, even if there is an exchange on the union, the overhead will not be significant.
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
924060929 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4035921429

> > > > Hi @wuxueyang96!
> > > > Thanks for submitting this PR. I already submitted the same feature in #59006, but it breaks local shuffle and computes wrong results, so I reverted it in #60823. We should refactor local shuffle first; then we can support this feature.
> > >
> > > I'm not sure that local bucket shuffle is the same as this PR. This PR aims to eliminate the shuffle altogether, whether it is a local bucket shuffle or a global shuffle.
> >
> > My PR includes eliminating the exchange under set operations, because supporting bucket shuffle itself requires the other side to be distributed according to the storage hash algorithm: the base side does not need a shuffle, and if the other side does not meet the requirement, it needs a bucket shuffle. If both sides are colocated, neither side needs to shuffle, because both already satisfy the storage hash distribution. So my PR is a superset of your PR, and more abstract.
>
> Actually, I rebuilt the code from [bf2e1c2](https://github.com/apache/doris/commit/bf2e1c2dda944e47a5e9bf34972ae772570ec1c0), and I think I now understand what you mean. But if you look at fragment 5, it still contains exchanges below the two unions, so I just want to know what effect you ultimately want to achieve.
>
> ```
> MySQL [(none)]> show frontends;
> +-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
> | Name| Host | EditLogPort | HttpPort | QueryPort | RpcPort | ArrowFlightSqlPort | Role | IsMaster | ClusterId | Join | Alive | ReplayedJournalId | LastStartTime | LastHeartbeat | IsHelper | ErrMsg | Version| CurrentConnected | LiveSince |
> +-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
> | fe_781cb7e1_a9c0_49ee_845f_9ffa707ddeeb | 10.37.114.244 | 9010| 8030 | 9030 | 9020| 8070 | FOLLOWER | true | 1202823493 | true | true | 384 | 2026-03-10 19:50:11 | 2026-03-10 20:15:32 | true || doris-0.0.0-bf2e1c2dda | Yes | NULL |
> +-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
> 1 row in set (0.017 sec)
>
> MySQL [(none)]> use test;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
>
> Database changed
> MySQL [test]> explain select d0.sum_val, d1.val, d1.id from ( select sum(sum_val) as sum_val, id from ( ( SELECT sum(val) as sum_val, id from t2 group by id ) union all ( SELECT sum(val) as sum_val, id from t3 group by id ) ) as l group by id ) as d0 right join ( select id, val from t1 ) as d1 on d0.id = d1.id order by d0.id;
> ++
> | Explain String(Nereids Planner) |
> ++
> | PLAN FRAGMENT 0 |
> | OUTPUT EXPRS: |
> | sum_val[#44] |
> | val[#45] |
> | id[#46] |
> | PARTITION: UNPARTITIONED |
> | |
> | HAS_COLO_PLAN_NODE: false
> ```
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
wuxueyang96 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4030967841

> > > Hi @wuxueyang96!
> > > Thanks for submitting this PR. I already submitted the same feature in #59006, but it breaks local shuffle and computes wrong results, so I reverted it in #60823. We should refactor local shuffle first; then we can support this feature.
> >
> > I'm not sure that local bucket shuffle is the same as this PR. This PR aims to eliminate the shuffle altogether, whether it is a local bucket shuffle or a global shuffle.
>
> My PR includes eliminating the exchange under set operations, because supporting bucket shuffle itself requires the other side to be distributed according to the storage hash algorithm: the base side does not need a shuffle, and if the other side does not meet the requirement, it needs a bucket shuffle. If both sides are colocated, neither side needs to shuffle, because both already satisfy the storage hash distribution. So my PR is a superset of your PR, and more abstract.

Actually, I rebuilt the code from bf2e1c2dda944e47a5e9bf34972ae772570ec1c0, and I don't think it takes effect in the same scenario:

```
MySQL [(none)]> show frontends;
+-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
| Name| Host | EditLogPort | HttpPort | QueryPort | RpcPort | ArrowFlightSqlPort | Role | IsMaster | ClusterId | Join | Alive | ReplayedJournalId | LastStartTime | LastHeartbeat | IsHelper | ErrMsg | Version| CurrentConnected | LiveSince |
+-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
| fe_781cb7e1_a9c0_49ee_845f_9ffa707ddeeb | 10.37.114.244 | 9010| 8030 | 9030 | 9020| 8070 | FOLLOWER | true | 1202823493 | true | true | 384 | 2026-03-10 19:50:11 | 2026-03-10 20:15:32 | true || doris-0.0.0-bf2e1c2dda | Yes | NULL |
+-+---+-+--+---+-++--+--++--+---+---+-+-+--+++--+---+
1 row in set (0.017 sec)

MySQL [(none)]> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [test]> explain select d0.sum_val, d1.val, d1.id from ( select sum(sum_val) as sum_val, id from ( ( SELECT sum(val) as sum_val, id from t2 group by id ) union all ( SELECT sum(val) as sum_val, id from t3 group by id ) ) as l group by id ) as d0 right join ( select id, val from t1 ) as d1 on d0.id = d1.id order by d0.id;
+--+
| Explain String(Nereids Planner) |
+--+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| sum_val[#28] |
| val[#29] |
| id[#30] |
| PARTITION: UNPARTITIONED |
| |
| HAS_COLO_PLAN_NODE: false |
| |
| VRESULT SINK |
| MYSQL_PROTOCOL
```
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
924060929 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4030870640

> > Hi @wuxueyang96!
> > Thanks for submitting this PR. I already submitted the same feature in #59006, but it breaks local shuffle and computes wrong results, so I reverted it in #60823. We should refactor local shuffle first; then we can support this feature.
>
> I'm not sure that local bucket shuffle is the same as this PR. This PR aims to eliminate the shuffle altogether, whether it is a local bucket shuffle or a global shuffle.

My PR includes eliminating the exchange under set operations, because supporting bucket shuffle itself requires the other side to be distributed according to the storage hash algorithm: the base side does not need a shuffle, and if the other side does not meet the requirement, it needs a bucket shuffle. If both sides are colocated, neither side needs to shuffle, because both already satisfy the storage hash distribution. So my PR is a superset of your PR, and more abstract.
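The rule described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Doris implementation: `Child`, `exchanges_needed`, the `storage_hash` field, and the result strings `"none"` and `"bucket_shuffle"` are hypothetical names invented for the sketch, and the base side is assumed to already be distributed by its storage hash.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Child:
    """One input of a set operation (hypothetical model, not a Doris class)."""
    name: str
    # Identifier of the storage hash distribution this child already satisfies,
    # or None if its output is not distributed by a stored bucket hash.
    storage_hash: Optional[str]


def exchanges_needed(base: Child, others: List[Child]) -> Dict[str, str]:
    """Decide, per child, whether an exchange is needed under the set operation.

    Assumption: the base side is already distributed by its storage hash,
    so it never needs a shuffle.
    """
    if base.storage_hash is None:
        raise ValueError("base side must have a storage hash distribution")

    plan = {base.name: "none"}
    for child in others:
        if child.storage_hash == base.storage_hash:
            # Colocated with the base side: already satisfies the distribution.
            plan[child.name] = "none"
        else:
            # Does not meet the requirement: redistribute to the base's buckets.
            plan[child.name] = "bucket_shuffle"
    return plan


if __name__ == "__main__":
    t2 = Child("t2", "group_a")
    t3 = Child("t3", "group_a")
    t4 = Child("t4", None)
    print(exchanges_needed(t2, [t3]))  # both sides colocated
    print(exchanges_needed(t2, [t4]))  # other side needs a bucket shuffle
```

In this model, the colocated case discussed in the PR is simply the branch where every child shares the base side's storage hash, which is why the comment calls the bucket-shuffle formulation a superset.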
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
wuxueyang96 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4030244419

> Hi @wuxueyang96!
> Thanks for submitting this PR. I already submitted the same feature in #59006, but it breaks local shuffle and computes wrong results, so I reverted it in #60823. We should refactor local shuffle first; then we can support this feature.

I'm not sure that local bucket shuffle is the same as this PR. This PR aims to eliminate the shuffle altogether, whether it is a local bucket shuffle or a global shuffle.
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
924060929 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4030124164 Hi @wuxueyang96! Thanks for submitting this PR. I already submitted the same feature in #59006, but it breaks local shuffle and computes wrong results, so I reverted it in #60823. We should refactor local shuffle first; then we can support this feature.
Re: [PR] [opt][optimizer] optimize union all for colocated table. [doris]
wuxueyang96 commented on PR #61184: URL: https://github.com/apache/doris/pull/61184#issuecomment-4029954372 @morrySnow @924060929 Hi, could you help review this pull request?
