[jira] [Commented] (FLINK-27862) FLIP-235: Hybrid Shuffle Mode

2022-06-13 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553406#comment-17553406
 ] 

Aitozi commented on FLINK-27862:


Hi [~xtsong], I had an offline discussion with [~Weijie Guo], and I will try 
to start parallel work on ticket 3: Introduce HsDataBuffer. Could you help 
assign the ticket to me?

> FLIP-235: Hybrid Shuffle Mode
> -
>
> Key: FLINK-27862
> URL: https://issues.apache.org/jira/browse/FLINK-27862
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Network
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: Umbrella
>
> Introduce a new shuffle mode that can overcome some of the problems of 
> Pipelined Shuffle and Blocking Shuffle in batch scenarios. It can make the 
> best use of available resources and minimize disk IO load.
> For more details, see 
> [FLIP-235|https://cwiki.apache.org/confluence/display/FLINK/FLIP-235%3A+Hybrid+Shuffle+Mode]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27862) FLIP-235: Hybrid Shuffle Mode

2022-06-07 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550832#comment-17550832
 ] 

Aitozi commented on FLINK-27862:


Thanks [~Weijie Guo] and [~xtsong] for your kind guidance. I will take a look 
at the PoC work first and will reach out to you for further discussion.



[jira] [Commented] (FLINK-27862) FLIP-235: Hybrid Shuffle Mode

2022-06-07 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550822#comment-17550822
 ] 

Xintong Song commented on FLINK-27862:

Hi [~aitozi],
Thanks for offering. I see several ways that you may help.
* Reviewing the PRs will definitely be appreciated.
* You may also help with transforming the PoC implementation into PRs, which 
involves some design changes w.r.t. the FLIP as well as improving the code 
quality and adding test cases. For this part, if you want, you may first take 
a look at the PoC code, and we can set up a call to discuss how the workload 
can be split. I believe there are some tasks that can be worked on in parallel.



[jira] [Commented] (FLINK-27862) FLIP-235: Hybrid Shuffle Mode

2022-06-07 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550819#comment-17550819
 ] 

Weijie Guo commented on FLINK-27862:


Hi [~aitozi], thank you very much for your interest. You are welcome to 
participate.

Let me share the current situation of this FLIP:

1. We already have a PoC version in-house with some level of testing.

2. The implementation of this PoC version is not exactly the same as the 
design in FLIP-235. For example, it spills all data to disk instead of using 
the selective spill strategy.

3. To check whether merging the code into open-source Flink causes conflicts, 
I pushed the code to a branch in my own [GitHub 
repository|https://github.com/reswqa/flink/tree/hs-merge-from-vvr]. Part of 
the code will be discarded in the new design and was not picked into this test 
branch, so the branch cannot actually run. It does, however, already contain 
the core implementation of our PoC version.

4. If you have any other questions, you are very welcome to contact me 
offline.



[jira] [Commented] (FLINK-27862) FLIP-235: Hybrid Shuffle Mode

2022-06-06 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550507#comment-17550507
 ] 

Aitozi commented on FLINK-27862:


Hi [~Weijie Guo], thanks for starting this work. I'm interested in this FLIP. 
Can I join and take on some simple tasks to start with?
