[RESULT][VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-21 Thread Vinoo Ganesh
// Fixing Subject

Results of the voting:

Binding +1s: 5 (Tom Graves, Dongjoon Hyun, Felix Cheung, Saisai Shao, Imran Rashid)

Non-Binding +1s: 8

-1 from PMC members: 0

Per the PMC / SPIP voting rules (https://spark.apache.org/improvement-proposals.html), given that the vote has been open for more than 72 hours and at least 3 binding +1 votes have been received, this SPIP passes.

Thanks everyone.


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-21 Thread Vinoo Ganesh
Results of the voting:

Binding +1s: 5 (Tom Graves, Dongjoon Hyun, Felix Cheung, Saisai Shao, Imran Rashid)

Non-Binding +1s: 8

-1 from PMC members: 0

Per the PMC / SPIP voting rules (https://spark.apache.org/improvement-proposals.html), given that the vote has been open for more than 72 hours and at least 3 binding +1 votes have been received, this SPIP passes.

Thanks everyone.

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-21 Thread Tom Graves
+1 (binding)
I haven't looked at the low-level API, but I like the idea and the approach to get it started.
Tom

RE: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-18 Thread Guo, Chenzhao
Cool : )

+1 (non-binding)

Chenzhao

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-18 Thread dhruve ashar
+1 (non-binding)

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-18 Thread John Zhuge
+1 (non-binding)  Great work!

-- 
John


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-18 Thread Vinoo Ganesh
+1 (non-binding).

Thanks for pushing this forward, Matt and Yifei.

From: Felix Cheung 
Date: Tuesday, June 18, 2019 at 00:01
To: Yinan Li , "rb...@netflix.com" 
Cc: Dongjoon Hyun , Saisai Shao 
, Imran Rashid , Ilan Filonenko 
, bo yang , Matt Cheah 
, Spark Dev List , "Yifei Huang 
(PD)" , Vinoo Ganesh , Imran Rashid 

Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

+1

Glad to see the progress in this space - it’s been more than a year since the 
original discussion and effort started.


From: Yinan Li 
Sent: Monday, June 17, 2019 7:14:42 PM
To: rb...@netflix.com
Cc: Dongjoon Hyun; Saisai Shao; Imran Rashid; Ilan Filonenko; bo yang; Matt 
Cheah; Spark Dev List; Yifei Huang (PD); Vinoo Ganesh; Imran Rashid
Subject: Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

+1 (non-binding)

On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue  wrote:
+1 (non-binding)

On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Bests,
Dongjoon.


On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
+1 (binding)

Thanks
Saisai

On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <im...@therashids.com> wrote:
+1 (binding)

I think this is a really important feature for Spark.

First, there is already a lot of interest in alternative shuffle storage in the community, from dynamic allocation in Kubernetes to simply improving stability in standard on-premise use of Spark. However, people are often stuck doing this in forks of Spark, in ways that are either not maintainable (because they copy-paste many Spark internals) or incorrect (because they do not correctly handle speculative execution and stage retries).

Second, I think the specific proposal is good at finding the right balance between flexibility and too much complexity, allowing incremental improvements. A lot of work has already gone into figuring out which pieces are essential to make alternative shuffle storage implementations feasible.

Of course, that means it doesn't include everything imaginable; some things still aren't supported, and some users will still choose the older ShuffleManager API to get total control over all of shuffle. But we know there is a reasonable set of things that can be implemented behind the API as a first step, and it can continue to evolve.

On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <i...@cornell.edu> wrote:
+1 (non-binding). This API is versatile and flexible enough to handle Bloomberg's internal use cases. The ability for us to vary implementation strategies is quite appealing. It is also worth noting the minimal changes to Spark core needed to make it work. This is a much-needed addition to the Spark shuffle story.

On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyan...@gmail.com> wrote:
+1 This is great work, allowing different sort-shuffle write/read implementations to be plugged in! Also great to see it retain the current Spark configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).


On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mch...@palantir.com> wrote:
Hi everyone,

I would like to call a vote for the SPIP for SPARK-25299 (https://issues.apache.org/jira/browse/SPARK-25299), which proposes to introduce a pluggable storage API for temporary shuffle data.

You may find the SPIP document here: https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit

The discussion thread for the SPIP was conducted here: https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E

Please vote on whether or not this proposal is agreeable to you.

Thanks!

-Matt Cheah


--
Ryan Blue
Software Engineer
Netflix


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Felix Cheung
+1

Glad to see the progress in this space; it's been more than a year since the
original discussion and effort started.


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Yinan Li
+1 (non-binding)


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Ryan Blue
+1 (non-binding)

-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-17 Thread Dongjoon Hyun
+1

Bests,
Dongjoon.


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-16 Thread Saisai Shao
+1 (binding)

Thanks
Saisai


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Imran Rashid
+1 (binding)

I think this is a really important feature for Spark.

First, there is already a lot of interest in alternative shuffle storage in
the community, from dynamic allocation in Kubernetes to simply improving
stability in standard on-premise use of Spark.  However, people are often
stuck doing this in forks of Spark, in ways that are either not maintainable
(because they copy-paste many Spark internals) or incorrect (because they do
not correctly handle speculative execution and stage retries).

Second, I think the specific proposal strikes the right balance between
flexibility and complexity, allowing for incremental improvements.  A lot of
work has already gone into figuring out which pieces are essential to make
alternative shuffle storage implementations feasible.

Of course, that means it doesn't include everything imaginable; some things
still aren't supported, and some will still choose to use the older
ShuffleManager API to get total control over all of shuffle.  But we know
there is a reasonable set of things that can be implemented behind the API as
a first step, and it can continue to evolve.


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Ilan Filonenko
+1 (non-binding). This API is versatile and flexible enough to handle
Bloomberg's internal use cases. The ability for us to vary implementation
strategies is quite appealing. It is also worth noting the minimal changes
to Spark core needed to make it work. This is a much-needed addition to the
Spark shuffle story.


Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread bo yang
+1 This is great work, allowing different sort shuffle write/read
implementations to be plugged in! Also great to see it retain the current
Spark configuration
(spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).


[VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-13 Thread Matt Cheah
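Hi everyone,

I would like to call a vote for the SPIP for SPARK-25299
(https://issues.apache.org/jira/browse/SPARK-25299), which proposes to
introduce a pluggable storage API for temporary shuffle data.

You may find the SPIP document here:
https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit

The discussion thread for the SPIP was conducted here:
https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E

Please vote on whether or not this proposal is agreeable to you.

Thanks!

-Matt Cheah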


