[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-10-26 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184576#comment-14184576 ]

Patrick Wendell commented on SPARK-2532:


Hey [~matei] - you created some sub-tasks here that are pretty tersely 
described... would you mind looking through them and deciding whether these are 
still relevant? Not sure whether we can close this.

 Fix issues with consolidated shuffle
 

 Key: SPARK-2532
 URL: https://issues.apache.org/jira/browse/SPARK-2532
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 1.1.0
 Environment: All
Reporter: Mridul Muralidharan
Assignee: Mridul Muralidharan
Priority: Critical

 Will file a PR with the changes as soon as the merge is done (the earlier 
 merge unfortunately became outdated within 2 weeks :) ).
 Consolidated shuffle is broken in multiple ways in Spark:
 a) Task failure(s) can cause the state to become inconsistent.
 b) Multiple reverts, or a combination of close/revert/close (as part of 
 exception/error handling), can leave the state inconsistent.
 c) Some of the block writer API invites implementation problems - for 
 example, a revert is always followed by a close, but the implementation 
 keeps them separate, creating extra surface for errors (see the first 
 sketch below).
 d) Fetching data from consolidated shuffle files can go badly wrong if the 
 file is still being actively written to: a segment's length is computed by 
 subtracting its offset from the next offset (or from the file length if it 
 is the last segment), and the latter fails when a fetch happens in parallel 
 with a write (see the second sketch below).
 Note that this happens even when there are no task failures of any kind!
 This usually results in stream corruption or decompression errors.
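
Two sketches may make (c) and (d) more concrete. Both are hypothetical illustrations, not Spark's actual block writer or shuffle-fetch code; the names used here (SegmentWriter, commitAndClose, revertAndClose, segmentLength, and the offsets index) are assumptions made for the example.

First, the API-shape issue in (c): since a revert is always followed by a close, exposing the two as a single operation removes the invalid orderings that callers' error-handling paths would otherwise have to avoid.

{code:scala}
import java.io.{File, FileOutputStream}

// Hypothetical writer for one segment appended to a consolidated shuffle file.
class SegmentWriter(file: File) {
  private val startPos: Long = file.length()          // bytes before this belong to earlier segments
  private val out = new FileOutputStream(file, true)  // append to the shared file
  private var open = true

  def write(bytes: Array[Byte]): Unit = {
    require(open, "writer already closed")
    out.write(bytes)
  }

  /** Flush and close, returning the length of the committed segment. */
  def commitAndClose(): Long = {
    out.flush()
    out.close()
    open = false
    file.length() - startPos
  }

  /** Discard everything this writer wrote and close it, as one operation. */
  def revertAndClose(): Unit = {
    out.close()
    open = false
    // Truncate back to where this writer started so partial output never becomes visible.
    val ch = new FileOutputStream(file, true).getChannel
    try ch.truncate(startPos) finally ch.close()
  }
}
{code}

Second, the race in (d): the last segment's length is derived from the live file size, which is still moving while another task appends to the same consolidated file.

{code:scala}
import java.io.File

object SegmentLengths {
  // offsets(i) is the byte offset where map output i starts in the consolidated file.
  def segmentLength(file: File, offsets: IndexedSeq[Long], i: Int): Long = {
    if (i < offsets.length - 1) {
      offsets(i + 1) - offsets(i)   // bounded by the next recorded offset: stable
    } else {
      // Bounded by the current file size: if a writer is still appending, this can
      // include half-written bytes, which the fetching side later sees as stream
      // corruption or decompression errors.
      file.length() - offsets(i)
    }
  }
}
{code}

One direction for closing the race (a sketch only, not a claim about what the linked pull requests do) is to record each segment's length explicitly when it is committed, so readers never depend on the live file size.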






[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-09-25 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148666#comment-14148666 ]

Andrew Ash commented on SPARK-2532:
---

[~pwendell] should we close this ticket and track the individual items 
separately? It sounds like we should expect consolidated shuffle to work in 
1.1, and any remaining issues should have separate tickets filed for them. 
I know [~mridulm80] also has several fixes on his branch that should be 
cherry-picked over at some point, though.







[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-08-01 Thread Matei Zaharia (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082878#comment-14082878 ]

Matei Zaharia commented on SPARK-2532:
--

I'm going to create a few sub-tasks for the major improvements here to make it 
easier to put some of them in 1.1 and leave others for later.






[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-07-30 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080391#comment-14080391 ]

Apache Spark commented on SPARK-2532:
-

User 'aarondav' has created a pull request for this issue:
https://github.com/apache/spark/pull/1678






[jira] [Commented] (SPARK-2532) Fix issues with consolidated shuffle

2014-07-27 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075639#comment-14075639 ]

Apache Spark commented on SPARK-2532:
-

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/1609




--
This message was sent by Atlassian JIRA
(v6.2#6252)