[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-10 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241788#comment-14241788
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

[~eepayne], thanks for updating the patch. It might have gone stale. Please 
check and rebase it.
{code}
The patch does not appear to apply with p0 to p2
{code}

> Reducers do not catch bad map output transfers during shuffle if data 
> shuffled directly to disk
> ---
>
> Key: MAPREDUCE-6166
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
> MAPREDUCE-6166.v2.201411251627.txt, MAPREDUCE-6166.v3.txt, 
> MAPREDUCE-6166.v4.txt
>
>
> In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate 
> map partition output gets corrupted on disk on the map side. If this 
> corrupted map output is too large to shuffle in memory, the reducer streams 
> it to disk without validating the checksum. In jobs this large, it could take 
> hours before the reducer finally tries to read the corrupted file and fails. 
> Since retries of the failed reduce attempt will also take hours, this delay 
> in discovering the failure is multiplied greatly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241681#comment-14241681
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org
  against trunk revision 2e98ad3.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5071//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237297#comment-14237297
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12685532/MAPREDUCE-6166-gera-missing-cs-test.patch
  against trunk revision 1b3bb9e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 72 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5062//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5062//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5062//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-06 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237016#comment-14237016
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Thanks, that makes sense, [~eepayne]. I just wanted to confirm that there are no 
existing tests catching a missing checksum in on-disk shuffle 
([confirmed!|https://issues.apache.org/jira/browse/MAPREDUCE-6166?focusedCommentId=14236700&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236700]).

For the production code I have only one comment:
I am now convinced {{FileSystem.getLocal(conf)}} should be 
{{FileSystem.getLocal(conf).getRaw()}} in the corresponding 
{{OnDiskMapOutput}} constructor.

{{TestFetcher#testCorruptedFiles}} comments:

Nit: Lower-case {{FETCHER}} because it's a final variable.

{{FileSystem fs}} needs to be changed to {{FileSystem.getLocal(conf).getRaw()}} 
just like in the production code, and it could be made an instance variable 
because we can use it for cleanup in {{tearDown}}.

{{Path p}} should be more mnemonic, and it's better to use a root directory 
that matches the test method so we can use it for cleanup in {{tearDown}}:
{code}
Path outputPath = new Path(name.getMethodName() + "/foo");
{code}

Instead of reverse-engineering the path {{shuffledToDisk}}, we can use
{code}
Path shuffledToDisk = OnDiskMapOutput.getTempPath(outputPath, fetcher);
{code}

{quote}
{code}
457 ios.write(mapData.getBytes());
458 ios.close();
{code}
{quote}
{{ios.close()}} should be in a finally block.
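
For illustration, a minimal sketch of that suggestion (reusing the {{ios}} and 
{{mapData}} names from the snippet above):
{code}
try {
  ios.write(mapData.getBytes());
} finally {
  // close the IFileOutputStream even if the write throws
  ios.close();
}
{code}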

{quote}
{code}
476 bin = new ByteArrayInputStream(corrupted);
477 // Read past the shuffle header.
478 bin.read(new byte[headerSize], 0, headerSize);
{code}
{quote}
Move lines 477-478 inside the try block that follows on line 480.

Drop {{fs.deleteOnExit}}. It comes too late if an exception was thrown earlier; 
it is better to do the cleanup inside the {{tearDown}} method.

{quote}
{code}
491 IFileInputStream iFin = new IFileInputStream(
492 new FileInputStream(shuffledToDisk.toString()), dataSize, job);
{code}
{quote}
It's probably better not to mix in the java.io API when we already use the 
Hadoop FileSystem API. Why not do:
{code}
491 IFileInputStream iFin = new IFileInputStream(fs.open(shuffledToDisk), dataSize, job);
{code}


{quote}
{code}
493 iFin.read(new byte[dataSize], 0, dataSize);
494 iFin.close();
495 fs.close();
{code}
{quote}
Make sure to put lines 493 and 494 in a try/finally block as well.
Since we are getting rid of {{fs.deleteOnExit}}, we don't need {{fs.close}}.
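
Putting these suggestions together, a rough sketch of the read-back portion of 
the test (names such as {{job}}, {{dataSize}}, {{outputPath}} and {{fetcher}} 
are taken from the snippets above, so treat it as illustrative only):
{code}
// raw local FS, mirroring the production-code change, so no .crc side files
FileSystem fs = FileSystem.getLocal(job).getRaw();
Path outputPath = new Path(name.getMethodName() + "/foo");
Path shuffledToDisk = OnDiskMapOutput.getTempPath(outputPath, fetcher);

IFileInputStream iFin =
    new IFileInputStream(fs.open(shuffledToDisk), dataSize, job);
try {
  // reading the shuffled data back verifies the trailing IFile checksum
  iFin.read(new byte[dataSize], 0, dataSize);
} finally {
  iFin.close();
}
{code}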




[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236822#comment-14236822
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

Thank you very much, [~jira.shegalov].
{quote}
I'm uploading a modified patch based on my previous review only with the 
intention to see what, if any, tests would catch a missing checksum in 
on-disk shuffle.
{quote}
The last segment of the test I added ({{TestFetcher#testCorruptedIFile}}) will 
catch that the checksum is missing or incorrect when it tries to read the IFile 
that was shuffled to disk by {{OnDiskMapOutput#shuffle}}.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236700#comment-14236700
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12685532/MAPREDUCE-6166-gera-missing-cs-test.patch
  against trunk revision e227fb8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5061//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5061//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235543#comment-14235543
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12685168/MAPREDUCE-6166.v3.txt
  against trunk revision 0653918.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5060//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5060//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-04 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234498#comment-14234498
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

[~eepayne], thanks for your reproducer. Would you mind uploading your patch to 
run it through Jenkins? If there is no test catching this, we should add one. It 
is now clearer why we would need the checksum via IFile, and I feel more 
strongly that we need to get rid of the checksumming LocalFileSystem, just like 
in MapTask.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234461#comment-14234461
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

Thanks, [~jira.shegalov], for taking the time to investigate this issue.

The unit tests are not catching this. I am testing this in a 10-node secure 
cluster.

I am running wordcount on a file that 1) has no repeated words and 2) is large 
enough to ensure that at least some of the map outputs are shuffled to disk:
{code}
$ $HADOOP_PREFIX/bin/hadoop fs -cat 
Input/NoRecurringWords/NoRecurringWords-part0.txt | wc -l -w
 4008920 4008920
$ $HADOOP_PREFIX/bin/hadoop jar 
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-$HADOOP_VERSION.jar
 wordcount Input/NoRecurringWords/NoRecurringWords-part0.txt Output/01
{code}

If I implement the fix by adjusting {{bytesLeft}} and leaving {{input.read()}} 
alone, {{OnDiskMapOutput#shuffle}} does succeed and writes the map output 
to disk in a temporary location. However, when the {{Merger}} goes to read that 
temporary file (via {{RawKVIteratorReader}} in {{MergeManager}}), it fails 
with the following exception:
{code}
2014-12-04 17:47:12,040 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : org.apache.hadoop.fs.ChecksumException: Checksum Error
at 
org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:228)
at 
org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:152)
at 
org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:127)
at 
org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
at 
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at 
org.apache.hadoop.io.IOUtils.wrappedReadForCompressedData(IOUtils.java:170)
at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:378)
at org.apache.hadoop.mapred.IFile$Reader.nextRawKey(IFile.java:426)
at org.apache.hadoop.mapred.Merger$Segment.nextRawKey(Merger.java:337)
at 
org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:519)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:547)
at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:601)
...
{code}
This is because the checksum is not at the end of the temporary file.
On the other hand, if I leave {{bytesLeft}} alone and instead call 
{{((IFileInputStream)input).readWithChecksum}}, the reducers all succeed. This 
is because {{readWithChecksum}} not only compares the input against the 
checksum, it also includes the checksum at the end of the byte buffer.

Please let me know if that makes sense.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233666#comment-14233666
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Hi [~eepayne], sorry for the delay. I knew what modifications you were talking 
about, but I did not have the time to verify and convince myself whether this 
double checksumming was really needed in the Merger. I did, however, run a 
version of the patch that implements my suggestion above through a couple of 
unit tests and did not see any issues:
{code}
mapreduce.task.reduce.TestFetcher
mapreduce.task.reduce.TestMergeManager
mapreduce.task.reduce.TestMerger
{code}
That's why I hoped you'd point me to where a failure occurs. That's my current 
status on it; I hope to get back to it again soon.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233613#comment-14233613
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

[~jira.shegalov], did that make sense?



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230426#comment-14230426
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

I'm sorry. The second code snippet should have been this:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = ((IFileInputStream)input).readWithChecksum(buf, 0, (int) 
Math.min(bytesLeft, BYTES_TO_READ));
...
{code}



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-12-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230407#comment-14230407
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

[~jira.shegalov], I'm sorry for not being clear.
{quote}
Can you clarify where in the code it's required to keep the original checksum?
Then this contents are written out using {{LocalFileSystem}}, which will create 
again an on-disk checksum because it's based on {{ChecksumFileSystem}}.
{quote}
I don't think the {{IFile}} format is related to {{ChecksumFileSystem}}.

The {{IFile}} checksum is expected to be the last 4 bytes of the {{IFile}}, and 
if we use {{input.read}} as below, those 4 bytes of checksum are not copied 
into {{buf}}:

{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength - ((IFileInputStream)input).getSize();
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = input.read(buf, 0, (int) Math.min(bytesLeft, BYTES_TO_READ));
...
{code}

However, if we use {{readWithChecksum}} as below, the checksum is copied into 
{{buf}}:
{code}
input = new IFileInputStream(input, compressedLength, conf);
// Copy data to local-disk
long bytesLeft = compressedLength;
try {
  final int BYTES_TO_READ = 64 * 1024;
  byte[] buf = new byte[BYTES_TO_READ];
  while (bytesLeft > 0) {
int n = ((IFileInputStream)input).read(buf, 0, (int) 
Math.min(bytesLeft, BYTES_TO_READ));
...
{code}

Without those last 4 bytes of checksum at the end of the {{IFile}}, the 
final read will fail during the last merge pass with a checksum error.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-30 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229432#comment-14229432
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Thanks for commenting [~eepayne]!
bq. Since OnDiskMapOutput is shuffling the whole IFile to disk, the checksum is 
needed later during the last merge pass when the IFile contents are read again 
and decompressed.

Can you clarify where in the code it's required to keep the original checksum?

What I see is that after your modifications, {{OnDiskMapOutput}} is guaranteed 
to validate the contents of the destination buffer against the remote checksum. 
These contents are then written out using {{LocalFileSystem}}, which will again 
create an on-disk checksum because it's based on {{ChecksumFileSystem}}. Are you 
proposing an optimization in which the checksum is not computed twice when 
shuffling straight to disk, by using {{RawLocalFileSystem}}? Can we defer it to 
another JIRA?
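
For reference, the optimization being discussed boils down to which view of the 
local file system {{OnDiskMapOutput}} writes through; a minimal sketch of the 
difference (assuming a {{Configuration conf}} is at hand):
{code}
// Checksumming view: LocalFileSystem extends ChecksumFileSystem, so every
// create() also produces a .crc side file.
FileSystem checksummed = FileSystem.getLocal(conf);

// Raw view of the same local file system: no extra CRC is computed on write.
// MapTask obtains its spill-file FileSystem the same way.
FileSystem raw = FileSystem.getLocal(conf).getRaw();
{code}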



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226753#comment-14226753
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

Just to clarify, neither {{read}} nor {{readWithChecksum}} writes anything to 
disk. They both read data into the byte buffer, which is then written to disk 
by {{shuffle}}.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226745#comment-14226745
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

[~jira.shegalov], I have one question about re-using the {{input.read}} code in 
{{OnDiskMapOutput}}.
{quote}
We can set 
{code}
long bytesLeft = compressedLength - ((IFileInputStream)input).getSize()
{code}
Then we don't need to touch the line {{input.read}} to do {{readWithChecksum}}
{quote}
In this case, {{input.read}} does not write the checksum to the disk while 
{{readWithChecksum}} will write it.

Since {{OnDiskMapOutput}} is shuffling the whole IFile to disk, the checksum is 
needed later during the last merge pass when the IFile contents are read again 
and decompressed.

If we were to implement {{input.read}} as above, it looks like we would still 
need to add something like the following in order to put the checksum on the 
disk:
{code}
disk.write(((IFileInputStream)input).getChecksum(), 0, (int) 
((IFileInputStream)input).getSize());
{code}



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226322#comment-14226322
 ] 

Eric Payne commented on MAPREDUCE-6166:
---

[~jira.shegalov], thank you very much for your detailed analysis of this patch.

I have opened MAPREDUCE-6174 to cover the parent class for 
{{InMemoryMapOutput}} and {{OnDiskMapOutput}}, and I will continue to work on 
the above-mentioned code points.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-25 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225864#comment-14225864
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Sounds good, [~eepayne].
I have a few comments then; some are in light of a follow-up JIRA.
bq. update this patch with the final keyword on JobConf jobConf
Let us make the instance variable type the more general {{Configuration}}, as we 
are not doing anything specific to {{JobConf}}.

Instead of introducing a new local variable {{iFin}} in 
{{OnDiskMapOutput#shuffle}}, we can overwrite {{input}} as is done in 
{{InMemoryMapOutput#shuffle}}.
We can either capture the shuffle size in an instance variable, as 
{{InMemoryMapOutput}} does implicitly via {{memory.length}}, or we can set
{code}
long bytesLeft = compressedLength - ((IFileInputStream)input).getSize()
{code}
Then we don't need to change the {{input.read}} line to {{readWithChecksum}}.

Good call adding {{finally}} with {{close}}.

I also have some comments for the test:
{{ios.finish()}} should be removed because it's redundant: 
{{IFileOutputStream#close()}} will call it as well.

We don't need the PrintStream wrapping, and we need to be careful not to leak 
file descriptors in case I/O fails.
{code}
   new PrintStream(fout).print(bout.toString());
   fout.close();
{code}
Should be something like:
{code}
try {
  fout.write(bout.toByteArray());
} finally {
  fout.close();
}
{code}

Similarly, we need to make sure that {{fin.close()}} is in a try-finally block 
enclosing the header and shuffle reads.
Let us not do
{code}
 catch(Exception e) {
 fail("OnDiskMapOutput.shuffle did not process the map partition file");
{code}
It's redundant because the exception already fails the test.

Same PrintStream and {{fout.close}} remarks apply to the code creating the 
corrupted file.
{{dataSize/2}}: I believe the Sun Java coding style requires spaces around 
arithmetic operators.

In the fragment where we expect the checksum to fail, {{fin.close()}} should be 
in a finally block.
{{catch(Exception e)}} is too broad. Let us be more specific and maybe even log 
it:
{code}
  } catch(ChecksumException e) {
LOG.info("Expected checksum exception thrown.", e);
  }
{code} 

Thinking a bit more about {{file.out}}, it does not seem to be cleaned up after 
the test has finished. But we probably don't even need to create files; we can 
simply use {{new ByteArrayInputStream(bout.toByteArray())}} and {{new 
ByteArrayInputStream(corrupted)}} as input.
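
As a sketch of that idea, the corrupted bytes could be fed to the checksum 
validation directly from memory; the names {{corrupted}}, {{dataSize}}, {{job}} 
and {{LOG}} are assumed from the snippets above, and the expected-exception 
handling follows the pattern suggested earlier:
{code}
// no temporary file: wrap the in-memory corrupted IFile bytes directly
InputStream badInput = new ByteArrayInputStream(corrupted);
IFileInputStream iFin = new IFileInputStream(badInput, dataSize, job);
try {
  // reading through IFileInputStream should detect the corruption
  iFin.read(new byte[dataSize], 0, dataSize);
  fail("Expected a ChecksumException for the corrupted map output");
} catch (ChecksumException e) {
  LOG.info("Expected checksum exception thrown.", e);
} finally {
  iFin.close();
}
{code}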



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224834#comment-14224834
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12683592/MAPREDUCE-6166.v2.201411251627.txt
  against trunk revision 61a2510.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5050//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5050//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-24 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223448#comment-14223448
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

This patch adds one more common instance field, the configuration {{JobConf 
jobConf}} :). It should be final, by the way.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223433#comment-14223433
 ] 

Jason Lowe commented on MAPREDUCE-6166:
---

I'm not sure we need all the boilerplate of an extra class to save one line of 
code (two if we count the MergeManager member), and I'm not sure that extra 
class alone will make it clear that MapOutput can be used externally.  IMHO if 
we want to do this kind of refactoring then that can be done as another JIRA.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-24 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223368#comment-14223368
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Hi [~jlowe], thanks for pointing out the 3rd-party use cases; I completely 
forgot about them. So how about we make it explicit that InMemoryMapOutput and 
OnDiskMapOutput are different from the 3rd-party ones (so I don't forget it 
next time) by having them subclass a common class? We can put the common 
IFileInputStream wrapping logic there, and maybe even move {{private final 
MergeManagerImpl merger;}} into it.



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223009#comment-14223009
 ] 

Jason Lowe commented on MAPREDUCE-6166:
---

I'd be a little wary of doing this. I believe the MergeManager and MapOutput 
classes are being used by third-party software like SyncSort; see 
MAPREDUCE-4808, MAPREDUCE-4039, and related JIRAs. Changing the input 
stream being passed to mapOutput.shuffle to an IFileInputStream and then 
calling read() on the data subtly changes the behavior. Before, when it was not 
an IFileInputStream, calling read() would read all the data and the checksum; 
after it's wrapped at a higher level, it won't. If the third-party software is 
itself wrapping the stream with IFileInputStream to handle the trailing 
checksum, then after this change the stream would be double-wrapped and 
checksum verification would fail.




[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-22 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222194#comment-14222194
 ] 

Gera Shegalov commented on MAPREDUCE-6166:
--

Hi [~eepayne], thanks for reporting the issue. If you look at 
{{InMemoryMapOutput#shuffle}}, the first thing it does is overwrite the passed 
InputStream with an IFileInputStream-wrapped version of it. So if we simply 
move this logic from there to the caller of {{mapOutput.shuffle}}, i.e., 
{{Fetcher#setupShuffleConnection}}, this common behavior is automatically 
picked up by both InMemory and OnDisk, and we don't have to modify the latter.
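
A minimal sketch of that idea in the {{Fetcher}} (the {{shuffle}} argument list 
is only an approximation of the real signature, and the surrounding code is 
omitted):
{code}
// wrap the stream once in the caller so that both InMemoryMapOutput and
// OnDiskMapOutput read checksum-validated data
input = new IFileInputStream(input, compressedLength, jobConf);
mapOutput.shuffle(host, input, compressedLength, decompressedLength,
    metrics, reporter);
{code}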



[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk

2014-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222161#comment-14222161
 ] 

Hadoop QA commented on MAPREDUCE-6166:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12683171/MAPREDUCE-6166.v1.201411221941.txt
  against trunk revision a4df9ee.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5044//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5044//console

This message is automatically generated.
