[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998181#comment-13998181 ]

Hudson commented on MAPREDUCE-5821:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
MAPREDUCE-5821. Avoid unintentional reallocation of byte arrays in segments during merge. Contributed by Todd Lipcon (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594654)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java

IFile merge allocates new byte array for every value

Key: MAPREDUCE-5821
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: performance, task
Affects Versions: 2.4.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 3.0.0, 2.5.0, 2.4.1
Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, mapreduce-5821.txt

I wrote a standalone benchmark of the MapOutputBuffer and found that it did a lot of allocations during the merge phase. After looking at an allocation profile, I found that IFile.Reader.nextRawValue() would always allocate a new byte array for every value, so the allocation rate goes way up during the merge phase of the mapper. I imagine this also affects the reducer input, though I didn't profile that.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998219#comment-13998219 ]

Hudson commented on MAPREDUCE-5821:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
MAPREDUCE-5821. Avoid unintentional reallocation of byte arrays in segments during merge. Contributed by Todd Lipcon (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594654)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999011#comment-13999011 ]

Hudson commented on MAPREDUCE-5821:

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
MAPREDUCE-5821. Avoid unintentional reallocation of byte arrays in segments during merge. Contributed by Todd Lipcon (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594654)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967652#comment-13967652 ]

Chris Douglas commented on MAPREDUCE-5821:

+1

This looks like the intended behavior from HADOOP-5494.

Good catch.
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963651#comment-13963651 ]

Chris Douglas commented on MAPREDUCE-5821:

Sure, I can take a look later this week if it can wait.
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963703#comment-13963703 ]

Todd Lipcon commented on MAPREDUCE-5821:

No rush. It's been there for years, so what's another week? :)
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960190#comment-13960190 ]

Todd Lipcon commented on MAPREDUCE-5821:

The issue is that, if the input buffer doesn't have room for the value, it will allocate a new array. But, whenever the Merger calls nextRawValue on a disk file, it always first resets the buffer to {{diskIFileValue.getData()}} which is empty. So, at the entry to that function, the length is always 0, and the code path which reallocs is always taken.
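The reallocation path described in the comment above can be illustrated with a small, self-contained sketch. It is not Hadoop's code and not the committed patch: `ValueBuffer`, `MergeAllocationSketch`, and `countAllocations` are hypothetical names invented for illustration, and the "keep the buffer" variant is only a contrast case, assuming that letting the grown array survive between calls is enough to avoid the repeated allocation.

```java
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical, not Hadoop classes.
final class MergeAllocationSketch {

    /** Hypothetical stand-in for a resettable input buffer. */
    static final class ValueBuffer {
        private byte[] data;
        private int length;

        ValueBuffer(byte[] data, int length) {
            reset(data, length);
        }

        void reset(byte[] newData, int newLength) {
            this.data = newData;
            this.length = newLength;
        }

        byte[] getData() { return data; }
        int getLength() { return length; }
    }

    /**
     * Simulates reading one value of valueLength bytes into buf.
     * Returns true if a new backing array had to be allocated.
     */
    static boolean nextRawValue(ValueBuffer buf, int valueLength) {
        byte[] target = buf.getData();
        boolean allocated = false;
        if (target.length < valueLength) {
            // No room in the current array: allocate a larger one.
            target = new byte[valueLength * 2];
            allocated = true;
        }
        // Pretend to read the value bytes from disk.
        Arrays.fill(target, 0, valueLength, (byte) 1);
        buf.reset(target, valueLength);
        return allocated;
    }

    /**
     * Reads `values` values and counts allocations. When resetToEmptyFirst is
     * true, the buffer is pointed back at an empty array before every read,
     * mirroring the behavior described in the comment.
     */
    static int countAllocations(boolean resetToEmptyFirst, int values, int valueLength) {
        byte[] empty = new byte[0];
        ValueBuffer buf = new ValueBuffer(empty, 0);
        int allocations = 0;
        for (int i = 0; i < values; i++) {
            if (resetToEmptyFirst) {
                buf.reset(empty, 0); // length is 0 on entry, so the realloc path is taken
            }
            if (nextRawValue(buf, valueLength)) {
                allocations++;
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("reset to empty each time: " + countAllocations(true, 1000, 16));
        System.out.println("buffer kept between calls: " + countAllocations(false, 1000, 16));
    }
}
```

With 1000 values of 16 bytes each, the reset-to-empty variant counts 1000 allocations, while the variant that keeps the grown array counts 1.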
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960213#comment-13960213 ]

Hadoop QA commented on MAPREDUCE-5821:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12638732/after-patch.png
against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4485//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960314#comment-13960314 ]

Karthik Kambatla commented on MAPREDUCE-5821:

Patch looks good to me.

[~tlipcon] - mind updating the patch to apply against trunk?
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960824#comment-13960824 ]

Hadoop QA commented on MAPREDUCE-5821:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12638796/mapreduce-5821.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4487//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4487//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960866#comment-13960866 ]

Todd Lipcon commented on MAPREDUCE-5821:

No new tests because this is a performance fix.

[~chris.douglas] - if you have a spare minute, want to take a look at this? I think you were the one who worked on this area of the code back in 2009.