[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5601: -- Summary: ShuffleHandler fadvises file regions as DONTNEED even when fetch fails (was: Fetches when reducer can't fit them result in unnecessary reads on shuffle server) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. Unfortunately, this has some consequences on the server side - it forces unnecessary disk and network IO as the server begins to read the output data that will go nowhere. Also, when the channel is closed, it triggers an fadvise DONTNEED that causes the data region to be evicted from the OS page cache. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. The fix would be to reserve space in the reducer before fetching the data. Currently, fetching the size of the data and fetching the actual data happen in the same HTTP request. Fixing it would require doing these in separate HTTP requests. Or transferring the sizes through the AM. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5601: -- Description: When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. was: When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. Unfortunately, this has some consequences on the server side - it forces unnecessary disk and network IO as the server begins to read the output data that will go nowhere. Also, when the channel is closed, it triggers an fadvise DONTNEED that causes the data region to be evicted from the OS page cache. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. The fix would be to reserve space in the reducer before fetching the data. Currently, fetching the size of the data and fetching the actual data happen in the same HTTP request. Fixing it would require doing these in separate HTTP requests. Or transferring the sizes through the AM. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. 
I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
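To make the regression described above concrete, here is a minimal sketch of the two release policies. The class and method names are hypothetical stand-ins, not the actual ShuffleHandler code:
{noformat}
// Hypothetical sketch: how releasing a transferred file region interacts with
// posix_fadvise(POSIX_FADV_DONTNEED), which evicts the region from the page cache.
class FadvisedRegion {
  private final long offset;
  private final long count;
  private boolean fullyTransferred = false;

  FadvisedRegion(long offset, long count) {
    this.offset = offset;
    this.count = count;
  }

  void markFullyTransferred() {
    fullyTransferred = true;
  }

  // MR2 behavior described above: advise DONTNEED whenever the channel closes,
  // including when the reducer abandoned the fetch after reading only the header.
  void release() {
    adviseDontNeed(offset, count);
  }

  // MR1-like behavior: only drop the page cache after all bytes were sent.
  void releaseAfterCompleteTransfer() {
    if (fullyTransferred) {
      adviseDontNeed(offset, count);
    }
  }

  private void adviseDontNeed(long off, long len) {
    // stand-in for a native posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED) call
  }
}
{noformat}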
[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5601: -- Attachment: MAPREDUCE-5601.patch ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5601: -- Status: Patch Available (was: Open) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808844#comment-13808844 ] Sandy Ryza commented on MAPREDUCE-5601: --- Attached a patch that fixes the problem by only fadvising as DONTNEED if the Netty transfer completes successfully. With the patch applied, the average reducer shuffle time for my job goes down from 80 seconds to 34, on par with MR1. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
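For readers following along, the shape of such a fix looks roughly like the sketch below, written against the Netty 3 API that Hadoop 2.2 shipped with. The {{transferSuccessful()}} hook is a hypothetical name for whatever the patch calls to issue the fadvise, so treat this as an illustration rather than the patch itself:
{noformat}
// Sketch: advise DONTNEED only when the write actually completed, instead of
// unconditionally on channel close.
final FadvisedFileRegion partition = ...; // the map output region being shuffled
ChannelFuture writeFuture = ch.write(partition);
writeFuture.addListener(new ChannelFutureListener() {
  @Override
  public void operationComplete(ChannelFuture future) {
    if (future.isSuccess()) {
      partition.transferSuccessful(); // hypothetical hook: fadvise DONTNEED here only
    }
    partition.releaseExternalResources(); // always free the region itself
  }
});
{noformat}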
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808849#comment-13808849 ] Hadoop QA commented on MAPREDUCE-5601: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611003/MAPREDUCE-5601.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4154//console This message is automatically generated. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5601: -- Attachment: MAPREDUCE-5601.patch ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808859#comment-13808859 ] Hadoop QA commented on MAPREDUCE-5601: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4155//console This message is automatically generated. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5602) cygwin path error
Amit Cahanovich created MAPREDUCE-5602: -- Summary: cygwin path error Key: MAPREDUCE-5602 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5602 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.0.6-alpha Environment: cygwin Reporter: Amit Cahanovich The path for a file is constructed incorrectly because the code does not take Cygwin into account. /hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskLog.java: static final String USERLOGS_DIR_NAME = "userlogs"; The outcome is: C:\cygwin\home\AMITCA\hadoop-2.0.6-alpha\logs/userlogs is not a valid path -- This message was sent by Atlassian JIRA (v6.1#6144)
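A short sketch of how this kind of mixed-separator path arises when a Windows-style base directory is joined with a hard-coded forward slash; the paths here are illustrative, and the actual join in TaskLog may use a different mechanism:
{noformat}
public class MixedSeparatorDemo {
  public static void main(String[] args) {
    // Under Cygwin the Hadoop log dir resolves to a Windows-style path...
    String logDir = "C:\\cygwin\\home\\AMITCA\\hadoop-2.0.6-alpha\\logs";
    // ...but joining it with "/" yields the mixed path from the report,
    // which Hadoop's path validation then rejects.
    String userlogs = logDir + "/" + "userlogs";
    System.out.println(userlogs); // C:\cygwin\home\AMITCA\hadoop-2.0.6-alpha\logs/userlogs
  }
}
{noformat}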
[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey
[ https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808966#comment-13808966 ] Hudson commented on MAPREDUCE-5598: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/378/]) MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed by Robert Kanter (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java TestUserDefinedCounters.testMapReduceJob is flakey -- Key: MAPREDUCE-5598 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk, 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch {{TestUserDefinedCounters.testMapReduceJob}} is flakey. We sometimes see it fail:
{noformat}
junit.framework.AssertionFailedError
    at junit.framework.Assert.fail(Assert.java:48)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at junit.framework.Assert.assertTrue(Assert.java:27)
    at org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
{noformat}
Upon investigation, the problem is that the input for the MR job in this test is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}. If an earlier test wrote some files there, this test will use them as part of its input. This can cause all sorts of problems with this test because it's not expecting the additional input data. -- This message was sent by Atlassian JIRA (v6.1#6144)
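The committed fix isn't reproduced in this digest; one way to make the test's input deterministic is to clear the shared directory before writing to it. This is a sketch of the general approach under that assumption, not necessarily what MAPREDUCE-5598's patch actually does:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Remove any files left behind by earlier tests before writing this test's
// input, so stray data cannot leak into the job's input splits.
private void cleanInputDir() throws java.io.IOException {
  Configuration conf = new Configuration();
  Path input = new Path(System.getProperty("test.build.data", "/tmp"), "input");
  FileSystem fs = FileSystem.getLocal(conf);
  if (fs.exists(input)) {
    fs.delete(input, true);
  }
}
{noformat}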
[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808963#comment-13808963 ] Hudson commented on MAPREDUCE-5596: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/378/]) MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle connections. Contributed by Sandy Ryza (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java Allow configuring the number of threads used to serve shuffle connections - Key: MAPREDUCE-5596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch MR1 had mapreduce.tasktracker.http.threads. MR2 always uses the Netty default 2 * Runtime.availableProcessors(). We should make this configurable. -- This message was sent by Atlassian JIRA (v6.1#6144)
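For context, the Netty 3 server bootstrap that ShuffleHandler uses takes a worker-thread count directly, so the change amounts to plumbing a config value through. A sketch of that idea follows; the property name and its 0-means-default convention are assumptions here and should be checked against the mapred-default.xml change in the commit:
{noformat}
import java.util.concurrent.Executors;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

// Read the configured thread count (assumed property name); 0 falls back to
// Netty's own default of 2 * available processors.
int maxThreads = conf.getInt("mapreduce.shuffle.max.threads", 0);
if (maxThreads == 0) {
  maxThreads = 2 * Runtime.getRuntime().availableProcessors();
}
NioServerSocketChannelFactory factory = new NioServerSocketChannelFactory(
    Executors.newCachedThreadPool(), Executors.newCachedThreadPool(), maxThreads);
{noformat}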
[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808996#comment-13808996 ] Hudson commented on MAPREDUCE-5596: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/]) MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle connections. Contributed by Sandy Ryza (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java Allow configuring the number of threads used to serve shuffle connections - Key: MAPREDUCE-5596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch MR1 had mapreduce.tasktracker.http.threads. MR2 always uses the Netty default 2 * Runtime.availableProcessors(). We should make this configurable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey
[ https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808999#comment-13808999 ] Hudson commented on MAPREDUCE-5598: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/]) MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed by Robert Kanter (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java TestUserDefinedCounters.testMapReduceJob is flakey -- Key: MAPREDUCE-5598 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk, 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch {{TestUserDefinedCounters.testMapReduceJob}} is flakey. We sometimes see it fail:
{noformat}
junit.framework.AssertionFailedError
    at junit.framework.Assert.fail(Assert.java:48)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at junit.framework.Assert.assertTrue(Assert.java:27)
    at org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
{noformat}
Upon investigation, the problem is that the input for the MR job in this test is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}. If an earlier test wrote some files there, this test will use them as part of its input. This can cause all sorts of problems with this test because it's not expecting the additional input data. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809036#comment-13809036 ] Todd Lipcon commented on MAPREDUCE-5601: Good find. One question: could we improve this even further by having the client send a header like Max-response-size: <bytes>, and then have the server avoid doing any IO for the case where the client is going to abandon the request anyway? Seems like we might be incurring extra seeks in some cases due to the behavior you described above. It would be unrelated to this JIRA, just thought of it now. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
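A sketch of what that protocol tweak could look like on the fetcher side. The header name comes from the comment above; everything else (the URL variable, the memory accounting) is hypothetical, since no such header exists in the shuffle protocol:
{noformat}
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical: advertise how many bytes the reducer can currently buffer, so
// the server could skip disk IO for fetches the reducer would abandon anyway.
HttpURLConnection conn =
    (HttpURLConnection) new URL(shuffleUrl).openConnection();
conn.setRequestProperty("Max-Response-Size", Long.toString(unreservedMemory));
{noformat}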
[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections
[ https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809055#comment-13809055 ] Hudson commented on MAPREDUCE-5596: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/]) MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle connections. Contributed by Sandy Ryza (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java Allow configuring the number of threads used to serve shuffle connections - Key: MAPREDUCE-5596 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch MR1 had mapreduce.tasktracker.http.threads. MR2 always uses the Netty default 2 * Runtime.availableProcessors(). We should make this configurable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey
[ https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809058#comment-13809058 ] Hudson commented on MAPREDUCE-5598: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/]) MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed by Robert Kanter (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java TestUserDefinedCounters.testMapReduceJob is flakey -- Key: MAPREDUCE-5598 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: trunk, 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 3.0.0, 2.3.0, 2.2.1 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch {{TestUserDefinedCounters.testMapReduceJob}} is flakey. We sometimes see it fail:
{noformat}
junit.framework.AssertionFailedError
    at junit.framework.Assert.fail(Assert.java:48)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at junit.framework.Assert.assertTrue(Assert.java:27)
    at org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
{noformat}
Upon investigation, the problem is that the input for the MR job in this test is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}. If an earlier test wrote some files there, this test will use them as part of its input. This can cause all sorts of problems with this test because it's not expecting the additional input data. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809418#comment-13809418 ] Sandy Ryza commented on MAPREDUCE-5601: --- Was worried about that as well. But the fetcher doesn't know whether it's going to abandon the request before it sends it. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809423#comment-13809423 ] Sandy Ryza commented on MAPREDUCE-5601: --- Or you're saying we would pass the amount of unreserved memory remaining? ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
Jason Lowe created MAPREDUCE-5603: - Summary: Ability to disable FileInputFormat listLocatedStatus optimization to save client memory Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 2.2.0, 0.23.10 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809537#comment-13809537 ] Jason Lowe commented on MAPREDUCE-5603: --- Recently we ran across a jobclient that failed with an OOM error once we updated the cluster to 0.23.10. The OOM was triggered by the FileInputFormat listLocatedStatus optimization from MAPREDUCE-1981, as the client now caches the BlockLocations of all files along with the FileStatus objects it was caching before. Normally the user can bump the heap size of the client to work around this issue. However, if a job has an input with a particularly large number of BlockLocations, as this job did, it would be nice if the user had the option to disable the optimization to reduce the memory required for input split calculations. Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809542#comment-13809542 ] Jason Lowe commented on MAPREDUCE-5603: --- Sample OOM backtrace for reference:
{noformat}
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: x/x.x.x.x; destination host is: x:x;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
    at com.sun.proxy.$Proxy6.getListing(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
    at com.sun.proxy.$Proxy6.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1286)
    at org.apache.hadoop.hdfs.DistributedFileSystem$1.<init>(DistributedFileSystem.java:418)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:409)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1654)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:225)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:500)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:568)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:568)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:844)
    at x
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at x
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Error reading responses
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:764)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
    at java.lang.StringBuilder.<init>(StringBuilder.java:97)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:216)
    at org.apache.hadoop.hdfs.DeprecatedUTF8.readString(DeprecatedUTF8.java:59)
    at org.apache.hadoop.hdfs.protocol.DatanodeID.readFields(DatanodeID.java:212)
    at org.apache.hadoop.hdfs.protocol.DatanodeInfo.readFields(DatanodeInfo.java:389)
    at org.apache.hadoop.hdfs.protocol.LocatedBlock.readFields(LocatedBlock.java:146)
    at org.apache.hadoop.hdfs.protocol.LocatedBlocks.readFields(LocatedBlocks.java:223)
    at org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus.readFields(HdfsLocatedFileStatus.java:87)
    at org.apache.hadoop.hdfs.protocol.DirectoryListing.readFields(DirectoryListing.java:120)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:833)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:757)
{noformat}
Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5603: -- Status: Patch Available (was: Open) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 2.2.0, 0.23.10 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-5603.patch It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5603: -- Attachment: MAPREDUCE-5603.patch Patch that adds a mapreduce.input.fileinputformat.uselocatedstatus config to control whether the listLocatedStatus optimization is enabled. The property defaults to true. Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-5603.patch It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
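A sketch of what such a toggle looks like inside {{listStatus}}, using the config name given above. This is simplified: the real patch has to handle multiple input paths, path filters, and error cases, and the surrounding variables ({{fs}}, {{dir}}, {{result}}) are assumed from context:
{noformat}
boolean useLocated =
    job.getBoolean("mapreduce.input.fileinputformat.uselocatedstatus", true);
if (useLocated) {
  // Optimized path: one listing that carries BlockLocations along with each
  // FileStatus, at the cost of caching all of those locations in client memory.
  RemoteIterator<LocatedFileStatus> iter = fs.listLocatedStatus(dir);
  while (iter.hasNext()) {
    result.add(iter.next());
  }
} else {
  // Fallback: plain FileStatus listing; block locations are fetched later,
  // which keeps the client's memory footprint smaller.
  for (FileStatus stat : fs.listStatus(dir)) {
    result.add(stat);
  }
}
{noformat}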
[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809558#comment-13809558 ] Hadoop QA commented on MAPREDUCE-5603: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611157/MAPREDUCE-5603.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4156//console This message is automatically generated. Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-5603.patch It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated MAPREDUCE-3860: --- Attachment: MAPREDUCE-3860--n4.patch Jonathan, The logs don't provide much info on why tests fail. Per your description it seems that the tests hang indefinitely, so printing thread dumps on test timeouts would probably help. I'm attaching a patch which modifies Rumen's pom.xml by adding a JUnit listener that prints thread dumps. I could not reproduce any failures in the Rumen tests; I tried 4 different machines (osx, centos, fedora on h/w nodes, and rhel on a VM). Please reproduce the failures in your environment one more time and attach the console output of Maven and all Surefire logs (not just *-output.txt). Thanks for working on this. [Rumen] Bring back the removed Rumen unit tests --- Key: MAPREDUCE-3860 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Reporter: Ravi Gummadi Assignee: Andrey Klochkov Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, MAPREDUCE-3860.patch, org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, rumen-test-data.tar.gz MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need to be brought back: TestZombieJob.java TestRumenJobTraces.java TestRumenFolder.java TestRumenAnonymization.java TestParsedLine.java TestConcurrentRead.java -- This message was sent by Atlassian JIRA (v6.1#6144)
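For reference, a minimal JUnit {{RunListener}} along these lines; this is a sketch of the idea rather than the attached patch, and it would be registered through the surefire {{listener}} property in pom.xml:
{noformat}
import java.util.Map;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

// Prints a full thread dump when a test fails (e.g. on a timeout), to show
// where a hung test is actually stuck.
public class ThreadDumpListener extends RunListener {
  @Override
  public void testFailure(Failure failure) {
    System.err.println("Test failed: " + failure.getTestHeader());
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      System.err.println("Thread \"" + e.getKey().getName() + "\":");
      for (StackTraceElement frame : e.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }
}
{noformat}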
[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809577#comment-13809577 ] Andrey Klochkov commented on MAPREDUCE-3860: Also, it could be that the timeouts I set in the tests are still too low for you, if your machine is that slow. Can you increase them by up to an order of magnitude to check that? [Rumen] Bring back the removed Rumen unit tests --- Key: MAPREDUCE-3860 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Reporter: Ravi Gummadi Assignee: Andrey Klochkov Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, MAPREDUCE-3860.patch, org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, rumen-test-data.tar.gz MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need to be brought back: TestZombieJob.java TestRumenJobTraces.java TestRumenFolder.java TestRumenAnonymization.java TestParsedLine.java TestConcurrentRead.java -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated MAPREDUCE-4980: --- Attachment: MAPREDUCE-4980--n8.patch Attaching rebased patch. Parallel test execution of hadoop-mapreduce-client-core --- Key: MAPREDUCE-4980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Andrey Klochkov Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch The maven surefire plugin supports a parallel testing feature. By using it, the tests can run faster. -- This message was sent by Atlassian JIRA (v6.1#6144)
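For anyone trying this locally, parallel execution is switched on through the surefire plugin configuration in pom.xml. The values below are illustrative, not the ones in the attached patch:
{noformat}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run test classes concurrently within one forked JVM. -->
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
{noformat}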
[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809611#comment-13809611 ] Hadoop QA commented on MAPREDUCE-4980: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611165/MAPREDUCE-4980--n8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 125 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4157//console This message is automatically generated. Parallel test execution of hadoop-mapreduce-client-core --- Key: MAPREDUCE-4980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Andrey Klochkov Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch The maven surefire plugin supports a parallel testing feature. By using it, the tests can run faster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809636#comment-13809636 ] Andrey Klochkov commented on MAPREDUCE-4980: The build failed due to an OOM while processing native code; it is not related to the patch. Parallel test execution of hadoop-mapreduce-client-core --- Key: MAPREDUCE-4980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Tsuyoshi OZAWA Assignee: Andrey Klochkov Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch The maven surefire plugin supports a parallel testing feature. By using it, the tests can run faster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809641#comment-13809641 ] Hadoop QA commented on MAPREDUCE-5601: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4158//console This message is automatically generated. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809680#comment-13809680 ] Sandy Ryza commented on MAPREDUCE-5601: --- The patch compiles fine for me locally. The failure seems to be some sort of javah issue that I've seen in other builds as well. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809703#comment-13809703 ] Hadoop QA commented on MAPREDUCE-5601: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4159//console This message is automatically generated. ShuffleHandler fadvises file regions as DONTNEED even when fetch fails -- Key: MAPREDUCE-5601 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch When a reducer initiates a fetch request, it does not know whether it will be able to fit the fetched data in memory. The first part of the response tells how much data will be coming. If space is not currently available, the reduce will abandon its request and try again later. When this occurs, the ShuffleHandler still fadvises the file region as DONTNEED. Meaning that the next time it's asked for, it will definitely be read from disk, even if it happened to be in the page cache before the request. I noticed this when trying to figure out why my job was doing so much more disk IO in MR2 than in MR1. When I turned the fadvise stuff off, I found that disk reads went to nearly 0 on machines that had enough memory to fit map outputs into the page cache. I then straced the NodeManager and noticed that there were over four times as many fadvise DONTNEED calls as map-reduce pairs. Further logging showed the same map outputs being fetched about this many times. This is a regression from MR1, which only did the fadvise DONTNEED after all the bytes were transferred. -- This message was sent by Atlassian JIRA (v6.1#6144)