[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Summary: ShuffleHandler fadvises file regions as DONTNEED even when fetch 
fails  (was: Fetches when reducer can't fit them result in unnecessary reads on 
shuffle server)

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  Unfortunately, this has 
 some consequences on the server side - it forces unnecessary disk and network 
 IO as the server begins to read the output data that will go nowhere.  Also, 
 when the channel is closed, it triggers an fadvise DONTNEED that causes the 
 data region to be evicted from the OS page cache.  Meaning that the next time 
 it's asked for, it will definitely be read from disk, even if it happened to 
 be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed that 
 there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 The fix would be to reserve space in the reducer before fetching the data.  
 Currently, fetching the size of the data and fetching the actual data 
 happen in the same HTTP request.  Fixing it would require doing these in 
 separate HTTP requests, or transferring the sizes through the AM.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Description: 
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  When this occurs, the 
ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
next time it's asked for, it will definitely be read from disk, even if it 
happened to be in the page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there 
were over four times as many fadvise DONTNEED calls as map-reduce pairs.  
Further logging showed the same map outputs being fetched about this many times.

This is a regression from MR1, which only did the fadvise DONTNEED after all 
the bytes were transferred.

  was:
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  Unfortunately, this has some 
consequences on the server side - it forces unnecessary disk and network IO as 
the server begins to read the output data that will go nowhere.  Also, when the 
channel is closed, it triggers an fadvise DONTNEED that causes the data region 
to be evicted from the OS page cache.  Meaning that the next time it's asked 
for, it will definitely be read from disk, even if it happened to be in the 
page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there were 
over four times as many fadvise DONTNEED calls as map-reduce pairs.  Further 
logging showed the same map outputs being fetched about this many times.

The fix would be to reserve space in the reducer before fetching the data.  
Currently, fetching the size of the data and fetching the actual data happen 
in the same HTTP request.  Fixing it would require doing these in separate HTTP 
requests, or transferring the sizes through the AM.



 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Status: Patch Available  (was: Open)

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808844#comment-13808844
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Attached a patch that fixes the problem by only fadvising as DONTNEED if the 
Netty transfer completes successfully.  With the patch applied, the average 
reducer shuffle time for my job goes down from 80 seconds to 34, on par with 
MR1. 
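For illustration only, here is a minimal sketch of the approach described above (deferring the DONTNEED fadvise until Netty reports a successful transfer), written against the Netty 3 API that ShuffleHandler uses. It is not the attached patch; manageOsCacheDontNeed() is a hypothetical stand-in for the handler's actual cache-management call.

{code:java}
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.channel.FileRegion;

public class FadviseOnSuccessExample {

  static void writeMapOutput(Channel ch, final FileRegion region,
                             final String path, final long offset, final long length) {
    ChannelFuture writeFuture = ch.write(region);
    writeFuture.addListener(new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture future) {
        region.releaseExternalResources();
        // Only drop the pages from the OS page cache if the fetch actually
        // succeeded; an abandoned fetch leaves the data cached for the retry.
        if (future.isSuccess()) {
          manageOsCacheDontNeed(path, offset, length);
        }
      }
    });
  }

  // Hypothetical helper standing in for the fadvise(DONTNEED) call that
  // ShuffleHandler makes through Hadoop's NativeIO wrapper.
  static void manageOsCacheDontNeed(String path, long offset, long length) {
    // no-op in this sketch
  }
}
{code}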

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808849#comment-13808849
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611003/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4154//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808859#comment-13808859
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4155//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5602) cygwin path error

2013-10-30 Thread Amit Cahanovich (JIRA)
Amit Cahanovich created MAPREDUCE-5602:
--

 Summary: cygwin path error
 Key: MAPREDUCE-5602
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5602
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.6-alpha
 Environment: cygwin
Reporter: Amit Cahanovich


The path for a file comes out wrong because the code does not take Cygwin 
into consideration.
/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskLog.java:
  
static final String USERLOGS_DIR_NAME = "userlogs";

the outcome of it is:
 C:\cygwin\home\AMITCA\hadoop-2.0.6-alpha\logs/userlogs is not a valid path



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808966#comment-13808966
 ] 

Hudson commented on MAPREDUCE-5598:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.
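 As a hedged sketch (not necessarily the committed fix), clearing that shared input directory before the job runs is one way to avoid picking up files left behind by an earlier test:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanInputDirExample {
  static Path prepareInputDir(Configuration conf) throws IOException {
    Path inputDir = new Path(System.getProperty("test.build.data", "/tmp"), "input");
    FileSystem fs = FileSystem.getLocal(conf);
    // Remove anything an earlier test may have left behind so this job only
    // sees the records it writes itself.
    fs.delete(inputDir, true);
    fs.mkdirs(inputDir);
    return inputDir;
  }
}
{code}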



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808963#comment-13808963
 ] 

Hudson commented on MAPREDUCE-5596:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.
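 A rough sketch of how such a cap can be wired into the Netty server factory follows; the property name mapreduce.shuffle.max.threads and the convention that 0 means "use the Netty default" are assumptions here, not a statement of what the committed patch does.

{code:java}
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class ShuffleThreadsExample {
  static NioServerSocketChannelFactory createFactory(Configuration conf) {
    int maxThreads = conf.getInt("mapreduce.shuffle.max.threads", 0);
    if (maxThreads == 0) {
      // Fall back to Netty's default of 2 * available processors.
      maxThreads = 2 * Runtime.getRuntime().availableProcessors();
    }
    return new NioServerSocketChannelFactory(
        Executors.newCachedThreadPool(),   // boss threads
        Executors.newCachedThreadPool(),   // worker threads
        maxThreads);                       // cap on worker thread count
  }
}
{code}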



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808996#comment-13808996
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808999#comment-13808999
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809036#comment-13809036
 ] 

Todd Lipcon commented on MAPREDUCE-5601:


Good find.

One question: could we improve this even further by having the client send a 
header like Max-response-size: bytes, and then have the server avoid doing 
any IO for the case where the client is going to abandon the request anyway? 
Seems like we might be incurring extra seeks in some cases due to the behavior 
you described above. It would be unrelated to this JIRA, just thought of it now.
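Purely as an illustration of that idea (the header name is hypothetical and no such header exists in the shuffle protocol today), the client side could attach the hint through the HttpURLConnection the fetcher already uses:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class MaxResponseSizeExample {
  static HttpURLConnection openShuffleConnection(URL url, long bytesAvailable)
      throws IOException {
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // Tell the server how much the reducer can currently accept so it could
    // skip disk IO entirely when the fetch would be abandoned anyway.
    connection.setRequestProperty("Max-Response-Size", Long.toString(bytesAvailable));
    return connection;
  }
}
{code}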

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809055#comment-13809055
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809058#comment-13809058
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809418#comment-13809418
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Was worried about that as well.  But the fetcher doesn't know whether it's 
going to abandon the request before it sends it.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809423#comment-13809423
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Or you're saying we would pass the amount of unreserved memory remaining?

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5603:
-

 Summary: Ability to disable FileInputFormat listLocatedStatus 
optimization to save client memory
 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor


It would be nice if users had the option to disable the listLocatedStatus 
optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809537#comment-13809537
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Recently we ran across a jobclient that failed with an OOM error once we 
updated the cluster to 0.23.10.  The OOM was triggered by the FileInputFormat 
listLocatedStatus optimization from MAPREDUCE-1981, as the client now caches 
the BlockLocations of all files along with the FileStatus objects it was 
caching before.  Normally the user can bump the heap size of the client to work 
around this issue.  However, if a job has an input with a particularly large 
number of BlockLocations, as this job did, it would be nice if the user had the 
option to disable the optimization to reduce the memory required for 
input split calculations.

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor

 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809542#comment-13809542
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Sample OOM backtrace for reference:

{noformat}
Exception in thread "main" java.io.IOException: Failed on local exception:
java.io.IOException: Error reading responses; Host Details : local host is: 
x/x.x.x.x; destination host is: x:x;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738)
at org.apache.hadoop.ipc.Client.call(Client.java:1098)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1286)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$1.<init>(DistributedFileSystem.java:418)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:409)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1654)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:225)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:265)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:500)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:492)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:568)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:568)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:844)
at x
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at x
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:764)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:216)
at org.apache.hadoop.hdfs.DeprecatedUTF8.readString(DeprecatedUTF8.java:59)
at 
org.apache.hadoop.hdfs.protocol.DatanodeID.readFields(DatanodeID.java:212)
at 
org.apache.hadoop.hdfs.protocol.DatanodeInfo.readFields(DatanodeInfo.java:389)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlock.readFields(LocatedBlock.java:146)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlocks.readFields(LocatedBlocks.java:223)
at 
org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus.readFields(HdfsLocatedFileStatus.java:87)
at 
org.apache.hadoop.hdfs.protocol.DirectoryListing.readFields(DirectoryListing.java:120)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:833)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:757)
{noformat}

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 

[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Status: Patch Available  (was: Open)

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Attachment: MAPREDUCE-5603.patch

Patch that adds a mapreduce.input.fileinputformat.uselocatedstatus config to 
control whether the listLocatedStatus optimization is enabled.  The property 
defaults to true.
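A simplified sketch of the toggle (not the patch itself); the real FileInputFormat.listStatus() also handles globs, filters, and multiple input paths, but the config-controlled branch has roughly this shape:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListStatusToggleExample {
  static List<FileStatus> listInputs(Configuration conf, FileSystem fs, Path dir)
      throws IOException {
    List<FileStatus> result = new ArrayList<FileStatus>();
    if (conf.getBoolean("mapreduce.input.fileinputformat.uselocatedstatus", true)) {
      // Optimized path: one listing RPC per directory, but every
      // LocatedFileStatus carries its BlockLocations, which can blow up
      // client memory for inputs with very many blocks.
      RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
      while (it.hasNext()) {
        result.add(it.next());
      }
    } else {
      // Fallback: plain FileStatus objects only; block locations are fetched
      // later, per file, during split calculation.
      for (FileStatus stat : fs.listStatus(dir)) {
        result.add(stat);
      }
    }
    return result;
  }
}
{code}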

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809558#comment-13809558
 ] 

Hadoop QA commented on MAPREDUCE-5603:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611157/MAPREDUCE-5603.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4156//console

This message is automatically generated.

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Attachment: MAPREDUCE-3860--n4.patch

Jonathan,
The logs don't provide much info on why tests fail. Per your description it 
seems that the tests hang indefinitely, so probably printing thread dumps on 
test timeouts would help. I'm attaching a patch which modifies Rumen's pom.xml 
by adding a JUnit listener that prints thread dumps. I could not reproduce any 
failures in Rumen tests, tried to use 4 different machines (osx, centos, fedora 
on h/w nodes, and rhel on a VM). Please reproduce the failures in your 
environment one more time and attach Console output of Maven and all Surefire 
logs (not just *-output.txt). Thanks for working on this. 
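For reference, a hedged sketch of what such a listener might look like; it dumps all thread stacks when a test fails (a timeout surfaces as a failure) and stands in for whatever class the attached patch actually registers in the Surefire configuration:

{code:java}
import java.util.Map;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

public class ThreadDumpListener extends RunListener {
  @Override
  public void testFailure(Failure failure) {
    System.err.println("Test failed: " + failure.getTestHeader());
    // Print a stack trace for every live thread to show where a hang occurred.
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      System.err.println("Thread: " + entry.getKey().getName());
      for (StackTraceElement frame : entry.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }
}
{code}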

 [Rumen] Bring back the removed Rumen unit tests
 ---

 Key: MAPREDUCE-3860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Reporter: Ravi Gummadi
Assignee: Andrey Klochkov
 Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
 MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
 MAPREDUCE-3860.patch, 
 org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
 rumen-test-data.tar.gz


 MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
 and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
 to be brought back:
 TestZombieJob.java
 TestRumenJobTraces.java
 TestRumenFolder.java
 TestRumenAnonymization.java
 TestParsedLine.java
 TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809577#comment-13809577
 ] 

Andrey Klochkov commented on MAPREDUCE-3860:


Also, it could be that the timeouts I set in the tests are still too low for 
you, if your machine is that slow. Can you increase them by up to an order of 
magnitude to check that? 

 [Rumen] Bring back the removed Rumen unit tests
 ---

 Key: MAPREDUCE-3860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Reporter: Ravi Gummadi
Assignee: Andrey Klochkov
 Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
 MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
 MAPREDUCE-3860.patch, 
 org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
 rumen-test-data.tar.gz


 MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
 and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
 to be brought back:
 TestZombieJob.java
 TestRumenJobTraces.java
 TestRumenFolder.java
 TestRumenAnonymization.java
 TestParsedLine.java
 TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-4980:
---

Attachment: MAPREDUCE-4980--n8.patch

Attaching rebased patch.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809611#comment-13809611
 ] 

Hadoop QA commented on MAPREDUCE-4980:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611165/MAPREDUCE-4980--n8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 125 
new or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4157//console

This message is automatically generated.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809636#comment-13809636
 ] 

Andrey Klochkov commented on MAPREDUCE-4980:


The build failed due to OOM while processing native code. Not related to the 
patch.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809641#comment-13809641
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4158//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809680#comment-13809680
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

The patch compiles fine for me locally.  The failure seems to be some sort of 
javah issue that I've seen in other builds as well.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809703#comment-13809703
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4159//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)