[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Summary: ShuffleHandler fadvises file regions as DONTNEED even when fetch 
fails  (was: Fetches when reducer can't fit them result in unnecessary reads on 
shuffle server)

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  Unfortunately, this has 
 some consequences on the server side - it forces unnecessary disk and network 
 IO as the server begins to read the output data that will go nowhere.  Also, 
 when the channel is closed, it triggers an fadvise DONTNEED that causes the 
 data region to be evicted from the OS page cache.  Meaning that the next time 
 it's asked for, it will definitely be read from disk, even if it happened to 
 be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed that 
 there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 The fix would be to reserve space in the reducer before fetching the data.  
 Currently, fetching the size of the data and fetching the actual data 
 happen in the same HTTP request.  Fixing it would require doing these in 
 separate HTTP requests, or transferring the sizes through the AM.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Description: 
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  When this occurs, the 
ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
next time it's asked for, it will definitely be read from disk, even if it 
happened to be in the page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there 
were over four times as many fadvise DONTNEED calls as map-reduce pairs.  
Further logging showed the same map outputs being fetched about this many times.

This is a regression from MR1, which only did the fadvise DONTNEED after all 
the bytes were transferred.

  was:
When a reducer initiates a fetch request, it does not know whether it will be 
able to fit the fetched data in memory.  The first part of the response tells 
how much data will be coming.  If space is not currently available, the reduce 
will abandon its request and try again later.  Unfortunately, this has some 
consequences on the server side - it forces unnecessary disk and network IO as 
the server begins to read the output data that will go nowhere.  Also, when the 
channel is closed, it triggers an fadvise DONTNEED that causes the data region 
to be evicted from the OS page cache.  Meaning that the next time it's asked 
for, it will definitely be read from disk, even if it happened to be in the 
page cache before the request.

I noticed this when trying to figure out why my job was doing so much more disk 
IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found that disk 
reads went to nearly 0 on machines that had enough memory to fit map outputs 
into the page cache.  I then straced the NodeManager and noticed that there were 
over four times as many fadvise DONTNEED calls as map-reduce pairs.  Further 
logging showed the same map outputs being fetched about this many times.

The fix would be to reserve space in the reducer before fetching the data.  
Currently, fetching the size of the data and fetching the actual data happen 
in the same HTTP request.  Fixing it would require doing these in separate HTTP 
requests, or transferring the sizes through the AM.



 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Status: Patch Available  (was: Open)

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808844#comment-13808844
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Attached a patch that fixes the problem by only fadvising as DONTNEED if the 
Netty transfer completes successfully.  With the patch applied, the average 
reducer shuffle time for my job goes down from 80 seconds to 34, on par with 
MR1. 
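For illustration only, here is a minimal sketch of the approach described above (deferring the DONTNEED fadvise until Netty reports a successful transfer), written against the Netty 3 API that ShuffleHandler uses. It is not the attached patch; manageOsCacheDontNeed() is a hypothetical stand-in for the handler's actual cache-management call.

{code:java}
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.channel.FileRegion;

public class FadviseOnSuccessExample {

  static void writeMapOutput(Channel ch, final FileRegion region,
                             final String path, final long offset, final long length) {
    ChannelFuture writeFuture = ch.write(region);
    writeFuture.addListener(new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture future) {
        region.releaseExternalResources();
        // Only drop the pages from the OS page cache if the fetch actually
        // succeeded; an abandoned fetch leaves the data cached for the retry.
        if (future.isSuccess()) {
          manageOsCacheDontNeed(path, offset, length);
        }
      }
    });
  }

  // Hypothetical helper standing in for the fadvise(DONTNEED) call that
  // ShuffleHandler makes through Hadoop's NativeIO wrapper.
  static void manageOsCacheDontNeed(String path, long offset, long length) {
    // no-op in this sketch
  }
}
{code}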

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808849#comment-13808849
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611003/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4154//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808859#comment-13808859
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4155//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5602) cygwin path error

2013-10-30 Thread Amit Cahanovich (JIRA)
Amit Cahanovich created MAPREDUCE-5602:
--

 Summary: cygwin path error
 Key: MAPREDUCE-5602
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5602
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.6-alpha
 Environment: cygwin
Reporter: Amit Cahanovich


The path for a file comes out wrong because the code does not take Cygwin 
into consideration.
/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/TaskLog.java:
  
static final String USERLOGS_DIR_NAME = "userlogs";

the outcome of it is:
 C:\cygwin\home\AMITCA\hadoop-2.0.6-alpha\logs/userlogs is not a valid path



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808966#comment-13808966
 ] 

Hudson commented on MAPREDUCE-5598:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.
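 As a hedged sketch (not necessarily the committed fix), clearing that shared input directory before the job runs is one way to avoid picking up files left behind by an earlier test:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanInputDirExample {
  static Path prepareInputDir(Configuration conf) throws IOException {
    Path inputDir = new Path(System.getProperty("test.build.data", "/tmp"), "input");
    FileSystem fs = FileSystem.getLocal(conf);
    // Remove anything an earlier test may have left behind so this job only
    // sees the records it writes itself.
    fs.delete(inputDir, true);
    fs.mkdirs(inputDir);
    return inputDir;
  }
}
{code}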



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808963#comment-13808963
 ] 

Hudson commented on MAPREDUCE-5596:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.
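 A rough sketch of how such a cap can be wired into the Netty server factory follows; the property name mapreduce.shuffle.max.threads and the convention that 0 means "use the Netty default" are assumptions here, not a statement of what the committed patch does.

{code:java}
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class ShuffleThreadsExample {
  static NioServerSocketChannelFactory createFactory(Configuration conf) {
    int maxThreads = conf.getInt("mapreduce.shuffle.max.threads", 0);
    if (maxThreads == 0) {
      // Fall back to Netty's default of 2 * available processors.
      maxThreads = 2 * Runtime.getRuntime().availableProcessors();
    }
    return new NioServerSocketChannelFactory(
        Executors.newCachedThreadPool(),   // boss threads
        Executors.newCachedThreadPool(),   // worker threads
        maxThreads);                       // cap on worker thread count
  }
}
{code}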



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808996#comment-13808996
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808999#comment-13808999
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809036#comment-13809036
 ] 

Todd Lipcon commented on MAPREDUCE-5601:


Good find.

One question: could we improve this even further by having the client send a 
header like Max-response-size: bytes, and then have the server avoid doing 
any IO for the case where the client is going to abandon the request anyway? 
Seems like we might be incurring extra seeks in some cases due to the behavior 
you described above. It would be unrelated to this JIRA, just thought of it now.
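Purely as an illustration of that idea (the header name is hypothetical and no such header exists in the shuffle protocol today), the client side could attach the hint through the HttpURLConnection the fetcher already uses:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class MaxResponseSizeExample {
  static HttpURLConnection openShuffleConnection(URL url, long bytesAvailable)
      throws IOException {
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // Tell the server how much the reducer can currently accept so it could
    // skip disk IO entirely when the fetch would be abandoned anyway.
    connection.setRequestProperty("Max-Response-Size", Long.toString(bytesAvailable));
    return connection;
  }
}
{code}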

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5596) Allow configuring the number of threads used to serve shuffle connections

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809055#comment-13809055
 ] 

Hudson commented on MAPREDUCE-5596:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5596. Allow configuring the number of threads used to serve shuffle 
connections. Contributed by Sandy Ryza (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536711)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


 Allow configuring the number of threads used to serve shuffle connections
 -

 Key: MAPREDUCE-5596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5596
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5596-1.patch, MAPREDUCE-5596.patch


 MR1 had mapreduce.tasktracker.http.threads.  MR2 always uses the Netty 
 default 2 * Runtime.availableProcessors().  We should make this configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5598) TestUserDefinedCounters.testMapReduceJob is flakey

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809058#comment-13809058
 ] 

Hudson commented on MAPREDUCE-5598:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
MAPREDUCE-5598. TestUserDefinedCounters.testMapReduceJob is flakey. Contributed 
by Robert Kanter (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536724)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestUserDefinedCounters.java


 TestUserDefinedCounters.testMapReduceJob is flakey
 --

 Key: MAPREDUCE-5598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk, 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 3.0.0, 2.3.0, 2.2.1

 Attachments: MAPREDUCE-5598.patch, MAPREDUCE-5598.patch


 {{TestUserDefinedCounters.testMapReduceJob}} is flakey.  
 We sometimes see it fail:
 {noformat}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertTrue(Assert.java:27)
   at 
 org.apache.hadoop.mapred.TestUserDefinedCounters.testMapReduceJob(TestUserDefinedCounters.java:113)
 {noformat}
 Upon investigation, the problem is that the input for the MR job in this test 
 is at {{System.getProperty("test.build.data", "/tmp") + "/input"}}.  If an 
 earlier test wrote some files there, this test will use them as part of its 
 input.  This can cause all sorts of problems with this test because it's not 
 expecting the additional input data.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809418#comment-13809418
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Was worried about that as well.  But the fetcher doesn't know whether it's 
going to abandon the request before it sends it.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809423#comment-13809423
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

Or you're saying we would pass the amount of unreserved memory remaining?

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5603:
-

 Summary: Ability to disable FileInputFormat listLocatedStatus 
optimization to save client memory
 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor


It would be nice if users had the option to disable the listLocatedStatus 
optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809537#comment-13809537
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Recently we ran across a jobclient that failed with an OOM error once we 
updated the cluster to 0.23.10.  The OOM was triggered by the FileInputFormat 
listLocatedStatus optimization from MAPREDUCE-1981, as the client now caches 
the BlockLocations of all files along with the FileStatus objects it was 
caching before.  Normally the user can bump the heap size of the client to work 
around this issue.  However, if a job has an input with a particularly large 
number of BlockLocations, as this job did, it would be nice if the user had the 
option to disable the optimization to reduce the memory required for 
input split calculations.

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor

 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809542#comment-13809542
 ] 

Jason Lowe commented on MAPREDUCE-5603:
---

Sample OOM backtrace for reference:

{noformat}
Exception in thread "main" java.io.IOException: Failed on local exception:
java.io.IOException: Error reading responses; Host Details : local host is: 
x/x.x.x.x; destination host is: x:x;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738)
at org.apache.hadoop.ipc.Client.call(Client.java:1098)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
at com.sun.proxy.$Proxy6.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1286)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$1.<init>(DistributedFileSystem.java:418)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:409)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1654)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:225)
at 
org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:265)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:500)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:492)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:568)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1264)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:568)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:844)
at x
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at x
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:764)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:216)
at org.apache.hadoop.hdfs.DeprecatedUTF8.readString(DeprecatedUTF8.java:59)
at 
org.apache.hadoop.hdfs.protocol.DatanodeID.readFields(DatanodeID.java:212)
at 
org.apache.hadoop.hdfs.protocol.DatanodeInfo.readFields(DatanodeInfo.java:389)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlock.readFields(LocatedBlock.java:146)
at 
org.apache.hadoop.hdfs.protocol.LocatedBlocks.readFields(LocatedBlocks.java:223)
at 
org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus.readFields(HdfsLocatedFileStatus.java:87)
at 
org.apache.hadoop.hdfs.protocol.DirectoryListing.readFields(DirectoryListing.java:120)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:833)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:757)
{noformat}

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 

[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Status: Patch Available  (was: Open)

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5603:
--

Attachment: MAPREDUCE-5603.patch

Patch that adds a mapreduce.input.fileinputformat.uselocatedstatus config to 
control whether the listLocatedStatus optimization is enabled.  The property 
defaults to true.
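A simplified sketch of the toggle (not the patch itself); the real FileInputFormat.listStatus() also handles globs, filters, and multiple input paths, but the config-controlled branch has roughly this shape:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListStatusToggleExample {
  static List<FileStatus> listInputs(Configuration conf, FileSystem fs, Path dir)
      throws IOException {
    List<FileStatus> result = new ArrayList<FileStatus>();
    if (conf.getBoolean("mapreduce.input.fileinputformat.uselocatedstatus", true)) {
      // Optimized path: one listing RPC per directory, but every
      // LocatedFileStatus carries its BlockLocations, which can blow up
      // client memory for inputs with very many blocks.
      RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
      while (it.hasNext()) {
        result.add(it.next());
      }
    } else {
      // Fallback: plain FileStatus objects only; block locations are fetched
      // later, per file, during split calculation.
      for (FileStatus stat : fs.listStatus(dir)) {
        result.add(stat);
      }
    }
    return result;
  }
}
{code}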

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809558#comment-13809558
 ] 

Hadoop QA commented on MAPREDUCE-5603:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611157/MAPREDUCE-5603.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4156//console

This message is automatically generated.

 Ability to disable FileInputFormat listLocatedStatus optimization to save 
 client memory
 ---

 Key: MAPREDUCE-5603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
 Attachments: MAPREDUCE-5603.patch


 It would be nice if users had the option to disable the listLocatedStatus 
 optimization in FileInputFormat to save client memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Attachment: MAPREDUCE-3860--n4.patch

Jonathan,
The logs don't provide much info on why tests fail. Per your description it 
seems that the tests hang indefinitely, so probably printing thread dumps on 
test timeouts would help. I'm attaching a patch which modifies Rumen's pom.xml 
by adding a JUnit listener that prints thread dumps. I could not reproduce any 
failures in Rumen tests, tried to use 4 different machines (osx, centos, fedora 
on h/w nodes, and rhel on a VM). Please reproduce the failures in your 
environment one more time and attach Console output of Maven and all Surefire 
logs (not just *-output.txt). Thanks for working on this. 
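For reference, a hedged sketch of what such a listener might look like; it dumps all thread stacks when a test fails (a timeout surfaces as a failure) and stands in for whatever class the attached patch actually registers in the Surefire configuration:

{code:java}
import java.util.Map;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunListener;

public class ThreadDumpListener extends RunListener {
  @Override
  public void testFailure(Failure failure) {
    System.err.println("Test failed: " + failure.getTestHeader());
    // Print a stack trace for every live thread to show where a hang occurred.
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      System.err.println("Thread: " + entry.getKey().getName());
      for (StackTraceElement frame : entry.getValue()) {
        System.err.println("    at " + frame);
      }
    }
  }
}
{code}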

 [Rumen] Bring back the removed Rumen unit tests
 ---

 Key: MAPREDUCE-3860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Reporter: Ravi Gummadi
Assignee: Andrey Klochkov
 Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
 MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
 MAPREDUCE-3860.patch, 
 org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
 rumen-test-data.tar.gz


 MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
 and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
 to be brought back:
 TestZombieJob.java
 TestRumenJobTraces.java
 TestRumenFolder.java
 TestRumenAnonymization.java
 TestParsedLine.java
 TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809577#comment-13809577
 ] 

Andrey Klochkov commented on MAPREDUCE-3860:


Also, it could be that the timeouts I set in the tests are still too low for 
you, if your machine is that slow. Can you increase them by up to an order of 
magnitude to check that? 

 [Rumen] Bring back the removed Rumen unit tests
 ---

 Key: MAPREDUCE-3860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Reporter: Ravi Gummadi
Assignee: Andrey Klochkov
 Attachments: linux-surefire-reports.tar, mac-surfire-reports.tar, 
 MAPREDUCE-3860--n2.patch, MAPREDUCE-3860--n3.patch, MAPREDUCE-3860--n4.patch, 
 MAPREDUCE-3860.patch, 
 org.apache.hadoop.tools.rumen.TestRumenAnonymization-output.txt, 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces-output.txt, 
 rumen-test-data.tar.gz


 MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
 and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
 to be brought back:
 TestZombieJob.java
 TestRumenJobTraces.java
 TestRumenFolder.java
 TestRumenAnonymization.java
 TestParsedLine.java
 TestConcurrentRead.java



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-4980:
---

Attachment: MAPREDUCE-4980--n8.patch

Attaching rebased patch.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809611#comment-13809611
 ] 

Hadoop QA commented on MAPREDUCE-4980:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611165/MAPREDUCE-4980--n8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 125 
new or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4157//console

This message is automatically generated.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-10-30 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809636#comment-13809636
 ] 

Andrey Klochkov commented on MAPREDUCE-4980:


The build failed due to OOM while processing native code. Not related to the 
patch.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Andrey Klochkov
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, 
 MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, the 
 tests can run faster.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809641#comment-13809641
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4158//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809680#comment-13809680
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

The patch compiles fine for me locally.  The failure seems to be some sort of 
javah issue that I've seen in other builds as well.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809703#comment-13809703
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611008/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4159//console

This message is automatically generated.

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reduce will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED.  Meaning that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.



--
This message was sent by Atlassian JIRA
(v6.1#6144)