[jira] [Resolved] (MAPREDUCE-5994) native-task: TestBytesUtil fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5994. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch, thanks Sean. native-task: TestBytesUtil fails Key: MAPREDUCE-5994 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5994 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5994.txt This class appears to have some bugs. Two tests fail consistently on my system. BytesUtil itself appears to duplicate a lot of code from guava - we should probably just use the Guava functions. -- This message was sent by Atlassian JIRA (v6.2#6252)
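For the Guava suggestion above: Guava's com.google.common.primitives helpers (Longs.toByteArray/fromByteArray and the Ints equivalents) already provide the big-endian conversions that BytesUtil duplicates. This JDK-only sketch shows the same contract those helpers satisfy, as a reference for what the replacement would compute:

```java
import java.nio.ByteBuffer;

// Illustrative: the big-endian long<->byte[] contract implemented by
// Guava's Longs.toByteArray / Longs.fromByteArray, expressed with the JDK.
public class BigEndianBytes {
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
    public static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```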
[jira] [Resolved] (MAPREDUCE-5996) native-task: Rename system tests into standard directory layout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5996. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks for reviewing, Binglin. native-task: Rename system tests into standard directory layout --- Key: MAPREDUCE-5996 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5996 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5996.txt Currently there are a number of tests in src/java/system. This confuses IDEs which think that the package should then be system.org.apache.hadoop instead of just org.apache.hadoop. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated MAPREDUCE-6002: -- Status: Patch Available (was: Open) MR task should prevent report error to AM when process is shutting down --- Key: MAPREDUCE-6002 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: MR-6002.patch With MAPREDUCE-5900, a preempted MR task should not be treated as failed. But it is still possible for an MR task to fail and report to the AM when preemption takes effect before the AM has received the completed container from the RM, which causes the task attempt to be marked failed instead of preempted. For example, FileSystem registers a shutdown hook that closes all FileSystem instances; if a FileSystem is still in use at that moment (e.g. reading split details from HDFS), the MR task fails and reports a fatal error to the MR AM. 
An exception will be raised: {code}
2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_25_0 - exited : java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
	at java.io.DataInputStream.readByte(DataInputStream.java:265)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
	at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
	at org.apache.hadoop.io.Text.readString(Text.java:464)
	at org.apache.hadoop.io.Text.readString(Text.java:457)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
{code} We should prevent this: other exceptions can also occur while the process is shutting down, and none of them should be reported to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
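The guard described above can be sketched in plain Java. Hadoop's ShutdownHookManager exposes isShutdownInProgress() for this purpose; as a self-contained illustration (not the committed patch), one can also probe the JVM directly, since registering a shutdown hook throws IllegalStateException once shutdown has begun:

```java
// Illustrative sketch only: detect whether the JVM is already shutting down
// before reporting a fatal error to the AM. Once shutdown has started, the
// JVM rejects new shutdown hooks with IllegalStateException, which makes a
// throwaway hook a portable probe.
public class ShutdownProbe {
    public static boolean isShuttingDown() {
        Thread probe = new Thread(() -> { });
        try {
            Runtime.getRuntime().addShutdownHook(probe);
            Runtime.getRuntime().removeShutdownHook(probe); // undo the probe
            return false;
        } catch (IllegalStateException e) {
            return true; // shutdown in progress: suppress the fatal report
        }
    }
}
```

A task would consult such a check (or ShutdownHookManager) before calling the fatal-error umbilical, so exceptions caused by the shutdown itself are never reported as task failures.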
[jira] [Updated] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated MAPREDUCE-6002: -- Attachment: MR-6002.patch Attached a patch for review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous updated MAPREDUCE-1380: - Target Version/s: 2.4.1 Affects Version/s: 2.4.1 Hadoop Flags: Incompatible change,Reviewed Status: Patch Available (was: Reopened) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
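The deadline prioritization described above amounts to earliest-deadline-first ordering. A minimal sketch (hypothetical names, not the scheduler's actual code) of how simultaneous jobs would be ranked:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of the deadline-based prioritization described above:
// jobs are ordered earliest-deadline-first, so the most urgent job is the
// first candidate for additional resources.
public class DeadlineQueue {
    public static final class Job {
        public final String id;
        public final long deadlineMillis; // user-supplied at submission time
        public Job(String id, long deadlineMillis) {
            this.id = id;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final PriorityQueue<Job> queue =
        new PriorityQueue<>(Comparator.comparingLong((Job j) -> j.deadlineMillis));

    public void submit(Job j) { queue.add(j); }
    public Job mostUrgent() { return queue.peek(); }
}
```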
[jira] [Updated] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-5997: --- Attachment: mapreduce-5997.txt Attached patch switches over to using the implementation in common. This implementation doesn't provide a singleton, so I changed it to use a static instance inside InputBuffer. In order to do that, I also changed the returnBuffer call to happen in a new InputBuffer.close() method, and made InputBuffer implement Closeable. That'll make sure that findbugs properly warns if anyone uses InputBuffers without calling close() native-task: Use DirectBufferPool from Hadoop Common Key: MAPREDUCE-5997 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5997 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5997.txt The native task code has its own direct buffer pool, but Hadoop already has an implementation. HADOOP-10882 will move that implementation into Common, and this JIRA is to remove the duplicate code and use that one instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
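The Closeable pattern described in the patch can be sketched as follows; SimplePool here is a simplified stand-in for Hadoop Common's DirectBufferPool, and all names are illustrative rather than the branch code:

```java
import java.io.Closeable;
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hedged sketch of the pattern described above: the ByteBuffer is borrowed
// from a single static pool and handed back in close(), so a buffer that is
// never closed leaks from the pool -- exactly the condition findbugs can
// flag once the class implements Closeable.
public class PooledInputBuffer implements Closeable {

    // Simplified stand-in for Hadoop Common's DirectBufferPool.
    static final class SimplePool {
        private final Queue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
        ByteBuffer borrow(int capacity) {
            ByteBuffer b = free.poll();
            return (b != null && b.capacity() >= capacity)
                ? b : ByteBuffer.allocateDirect(capacity);
        }
        void giveBack(ByteBuffer b) { b.clear(); free.add(b); }
        int size() { return free.size(); }
    }

    private static final SimplePool POOL = new SimplePool(); // static instance

    private ByteBuffer buf;

    public PooledInputBuffer(int capacity) { buf = POOL.borrow(capacity); }

    public ByteBuffer buffer() { return buf; }

    public static int pooledCount() { return POOL.size(); }

    @Override
    public void close() { // idempotent return-to-pool
        if (buf != null) {
            POOL.giveBack(buf);
            buf = null;
        }
    }
}
```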
[jira] [Commented] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072900#comment-14072900 ] Sean Zhong commented on MAPREDUCE-5997: --- looks good to me, +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned MAPREDUCE-5991: Assignee: Binglin Chang native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
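One way to realize the second option above (disabling the tests from the pom unless -Pnative is specified) is a surefire skip flag that the native profile flips; a hypothetical sketch, not the committed change:

```xml
<!-- Hypothetical sketch: surefire skips native-task unit tests unless the
     native profile overrides the flag (mvn test -Pnative). -->
<properties>
  <skip.nativetask.tests>true</skip.nativetask.tests>
</properties>
<profiles>
  <profile>
    <id>native</id>
    <properties>
      <skip.nativetask.tests>false</skip.nativetask.tests>
    </properties>
  </profile>
</profiles>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <skipTests>${skip.nativetask.tests}</skipTests>
      </configuration>
    </plugin>
  </plugins>
</build>
```

The JUnit alternative mentioned first would be an Assume guard (e.g. Assume.assumeTrue(NativeCodeLoader.isNativeCodeLoaded())) in each test that depends on native code.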
[jira] [Updated] (MAPREDUCE-5995) native-task: revert changes which expose Text internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-5995: --- Attachment: mapreduce-5995.txt Actually, just rebased the patch onto the tip of branch instead. Hopefully this one will apply properly for you, Manu. native-task: revert changes which expose Text internals --- Key: MAPREDUCE-5995 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5995 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5995.txt, mapreduce-5995.txt The current branch has some changes to the Text writable which allow it to manually set the backing array, capacity, etc. Rather than exposing these internals, we should use the newly-committed facility from HADOOP-10855 to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-883: --- Assignee: Akira AJISAKA harchive: Document how to unarchive --- Key: MAPREDUCE-883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-883 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, harchive Reporter: Koji Noguchi Assignee: Akira AJISAKA Priority: Minor Attachments: mapreduce-883-0.patch I was thinking of implementing harchive's 'unarchive' feature, but realized it has been implemented already ever since harchive was introduced. It just needs to be documented. -- This message was sent by Atlassian JIRA (v6.2#6252)
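For reference, the already-implemented "unarchive" path that needs documenting is simply a copy out of the har:// filesystem (all paths below are examples):

```shell
# A .har archive is exposed as a read-only filesystem, so "unarchiving"
# is just copying files back out of it.
hadoop fs -ls har:///user/alice/data.har            # browse archive contents
hadoop fs -cp har:///user/alice/data.har/dir1 /user/alice/restored
# For large archives, DistCp performs the copy in parallel:
hadoop distcp har:///user/alice/data.har /user/alice/restored
```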
[jira] [Updated] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated MAPREDUCE-5991: - Attachment: MAPREDUCE-5991.v1.patch Changes: 1. Add the hadoop-mapreduce-client-common test jar, so the additionalClasspathElements entry can be removed. 2. Remove the system tests from the default profile and add them to the native profile. Plain mvn test now succeeds; note that mvn test -Pnative is still failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-883: Labels: newbie (was: ) Target Version/s: 2.6.0 Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-883: Attachment: MAPREDUCE-883.1.patch Rebased for the latest trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072945#comment-14072945 ] Hadoop QA commented on MAPREDUCE-1380: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630631/MAPREDUCE-1380-branch-1.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4765//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072957#comment-14072957 ] Hadoop QA commented on MAPREDUCE-6002: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657552/MR-6002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072973#comment-14072973 ] Hadoop QA commented on MAPREDUCE-883: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657567/MAPREDUCE-883.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4766//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4766//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-6000: --- Attachment: mapreduce-6000.txt Uploading new rev, per above. native-task: Simplify ByteBufferDataReader/Writer - Key: MAPREDUCE-6000 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6000 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-6000.txt, mapreduce-6000.txt The ByteBufferDataReader and ByteBufferDataWriter class are more complex than necessary: - several methods related to reading/writing strings and char arrays are implemented but never used by the native task code. Given that the use case for these classes is limited to serializing binary data to/from the native code, it seems unlikely people will want to use these methods in any performance-critical space. So, let's do simpler implementations that are less likely to be buggy, even if they're slightly less performant. - methods like readLine() are even less likely to be used. Since it's a complex implementation, let's just throw UnsupportedOperationException - in the test case, we can use Mockito to shorten the amount of new code -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072976#comment-14072976 ] Todd Lipcon commented on MAPREDUCE-6000: I chatted offline with Sean a bit about this one, and we realized there are a couple of issues: - He made the point that, even if we don't internally use all of the methods, users might expect a DataOutput/DataInput implementation to be fully functional. - I hadn't realized that DataInput/DataOutput use modified UTF-8, not actual UTF-8. I added a new test case with a Unicode cat face and found that my new implementation was incompatible with Java's, since I was using real UTF-8 rather than the modified form. But we still need to do something like this patch, since the current code appears to copy-paste material from the Oracle JDK and thus isn't Apache-license-compatible. So, I'll upload a new rev of the patch with the following changes since the previous one: - Add a test case with a Unicode cat face (it's outside the Basic Multilingual Plane, so it tends to expose bugs like the above). - Instead of trying to implement the more complex methods, the reader/writer classes now hold an internal java.io.Data{Input,Output}Stream instance and delegate to it for the more substantial methods, so we don't have to duplicate any encoding/decoding code. This is safe since those classes don't do any of their own buffering. -- This message was sent by Atlassian JIRA (v6.2#6252)
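The delegation approach described in the comment above can be sketched as follows (illustrative names, not the branch code): expose the ByteBuffer through a tiny InputStream and let java.io.DataInputStream do the decoding, so the modified-UTF-8 rules (supplementary code points written as two 3-byte surrogate encodings, NUL as 0xC0 0x80) never have to be reimplemented:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hedged sketch: a minimal ByteBuffer-backed InputStream, with all string
// decoding delegated to DataInputStream instead of being hand-rolled.
public class ModifiedUtf8Demo {

    static final class ByteBufferInputStream extends InputStream {
        private final ByteBuffer buf;
        ByteBufferInputStream(ByteBuffer buf) { this.buf = buf; }
        @Override public int read() {
            return buf.hasRemaining() ? (buf.get() & 0xff) : -1;
        }
    }

    /** writeUTF with the JDK, then readUTF back through the buffer-backed stream. */
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(s); // modified UTF-8 encoding
            ByteBuffer buf = ByteBuffer.wrap(bytes.toByteArray());
            return new DataInputStream(new ByteBufferInputStream(buf)).readUTF();
        } catch (IOException e) {
            throw new AssertionError(e); // in-memory streams shouldn't fail
        }
    }
}
```

Because both sides go through the JDK's Data{Input,Output}Stream, a string containing a character outside the Basic Multilingual Plane (such as a cat-face emoji) round-trips correctly, which is exactly what the hand-rolled "real" UTF-8 encoder got wrong.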
[jira] [Resolved] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5997. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks for reviewing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072981#comment-14072981 ] Sean Zhong commented on MAPREDUCE-6000: --- looks great, +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5992) native-task test logs should not write to console
[ https://issues.apache.org/jira/browse/MAPREDUCE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned MAPREDUCE-5992: -- Assignee: (was: Todd Lipcon) native-task test logs should not write to console - Key: MAPREDUCE-5992 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5992 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Most of our unit tests are configured with a log4j.properties test resource so they don't spout a bunch of output to the console. We need to do the same for native-task. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-6000. Resolution: Fixed Hadoop Flags: Reviewed Thanks for the prompt review. Committed to branch. native-task: Simplify ByteBufferDataReader/Writer - Key: MAPREDUCE-6000 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6000 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-6000.txt, mapreduce-6000.txt The ByteBufferDataReader and ByteBufferDataWriter class are more complex than necessary: - several methods related to reading/writing strings and char arrays are implemented but never used by the native task code. Given that the use case for these classes is limited to serializing binary data to/from the native code, it seems unlikely people will want to use these methods in any performance-critical space. So, let's do simpler implementations that are less likely to be buggy, even if they're slightly less performant. - methods like readLine() are even less likely to be used. Since it's a complex implementation, let's just throw UnsupportedOperationException - in the test case, we can use Mockito to shorten the amount of new code -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072988#comment-14072988 ] Todd Lipcon commented on MAPREDUCE-5987: Can you give some more context on the bug this is referring to? I found the description in the RHEL release notes: {quote} Prior to this update, the internal FILE offset was set incorrectly in wide character streams. As a consequence, the offset returned by ftell was incorrect. In some cases, this could result in over-writing data. This update modifies the ftell code to correctly set the internal FILE offset field for wide characters. Now, ftell and fseek handle the offset as expected. {quote} but best I can tell, the nativetask code never calls ftell() or fseek(). Running 'nm' on the libnativetask.so confirms that they aren't linked in. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072991#comment-14072991 ] Todd Lipcon commented on MAPREDUCE-5991: +1, looks good to me. Feel free to commit! native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
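The "JUnit Assume" option mentioned above boils down to the control flow below. This self-contained sketch simulates it without JUnit (the real patch would call Assume.assumeTrue(...) with Hadoop's NativeCodeLoader check); the class and library names here are illustrative.

```java
// Self-contained sketch of the "skip when native code is missing" idea.
// In a real JUnit test, Assume.assumeTrue(isNativeCodeLoaded()) would mark
// the test as skipped instead of failed.
public class NativeGuardSketch {
  // Stand-in for org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded().
  static boolean isNativeCodeLoaded() {
    try {
      // Fails with UnsatisfiedLinkError when the build ran without -Pnative.
      System.loadLibrary("nativetask");
      return true;
    } catch (UnsatisfiedLinkError e) {
      return false;
    }
  }

  static String runTest() {
    if (!isNativeCodeLoaded()) {
      return "SKIPPED"; // Assume.assumeTrue would abort the test here
    }
    return "RAN"; // test body that calls into libnativetask
  }
}
```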
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073003#comment-14073003 ] Sean Zhong commented on MAPREDUCE-5987: --- The steps to reproduce this bug: 1. allocate a small direct buffer, like 10 bytes 2. prepare a large data set on the Java side, say 1MB, and make the source data an incremental sequence. 3. write the data; it will first try to fill the direct buffer, and when it is full, it will notify the native side to fetch the data, over and over. 4. on the native side, check the flushed data and make sure it is also sequential. Occasionally, one data element is corrupted. 5. the bug can only be reproduced when the direct buffer size is extremely small. After the glibc update to https://rhn.redhat.com/errata/RHBA-2013-0279.html, this no longer happens. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
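The reproduction steps above can be modeled on the Java side roughly as follows. This is a pure-Java sketch of the handoff pattern (small direct buffer, drained to a consumer whenever full), not the actual native-task code; names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the Java->native handoff described above: a tiny
// direct buffer is filled with an incremental sequence and "flushed" to a
// consumer each time it fills up. The consumer then verifies the sequence.
public class DirectBufferRepro {
  public static List<Integer> pushThroughSmallBuffer(int count, int bufBytes) {
    ByteBuffer direct = ByteBuffer.allocateDirect(bufBytes); // e.g. ~10 bytes
    List<Integer> received = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      if (direct.remaining() < Integer.BYTES) {
        drain(direct, received); // "notify native side to fetch the data"
      }
      direct.putInt(i);
    }
    drain(direct, received); // flush the tail
    return received; // should be 0,1,2,... with no corruption
  }

  private static void drain(ByteBuffer buf, List<Integer> out) {
    buf.flip();
    while (buf.remaining() >= Integer.BYTES) {
      out.add(buf.getInt());
    }
    buf.compact(); // keep any partial element for the next round
  }
}
```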
[jira] [Assigned] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-5999: Assignee: Akira AJISAKA Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-5999: - Target Version/s: 2.6.0 Status: Patch Available (was: Open) Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-5999: - Attachment: MAPREDUCE-5999.patch Attaching a patch. Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5995) native-task: revert changes which expose Text internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073023#comment-14073023 ] Manu Zhang commented on MAPREDUCE-5995: --- looks good and passed kv test for Text at my side. +1 native-task: revert changes which expose Text internals --- Key: MAPREDUCE-5995 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5995 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5995.txt, mapreduce-5995.txt The current branch has some changes to the Text writable which allow it to manually set the backing array, capacity, etc. Rather than exposing these internals, we should use the newly-committed facility from HADOOP-10855 to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073039#comment-14073039 ] Hadoop QA commented on MAPREDUCE-5999: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657581/MAPREDUCE-5999.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4767//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4767//console This message is automatically generated. 
Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073046#comment-14073046 ] Binglin Chang commented on MAPREDUCE-5987: -- Hi Sean, I don't see how the steps in your comment relate to the test code. The test code just reads from a file sequentially and checks that the data is not corrupted. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved MAPREDUCE-5991. -- Resolution: Fixed Committed to branch. Thanks Todd. native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
Chengbing Liu created MAPREDUCE-6003: Summary: Resource Estimator suggests huge map output in some cases Key: MAPREDUCE-6003 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.2.1 Reporter: Chengbing Liu In some cases, ResourceEstimator can return a map output estimate that is far too large. This happens when the input size is not correctly calculated. A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first, and they have an input length of 0 due to its TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task. There are two possible solutions to this problem: (1) Make the input size correct for each case, e.g. HBase, etc. (2) Use another algorithm to estimate the map output, or at least make it closer to reality. I prefer the second way, since the first would require every possibility to be handled, which is not easy for some inputs such as URIs. In my opinion, we could make a second estimation which is independent of the input size: estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10 Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space. The former estimation goes like this: estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize My suggestion is to take the minimum of the two estimations: estimation = min(estimationA, estimationB) -- This message was sent by Atlassian JIRA (v6.2#6252)
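The two formulas proposed above can be expressed directly in code. This is a sketch of the proposal, not the actual ResourceEstimator implementation; the variable names follow the comment.

```java
// Sketch of the proposed estimate: take the minimum of the existing
// input-size-based estimate (estimationA, which explodes when the recorded
// input size is wrong) and a new input-size-independent estimate
// (estimationB, padded by a factor of 10 to stay conservative).
public class MapOutputEstimate {
  static long estimate(long inputSize, long completedMapInputSize,
                       long completedMapOutputSize, int completedMaps,
                       int totalMaps) {
    // estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize
    long estimationA = (long)
        ((inputSize * completedMapOutputSize * 2.0) / completedMapInputSize);
    // estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10
    long estimationB =
        (completedMapOutputSize / completedMaps) * (long) totalMaps * 10;
    return Math.min(estimationA, estimationB);
  }
}
```

With a wildly inflated input size, estimationB caps the result; with a sane input size, estimationA usually wins.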
[jira] [Updated] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated MAPREDUCE-5976: -- Attachment: mapreduce-5976-v2.txt native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073222#comment-14073222 ] Allen Wittenauer commented on MAPREDUCE-883: +1 harchive: Document how to unarchive --- Key: MAPREDUCE-883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-883 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, harchive Reporter: Koji Noguchi Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-883.1.patch, mapreduce-883-0.patch I was thinking of implementing harchive's 'unarchive' feature, but realized it has been implemented already ever since harchive was introduced. It just needs to be documented. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073229#comment-14073229 ] Sean Zhong commented on MAPREDUCE-5976: --- Thanks, Manu. Looks good, +1 Changes in the new patch: 1. use system-provided snappy header files, removing the builtin snappy headers 2. the Java side delegates the codec check to a native function, NativeRuntime.supportCompressionCodec(codecName : String) native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
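The delegation described above (Java asks the native library whether a codec was compiled in, instead of assuming snappy exists) looks roughly like this. In the real patch the check is a JNI call into libnativetask; here a fixed set simulates the native side's answer, and all names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the codec-availability check. The real code would declare
// "static native boolean supportCompressionCodec(...)" and implement it in
// libnativetask; this simulation models a build where snappy was missing.
public class CodecSupportSketch {
  private static final Set<String> BUILT_IN =
      new HashSet<>(Arrays.asList("zlib")); // snappy not compiled in

  static boolean supportCompressionCodec(String codecName) {
    return BUILT_IN.contains(codecName);
  }

  // Java side delegates the decision instead of failing at build time.
  static void checkCodec(String codecName) {
    if (!supportCompressionCodec(codecName)) {
      throw new UnsupportedOperationException(
          "Native-task build does not support codec: " + codecName);
    }
  }
}
```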
[jira] [Commented] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073231#comment-14073231 ] Manu Zhang commented on MAPREDUCE-5976: --- patch updated. Changes include: 1. check, on the Java side, whether each compression codec has native support built in 2. remove the bundled snappy header files and use the system snappy library in the native code native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated MAPREDUCE-1380: --- Hadoop Flags: (was: Incompatible change,Reviewed) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073943#comment-14073943 ] Manu Zhang commented on MAPREDUCE-5991: --- [~tlipcon][~decster], how am I supposed to run a scenario test now? I've tried mvn test -Dtest=CompressTest and mvn test -Dtest=CompressTest -Pnative but both failed with a "Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses" exception. Removing the test-jar type of hadoop-mapreduce-client-common would work. I think the tests depend on hadoop-mapreduce-client-common for LocalJobRunner. native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6004) native-task should not fail to build if zlib is missing
Manu Zhang created MAPREDUCE-6004: - Summary: native-task should not fail to build if zlib is missing Key: MAPREDUCE-6004 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6004 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Manu Zhang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6004) native-task should not fail to build if zlib is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated MAPREDUCE-6004: -- Description: zlib is required by Gzip. We need to check for its existence in the build and exclude Gzip-related code when zlib is missing, similar to MAPREDUCE-5976. native-task should not fail to build if zlib is missing --- Key: MAPREDUCE-6004 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6004 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Manu Zhang zlib is required by Gzip. We need to check for its existence in the build and exclude Gzip-related code when zlib is missing, similar to MAPREDUCE-5976. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6005) native-task: fix some valgrind errors
Binglin Chang created MAPREDUCE-6005: Summary: native-task: fix some valgrind errors Key: MAPREDUCE-6005 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Running test with valgrind shows there are some bugs, this jira try to fix them. -- This message was sent by Atlassian JIRA (v6.2#6252)