[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075607#comment-14075607
 ] 

Hudson commented on MAPREDUCE-6002:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #625 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/625/])
MAPREDUCE-6002. Made MR task avoid reporting error to AM when the task process 
is shutting down. Contributed by Wangda Tan. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613743)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java


 MR task should prevent report error to AM when process is shutting down
 ---

 Key: MAPREDUCE-6002
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: MR-6002.patch


 With MAPREDUCE-5900, a preempted MR task should not be treated as failed. 
 But it is still possible for an MR task to fail and report to the AM after 
 preemption takes effect but before the AM has received the completed container 
 from the RM. This causes the task attempt to be marked failed instead of 
 preempted.
 An example: FileSystem has a shutdown hook that closes all FileSystem 
 instances. If a FileSystem is in use at the same time (for example, reading 
 split details from HDFS), the MR task will fail and report the fatal error to 
 the MR AM. An exception will be raised:
 {code}
 2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_25_0 - exited : java.io.IOException: Filesystem closed
   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
   at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
   at java.io.DataInputStream.readByte(DataInputStream.java:265)
   at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
   at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
   at org.apache.hadoop.io.Text.readString(Text.java:464)
   at org.apache.hadoop.io.Text.readString(Text.java:457)
   at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 {code}
 We should prevent this. Other exceptions can also happen while the process is 
 shutting down, and we shouldn't report any of them to the AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075628#comment-14075628
 ] 

Hudson commented on MAPREDUCE-6002:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1817 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1817/])
MAPREDUCE-6002. Made MR task avoid reporting error to AM when the task process 
is shutting down. Contributed by Wangda Tan. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613743)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java




[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075636#comment-14075636
 ] 

Hudson commented on MAPREDUCE-6002:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1844 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1844/])
MAPREDUCE-6002. Made MR task avoid reporting error to AM when the task process 
is shutting down. Contributed by Wangda Tan. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613743)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java




[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075523#comment-14075523
 ] 

Hudson commented on MAPREDUCE-6002:
---

FAILURE: Integrated in Hadoop-trunk-Commit #5977 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5977/])
MAPREDUCE-6002. Made MR task avoid reporting error to AM when the task process 
is shutting down. Contributed by Wangda Tan. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613743)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java




[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074185#comment-14074185
 ] 

Zhijie Shen commented on MAPREDUCE-6002:


TaskUmbilicalProtocol#fsError and #fatalError are the two calls that result in 
TA_FAILMSG and consequently move a task attempt to failure. Checking whether 
the task attempt process is stopping, and notifying the listener only when it 
is NOT stopping, prevents the task attempt from being moved to FAILED by an 
exception that stopping the process caused, such as the aforementioned case.
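
The check described above can be sketched as follows. This is a minimal illustration, not the code from MR-6002.patch: `FailureUmbilical`, `ShutdownAwareReporter`, and the `AtomicBoolean` flag are hypothetical stand-ins for TaskUmbilicalProtocol and for however the task process detects that it is stopping.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the failure calls on the umbilical. */
interface FailureUmbilical {
  void fatalError(String attemptId, String message);
}

/** Forwards a failure only while the process is not shutting down. */
final class ShutdownAwareReporter {
  private final FailureUmbilical umbilical;
  // Assumption: a shutdown hook (or similar) flips this flag when shutdown begins.
  private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

  ShutdownAwareReporter(FailureUmbilical umbilical) {
    this.umbilical = umbilical;
  }

  /** Marks the process as stopping; later errors are squelched. */
  void markShuttingDown() {
    shuttingDown.set(true);
  }

  /** Returns true if the error was forwarded, false if it was squelched. */
  boolean reportFatalError(String attemptId, Throwable error) {
    if (shuttingDown.get()) {
      return false;
    }
    umbilical.fatalError(attemptId, String.valueOf(error));
    return true;
  }

  /** Demo: one error before shutdown is forwarded, one after is dropped. */
  static List<String> demo() {
    List<String> received = new ArrayList<>();
    ShutdownAwareReporter reporter =
        new ShutdownAwareReporter((id, msg) -> received.add(id + ": " + msg));
    reporter.reportFatalError("attempt_1", new IOException("disk full"));
    reporter.markShuttingDown();
    reporter.reportFatalError("attempt_1", new IOException("Filesystem closed"));
    return received;
  }
}
```

In the demo only the first error reaches the listener. The "Filesystem closed" error raised after shutdown begins is dropped, so the AM would decide the attempt's state from the container status instead.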

In general, the solution makes sense to me. Just one concern: it may introduce 
the opposite race condition. For example, an exception that is NOT caused by 
stopping the task attempt process happens JUST before the shutdown hook is 
invoked. Then, by the time we check whether the task attempt process is 
stopping, the check already returns true. In this extreme case, the listener 
misses the exception, and the task attempt is moved to PREEMPTED instead of 
FAILED.

While marking a TA that is supposed to be PREEMPTED as FAILED, and vice versa, 
are both rare cases, IMHO they have different levels of downside. Marking a TA 
that is supposed to be PREEMPTED as FAILED is likely to make the task unable 
to retry. On the other side, marking a TA that is supposed to be FAILED as 
PREEMPTED will let the attempt retry even if it has used up the retry quota, 
which is not too bad. Offering users more than they are promised sounds better 
than offering less. Any thoughts?



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074228#comment-14074228
 ] 

Wangda Tan commented on MAPREDUCE-6002:
---

Hi [~zjshen],
Thanks for the review.
I totally agree with you. I think we should ignore the extreme race condition 
you mentioned too, since it doesn't deprive the task of its right to retry :)

Wangda



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074398#comment-14074398
 ] 

Jason Lowe commented on MAPREDUCE-6002:
---

bq. In this extreme case, the exception is going to be missed by the listener, 
and the task attempt is moved to PREEMPTED instead of FAILED.

This will only be true if the task was trying to be preempted *as* it failed, 
correct?   The AM will see the container completion event from the RM, and 
since the attempt didn't explicitly report a completion status it will key off 
the container status code to determine the attempt's fate.  If the attempt 
really happened to fail independently just as it was being preempted then 
that's a race we can live with either way, IMHO.  The thing we don't want is to 
have the attempt fail _because_ of a preemption or task-kill, so I think it 
will be safe to squelch errors that are occurring during shutdown.
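
The decision described in this paragraph can be sketched roughly as below. It is an illustration under stated assumptions, not the AM's actual state machine: `ContainerOutcome`, `AttemptState`, and `AttemptFate.decide` are hypothetical names, and the outcomes are simplified to three cases.

```java
import java.util.Optional;

/** Hypothetical container outcomes as the AM might learn them from the RM. */
enum ContainerOutcome { PREEMPTED, KILLED, EXITED_WITH_ERROR }

/** Simplified final states for a task attempt. */
enum AttemptState { FAILED, PREEMPTED, KILLED }

final class AttemptFate {
  private AttemptFate() {}

  /**
   * An explicit failure report from the attempt wins. Without one, the
   * container outcome decides the attempt's fate.
   */
  static AttemptState decide(Optional<String> reportedError, ContainerOutcome outcome) {
    if (reportedError.isPresent()) {
      // This is the path the shutdown-time squelch is meant to avoid.
      return AttemptState.FAILED;
    }
    switch (outcome) {
      case PREEMPTED:
        return AttemptState.PREEMPTED;
      case KILLED:
        return AttemptState.KILLED;
      default:
        return AttemptState.FAILED;
    }
  }
}
```

Under this model, a "Filesystem closed" report that arrives during preemption yields FAILED, while the same preemption with the report squelched yields PREEMPTED.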

I think the biggest issue will be if an error in the task attempt causes the 
entire JVM to start shutting down before the error is reported via the 
umbilical (e.g.: the user code calls System.exit on an error).  The good news 
is that the task attempt will still end up in the FAILED state but any useful, 
context-specific error messages from the attempt will not be reported via the 
umbilical.  The AM will only know that the task attempt exited without saying 
why.  I suspect this is a rare situation when it occurs, probably correctable 
in the user's code in many of those cases, and the attempt logs should be able 
to sort things out if it does occur.



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074592#comment-14074592
 ] 

Zhijie Shen commented on MAPREDUCE-6002:


Thanks for your feedback, [~jlowe]!

bq. This will only be true if the task was trying to be preempted as it failed, 
correct? The AM will see the container completion event from the RM, and since 
the attempt didn't explicitly report a completion status it will key off the 
container status code to determine the attempt's fate. If the attempt really 
happened to fail independently just as it was being preempted then that's a 
race we can live with either way, IMHO. The thing we don't want is to have the 
attempt fail because of a preemption or task-kill, so I think it will be safe 
to squelch errors that are occurring during shutdown.

Exactly. This is the point I wanted to make, and the patch actually solves the 
problem this way.

bq. The good news is that the task attempt will still end up in the FAILED 
state but any useful,

Isn't it possible that PREEMPTED from the RM still comes before the AM knows 
the task attempt FAILED? Say the preemption logic has already run on the RM, 
and the completed container status has already been sent to the AM, but the NM 
hasn't notified the RM and PingChecker hasn't found it. Anyway, it is still 
safe, because it doesn't break the agreement that we don't want the attempt to 
fail because of a preemption or task-kill.



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075183#comment-14075183
 ] 

Wangda Tan commented on MAPREDUCE-6002:
---

Jason, thanks for your comments,
bq. I suspect this is a rare situation when it occurs, probably correctable in 
the user's code in many of those cases, and the attempt logs should be able to 
sort things out if it does occur.
I agree. In a normal failure, no matter what kind of exception is thrown, 
YarnChild should be able to catch it and report it to the AM. In the rare cases 
where some error causes the JVM to start shutting down before the error is 
reported, the report would most likely fail to reach the AM even without this 
change.

To Zhijie,
bq. Isn't it possible that PREEMPTED from RM still comes before AM knows the 
task attempt FAILED?
I think what Jason mentioned is another case: no preemption happens; a failure 
happens on the TA side, and JVM shutdown begins before the TA can report the 
error to the AM.

Thanks,
Wangda



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-25 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075253#comment-14075253
 ] 

Zhijie Shen commented on MAPREDUCE-6002:


+1 for the patch. I will commit it late tomorrow to give [~jlowe] some time, 
in case you'd like to look into the patch as well.



[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072957#comment-14072957
 ] 

Hadoop QA commented on MAPREDUCE-6002:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657552/MR-6002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//console

This message is automatically generated.
