[ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066331#comment-15066331
 ] 

tangshangwen commented on YARN-4324:
------------------------------------

i found this message in the jstack,is a  JDK epollWait bug?:

"IPC Client (2118999553) connection to RMHost:8030 from UserName" daemon 
prio=10 tid=0x00007f298c664000 nid=0x6e2d runnable [0x00007f297d9a8000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x0000000785182940> (a sun.nio.ch.Util$2)
        - locked <0x0000000785182930> (a java.util.Collections$UnmodifiableSet)
        - locked <0x0000000785182718> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:457)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        - locked <0x000000078023aa40> (a java.io.BufferedInputStream)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)



> AM hang more than 10 min was kill by RM
> ---------------------------------------
>
>                 Key: YARN-4324
>                 URL: https://issues.apache.org/jira/browse/YARN-4324
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: tangshangwen
>         Attachments: logs.rar, yarn-nodemanager-dumpam.log
>
>
> this is my logs
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT                  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> the hive map run 100% and return map 0% and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to