Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/5663#issuecomment-96983228
  
    @tgravescs Yes, `FinalApplicationStatus.UNDEFINED` is a better status for the 
application than failed or killed, since the client can't really know the 
application's state when the network is unstable.
    So the change will look like this:
    ```scala
        logError("Can't gain the status of application from Yarn because of exception: ", e)
        return (YarnApplicationState.FAILED, FinalApplicationStatus.UNDEFINED)
    ```
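    A minimal, self-contained sketch of how this fallback could sit inside a 
polling step. The two `Enumeration` objects are hypothetical stand-ins for the 
YARN enums so that the example runs without Hadoop on the classpath; 
`pollOnce` and `fetchReport` are illustrative names, not part of the patch. 
Catching only `NonFatal` exceptions and writing to `System.err` instead of 
calling `logError` are assumptions made for this sketch.
    ```scala
    // Hypothetical stand-ins for the YARN enums (assumption: only these values are needed here).
    object YarnApplicationState extends Enumeration {
      val RUNNING, FINISHED, FAILED, KILLED = Value
    }
    object FinalApplicationStatus extends Enumeration {
      val UNDEFINED, SUCCEEDED, FAILED, KILLED = Value
    }

    object MonitorSketch {
      type Status = (YarnApplicationState.Value, FinalApplicationStatus.Value)

      // `fetchReport` is a hypothetical callback standing in for the client's
      // request to the ResourceManager; it may throw when the network is down.
      def pollOnce(fetchReport: () => Status): Status =
        try {
          fetchReport()
        } catch {
          case scala.util.control.NonFatal(e) =>
            System.err.println(s"Can't get the application status from YARN because of exception: $e")
            // The client cannot tell what really happened, so report UNDEFINED
            // as the final status instead of FAILED or KILLED.
            (YarnApplicationState.FAILED, FinalApplicationStatus.UNDEFINED)
        }
    }
    ```
    With this shape, a caller that sees `FinalApplicationStatus.UNDEFINED` 
alongside `YarnApplicationState.FAILED` can tell "status unknown on the client 
side" apart from a failure reported by YARN itself.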
    Following up on @vanzin's suggestion, I used `jstack` to dump the threads of 
the process (the network was back, but the process was still waiting). It showed 
that the main thread was the only non-daemon thread, apart from JVM-internal 
threads such as the GC task threads.
    ```java
    "main" prio=10 tid=0x000000000060a800 nid=0xd7ae in Object.wait() [0x00007fe7dcfb3000]
       java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
        - locked <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:526)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1586)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1655)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:906)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:35)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:611)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:171)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:115)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    
    "VM Thread" prio=10 tid=0x0000000000665800 nid=0xd7b1 runnable
    
    "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000000620000 nid=0xd7af runnable
    
    "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000000622000 nid=0xd7b0 runnable
    
    "VM Periodic Task Thread" prio=10 tid=0x00007fe7d003d800 nid=0xd7b8 waiting on condition
    
    JNI global references: 29
    ```
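    The same check can be made from inside a JVM. The sketch below lists the 
live non-daemon threads, which are the ones that keep a JVM from exiting. 
`NonDaemonThreads` and `names` are hypothetical names chosen for this example. 
Note that `Thread.getAllStackTraces` only covers Java threads, so 
JVM-internal threads such as "VM Thread" or the GC task threads from the dump 
above would not appear in its result.
    ```scala
    object NonDaemonThreads {
      // Names of the live Java threads that are not daemons.
      def names(): Seq[String] = {
        val threads = Thread.getAllStackTraces.keySet.toArray(new Array[Thread](0))
        threads.toSeq.filter(t => t.isAlive && !t.isDaemon).map(_.getName)
      }

      def main(args: Array[String]): Unit = {
        // Start one extra non-daemon thread so that the listing has more than "main".
        val worker = new Thread(new Runnable {
          def run(): Unit = Thread.sleep(500)
        }, "sketch-non-daemon")
        worker.start()
        names().foreach(println)
        worker.join()
      }
    }
    ```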
    
    I don't know the underlying reason why the main thread is still waiting, but 
my change fixes this problem.

