Github user SaintBacchus commented on the pull request:
https://github.com/apache/spark/pull/5663#issuecomment-96983228
@tgravescs Yes, reporting `FinalApplicationStatus.UNDEFINED` for the application is better than `FAILED` or `KILLED`, since the client cannot really know the application's final state when the network is unstable.
So the change will look like this:
```scala
// In the exception handler around the YARN application-report lookup:
logError("Can't get the status of the application from YARN because of exception: ", e)
return (YarnApplicationState.FAILED, FinalApplicationStatus.UNDEFINED)
```
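For context, here is a minimal, self-contained sketch of where that fallback sits, assuming the report lookup is wrapped in a `try`/`catch`. The `YarnApplicationState` and `FinalApplicationStatus` types below are hypothetical stand-ins for the Hadoop YARN enums of the same names (the state type only covers a subset), and `fetchReport` is a hypothetical callback in place of the real YARN client call, so this is not the actual Spark `Client` code.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins for the Hadoop YARN enums of the same names, so this
// sketch compiles without Hadoop on the classpath. The state type is a subset.
sealed trait YarnApplicationState
object YarnApplicationState {
  case object RUNNING extends YarnApplicationState
  case object FINISHED extends YarnApplicationState
  case object FAILED extends YarnApplicationState
  case object KILLED extends YarnApplicationState
}

sealed trait FinalApplicationStatus
object FinalApplicationStatus {
  case object UNDEFINED extends FinalApplicationStatus
  case object SUCCEEDED extends FinalApplicationStatus
  case object FAILED extends FinalApplicationStatus
  case object KILLED extends FinalApplicationStatus
}

object StatusOnReportFailure {
  type Status = (YarnApplicationState, FinalApplicationStatus)

  // `fetchReport` is a hypothetical callback standing in for the real YARN
  // client call that asks the ResourceManager for the application report.
  def currentStatus(fetchReport: () => Status): Status =
    try fetchReport()
    catch {
      case NonFatal(e) =>
        // YARN could not be reached, so the client does not know how the
        // application ended: report UNDEFINED instead of FAILED or KILLED.
        System.err.println(s"Can't get the status of the application from YARN: $e")
        (YarnApplicationState.FAILED, FinalApplicationStatus.UNDEFINED)
    }
}
```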
Following @vanzin's suggestion, I used `jstack` to dump the process's threads while it was stuck (the network was back, but the process was still waiting). The dump shows that, apart from JVM-internal threads such as the GC task threads, the main thread was the only non-daemon thread:
```text
"main" prio=10 tid=0x000000000060a800 nid=0xd7ae in Object.wait() [0x00007fe7dcfb3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
        - locked <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:526)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1586)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1655)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:906)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:35)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:611)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:171)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:115)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

"VM Thread" prio=10 tid=0x0000000000665800 nid=0xd7b1 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000000620000 nid=0xd7af runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000000622000 nid=0xd7b0 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007fe7d003d800 nid=0xd7b8 waiting on condition

JNI global references: 29
```
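The same check can be done from inside the JVM. Below is a small sketch (the object and method names are hypothetical) that lists the live non-daemon threads via `Thread.getAllStackTraces`; the JVM does not exit on its own while any of them is still running. It only sees Java threads, so VM-internal threads such as the GC task threads in the dump above are not returned.

```scala
object NonDaemonThreads {
  // Names of the live, non-daemon Java threads in this JVM. The JVM keeps
  // running until all of these have terminated (or System.exit is called).
  // VM-internal threads such as GC workers are not Java threads, so they are
  // not returned here even though jstack prints them.
  def nonDaemonThreadNames(): Seq[String] =
    Thread.getAllStackTraces().keySet()
      .toArray(Array.empty[Thread])
      .toSeq
      .filter(t => t.isAlive && !t.isDaemon)
      .map(_.getName)
}
```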
I don't know the underlying reason why the main thread keeps waiting, but my change fixes this problem.
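To illustrate why a thread parked in `Object.wait()` stays there, here is a simplified sketch of the wait/notify pattern suggested by the `JobWaiter.awaitResult` frame in the dump. `SimpleJobWaiter` and its methods are hypothetical and this is not Spark's actual `JobWaiter`: the caller only wakes up when another thread records a result and calls `notifyAll()`, so if that completion signal never arrives, the untimed wait never returns.

```scala
// Hypothetical, simplified waiter: the calling thread blocks on this object's
// monitor until another thread records a result and calls notifyAll().
final class SimpleJobWaiter[T] {
  // Guarded by `this`; None until the job has finished one way or the other.
  private var result: Option[Either[Throwable, T]] = None

  def jobSucceeded(value: T): Unit = synchronized {
    result = Some(Right(value))
    notifyAll() // wake every thread parked in one of the await methods
  }

  def jobFailed(cause: Throwable): Unit = synchronized {
    result = Some(Left(cause))
    notifyAll()
  }

  // Untimed wait, like the Object.wait() frame in the dump: if neither
  // callback ever runs, this call never returns.
  def awaitResult(): Either[Throwable, T] = synchronized {
    while (result.isEmpty) wait() // loop also guards against spurious wakeups
    result.get
  }

  // Bounded variant for comparison: gives up after timeoutMs and returns None.
  def awaitResultWithin(timeoutMs: Long): Option[Either[Throwable, T]] = synchronized {
    val deadline = System.currentTimeMillis() + timeoutMs
    var remaining = timeoutMs
    while (result.isEmpty && remaining > 0) {
      wait(remaining)
      remaining = deadline - System.currentTimeMillis()
    }
    result
  }
}
```

The bounded `awaitResultWithin` variant is only there to show the contrast with the untimed wait; whether a timeout is appropriate for job completion is a separate question.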