[jira] [Commented] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)

2020-02-14 Thread Jungtaek Lim (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036788#comment-17036788 ]

Jungtaek Lim commented on SPARK-30586:
--------------------------------------

"onExecutorAdded" doesn't fill up hostPort in LiveExecutor which looks to be 
null in the stack trace. "onBlockManagerAdded" does - this seems to show one of 
possible case.

If we have event log file for the application encountered the issue, easier to 
check what's happening there.
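
To make that suspected ordering concrete, here is a minimal sketch of the 
interaction, paraphrased from the Spark 2.4 classes named in the stack trace 
(the class and method bodies are simplified illustrations, not the literal 
source; Spark ships Guava shaded as org.spark_project.guava, the unshaded 
import is used here so the sketch is runnable standalone):

{code:scala}
import com.google.common.collect.Interners

object HostPortNpeSketch {
  // LiveExecutor.hostPort starts out null; per the comment above,
  // onExecutorAdded does not set it.
  class LiveExecutor(val executorId: String) {
    var hostPort: String = null
  }

  // Only the block-manager event fills in hostPort.
  def onBlockManagerAdded(exec: LiveExecutor, hostPort: String): Unit = {
    exec.hostPort = hostPort
  }

  // weakIntern as in LiveEntity.scala:603: Guava's interner rejects null
  // via Preconditions.checkNotNull, which is the NPE in the stack trace.
  private val stringInterner = Interners.newWeakInterner[String]()
  def weakIntern(s: String): String = stringInterner.intern(s)

  def main(args: Array[String]): Unit = {
    val exec = new LiveExecutor("1") // executor added, no block manager yet
    weakIntern(exec.hostPort)        // throws java.lang.NullPointerException
  }
}
{code}

Under this sketch, any path that reaches LiveRDDDistribution.toApi before the 
corresponding block-manager event has been processed (or after that event was 
lost or reordered) hits exactly this NPE.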

> NPE in LiveRDDDistribution (AppStatusListener)
> ----------------------------------------------
>
>                 Key: SPARK-30586
>                 URL: https://issues.apache.org/jira/browse/SPARK-30586
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: A Hadoop cluster consisting of CentOS 7.4 machines.
>            Reporter: Jan Van den bosch
>            Priority: Major
>
> We've been noticing a large number of NullPointerExceptions in our long-running Spark job driver logs:
> {noformat}
> 20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an exception
> java.lang.NullPointerException
>     at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
>     at org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
>     at org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
>     at org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
>     at org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
>     at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
>     at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
>     at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>     at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>     at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
>     at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
>     at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
>     at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
>     at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
>     at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
>     at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
>     at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>     at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
>     at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
>     at org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
>     at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
>     at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>     at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>     at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
>     at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
>     at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
>     at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
>     at

[jira] [Commented] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)

2020-02-13 Thread Saisai Shao (Jira)


[ https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036734#comment-17036734 ]

Saisai Shao commented on SPARK-30586:
-------------------------------------

We also met the same issue. It seems the code doesn't check whether the string 
is null and calls the string interner directly, which throws an NPE from 
Guava. My first thought is to add a null check in {{weakIntern}}. I'm still 
investigating how this could happen; it might be due to a lost or out-of-order 
Spark listener event.
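
As a rough sketch of that idea (a hypothetical illustration of the proposed 
guard, not a committed patch; the unshaded Guava import stands in for Spark's 
shaded org.spark_project.guava):

{code:scala}
import com.google.common.collect.Interners

object WeakInternSketch {
  // Mirrors the interner held in LiveEntity.scala.
  private val stringInterner = Interners.newWeakInterner[String]()

  // Pass null through instead of letting Guava's checkNotNull throw an NPE.
  def weakIntern(s: String): String =
    if (s == null) null else stringInterner.intern(s)
}
{code}

A guard like this would suppress the NPE, but it only treats the symptom; the 
lost or out-of-order listener event would still need to be explained.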
