sunchao commented on code in PR #45052:
URL: https://github.com/apache/spark/pull/45052#discussion_r1501212482


##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -177,15 +177,17 @@ private[spark] class HostLocalDirManager(
  * Manager running on every node (driver and executors) which provides interfaces for putting and
  * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).
  *
- * Note that [[initialize()]] must be called before the BlockManager is usable.
+ * Note that [[initialize()]] must be called before the BlockManager is usable. Also, the
+ * `memoryManager` is initialized at a later stage after DriverPlugin is loaded, to allow the
+ * plugin to overwrite memory configurations.
  */
 private[spark] class BlockManager(
     val executorId: String,
     rpcEnv: RpcEnv,
     val master: BlockManagerMaster,
     val serializerManager: SerializerManager,
     val conf: SparkConf,
-    memoryManager: MemoryManager,
+    var memoryManager: MemoryManager,

Review Comment:
   I was able to reproduce the failure locally after many tries.
   
   So the root cause is that when a task is running, it needs to access the `memoryStore` in a block manager (see [here](86575ad0-8f6f-422f-b10c-ea920f95caf5)). Accessing `memoryStore` in turn calls `memoryManager`, which can be null in the following scenario:
   
   1. a previous test launched a bunch of tasks, some of which may still be running when the next test starts
   2. the next test starts and calls `new SparkContext`, which initializes everything, including updating the active `SparkEnv` via `SparkEnv.set`
   3. one of the leftover tasks from step 1 accesses `memoryStore` during its `run` method (on a different thread from step 2), which then calls `SparkEnv.get.memoryManager`. However, `SparkEnv.get` returns the new `SparkEnv` instance created in step 2, which may not have its `memoryManager` initialized yet! (A minimal sketch of this race follows the list.)
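
   To make the race concrete, here's a minimal, self-contained sketch in plain Scala. It is not Spark code: `EnvRaceSketch`, `Env`, and `activeEnv` are made-up stand-ins for `SparkEnv`, `SparkEnv.set`/`SparkEnv.get`, and the lazily-assigned `memoryManager`:

   ```scala
   // Hypothetical stand-ins for SparkEnv and its global holder; not Spark code.
   object EnvRaceSketch {
     final class Env {
       // In real Spark this would be the MemoryManager; null until assigned.
       @volatile var memoryManager: AnyRef = _
     }

     // Plays the role of SparkEnv.set / SparkEnv.get.
     @volatile private var activeEnv: Env = _

     def main(args: Array[String]): Unit = {
       // Step 1: the "old" env is fully initialized; tasks were launched against it.
       val oldEnv = new Env
       oldEnv.memoryManager = new Object
       activeEnv = oldEnv

       // Step 2: a new test publishes a fresh env *before* its memoryManager is
       // assigned, just like `new SparkContext` calling SparkEnv.set early on.
       activeEnv = new Env

       // Step 3: a leftover task thread resolves the env globally and sees null.
       val leftoverTask = new Thread(() => {
         val mm = activeEnv.memoryManager // resolves the *new* env, not the old one
         println(s"memoryManager = $mm")  // prints "memoryManager = null"
       })
       leftoverTask.start()
       leftoverTask.join()

       // The new env is initialized only after the task has already read null.
       activeEnv.memoryManager = new Object
     }
   }
   ```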
   
   I'm not sure whether this is a valid case. Looking at the [`Task` class](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala), it calls `SparkEnv.get` in several places. However, in the rare case where a `SparkContext` has been shut down and a new `SparkContext` is created, running tasks launched by the former could access the `SparkEnv` created by the latter, which seems invalid. I think this can only happen in local mode.
   
   My latest change doesn't fix the issue either. I guess I just got lucky in 
the unit tests.
   
   One approach, I think, is to avoid using `SparkEnv.get` in the `Task` class and instead pass in the active environment used by the `Executor`. Let me try this approach.
   
   


