GitHub user dragos commented on the pull request:
https://github.com/apache/spark/pull/10993#issuecomment-179251806
I'm having trouble running this with dynamic allocation. Did you test it
in that scenario?
I'm seeing disconnects from the driver, leading to:
```
16/02/03 15:03:29 WARN TaskSetManager: Lost task 3.2 in stage 4.0 (TID 4015,
10.0.1.205): java.io.FileNotFoundException:
/tmp/blockmgr-f008b463-1d87-406b-b879-bae73c915907/27/shuffle_2_3_0.data.607ce66e-b528-4fc8-97e2-5028fc7b8e99
(No such file or directory)
```
In the Shuffle Service logs I see:
```
16/02/03 14:58:32 DEBUG MesosExternalShuffleBlockHandler: Received
registration request from app 1521e408-d8fe-416d-898b-3801e73a8293-0119 (remote
address /10.0.1.47:52808).
16/02/03 14:58:34 INFO ExternalShuffleBlockResolver: Registered executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=4} with
ExecutorShuffleInfo{localDirs=[/tmp/blockmgr-248a584a-89b7-461a-8d8d-3363bd0f1a1b],
subDirsPerLocalDir=64, shuffleManager=sort}
16/02/03 14:58:34 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.208:42483 disconnected.
16/02/03 14:58:43 INFO ExternalShuffleBlockResolver: Registered executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=2} with
ExecutorShuffleInfo{localDirs=[/tmp/blockmgr-d9865194-5c38-46ae-bce7-de5605cbb4f6],
subDirsPerLocalDir=64, shuffleManager=sort}
16/02/03 14:58:43 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.208:42498 disconnected.
16/02/03 14:58:43 INFO ExternalShuffleBlockResolver: Registered executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=0} with
ExecutorShuffleInfo{localDirs=[/tmp/blockmgr-b8350cfd-fa2e-4a29-92c2-a88f1bec17ca],
subDirsPerLocalDir=64, shuffleManager=sort}
16/02/03 14:58:43 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.208:42499 disconnected.
16/02/03 14:59:20 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.208:42509 disconnected.
16/02/03 14:59:20 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.205:35465 disconnected.
16/02/03 14:59:20 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.205:35462 disconnected.
16/02/03 15:00:09 INFO ExternalShuffleBlockResolver: Registered executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=7} with
ExecutorShuffleInfo{localDirs=[/tmp/blockmgr-19a734ac-496a-4b7d-b304-acf16f4b5a78],
subDirsPerLocalDir=64, shuffleManager=sort}
16/02/03 15:00:09 WARN MesosExternalShuffleBlockHandler: Unknown
/10.0.1.208:42522 disconnected.
16/02/03 15:00:32 INFO MesosExternalShuffleBlockHandler: Application
1521e408-d8fe-416d-898b-3801e73a8293-0119 disconnected (address was
/10.0.1.47:52808).
16/02/03 15:00:32 INFO ExternalShuffleBlockResolver: Application
1521e408-d8fe-416d-898b-3801e73a8293-0119 removed, cleanupLocalDirs = true
16/02/03 15:00:32 INFO ExternalShuffleBlockResolver: Cleaning up executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=4}'s 1 local
dirs
16/02/03 15:00:32 INFO ExternalShuffleBlockResolver: Cleaning up executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=2}'s 1 local
dirs
16/02/03 15:00:32 INFO ExternalShuffleBlockResolver: Cleaning up executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=0}'s 1 local
dirs
16/02/03 15:00:32 INFO ExternalShuffleBlockResolver: Cleaning up executor
AppExecId{appId=1521e408-d8fe-416d-898b-3801e73a8293-0119, execId=7}'s 1 local
dirs
```
I am not sure if it's related to this PR.
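As a rough sanity check on the timeline, the sketch below computes the gaps between three timestamps copied from the excerpts above: the application's registration with the shuffle service, its disconnect (which triggers the cleanup), and the later task failure. It assumes the driver and shuffle service clocks are roughly in sync and that the task-failure line is from the same day and year as the shuffle service log. The variable names are only illustrative. Note that the missing `blockmgr-f008b463-...` directory is not one of the four `localDirs` registered in this excerpt, so this only shows ordering, not that this particular cleanup removed that file.

```python
from datetime import datetime

# Log4j-style timestamp used in the excerpts: yy/MM/dd HH:mm:ss
FMT = "%y/%m/%d %H:%M:%S"


def parse(ts: str) -> datetime:
    """Parse a timestamp copied from the log excerpts."""
    return datetime.strptime(ts, FMT)


# Values copied from the logs; nothing here is measured independently.
app_registered = parse("16/02/03 14:58:32")    # "Received registration request from app ..."
app_disconnected = parse("16/02/03 15:00:32")  # "Application ... disconnected", cleanup follows
task_failed = parse("16/02/03 15:03:29")       # FileNotFoundException; year 16 is assumed

# How long the shuffle service considered the application connected.
connected_for = (app_disconnected - app_registered).total_seconds()

# How long after the cleanup the task failure was reported.
failed_after_cleanup = (task_failed - app_disconnected).total_seconds()

print(f"application connected for: {connected_for:.0f} s")
print(f"task failed after cleanup: {failed_after_cleanup:.0f} s")
```

With these values the application stays registered for exactly 120 seconds, and the task failure is reported 177 seconds after the local directories were cleaned up. The round 120-second gap looks more like a timeout than a genuine application exit, but the logs alone do not confirm that.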