Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/3994#issuecomment-70359713
  
    @tnachen Here are the slave's logs around tasks 34 and 63. It looks like, if any 
task hits an error while running, the executor running that task is terminated. 
Please check this.
    
    ```
    I0117 17:21:43.678827 41388 slave.cpp:625] Got assigned task 34 for 
framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:21:43.679612 41388 slave.cpp:734] Launching task 34 for framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:21:43.721297 41388 slave.cpp:844] Queuing task '34' for executor 
20141110-112437-3374320138-60030-57359-44 of framework 
'20150117-171023-3391097354-60030-7325-0004
    I0117 17:21:43.775977 41388 slave.cpp:358] Successfully attached file 
'/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
    I0117 17:21:43.721451 41386 mesos_containerizer.cpp:407] Starting container 
'3fdbdd09-98cd-4197-954f-d95d9b3b4aee' for executor 
'20141110-112437-3374320138-60030-57359-44' of framework 
'20150117-171023-3391097354-60030-7325-0004'
    I0117 17:21:43.777179 41386 mesos_containerizer.cpp:528] Fetching URIs for 
container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee' using command '/usr/bin/env 
MESOS_EXECUTOR_URIS="hdfs:///app/spark/spark-1.3.0-SNAPSHOT-bin-2.3.0-cdh5.0.1.tgz+0X"
 
MESOS_WORK_DIRECTORY=/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee
 HADOOP_HOME=/app/hdfs/ /app/mesos-0.18.1/libexec/mesos/mesos-fetcher'
    I0117 17:22:28.863304 41374 slave.cpp:2523] Current usage 44.85%. Max 
allowed age: 3.160841566048842days
    I0117 17:22:38.472086 41384 slave.cpp:625] Got assigned task 63 for 
framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:38.472584 41384 slave.cpp:734] Launching task 63 for framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:38.472801 41384 slave.cpp:844] Queuing task '63' for executor 
20141110-112437-3374320138-60030-57359-44 of framework 
'20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.721726 41370 slave.cpp:2475] Terminating executor 
20141110-112437-3374320138-60030-57359-44 of framework 
20150117-171023-3391097354-60030-7325-0004 because it did not register within 
1mins
    I0117 17:22:43.722038 41378 mesos_containerizer.cpp:818] Destroying 
container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
    I0117 17:22:43.722295 41378 slave.cpp:2052] Executor 
'20141110-112437-3374320138-60030-57359-44' of framework 
20150117-171023-3391097354-60030-7325-0004 has terminated with unknown status
    E0117 17:22:43.722744 41376 slave.cpp:2332] Failed to unmonitor container 
for executor 20141110-112437-3374320138-60030-57359-44 of framework 
20150117-171023-3391097354-60030-7325-0004: Not monitored
    I0117 17:22:43.737566 41378 slave.cpp:1669] Handling status update 
TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 
20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
    I0117 17:22:43.737829 41378 slave.cpp:3142] Terminating task 34
    I0117 17:22:43.738701 41372 status_update_manager.cpp:315] Received status 
update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of 
framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.739341 41378 slave.cpp:1669] Handling status update 
TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 
20150117-171023-3391097354-60030-7325-0004 from @0.0.0.0:0
    I0117 17:22:43.739398 41372 status_update_manager.cpp:494] Creating 
StatusUpdate stream for task 34 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.739542 41378 slave.cpp:3142] Terminating task 63
    I0117 17:22:43.739869 41372 status_update_manager.cpp:368] Forwarding 
status update TASK_LOST (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 
34 of framework 20150117-171023-3391097354-60030-7325-0004 to 
master@10.10.32.202:60030
    I0117 17:22:43.740393 41372 status_update_manager.cpp:315] Received status 
update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of 
framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.740411 41384 slave.cpp:1789] Status update manager 
successfully handled status update TASK_LOST (UUID: 
cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.740573 41372 status_update_manager.cpp:494] Creating 
StatusUpdate stream for task 63 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.740892 41372 status_update_manager.cpp:368] Forwarding 
status update TASK_LOST (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 
63 of framework 20150117-171023-3391097354-60030-7325-0004 to 
master@10.10.32.202:60030
    I0117 17:22:43.741240 41379 slave.cpp:1789] Status update manager 
successfully handled status update TASK_LOST (UUID: 
f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.762383 41394 process.cpp:1010] Socket closed while receiving
    I0117 17:22:43.762490 41370 status_update_manager.cpp:393] Received status 
update acknowledgement (UUID: cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 
of framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.762662 41394 process.cpp:1010] Socket closed while receiving
    I0117 17:22:43.762800 41370 status_update_manager.cpp:525] Cleaning up 
status update stream for task 34 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.763167 41370 status_update_manager.cpp:393] Received status 
update acknowledgement (UUID: f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 
of framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.763293 41388 slave.cpp:1256] Status update manager 
successfully handled status update acknowledgement (UUID: 
cc571b34-e161-4ec9-bcf1-033bd967209f) for task 34 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.763332 41370 status_update_manager.cpp:525] Cleaning up 
status update stream for task 63 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.763442 41388 slave.cpp:3165] Completing task 34
    I0117 17:22:43.763651 41388 slave.cpp:1256] Status update manager 
successfully handled status update acknowledgement (UUID: 
f198e879-a762-4cce-97ff-5261cb4ff820) for task 63 of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.763836 41388 slave.cpp:3165] Completing task 63
    I0117 17:22:43.763959 41388 slave.cpp:2198] Cleaning up executor 
'20141110-112437-3374320138-60030-57359-44' of framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.764222 41374 gc.cpp:56] Scheduling 
'/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44/runs/3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
 for gc 6.99999115569481days in the future
    I0117 17:22:43.764240 41388 slave.cpp:2273] Cleaning up framework 
20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.764397 41374 gc.cpp:56] Scheduling 
'/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004/executors/20141110-112437-3374320138-60030-57359-44'
 for gc 6.99999115478222days in the future
    I0117 17:22:43.764551 41371 status_update_manager.cpp:277] Closing status 
update streams for framework 20150117-171023-3391097354-60030-7325-0004
    I0117 17:22:43.764658 41374 gc.cpp:56] Scheduling 
'/data03/mesos/slaves/20141110-112437-3374320138-60030-57359-44/frameworks/20150117-171023-3391097354-60030-7325-0004'
 for gc 6.99999115088889days in the future
    I0117 17:23:21.208463 41377 launcher.cpp:120] Forked child with pid '51757' 
for container '3fdbdd09-98cd-4197-954f-d95d9b3b4aee'
    W0117 17:23:21.209522 41386 mesos_containerizer.cpp:808] Ignoring destroy 
of unknown container: 3fdbdd09-98cd-4197-954f-d95d9b3b4aee
    E0117 17:23:21.209867 41389 slave.cpp:1956] Container 
'3fdbdd09-98cd-4197-954f-d95d9b3b4aee' for executor 
'20141110-112437-3374320138-60030-57359-44' of framework 
'20150117-171023-3391097354-60030-7325-0004' failed to start: Collect failed: 
Unknown container: 3fdbdd09-98cd-4197-954f-d95d9b3b4aee
    I0117 17:23:28.864325 41384 slave.cpp:2523] Current usage 44.86%. Max 
allowed age: 3.159526880805023days
    ```
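
To make the timing in the log above easier to read, here is a minimal sketch that parses glog-style lines and measures the gap between events. It assumes the usual glog prefix (severity letter, MMDD, time with microseconds, thread id, `file:line]`). The helper name `parse_glog_line` and the `year` default are hypothetical: glog does not record the year, so 2015 is only a guess based on the framework ID. The sample lines are shortened copies of entries from the log above.

```python
import re
from datetime import datetime

# glog prefix: severity, MMDD, HH:MM:SS.micros, thread id, source file:line]
GLOG_RE = re.compile(
    r"^\s*([IWEF])(\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+(\d+) "
    r"([^:\s]+):(\d+)\] (.*)$"
)


def parse_glog_line(line, year=2015):
    """Parse one unwrapped glog line; return None if it does not match.

    The year is an assumption, since the glog prefix does not include it.
    """
    m = GLOG_RE.match(line)
    if m is None:
        return None
    sev, mon, day, hh, mm, ss, us, tid, src, lineno, msg = m.groups()
    ts = datetime(year, int(mon), int(day), int(hh), int(mm), int(ss), int(us))
    return {
        "severity": sev,
        "time": ts,
        "thread": int(tid),
        "source": src,
        "line": int(lineno),
        "message": msg,
    }


# Shortened excerpts of three entries from the slave log (messages truncated).
SAMPLE = [
    "I0117 17:21:43.721451 41386 mesos_containerizer.cpp:407] Starting container",
    "I0117 17:22:43.721726 41370 slave.cpp:2475] Terminating executor",
    "I0117 17:23:21.208463 41377 launcher.cpp:120] Forked child with pid '51757'",
]

start, terminate, fork = [parse_glog_line(line) for line in SAMPLE]

# Seconds from container start to executor termination, and to the fork.
to_terminate = (terminate["time"] - start["time"]).total_seconds()
to_fork = (fork["time"] - start["time"]).total_seconds()

print(f"start -> terminate: {to_terminate:.3f}s")
print(f"start -> fork:      {to_fork:.3f}s")
```

On these three entries the termination is logged about 60 seconds after the container start, which matches the "did not register within 1mins" message, and the fork is logged about 97 seconds after the start. The log lines in the email are hard-wrapped, so they need to be joined back into single lines before being passed to the parser.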

