[ 
https://issues.apache.org/jira/browse/SPARK-34361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-34361.
----------------------------------
    Fix Version/s: 3.1.2
                   3.2.0
         Assignee: Attila Zsolt Piros
       Resolution: Fixed

> Dynamic allocation on K8s kills executors with running tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-34361
>                 URL: https://issues.apache.org/jira/browse/SPARK-34361
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.2.0, 3.1.1, 3.1.2
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>
> There is a race between the executor pod allocator and the cluster scheduler
> backend. During downscaling (with dynamic allocation) we saw many newly
> registered executors killed while tasks were still running on them.
> The pattern in the log is the following:
> {noformat}
> 21/02/01 15:12:03 INFO ExecutorMonitor: New executor 312 has registered (new 
> total is 138)
> ...
> 21/02/01 15:12:03 INFO TaskSetManager: Starting task 247.0 in stage 4.0 (TID 
> 2079, 100.100.18.138, executor 312, partition 247, PROCESS_LOCAL, 8777 bytes)
> 21/02/01 15:12:03 INFO ExecutorPodsAllocator: Deleting 3 excess pod requests 
> (408,312,307).
> ...
> 21/02/01 15:12:04 ERROR TaskSchedulerImpl: Lost executor 312 on 
> 100.100.18.138: The executor with id 312 was deleted by a user or the 
> framework.
> {noformat}
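
The ordering in the log excerpt (executor 312 registers, receives a task, and is then deleted as an "excess pod request") can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Spark code, every class, method, and parameter name below is hypothetical, and the `recheck` guard is shown purely to make the stale-snapshot problem visible, not as a description of how the issue was actually fixed.

```python
# Toy model of the race described in this issue. NOT Spark code: all names
# here (ToySchedulerBackend, ToyPodAllocator, recheck, ...) are hypothetical.


class ToySchedulerBackend:
    """Tracks registered executors and the tasks running on them."""

    def __init__(self):
        self.registered = set()
        self.running_tasks = {}  # executor id -> list of task ids

    def register(self, exec_id):
        self.registered.add(exec_id)

    def launch_task(self, exec_id, task_id):
        self.running_tasks.setdefault(exec_id, []).append(task_id)

    def remove(self, exec_id):
        """Drop an executor and return the tasks that were lost with it."""
        self.registered.discard(exec_id)
        return self.running_tasks.pop(exec_id, [])


class ToyPodAllocator:
    """Deletes excess pod requests based on a possibly stale snapshot."""

    def __init__(self, backend):
        self.backend = backend
        self.pending_snapshot = []

    def take_snapshot(self, requested_ids):
        # Executors that have not registered yet look like pending requests.
        self.pending_snapshot = [
            e for e in requested_ids if e not in self.backend.registered
        ]

    def delete_excess(self, excess, recheck):
        victims = self.pending_snapshot[:excess]
        if recheck:
            # Hypothetical guard: skip executors that registered meanwhile.
            victims = [e for e in victims if e not in self.backend.registered]
        lost = {}
        for exec_id in victims:
            tasks = self.backend.remove(exec_id)
            if tasks:
                lost[exec_id] = tasks
        return victims, lost


def simulate(recheck):
    backend = ToySchedulerBackend()
    allocator = ToyPodAllocator(backend)
    allocator.take_snapshot([408, 312, 307])  # all three still look pending
    backend.register(312)                     # 312 registers after the snapshot
    backend.launch_task(312, "247.0")         # and immediately gets a task
    return allocator.delete_excess(3, recheck)


if __name__ == "__main__":
    print(simulate(recheck=False))  # 312 is deleted together with its task
    print(simulate(recheck=True))   # 312 survives; only 408 and 307 go
```

Without the guard, the allocator acts on a view of "pending" requests that was already outdated when executor 312 registered, which matches the sequence of log lines quoted in the issue.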


