[ 
https://issues.apache.org/jira/browse/FLINK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-23354:
-------------------------------

    Assignee: Zhilong Hong

> Limit the size of ShuffleDescriptors in PermanentBlobCache on TaskExecutor
> --------------------------------------------------------------------------
>
>                 Key: FLINK-23354
>                 URL: https://issues.apache.org/jira/browse/FLINK-23354
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Zhilong Hong
>            Assignee: Zhilong Hong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> _This is part 3 of the optimization of task deployments. For the overall
> description and part 1, please see FLINK-23005. For part 2, please see
> FLINK-23218._
> Currently, a TaskExecutor uses a BlobCache to cache the blobs transported
> from the JobManager. The cached blobs are stored as local files on the
> TaskExecutor. The blob cache is not cleaned up until one hour after the
> related job finishes. In FLINK-23218, we are going to distribute the cached
> ShuffleDescriptors via blobs. When a large number of failovers happen, many
> cached blobs accumulate on the local disk, and the blob cache occupies a
> large amount of disk space. In extreme cases, the blobs could exhaust the
> disk space.
> So we need to limit the size of the ShuffleDescriptors stored in
> PermanentBlobCache on the TaskExecutor, as described in the comments of
> FLINK-23218. The main idea is to add a size limit and delete blobs in LRU
> order if the size limit is exceeded. Before a blob is cached, the
> TaskExecutor first checks the overall size of the cache. If the overall size
> exceeds the limit, blobs are deleted in LRU order until the limit is no
> longer exceeded. If a deleted blob is needed afterwards, it will be
> downloaded again from the HA storage or the blob server.
> The default value of the size limit for the ShuffleDescriptors in 
> PermanentBlobCache on TaskExecutor will be 100 MiB.
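
The LRU eviction described above can be sketched roughly as follows. This is only an illustration, not Flink's actual implementation: the class `LruBlobSizeTracker` and its methods are hypothetical names, blobs are identified by plain strings, and the sketch assumes the check is "would adding the new blob exceed the limit" rather than the exact condition used in the pull request. It only does the bookkeeping; deleting the evicted files is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical size tracker for cached blobs; not a Flink class. */
class LruBlobSizeTracker {
    private final long sizeLimitBytes;
    private long totalBytes;

    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Long> blobSizes =
            new LinkedHashMap<>(16, 0.75f, true);

    LruBlobSizeTracker(long sizeLimitBytes) {
        this.sizeLimitBytes = sizeLimitBytes;
    }

    /** Marks a blob as recently used; returns false if it is not tracked. */
    boolean touch(String blobKey) {
        return blobSizes.get(blobKey) != null;
    }

    /**
     * Registers a blob that is about to be cached and returns the keys of the
     * blobs that were evicted (least recently used first) to make room.
     * A blob larger than the limit is still tracked after evicting all others.
     */
    List<String> track(String blobKey, long sizeBytes) {
        List<String> evicted = new ArrayList<>();

        // Re-tracking an existing key replaces its old size.
        Long previous = blobSizes.remove(blobKey);
        if (previous != null) {
            totalBytes -= previous;
        }

        Iterator<Map.Entry<String, Long>> it = blobSizes.entrySet().iterator();
        while (totalBytes + sizeBytes > sizeLimitBytes && it.hasNext()) {
            Map.Entry<String, Long> oldest = it.next();
            totalBytes -= oldest.getValue();
            evicted.add(oldest.getKey());
            it.remove();
        }

        blobSizes.put(blobKey, sizeBytes);
        totalBytes += sizeBytes;
        return evicted;
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

In this sketch the caller would delete the local file of every key returned by `track` and download the blob again if it is requested later. With the proposed default, the tracker would be created with a limit of 100 MiB, i.e. `new LruBlobSizeTracker(100L * 1024 * 1024)`.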



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
