[ 
https://issues.apache.org/jira/browse/FLINK-31139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanfei Lei reassigned FLINK-31139:
----------------------------------

    Assignee: Feifan Wang

> not upload empty state changelog file
> -------------------------------------
>
>                 Key: FLINK-31139
>                 URL: https://issues.apache.org/jira/browse/FLINK-31139
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: Feifan Wang
>            Assignee: Feifan Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.2
>
>         Attachments: image-2023-02-20-19-51-34-397.png
>
>
> h1. Problem
> *_BatchingStateChangeUploadScheduler_* will upload many empty changelog files 
> (file size == 1  and only contains compressed flag).
> !image-2023-02-20-19-51-34-397.png|width=1062,height=188!
> These files are not referenced by any checkpoints, are not cleaned up, and 
> become more numerous as the job runs. Taking our big job as an example, 2292 
> such files were generated within 7 hours. It only takes about 4 months and 
> the number of files in the changelog directory will exceed a million.
> h1. Problem causes
> This problem is caused by *_BatchingStateChangeUploadScheduler#drainAndSave_* 
> not checking whether the task collection is empty. The data in the scheduled 
> queue may have been uploaded when the 
> _*BatchingStateChangeUploadScheduler#drainAndSave*_ method is executed.
>  
> So we should check whether the task collection is empty in 
> *_BatchingStateChangeUploadScheduler#drainAndSave_* . WDYT [~roman] , 
> [~Yanfei Lei] ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to