Hi,

As Martijn mentioned, the snapshot ownership model in 1.15 is the best way to handle this.

You say that only about 24,000 of the roughly 100,000 objects in the job's shared directory are referenced. Does your case fall within the scope of [1]? If so, I think your approach works: parse the _metadata file and find the files that are not referenced. I also suggest checking the creation timestamps of those files to make sure they are safe to delete.
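A minimal sketch of that check in Python. It assumes you have already extracted the referenced paths from the _metadata of every checkpoint you still retain (parsing _metadata is not shown) and that you have a listing of the shared directory with creation timestamps. The function name, its parameters, and the one-hour safety margin are illustrative choices of mine, not anything provided by Flink. Treating files newer than the inspected _metadata as untouchable is one conservative reading of the timestamp check, since they may belong to a newer or in-progress checkpoint.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def deletion_candidates(
    referenced: Iterable[str],
    shared_objects: Dict[str, datetime],
    metadata_created: datetime,
    safety_margin: timedelta = timedelta(hours=1),
) -> List[str]:
    """Return objects in shared/ that are unreferenced AND clearly older
    than the inspected _metadata file.

    referenced       -- paths found in the _metadata of all retained checkpoints
    shared_objects   -- object key -> creation time, from listing shared/
    metadata_created -- creation time of the newest _metadata that was parsed
    safety_margin    -- hypothetical extra buffer; tune it to your setup
    """
    referenced_set = set(referenced)
    # Anything created at or after the cutoff is kept, referenced or not.
    cutoff = metadata_created - safety_margin
    return sorted(
        key
        for key, created in shared_objects.items()
        if key not in referenced_set and created < cutoff
    )
```

I would first only log the returned candidates, or move them to a separate prefix, and delete them for real once a restore from the checkpoint has been verified.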
[1] https://issues.apache.org/jira/browse/FLINK-24852

On Fri, Nov 25, 2022 at 6:02 PM Evgeniy Lyutikov <eblyuti...@avito.ru> wrote:

> Thanks for the answer.
> We can't update Flink to version 1.15 yet.
> I'm interested in restoring from a checkpoint: theoretically, are only
> the sst files that are mentioned in _metadata (or something else) enough?
> Can I just delete files that are not referenced in _metadata?
>
> ------------------------------
> *From:* Martijn Visser <martijnvis...@apache.org>
> *Sent:* November 25, 2022, 16:15:45
> *To:* Evgeniy Lyutikov
> *Cc:* user
> *Subject:* Re: Safe way to clear old checkpoint data
>
> Hi,
>
> I would recommend upgrading to Flink 1.15, given the changes that were
> made in 1.15 make ownership more understandable. See
> https://flink.apache.org/2022/05/06/restore-modes.html
>
> Best regards,
>
> Martijn
>
> On Fri, Nov 25, 2022 at 9:33 AM Evgeniy Lyutikov <eblyuti...@avito.ru>
> wrote:
>
>> Hello
>> We use Flink 1.14.4 with the Kubernetes operator (version 1.2.0); all
>> checkpoint data is stored in an S3 bucket.
>>
>> If we parse the _metadata file of a checkpoint, it contains links to
>> objects in the shared directory, and their number is much less than the
>> total number of objects in the directory.
>>
>> For example, the number of links in the _metadata file is 24000, and the
>> total number of objects in the shared directory is about 100000. What is
>> the safest way to delete unused files and free up space?
>>

--
Best,
Hangxiang.