Hi,
As Martijn mentioned, the snapshot ownership introduced in 1.15 is the best
way to handle this.
You say that only about 24000 of the roughly 100000 objects in the job's
shared directory are referenced. Does your case fall within the scope of [1]?
If so, I think it would work to check the _metadata file and find the files
that are not referenced.
I also suggest checking the creation timestamps of those files to make sure
the deletion is safe.
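
A minimal sketch of that check, in case it helps. It assumes you have already
extracted the set of paths referenced by the _metadata files of all retained
checkpoints, and that you have a listing of the shared directory with a
timestamp per object (for example from an S3 listing). The function name, the
argument names and the sample keys are hypothetical; nothing here is a Flink
API. The age threshold stands in for the timestamp check: an unreferenced file
that is very new is left alone rather than treated as garbage.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def find_deletion_candidates(
    shared_objects: Dict[str, datetime],
    referenced: Iterable[str],
    now: datetime,
    min_age: timedelta,
) -> List[str]:
    """Return keys that are unreferenced AND older than min_age.

    shared_objects maps each object key in the shared directory to its
    timestamp; referenced holds the keys found in _metadata.
    """
    referenced_set = set(referenced)
    cutoff = now - min_age
    return sorted(
        key
        for key, modified in shared_objects.items()
        if key not in referenced_set and modified < cutoff
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real job.
    now = datetime(2022, 11, 25, 12, 0, tzinfo=timezone.utc)
    shared = {
        "shared/a.sst": now - timedelta(days=10),    # referenced: keep
        "shared/b.sst": now - timedelta(days=10),    # unreferenced and old
        "shared/c.sst": now - timedelta(minutes=5),  # unreferenced but recent
    }
    print(find_deletion_candidates(shared, ["shared/a.sst"], now, timedelta(days=1)))
    # prints ['shared/b.sst']
```

The result is only a list of candidates. I would review it, and ideally move
the files aside first, before deleting anything.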

[1] https://issues.apache.org/jira/browse/FLINK-24852

On Fri, Nov 25, 2022 at 6:02 PM Evgeniy Lyutikov <eblyuti...@avito.ru>
wrote:

> Thanks for the answer.
> We can't upgrade Flink to version 1.15 yet.
> What I want to know is: to restore from a checkpoint, are the sst files
> referenced in _metadata enough in theory, or is something else needed?
> Can I just delete the files that are not referenced in _metadata?
>
> ------------------------------
> *From:* Martijn Visser <martijnvis...@apache.org>
> *Sent:* November 25, 2022, 16:15:45
> *To:* Evgeniy Lyutikov
> *Cc:* user
> *Subject:* Re: Safe way to clear old checkpoint data
>
> Hi,
>
> I would recommend upgrading to Flink 1.15, since the changes made in 1.15
> make ownership easier to understand.  See
> https://flink.apache.org/2022/05/06/restore-modes.html
>
> Best regards,
>
> Martijn
>
> On Fri, Nov 25, 2022 at 9:33 AM Evgeniy Lyutikov <eblyuti...@avito.ru>
> wrote:
>
>> Hello
>> We run Flink 1.14.4 with the Kubernetes operator (version 1.2.0); all
>> checkpoint data is stored in an S3 bucket.
>>
>> If I parse the _metadata file of a checkpoint, it contains links to
>> objects in the shared directory, and their number is much smaller than the
>> total number of objects in that directory.
>>
>> For example, the number of links in the _metadata file is 24000, while
>> the total number of objects in the shared directory is about 100000. What
>> is the safest way to delete the unused files and free up space?
>>
>

-- 
Best,
Hangxiang.
