After enabling some more logging for the storage account, I figured out the
errors correspond to 404 PathNotFound responses. My guess is the file
system checks the status of a path to see if it exists or not before
trying to write to it, in this case for _metadata files from each new
checkpoint. That looks like normal operation, so it's just unfortunate that
the Azure API reports it as continuous ClientOtherError metrics.
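
To illustrate the guess above, here is a minimal sketch of a check-then-write pattern. It uses a toy in-memory store, not the Azure SDK or the ABFS driver. The class, the function names and the path layout are all hypothetical and only show why each new _metadata file would produce one 404 on the status lookup even though the write itself succeeds.

```python
# Toy in-memory stand-in for a storage account (assumption: NOT the real
# Azure SDK or ABFS driver; all names here are hypothetical).
class FakeStore:
    def __init__(self):
        self.files = {}
        self.status_codes = []  # one entry per simulated status lookup

    def get_path_status(self, path):
        # A missing path is answered with 404, mirroring PathNotFound.
        code = 200 if path in self.files else 404
        self.status_codes.append(code)
        return code

    def create(self, path, data):
        self.files[path] = data


def write_metadata(store, checkpoint_id, data):
    """Check-then-write: probe the path first, create it only if absent."""
    # Illustrative path layout, assumed for this sketch.
    path = f"checkpoints/chk-{checkpoint_id}/_metadata"
    if store.get_path_status(path) == 404:
        store.create(path, data)
    return path


store = FakeStore()
for checkpoint_id in range(1, 4):
    write_metadata(store, checkpoint_id, b"state")

# Every checkpoint writes to a fresh path, so every probe is a 404.
print(store.status_codes)
```

In this sketch all three writes succeed, yet the lookup log contains nothing but 404s. That would match checkpoints completing while the metrics show a steady stream of client errors.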

Regards,
Alexis.

On Fri, 6 Oct 2023 at 08:10, Alexis Sarda-Espinosa <
sarda.espin...@gmail.com> wrote:

> Yes, that also works correctly, at least based on the Kafka source we use
> (we'd get an alert if it suddenly started consuming from a very old offset).
>
> Regards,
> Alexis.
>
> On Thu, 5 Oct 2023, 19:36 ramkrishna vasudevan, <ramvasu.fl...@gmail.com>
> wrote:
>
>> Sorry for the late reply. If you restart the job, is it able to safely
>> use the checkpoint and get back to the checkpointed state?
>>
>> Regards
>> Ram,
>>
>> On Thu, Sep 28, 2023 at 4:46 PM Alexis Sarda-Espinosa <
>> sarda.espin...@gmail.com> wrote:
>>
>>> Hi Surendra,
>>>
>>> there are no exceptions in the logs, nor anything salient at the
>>> INFO/WARN/ERROR levels. The checkpoints are definitely completing; we even
>>> set the config
>>>
>>> execution.checkpointing.tolerable-failed-checkpoints: 1
>>>
>>> Regards,
>>> Alexis.
>>>
>>> On Thu, 28 Sep 2023 at 09:32, Surendra Singh Lilhore <
>>> surendralilh...@gmail.com> wrote:
>>>
>>>> Hi Alexis,
>>>>
>>>> Could you please check the TaskManager log for any exceptions?
>>>>
>>>> Thanks
>>>> Surendra
>>>>
>>>>
>>>> On Thu, Sep 28, 2023 at 7:06 AM Alexis Sarda-Espinosa <
>>>> sarda.espin...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We are using ABFSS for RocksDB's backend as well as the storage dir
>>>>> required for Kubernetes HA. In the Azure Portal's monitoring insights I see
>>>>> that every single operation contains failing transactions for the
>>>>> GetPathStatus API. Unfortunately I don't see any additional details, but I
>>>>> know the storage account is only used by Flink. Checkpointing isn't
>>>>> failing, but I wonder if this could be an issue in the long term?
>>>>>
>>>>> Regards,
>>>>> Alexis.
>>>>>
>>>>>
