Re: Job fails with FileNotFoundException from blobStore

Till Rohrmann Thu, 05 Feb 2015 06:29:54 -0800

Hi Robert,

thanks for the info. If the TaskManager/JobManager does not shut down
properly, e.g. because the process was killed, then it is indeed the case
that the BlobManager cannot properly remove all stored files. I don't know
whether this happened to you recently. Furthermore, the files are not
deleted immediately after the job has finished. Internally there is a
cleanup task which is triggered every hour and deletes all blobs which are
no longer referenced.
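
To illustrate the idea of the cleanup task, here is a minimal sketch. It is
not the actual BlobManager code: the class and method names are made up, and
it assumes one file per blob, named by its key, in a single storage
directory, plus a thread-safe set of keys that are still referenced.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Flink's implementation.
public class BlobCleanupSketch {

    /** Deletes every regular file in storageDir whose name is not in referencedKeys. */
    static int deleteUnreferenced(Path storageDir, Set<String> referencedKeys) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(storageDir)) {
            for (Path file : files) {
                String key = file.getFileName().toString();
                if (Files.isRegularFile(file) && !referencedKeys.contains(key)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    /** Schedules the cleanup once per hour on a daemon thread. */
    static ScheduledExecutorService scheduleHourly(Path storageDir, Set<String> referencedKeys) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "blob-cleanup-sketch");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            try {
                deleteUnreferenced(storageDir, referencedKeys);
            } catch (IOException | RuntimeException e) {
                // Swallow and report: an exception escaping here would cancel all future runs.
                System.err.println("Blob cleanup failed: " + e.getMessage());
            }
        }, 1, 1, TimeUnit.HOURS);
        return timer;
    }
}
```

A timer like this only runs while the process is alive, so files that are
left behind when the process is killed stay on disk until someone removes
them.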


But we definitely have to look into it to see how we could improve this
behaviour.

Greets,

Till

On Thu, Feb 5, 2015 at 3:21 PM, Robert Waury <robert.wa...@googlemail.com>
wrote:

> I talked with the admins. The problem seemed to have been that the disk
> was full and Flink couldn't create the directory.
>
> Maybe the error message should reflect if that is the cause.
>
> While cleaning up the disk we noticed that a lot of temporary blobStore
> files were not deleted by Flink after the job finished. This seemed to have
> caused or at least worsened the problem.
>
> Cheers,
> Robert
>
> On Thu, Feb 5, 2015 at 1:14 PM, Ufuk Celebi <u...@apache.org> wrote:
>
>> On Thu, Feb 5, 2015 at 11:23 AM, Robert Waury <
>> robert.wa...@googlemail.com> wrote:
>>
>>> Hi,
>>>
>>> I can reproduce the error on my cluster.
>>>
>>> Unfortunately I can't check whether the parent directories were created
>>> on the different nodes since I have no way of accessing them. I start all
>>> the jobs from a gateway.
>>>
>>
>> I've added a check to the directory creation (in branches release-0.8 and
>> master), which should fail with a proper error message if that is the
>> problem. If you have time to (re)deploy Flink, it would be great to know if
>> that indeed is the issue. Otherwise, we need to further investigate this.
>>
>>
>>
>
