Re: job history server

2020-02-18 Thread Richard Moorhead
2020-02-18 09:44:45,227 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
Failure while fetching/processing job archive for job
eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/062e4d80ed1d4bdafd24e462245c5926/subtasks/86/attempts/0.json:
No space left on device

and there it is:

42103b5b-5410-d2d8-6a0b-21757e4a0fbc ~
0 % df -iH
Filesystem                Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg00-rootlv00   132k   13k  119k   10% /
tmpfs                       508k  465k   43k   92% /dev/shm

Thanks for the tip.
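
For anyone who hits the same thing: since every file and directory on tmpfs takes one inode, it can help to see which job directories hold the most entries. The following is only a rough sketch, not anything Flink provides. The function names are made up for illustration, and the jobs path in the usage note is simply the one from the log above.

```python
import os


def count_entries(root):
    """Count files and directories below root (root itself is not counted)."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def entries_per_job(jobs_dir):
    """Return (name, entry_count) for each subdirectory of jobs_dir, largest first.

    The count is an approximation of inode usage: hard links would be
    counted more than once.
    """
    counts = {}
    with os.scandir(jobs_dir) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                # +1 accounts for the job directory's own inode.
                counts[entry.name] = count_entries(entry.path) + 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Calling entries_per_job("/dev/shm/flink-history-server/jobs") and printing the first few results should show which archived jobs account for most of the inodes.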

On Mon, Feb 17, 2020 at 8:08 PM Richard Moorhead 
wrote:

> I did not know that.
>
> I have since wiped the directory. I will post when I see this error again.
>
> On Mon, Feb 17, 2020 at 8:03 PM Benchao Li  wrote:
>
>> `df -H` only gives the sizes, not inodes information. Could you also show
>> us the result of `df -iH`?
>>
>> On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead  wrote:
>>
>>> Yes, I did. I mentioned it last but I should have been clearer:
>>>
>>> 22526:~/ $ df -H                                      [18:15:20]
>>> Filesystem                 Size  Used Avail Use% Mounted on
>>> /dev/mapper/vg00-rootlv00  2.1G  777M  1.2G  41% /
>>> tmpfs                      2.1G  753M  1.4G  37% /dev/shm
>>>
>>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>>>
 Hi Richard,

 Have you checked that inodes of the disk partition were full or not?

 On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead <richard.moorh...@gmail.com> wrote:

> I see the following exception often:
>
> 2020-02-17 18:13:26,796 ERROR
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
> Failure while fetching/processing job archive for job
> eaf0639027aca1624adaa100bdf1332e.
> java.nio.file.FileSystemException:
> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
> No space left on device
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at
> java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:767)
> at
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Unfortunately the partition listed does not appear to be full or
> anywhere near full?
>
> Is there a workaround to this?
>
>

 --

 Benchao Li
 School of Electronics Engineering and Computer Science, Peking University
 Tel:+86-15650713730
 Email: libenc...@gmail.com; libenc...@pku.edu.cn


>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>


Re: job history server

2020-02-17 Thread Richard Moorhead
I did not know that.

I have since wiped the directory. I will post when I see this error again.

On Mon, Feb 17, 2020 at 8:03 PM Benchao Li  wrote:

> `df -H` only gives the sizes, not inodes information. Could you also show
> us the result of `df -iH`?
>
> On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead  wrote:
>
>> Yes, I did. I mentioned it last but I should have been clearer:
>>
>> 22526:~/ $ df -H                                      [18:15:20]
>> Filesystem                 Size  Used Avail Use% Mounted on
>> /dev/mapper/vg00-rootlv00  2.1G  777M  1.2G  41% /
>> tmpfs                      2.1G  753M  1.4G  37% /dev/shm
>>
>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>>
>>> Hi Richard,
>>>
>>> Have you checked that inodes of the disk partition were full or not?
>>>
>>> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead  wrote:
>>>
 I see the following exception often:

 2020-02-17 18:13:26,796 ERROR
 org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
 Failure while fetching/processing job archive for job
 eaf0639027aca1624adaa100bdf1332e.
 java.nio.file.FileSystemException:
 /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
 No space left on device
 at
 sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
 at
 sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
 at
 sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
 at
 sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
 at java.nio.file.Files.createDirectory(Files.java:674)
 at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
 at java.nio.file.Files.createDirectories(Files.java:767)
 at
 org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at
 java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)


 Unfortunately the partition listed does not appear to be full or
 anywhere near full?

 Is there a workaround to this?


>>>
>>> --
>>>
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>>
>>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: job history server

2020-02-17 Thread Benchao Li
`df -H` only reports sizes, not inode usage. Could you also show
us the result of `df -iH`?
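
To illustrate the difference, here is a minimal Python sketch that reads both numbers through os.statvfs. The function name fs_usage is just an illustrative choice, and the fractions are computed from the raw free counts, so they may differ slightly from the percentages df prints.

```python
import os


def fs_usage(path):
    """Return (block_fraction_used, inode_fraction_used) for the filesystem holding path."""
    st = os.statvfs(path)
    blocks_used = st.f_blocks - st.f_bfree   # data blocks in use
    inodes_used = st.f_files - st.f_ffree    # inodes in use
    # Some filesystems report 0 total inodes; treat that as "not limited".
    block_frac = blocks_used / st.f_blocks if st.f_blocks else 0.0
    inode_frac = inodes_used / st.f_files if st.f_files else 0.0
    return block_frac, inode_frac
```

A filesystem can return "No space left on device" when the second value approaches 1.0 even though the first is still low, so fs_usage("/dev/shm") is worth checking alongside the size output.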

On Tue, Feb 18, 2020 at 9:40 AM Richard Moorhead  wrote:

> Yes, I did. I mentioned it last but I should have been clearer:
>
> 22526:~/ $ df -H                                      [18:15:20]
> Filesystem                 Size  Used Avail Use% Mounted on
> /dev/mapper/vg00-rootlv00  2.1G  777M  1.2G  41% /
> tmpfs                      2.1G  753M  1.4G  37% /dev/shm
>
> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>
>> Hi Richard,
>>
>> Have you checked that inodes of the disk partition were full or not?
>>
>> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead  wrote:
>>
>>> I see the following exception often:
>>>
>>> 2020-02-17 18:13:26,796 ERROR
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>>> Failure while fetching/processing job archive for job
>>> eaf0639027aca1624adaa100bdf1332e.
>>> java.nio.file.FileSystemException:
>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>> No space left on device
>>> at
>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>> at
>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>> at
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at
>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>>
>>> Unfortunately the partition listed does not appear to be full or
>>> anywhere near full?
>>>
>>> Is there a workaround to this?
>>>
>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: job history server

2020-02-17 Thread Richard Moorhead
Yes, I did. I mentioned it last but I should have been clearer:

22526:~/ $ df -H                                      [18:15:20]
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootlv00  2.1G  777M  1.2G  41% /
tmpfs                      2.1G  753M  1.4G  37% /dev/shm

On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:

> Hi Richard,
>
> Have you checked that inodes of the disk partition were full or not?
>
> On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead  wrote:
>
>> I see the following exception often:
>>
>> 2020-02-17 18:13:26,796 ERROR
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>> Failure while fetching/processing job archive for job
>> eaf0639027aca1624adaa100bdf1332e.
>> java.nio.file.FileSystemException:
>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>> No space left on device
>> at
>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>> at
>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>> at java.nio.file.Files.createDirectory(Files.java:674)
>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>> at java.nio.file.Files.createDirectories(Files.java:767)
>> at
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> Unfortunately the partition listed does not appear to be full or anywhere
>> near full?
>>
>> Is there a workaround to this?
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: job history server

2020-02-17 Thread Benchao Li
Hi Richard,

Have you checked that inodes of the disk partition were full or not?

On Tue, Feb 18, 2020 at 8:16 AM Richard Moorhead  wrote:

> I see the following exception often:
>
> 2020-02-17 18:13:26,796 ERROR
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
> Failure while fetching/processing job archive for job
> eaf0639027aca1624adaa100bdf1332e.
> java.nio.file.FileSystemException:
> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
> No space left on device
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:767)
> at
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Unfortunately the partition listed does not appear to be full or anywhere
> near full?
>
> Is there a workaround to this?
>
>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


job history server

2020-02-17 Thread Richard Moorhead
I see the following exception often:

2020-02-17 18:13:26,796 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
Failure while fetching/processing job archive for job
eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
No space left on device
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)
at
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Unfortunately the partition listed does not appear to be full or anywhere
near full?

Is there a workaround to this?