Hi,

As far as I know, when uploading a file to S3 the writer first needs to
create some temporary files on the local disks. I would suggest double
checking all of the partitions on the local machine and monitoring the
available disk space continuously while the job is running. If you are only
checking the free space manually, you can easily miss the moment when those
temporary files grow large enough to approach the available disk space,
as I'm pretty sure those temporary files are cleaned up immediately after
the exception that you see is thrown.
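
One way to do that continuous monitoring is a small polling script running
next to the TaskManager. The following is only a sketch, not something taken
from Flink itself: the function names, the 10% threshold and the one-second
interval are assumptions for illustration, and the paths you pass in should
be whatever directories io.tmp.dirs points to on your machines.

```python
import shutil
import time


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a total size of zero.
        return 0.0
    return usage.free / usage.total


def watch(paths, threshold=0.10, interval_s=1.0, samples=None, log=print):
    """Poll each path and record every sample where free space is below `threshold`.

    `threshold`, `interval_s` and `samples` are hypothetical tuning knobs:
    with samples=None the loop runs until interrupted, otherwise it stops
    after that many polling rounds and returns what it recorded.
    """
    low = []
    rounds = 0
    while samples is None or rounds < samples:
        for p in paths:
            frac = free_fraction(p)
            if frac < threshold:
                # Keep a timestamp so the sample can be matched against the job logs.
                low.append((time.time(), p, frac))
                log(f"{p}: only {frac:.1%} free")
        rounds += 1
        if samples is None or rounds < samples:
            time.sleep(interval_s)
    return low
```

Calling something like watch(["/tmp"]) while the checkpoint is running (with
"/tmp" replaced by your actual temporary directories) should make a short
spike in disk usage visible even if it disappears again right after the
exception.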

Piotrek

On Thu, 20 Aug 2020 at 00:56, Vishwas Siravara <vsirav...@gmail.com> wrote:

> Hi guys,
> I have a deduplication job that runs on Flink 1.7 and has some state
> which uses the FsState backend. My TM heap size is 16 GB. I see the error
> below while trying to checkpoint state of size 2 GB. There is enough space
> available in S3; I tried uploading larger files and they were all
> successful. There is also enough disk space in the local file system; the
> disk utility tool does not show anything suspicious. Whenever I try to
> start my job from the last successful checkpoint, it runs into the same
> error. Can someone tell me what the cause of this issue is? Many thanks.
>
>
> Note: This error goes away when I delete io.tmp.dirs and restart the job
> from the last checkpoint, but the disk utility tool does not show much usage
> before deletion, so I am not able to figure out what the problem is.
>
> 2020-08-19 21:12:01,909 WARN org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory - Could not close the state stream for s3p://featuretoolkit.checkpoints/dev_dedup/9b64aafadcd6d367cfedef84706abcba/chk-189/f8e668dd-8019-4830-ab12-d48940ff5353.
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:326)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>     at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
>     at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
>     at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:986)
>     at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>     at org.apache.flink.fs.s3.common.hadoop.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
>     at org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
>     at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:269)
>     at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider.close(CheckpointStreamWithResultProvider.java:58)
>     at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263)
>     at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250)
>     at org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
>     at org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:185)
>     at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:84)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
>     at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>     Suppressed: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:326)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
>         ... 21 more
>
>
> Thanks,
> Vishwas
>
