Hey Stefano, I would wait for Stephan's take on this, but with caught IOExceptions the hash table should properly clean up after itself and delete the file.
Can you still reproduce this problem for your use case?

– Ufuk

On Tue, Dec 2, 2014 at 7:07 PM, Stefano Bortoli <[email protected]> wrote:
> Hi guys,
>
> a quite long process failed due to this No Space Left on Device exception,
> but the machine disk is not full at all.
>
> okkam@okkam-nano-2:/opt/flink-0.8$ df
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/sdb2      223302236 22819504 189116588  11% /
> none                   4        0         4   0% /sys/fs/cgroup
> udev             8156864        4   8156860   1% /dev
> tmpfs            1633520      524   1632996   1% /run
> none                5120        0      5120   0% /run/lock
> none             8167584        0   8167584   0% /run/shm
> none              102400        0    102400   0% /run/user
> /dev/sdb1         523248     3428    519820   1% /boot/efi
> /dev/sda1      961302560  2218352 910229748   1% /media/data
> cm_processes     8167584    12116   8155468   1% /run/cloudera-scm-agent/process
>
> Is it possible that the temporary files were deleted 'after the problem'?
> I read so, but there was no confirmation. However, it is a 256SSD disk.
> Each of the 6 nodes has it.
>
> Here is the stack trace:
>
> 16:37:59,581 ERROR org.apache.flink.runtime.operators.RegularPactTask - Error in task code: CHAIN Join (org.okkam.flink.maintenance.deduplication.consolidate.Join2ToGetCandidates) -> Filter (org.okkam.flink.maintenance.deduplication.match.SingleMatchFilterFunctionWithFlagMatch) -> Map (org.okkam.flink.maintenance.deduplication.match.MapToTuple3MapFunction) -> Combine(org.apache.flink.api.java.operators.DistinctOperator$DistinctFunction) (4/28)
> java.io.IOException: The channel is erroneous.
>     at org.apache.flink.runtime.io.disk.iomanager.ChannelAccess.checkErroneous(ChannelAccess.java:132)
>     at org.apache.flink.runtime.io.disk.iomanager.BlockChannelWriter.writeBlock(BlockChannelWriter.java:73)
>     at org.apache.flink.runtime.io.disk.iomanager.ChannelWriterOutputView.writeSegment(ChannelWriterOutputView.java:218)
>     at org.apache.flink.runtime.io.disk.iomanager.ChannelWriterOutputView.nextSegment(ChannelWriterOutputView.java:204)
>     at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
>     at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeByte(AbstractPagedOutputView.java:223)
>     at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.write(AbstractPagedOutputView.java:173)
>     at org.apache.flink.types.StringValue.writeString(StringValue.java:808)
>     at org.apache.flink.api.common.typeutils.base.StringSerializer.serialize(StringSerializer.java:68)
>     at org.apache.flink.api.common.typeutils.base.StringSerializer.serialize(StringSerializer.java:28)
>     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:95)
>     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
>     at org.apache.flink.runtime.operators.hash.HashPartition.insertIntoProbeBuffer(HashPartition.java:269)
>     at org.apache.flink.runtime.operators.hash.MutableHashTable.processProbeIter(MutableHashTable.java:474)
>     at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:537)
>     at org.apache.flink.runtime.operators.hash.BuildSecondHashMatchIterator.callWithNextKey(BuildSecondHashMatchIterator.java:106)
>     at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:148)
>     at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:484)
>     at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:359)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:246)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: No space left on device
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
>     at org.apache.flink.runtime.io.disk.iomanager.SegmentWriteRequest.write(BlockChannelAccess.java:259)
>     at org.apache.flink.runtime.io.disk.iomanager.IOManager$WriterThread.run(IOManager.java:636)
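
One thing that might help narrow this down in the meantime: `df` only shows free space at the moment you run it, so if the spill files were removed when the task failed, the disk would look nearly empty afterwards. Below is a rough sketch (not something Flink ships) that prints the usable space on the file store behind each directory you pass to it, using only the standard `java.nio.file` API. The class name `TempSpaceCheck` is made up for illustration, and the idea of running it repeatedly against the temp directories while the job is spilling is my assumption about how you would use it; with no arguments it falls back to `java.io.tmpdir`.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: reports usable space per directory.
class TempSpaceCheck {

    /** Returns the bytes available to this JVM on the file store that holds dir. */
    static long usableBytes(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directories to check are given as arguments;
        // otherwise only the JVM's default temp directory is inspected.
        String[] dirs = args.length > 0
                ? args
                : new String[] { System.getProperty("java.io.tmpdir") };

        for (String d : dirs) {
            Path p = Paths.get(d);
            long mib = usableBytes(p) / (1024L * 1024L);
            System.out.printf("%s -> %d MiB usable%n", p, mib);
        }
    }
}
```

If the number for one of the directories drops towards zero shortly before the failure, that would suggest the temp files really did fill that mount and were cleaned up afterwards. It would also show whether the temp directory sits on one of the small mounts in your `df` listing rather than on `/`.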
