Re: Disk I/O in Flink

2017-05-18 Thread Robert Schmidtke
mAccessFile, FileChannel, ZipFile, multiple *Buffer classes for memory mapped files etc., and have the same statistics: start/end of a read from/write to disk, no. of bytes involved and such. I can plot these number

Re: Disk I/O in Flink

2017-04-29 Thread Robert Schmidtke
e HDFS JVMs write 1 TiB of data to disk during TeraGen (expected) and read and write 1 TiB from and to disk during TeraSort (expected). Sorry for the enormous introduction, but now there's

Re: Disk I/O in Flink

2017-04-29 Thread Martin Eden
teresting part: Flink's JVMs read from and write to disk 1 TiB of data each during TeraSort. I'm suspecting there is some sort of spilling involved, potentially because I have not done the setup properly. But that is

Re: Disk I/O in Flink

2017-04-24 Thread Ufuk Celebi
aSort, and there I'm not missing any data, meaning my statistics agree with XFS for TeraSort on Hadoop, which is why I suspect there are some cases where Flink goes to disk without me noticing it. Therefore here finally the question: in whic

Re: Disk I/O in Flink

2017-04-18 Thread Robert Schmidtke
involved, so I can check my bytecode instrumentation)? This would also include any kind of resource distribution via HDFS/YARN I guess (like JAR files and I don't know what). Seeing that I'm missing an amount of data equal to the size of my input set I'd suspect the

Disk I/O in Flink

2017-04-07 Thread Robert Schmidtke
not sure. Maybe there is also some sort of remote I/O involved via sockets or so that I'm missing. Any hints as to where Flink might incur disk I/O are greatly appreciated! I'm also happy with doing the digging myself, once pointed to the proper packages in the Apache Flink source tree (I have done
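
For readers who want a concrete picture of the kind of per-read statistics discussed in this thread (time spent in a read, number of bytes involved), here is a minimal sketch in plain Java. It is not the bytecode instrumentation used by the poster and it does not touch any Flink class; the name `CountingInputStream` and its accessors are hypothetical, chosen only for illustration. It simply wraps any `InputStream` and accumulates byte counts and elapsed time.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Flink or of the poster's tooling):
// wraps a stream and records how many bytes were read and how long reads took.
class CountingInputStream extends FilterInputStream {
    private final AtomicLong bytesRead = new AtomicLong();
    private final AtomicLong nanosInRead = new AtomicLong();

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        int b = super.read();
        // A single-byte read returns -1 at end of stream, otherwise one byte.
        record(b == -1 ? 0 : 1, start);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        int n = super.read(buf, off, len);
        // n is -1 at end of stream; count only bytes actually delivered.
        record(Math.max(n, 0), start);
        return n;
    }

    private void record(int count, long startNanos) {
        bytesRead.addAndGet(count);
        nanosInRead.addAndGet(System.nanoTime() - startNanos);
    }

    long bytesRead() {
        return bytesRead.get();
    }

    long nanosInRead() {
        return nanosInRead.get();
    }
}
```

Wrapping a `FileInputStream` in this class and comparing `bytesRead()` against file system counters would show whether a given code path accounts for the observed traffic. Reads that bypass streams entirely (for example via `FileChannel` or memory-mapped buffers, both mentioned above) would not be captured by this sketch.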