RollingSink - question on a failure scenario

2016-06-29 Thread vpra...@gmail.com
Hi, Is there a chance of data loss if there is a failure between the checkpoint completion and when "notifyCheckpointComplete" is invoked. The pending files are moved to final state in the "notifyCheckpointComplete" method. So if there is a failure in this method or just before the method is

Parameters to Control Intra-node Parallelism

2016-06-29 Thread Saliya Ekanayake
Hi, We are trying to scale some of our scientific applications written in Flink. A few questions on tuning Flink performance. 1. What parameters are available to control parallelism within a node? 2. Does Flink support shared memory-based messaging within a node (without doing TCP calls)? 3. Is

Checkpointing very large state in RocksDB?

2016-06-29 Thread Daniel Li
When RocksDB holds a very large state, is there a concern over the time takes in checkpointing the RocksDB data to HDFS? Is asynchronous checkpointing a recommended practice here? https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html "The RocksDBStateBackend

Switch to skip the stream alignment during a checkpoint?

2016-06-29 Thread Daniel Li
I am reading Stream Checkpointing doc below. But somehow couldn't find that "switch" in any other Apache Flink docs. Has anyone of you tried this switch? https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html "Flink has a switch to skip the stream alignment

Flink on YARN - how to resize a running cluster?

2016-06-29 Thread Josh
I'm running a Flink cluster as a YARN application, started by: ./bin/yarn-session.sh -n 4 -jm 2048 -tm 4096 -d There are 2 worker nodes, so each are allocated 2 task managers. There is a stateful Flink job running on the cluster with a parallelism of 2. If I now want to increase the number of

Re: Optimizations not performed - please confirm

2016-06-29 Thread Fabian Hueske
Yes, that was my fault. I'm used to auto reply-all on my desktop machine, but my phone just did a simple reply. Sorry for the confusion, Fabian 2016-06-29 19:24 GMT+02:00 Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr>: > Thank you, Aljoscha! > I received a similar update from Fabian,

Re: Documentation for translation of Job graph to Execution graph

2016-06-29 Thread Bajaj, Abhinav
Hi Robert, Thanks for helpful reply. I have couple of follow up questions on your reply - "In general, we recommend running one JobManager per job” I understand how this can be achieved while running in Yarn, I.e. by submitting single Flink Jobs. Is their some other way of setting Flink to

How to avoid breaking states when upgrading Flink job?

2016-06-29 Thread Josh
Hi all, Is there any information out there on how to avoid breaking saved states/savepoints when making changes to a Flink job and redeploying it? I want to know how to avoid exceptions like this: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state.

Re: Optimizations not performed - please confirm

2016-06-29 Thread Ovidiu-Cristian MARCU
Thank you, Aljoscha! I received a similar update from Fabian, only now I see the user list was not in CC. Fabian::The optimizer hasn’t been touched (except for bugfixes and new operators) for quite some time. These limitations are still present and I don’t expect them to be removed anytime

Re: How to count number of records received per second in processing time while using event time characteristic

2016-06-29 Thread Aljoscha Krettek
Hi, you can explicitly specify that you want processing-time windows like this: stream.keyBy(...).window(TumblingProcessingTimeWindows.of(Time.seconds(1))).sum(...) Also note that the timestamp you append in "writeAsCsv("records-per-second-" + System.currentTimeMillis())" will only take the

Re: Best way to read property file in flink

2016-06-29 Thread Aljoscha Krettek
Hi, could you load the properties file when starting the application and add it to the user functions so that it would be serialized along with them? This way, you wouldn't have to ship the file to each node. Cheers, Aljoscha On Wed, 29 Jun 2016 at 12:09 Janardhan Reddy

Re: maximum size of window

2016-06-29 Thread Aljoscha Krettek
Hi, the result of splitting by key is that processing can easily be distributed among the workers because the windows for individual keys can be processed independently. This should improve cluster utilization. Cheers, Aljoscha On Tue, 28 Jun 2016 at 17:26 Vishnu Viswanath

Re: Optimizations not performed - please confirm

2016-06-29 Thread Aljoscha Krettek
Hi, I think this document is still up-to-date since not much was done in these parts of the code for the 1.0 release and after that. Maybe Timo can give some insights into what optimizations are done in the Table API/SQL that will be be released in an updated version in 1.1. Cheers, Aljoscha

Re: Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-06-29 Thread Martin Scholl
Other than increasing the ask.timeout, we've seen such failures being caused by long GC pauses over bigger heaps. In such a case, you could fiddle with a) enabling object reuse, or b) enabling off-heap memory (i.e. taskmanager.memory.off-heap == true) to mitigate GC-induced issues a bit. Hope it

Re: Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-06-29 Thread Ufuk Celebi
OK, looks like you can easily give more memory to the network stack, e.g. for 2 GB set taskmanager.network.numberOfBuffers = 65536 taskmanager.network.bufferSizeInBytes = 32768 For the other exception, your logs confirm that there is something else going on. Try increasing the akka ask timeout:

Re: Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-06-29 Thread ANDREA SPINA
Hi Ufuk, so the memory available per node is 48294 megabytes per node, but I reserve 28 by flink conf file. taskmanager.heap.mb = 28672 taskmanager.memory.fraction = 0.7 taskmanager.network.numberOfBuffers = 32768 taskmanager.network.bufferSizeInBytes = 16384 Anyway Follows what I found in log

Re: Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-06-29 Thread Ufuk Celebi
Hey Andrea! Sorry for the bad user experience. Regarding the network buffers: you should be able to run it after increasing the number of network buffers, just account for it when specifying the heap size etc. You currently allocate 32768 * 16384 bytes = 512 MB for them. If you have a very long

Apache Flink 1.0.3 IllegalStateException Partiction State on chaining

2016-06-29 Thread ANDREA SPINA
Hi everyone, I am running some Flink experiments with Peel benchmark http://peel-framework.org/ and I am struggling with exceptions: the environment is a 25-nodes cluster, 16 cores per nodes. The dataset is ~80GiB and is located on Hdfs 2.7.1. Flink version is 1.0.3. At the beginning I tried

Best way to read property file in flink

2016-06-29 Thread Janardhan Reddy
We are running multiple flink jobs inside a yarn session. For each flink job we have a separate property file. We are copying the property files to each node in the cluster before submitting the job. Is there a better way to read the properties file? Can we read it from hdfs or s3. Do we need

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-29 Thread ANDREA SPINA
Hi, the problem was solved after I figured out there was an istance of Flink TaskManager running on a node of the cluster. Thank you, Andrea 2016-06-28 12:17 GMT+02:00 ANDREA SPINA <74...@studenti.unimore.it>: > Hi Max, > thank you for the fast reply and sorry: I use flink-1.0.3. > Yes I tested