Re: Question about exactly-once

2015-09-09 Thread Stephan Ewen
Hi! The order of tuples in stream may vary, depending on certain operations. When windows are computed on "processing time" (sometimes called "stream time"), then the result of the windowing depends on the speed of the tuple streams. There are multiple possible outcomes of the computation. Upon

Re: Flink HA mode

2015-09-09 Thread Ufuk Celebi
> On 09 Sep 2015, at 04:48, Emmanuel wrote: > > my questions is: how critical is the bootstrap ip list in masters? Hey Emmanuel, good questions. I read over the docs for this again [1] and you are right that we should make this clearer. The “masters" file is only relevant

Re: Performance Issue

2015-09-09 Thread Aljoscha Krettek
Ok, that's a special case but the system still shouldn't behave that way. The problem is that the grouped discretizer that is responsible for grouping the elements into grouped windows is keeping state for every key that it encounters. And that state is never released, ever. That's the reason for

Re: output writer

2015-09-09 Thread Fabian Hueske
Hi Michele, If I see that correctly, you are using the groupBy and groupReduce to partition and group the data. This does work, but you can do it even easier like this: ds.partitionByHash(0).sortPartition(0, Order.ASCENDING).output(yourOF); This will partition and sort the data on field 0

Re: output writer

2015-09-09 Thread Fabian Hueske
For your use case is would make more sense to partition and sort the data on the same key on which you want to partition the output files, i.e., partitioning on key1 and sorting on key3 might not help a lot. Any order is destroyed if you have to partition the data. What you can try to do is to

Re: Performance Issue

2015-09-09 Thread Stephan Ewen
Aljoscha and me are currently working on an alternative Windowing implementation. That new implementation will support out-of-order event time and release keys properly. We will hopefully have a first version to try out in a week or so... Greetings, Stephan On Wed, Sep 9, 2015 at 9:08 AM,

Re: Flink HA mode

2015-09-09 Thread Till Rohrmann
The only necessary information for the JobManager and TaskManager is to know where to find the ZooKeeper quorum to do leader election and retrieve the leader address from. This will be configured via the config parameter `ha.zookeeper.quorum`. On Wed, Sep 9, 2015 at 10:15 AM, Stephan Ewen

Adjusting number of YARN containers

2015-09-09 Thread Peter Voß
Hi, I have started a Flink YARN session using yarn-session.sh and the configuration of number of YARN container seems to be pretty static. Is it possible to have Flink adjust the number of containers depending on the actual workload. E.g. stop containers that are idle for too long and start

Flink and sbt

2015-09-09 Thread Giancarlo Pagano
Hi, I’m trying to write a simple test project using Flink streaming and sbt. The project is in scala and it’s basically the set version of the maven archetype. I’m using the latest Flink 0.10-SNAPSHOT, building it for scala 2.11. I’ve written a simple test that starts a local environment,

Re: Adjusting number of YARN containers

2015-09-09 Thread Robert Metzger
Hi, Currently, Flink does not support automatic scaling of the YARN containers. There are certainly plans to add this feature: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/add-some-new-api-to-the-scheduler-in-the-job-manager-tp7153p7448.html Adding an API for manually starting

RE: Flink HA mode

2015-09-09 Thread Fabian Hueske
Hi Emmanuel, yes Master HA is currently under development and only available in 0.10 snapshot. AFAIK, it is almost but not completely done yet. Best, Fabian On Sep 10, 2015 01:29, "Emmanuel" wrote: > is this a 0.10 snapshot feature only? I'm using 0.9.1 right now > > >