Appending Windowed Aggregates to Events

2017-06-23 Thread Tim Stearn
Hello All, I'm *very* new to Flink. I read through the documentation and played with some sample code, but I'm struggling to get started with my requirements. We want to use Flink to maintain windowed aggregates as part of a transaction monitoring application. These would use sliding window

Re: Looking for Contributors: Apache Flink meets Apache Mesos and DC/OS

2017-06-23 Thread Till Rohrmann
A quick addition concerning the time. The office hour will take place on the 29th of June at 10am PST. Cheers, Till On Wed, Jun 21, 2017 at 10:16 PM, Till Rohrmann wrote: > Hi, > > we are actively looking for contributors (and anyone interested) for the > Flink DC/OS

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Fabian Hueske
If the data does not have a key (or you do not care about it), you can also use a FlatMapFunction (or ProcessFunction) with Operator State. Operator State is not bound to a key but to a parallel operator instance. Have a look at the ListCheckpointed interface and its JavaDocs. 2017-06-23 18:27
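Fabian's suggestion boils down to buffering elements in a list that is handed over at checkpoint time and handed back on restore. A pure-Java model of that contract (a hedged sketch: the real ListCheckpointed interface and Collector live in Flink; the class and method names here are illustrative, not Flink's API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a FlatMapFunction with operator state:
// elements are buffered per parallel instance, snapshotted as a list,
// and restored into the buffer after a failure.
class BufferingFunction {
    private final List<String> buffer = new ArrayList<>();

    // Analogous to flatMap(): collect until a threshold is reached,
    // then emit the whole batch downstream.
    public List<String> process(String element, int threshold) {
        buffer.add(element);
        if (buffer.size() >= threshold) {
            List<String> out = new ArrayList<>(buffer);
            buffer.clear();
            return out;       // emit the batch
        }
        return List.of();     // nothing to emit yet
    }

    // Analogous to ListCheckpointed.snapshotState(): return current state.
    public List<String> snapshotState() {
        return new ArrayList<>(buffer);
    }

    // Analogous to ListCheckpointed.restoreState(): re-ingest the snapshot.
    public void restoreState(List<String> state) {
        buffer.clear();
        buffer.addAll(state);
    }
}
```

On restore, Flink redistributes these lists across the (possibly rescaled) parallel instances, which is why the state is expressed as a list rather than a single value.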

A way to purge an empty session

2017-06-23 Thread Gwenhael Pasquiers
Hello, This may be premature optimization for memory usage, but here is my question: I have to build an app that will monitor sessions of (millions of) users. I don’t know when a session starts or ends, nor a reasonable maximum duration. I want to have a maximum duration (timeout) of
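The usual Flink answer is a ProcessFunction that registers a timer per key, but the bookkeeping itself is simple. A minimal sketch of the purge logic, independent of Flink (class and method names are illustrative, not any Flink API):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Tracks the last-seen timestamp per user and evicts sessions that have
// been idle longer than the timeout. In Flink the eviction would be
// driven by a registered timer instead of an explicit purge() call.
class SessionTracker {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    SessionTracker(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void onEvent(String user, long timestampMs) {
        lastSeen.put(user, timestampMs);
    }

    // Remove every session whose last event is older than the timeout;
    // returns how many sessions were purged.
    int purge(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int activeSessions() { return lastSeen.size(); }
}
```

With per-key timers the state for an idle user is dropped as soon as its timer fires, so memory stays proportional to the number of currently active sessions.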

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Edward
So there is no way to do a countWindow(100) and preserve data locality? My use case is this: augment a data stream with new fields from a DynamoDB lookup. DynamoDB allows batch gets of up to 100 records, so I am trying to collect 100 records before making that call. I have no other reason to do a

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Fabian Hueske
No, you will lose data locality if you use keyBy(), which is the only way to obtain a KeyedStream. 2017-06-23 17:52 GMT+02:00 Edward : > Thanks, Fabian. > In this case, I could just extend your idea by creating some deterministic > multiplier of the subtask index: > >

[ANNOUNCE] Apache Flink 1.3.1 released

2017-06-23 Thread Robert Metzger
The Apache Flink community is pleased to announce the release of Apache Flink 1.3.1. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at:

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Edward
Thanks, Fabian. In this case, I could just extend your idea by creating some deterministic multiplier of the subtask index: RichMapFunction<String, Tuple2<Integer, String>> keyByMap = new RichMapFunction<String, Tuple2<Integer, String>>() { public Tuple2<Integer, String> map(String

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Fabian Hueske
Flink hashes the keys and computes the target partition using modulo. This works well if you have many keys, but leads to skew if the number of keys is close to the number of partitions. If you use partitionCustom, you can explicitly define the target partition; however, partitionCustom does not
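Fabian's point about skew is easy to reproduce: with about as many keys as partitions, hashCode-modulo rarely maps keys one-to-one onto partitions. A small pure-Java illustration (a sketch only: Flink's real assignment goes through murmur-hashed key groups, but the skew effect is the same):

```java
import java.util.HashMap;
import java.util.Map;

// Distribute string keys over partitions by hashCode modulo and count
// how many keys land in each partition. Hash collisions leave some
// partitions empty, which is exactly the skew seen when the number of
// keys is close to the number of partitions.
class SkewDemo {
    static Map<Integer, Integer> distribute(int numKeys, int numPartitions) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < numKeys; i++) {
            // floorMod keeps the partition non-negative for any hashCode
            int partition = Math.floorMod(String.valueOf(i).hashCode(), numPartitions);
            counts.merge(partition, 1, Integer::sum);
        }
        return counts;
    }
}
```

This is why Edward's trick of searching for key values that hash onto each subtask index is needed: there is no guarantee that n distinct keys cover n partitions.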

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Suneel Marthi
Sorry, I didn't read the whole thread. We have a similar requirement wherein users would like to add/update/delete CEP patterns via a UI or REST API, and we started discussing building a REST API for that. Glad to see that this is a common ask, and if there's already a community effort around this -

Gelly - bipartite graph runs vertex-centric

2017-06-23 Thread Kaepke, Marc
Hi, does Gelly provide a vertex-centric iteration on a bipartite graph? A bipartite graph uses BipartiteEdges, and vertex-centric iteration supports regular edges only. Thanks! Best, Marc

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Stefan Richter
Hi, I suggest that you simply open an issue for this in our Jira, describing the improvement idea. That should be the fastest way to get this changed. Best, Stefan > On 23.06.2017 at 15:08, Adarsh Jain wrote: > > Hi Stefan, > > I think I found the problem, try it

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Kostas Kloudas
The rules, or patterns, supported by FlinkCEP are presented in the documentation link I posted earlier. Dynamically updating these patterns is not supported yet, but there are discussions about adding this feature soon. If the rules you need are supported by the current version of FlinkCEP, then

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Sridhar Chellappa
Folks, Plenty of very good points, but I see this discussion digressing from what I originally asked for. We need a dashboard to let the business analysts define rules and the CEP to run them. My original question was how to solve this with Flink CEP? From what I see, this is not a solved

Re: Strange behavior of DataStream.countWindow

2017-06-23 Thread Edward
Hi Fabian - I've tried this idea of creating a KeyedStream based on getRuntimeContext().getIndexOfThisSubtask(). However, not all target subtasks are receiving records. All operators have a parallelism of 12, so I have 12 source subtasks and 12 target subtasks. I've confirmed that the call to

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Kostas Kloudas
Hi all, Currently there is an ongoing effort to integrate FlinkCEP with Flink's SQL API. There is already an open FLIP for this: https://cwiki.apache.org/confluence/display/FLINK/FLIP-20%3A+Integration+of+SQL+and+CEP

Re: About nodes number on Flink

2017-06-23 Thread AndreaKinn
Hi Timo, thanks for your answer. I think my computations are not very heavy, so I imagine I would gain no advantage from "parallelizing" the streams. In my mind I have this pipeline:

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Adarsh Jain
Hi Stefan, I think I found the problem, try it with a file whose name starts with an underscore, like "_part-1-0.csv". While saving, Flink prepends a "_" to the file name; however, while reading at the folder level it does not pick up those files. Can you suggest a setting so that it does
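The behavior Adarsh describes matches the default accept rule of Flink's file enumeration, which treats names starting with "_" or "." as hidden or in-progress output and skips them. The rule itself amounts to the following (an illustrative pure-Java sketch; in Flink you would override the input format's file filtering to change it):

```java
// Mirrors the default accept rule used when Flink enumerates input
// files: names starting with '_' or '.' are treated as hidden or
// in-progress files and are skipped. "_part-1-0.csv" is therefore
// silently ignored when reading a whole directory.
class DefaultFileFilter {
    static boolean accept(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }
}
```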

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Suneel Marthi
FWIW, here's an old Cloudera blog about using Drools with Spark. https://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/ It should be possible to invoke Drools from Flink in a similar way (I have not tried it). It all depends on what the

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Ismaël Mejía
Hello, It is really interesting to see this discussion, because this was one of the questions at the presentation on CEP at Berlin Buzzwords, and this is one line of work that may eventually make sense to explore. Rule engines like Drools implement the Rete algorithm, which, if I understood

Re: About nodes number on Flink

2017-06-23 Thread Timo Walther
Hi Andrea, the number of nodes usually depends on the work that you do within your functions. E.g. if you have a computation-intensive machine learning library in a MapFunction that takes 10 seconds per element, it might make sense to parallelize this in order to increase your throughput. Or

Re: Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Kostas Kloudas
Hi Jorn and Sridhar, It would be worth describing a bit more what these tools are and what your needs are. In addition, to see what the CEP library already offers, here you can find the documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/libs/cep.html

Integrating Flink CEP with a Rules Engine

2017-06-23 Thread Sridhar Chellappa
Folks, I am new to Flink. One of the reasons why I am interested in Flink is its CEP library. Our CEP logic comprises a set of complex business rules which will have to be managed (create, update, delete) by a bunch of business analysts. Is there a way I can integrate other third

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Stefan Richter
No, that doesn’t make a difference and also works. > On 23.06.2017 at 11:40, Adarsh Jain wrote: > > I am using "val env = ExecutionEnvironment.getExecutionEnvironment", can this > be the problem? > > With "import org.apache.flink.api.scala.ExecutionEnvironment" > >

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Adarsh Jain
I am using "val env = ExecutionEnvironment.getExecutionEnvironment"; can this be the problem? With "import org.apache.flink.api.scala.ExecutionEnvironment". I am using Scala in my program. Regards, Adarsh On Fri, Jun 23, 2017 at 3:01 PM, Stefan Richter wrote: > I just

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Stefan Richter
I just copy-pasted your code, adding the missing "val env = LocalEnvironment.createLocalEnvironment()", and exchanged the string with a local directory for some test files that I created. No other changes. > On 23.06.2017 at 11:25, Adarsh Jain wrote: > > Hi Stefan, >

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Adarsh Jain
Hi Stefan, Thanks for your efforts in checking the same; it still doesn't work for me. Can you copy-paste the code you used? Maybe I am making some silly mistake and am not able to figure it out. Thanks again. Regards, Adarsh On Fri, Jun 23, 2017 at 2:32 PM, Stefan Richter

Re: Problem with ProcessFunction timeout feature

2017-06-23 Thread Álvaro Vilaplana García
Well, it sounds very reasonable to me! I will let you know how it goes. 2017-06-23 10:05 GMT+01:00 Stefan Richter : > Yes, exactly. The idea would be, that you operate in event time, but > combine it with processing time timers to trigger timeout detection. Could >

Re: Problem with ProcessFunction timeout feature

2017-06-23 Thread Stefan Richter
Yes, exactly. The idea would be that you operate in event time but combine it with processing-time timers to trigger timeout detection. Could that help in your case? > On 23.06.2017 at 10:55, Álvaro Vilaplana García wrote: > > Hi Stefan, > > You meant > >
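Stefan's suggestion, stripped to its core: on each event, remember a processing-time deadline; when the deadline fires, report a timeout only if no newer event arrived in the meantime. A pure-Java model of that check (a sketch only; in a real job this would live in a Flink ProcessFunction driven by processing-time timers, and the names here are illustrative):

```java
// Derives a processing-time deadline per event and reports a timeout
// only if the firing deadline still corresponds to the latest event,
// i.e. nothing newer arrived while the timer was pending.
class TimeoutDetector {
    private long lastEventTime = Long.MIN_VALUE;
    private final long timeoutMs;

    TimeoutDetector(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Record an event; returns the deadline to register for it.
    long onEvent(long processingTimeMs) {
        lastEventTime = processingTimeMs;
        return processingTimeMs + timeoutMs;
    }

    // Called when a previously registered deadline fires. Stale timers
    // (superseded by a newer event) are simply ignored.
    boolean onTimer(long deadlineMs) {
        return deadlineMs == lastEventTime + timeoutMs;
    }
}
```

This pattern lets a job whose business logic runs in event time still detect "the stream went quiet" using wall-clock time, which event-time watermarks alone cannot do when no events arrive.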

Re: Recursive Traversal of the Input Path Directory, Not working

2017-06-23 Thread Stefan Richter
Hi, I tried this out on the current master and the 1.3 release, and both work for me; everything works exactly as expected for file names, a directory, and even nested directories. Best, Stefan > On 22.06.2017 at 21:13, Adarsh Jain wrote: > > Hi Stefan, > > Yes your

Re: Error "key group must belong to the backend" on restore

2017-06-23 Thread Stefan Richter
Hi, I had a closer look at those exceptions now, and I would expect to see this in the case where there is suddenly a mismatch between the key-group range assigned to the keyed backend and the key groups covered by the state handle we try to restore. For example, when the wrong state

Re: Different Window Sizes in keyed stream

2017-06-23 Thread Ahmad Hassan
Thanks Fabian for the advice! Best Regards, Dr. Ahmad Hassan On 23 June 2017 at 09:05, Fabian Hueske wrote: > Hi Ahmad, > > that is not possible, at least not with Flink's built-in windows. > You can probably implement something like that on top of the DataStream > API but

Re: Problem with ProcessFunction timeout feature

2017-06-23 Thread Stefan Richter
Hi, yes, I think you understood the basic concept of watermarks. Events are basically driving "the event time clock", so it can only advance when you see events. I am not sure if I got the part about partitions correctly, but the watermark event time is a global thing. For example, if you have
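On the partition question: when an operator receives watermarks from several input channels, its own event-time clock advances to the minimum over the latest watermark seen on each channel, so the slowest channel holds everything back. A small pure-Java sketch of that combination rule (illustrative names, not Flink's internal API):

```java
import java.util.Arrays;

// An operator's current watermark is the minimum over the most recent
// watermark received on each input channel; it only advances once the
// slowest channel catches up.
class WatermarkTracker {
    private final long[] channelWatermarks;

    WatermarkTracker(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // Record a watermark from one channel (watermarks never regress)
    // and return the combined operator watermark.
    long onWatermark(int channel, long watermark) {
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        return Arrays.stream(channelWatermarks).min().getAsLong();
    }
}
```

This is why an idle partition that emits no events (and thus no watermarks) can stall event time for the whole downstream pipeline.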

Re: Different Window Sizes in keyed stream

2017-06-23 Thread Fabian Hueske
Hi Ahmad, that is not possible, at least not with Flink's built-in windows. You can probably implement something like that on top of the DataStream API, but I think it would be quite a bit of effort. IMO, the better approach would be to start a separate Flink job per tenant. This would also improve

Re: Problem with ProcessFunction timeout feature

2017-06-23 Thread Álvaro Vilaplana García
Hi Stefan, Thank you so much for your answer. Regarding the 'artificial events', our main problem is that we have no control at all over the devices. I have been reading more about event time and watermarks, and what I understood is that when we use event times (device times) Flink does not know