Join two Kafka topics

2017-05-02 Thread Tarek khal
I have two Kafka topics (tracking and rules) and I would like to join the "tracking" datastream with the "rules" datastream as the data arrives in the "tracking" datastream. Here is the result that I expect, but without restarting the job (here I had to restart the job to get this result):
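A common way to get this behaviour in Flink is to connect the two streams and cache the rules in state as they arrive, evaluating each tracking event against the cache, as a CoFlatMap over connected streams would. Below is a plain-Java sketch of just that caching logic (class and method names are hypothetical, not from the thread or Flink's API):

```java
import java.util.*;

// Plain-Java sketch of the connect/CoFlatMap idea: rules are cached
// as they arrive; each tracking event is matched against the cached
// rules, so no job restart is needed when a new rule shows up.
public class RulesJoin {
    private final Map<String, String> rulesByKey = new HashMap<>();

    // Called for every element of the "rules" stream.
    public void onRule(String key, String rule) {
        rulesByKey.put(key, rule);
    }

    // Called for every element of the "tracking" stream; returns the
    // joined result, or empty if no rule is known for this key yet.
    public Optional<String> onTracking(String key, String event) {
        String rule = rulesByKey.get(key);
        return rule == null ? Optional.empty()
                            : Optional.of(event + " matched " + rule);
    }
}
```

In Flink itself the same shape would live in a `CoFlatMapFunction` over `tracking.connect(rules).keyBy(...)`, with the rule cache kept in keyed state instead of a plain map.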

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Aljoscha Krettek
They can’t (with the current design of Flink) because each CEP pattern gets executed by a separate operator. We could think about multiplexing several patterns inside one operator. That’s what I hinted at earlier as a possible solution when I mentioned that you could implement your own
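The multiplexing idea hinted at above could, in essence, keep a registry of patterns inside a single operator so that one state backend instance serves all of them. A plain-Java sketch of that dispatch logic (purely illustrative; this is not Flink's CEP API):

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of multiplexing several patterns in one operator: a single
// map from pattern id to matcher, so one (RocksDB-backed) operator
// instance serves all patterns instead of one operator per pattern.
public class PatternMultiplexer {
    private final Map<String, Predicate<String>> patterns = new LinkedHashMap<>();

    public void register(String id, Predicate<String> matcher) {
        patterns.put(id, matcher);
    }

    // Evaluate one event against every registered pattern and
    // return the ids of the patterns that fired.
    public List<String> process(String event) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> e : patterns.entrySet()) {
            if (e.getValue().test(event)) fired.add(e.getKey());
        }
        return fired;
    }
}
```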

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Elias Levy
Any reason they can't share a single RocksDB state backend instance? On Fri, Apr 28, 2017 at 8:44 AM, Aljoscha Krettek wrote: > The problem here is that this will try to open 300 RocksDB instances on > each of the TMs (depending on how the parallelism is spread between the

Re: Queryable State

2017-05-02 Thread Chet Masterson
Can do. Any advice on where the trace prints should go in the task manager source code? BTW - How do I know I have a correctly configured cluster? Is there a set of messages in the job / task manager logs that indicate all required connectivity is present? I know I use the UI to make sure all the

Re: Collector.collect

2017-05-02 Thread Chesnay Schepler
In the Batch API only a single operator can be chained to another operator. So we're starting with this code: input = ... input.filter(conditionA).output(formatA) input.filter(conditionB).output(formatB) In the Batch API this would create a CHAIN(filterA -> formatA) and a

RE: Collector.collect

2017-05-02 Thread Newport, Billy
Why doesn’t this work with batch, though? We did input = ... input.filter(conditionA).output(formatA) input.filter(conditionB).output(formatB) and it was pretty slow compared with a custom output format with an integrated filter. From: Chesnay Schepler [mailto:ches...@apache.org] Sent: Monday,
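The "output format with an integrated filter" mentioned above essentially folds the predicate into the write path, so no separate filter operator is needed per output. A minimal plain-Java sketch (names are illustrative, not Flink's actual OutputFormat interface):

```java
import java.io.*;
import java.util.function.Predicate;

// Sketch of a sink with an integrated filter: records are tested
// right before writing instead of in a separate chained operator.
public class FilteringWriter<T> {
    private final Predicate<T> condition;
    private final Writer out;

    public FilteringWriter(Predicate<T> condition, Writer out) {
        this.condition = condition;
        this.out = out;
    }

    // Write the record only if it passes the filter condition.
    public void write(T record) throws IOException {
        if (condition.test(record)) {
            out.write(record.toString());
            out.write('\n');
        }
    }
}
```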

Re: Batch source improvement

2017-05-02 Thread Fabian Hueske
Hi Flavio, actually, Flink has always assigned input splits lazily. The JM gets the list of input splits from the InputFormat. Parallel source instances (with an input format) request an input split from the JM whenever they have nothing to do. This should actually balance some of the data skew in
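The pull-based assignment described here can be sketched in a few lines of plain Java: the JM holds the splits, and each idle source instance pulls the next one, which is what naturally balances skew (illustrative only, not Flink's actual InputSplitAssigner):

```java
import java.util.*;

// Sketch of lazy input split assignment: the job manager keeps a
// queue of splits, and source instances pull a split whenever they
// are idle, so fast workers simply end up processing more splits.
public class SplitAssigner {
    private final Deque<String> splits;

    public SplitAssigner(List<String> allSplits) {
        this.splits = new ArrayDeque<>(allSplits);
    }

    // Called by a source instance whenever it has nothing to do;
    // empty means all splits have been handed out.
    public synchronized Optional<String> nextSplit() {
        return Optional.ofNullable(splits.poll());
    }
}
```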

Re: CEP timeout occurs even for a successful match when using followedBy

2017-05-02 Thread Kostas Kloudas
Glad to hear that Moiz! And thanks for helping us test out the library. Kostas > On May 2, 2017, at 12:34 PM, Moiz S Jinia wrote: > > Thanks! I downloaded and built 1.3-SNAPSHOT locally and was able to verify > that followedBy now works as I want. > > Moiz > > On Sat,

Re: CEP timeout occurs even for a successful match when using followedBy

2017-05-02 Thread Moiz S Jinia
Thanks! I downloaded and built 1.3-SNAPSHOT locally and was able to verify that followedBy now works as I want. Moiz On Sat, Apr 29, 2017 at 11:08 PM, Kostas Kloudas < k.klou...@data-artisans.com> wrote: > Hi Moiz, > > Here are the instructions on how to build Flink from source: > >

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Aljoscha Krettek
Hi, I think the bottleneck there might be HDFS. With 300 operators with parallelism 6 you will have 1800 concurrent writes (i.e. connections) to HDFS, which might be too much for the master node and the worker nodes. This is the same problem that you had on the local filesystem but now in the

Window Function on AllWindowed Stream - Combining Kafka Topics

2017-05-02 Thread G.S.Vijay Raajaa
Hi, I am trying to combine two Kafka topics using a single Kafka consumer on a list of topics, and further convert the json string in the stream to a POJO. Then, join them via keyBy (on an event time field) and, to merge them as a single fat json, I was planning to use a window stream and apply a
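The final "merge into a single fat json" step boils down to combining the two events' fields once they have been joined on the key. A plain-Java sketch of that merge, as a stand-in for the window apply function (the clash-resolution rule is an assumption, not from the thread):

```java
import java.util.*;

// Sketch of merging two joined events into one "fat" json-like map.
public class FatJsonMerge {
    public static Map<String, Object> merge(Map<String, Object> a,
                                            Map<String, Object> b) {
        Map<String, Object> fat = new LinkedHashMap<>(a);
        fat.putAll(b); // fields from b win on key clashes (assumed policy)
        return fat;
    }
}
```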

Re: Problems reading Parquet input from HDFS

2017-05-02 Thread Lukas Kircher
Hi Flavio, thanks for your help. With Flink 1.2.0 and avro 1.8.1 it works fine for me too as long as I run it from the IDE. As soon as I submit it as a job to the cluster I get the described dependency issues. * If I use the Flink 1.2.0 binary and just add Flink as a Maven dependency to my
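One common workaround for this kind of dependency clash is to shade Avro into the user jar so the job's version cannot collide with the version on the cluster's classpath, e.g. with the maven-shade-plugin. A sketch of the relevant pom fragment (assuming a Maven build; the shaded package name is arbitrary):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Relocate Avro so the version bundled in the job jar cannot
           clash with the version already on the cluster classpath. -->
      <relocation>
        <pattern>org.apache.avro</pattern>
        <shadedPattern>shaded.org.apache.avro</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```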