Re: Why TimerService interface in ProcessFunction doesn't have deleteEventTimeTimer

2017-05-04 Thread Jagadish Bihani
Hi Thanks for the multiple responses on this question. Please correct me if I am wrong about the 3 possible ways of it: 1. As per FLINK-3089, RocksDB based timer implementation is efficient. But it is not merged yet. Which release this will be part of? 2. FLINK-6359 suggests alternate approach

Re: CEP memory requirements

2017-05-04 Thread Dawid Wysakowicz
Yes you are right, prior to 1.3.0 the state per key was never cleared. Right now due to FLINK-5174 , in master branch, it is stored only if necessary. Z pozdrowieniami! / Cheers! Dawid Wysakowicz *Data/Software Engineer* Skype: dawid_wys |

Re: CEP memory requirements

2017-05-04 Thread Elias Levy
Looking at the code I gather that 1.2 does not clear the per key NFA state even if there is no state to keep, whereas this appears fixed in the master branch. Yes? On Thu, May 4, 2017 at 11:25 AM, Elias Levy wrote: > I am observing odd memory behavior with the CEP

Re: Queryable State

2017-05-04 Thread Chet Masterson
I found the issue. When parallelism = 3, my test data set was skewed such that data was only going to two of the three task managers (kafka partition = 3, number of flink nodes = 3, parallelism = 3). As soon as I created a test data set with enough keys that spread across all three task managers,

Re: Window Function on AllWindowed Stream - Combining Kafka Topics

2017-05-04 Thread G.S.Vijay Raajaa
I tried to reorder and the window function works fine. but then after processing few stream of data from Topic A and Topic B, the window function seem to throw the below error. The keyby is on eventTime field. java.lang.RuntimeException: Unexpected key group index. This indicates a bug. at

CEP memory requirements

2017-05-04 Thread Elias Levy
I am observing odd memory behavior with the CEP library and I am wondering if it is expected. If I write a simple local streaming Flink job that reads from a 65MB compressed file of JSON objects, one per line, parses the JSON, performs a filter operation, and then a keyBy, heap usage is stable,

Re: assignTimestampsAndWatermarks not working as expected

2017-05-04 Thread Kostas Kloudas
Hi Jayesh, Glad that it finally worked! From a first look, I cannot spot anything wrong with the code itself. The only thing I have to note is that the locations of the logs and the printouts you put in your code differ and normally they are not printed in the console. Thanks, Kostas > On

RE: assignTimestampsAndWatermarks not working as expected

2017-05-04 Thread Jayesh Patel
I figured out what's wrong - there was a silly mistake on my side. There is nothing wrong with the code here, but please do let me know if you see anything wrong with my approach. Thank you. From: Jayesh Patel Sent: Thursday, May 04, 2017 10:00 AM To: 'user@flink.apache.org'

Re: Long running time based Patterns

2017-05-04 Thread Kostas Kloudas
Hi Moiz, Then it should work. And the previous issue is already fixed on the master. Kostas > On May 4, 2017, at 6:02 PM, Moiz Jinia wrote: > > It'll definitely have a where clause. Just forgot to include it in the > example. Just meant to focus on the within clause. >

Re: Long running time based Patterns

2017-05-04 Thread Moiz Jinia
It'll definitely have a where clause. Just forgot to include it in the example. Just meant to focus on the within clause. Am on 1.3 - expect it'll be fixed by the time stable is out? Thanks! Moiz — sent from phone On 04-May-2017, at 8:12 PM, Kostas Kloudas

Re: OperatorState partioning when recovering from failure

2017-05-04 Thread Kostas Kloudas
Hi Seth, Upon restoring, splits will be re-shuffled among the new tasks, and I believe that state is repartitioned in a round robin way (although I am not 100% sure so I am also including Stefan and Aljoscha in this). The priority queues will be reconstructed based on the restored elements. So

Re: Queryable State

2017-05-04 Thread Ufuk Celebi
Could you try KvStateRegistry#registerKvState please? In the JM logs you should see something about the number of connected task managers and in the task manager logs that each one connects to a JM. – Ufuk On Tue, May 2, 2017 at 2:53 PM, Chet Masterson wrote: > Can

Re: Join two kafka topics

2017-05-04 Thread Kostas Kloudas
Perfect! Thanks a lot for the clarification! Kostas > On May 4, 2017, at 4:37 PM, Tarek khal wrote: > > Hi Kostas, > > Yes, now is solved by the help of Jason. > > Best, > > > > -- > View this message in context: >

Re: Join two kafka topics

2017-05-04 Thread Tarek khal
Hi Kostas, Yes, now is solved by the help of Jason. Best, -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Join-two-kafka-topics-tp12954p13006.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Join two kafka topics

2017-05-04 Thread Kostas Kloudas
Hi Tarek, This question seems to be a duplicate with your other question “ConnectedStream keyBy issues”, right? I am just asking for clarification. Thanks, Kostas > On May 4, 2017, at 1:41 PM, Tarek khal wrote: > > Hi Aljoscha, > > I tested ConnectedStream and

Re: Long running time based Patterns

2017-05-04 Thread Kostas Kloudas
Hi Moiz, You are on Flink 1.2 or 1.3? In Flink 1.2 (latest stable) there are no known issues, so this will work correctly. Keep in mind that without any conditions (where-clauses), you will only get all possible 2-tuples of incoming elements, which could also be done with a simple process

Re: ConnectedStream keyby issues

2017-05-04 Thread Tarek khal
Hi Jason, Thank you very much for your help, it solves my problem. Best regards, -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ConnectedStream-keyby-issues-tp12999p13003.html Sent from the Apache Flink User Mailing List archive. mailing

assignTimestampsAndWatermarks not working as expected

2017-05-04 Thread Jayesh Patel
Can anybody see what's wrong with the following code? I am using Flink 1.2 and have tried running it in Eclipse (local mode) as well as on a 3 node cluster and it's not behaving as expected. The idea is to have a custom source collect messages from a JMS topic (I have a fake source for now

Re: ConnectedStream keyby issues

2017-05-04 Thread Jason Brelloch
I think the issue is that t2 is not registered to keyed state, so it is being shared across all of the keys on that taskmanager. Take a look at this article: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/state.html#using-managed-keyed-state Basically you need to change

OperatorState partioning when recovering from failure

2017-05-04 Thread Seth Wiesman
I am curious about how operator state is repartitioned to subtasks when a job is resumed from a checkpoint or savepoint. The reason is that I am having issues with the ContinuousFileReaderOperator when recovering from a failure. I consume most of my data from files off S3. I have a custom file

ConnectedStream keyby issues

2017-05-04 Thread Tarek khal
Hi , I have two kafka topics (tracking and rules) and I would like to join "tracking" datastream with "rules" datastream as the data arrives in the "tracking" datastream. The problem with a join is that the rules only “survive” for the length of the window while I suspect that i want them to

Re: Join two kafka topics

2017-05-04 Thread Tarek khal
Hi Aljoscha, I tested ConnectedStream and CoFlatMapFunction as you told me but the result is not as I wait. *For the execution:* 1) I added 3 rules on "rules" topic (imei: "01","02,"03") 2) Perform 15 events with different imei but i guess i have problem with "keyby" *Result : *

Long running time based Patterns

2017-05-04 Thread Moiz S Jinia
Does Flink (with a persistent State backend such as RocksDB) work well with long running Patterns of this type? (running into days) Pattern.begin("start").followedBy("end").within(Time.days(3)) Is there some gotchas here or things to watch out for? Thanks, Moiz

Re: High Availability on Yarn

2017-05-04 Thread Aljoscha Krettek
Hi, Yes, for YARN there is only one running JobManager. As far as I Know, In this case ZooKeeper is only used to keep track of checkpoint metadata and the execution graph of the running job. Such that a restoring JobManager can pick up the data again. I’m not 100 % sure on this, though, so

Re: Fault tolerance & idempotency on window functions

2017-05-04 Thread Aljoscha Krettek
Hi, When keying, keep in mind that Kafka and Flink might use a different scheme for hashing. For example, Flink also applies a murmur hash on the hash code retrieved from the key and then has some internal logic for assigning that hash to a key group (the internal unit of key partitioning). I