Excessive stdout is causing java heap out of mem

2017-05-19 Thread Fritz Budiyanto
Hi, I noticed that when I enabled DataStreamSink's print() for debugging (rather excessive printing), it caused a Java heap out-of-memory error. Possibly the TaskManager is buffering all stdout for the web interface? I haven't spent time debugging it, but I wonder if this is expected where massive

Re: Best practices to maintain reference data for Flink Jobs

2017-05-19 Thread Sand Stone
Also, I took a quick read on side input. It's unclear to me how side input could solve this issue better. At a high level, this is what I have in mind: flatmap(byte[] value, Collector<> output) { var iter = someStoreStateObject.seek(akeyprefix); //or
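
A minimal sketch of the "seek to a key prefix, then iterate" idea from the message above, using a plain `java.util.TreeMap` as a stand-in for the store. The class name `PrefixScan`, the method `scan`, and the sample keys are hypothetical; this is not Flink's state API, only an illustration of a sorted-key prefix scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixScan {
    // Hypothetical helper: returns the values of all keys that start with `prefix`.
    // `store` stands in for the "someStoreStateObject" of the message (an assumption).
    public static List<String> scan(NavigableMap<String, String> store, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailMap(prefix, true) starts iteration at the first key >= prefix (the "seek").
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the first non-matching key ends the range
            }
            matches.add(e.getValue());
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("10.0.1", "office");
        store.put("10.0.2", "lab");
        store.put("10.1.0", "datacenter");
        System.out.println(scan(store, "10.0."));
    }
}
```

Because the keys are sorted, the scan touches only the matching entries plus one; a hash-keyed lookup cannot do this, which is presumably why range or prefix access matters for this kind of reference data.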

Re: Best practices to maintain reference data for Flink Jobs

2017-05-19 Thread Sand Stone
Thanks Gordon and Fabian. The enriching data is really reference data, e.g. the reverseIP database. It's hard to be keyed as the main data stream as the "ip address" in the event is not a primary key in the main data stream. QueryableState is close, but it does not support range scan as far as I
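
To illustrate why a reverse-IP lookup needs a range scan rather than an exact-key lookup, here is a rough sketch that holds the reference data in a sorted map keyed by range start. The class `ReverseIpLookup`, its methods, and the sample ranges are all hypothetical, and the sketch assumes the ranges do not overlap; it says nothing about how Flink state would store such data.

```java
import java.util.Map;
import java.util.TreeMap;

public class ReverseIpLookup {
    // Hypothetical reference-data row: inclusive end of the range and its label.
    private static final class Range {
        final long end;
        final String label;

        Range(long end, String label) {
            this.end = end;
            this.label = label;
        }
    }

    // Ranges keyed by their inclusive start address (assumed non-overlapping).
    private final TreeMap<Long, Range> byStart = new TreeMap<>();

    public void add(long start, long end, String label) {
        byStart.put(start, new Range(end, label));
    }

    // Finds the range with the greatest start <= ip, then checks the upper bound.
    public String lookup(long ip) {
        Map.Entry<Long, Range> e = byStart.floorEntry(ip);
        if (e == null || ip > e.getValue().end) {
            return null; // ip falls before the first range or in a gap between ranges
        }
        return e.getValue().label;
    }

    // Converts a dotted IPv4 string to its numeric value.
    public static long toLong(String dotted) {
        long v = 0;
        for (String part : dotted.split("\\.")) {
            v = (v << 8) | Integer.parseInt(part);
        }
        return v;
    }

    public static void main(String[] args) {
        ReverseIpLookup db = new ReverseIpLookup();
        db.add(toLong("10.0.0.0"), toLong("10.0.0.255"), "office");
        System.out.println(db.lookup(toLong("10.0.0.42")));
    }
}
```

The event's IP address is not itself a key in the reference data, so the lookup depends on ordered access (`floorEntry`), which matches the point that the enriching data cannot simply be keyed like the main stream.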

Re: ConnectedStream keyby issues

2017-05-19 Thread Renjie Liu
Even if you increase the operator parallelism, you can still use the state operation. On Fri, May 19, 2017 at 7:47 PM Tarek khal wrote: > If I increase the parallelism operator, I risk losing shared state solution or it has nothing to do. And if it's going to

Re: ConnectedStream keyby issues

2017-05-19 Thread Tarek khal
If I increase the operator parallelism, do I risk losing the shared-state solution, or are the two unrelated? And if it is an advantage, what are its limits? I am new to this framework and find some of its notions difficult. Best Regards,

Re: ConnectedStream keyby issues

2017-05-19 Thread Renjie Liu
Jason's solution is right, I'm just clarifying the mistake in the explanation. Tarek khal wrote on Fri, May 19, 2017 at 7:11 PM: > Hello Renjie, Yes, the parallelism is 1. what should i do pls ? Regards,

Re: ConnectedStream keyby issues

2017-05-19 Thread Tarek khal
Hello Renjie, Yes, the parallelism is 1. What should I do, please? Regards, -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ConnectedStream-keyby-issues-tp12999p13226.html

Re: ConnectedStream keyby issues

2017-05-19 Thread Renjie Liu
@Jason I think there's a mistake in your explanation since each task in the task manager has its own copy of an operator instance, so the tuple may not be shared. State is a great solution but I think that's not the root cause. @Tarek What's the parallelism of your data stream? I think the reason

Re: Flink metrics related problems/questions

2017-05-19 Thread Chesnay Schepler
2. isn't quite accurate actually; metrics on the TaskManager are not persisted across restarts. On 19.05.2017 11:21, Chesnay Schepler wrote: 1. This shouldn't happen. Do you access the counter from different threads? 2. Metrics in general are not persisted across restarts, and there is no

Re: FlinkCEP latency/throughput

2017-05-19 Thread Dawid Wysakowicz
Hello Alfred, Just some considerations from my side regarding latency. I think the first step should be defining what "latency" really means for a CEP library. The first thing that comes to my mind is the time period between the arrival of an event that should trigger a match (ending

Re: Flink metrics related problems/questions

2017-05-19 Thread Chesnay Schepler
1. This shouldn't happen. Do you access the counter from different threads? 2. Metrics in general are not persisted across restarts, and there is no way to configure Flink to do so at the moment. 3. Counters are sent as gauges since as far as I know StatsD counters are not allowed to be

Re: Best practices to maintain reference data for Flink Jobs

2017-05-19 Thread Fabian Hueske
+1 to what Gordon said. Queryable state is rather meant as an external interface to streaming jobs than for lookups within jobs. Accessing co-located state should give you better performance and is probably easier to implement and maintain. Cheers, Fabian 2017-05-19 7:43 GMT+02:00 Tzu-Li

Re: ConnectedStream keyby issues

2017-05-19 Thread gaurav
Hello, I am a little confused about when the state will be garbage-collected. For example, Example 1: Class abc extends RichProcessFunction,Tuple<>> { public void processElement(..) { if(timer never set) {

Flink metrics related problems/questions

2017-05-19 Thread jaxbihani
Background: We run a job using ProcessFunction that reads data from Kafka, fires ~5-10K timers per second, and sends matched events to KafkaSink. We collect metrics such as the number of active timers, the number of timers scheduled, etc. We use the StatsD reporter and monitor using Grafana