Re: Capturing statistics for more than 5 minutes

2017-11-14 Thread Chesnay Schepler
Hello, the Metric System should be exactly what you're looking for. On 15.11.2017 03:55, Nomchin Banga wrote: Hi We are a group of graduate students from Purdue University who are doing an experimental

Streaming User-defined Aggregate Function gives exception when using Table API jars

2017-11-14 Thread Colin Williams
From the documentation there is a note which instructs not to include the flink-table dependency in the project. However, when I put the flink-table dependency on the cluster, the User-defined Aggregate Function gives an exception. When I do include flink-table in the dependencies, the

Capturing statistics for more than 5 minutes

2017-11-14 Thread Nomchin Banga
Hi We are a group of graduate students from Purdue University who are doing an experimental study to compare different data ingestion engines. For this purpose, we are trying to collect statistics of running jobs over a few days. However, Flink’s UI captures the statistics for the last 5

[Flink] merge-sort for a DataStream

2017-11-14 Thread Jiewen Shao
In Flink, I have a DataStream of lists, each list individually pre-sorted. What I need to do is persist everything in one shot in global sort order. Any ideas on the best way to do this? Hope it makes sense. Thanks in advance!
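The global-ordering problem described above is, at its core, a k-way merge of individually sorted sequences. Below is a minimal plain-Java sketch of that merge using a min-heap; it is not Flink code (in a Flink job the lists would arrive as stream elements, and a truly streaming global order would need bounded input or watermark-based buffering), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: merge several individually pre-sorted lists into one globally
// sorted list using a min-heap (classic k-way merge, O(n log k)).
public class KWayMerge {
    public static List<Integer> merge(List<List<Integer>> sortedLists) {
        // Heap entries: {value, listIndex, positionInList}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < sortedLists.size(); i++) {
            if (!sortedLists.get(i).isEmpty()) {
                heap.add(new int[] {sortedLists.get(i).get(0), i, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();          // smallest head among all lists
            out.add(top[0]);
            int next = top[2] + 1;
            List<Integer> src = sortedLists.get(top[1]);
            if (next < src.size()) {          // advance within the source list
                heap.add(new int[] {src.get(next), top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
            List.of(1, 4, 7), List.of(2, 5), List.of(3, 6))));
        // prints [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Because each input list is already sorted, only the current head of each list needs to be in the heap at any time, which keeps memory usage proportional to the number of lists rather than the total data size.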

Model serving in Flink DataStream

2017-11-14 Thread Adarsh Jain
Hi, I have a Flink Streaming system and I need to use a couple of models trained on other systems with Flink Streaming, i.e. I need to do model serving. Is PMML the best way to do it? Any inputs on Flink-JPMML performance? Any other suggested alternatives? Regards, Adarsh

Re: Integration testing my Flink job

2017-11-14 Thread Ron Crocker
GRR. Hit send too soon. The thoughts we have right now are docker-compose based: we run kafka and flink as docker containers, inject events, and watch the output. Ron — Ron Crocker Principal Engineer & Architect ( ( •)) New Relic rcroc...@newrelic.com M: +1 630 363 8835 > On Nov 14, 2017,

Integration testing my Flink job

2017-11-14 Thread Ron Crocker
Is there a good way to do integration testing for a Flink job - that is, I want to inject a set of events and see the proper behavior? Ron

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Shailesh Jain
1. Single data source, because I have one kafka topic where all events get published. But I am creating multiple data streams by applying a series of filter operations on the single input stream to generate device-specific data streams, and then assigning the watermarks on each stream. Will this
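The splitting pattern described above can be illustrated in plain Java (this is deliberately not the Flink API; in Flink each `filter(...)` would produce a new `DataStream` to which watermarks are then assigned). The `Event` type and device ids here are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: derive one per-device sub-sequence from a single input sequence
// by applying one filter per device, mirroring the pattern in the mail.
public class FilterSplit {
    public record Event(String deviceId, long timestamp) {}

    public static List<Event> filterFor(List<Event> input, String deviceId) {
        // In Flink this predicate would be the body of a FilterFunction.
        Predicate<Event> belongsToDevice = e -> e.deviceId().equals(deviceId);
        return input.stream().filter(belongsToDevice).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> input = List.of(
            new Event("dev-1", 10), new Event("dev-2", 11), new Event("dev-1", 12));
        // One filtered "stream" per device; in Flink, each would then get its
        // own watermark assignment downstream of the filter.
        System.out.println(filterFor(input, "dev-1").size()); // prints 2
    }
}
```

Note that every filter still receives every input element, so with many devices this pattern reads the source stream once per device-specific output.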

Apache Flink - Question about Global Windows

2017-11-14 Thread M Singh
Hi: I am reading about global windows and the documentation indicates: 'A global windows assigner assigns all elements with the same key to the same single global window.' From my understanding, if we have a keyed stream - then all elements with the same key are also assigned to a single

Re: Flink memory leak

2017-11-14 Thread Piotr Nowojski
Best would be to analyse memory usage via some profiler. What I have done was: 1. Run your scenario on the test cluster until memory consumption goes up 2. Stop submitting new jobs, cancel any running jobs 3. Manually trigger GC a couple of times via jconsole (other tools can do that as well) 4.
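Step 3 of the procedure above (trigger a GC, then see whether memory stays elevated) can be approximated programmatically; a rough sketch, assuming nothing beyond the JDK. `System.gc()` is only a hint to the JVM, so the numbers are indicative — jconsole or `jmap -histo`, as suggested in the mail, give a more reliable per-class view.

```java
// Sketch: compare used heap before and after a manual GC hint. If usage
// stays high after repeated GCs with no jobs running, a leak is likely.
public class GcCheck {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        byte[] garbage = new byte[16 * 1024 * 1024]; // allocate something collectable
        garbage = null;                               // make it unreachable
        System.gc();                                  // manual GC trigger (a hint only)
        long after = usedHeapBytes();
        System.out.println("used before: " + before + " bytes, after GC: " + after + " bytes");
    }
}
```

For the actual leak hunt, a heap dump (`jmap -dump`) inspected in a profiler is what reveals which objects — e.g. the `char[]` instances mentioned later in this thread — are accumulating.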

Re: Flink memory leak

2017-11-14 Thread Flavio Pompermaier
What should we do to confirm it? Do you have any github repo to start from? On Tue, Nov 14, 2017 at 4:02 PM, Piotr Nowojski wrote: > Ebru, Javier, Flavio: > > I tried to reproduce the memory leak by submitting a job that was generating > classes with random names. And indeed

Re: Flink memory leak

2017-11-14 Thread Piotr Nowojski
Ebru, Javier, Flavio: I tried to reproduce the memory leak by submitting a job that was generating classes with random names. And indeed I have found one. Memory was accumulating in `char[]` instances that belonged to `java.lang.ClassLoader#parallelLockMap`. The OldGen memory pool was growing in size

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-14 Thread Shankara
Hi Nico, - how do you run the job? >> If we run the same program in flink local then it works fine. For flink local we used the command line: mvn package exec:java -Dexec.mainClass=com.huawei.ccn.intelliom.ims.tmon.TMon -Dexec.args="--threshold=Measurment:0:4001:1:90:85:CPU

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
1. It seems like you have one single data source, not one per device. That might make a difference. A single data source followed by a CoMap might create one single operator chain. If you want to go this way, please use my suggested solution c), since you will have trouble handling watermarks

Re: Flink takes too much memory in record serializer.

2017-11-14 Thread Nico Kruber
We're actually also trying to make the serializer stateless in the future and may be able to remove the intermediate serialization buffer which currently grows on the heap before we copy the data into the actual target buffer. This intermediate buffer grows and is pruned after serialization if it

Re: readFile, DataStream

2017-11-14 Thread Juan Miguel Cejuela
Hi Kostas, thank you very much for your answer. Yes, I proposed the change in https://github.com/apache/flink/pull/4997 to compare as modificationTime < globalModificationTime (without accepting equals). Later, however, I realized, as you correctly point out, that this creates duplicates. The

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Shailesh Jain
1. Okay, I understand. My code is similar to what you demonstrated. I have attached a snapshot of my job plan visualization. 3. I have attached the logs and the exception raised (15min - the configured akka timeout) after submitting the job. Thanks, Shailesh On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski

Re: Flink takes too much memory in record serializer.

2017-11-14 Thread Chesnay Schepler
I don't think there's anything you can do except reducing the parallelism or the size of your messages. A separate serializer is used for each channel, as the serializers are stateful; they are capable of writing records partially to a given MemorySegment to better utilize the allocated memory. How

Flink takes too much memory in record serializer.

2017-11-14 Thread yunfan123
In the class org.apache.flink.runtime.io.network.api.writer.RecordWriter, it has the same number of serializers as numChannels. If my first operator has a parallelism of 500 and the next operator has a parallelism of 1000, and every message in flink is 2MB, the job takes 500 * 1000 * 2MB = 1TB of memory in
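The arithmetic in the mail can be checked quickly: with one serializer per output channel, 500 producer subtasks times 1000 consumer channels, each potentially buffering a ~2 MB record, gives roughly a terabyte in the worst case. A small sketch of that back-of-the-envelope calculation (the class and method names are invented):

```java
// Sketch: worst-case serializer buffer memory for a shuffle between two
// operators, assuming every producer holds one in-flight record per channel.
public class SerializerMemory {
    public static long worstCaseBytes(int producers, int consumers, long recordBytes) {
        return (long) producers * consumers * recordBytes;
    }

    public static void main(String[] args) {
        long bytes = worstCaseBytes(500, 1000, 2L * 1024 * 1024);
        // 500 * 1000 * 2 MiB = 1,000,000 MiB, i.e. about 0.95 TiB (~1 TB)
        System.out.println(bytes + " bytes = "
            + bytes / (1024.0 * 1024 * 1024 * 1024) + " TiB");
    }
}
```

This is a worst-case bound: buffers are pruned after serialization, so steady-state usage depends on how many channels are serializing large records at the same moment.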

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
Hi, 1. I’m not sure what your code looks like. However I have tested it, and here is an example with multiple streams in one job: https://gist.github.com/pnowojski/63fb1c56f2938091769d8de6f513567f As expected it created 5 source