Re: Improving Flink Performance

2017-01-25 Thread Jonas
Images: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png and http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png -- View this message in context:

Re: Custom Partitioning and windowing questions/concerns

2017-01-25 Thread Fabian Hueske
Hi Nikos, yes, the hash function is not only used for partitioning but also to organize the key-partitioned state. My intuition is that the AbstractStreamOperator approach would be easier to realize, because you don't need to worry about side effects of changing Flink internals. Best, Fabian

Re: REST api: how to upload jar?

2017-01-25 Thread Cliff Resnick
Thanks for that, issue here https://issues.apache.org/jira/browse/FLINK-5646 On Tue, Jan 24, 2017 at 11:30 PM, Sachin Goel wrote: > Hey Cliff > You can upload a jar file using http post with the file data sent under a > form field 'jarfile'. > > Can you also please

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I ran a profiler on my Job and it seems that most of the time, its waiting :O See here: Also, the following code snippet executes unexpectedly slow: as you can see in this call graph:

Flink dependencies shading

2017-01-25 Thread Dmitry Golubets
I've build latest Flink from sources and it seems that httpclient dependency from flink-mesos is not shaded. It causes troubles with latest AWS SDK. Do I build it wrong or is it a known problem? Best regards, Dmitry

Re: benchmarking flink streaming

2017-01-25 Thread Stephan Ewen
The latency markers "pass through windows" so they do not take the latency of windows into account. They represent only the latency of the actual streams and their backpressure. On Wed, Jan 25, 2017 at 6:08 PM, Dominik Safaric wrote: > Hi Stephan, > > As I’m already

Debugging, logging and measuring operator subtask performance

2017-01-25 Thread Dominik Safaric
Hi, As I am experiencing certain performance degradations in a streaming job, I want to determine the root cause of it by measuring subtask performance in terms of resource utilisation - e.g. CPU utilisation of the thread. Is this somehow possible? Does Flink log scheduled and executed

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I tried and it added a little performance (~10%) but nothing outstanding. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11301.html Sent from the Apache Flink User Mailing List archive. mailing list

Re: benchmarking flink streaming

2017-01-25 Thread Meghashyam Sandeep V
Hi Stephan, Thats great to hear. We are using EMR which is still on Flink 1.1.3. I'll use the latency markers when Flink on EMR is upgraded. Thanks, Sandeep On Wed, Jan 25, 2017 at 9:08 AM, Dominik Safaric wrote: > Hi Stephan, > > As I’m already familiar with the

RE: Custom Partitioning and windowing questions/concerns

2017-01-25 Thread Katsipoulakis, Nikolaos Romanos
Hello Fabian, Thank you for your response and there is no need for apologies ☺ . As I mentioned in my previous email, my wording seemed confusing and it was only expected that you had an incomplete picture of my goal. Again, thank you for your help and your time. Moving on to my plan from

Re: Custom Partitioning and windowing questions/concerns

2017-01-25 Thread Fabian Hueske
Hi Nikos, you are of course right. I forgot that ProcessFunction requires a KeyedStream. Sorry for this advice. The problem is that you need need to implement some kind of time-based function that emits partial counts every 10 seconds. AFAIK, the DataStream API does not offers built-in operator

Re: benchmarking flink streaming

2017-01-25 Thread Dominik Safaric
Hi Stephan, As I’m already familiar with the latency markers of Flink 1.2, there is one question that bothers me in regard to them - how does Flink measure end-to-end latency when dealing with e.g. aggregations? Suppose you have a topology ingesting data from Kafka, and you want to output

RE: When is the StreamPartitioner selectChannel() method called

2017-01-25 Thread Katsipoulakis, Nikolaos Romanos
Hello again, As a follow-up email, on the same topology, I set break-points inside the selectChannel() methods for RebalancePartitioner and ForwardPartitioner and they are reached. Unfortunately, the same break-point set on the HashPartitioner is not reached. To make sure that an instance of

Re: How to get help on ClassCastException when re-submitting a job

2017-01-25 Thread Fabian Hueske
Thank you Giuliano! 2017-01-25 6:54 GMT+01:00 Giuliano Caliari : > Issue reported: > > https://issues.apache.org/jira/browse/FLINK-5633 > > Sorry for taking so long > > > > -- > View this message in context: http://apache-flink-user- >

Issues while restarting a job on HA cluster

2017-01-25 Thread ani.desh1512
1. We have a HA cluster of 2 masters and 3 slaves. We run a jar through flink cli. Then we cancel that running job. Then we do some changes in the source code of jar, repackage it and deploy it again and run it again through cli. The following error occurs: /

RE: When is the StreamPartitioner selectChannel() method called

2017-01-25 Thread Katsipoulakis, Nikolaos Romanos
Hello Ufuk, First, thank you very much for your quick reply and for clarifying the difference between channels and slots. Turning to debugging and visiting the breakpoint inside the HashPartitioner, I need to inform you that I am using IntelliJ IDE and I have set up the environment as a maven

Re: benchmarking flink streaming

2017-01-25 Thread Stephan Ewen
Hi! There are new latency metrics in Flink 1.2 that you can use. They are sampled, so not on every record. You can always attach your own timestamps, in order to measure the latency of specific records. Stephan On Fri, Dec 16, 2016 at 5:02 PM, Meghashyam Sandeep V < vr1meghash...@gmail.com>

Re: Rapidly failing job eventually causes "Not enough free slots"

2017-01-25 Thread Stephan Ewen
Hi! Adding Ufuk and Till to this... You are right, these issues should not compromise HA. Is it possible that you share the logs to diagnose what the issue was? @Till, Ufuk: Can it be that the ZooKeeper client gave up for good trying to connect to ZooKeeper after a certain time? Stephan On

Re: When is the StreamPartitioner selectChannel() method called

2017-01-25 Thread Ufuk Celebi
Hey Nikos, slots are only relevant for scheduling tasks. The number of outgoing channels depends on the number of parallel subtasks that consume a produced intermediate result stream, say the result of a source operator. If you have a job with a simple source->keyBy->map flow with parallelism X

Re: Queryable State

2017-01-25 Thread Nico Kruber
Hi Dawid, sorry for the late reply, I was fixing some issues for queryable state and may now have gotten to the point of your error: you may be seeing a race condition with the MemoryStateBackend state backend (the default) as described here: https://issues.apache.org/jira/browse/FLINK-5642 I'm

When is the StreamPartitioner selectChannel() method called

2017-01-25 Thread Katsipoulakis, Nikolaos Romanos
Hello all, I have been looking into different StreamPartitioner implementations of Flink, and I noticed they come with an implementation of selectChannel(), as defined in the ChannelSelector interface. In order to understand better the actions of a StreamPartitioner during execution, I set up

Re: Flink configuration

2017-01-25 Thread Greg Hogan
Has anyone reported decreased performance with hyper-threading? On Tue, Jan 24, 2017 at 11:18 AM, Aljoscha Krettek wrote: > Hi, > that wording is from a time where no-one though about VMs with virtual > cores. IMHO this maps directly to virtual cores so you should set it >

Re: How to configure max parallelism?

2017-01-25 Thread Gyula Fóra
Great, thank you for the explanation Ufuk :) Gyula Ufuk Celebi ezt írta (időpont: 2017. jan. 25., Sze, 11:11): > Hey Gyula, > > as far as I can tell, there are no docs for this yet. Good news is > that the docs are getting a lot of love for the release and I guess > this is

Re: Problem with state Apache Flink 1.2.0 RC0

2017-01-25 Thread Ufuk Celebi
- Are you running this with an HA cluster or non-HA? - Could you share a screenshot please? On Tue, Jan 24, 2017 at 12:16 PM, Salou Guillaume wrote: > Hi ! > > I have the same problem on my laptop and on my desk at work. > > I have also tested, it appears under private

Re: How to configure max parallelism?

2017-01-25 Thread Ufuk Celebi
Hey Gyula, as far as I can tell, there are no docs for this yet. Good news is that the docs are getting a lot of love for the release and I guess this is simply something that is still missing. The setMaxParallelism call should be available in all the places where you can also call

Re: Improving Flink Performance

2017-01-25 Thread Stephan Ewen
Have you tried the object reuse option mentioned above? On Tue, Jan 24, 2017 at 6:52 PM, Jonas wrote: > The performance hit due to decoding the JSON is expected and there is not a > lot (except for changing the encoding that I can do about that). Alright. > > When joining the

Re: Better way to read several stream sources

2017-01-25 Thread Sendoh
Found the reason. I saw using ParallelSourceFunction my override open() is called 4 times, comparing to using sourceFunction open() is called only once, and my override open() constructs the connection to sources, which determines how many source are going to be read. Cheers, Sendoh -- View

Re: Flink with Yarn on MapR

2017-01-25 Thread Robert Metzger
Hi, I think this is a re-post of a question I've already "answered": http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-with-Yarn-on-MapR-td15448.html On Wed, Jan 25, 2017 at 12:12 AM, ani.desh1512 wrote: > Hi, > I am trying to setup flink with Yarn

Re: multi tenant workflow execution

2017-01-25 Thread Fabian Hueske
Hi Chen, yes, timers of a ProcessFunction are organized by key (you can have multiple timers per key as well), stored in the keyed state, checkpointed, and restored. I'm not sure about the guarantees for iterative streams. Best, Fabian 2017-01-25 8:18 GMT+01:00 Chen Qin :

How to configure max parallelism?

2017-01-25 Thread Gyula Fóra
Hi all, I can't seem to find in the documentation how to set the maximum parallelism for rescaling keyed state. Can anyone help me out here? Thanks! Gyula