Re: Flink performance

2024-03-12 Thread Robin Moffatt via user
Engineer, Decodable On Tue, 12 Mar 2024 at 06:59, Kamal Mittal via user wrote: > Hello Community, > > > > Please share info. for below query. > > > > Rgds, > > Kamal > > > > *From:* Kamal Mittal via user > *Sent:* Monday, March 11, 2024 1:18 PM

RE: Flink performance

2024-03-12 Thread Kamal Mittal via user
Hello Community, Please share info. for below query. Rgds, Kamal From: Kamal Mittal via user Sent: Monday, March 11, 2024 1:18 PM To: user@flink.apache.org Subject: Flink performance Hello, Can you please point me to documentation if any such available where flink talks about or documented

Flink performance

2024-03-11 Thread Kamal Mittal via user
Hello, Can you please point me to documentation, if any is available, where Flink reports performance numbers for specific use cases? Rgds, Kamal

Re: Flink Performance Issue

2021-09-27 Thread Arvid Heise
Hi Kamaal, I did a quick test with a local Kafka in docker. With parallelism 1, I can process 20k messages of size 4KB in about 1 min. So if you use parallelism of 15, I'd expect it to take it below 10s even with bigger data skew. What I recommend you to do is to start from scratch and just work
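The estimate in this reply can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope model: it assumes throughput scales linearly with parallelism (no skew, no shared bottleneck), and the function names are made up for illustration.

```python
def records_per_second(messages: int, seconds: float) -> float:
    """Observed throughput of a single run."""
    return messages / seconds


def ideal_duration_seconds(messages: int, rate_per_subtask: float, parallelism: int) -> float:
    """Best-case duration if every subtask sustains the single-subtask rate."""
    return messages / (rate_per_subtask * parallelism)


# Figures from the message: 20k messages of 4 KB in about 1 minute at parallelism 1.
single_rate = records_per_second(20_000, 60)
ideal_at_15 = ideal_duration_seconds(20_000, single_rate, 15)
payload_mb = 20_000 * 4 / 1024  # total payload in MB, assuming 4 KB = 4096 bytes

print(f"single subtask: {single_rate:.0f} records/s")
print(f"ideal duration at parallelism 15: {ideal_at_15:.1f} s")
print(f"total payload: {payload_mb:.1f} MB")
```

Under the linear-scaling assumption the ideal duration is about 4 s, which leaves room for the "below 10s even with bigger data skew" expectation quoted above.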

Re: Flink Performance Issue

2021-09-27 Thread Mohammed Kamaal
Hi Robert, I have removed all the business logic (keyBy and window) operator code and just had a source and sink to test it. The throughput is 20K messages in 2 minutes. It is a simple read from source (kafka topic) and write to sink (kafka topic). Don't you think 2 minutes is also not a

Re: Flink Performance Issue

2021-09-22 Thread Robert Metzger
Hi Kamaal, I would first suggest understanding the performance bottleneck before applying any optimizations. Idea 1: Are your CPUs fully utilized? If yes, good, then scaling up will probably help. If not, then there's another inefficiency. Idea 2: How fast can you get the data into your job,

Re: Flink Performance Issue

2021-09-22 Thread Mohammed Kamaal
Hi Arvid, The throughput has decreased further after I removed all the rebalance(). The performance has decreased from 14 minutes for 20K messages to 20 minutes for 20K messages. Below are the tasks that the flink application is performing. I am using keyBy and Window operation. Do you think

Re: Flink Performance Issue

2021-09-06 Thread Arvid Heise
Hi Mohammed, something is definitely wrong in your setup. You can safely say that you can process 1k records per second per core with Kafka and light processing, so you shouldn't even need to go distributed in your case. Do you perform any heavy computation? What is your flatMap doing? Are you

Re: Flink Performance Issue

2021-09-02 Thread Mohammed Kamaal
Hi Fabian, Just an update, Problem 2:- Caused by: org.apache.kafka.common.errors.NetworkException It is resolved. It was because we exceeded the number of allowed partitions for the kafka cluster (AWS MSK cluster). Have deleted unused topics and partitions to resolve the issue.

Re: Flink performance with multiple operators reshuffling data

2021-08-31 Thread JING ZHANG
Hi Jason, > In our case, our input/output ratio of these Flink operators are all 1 to 1, so I guess it doesn't matter that much.. Yes > But I think the keys we are using in general are pretty uniform. Cool. You could run for a period of time to see if there is data skew. If there is indeed a data

Re: Flink performance with multiple operators reshuffling data

2021-08-31 Thread Jason Liu
Thanks for the help guys! Yea we can potentially append random strings to the keys and duplicate data across them to avoid skewness, if necessary. But I think the keys we are using in general are pretty uniform. The lowest selectivity at the up front method is really interesting though. In our
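The idea of appending a random suffix to hot keys can be sketched as a two-stage aggregation. This is an illustration only, not code from the thread: the `#` separator, the function names, and the choice of a simple count as the aggregate are all assumptions.

```python
import random
from collections import Counter


def salted_key(key: str, num_salts: int, rng: random.Random) -> str:
    """Append a random salt so one hot key spreads over several sub-keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def original_key(salted: str) -> str:
    """Strip the salt again before the final merge."""
    return salted.rsplit("#", 1)[0]


rng = random.Random(42)
events = ["hot"] * 9_000 + ["cold"] * 1_000

# Stage 1: pre-aggregate per salted key (this is the part that would run in parallel).
partial = Counter(salted_key(k, 8, rng) for k in events)

# Stage 2: merge the partial results back onto the original keys.
merged = Counter()
for salted, count in partial.items():
    merged[original_key(salted)] += count

print(dict(merged))
```

The trade-off is an extra merge step, and any per-key state is split across the salted sub-keys, so this only fits aggregates that can be combined afterwards.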

Re: Flink performance with multiple operators reshuffling data

2021-08-30 Thread JING ZHANG
Hi Jason, A job that reshuffles data multiple times can still scale under normal circumstances, but we should carefully avoid data skew, because if the input stream is skewed, adding more resources will not help. Besides that, if we could adjust the order of the functions, we could put the keyed process

Re: Flink performance with multiple operators reshuffling data

2021-08-30 Thread Caizhi Weng
Hi! Key-by operations can scale with parallelism. Flink will shuffle your records to different subtasks according to the hash value of the key modulo the parallelism, so the more parallelism you have the faster Flink can process data, unless there is data skew. Jason Liu, on Tue, Aug 31, 2021
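A minimal sketch of the "hash modulo parallelism" idea described in this reply, and of why skew defeats it. It is a simplification for illustration: `zlib.crc32` stands in for the hash function, the helper names are invented, and Flink's actual assignment goes through key groups rather than a plain modulo.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a subtask from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def load_per_subtask(keys: Iterable[str], parallelism: int) -> List[int]:
    """Count how many records each subtask would receive."""
    counts = Counter(subtask_for(k, parallelism) for k in keys)
    return [counts.get(i, 0) for i in range(parallelism)]


uniform_keys = [f"key-{i}" for i in range(10_000)]
skewed_keys = ["hot-key"] * 9_000 + [f"key-{i}" for i in range(1_000)]

print("uniform:", load_per_subtask(uniform_keys, 4))
print("skewed: ", load_per_subtask(skewed_keys, 4))
```

All records of the hot key land on the same subtask regardless of the parallelism, so that subtask bounds the throughput of the whole job.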

Flink performance with multiple operators reshuffling data

2021-08-30 Thread Jason Liu
Hi there, We have a use case where we need multiple keyBy operators, each with its own MapState and all with different keys, in a single Flink app. This obviously means we'll be reshuffling our data a lot. Our TPS is around 1-2k, with ~2kb per event and we use Kinesis Data Analytics as

Re: Flink Performance Issue

2021-08-25 Thread Fabian Paul
Hi Mohammed, 200 records should definitely be doable. The first thing you can do is remove the print-out sinks, because they increase the load on your cluster due to the additional IO operations and also prevent Flink from fusing operators. I am interested to see the updated job graph after

Re: Flink Performance Issue

2021-08-24 Thread Fabian Paul
Hi Mohammed, Without diving too much into your business logic, one thing that catches my eye is the partitioning you are using. In general all calls to `keyBy` or `rebalance` are very expensive because all the data is shuffled across downstream tasks. Flink tries to fuse operators with the same

Flink Performance Issue

2021-08-24 Thread Mohammed Kamaal
Hi, Apologies for the long message; I want to explain the issue in detail. We have a Flink (version 1.8) application running on AWS Kinesis Analytics. The application has a source which is a Kafka topic with 15 partitions (AWS Managed Streaming Kafka) and the sink is again a Kafka topic with 15

Re: Flink performance testing

2020-09-17 Thread Piotr Nowojski
Hi, But what are you asking for? Is it possible to do such benchmarks? Yes, it is possible. People are doing it all the time. Start a cluster, feed the data, measure the throughput (either via custom diagnostic operators, or via metrics [1]). Is there some framework to do it? Not that I know of.
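The "custom diagnostic operator" approach mentioned here can be illustrated outside Flink with a pass-through counter. This is a plain-Python sketch of the idea, not Flink API: the `ThroughputMeter` class and its injectable clock are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class ThroughputMeter:
    """Pass-through stage that counts records and reports records per second."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.count = 0

    def observe(self, records: Iterable[T]) -> Iterator[T]:
        """Yield every record unchanged while counting it."""
        for record in records:
            if self._start is None:
                self._start = self._clock()  # start timing at the first record
            self.count += 1
            yield record

    def records_per_second(self) -> float:
        """Average rate since the first record was seen."""
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


meter = ThroughputMeter()
total = sum(len(line) for line in meter.observe(f"record-{i}" for i in range(100_000)))
print(f"{meter.count} records, {meter.records_per_second():.0f} records/s")
```

In a real job the same counting would live inside an operator or be read from the built-in metrics, as the reply suggests; the sketch only shows what is being measured.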

Re: Flink performance testing

2020-09-16 Thread mahesh salunkhe
I would like to do performance testing for my Flink job, especially with respect to volume: how does my Flink job perform when more streaming data arrives at my source connectors, and how can I benchmark the various operators? On Wed, 16 Sep 2020 at 12:03, Piotr Nowojski wrote: > Hi, > > I'm not sure what you

Re: Flink performance testing

2020-09-16 Thread Piotr Nowojski
Hi, I'm not sure what you are asking for. We do not provide benchmarks for all of the operators. We currently have a couple of micro benchmarks [1] for some of the operators, and we are also setting up some adhoc benchmarks when implementing various features. If you want to benchmark something

Flink performance testing

2020-09-16 Thread mahesh salunkhe
Team, Which framework should I use for Flink end-to-end performance testing? I would like to test the performance of each Flink operator, back pressure, etc.

Re: Flink performance tuning on operators

2020-05-18 Thread Arvid Heise
Hi Ivan, Just to add up to chaining: When splitting the map into two parts, objects need to be copied from one operator to the chained operator. Since your objects are very heavy that can take quite long, especially if you don't have a specific serializer configured but rely on Kryo. You can

Re: Flink performance tuning on operators

2020-05-15 Thread Chesnay Schepler
Generally there should be no difference. Can you check whether the maps are running as a chain (as a single task)? If they are running in a chain, then I would suspect that /something/ else is skewing your results. If not, then the added network/serialization pressure would explain it. I will

Flink performance tuning on operators

2020-05-14 Thread Ivan Yang
Hi, We have a Flink job that reads data from an input stream, converts each event from a JSON string to an Avro object, and finally writes to Parquet files using StreamingFileSink with an OnCheckpointRollingPolicy of 5 mins. Basically a stateless job. Initially, we used one map operator to convert Json

Re: Flink Performance

2020-01-21 Thread Dharani Sudharsan
Thanks David. But I don’t see any solutions provided for the same. On Jan 21, 2020, at 7:13 PM, David Magalhães <speeddra...@gmail.com> wrote: I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone

Re: Flink Performance

2020-01-21 Thread David Magalhães
I've found this ( https://stackoverflow.com/questions/50580756/flink-window-dragged-stream-performance ) post on StackOverflow, where someone complains about performance drop in KeyBy. On Tue, Jan 21, 2020 at 1:24 PM Dharani Sudharsan < dharani.sudhar...@outlook.in> wrote: > Hi All, > >

Flink Performance

2020-01-21 Thread Dharani Sudharsan
Hi All, Currently, I’m running a flink streaming application, the configuration below. Task slots: 45 Task Managers: 3 Job Manager: 1 Cpu : 20 per machine My sample code below: Process Stream: datastream.flatmap().map().process().addsink Data size: 330GB approx. Raw Stream:

Re: Flink performance drops when async checkpoint is slow

2019-03-20 Thread Stephan Ewen
hink we might find something if seeing which operation > delays the task to cause the backpressure, and this operation might be > involved with HDFS. :) > > Best, > Zhijiang > > -- > From:Paul Lam > Send Time:2019年2月

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
which operation delays the task to cause the backpressure, and this operation might be involved with HDFS. :) Best, Zhijiang -- From:Paul Lam Send Time:2019年2月28日(星期四) 19:17 To:zhijiang Cc:user Subject:Re: Flink performance drops

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread Paul Lam
Hi Zhijiang, Thanks a lot for your reasoning! I tried to set the checkpoint to at-least-once as you suggested, but unluckily the problem remains the same :( IMHO, if it’s caused by barrier alignment, the state size (mainly buffers during alignment) would be big, right? But actually it’s

Re: Flink performance drops when async checkpoint is slow

2019-02-28 Thread zhijiang
Hi Paul, I am not sure whether the task thread is involved in some work while snapshotting state for FsStateBackend. But I have another experience which might also cause your problem. From your descriptions below, the last task is blocked by `SingleInputGate.getNextBufferOrEvent` that means the

Flink performance drops when async checkpoint is slow

2019-02-27 Thread Paul Lam
Hi, I have a Flink job (version 1.5.3) that consumes from a Kafka topic, does some transformations and aggregations, and writes to two Kafka topics respectively. Meanwhile, there’s a custom source that pulls configurations for the transformations periodically. The generic job graph is as below.

Re: Improving Flink Performance

2017-02-06 Thread Fabian Hueske
ith a > more > efficient one, the performance problems are gone. > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > Performance-tp11248p11447.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Re: Improving Flink Performance

2017-02-05 Thread Jonas
.nabble.com/Improving-Flink-Performance-tp11248p11447.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Improving Flink Performance

2017-01-26 Thread Stephan Ewen
@jonas Flink's Fork-Join Pool drives only the actors, which are doing coordination. Unless your job is permanently failing/recovering, they don't do much. On Thu, Jan 26, 2017 at 2:56 PM, Robert Metzger wrote: > Hi Jonas, > > The good news is that your job is completely

Re: Improving Flink Performance

2017-01-26 Thread Robert Metzger
<jo...@huntun.de> wrote: > JProfiler > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > Performance-tp11248p11311.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Re: Improving Flink Performance

2017-01-26 Thread Jonas
JProfiler

Re: Improving Flink Performance

2017-01-26 Thread dromitlabs
36050.n4.nabble.com/file/n11305/Tv6KnR6.png > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11307.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com.

Re: Improving Flink Performance

2017-01-25 Thread Jonas
.nabble.com/Improving-Flink-Performance-tp11248p11307.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Improving Flink Performance

2017-01-25 Thread Jonas
che-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png> *Any ideas? * -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11305.html Sent from the Apache Flink User Mailing

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I tried and it added a little performance (~10%) but nothing outstanding.

Re: Improving Flink Performance

2017-01-25 Thread Stephan Ewen
message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > Performance-tp11248p11272.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Re: Improving Flink Performance

2017-01-24 Thread Jonas
know how to improve that? Might setting the buffer size / timeout be worth exploring?

Re: Improving Flink Performance

2017-01-24 Thread Stephan Ewen
nt: ?): Array[Byte] = >> { "\n".getBytes(CharsetUtil.UTF_8) } }) } and nc -lk PORT | pv --line-mode >> --rate --average-rate --format "Current: %r, Avg:%a, Total: %b" > >> /dev/null*I'm >> running this on a Intel i5-3470, 16G RAM, Ubuntu 16.04.1 LTS on Flink

Re: Improving Flink Performance

2017-01-24 Thread Aljoscha Krettek
> > /dev/null*I'm > running this on a Intel i5-3470, 16G RAM, Ubuntu 16.04.1 LTS on Flink 1.1.4 > -- > View this message in context: Improving Flink Performance > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248.html> > Sent from the Apache Flink User Mailing List archive. mailing list archive > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/> at > Nabble.com. >

Improving Flink Performance

2017-01-24 Thread Jonas
-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Improving Flink performance

2017-01-24 Thread Jonas
I don't even have images in there :O Will delete this thread and create a new one.

Re: Improving Flink performance

2017-01-23 Thread Ted Yu
> View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > performance-tp11211p11225.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Re: Improving Flink performance

2017-01-23 Thread Jonas
I received it well-formatted. May it be that the issue is your Mail reader?

Re: Improving Flink performance

2017-01-23 Thread Greg Hogan
ents were taken with > > and / > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Improving-Flink- > performance-tp11211.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >

Improving Flink performance

2017-01-23 Thread Jonas
to make this faster?* /Measurements were taken with and /

Re: Flink performance tuning

2016-05-17 Thread Robert Metzger
sk slot is being occupied. Something I am doing is > wrong.. > > 3 > > Task Managers > > 21 > > Task Slots > > 20 > > Available Task Slots > > > > > > Best regards, > > Serhiy. > > > > *From:* Robert Metzger [mailto:rmetz...@ap

RE: Flink performance tuning

2016-05-17 Thread Serhiy Boychenko
tion and what I have noticed is only one task slot is being occupied. Something I am doing is wrong.. 3 Task Managers 21 Task Slots 20 Available Task Slots Best regards, Serhiy. From: Robert Metzger [mailto:rmetz...@apache.org] Sent: 13 May 2016 15:26 To: user@flink.apache.org Subject: Re: Flink perf

Re: How to measure Flink performance

2016-05-13 Thread Ken Krugler
source and writing to sink). >> >> Cheers, >> >> Konstantin >> >> On 12.05.2016 18:57, prateekarora wrote: >>> Hi >>> >>> How can i measure throughput and latency of my application in flink 1.0.2 >>> ? >>> >

Re: Flink performance tuning

2016-05-13 Thread Stephan Ewen
One issue may be that the selection of YARN containers is not HDFS locality aware here. Hence, Flink may read more splits remotely, where MR reads more splits locally. On Fri, May 13, 2016 at 3:25 PM, Robert Metzger wrote: > Hi, > > Can you try running the job with 8 slots,

Re: Flink performance tuning

2016-05-13 Thread Robert Metzger
Hi, Can you try running the job with 8 slots, 7 GB (maybe you need to go down to 6 GB) and only three TaskManagers (-n 3) ? I'm suggesting this, because you have many small JVMs running on your machines. On such small machines you can probably get much more use out of your available memory by
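To make the suggestion concrete, the two session layouts can be compared with simple arithmetic. The sketch assumes the original session used 21 TaskManagers with 1 slot and 1024 MB each (the `-n 21 -s 1 -tm 1024` flags quoted in the original question) and takes 7 GB as 7168 MB; the `SessionLayout` class is a hypothetical helper, not part of Flink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLayout:
    task_managers: int   # value passed as -n
    slots_per_tm: int    # value passed as -s
    tm_memory_mb: int    # value passed as -tm

    @property
    def total_slots(self) -> int:
        return self.task_managers * self.slots_per_tm

    @property
    def total_memory_mb(self) -> int:
        return self.task_managers * self.tm_memory_mb

    def jvms_per_machine(self, machines: int) -> float:
        """Average number of TaskManager JVMs per machine."""
        return self.task_managers / machines


original = SessionLayout(task_managers=21, slots_per_tm=1, tm_memory_mb=1024)
suggested = SessionLayout(task_managers=3, slots_per_tm=8, tm_memory_mb=7 * 1024)

for name, layout in (("original", original), ("suggested", suggested)):
    print(name, layout.total_slots, "slots,", layout.total_memory_mb, "MB,",
          layout.jvms_per_machine(3), "JVMs per machine")
```

Both layouts request the same total TaskManager memory; the suggested one packs it into one JVM per machine instead of seven, which is the point about many small JVMs.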

Flink performance tuning

2016-05-13 Thread Serhiy Boychenko
Hey, I have successfully integrated Flink into our very small test cluster (3 machines with 8 cores, 8 GB of memory and 2x1TB disks). Basically I started the session to use YARN as RM and the data is being read from HDFS. /yarn-session.sh -n 21 -s 1 -jm 1024 -tm 1024 My code is very

Re: How to measure Flink performance

2016-05-12 Thread Konstantin Knauf
ve.2336050.n4.nabble.com/How-to-measure-Flink-performance-tp6741p6863.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com. > -- Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182 TNG Technology Consulting GmbH, Betastr.

Re: How to measure Flink performance

2016-05-12 Thread prateekarora
Hi, How can I measure throughput and latency of my application in Flink 1.0.2? Regards, Prateek

Re: How to measure Flink performance

2016-05-09 Thread Ufuk Celebi
Hey Prateek, On Fri, May 6, 2016 at 6:40 PM, prateekarora wrote: > I have the information below from Spark. Can I get similar information from > Flink as well? If yes, how can I get it? You can get GC time via the task manager overview. The other metrics don't

How to measure Flink performance

2016-05-06 Thread prateekarora
in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-measure-Flink-performance-tp6741.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
You're obviously right, the configs were different. In the downloaded version I had set off heap memory to true, whereas in the version I compiled myself this one-time change to flink-conf.yaml was overwritten by recompiling. I have fixed it now and performance is the same. For the record, I had

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Ovidiu-Cristian MARCU
Hi, Your assumption about the TeraSort use case with eastcirclek's implementation may be incorrect. How many times did you run your program? It would be helpful to give more details about your experiment, in terms of configuration, dataset size. Best, Ovidiu > On 14 Apr 2016, at 17:14,

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
I have tried multiple Maven and Scala Versions, but to no avail. I can't seem to achieve performance of the downloaded archive. I am stumped by this and will need to do more experiments when I have more time. Robert On Thu, Apr 14, 2016 at 1:13 PM, Robert Schmidtke

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
Hi Robert, thanks for the hint! Looks like something I could have figured out myself -.-" I'll let you know if I find something. Robert On Thu, Apr 14, 2016 at 1:06 PM, Robert Metzger wrote: > Hi Robert, > > check out the tools/create_release_files.sh file in the source

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Metzger
Hi Robert, check out the tools/create_release_files.sh file in the source tree. There you can see how we are building the release binaries. It would be quite interesting to find out what caused the performance difference. On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke

Flink performance pre-packaged vs. self-compiled

2016-04-13 Thread Robert Schmidtke
Hi everyone, I'm using Flink 0.10.2 for some benchmarks and had to add some small changes to Flink, which led me to compiling and running it myself. This is when I noticed a performance difference in the pre-packaged Flink version that I downloaded from the web (