Re: Performance and Latency Chart for Flink
Hi Greg,

Setting "taskmanager.memory.preallocate" to true caused "Association with remote system [akka.tcp://flink@" "has failed" "[Disassociated]" on all TMs. Changed it back to false.

I increased the network buffers to 1 GB and started to get TM slot exceptions, so I am raising that value incrementally. I now have it set at 8192 (twice the previous 4096).

Thanks

From: Greg Hogan <c...@greghogan.com>
To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
Sent: Monday, September 19, 2016 1:28 PM
Subject: Re: Performance and Latency Chart for Flink

My thought would be to compare the data rate and buffer sizes, which gives a refresh interval. For example, if you are transmitting 1 GB/s on 128 MiB of network buffers, then the refresh interval is at most 1/8 second. The same consideration applies to spill files if the system does not have sufficient free memory for a large number of readahead buffers.

Another set of buffers is the kernel socket buffers, which you can increase from the Linux default of 4 MiB by changing "taskmanager.net.sendReceiveBufferSize" (documentation is in progress; see org.apache.flink.runtime.io.network.netty.NettyConfig).

Your nodes have 100+ GB of memory, so a conservative assignment might be a gigabyte of network buffers. Then add the following to the conf, restart the cluster, start jconsole on a TaskManager, connect to the TaskManager process, and on the MBeans tab look under org.apache.flink.metrics for Network.AvailableMemorySegments.

metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.my_jmx_reporter.port: 9020-9040
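For reference, a minimal sketch of the incremental approach Amir describes: it converts a memory target into a buffer count and lists the doubling steps from 4096 up to the one-gigabyte figure. It assumes 32 KiB per buffer (implied by "4096 is only 128 MiB" in this thread) and treats the gigabyte as 1024 MiB; the function names are hypothetical and not part of Flink.

```python
SEGMENT_KIB = 32  # assumption: 32 KiB per buffer, implied by "4096 is only 128 MiB"


def buffers_for_memory(target_mib, segment_kib=SEGMENT_KIB):
    """Return how many buffers of segment_kib KiB fit into target_mib MiB."""
    return (target_mib * 1024) // segment_kib


def doubling_steps(start, limit):
    """Return start, 2*start, 4*start, ... up to and including limit."""
    steps = []
    value = start
    while value <= limit:
        steps.append(value)
        value *= 2
    return steps


# Buffer count corresponding to 1024 MiB of network buffers.
one_gib = buffers_for_memory(1024)

# Candidate values for taskmanager.network.numberOfBuffers, smallest first.
steps = doubling_steps(4096, one_gib)
for n in steps:
    print(n, "buffers =", n * SEGMENT_KIB // 1024, "MiB")
```

Trying the values in order, and watching Network.AvailableMemorySegments at each step, is one way to find the largest setting that does not trigger the slot exceptions.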
Re: Performance and Latency Chart for Flink
My thought would be to compare the data rate and buffer sizes, which gives a refresh interval. For example, if you are transmitting 1 GB/s on 128 MiB of network buffers, then the refresh interval is at most 1/8 second. The same consideration applies to spill files if the system does not have sufficient free memory for a large number of readahead buffers.

Another set of buffers is the kernel socket buffers, which you can increase from the Linux default of 4 MiB by changing "taskmanager.net.sendReceiveBufferSize" (documentation is in progress; see org.apache.flink.runtime.io.network.netty.NettyConfig).

Your nodes have 100+ GB of memory, so a conservative assignment might be a gigabyte of network buffers. Then add the following to the conf, restart the cluster, start jconsole on a TaskManager, connect to the TaskManager process, and on the MBeans tab look under org.apache.flink.metrics for Network.AvailableMemorySegments.

metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.my_jmx_reporter.port: 9020-9040

On Mon, Sep 19, 2016 at 3:54 PM, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:

> Thanks Greg. "Your setting of 4096 is only 128 MiB." ... Correct, because
> I followed that formula :-))) I can bump it up to twice as much, as the
> example does, to for instance 300 MiB. Is this reasonable? What do you
> suggest as a reasonable range?
>
> Thanks Greg
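A minimal sketch of the refresh-interval estimate from Greg's message: total buffer memory divided by the data rate. It assumes the "1 GB/s" figure can be read as 1024 MiB/s, which is what makes 128 MiB come out at 1/8 second; the function name is hypothetical.

```python
def refresh_interval_s(buffer_mib, rate_mib_per_s):
    """Seconds needed for the data stream to cycle through all buffer memory."""
    if rate_mib_per_s <= 0:
        raise ValueError("data rate must be positive")
    return buffer_mib / rate_mib_per_s


# 4096 buffers (128 MiB) at an assumed 1024 MiB/s.
current = refresh_interval_s(128, 1024)

# A gigabyte of network buffers at the same rate.
suggested = refresh_interval_s(1024, 1024)

print("128 MiB :", current, "s")
print("1024 MiB:", suggested, "s")
```

The larger the interval, the more slack a slow consumer has before the sender runs out of free buffers.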
Re: Performance and Latency Chart for Flink
Excellent!

On Mon, Sep 19, 2016 at 3:43 PM, Chesnay Schepler <ches...@apache.org> wrote:

> It is normal that you don't see it in the WebInterface.
>
> FLINK-4389 was only about exposing metrics *to* the WebInterface, not
> exposing them *from* it.
>
> Essentially, a metric travels from TaskManager -> WebInterface -> User.
> FLINK-4389 was about the first arrow, which is a prerequisite step for the
> second one.
>
> Regards,
> Chesnay
Re: Performance and Latency Chart for Flink
You will need to add the configuration parameters to your flink-conf.yaml. I believe the intent is that all configuration parameters should be listed at

https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#full-reference

My understanding is that the Flink buffers are currently copied to Netty buffers, although I don't understand the stated memory doubling.

On Mon, Sep 19, 2016 at 3:08 PM, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:

> Hi Greg,
>
> In the same Flink config link below, there are parameters that don't
> even exist in flink-conf.yaml. Are they defined somewhere else? I
> grepped for the following and none existed in any of the files under
> the conf folder: "taskmanager.memory.fraction",
> "taskmanager.memory.off-heap", "taskmanager.memory.segment-size" and
> many more.
>
> Also, isn't the example calculating the network buffers wrong? Based on
> the example, roughly 5000 buffers x 32 KiB = 160000 KiB should be
> allocated. 160000 KiB divided by 1024 = 156.25 MiB. Why is the example
> saying "the system would allocate roughly 300 MiBytes for network
> buffers."? That's roughly twice as much. Am I missing something here?
>
> I still need your help to set the accurate number for my
> taskmanager.network.numberOfBuffers = 4096.
>
> Thanks for your response Greg.
> Amir
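A minimal sketch of the check behind this exchange: list which documented parameters are absent from a flink-conf.yaml and therefore have to be added by hand. It assumes the file is a flat list of "key: value" lines with "#" comments; the sample text below is made up for illustration and is not Amir's actual configuration, and the function names are hypothetical.

```python
def configured_keys(conf_text):
    """Collect the keys of all non-comment 'key: value' lines."""
    keys = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key = line.split(":", 1)[0].strip()
            if key:
                keys.add(key)
    return keys


def missing_keys(conf_text, wanted):
    """Return the wanted keys that are not set in conf_text, in order."""
    present = configured_keys(conf_text)
    return [key for key in wanted if key not in present]


# Illustrative file content only.
sample_conf = """
# taskmanager.memory.fraction: 0.7
taskmanager.network.numberOfBuffers: 4096
taskmanager.memory.preallocate: false
"""

wanted = [
    "taskmanager.memory.fraction",
    "taskmanager.memory.off-heap",
    "taskmanager.memory.segment-size",
    "taskmanager.network.numberOfBuffers",
]

missing = missing_keys(sample_conf, wanted)
print("not set, add to flink-conf.yaml if needed:", missing)
```

A key that is only present in a commented-out line counts as missing, which matches what grep on the active settings would show.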
Re: Performance and Latency Chart for Flink
It is normal that you don't see it in the WebInterface.

FLINK-4389 was only about exposing metrics *to* the WebInterface, not exposing them *from* it.

Essentially, a metric travels from TaskManager -> WebInterface -> User. FLINK-4389 was about the first arrow, which is a prerequisite step for the second one.

Regards,
Chesnay

On 19.09.2016 21:35, Greg Hogan wrote:

> The nightly snapshots now include "[FLINK-4389] Expose metrics to
> WebFrontend":
> https://flink.apache.org/contribute-code.html#snapshots-nightly-builds
>
> For 1.2 we have metrics for "AvailableMemorySegments" and
> "TotalMemorySegments":
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#list-of-all-variables
>
> However, when I download the snapshot and start a cluster with the default
> configuration I am not seeing a value for this metric in the web UI.
>
> An alternative is to configure the JMX reporter in flink-conf.yaml:
>
> metrics.reporters: jmx_reporter
> metrics.reporter.jmx_reporter.class: org.apache.flink.metrics.jmx.JMXReporter
> metrics.reporter.jmx_reporter.port: 9020
>
> You can then monitor the system for the number of used memory segments.
> Let us know what you discover!
Re: Performance and Latency Chart for Flink
Hi Greg,

In the same Flink config link below, there are parameters that don't even exist in flink-conf.yaml. Are they defined somewhere else? I grepped for the following and none existed in any of the files under the conf folder: "taskmanager.memory.fraction", "taskmanager.memory.off-heap", "taskmanager.memory.segment-size" and many more.

Also, isn't the example calculating the network buffers wrong? Based on the example, roughly 5000 buffers x 32 KiB = 160000 KiB should be allocated. 160000 KiB divided by 1024 = 156.25 MiB. Why is the example saying "the system would allocate roughly 300 MiBytes for network buffers."? That's roughly twice as much. Am I missing something here?

I still need your help to set the accurate number for my taskmanager.network.numberOfBuffers = 4096.

Thanks for your response Greg.
Amir

From: amir bahmanyari <amirto...@yahoo.com>
To: "dev@flink.apache.org" <dev@flink.apache.org>
Sent: Monday, September 19, 2016 10:34 AM
Subject: Re: Performance and Latency Chart for Flink

Hi Greg,

I used this guideline to calculate "taskmanager.network.numberOfBuffers": Apache Flink 1.2-SNAPSHOT Documentation: Configuration

4096 = (16x16)x4x4, where 16 is the number of tasks per TM, 4 is the number of TMs, and the last 4 is the constant in the formula. What would you set it to? Once I have that number, I will set "taskmanager.memory.preallocate" to true and give it another shot.

Thanks Greg
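A worked version of the arithmetic in Amir's question, as a minimal sketch: buffer count times buffer size, converted to MiB, and compared against the "roughly 300 MiBytes" quoted from the documentation example. The 32 KiB buffer size comes from the message; the function name is hypothetical.

```python
def buffer_memory_mib(num_buffers, segment_kib=32):
    """Memory taken by num_buffers buffers of segment_kib KiB, in MiB."""
    return num_buffers * segment_kib / 1024


# The documentation example: roughly 5000 buffers of 32 KiB.
example_mib = buffer_memory_mib(5000)

# How far the quoted "roughly 300 MiBytes" is from that figure.
ratio = 300 / example_mib

print("5000 buffers:", example_mib, "MiB")
print("300 MiB is", round(ratio, 2), "times that")
print("4096 buffers:", buffer_memory_mib(4096), "MiB")
```

The computation confirms the gap Amir points out; it does not explain it, and the thread leaves the reason for the stated doubling open.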
Re: Performance and Latency Chart for Flink
Hi Greg,

I used this guideline to calculate "taskmanager.network.numberOfBuffers": Apache Flink 1.2-SNAPSHOT Documentation: Configuration

4096 = (16x16)x4x4, where 16 is the number of tasks per TM, 4 is the number of TMs, and the last 4 is the constant in the formula. What would you set it to? Once I have that number, I will set "taskmanager.memory.preallocate" to true and give it another shot.

Thanks Greg

From: Greg Hogan <c...@greghogan.com>
To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
Sent: Monday, September 19, 2016 8:29 AM
Subject: Re: Performance and Latency Chart for Flink

Hi Amir,

You may see improved performance setting "taskmanager.memory.preallocate: true" in order to use off-heap memory.

Also, your number of buffers looks quite low and you may want to increase "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128 MiB.

As this is only a benchmark, are you able to post the code to github to solicit feedback?

Greg
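A minimal sketch of the sizing formula as Amir applies it: tasks per TaskManager squared, times the number of TaskManagers, times a constant of 4. The inputs are the ones given in his message; the function name is hypothetical and not a Flink API.

```python
def recommended_buffers(tasks_per_tm, num_tms, factor=4):
    """tasks_per_tm^2 * num_tms * factor, as quoted from the configuration guide."""
    return tasks_per_tm ** 2 * num_tms * factor


# Amir's cluster: 16 tasks per TaskManager, 4 TaskManagers.
amir_setting = recommended_buffers(16, 4)
print("taskmanager.network.numberOfBuffers:", amir_setting)

# The count grows quadratically with the tasks per TaskManager.
for tasks in (4, 8, 16, 32):
    print(tasks, "tasks per TM ->", recommended_buffers(tasks, 4), "buffers")
```

Greg's reply treats the result as low for machines with this much memory, so the formula is probably best read as a starting point.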
Re: Performance and Latency Chart for Flink
Hi Amir,

You may see improved performance by setting "taskmanager.memory.preallocate: true" in order to use off-heap memory. Also, your number of buffers looks quite low, and you may want to increase "taskmanager.network.numberOfBuffers". Your setting of 4096 buffers amounts to only 128 MiB.

As this is only a benchmark, are you able to post the code to GitHub to solicit feedback?

Greg

On Sun, Sep 18, 2016 at 9:00 PM, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:
> I have new findings & subsequently relative improvements. [...]
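Greg's figure of 128 MiB for 4096 buffers can be checked with a short calculation. The Python sketch below assumes the 32 KiB buffer (memory segment) size quoted elsewhere in this thread; the function names are illustrative only and are not part of Flink.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: each network buffer is one 32 KiB memory segment,
# the size used in the thread's own arithmetic.
SEGMENT_SIZE_BYTES = 32 * KIB


def network_buffer_mib(num_buffers, segment_size_bytes=SEGMENT_SIZE_BYTES):
    """Memory occupied by the network buffers, in MiB."""
    return num_buffers * segment_size_bytes / MIB


def buffers_for_mib(target_mib, segment_size_bytes=SEGMENT_SIZE_BYTES):
    """Number of buffers needed to reach a target pool size in MiB."""
    return target_mib * MIB // segment_size_bytes


current_mib = network_buffer_mib(4096)   # the setting Greg comments on
doubled_mib = network_buffer_mib(8192)   # twice that setting
print(current_mib, doubled_mib)
```

Run the other way, `buffers_for_mib(1024)` gives the value of "taskmanager.network.numberOfBuffers" that would correspond to one gigabyte of network buffers under the same segment-size assumption.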
Re: Performance and Latency Chart for Flink
I have new findings and, with them, some relative improvements. I am testing as we speak: 4 Beam server nodes (Azure A11) and 2 Kafka nodes with the same config.

I had to keep state somewhere, so I went with Redis. It turned out to be a major bottleneck, because the Beam nodes were constantly going across the network to update its repository. So I replaced Redis with Java ConcurrentHashMaps. Much faster.

Then Kafka ran out of disk space and the replication manager complained, so I clustered the two Kafka nodes, hoping they would share space. As I type this email, the run is sustaining, but only half of the 201401969 tuples have been processed after 3.5 hours. According to the Linear Road benchmarking expectations, if your system is working well, all 201401969 tuples must be done in 3.5 hours at most. So this means there is still room for tuning the Flink nodes. I have already shared more details about my config with you all.

It ran perfectly yesterday with almost 1/10th of this load: perfect real-time send/process streaming behavior. If that is the case and I cannot get better performance with FlinkRunner, my next stop is SparkRunner, repeating the whole thing for a final benchmark of the two under the Beam APIs, which was the initial intent anyway.

If you have suggestions for improvements in the above case, I am all ears and would greatly appreciate them.

Cheers,
Amir

From: "Chawla,Sumit" <sumitkcha...@gmail.com>
To: dev@flink.apache.org; amir bahmanyari <amirto...@yahoo.com>
Sent: Sunday, September 18, 2016 2:07 PM
Subject: Re: Performance and Latency Chart for Flink

Has anyone else run these kind of benchmarks? [...]
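The target Amir describes can be restated as a required sustained rate. The Python sketch below uses only the figures from his message (201401969 tuples, a 3.5-hour limit, roughly half processed at the 3.5-hour mark). The variable names are illustrative, and "half" is taken at face value even though the message gives it only approximately.

```python
TOTAL_TUPLES = 201_401_969
LIMIT_SECONDS = 3.5 * 3600  # 3.5 hours expressed in seconds

# Rate needed to finish the whole data set within the limit.
required_rate = TOTAL_TUPLES / LIMIT_SECONDS

# Rate implied by "only 1/2 of the tuples after 3.5 hours".
observed_rate = (TOTAL_TUPLES / 2) / LIMIT_SECONDS

# Factor by which throughput would have to improve.
speedup_needed = required_rate / observed_rate

print(f"required: {required_rate:,.0f} tuples/s")
print(f"observed: {observed_rate:,.0f} tuples/s")
print(f"speedup needed: {speedup_needed:.1f}x")
```

On these numbers the cluster would need to sustain roughly 16 thousand tuples per second, about twice what the run was achieving.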
Re: Performance and Latency Chart for Flink
Has anyone else run these kinds of benchmarks? I would love to hear about more people's experience and the details of those benchmarks.

Regards
Sumit Chawla

On Sun, Sep 18, 2016 at 2:01 PM, Chawla,Sumit <sumitkcha...@gmail.com> wrote:
> Hi Amir
>
> Would it be possible for you to share the numbers? [...]
Re: Performance and Latency Chart for Flink
Hi Amir

Would it be possible for you to share the numbers? Also, if possible, share your configuration details.

Regards
Sumit Chawla

On Fri, Sep 16, 2016 at 12:18 PM, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:
> Hi Fabian,FYI. This is report on other engines we did the same type of bench-marking. [...]
Re: Performance and Latency Chart for Flink
Hi Fabian,

FYI, this is a report on other engines for which we did the same type of benchmarking. It also explains what Linear Road benchmarking is. Thanks for your help.

http://www.slideshare.net/RedisLabs/walmart-ibm-revisit-the-linear-road-benchmark
https://github.com/IBMStreams/benchmarks
https://www.datatorrent.com/blog/blog-implementing-linear-road-benchmark-in-apex/

From: Fabian Hueske <fhue...@gmail.com>
To: "dev@flink.apache.org" <dev@flink.apache.org>
Sent: Friday, September 16, 2016 12:31 AM
Subject: Re: Performance and Latency Chart for Flink

Hi, I am not aware of periodic performance runs for the Flink releases. [...]
Re: Performance and Latency Chart for Flink
Hi Amir,

it would be great if you could link to the details of your benchmark environment if you make such claims. Compared to which IBM system? What are the characteristics of your machines? The configuration of the software? The implementation code? Etc.

In general, the Beam runner also adds some overhead compared to native Flink jobs. There are many factors that could affect the results. I don't know the Linear Road benchmark, but 150 times sounds unrealistic.

Timo

On 16/09/16 at 10:02, amir bahmanyari wrote:
> FYI, we, at a well-known IT department, have been actively measuring Beam Flink Runner performance using MIT's Linear Road to stress the Flink cluster servers. The results thus far do not even come close to the previous streaming engines we have benchmarked. Our optimistic assumption when we started was that Beam runners (Flink, for instance) would leave Storm & IBM in smoke. Wrong. What IBM managed to perform is 150 times better than Flink. Needless to mention Storm and Hortonworks. As an example, IBM handled 150 expressways in 3.5 hours. In the identical topology, with everything fixed, Beam Flink Runner in a Flink cluster handled 10 expressways in 17 hours at its best so far. I have followed every single performance tuning recommendation that is out there, and none improved it even a bit. It works fine with 1 expressway.
>
> Sorry, but those are our findings so far, unless we are doing something wrong. I posted all details to this forum but never got any solid response that would make a difference in our observations. Therefore, we assume what we are seeing is the reality, which we have to report to our superiors. Please prove us wrong. We still have some time.
>
> Thanks.
> Amir
>
> From: Fabian Hueske <fhue...@gmail.com>
> To: "dev@flink.apache.org" <dev@flink.apache.org>
> Sent: Friday, September 16, 2016 12:31 AM
> Subject: Re: Performance and Latency Chart for Flink
>
> Hi, I am not aware of periodic performance runs for the Flink releases. [...]

--
Kind Regards
Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr
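The "150 times" figure can be compared against the two data points quoted in the message (150 expressways in 3.5 hours versus 10 expressways in 17 hours). The Python sketch below is plain arithmetic on those numbers. Treating expressways per hour as a throughput measure is an assumption made here for illustration; the thread does not say how the 150x figure was derived.

```python
# Figures as quoted in the message.
ibm_expressways, ibm_hours = 150, 3.5
flink_expressways, flink_hours = 10, 17

# Ratio of problem sizes handled.
expressway_ratio = ibm_expressways / flink_expressways

# Ratio of wall-clock times.
time_ratio = flink_hours / ibm_hours

# Assumption: work scales linearly with the number of expressways,
# so expressways per hour is comparable across the two runs.
ibm_rate = ibm_expressways / ibm_hours
flink_rate = flink_expressways / flink_hours
throughput_ratio = ibm_rate / flink_rate

print(f"expressways: {expressway_ratio:.1f}x")
print(f"time:        {time_ratio:.2f}x")
print(f"throughput:  {throughput_ratio:.1f}x")
```

Under that linear-scaling assumption the quoted data points give a gap of about 73x rather than 150x, which is still large and still depends on the environment details Timo asks for.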
Re: Performance and Latency Chart for Flink
Hi,

I am not aware of periodic performance runs for the Flink releases. I know of a few benchmarks which have been published at different points in time, like [1], [2], and [3] (you'll probably find more).

In general, fair benchmarks that compare different systems (if there is such a thing) are very difficult, and the results often depend on the use case. IMO the best option is to run your own benchmarks, if you have a concrete use case.

Best, Fabian

[1] 08/2015: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] 12/2015: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
[3] 02/2016: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

2016-09-16 5:54 GMT+02:00 Chawla,Sumit:
> Hi
>
> Is there any performance run that is done for each Flink release? Or are you aware of any third-party evaluation of performance metrics for Flink? I am interested in seeing how performance has improved from release to release, and performance vs. other competitors.
>
> Regards
> Sumit Chawla