Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-31 Thread Kamil Dziublinski
yep I meant 120 per second :)

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-31 Thread Ted Yu
The 1,2million seems to be European notation. You meant 1.2 million, right?

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-31 Thread Kamil Dziublinski
Hi, Thanks for the tip man. I tried playing with this. I was changing fetch.message.max.bytes (I still have Kafka 0.8) and also socket.receive.buffer.bytes. With some optimal settings I was able to get to 1,2 million reads per second, so a 50% increase. But that unfortunately does not increase when I

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Tzu-Li (Gordon) Tai
I'm wondering what I can tweak further to increase this. I was reading in this blog: https://data-artisans.com/extending-the-yahoo-streaming-benchmark/ about 3 million per sec with only 20 partitions. So I'm sure I should be able to squeeze more out of it. Not really sure if it is relevant

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
Thanks Ted, will read about it. While we are on throughput: do you guys have any suggestions on how to optimise Kafka reading from Flink? In my current setup:
Flink is on 15 machines on YARN
Kafka on 9 brokers with 40 partitions.
Source parallelism is 40 for Flink,
And just for testing I left only

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Ted Yu
Kamil: In the upcoming hbase 2.0 release, there are more write path optimizations which would boost write performance further. FYI

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Kamil Dziublinski
Hey guys, Sorry for the confusion. It turned out that I had a bug in my code: I was not clearing the list in my batch object on each apply call. I forgot this has to be added, since it's different from fold. That is what led to such high throughput. When I fixed this I was back to 160k per sec. I'm still
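For readers following along, the bug described in this message can be sketched roughly as below. This is a plain-Python toy, not Flink code: the `Batch` class and the `apply_buggy` / `apply_fixed` functions are hypothetical stand-ins for the batch object and the window function's apply call, and the real job may look quite different. It only shows why a reused list that is never cleared makes each call appear to process more and more records.

```python
from typing import List


class Batch:
    """Hypothetical reusable batch object (an assumption, not the poster's class)."""

    def __init__(self) -> None:
        self.rows: List[str] = []


def apply_buggy(batch: Batch, window_elements: List[str]) -> int:
    # Bug: rows left over from earlier windows are still in the list,
    # so every call emits the old rows again plus the new ones.
    batch.rows.extend(window_elements)
    return len(batch.rows)


def apply_fixed(batch: Batch, window_elements: List[str]) -> int:
    # Fix: clear the reused list at the start of every call.
    batch.rows.clear()
    batch.rows.extend(window_elements)
    return len(batch.rows)


# Three consecutive windows with two elements each.
windows = [["a", "b"], ["c", "d"], ["e", "f"]]

buggy_batch, fixed_batch = Batch(), Batch()
buggy_counts = [apply_buggy(buggy_batch, w) for w in windows]
fixed_counts = [apply_fixed(fixed_batch, w) for w in windows]

print(buggy_counts)  # [2, 4, 6]: the count grows with every window
print(fixed_counts)  # [2, 2, 2]: only the current window is counted
```

If throughput is measured as records written per call, the buggy version inflates the number because it counts the same records repeatedly.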

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-29 Thread Timo Walther
Hi Kamil, the performance implications might be the result of which state the underlying functions are using internally. WindowFunctions use ListState or ReducingState, fold() uses FoldingState. It also depends on the size of your state and the state backend you are using. I recommend the
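As a rough illustration of the state difference mentioned in this reply, here is a toy model in plain Python. It is not the Flink API: `BufferingWindow` and `FoldingWindow` are hypothetical names that only mimic list-style state (every element kept until the window fires) versus folding-style state (one accumulator updated per element).

```python
from typing import Callable, List


class BufferingWindow:
    """Toy model of list-style state: keeps every element until the window fires."""

    def __init__(self) -> None:
        self.elements: List[int] = []

    def add(self, value: int) -> None:
        self.elements.append(value)

    def fire(self, window_fn: Callable[[List[int]], int]) -> int:
        # The window function sees all buffered elements at once.
        return window_fn(self.elements)


class FoldingWindow:
    """Toy model of folding-style state: keeps a single accumulator."""

    def __init__(self, initial: int, fold_fn: Callable[[int, int], int]) -> None:
        self.acc = initial
        self.fold_fn = fold_fn

    def add(self, value: int) -> None:
        # Each element is folded into the accumulator as it arrives.
        self.acc = self.fold_fn(self.acc, value)

    def fire(self) -> int:
        return self.acc


events = [3, 1, 4, 1, 5]

buffering = BufferingWindow()
folding = FoldingWindow(0, lambda acc, value: acc + value)
for event in events:
    buffering.add(event)
    folding.add(event)

buffered_result = buffering.fire(sum)
folded_result = folding.fire()
buffered_state_size = len(buffering.elements)

print(buffered_result, folded_result)  # 14 14
print(buffered_state_size)             # 5 elements held, versus one accumulator
```

Both variants produce the same result here; what differs is how much state is held per window and when the work happens, which is where the state size and state backend come into play.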

20 times higher throughput with Window function vs fold function, intended?

2017-03-29 Thread Kamil Dziublinski
Hi guys, I’m using Flink in production at Mapp. We recently swapped from Storm. Before I put this live I was doing performance tests and I found something that “feels” a bit off. I have a simple streaming job reading from Kafka, windowing for 3 seconds and then storing into HBase.