Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-17 Thread Felipe Gutierrez
I guess I could implement a solution that is not static by extending Flink's OneInputStreamOperator. https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/MqttSensorDataSkewedCombinerByKeySkewedDAG.java#L84 Best, Felipe

Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-11 Thread Felipe Gutierrez
Thanks all for your suggestions! I am not sure whether option 3 that Fabian described requires changing the Flink source code or whether it can be implemented on top of Flink. - 3) One approach to improve the processing of skewed data is to change how keyed state is handled. Flink's

Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-11 Thread Till Rohrmann
Just a small addition: If two hot keys fall into two key groups which are being processed by the same TM, then it could help to change the parallelism, because then the key group mapping might be different. If two hot keys fall into the same key group, you can adjust the max parallelism which def
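To make the key group point concrete, here is a minimal, self-contained sketch of how a key could be mapped to a key group and then to a parallel subtask. It assumes the commonly described scheme (hash of the key modulo max parallelism, then key group * parallelism / max parallelism). The class name, the method names, the `scramble` hash and the sensor keys are all hypothetical stand-ins, not Flink's actual code, so the concrete numbers will differ from what a real job produces. Which TM a subtask lands on is decided by scheduling and is not modelled here.

```java
class KeyGroupSketch {

    // Stand-in bit mixer (assumption): Flink applies its own murmur-style hash
    // to key.hashCode(); this one only serves to spread the hash codes.
    static int scramble(int code) {
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        return code & Integer.MAX_VALUE; // clear the sign bit
    }

    // The key group depends only on the key and the max parallelism.
    static int keyGroupFor(Object key, int maxParallelism) {
        return scramble(key.hashCode()) % maxParallelism;
    }

    // Contiguous ranges of key groups are assigned to each parallel subtask.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        String[] hotKeys = {"sensor-A", "sensor-B"};       // hypothetical hot keys
        int[][] configs = {{128, 4}, {128, 6}, {256, 4}};  // {maxParallelism, parallelism}
        for (int[] c : configs) {
            for (String key : hotKeys) {
                int keyGroup = keyGroupFor(key, c[0]);
                int subtask = operatorIndexFor(keyGroup, c[0], c[1]);
                System.out.printf(
                    "maxParallelism=%d parallelism=%d key=%s keyGroup=%d subtask=%d%n",
                    c[0], c[1], key, keyGroup, subtask);
            }
        }
    }
}
```

In this sketch, changing the parallelism moves the boundaries between the key group ranges, while changing the max parallelism changes the key group of a key itself, which is why the two knobs help in different situations.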

Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-11 Thread Fabian Hueske
Hi Felipe, three comments: 1) applying rebalance(), shuffle(), or rescale() before a keyBy() has no effect: keyBy() introduces a hash partitioning such that any data partitioning that you do immediately before keyBy() is destroyed. You only change the distribution for the call of the key extracto
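Point 1 can be illustrated without Flink by a small simulation. It is only a sketch under simplifying assumptions: `Math.floorMod(key.hashCode(), parallelism)` stands in for the hash partitioning of keyBy(), and the class, method and key names are hypothetical. The records are first placed on upstream subtasks in two different ways (all on one subtask, or round-robin as rebalance() would do), and each upstream subtask then forwards its records by key hash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyByAfterRebalanceSketch {

    // Upstream layout without any redistribution: every record on subtask 0.
    static List<List<String>> allOnFirst(List<String> keys, int parallelism) {
        List<List<String>> subtasks = emptySubtasks(parallelism);
        subtasks.get(0).addAll(keys);
        return subtasks;
    }

    // Upstream layout after a round-robin redistribution.
    static List<List<String>> roundRobin(List<String> keys, int parallelism) {
        List<List<String>> subtasks = emptySubtasks(parallelism);
        for (int i = 0; i < keys.size(); i++) {
            subtasks.get(i % parallelism).add(keys.get(i));
        }
        return subtasks;
    }

    static List<List<String>> emptySubtasks(int parallelism) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        return subtasks;
    }

    // Each upstream subtask routes by key hash only (stand-in for keyBy()),
    // so the downstream load does not depend on the upstream layout.
    static int[] loadAfterKeyBy(List<List<String>> upstream, int parallelism) {
        int[] load = new int[parallelism];
        for (List<String> subtask : upstream) {
            for (String key : subtask) {
                load[Math.floorMod(key.hashCode(), parallelism)]++;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Synthetic skew: one hot key dominates the stream.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            keys.add("hot");
        }
        keys.add("a");
        keys.add("b");

        int[] plain = loadAfterKeyBy(allOnFirst(keys, parallelism), parallelism);
        int[] rebalanced = loadAfterKeyBy(roundRobin(keys, parallelism), parallelism);
        System.out.println("without rebalance: " + Arrays.toString(plain));
        System.out.println("with rebalance:    " + Arrays.toString(rebalanced));
    }
}
```

Both printed arrays are identical: the subtask that receives the hot key stays overloaded regardless of how the records were spread before the key-based partitioning.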

What is the best way to handle data skew processing in Data Stream applications?

2019-04-10 Thread Felipe Gutierrez
Hi, I am studying data skew processing in Flink and how I can change the low-level control of physical partitioning (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#physical-partitioning) so that tuples are processed evenly. I have created synthetic skewed data