yes. It will be very welcome a discussion with who knows better than me.
Basically, I am trying to implement the issue FLINK-1725 [1] that was gave
up on March 2017. Stephan Ewen said that there are more issues to be fixed
before going to this implementation and I don't really know which are
Wow, that's really cool! There are indeed a lot works you have done. IMO
it's beyond the scope of user group somewhat.
Just one small concern, I'm not sure I have fully understood your way of
"tackle data skew by altering the way Flink partition keys using
KeyedStream".
>From my understanding,
I`ve implemented a combiner [1] in Flink by extending
OneInputStreamOperator in Flink. I call my operator using "transform".
It works well and I guess it is useful if I import this operator in the
DataStream.java. I just need more to check if I need to touch other parts
of the source code.
But
Hi Felipe,
If I understand correctly, you want to solve data skew caused by imbalanced
key?
There is a common strategy to solve this kind of problem, pre-aggregation.
Like combiner of MapReduce.
But sadly, AFAIK Flink does not support pre-aggregation currently. I'm
afraid you have to implement
thanks Biao,
I see. To achieve what I want to do I need to work with KeyedStream. I
downloaded the Flink source code to learn and alter the KeyedStream to my
needs. I am not sure but it is a lot of work because as far as I understood
the key-groups have to be predictable [1]. and altering this
Hi Felipe,
Flink job graph is DAG based. It seems that you set an "edge property"
(partitioner) several times.
Flink does not support multiple partitioners on one edge. The later one
overrides the priors. That means the "keyBy" overrides the "rebalance" and
"partitionByPartial".
You could insert