Hi Niels,
I was more talking from a theoretical point of view.
Flink does not have a hook to inject a custom hash function (yet). I'm not
familiar enough with the details of the implementation to assess whether
this would be possible or how much work it would be. However,
several users have
Hi Niels,
I think the biggest problem for keyed sources is that Flink must be able to
co-locate key-partitioned state with the pre-partitioned data.
This might work if the key is the partition ID, i.e., not the original key
attribute that was hashed to assign events to partitions.
Flink could
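For illustration only (the hash below is a made-up stand-in, not Flink's actual key-group assignment, and the names are hypothetical): the co-location problem is that a hash-based assignment of keys to subtasks generally does not send key p to the subtask that reads partition p.

```java
// Toy model of the co-location problem. If downstream operators assign keys
// to subtasks via a hash (as keyBy does internally), then keying by the
// partition id does not automatically place partition p's state on the
// subtask reading partition p. The hash is a stand-in, not Flink's formula.
public class CoLocationSketch {
    static int subtaskForKey(int key, int parallelism) {
        int hash = Integer.hashCode(key) * 0x9E3779B9; // arbitrary mixing constant
        return Math.floorMod(hash, parallelism);
    }

    static int subtaskReadingPartition(int partitionId, int parallelism) {
        return partitionId % parallelism; // sources typically take partitions in order
    }
}
```

With 10 subtasks, subtaskForKey(3, 10) is in general not 3, so the keyed state for partition 3 may live on a different subtask than the source instance reading partition 3; that mismatch is exactly what a keyed source would have to avoid.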
Hi Niels,
If it’s only about simple data filtering that does not depend on the key, a
simple “flatMap” or “filter” directly after the source can be chained to the
source instances.
The effect is that the filter processing will be done within the same
thread as the one fetching data from a
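As a toy illustration in plain Java (no Flink APIs; the names are made up), chaining means the filter predicate is evaluated inline in the same loop that fetches records, rather than after a thread hand-off or a network shuffle:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of operator chaining: the filter runs inline in the loop that
// "fetches" records, i.e., in the same thread as the source, with no hand-off.
public class ChainedFilterSketch {
    static List<String> fetchAndFilter(List<String> source, Predicate<String> keep) {
        List<String> out = new ArrayList<>();
        for (String record : source) {   // the source's fetch loop
            if (keep.test(record)) {     // chained filter: same thread, no buffering
                out.add(record);
            }
        }
        return out;
    }
}
```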
Hi,
Ok. I think I get it.
WHAT IF:
Assume we create an addKeyedSource(...) which allows us to add a source
that makes some guarantees about the data.
And assume this source simply returns the Kafka partition id as the result
of this 'hash' function.
Then if I have 10 Kafka partitions I would
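A sketch of that idea (hypothetical: addKeyedSource does not exist in Flink, and the names below are made up): the 'hash' is the identity on the partition id, so with 10 partitions and parallelism 10 every subtask would keep exactly the data it reads.

```java
// Hypothetical sketch of the proposed addKeyedSource contract: the key's
// "hash" is simply the Kafka partition id, so subtask i both reads and keys
// partition i, and no repartitioning would be needed.
public class KeyedSourceSketch {
    static int partitionIdAsHash(int kafkaPartitionId) {
        return kafkaPartitionId; // identity "hash": the guarantee the source makes
    }

    static int subtaskFor(int kafkaPartitionId, int parallelism) {
        return partitionIdAsHash(kafkaPartitionId) % parallelism;
    }
}
```

With 10 partitions and parallelism 10, subtaskFor(i, 10) == i for every partition i, which is the co-location the proposal is after.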
Hi Niels,
Thank you for bringing this up. I recall there was some previous discussion
related to this: [1].
I don’t think this is possible at the moment, mainly because of how the API is
designed.
On the other hand, a KeyedStream in Flink is basically just a DataStream with a
hash
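To make the keyed-stream view concrete with plain Java (no Flink types; a toy analogue, not Flink's implementation): keying a stream is logically just grouping records by a deterministic key function, and Flink realizes that grouping by hash-partitioning records across parallel subtasks.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy analogue of keyBy: records with the same key always end up together.
public class KeyBySketch {
    static Map<String, List<String>> keyBy(List<String> records,
                                           Function<String, String> keyFn) {
        return records.stream().collect(Collectors.groupingBy(keyFn));
    }
}
```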
Hi,
In my scenario I have click stream data that I persist in Kafka.
I use the sessionId as the key to instruct Kafka to put everything with the
same sessionId into the same Kafka partition. That way I already have all
events of a visitor in a single Kafka partition, in a fixed order.
When I read
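The routing described above can be sketched as follows (a stand-in hash, not Kafka's actual murmur2 partitioner, and the names are made up): every event with the same sessionId maps to the same partition, which is what keeps a visitor's events together and ordered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of key-based partition routing (stand-in hash, not Kafka's murmur2):
// a fixed sessionId always maps to the same partition, so all events of one
// visitor land in one partition, in production order.
public class SessionRoutingSketch {
    static int partitionFor(String sessionId, int numPartitions) {
        byte[] keyBytes = sessionId.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }
}
```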