If I have a KafkaInputOperator that is reading from 2 partitions and I
setup a stream to a one-to-one operator, how would I read the messages from
each Kafka partition, extract a particular field (say user id) and then
send all the keys to a parallel downstream operator so that the keys are
grouped together.
Something like this:
Messages coming in on kafka are json of the format {'user_id': 123,
'value': 'abc'} where the user_id is some number. The messages are not
partitioned on the Kafka topic according to the user_id.
In coming msgs: repartition by id
id=1, id=2 ==> K1 -> JsonToPojo -- --> PoJo(1), PoJo(3), PoJo(1)
\ /
X
/ \
id=1, id=4 ==> K2 -> JsonToPojo -- --> PoJo(4)
Where I transform the Json messages into a POJO like:
class PoJo {
int id;
String value;
...
}
Would 'X' need to be a combination of a StreamMerger and some kind of
custom partitioner, or could the JsonToPojo operator directly partition
it's output based on the id value?