Thank you.
Yuan Mei 于2021年3月4日周四 下午11:10写道:
> Hey Yidan,
>
> KafkaShuffle is initially motivated to support shuffle data
> materialization on Kafka, and started with a limited version supporting
> hash-partition only. Watermark is maintained and forwarded as part of
> shuffle data. So you are
Hey Yidan,
KafkaShuffle is initially motivated to support shuffle data materialization
on Kafka, and started with a limited version supporting hash-partition
only. Watermark is maintained and forwarded as part of shuffle data. So you
are right, watermark storing/forwarding logic has nothing to do
And do you know when kafka consumer/producer will be re implemented
according to the new source/sink api? I am thinking whether I should adjust
the code for now, since I need to re adjust the code when it is
reconstructed to the new source/sink api.
yidan zhao 于2021年3月4日周四 下午4:44写道:
> I
I uploaded a picture to describe that.
https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png
>
One more question, If I only need watermark's logic, not keyedStream, why
not provide methods such as writeDataStream and readDataStream. It uses the
similar methods for kafka producer sink records and broadcast watermark to
partitions and then kafka consumers read it and regenerate the watermark.
Great :)
Just one more note. Currently FlinkKafkaShuffle has a critical bug [1] that
probably will prevent you from using it directly. I hope it will be fixed
in some next release. In the meantime you can just inspire your solution
with the source code.
Best,
Piotrek
[1]
Yes, you are right and thank you. I take a brief look at what
FlinkKafkaShuffle is doing, it seems what I need and I will have a try.
>
Hi,
Can not you write the watermark as a special event to the "mid-topic"? In
the "new job2" you would parse this event and use it to assign watermark
before `xxxWindow2`? I believe this is what FlinkKafkaShuffle is doing [1],
you could look at its code for inspiration.
Piotrek
[1]
I have a job which includes about 50+ tasks. I want to split it to multiple
jobs, and the data is transferred through Kafka, but how about watermark?
Is anyone have do something similar and solved this problem?
Here I give an example:
The original job: kafkaStream1(src-topic) => xxxProcess =>