Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
Thank you. Yuan Mei 于2021年3月4日周四 下午11:10写道: > Hey Yidan, > > KafkaShuffle is initially motivated to support shuffle data > materialization on Kafka, and started with a limited version supporting > hash-partition only. Watermark is maintained and forwarded as part of > shuffle data. So you are

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread Yuan Mei
Hey Yidan, KafkaShuffle is initially motivated to support shuffle data materialization on Kafka, and started with a limited version supporting hash-partition only. Watermark is maintained and forwarded as part of shuffle data. So you are right, watermark storing/forwarding logic has nothing to do

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
And do you know when kafka consumer/producer will be re implemented according to the new source/sink api? I am thinking whether I should adjust the code for now, since I need to re adjust the code when it is reconstructed to the new source/sink api. yidan zhao 于2021年3月4日周四 下午4:44写道: > I

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
I uploaded a picture to describe that. https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png >

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
One more question, If I only need watermark's logic, not keyedStream, why not provide methods such as writeDataStream and readDataStream. It uses the similar methods for kafka producer sink records and broadcast watermark to partitions and then kafka consumers read it and regenerate the watermark.

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Great :) Just one more note. Currently FlinkKafkaShuffle has a critical bug [1] that probably will prevent you from using it directly. I hope it will be fixed in some next release. In the meantime you can just inspire your solution with the source code. Best, Piotrek [1]

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread yidan zhao
Yes, you are right and thank you. I take a brief look at what FlinkKafkaShuffle is doing, it seems what I need and I will have a try. >

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread Piotr Nowojski
Hi, Can not you write the watermark as a special event to the "mid-topic"? In the "new job2" you would parse this event and use it to assign watermark before `xxxWindow2`? I believe this is what FlinkKafkaShuffle is doing [1], you could look at its code for inspiration. Piotrek [1]

how to propagate watermarks across multiple jobs

2021-03-01 Thread yidan zhao
I have a job which includes about 50+ tasks. I want to split it to multiple jobs, and the data is transferred through Kafka, but how about watermark? Is anyone have do something similar and solved this problem? Here I give an example: The original job: kafkaStream1(src-topic) => xxxProcess =>