Hi Mason,

Sorry for my late response!
> “there was no logic to filter/remove splits”

Yes, we are indeed missing a split removal mechanism. This is actually quite 
tricky given exactly-once semantics: there is a risk of losing data if we 
remove a partition or topic from Kafka. There was a discussion about this on 
the user mailing list: 
https://lists.apache.org/thread/7r4h7v5k281w9cnbfw9lb8tp56r30lwt

A rough, not yet fully thought-out idea is to remove a split with the help of 
watermarks. Once the watermark of a split has been pushed to the end of the 
global window, we can assume that no more records will arrive in that split 
and remove it safely. However, this would invalidate all previous checkpoints, 
because those splits might no longer exist in the external system (for 
example, the topic has been removed from Kafka).
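
To make the idea a bit more concrete, here is a minimal sketch in plain Java. 
It is not Flink API: the class SplitRetirementTracker and its methods are 
hypothetical, and I am assuming that "end of global window" can be modeled as 
Long.MAX_VALUE. It only shows the bookkeeping for deciding which splits are 
safe to drop, not the checkpoint handling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Flink): tracks the watermark of each split. */
final class SplitRetirementTracker {
    // Assumption: the end of the global window is represented as Long.MAX_VALUE.
    static final long END_OF_GLOBAL_WINDOW = Long.MAX_VALUE;

    private final Map<String, Long> watermarkBySplit = new HashMap<>();

    /** Records a watermark for a split; watermarks never move backwards. */
    void updateWatermark(String splitId, long watermark) {
        watermarkBySplit.merge(splitId, watermark, Math::max);
    }

    /** Returns the splits whose watermark has reached the end of the global window. */
    Set<String> removableSplits() {
        Set<String> removable = new HashSet<>();
        for (Map.Entry<String, Long> entry : watermarkBySplit.entrySet()) {
            if (entry.getValue() >= END_OF_GLOBAL_WINDOW) {
                removable.add(entry.getKey());
            }
        }
        return removable;
    }

    /** Stops tracking a split once it has been removed. */
    void remove(String splitId) {
        watermarkBySplit.remove(splitId);
    }
}
```

A caller would periodically ask removableSplits() and drop those splits from 
the enumerator state. Restoring from a checkpoint taken before the removal 
would still need separate handling, which this sketch does not address.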

Hope this answers your question. Looking forward to hearing your ideas!

--
Best Regards,

Qingsheng Ren
Email: renqs...@gmail.com
On Nov 10, 2021, 11:32 PM +0800, Mason Chen <mas.chen6...@gmail.com>, wrote:
>
> there was no logic to filter/remove splits
