Hi Mason, Sorry for my late response! > quote_type > “there was no logic to filter/remove splits”
Yes we indeed miss a split removal mechanism. Actually this is quite a tricky one considering exactly-once semantic: there’s risk of losing data if we remove a partition / topic from Kafka. There was a discussion about this topic in the user mailing list: https://lists.apache.org/thread/7r4h7v5k281w9cnbfw9lb8tp56r30lwt An immature solution in my mind is that we can remove a split with the help of watermark. Once the watermark in a split has been pushed to end of global window, then we can assume that there’s no more new records in the split and we can remove it safely. But, this will invalidate all previous checkpoints because these split might not exist anymore in the external system (like topic has been removed in Kafka). Hope this could answer your question and looking forward to your inspiring ideas! -- Best Regards, Qingsheng Ren Email: renqs...@gmail.com On Nov 10, 2021, 11:32 PM +0800, Mason Chen <mas.chen6...@gmail.com>, wrote: > > there was no logic to filter/remove splits