Re: Unbounded sources unable to recover from checkpointMark when withMaxReadTime() is used

2020-07-28 Thread Maximilian Michels
Hi Mani, The implementation of BoundedReadFromUnboundedSource currently doesn't allow that in the same way sources checkpoint. We would either have to convert it into a proper source (it's a DoFn atm), or store the checkpoint mark in Beam managed state (which will be checkpointed). The latter

Re: Partition unbounded collection like Kafka source

2020-07-28 Thread wang Wu
Yes my issue is the lag increasing. We are using Spark Runner. Source is Kafka and Sink is Cassandra. We tune the batch interval and max records per batch but the batch interval still less than the processing time of each batch. So it causes the latency. We tried to apply Reshuffle withRandomKey on