Re: Testing Apache Beam pipelines / python SDK

2020-07-21 Thread Robert Bradshaw
If you don't want to write to an actual file, the example with the Check transform should allow you to use Check(...) as you would a sink. (I realize this should have been run_my_pipeline(beam.Create([...]), "Check1" >> Check(equal_to([...])), "Check2" >>

Re: Testing Apache Beam pipelines / python SDK

2020-07-21 Thread Sofia’s World
Hello Robert, could you point me to a test sample where a 'mock' sink is used? Do you have a testing package that provides an in-memory sink where, for example, I can dump the result of my pipeline (as opposed to writing to a file)? Additionally, what is the best way to test writing to

RE: Unbounded sources unable to recover from checkpointMark when withMaxReadTime() is used

2020-07-21 Thread Sunny, Mani Kolbe
Hi Alexey, I did explore that route. The problem there is identifying which timestamp to use. While processing records, you can capture the timestamp. This cannot be processing time; it has to be the event time on the Kinesis record. As far as I can see, event time on the Kinesis record is generated from

Re: Unbounded sources unable to recover from checkpointMark when withMaxReadTime() is used

2020-07-21 Thread Alexey Romanenko
Hi Mani, Knowing when your last run stopped (since you use batch mode), could you leverage “withInitialPositionInStream()” or “withInitialTimestampInStream()” for KinesisIO in this case? Alexey
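
The bookkeeping behind this suggestion can be sketched in plain Python. This is only an illustration under assumptions, not code from the thread: the state file, the `event_time` field, and both helper names are hypothetical, and nothing here calls Beam or Kinesis. Each run records the largest event time it saw, and the next run reads that value back to use as its starting timestamp.

```python
import json
import tempfile
from pathlib import Path


def load_start_timestamp(state_file, default):
    """Return the event time where the previous run stopped, or `default` on a first run."""
    path = Path(state_file)
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_event_time"]


def save_stop_timestamp(state_file, records, previous):
    """Persist the largest event time seen so far (hypothetical record layout)."""
    # Use each record's own event time, not the wall-clock processing time.
    latest = max((r["event_time"] for r in records), default=previous)
    latest = max(latest, previous)  # never move the saved position backwards
    Path(state_file).write_text(json.dumps({"last_event_time": latest}))
    return latest


# Simulate two consecutive batch runs sharing one state file.
state = Path(tempfile.mkdtemp()) / "last_run.json"
start = load_start_timestamp(state, default=0.0)
batch = [{"event_time": 100.0}, {"event_time": 250.5}, {"event_time": 180.0}]
stop = save_stop_timestamp(state, batch, previous=start)
next_start = load_start_timestamp(state, default=0.0)
```

The loaded value is what a later run would hand to the source's initial-timestamp option. Depending on how the source treats that boundary, records carrying exactly the saved timestamp may be read twice, so downstream consumers would probably need to tolerate duplicates.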

RE: Unbounded sources unable to recover from checkpointMark when withMaxReadTime() is used

2020-07-21 Thread Sunny, Mani Kolbe
Hi Max, Thank you for your reply. Our use case is to run a batch job against a Kinesis source. Our downstream systems are still in batch mode, so this application will read from the Kinesis source periodically and generate batched outputs. Without the ability to resume from a checkpoint, it will

Re: Unbounded sources unable to recover from checkpointMark when withMaxReadTime() is used

2020-07-21 Thread Maximilian Michels
Hi Mani, BoundedReadFromUnboundedSource was originally intended to be used in batch pipelines. In batch, runners typically do not perform checkpointing. In case of failures, they re-run the entire pipeline. Keep in mind that, even with checkpointing, reading for a finite time in the