Thank you. Checkpoints time out often, even though the timeout limit is 20 minutes. The number of records in our processing window that require checkpointing is large (between 200,000 and 2 million). I assumed that Flink would upload state to S3 as batched blobs of bytes, rather than making one S3 call per record. Is this assumption correct?
I need to look into whether I am being rate-limited by Amazon. I assumed that a rate-limiting error would have bubbled up as an error in the logs. I will find a way to ensure that such an error is logged or captured somehow.

How would backpressure come into play during checkpointing? I would expect Amazon to have enough resources. When I turn my sink (the next operator) into a print, it fails during checkpointing as well. I will explore what you mentioned, though.

Thank you.

On Mon, Feb 1, 2021 at 6:53 AM Piotr Nowojski <[email protected]> wrote:

> Hi,
>
> Yes, it's working. You would need to analyse what's working slower than
> expected. Checkpointing times? (Async duration? Sync duration? Start
> delay/back pressure?) Throughput? Recovery/startup? Are you being rate
> limited by Amazon?
>
> Piotrek
>
> On Thu, Jan 28, 2021 at 03:46, Marco Villalobos <[email protected]>
> wrote:
>
>> Just curious, has anybody had success with Amazon EMR with RocksDB and
>> checkpointing in S3?
>>
>> That's the configuration I am trying to set up, but my system is running
>> more slowly than expected.
>>
