Every job is required to have a sink, but there's no requirement that all
output be done via sinks. It's not uncommon, and not inherently a problem,
for other operators to do I/O.

What can be problematic, however, is doing blocking I/O. While your user
function is blocked, the operator exerts backpressure, and checkpoint
barriers cannot make progress through it. This sometimes leads to
checkpoint timeouts and job failures. So it's recommended to make any I/O
you do asynchronous, using an AsyncFunction [1] or something similar.
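
To make the non-blocking idea concrete, here is a minimal sketch in plain
Java, with no Flink dependency. The names slowLookup, asyncInvoke and
enrichAll are made up for illustration, and the "database" is just a sleep.
In Flink itself the same shape goes inside AsyncFunction.asyncInvoke, with
the result handed to the ResultFuture rather than to a plain callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class AsyncIoSketch {

    // Hypothetical stand-in for a slow external call (e.g. a DB lookup).
    static String slowLookup(String key) {
        try {
            Thread.sleep(50); // simulated network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return key.toUpperCase(Locale.ROOT);
    }

    // Returns immediately; the lookup runs on the pool and the callback
    // fires when the result is ready, so the caller is never blocked.
    static CompletableFuture<Void> asyncInvoke(
            String input, ExecutorService pool, Consumer<String> onResult) {
        return CompletableFuture
                .supplyAsync(() -> slowLookup(input), pool)
                .thenAccept(value -> onResult.accept(input + "=" + value));
    }

    // Issues all requests without waiting in between, then waits once
    // for everything still in flight. Results are sorted because the
    // completion order is not deterministic.
    static List<String> enrichAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String input : inputs) {
                pending.add(asyncInvoke(input, pool, results::add));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(enrichAll(List.of("a", "b", "c")));
    }
}
```

The point is only that the thread issuing requests never sits in
slowLookup itself; how many requests may be in flight at once, and what
happens on timeout, is something the real async I/O operator lets you
configure.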

Note that the async I/O operator stores the records for in-flight
asynchronous requests in checkpoints, and re-triggers those requests when
recovering from a failure. This can lead to duplicate results if you are
using it to do non-idempotent database writes. If you need transactions,
use a sink that offers them.
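
As a rough illustration of why idempotence matters here, the sketch below
replays one request, the way a request that was in flight at the last
checkpoint would be re-triggered after recovery. The two "tables" are
in-memory stand-ins and the names append and upsert are hypothetical; a
real database would need an equivalent keyed write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotentWriteSketch {

    // Non-idempotent: every call adds a row, so a replayed request
    // is stored twice.
    static void append(List<String> table, String eventId, String payload) {
        table.add(eventId + ":" + payload);
    }

    // Idempotent: keyed by event id, so a replayed request simply
    // overwrites the row it wrote the first time.
    static void upsert(Map<String, String> table, String eventId, String payload) {
        table.put(eventId, payload);
    }

    public static void main(String[] args) {
        List<String> appendTable = new ArrayList<>();
        Map<String, String> keyedTable = new HashMap<>();

        // "e1" appears twice to simulate a re-triggered request.
        String[][] requests = {{"e1", "x"}, {"e2", "y"}, {"e1", "x"}};
        for (String[] r : requests) {
            append(appendTable, r[0], r[1]);
            upsert(keyedTable, r[0], r[1]);
        }

        System.out.println(appendTable.size()); // 3 rows, one duplicate
        System.out.println(keyedTable.size());  // 2 rows, no duplicate
    }
}
```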

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html

Best,
David

On Sun, Jul 26, 2020 at 11:08 AM Tom Fennelly <tfenne...@cloudbees.com>
wrote:

> Hi.
>
> What are the negative side effects of (for example) a filter function
> occasionally making a call out to a DB ? Is this a big no-no and should all
> outputs be done through sinks and side outputs, no exceptions ?
>
> Regards,
>
> Tom.
>
