A quick note on this: the side-input API is still a work in progress, and it turns out to be more complicated than expected (obviously …). We will need quite a bit more work on other parts of Flink before we can provide a good built-in solution.
In the meantime, you can check out the Async I/O operator [1].
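The Async I/O operator mentioned above is built around asynchronous request/callback lookups. As a rough, dependency-free sketch of that pattern (not Flink's actual `AsyncFunction` API, and `FakeRestClient`/`lookup` are hypothetical stand-ins for a real non-blocking REST client):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a REST client; a real job would issue a
// non-blocking HTTP request to the external service here.
class FakeRestClient {
    private final Map<String, String> table = Map.of("user-1", "Alice", "user-2", "Bob");

    CompletableFuture<String> lookup(String key) {
        return CompletableFuture.supplyAsync(() -> table.getOrDefault(key, "unknown"));
    }
}

public class AsyncLookupSketch {
    public static void main(String[] args) {
        FakeRestClient client = new FakeRestClient();
        // The async pattern: fire the request, attach a callback that
        // emits the enriched record once the response arrives.
        String enriched = client.lookup("user-1")
                .thenApply(name -> "user-1 -> " + name)
                .join(); // Flink's AsyncFunction would complete a ResultFuture instead of blocking
        System.out.println(enriched);
    }
}
```

The point of the pattern is that the stream thread is not blocked per lookup; many requests can be in flight at once.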
Hi Josh,
I have a use-case similar to yours. I need to join a stream with data from a
database to which I have access via a REST API. Since the side-inputs API is
still a work in progress, I am wondering how you approached it. Did you use a
rich function, updating the data periodically?
Hi Aljoscha,
That sounds exactly like the kind of feature I was looking for, since my
use-case fits the "Join stream with slowly evolving data" example. For now,
I will do an implementation similar to Max's suggestion. Of course it's not
as nice as the proposed feature, as there will be a delay
Hi Josh,
for the first part of your question you might be interested in our ongoing
work of adding side inputs to Flink. I started this design doc:
https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing
It's still somewhat rough around the edges but could
Hi Josh,
You can trigger an occasional refresh, e.g. on every 100 elements
received. Or you could start a thread that does it every 100
seconds (possibly with a lock involved to prevent processing in the
meantime).
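Both of Max's options can be sketched in plain Java (the database call, the map contents, and the class names below are hypothetical stand-ins, not a Flink API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class PeriodicRefreshSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private long seen = 0;

    // Hypothetical stand-in for a real database / REST query.
    private Map<String, String> loadFromDatabase() {
        return Map.of("item-1", "v2");
    }

    // Option 1: refresh every 100 elements, inline with processing.
    public String process(String key) {
        if (++seen % 100 == 0) {
            cache.putAll(loadFromDatabase());
        }
        return cache.getOrDefault(key, "miss");
    }

    // Option 2: a background thread refreshes every 100 seconds; the
    // lock keeps processing from observing a half-updated cache.
    public void startBackgroundRefresh(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(() -> {
            lock.lock();
            try {
                cache.putAll(loadFromDatabase());
            } finally {
                lock.unlock();
            }
        }, 100, 100, TimeUnit.SECONDS);
    }
}
```

Either way the trade-off is the same: lookups are served from memory, at the cost of the data being up to one refresh interval stale.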
Cheers,
Max
On Mon, May 23, 2016 at 7:36 PM, Josh wrote:
Hi Max,
Thanks, that's very helpful regarding the REST API sink. For now I don't need
exactly-once guarantees for the sink, so I'll just write a simple HTTP sink
implementation, but I may need to move to the idempotent version in the future!
For 1), that sounds like a simple/easy solution, but how would I
Hi Josh,
1) Use a RichFunction which has an `open()` method to load data (e.g. from
a database) at runtime before the processing starts.
2) No, that's fine. If you want your REST API sink to interact with
checkpointing (for fault tolerance), this is a bit tricky though, depending
on the
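Option (1) above can be sketched as follows. `RichFunction` and its `open()` lifecycle method are real Flink concepts, but this is a dependency-free sketch of the pattern only; `loadFromDatabase` and the sample data are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the RichFunction lifecycle: open() runs once per task
// before any element is processed, so it is the place to pre-load
// reference data into memory.
public class EnrichingMapper {
    private Map<String, String> reference;

    // In Flink this would be open(...) overridden on a RichMapFunction.
    public void open() {
        reference = loadFromDatabase();
    }

    // Hypothetical stand-in for a real database query.
    private Map<String, String> loadFromDatabase() {
        Map<String, String> m = new HashMap<>();
        m.put("item-1", "blue");
        return m;
    }

    // Called once per stream element, after open() has run.
    public String map(String itemId) {
        return itemId + ":" + reference.getOrDefault(itemId, "unknown");
    }
}
```

Note that with this approach the data is loaded once and never refreshed, which is the limitation the periodic-refresh suggestion elsewhere in this thread addresses.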
Hi Rami,
Thanks for the fast reply.
1. In your solution, would I need to create a new stream for 'item
updates', and add it as a source of my Flink job? Then I would need to
ensure item updates get broadcast to all nodes that are running my job and
use them to update the in-memory
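The two-input shape described here (an update stream maintaining an in-memory view, an event stream enriched against it) is what Flink expresses with connected streams and a co-map/co-flatmap function. A plain-Java sketch of the idea, with all names hypothetical (in a real job the update stream would be broadcast so every parallel task sees every update):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of a two-input operator: one input carries item
// updates, the other carries events to enrich against the current view.
public class TwoInputEnricher {
    private final Map<String, String> items = new HashMap<>();

    // Input 1: item updates keep the in-memory view current.
    public void onItemUpdate(String itemId, String value) {
        items.put(itemId, value);
    }

    // Input 2: events are enriched with whatever the view holds now.
    public String onEvent(String itemId) {
        return itemId + "=" + items.getOrDefault(itemId, "unknown");
    }
}
```
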
Hi all,
I am new to Flink and have a couple of questions which I've had trouble
finding answers to online. Any advice would be much appreciated!
1. What's a typical way of handling the scenario where you want to join
streaming data with a (relatively) static data source? For example, if I