Hi all,

Our team is brainstorming how to solve a particular type of problem with Beam, and it's a bit beyond our experience level, so I thought I'd turn to the experts for some advice.
Here are the pieces of our puzzle:

- a data source with the following properties:
  - rows represent work to do
  - each row has an integer priority
  - rows can be added or deleted
  - the priority of a row can be changed
  - <10k rows
- a slow worker Beam transform (Map/ParDo) that consumes one row at a time

We want a streaming pipeline that delivers rows from our data store to the worker transform, re-sorting the remaining rows by priority each time a row is delivered. The goal is that last-second changes in priority can affect the order of the slowly yielding read. Throughput is not a major concern, since the worker is the bottleneck.

I have a few questions:

- Is this the sort of problem that BeamSQL can solve? I'm not sure how sorting and re-sorting are handled there in a streaming context...
- I'm unclear on how back-pressure in Flink affects streaming reads. My hope is that data/messages are left in the data source until back-pressure subsides, rather than being read eagerly into memory. Can someone clarify this for me?
- Is there a combination of windowing and triggering that can solve this continual re-sorting plus slow-yielding problem? It's not unreasonable to keep all of our rows in memory on Flink, as long as we're snapshotting state. (I've put a rough sketch of what we were imagining below my signature.)

Any advice on how an expert Beamer would solve this is greatly appreciated!

Thanks,
chad
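
P.S. To make the question concrete, here is a rough sketch (Beam Python SDK) of the direction we were imagining: route every row update onto a single key, buffer the updates in keyed state, and on a processing-time timer emit whichever buffered row currently has the highest priority. All of the names here are made up by us, the "negative priority means delete" convention is arbitrary, and pacing emission with a fixed timer interval is only a stand-in for the real back-pressure behaviour we're asking about, so please treat it as an illustration rather than a working design.

import apache_beam as beam
from apache_beam.coders import StrUtf8Coder, TupleCoder, VarIntCoder
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer
from apache_beam.utils.timestamp import Duration, Timestamp


class EmitHighestPriority(beam.DoFn):
    """Buffers (row_id, priority) updates and, each time the processing-time
    timer fires, emits the single row with the highest current priority."""

    # Bag of (row_id, priority) pairs; by our (arbitrary) convention a
    # negative priority marks a deleted row.
    PENDING = BagStateSpec('pending', TupleCoder((StrUtf8Coder(), VarIntCoder())))
    EMIT = TimerSpec('emit', TimeDomain.REAL_TIME)

    # Delay between emissions; a crude stand-in for real back-pressure.
    PACE_SECONDS = 5

    def process(self,
                element,
                pending=beam.DoFn.StateParam(PENDING),
                emit_timer=beam.DoFn.TimerParam(EMIT)):
        _, (row_id, priority) = element  # everything arrives under one dummy key
        pending.add((row_id, priority))
        # (Re)arm the emission timer. With a steady trickle of updates this
        # keeps getting pushed out, which is one of the details we'd need to fix.
        emit_timer.set(Timestamp.now() + Duration(seconds=self.PACE_SECONDS))

    @on_timer(EMIT)
    def emit(self,
             pending=beam.DoFn.StateParam(PENDING),
             emit_timer=beam.DoFn.TimerParam(EMIT)):
        # Collapse updates: the last priority seen for a row wins.
        latest = {}
        for row_id, priority in pending.read():
            latest[row_id] = priority
        pending.clear()

        live = {r: p for r, p in latest.items() if p >= 0}
        if not live:
            return

        # Hand the highest-priority row to the slow worker downstream.
        best = max(live, key=live.get)
        yield (best, live.pop(best))

        # Put the remaining rows back into state and schedule the next emission.
        for row_id, priority in live.items():
            pending.add((row_id, priority))
        emit_timer.set(Timestamp.now() + Duration(seconds=self.PACE_SECONDS))


# Intended usage (ReadFromOurSource and SlowWorkerFn are hypothetical):
#
#   (p
#    | 'ReadUpdates' >> ReadFromOurSource()
#    | 'SingleKey' >> beam.Map(lambda row: ('all', row))
#    | 'PriorityBuffer' >> beam.ParDo(EmitHighestPriority())
#    | 'SlowWorker' >> beam.ParDo(SlowWorkerFn()))

The single dummy key is what lets all <10k rows share one state cell (and get checkpointed by Flink), at the obvious cost of serializing everything through one key; that trade-off seemed acceptable given that the worker, not throughput, is our bottleneck.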