Hi to all,
we're still experimenting with Flink's streaming API to see whether it
can improve our current batch pipeline.
At the moment, we have a job that translates incoming data (as Row) into
Tuple4, groups the tuples by the first field, and persists the result to
disk (as a Thrift object). When we need to add tuples to those grouped
objects, we have to read the persisted data back, flatten it into Tuple4s
again, union it with the new tuples, re-group by key, and finally persist.
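For concreteness, the read/flatten/union/regroup cycle described above can be modeled in plain Java (no Flink dependencies; `String[]` of length 4 stands in for the real Tuple4, and the key is the first field — both simplifications are assumptions for the sketch):

```java
import java.util.*;
import java.util.stream.*;

// Toy model of the batch cycle: persisted groups are read back,
// flattened into individual tuples, unioned with the new tuples,
// and regrouped by the first field.
public class BatchRegroup {

    public static Map<String, List<String[]>> regroup(
            Map<String, List<String[]>> persisted, List<String[]> fresh) {
        return Stream.concat(
                        // flatten every persisted group back to tuples
                        persisted.values().stream().flatMap(List::stream),
                        // union with the newly arrived tuples
                        fresh.stream())
                // regroup everything by the first field
                .collect(Collectors.groupingBy(t -> t[0]));
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> persisted = new HashMap<>();
        List<String[]> groupA = new ArrayList<>();
        groupA.add(new String[]{"a", "1", "2", "3"});
        persisted.put("a", groupA);

        List<String[]> fresh = new ArrayList<>();
        fresh.add(new String[]{"a", "4", "5", "6"});
        fresh.add(new String[]{"b", "7", "8", "9"});

        Map<String, List<String[]>> out = regroup(persisted, fresh);
        System.out.println("a=" + out.get("a").size()
                + " b=" + out.get("b").size()); // a=2 b=1
    }
}
```

The cost the mail complains about is visible here: every incremental update has to touch the entire persisted dataset again.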

This is very expensive to do with batch computation, while (from what I
understand) it should be pretty straightforward with streaming: I just
need to use ListState. Right?
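The streaming version turns the whole cycle into a single per-key append. A toy stand-in for keyed ListState, again in plain Java so it runs standalone (in actual Flink this would be `ListState.add(...)` inside a `KeyedProcessFunction` on a stream keyed by the first field; the class and method names here are otherwise made up for the sketch):

```java
import java.util.*;

// Toy stand-in for Flink's keyed ListState: each key holds a growing
// list, and each incoming tuple is one append -- no re-read, no
// flatten, no union over the previously persisted data.
public class KeyedListState {
    private final Map<String, List<String[]>> state = new HashMap<>();

    // Analogous to ListState.add() for the current key (first field).
    public void add(String[] tuple) {
        state.computeIfAbsent(tuple[0], k -> new ArrayList<>()).add(tuple);
    }

    // Analogous to ListState.get() for one key.
    public List<String[]> get(String key) {
        return state.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        KeyedListState s = new KeyedListState();
        s.add(new String[]{"a", "1", "2", "3"});
        s.add(new String[]{"a", "4", "5", "6"});
        s.add(new String[]{"b", "7", "8", "9"});
        System.out.println("a=" + s.get("a").size()
                + " b=" + s.get("b").size()); // a=2 b=1
    }
}
```

The design difference versus the batch sketch is that the append cost is proportional to one tuple, not to the accumulated state.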
Then, let's say I need to scan all the data of the stateful computation
(keys and values) in order to do some other computation. I'd like to know:

   - how can I do that? I.e., create a DataSet/DataSource<Key,Value> from
   the stateful data in the stream
   - is there any problem with accessing the stateful data without stopping
   the incoming data (and thus possible updates to the state)?

Thanks in advance for the support,
Flavio
