Hi to all, we're still playing with the Flink streaming part in order to see whether it can improve our current batch pipeline. At the moment we have a job that translates incoming data (as Row) into Tuple4, groups the tuples by the first field, and persists the result to disk (as a Thrift object). When we need to add tuples to those grouped objects, we have to read the persisted data back, flatten it into Tuple4 again, union it with the new tuples, re-group by key, and finally persist everything again.
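Just to make the cycle above concrete, this is roughly the logic we repeat on every batch run, sketched here in plain Python (the helper names are made up for illustration; the real job uses Flink's DataSet API with groupBy(0) and Thrift serialization):

```python
from collections import defaultdict

def group_by_key(tuples):
    """Group 4-tuples by their first field, as the job does with groupBy(0)."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return dict(groups)

def add_tuples(persisted_groups, new_tuples):
    """The expensive batch cycle: flatten the persisted groups back to
    tuples, union them with the new ones, and re-group everything by key."""
    flattened = [t for group in persisted_groups.values() for t in group]
    return group_by_key(flattened + new_tuples)
```

So every incremental update pays the full cost of re-reading and re-grouping the already-persisted data.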
This is very expensive to do with batch computation, while it should be pretty straightforward with streaming (from what I understood): I just need to use ListState, right? Then, let's say I need to scan all the data of the stateful computation (keys and values) in order to do some other computation. I'd like to know:
- how to do that, i.e. how to create a DataSet/DataSource<Key,Value> from the stateful data in the stream;
- whether there is any problem in accessing the stateful data without stopping the incoming data (and thus possible updates to the states).
Thanks in advance for the support,
Flavio
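P.S. To make concrete what I have in mind, here is a stdlib-only stand-in for the per-key list accumulation I believe ListState gives us (in a real job this would be a KeyedProcessFunction holding a ListState obtained from a ListStateDescriptor; the class and method names below are made up for illustration):

```python
from collections import defaultdict

class KeyedListState:
    """Plain-Python stand-in for Flink's per-key ListState:
    each key transparently sees its own list of values."""

    def __init__(self):
        self._state = defaultdict(list)

    def add(self, key, value):
        # In Flink this would be listState.add(value) inside a keyed context,
        # where the key is implicit from keyBy.
        self._state[key].append(value)

    def get(self, key):
        # Corresponds to listState.get() for the current key.
        return list(self._state[key])

    def scan(self):
        # This is the part I am asking about: iterating over all
        # (key, values) pairs while the stream keeps running and updating.
        return dict(self._state)
```

The add/get part seems clear; it is the scan() part, done safely against a live stream, that I don't know how to map onto Flink.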