I'd like to clarify what I said before.
By using MapState you mainly gain two things:
- position access by index
- the full list does not need to be deserialized to read values (unlike
ListState, which deserializes the whole list).
Point access should obviously be done by get(index).
However, iterating over the list should be done by iterating over the entry
(or value) set. The entry set iterator will prefetch multiple entries and
only deserialize the key / values when you access them. This reduces the
number of RocksDB look-ups.
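To illustrate the access pattern, here is a minimal sketch in plain Java. It is
not Flink code: a TreeMap and an int field are in-memory stand-ins for the
MapState<Integer, X> and the ValueState counter, and the class IndexedList and
its methods are made up for this example. In a real job, point access would go
through MapState.get(index) and iteration through MapState.entries().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a list kept in a map keyed by index (not a Flink class). */
class IndexedList<T> {
    // Stand-in for MapState<Integer, T>: key is the list index, value is the element.
    private final TreeMap<Integer, T> elements = new TreeMap<>();
    // Stand-in for ValueState<Integer>: number of elements appended so far.
    private int count = 0;

    /** Appends at the next free index and bumps the counter. */
    void append(T value) {
        elements.put(count, value);
        count++;
    }

    /** Point access: a single lookup by index, no scan over the other elements. */
    T get(int index) {
        return elements.get(index);
    }

    int size() {
        return count;
    }

    /** Iteration: walk the entry set once instead of calling get(i) for every i. */
    List<T> toList() {
        List<T> out = new ArrayList<>(count);
        for (Map.Entry<Integer, T> entry : elements.entrySet()) {
            out.add(entry.getValue());
        }
        return out;
    }
}
```

The point of toList() is that it walks the entries in one pass; a loop of
get(0), get(1), ... would work too, but on RocksDB each get is a separate
look-up.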
2018-02-19 0:10 GMT+01:00 Ken Krugler <kkrugler_li...@transpac.com>:
> Hi there,
> I’ve got a MapState where I need to iterate over the entries.
> This currently isn’t supported (at least for Rocks DB), AFAIK, though
> there is an issue/PR <https://issues.apache.org/jira/browse/FLINK-8297> to
> improve this.
> The best solution I’ve seen is what Fabian proposed, which involves
> keeping a ValueState with a count of entries, and then having the key for
> the MapState be the index.
> I cannot comment on the internal design, but you could put the data into a
> RocksDBStateBackend MapState<Integer, X> where the value X is your data
> type and the key is the list index. You would need another ValueState for
> the current number of elements that you put into the MapState.
> A MapState allows you to fetch and traverse the key, value, or entry set of the
> Map without loading it completely into memory.
> The sets are traversed in sort order of the key, so should be in insertion
> order (given that you properly increment the list index).
> This effectively lets you iterate over all of the map entries for a given
> (keyed) state - though it doesn’t solve the “I have to iterate over _every_
> entry” situation.
> Is this currently the best option?
> — Ken
> +1 530-210-6378