Hi Fabian,

> I'd like to clarify what I said before.
>
> By using MapState you mainly gain two things:
> - position access by index
> - the full list does not need to be deserialized to read values (which is how
> ListState works).
>
> Point access should obviously be done by get(index).
> However, iterating over the list should be done by iterating over the entry
> (or value) set. The entry set iterator will prefetch multiple entries and
> only deserialize the key / values when you access them. This reduces the
> number of RocksDB look-ups.

Sorry, I should have been more precise in my description below.

I have to do incremental iteration (e.g. process the next 10 entries).

I’m assuming I can’t hold onto the iterator across calls to a function, right?

If so, then making get(index) calls via the technique described below is currently the most efficient approach, yes?

Thanks,

-- Ken

> 2018-02-19 0:10 GMT+01:00 Ken Krugler <kkrugler_li...@transpac.com>:
> Hi there,
>
> I’ve got a MapState where I need to iterate over the entries.
>
> This currently isn’t supported (at least for Rocks DB), AFAIK, though there
> is an issue/PR <https://issues.apache.org/jira/browse/FLINK-8297> to improve
> this.
>
> The best solution I’ve seen is what Fabian proposed, which involves keeping a
> ValueState with a count of entries, and then having the key for the MapState
> be the index.
>
>> I cannot comment on the internal design, but you could put the data into a
>> RocksDBStateBackend MapState<Integer, X> where the value X is your data
>> type and the key is the list index. You would need another ValueState for
>> the current number of elements that you put into the MapState.
>> A MapState allows to fetch and traverse the key, value, or entry set of the
>> Map without loading it completely into memory.
>> The sets are traversed in sort order of the key, so should be in insertion
>> order (given that you properly increment the list index).
>
> This effectively lets you iterate over all of the map entries for a given
> (keyed) state - though it doesn’t solve the “I have to iterate over _every_
> entry” situation.
>
> Is this currently the best option?
>
> Thanks,
>
> -- Ken
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
> --------------------------------------------

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378
--------------------------------------------
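
To make the incremental case concrete, here is a minimal sketch of the "count plus index key" technique with a read cursor added, so that each call processes the next N entries via point look-ups. It is an illustration under stated assumptions, not Flink code: a TreeMap stands in for the MapState<Integer, X>, two int fields stand in for ValueState<Integer> instances, and the class name, the method names, and the cursor itself are hypothetical and not part of the proposal quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java stand-in for the per-key state described in this thread.
 * In a real job these fields would be Flink state handles, not Java fields.
 */
class IndexedListState {
    // Models MapState<Integer, X> with the list index as the key (X = String here).
    private final Map<Integer, String> entries = new TreeMap<>();
    // Models the ValueState<Integer> holding the number of elements.
    private int count = 0;
    // Models a second ValueState<Integer> remembering where the last batch ended
    // (an assumption added for this sketch).
    private int cursor = 0;

    /** Appends a value at the next index; corresponds to MapState.put plus a count update. */
    void append(String value) {
        entries.put(count, value);
        count++;
    }

    /** Returns up to maxEntries values starting at the cursor, then advances the cursor. */
    List<String> nextBatch(int maxEntries) {
        List<String> batch = new ArrayList<>();
        int end = Math.min(cursor + maxEntries, count);
        for (int i = cursor; i < end; i++) {
            // Corresponds to MapState.get(index): one point look-up per entry.
            batch.add(entries.get(i));
        }
        cursor = end; // Corresponds to updating the cursor ValueState.
        return batch;
    }

    /** True while entries remain beyond the cursor. */
    boolean hasMore() {
        return cursor < count;
    }
}
```

Because the position lives in state rather than in an iterator, nothing has to be held across function calls. The trade-off is that each batch costs up to maxEntries separate look-ups instead of one prefetching scan over the entry set.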