Hi Fabian,

> I'd like to clarify what I said before.
> 
> By using MapState you mainly gain two things:
> - position access by index
> - the full list does not need to be deserialized to read values (which is how 
> ListState works).
> 
> Point access should obviously be done by get(index).
> However, iterating over the list should be done by iterating over the entry 
> (or value) set. The entry set iterator will prefetch multiple entries and 
> only deserialize the key / values when you access them. This reduces the 
> number of RocksDB look-ups.
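A minimal sketch of Fabian's two access paths follows. It does not use Flink itself. SortedMapState is a hypothetical in-memory stand-in that mirrors the get/put/entries methods of Flink's MapState, minus the checked exceptions the real interface declares. A TreeMap imitates RocksDB returning entries in sorted key order. The names elementAt and joinAll are only illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

public class MapStateListAccess {

    // Hypothetical stand-in mirroring a subset of Flink's MapState API
    // (get, put, entries). The real Flink methods throw Exception; omitted here.
    // A TreeMap imitates RocksDB handing back entries in sorted key order.
    static final class SortedMapState<K extends Comparable<K>, V> {
        private final TreeMap<K, V> backing = new TreeMap<>();

        V get(K key) { return backing.get(key); }

        void put(K key, V value) { backing.put(key, value); }

        Iterable<Map.Entry<K, V>> entries() { return backing.entrySet(); }
    }

    // Point access: a single lookup for a single list position.
    static String elementAt(SortedMapState<Integer, String> list, int index) {
        return list.get(index);
    }

    // Full traversal: walk the entry set instead of calling get(i) for every index.
    static String joinAll(SortedMapState<Integer, String> list) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : list.entries()) {
            sb.append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedMapState<Integer, String> list = new SortedMapState<>();
        list.put(0, "a");
        list.put(1, "b");
        list.put(2, "c");
        System.out.println(elementAt(list, 1)); // prints "b"
        System.out.println(joinAll(list));      // prints "abc"
    }
}
```

With real RocksDB-backed state, joinAll is where the prefetching entry iterator Fabian describes should pay off compared with a loop of get(i) calls.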

Sorry, I should have been more precise in my description below. I have to do 
incremental iteration (e.g. process the next 10 entries).

I’m assuming I can’t hold onto the iterator across calls to a function, right?

If so, then making get(index) calls via the technique described below is 
currently the most efficient approach, yes?
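Assume the iterator really cannot be kept between calls. One way to do the incremental processing described above is to persist a cursor next to the element count and resume with get(index) lookups. The sketch below is again a plain-Java model, not Flink code. IntValueState and IndexMapState are hypothetical stand-ins named after ValueState (value/update) and MapState (get/put). The names nextBatch, batchSize and the cursor state are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IncrementalMapStateScan {

    // Hypothetical stand-in for Flink's ValueState<Integer> (value()/update()).
    static final class IntValueState {
        private Integer v;
        Integer value() { return v; }
        void update(Integer newValue) { v = newValue; }
    }

    // Hypothetical stand-in for MapState<Integer, String>; only get/put are modeled.
    static final class IndexMapState {
        private final TreeMap<Integer, String> m = new TreeMap<>();
        String get(Integer k) { return m.get(k); }
        void put(Integer k, String v) { m.put(k, v); }
    }

    // Returns up to batchSize elements starting at the stored cursor, then persists
    // the advanced cursor so the next call resumes where this one stopped.
    static List<String> nextBatch(IndexMapState list, IntValueState count,
                                  IntValueState cursor, int batchSize) {
        int size = count.value() == null ? 0 : count.value();
        int start = cursor.value() == null ? 0 : cursor.value();
        int end = Math.min(start + batchSize, size);
        List<String> batch = new ArrayList<>();
        for (int i = start; i < end; i++) {
            batch.add(list.get(i)); // one point lookup per element
        }
        cursor.update(end);
        return batch;
    }

    public static void main(String[] args) {
        IndexMapState list = new IndexMapState();
        IntValueState count = new IntValueState();
        IntValueState cursor = new IntValueState();
        for (int i = 0; i < 5; i++) {
            list.put(i, "e" + i);
        }
        count.update(5);
        System.out.println(nextBatch(list, count, cursor, 2)); // [e0, e1]
        System.out.println(nextBatch(list, count, cursor, 2)); // [e2, e3]
        System.out.println(nextBatch(list, count, cursor, 2)); // [e4]
    }
}
```

Each element costs one point lookup. This gives up the entry iterator's prefetching in exchange for the ability to stop and resume.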

Thanks,

— Ken


> 2018-02-19 0:10 GMT+01:00 Ken Krugler <kkrugler_li...@transpac.com>:
> Hi there,
> 
> I’ve got a MapState where I need to iterate over the entries.
> 
> This currently isn’t supported (at least for RocksDB), AFAIK, though there 
> is an issue/PR <https://issues.apache.org/jira/browse/FLINK-8297> to improve 
> this.
> 
> The best solution I’ve seen is what Fabian proposed, which involves keeping a 
> ValueState with a count of entries, and then having the key for the MapState 
> be the index.
> 
>> I cannot comment on the internal design, but you could put the data into a
>> RocksDBStateBackend MapState<Integer, X> where the value X is your data
>> type and the key is the list index. You would need another ValueState for
>> the current number of elements that you put into the MapState.
>> A MapState allows you to fetch and traverse the key, value, or entry set of
>> the Map without loading it completely into memory.
>> The sets are traversed in sort order of the key, so they should be in
>> insertion order (given that you properly increment the list index).
> 
> 
> This effectively lets you iterate over all of the map entries for a given 
> (keyed) state - though it doesn’t solve the “I have to iterate over _every_ 
> entry” situation.
> 
> Is this currently the best option?
> 
> Thanks,
> 
> — Ken
> 
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
> 
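The proposal quoted above boils down to a small amount of append and traversal logic. It keeps a MapState<Integer, X> keyed by list index, plus a ValueState holding the number of elements. This is a plain-Java sketch with hypothetical names. The countState field plays the ValueState<Integer>, and a TreeMap field plays the RocksDB-backed MapState<Integer, String>. The methods add, size and toList are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapStateBackedList {

    // Hypothetical stand-ins for the two pieces of keyed state (no Flink involved).
    private Integer countState;                                        // plays ValueState<Integer>
    private final TreeMap<Integer, String> mapState = new TreeMap<>(); // plays MapState<Integer, String>; sorted like RocksDB keys

    // Append: the current count is the next free index, then the count is bumped.
    void add(String element) {
        int index = countState == null ? 0 : countState;
        mapState.put(index, element);
        countState = index + 1;
    }

    int size() { return countState == null ? 0 : countState; }

    // Traverse in key order, which equals insertion order because indices only grow.
    List<String> toList() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : mapState.entrySet()) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        MapStateBackedList list = new MapStateBackedList();
        list.add("first");
        list.add("second");
        list.add("third");
        System.out.println(list.size());   // 3
        System.out.println(list.toList()); // [first, second, third]
    }
}
```

In this sketch, traversal order matches insertion order only because indices are handed out by incrementing the count. That is the condition Fabian points out.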

