Re: Add MapState for keyed streams
Hi Jark,

If the state is very big, it may occupy a lot of memory if we return Set<Map.Entry<K, V>>. By wrapping the returned iterator, we can easily implement a method that returns Iterable<Map.Entry<K, V>>. Users can then use the returned Iterable in a foreach loop.

Regards,
Xiaogang

2016-10-19 17:43 GMT+08:00 Jark Wu:

> Hi Xiaogang,
>
> I think returning Set<Map.Entry<K, V>> may be better than returning
> Iterator<Map.Entry<K, V>>, because users can use foreach on a Set but not
> on an Iterator, and can still get an iterator via set.iterator().
> Map.entrySet() may also be more familiar to users.
>
> - Jark Wu
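The wrapping idea in Xiaogang's reply can be sketched roughly as follows. This is only an illustration under assumptions: `LazyEntries` and `fromIterators` are hypothetical names, not part of Flink, and an in-memory TreeMap stands in for a state backend that would stream its entries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Flink.
final class LazyEntries {
    private LazyEntries() {}

    // Iterable has a single abstract method, iterator(), so a method reference
    // to the supplier is enough. Each foreach loop asks the supplier for a
    // fresh iterator, and no intermediate Set is built.
    static <T> Iterable<T> fromIterators(Supplier<Iterator<T>> source) {
        return source::get;
    }

    // Consumes the Iterable with a plain foreach loop.
    static List<String> describe(Iterable<Map.Entry<String, Integer>> entries) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: an in-memory map stands in for the state backend.
        TreeMap<String, Integer> backing = new TreeMap<>();
        backing.put("a", 1);
        backing.put("b", 2);

        Supplier<Iterator<Map.Entry<String, Integer>>> source =
                () -> backing.entrySet().iterator();
        Iterable<Map.Entry<String, Integer>> entries = fromIterators(source);
        System.out.println(describe(entries));
    }
}
```

Because the Iterable only holds a reference to the iterator source, the memory cost depends on how the underlying iterator is implemented, not on the wrapper.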
Re: Add MapState for keyed streams
Hi Xiaogang,

I think returning Set<Map.Entry<K, V>> may be better than returning Iterator<Map.Entry<K, V>>, because users can use foreach on a Set but not on an Iterator, and can still get an iterator via set.iterator(). Map.entrySet() may also be more familiar to users.

- Jark Wu

On Oct 19, 2016, at 5:18 PM, SHI Xiaogang wrote:

> Agreed.
>
> contains(K key) should be provided.
> The iterator() method should return Iterator<Map.Entry<K, V>> instead of
> the element type in my original proposal.
>
> Besides, size() may also be provided.
>
> With these methods, MapStates appear very similar to Java Maps. Users will
> be very happy to use them.
>
> Regards,
> Xiaogang
Re: Add MapState for keyed streams
Agreed.

contains(K key) should be provided. The iterator() method should return Iterator<Map.Entry<K, V>> instead of the element type in my original proposal.

Besides, size() may also be provided.

With these methods, MapStates appear very similar to Java Maps. Users will be very happy to use them.

Regards,
Xiaogang

2016-10-19 16:55 GMT+08:00 Till Rohrmann:

> Hi Xiaogang,
>
> I really like your proposal and think that this would be a valuable
> addition to Flink :-)
>
> For convenience we could maybe add contains(K key), too.
>
> Java's Map interface returns a Set of Entry when calling entrySet() (which
> is the equivalent of iterator() in our interface). The Entry interface not
> only gives access to the key and value of the map entry but also allows
> setting a value for the respective key via setValue (even though it's an
> optional operation). Maybe we want to offer something similar when getting
> access to the entry set via the iterator method.
>
> Cheers,
> Till
Re: Add MapState for keyed streams
Hi Xiaogang,

I really like your proposal and think that this would be a valuable addition to Flink :-)

For convenience we could maybe add contains(K key), too.

Java's Map interface returns a Set of Entry when calling entrySet() (which is the equivalent of iterator() in our interface). The Entry interface not only gives access to the key and value of the map entry but also allows setting a value for the respective key via setValue (even though it's an optional operation). Maybe we want to offer something similar when getting access to the entry set via the iterator method.

Cheers,
Till

On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang wrote:

> Hi, all. I created the JIRA
> https://issues.apache.org/jira/browse/FLINK-4856 to propose adding
> MapStates into Flink.
>
> MapStates are very useful in our daily jobs. For example, when implementing
> DistinctCount, we store the values into a MapState and the result of each
> group(key) is exactly the number of entries in the MapState.
>
> In my opinion, the methods provided by the MapState may include:
> * void put(K key, V value)
> * V get(K key)
> * Iterable<K> keys()
> * Iterable<V> values()
> * Iterator > iterator()
>
> Do you have any comments? Any feedback is appreciated.
>
> Xiaogang
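To make the discussion concrete, here is a minimal sketch of how the proposed methods and the DistinctCount use case might fit together. Everything in it is an assumption for illustration: `MapStateSketch`, `HeapMapStateSketch` and `DistinctCountSketch` are hypothetical names, the interface is not Flink's actual API, the entry type follows Till's Map.Entry suggestion, and a HashMap stands in for a real state backend.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical interface following the method list discussed in this thread.
// It is NOT Flink's actual API.
interface MapStateSketch<K, V> {
    void put(K key, V value);
    V get(K key);
    boolean contains(K key);
    Iterable<K> keys();
    Iterable<V> values();
    // Assumption: entries are exposed as java.util.Map.Entry, as Till suggests.
    Iterator<Map.Entry<K, V>> iterator();
}

// Heap-only stand-in backed by a HashMap; a real state backend would differ.
final class HeapMapStateSketch<K, V> implements MapStateSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();

    @Override public void put(K key, V value) { entries.put(key, value); }
    @Override public V get(K key) { return entries.get(key); }
    @Override public boolean contains(K key) { return entries.containsKey(key); }
    @Override public Iterable<K> keys() { return entries.keySet(); }
    @Override public Iterable<V> values() { return entries.values(); }
    @Override public Iterator<Map.Entry<K, V>> iterator() {
        return entries.entrySet().iterator();
    }
}

final class DistinctCountSketch {
    private DistinctCountSketch() {}

    // Stores each value as a key; the distinct count is the number of entries.
    static <T> int distinctCount(Iterable<T> input) {
        MapStateSketch<T, Boolean> seen = new HeapMapStateSketch<>();
        for (T value : input) {
            if (!seen.contains(value)) {
                seen.put(value, Boolean.TRUE);
            }
        }
        int count = 0;
        for (T ignored : seen.keys()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(Arrays.asList("a", "b", "a", "c", "b")));

        // HashMap entries support setValue, and the change writes through to the map.
        MapStateSketch<String, Integer> state = new HeapMapStateSketch<>();
        state.put("k", 1);
        state.iterator().next().setValue(5);
        System.out.println(state.get("k"));
    }
}
```

Counting by iterating over keys() is linear in the number of entries, which is one reason a dedicated size() method, as raised later in the thread, could be convenient.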
[jira] [Created] (FLINK-4856) Add MapState for keyed streams
Xiaogang Shi created FLINK-4856:
---

Summary: Add MapState for keyed streams
Key: FLINK-4856
URL: https://issues.apache.org/jira/browse/FLINK-4856
Project: Flink
Issue Type: New Feature
Components: State Backends, Checkpointing
Reporter: Xiaogang Shi

Many states in keyed streams are organized as key-value pairs. Currently, these states are implemented by storing the entire map in a ValueState or a ListState. This implementation, however, is very costly because all entries have to be serialized/deserialized when updating a single entry. To improve the efficiency of these states, MapStates are urgently needed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
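The cost argument in the issue description can be illustrated with a rough sketch. It rests on assumptions: plain Java serialization stands in for Flink's serializers, `SerializationCostSketch` and its methods are hypothetical names, and the byte counts only indicate the trend, not what Flink would actually write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical illustration; Java serialization is only a stand-in here.
final class SerializationCostSketch {
    private SerializationCostSketch() {}

    // Number of bytes produced when the value is serialized on its own.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Whole map kept as one value: changing one entry rewrites every entry.
    static int wholeMapUpdateBytes(HashMap<String, Integer> map, String key, int newValue) {
        map.put(key, newValue);
        return serializedSize(map);
    }

    // Per-entry storage: only the touched key and value are written.
    static int singleEntryUpdateBytes(String key, int newValue) {
        return serializedSize(key) + serializedSize(Integer.valueOf(newValue));
    }

    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put("key-" + i, i);
        }
        System.out.println("whole map:    " + wholeMapUpdateBytes(map, "key-7", 42));
        System.out.println("single entry: " + singleEntryUpdateBytes("key-7", 42));
    }
}
```

The whole-map cost grows with the number of entries, while the per-entry cost depends only on the size of the updated key and value.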