Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
Hi Jark

If the state is very big, it may occupy a lot of memory if we return
Set>.

By wrapping the returned iterator, we can easily implement a method
returning  Iterable>.
Users can use that returned Iterable in the foreach loop.

Regards
Xiaogang



2016-10-19 17:43 GMT+08:00 Jark Wu :

> Hi Xiaogang,
>
> I think maybe return Set> is better than
> Iterator>.
> Because users can use foreach on Set but not Iterator, and can use
> iterator access via set.iterator().
> Maybe Map.entrySet() is a more familiar way to users.
>
>
> - Jark Wu
>
> > 在 2016年10月19日,下午5:18,SHI Xiaogang  写道:
> >
> > Agreed.
> >
> > contains(K key) should be provided.
> > The iterator() method should return Iterator> instead of
> > Iterator>.
> >
> > Besides, size() may also be provided.
> >
> > With these methods, MapStates appear very similar to Java Maps. Users
> will
> > be very happy to use them.
> >
> > Regards,
> > Xiaogang
> >
> >
> > 2016-10-19 16:55 GMT+08:00 Till Rohrmann :
> >
> >> Hi Xiaogang,
> >>
> >> I really like your proposal and think that this would be a valuable
> >> addition to Flink :-)
> >>
> >> For convenience we could maybe add contains(K key), too.
> >>
> >> Java's Map interface returns a Set of Entry when calling entrySet()
> (which
> >> is the equivalent of iterator() in our interface). The Entry interface
> not
> >> only allows to get access to the key and value of the map entry but also
> >> allows to set a value for the respective key via setValue (even though
> it's
> >> an optional operation). Maybe we want to offer something similar when
> >> getting access to the entry set via the iterator method.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang 
> >> wrote:
> >>
> >>> Hi, all. I created the JIRA https://issues.apache.org/
> >>> jira/browse/FLINK-4856 to
> >>> propose adding MapStates into Flink.
> >>>
> >>> MapStates are very useful in our daily jobs. For example, when
> >> implementing
> >>> DistinctCount, we store the values into a MapState and the result of
> each
> >>> group(key) is exactly the number of entries in the MapState.
> >>>
> >>> In my opinion, the methods provided by the MapState may include:
> >>> * void put(K key, V value)
> >>> * V get(K key)
> >>> * Iterable keys()
> >>> * Iterable values()
> >>> * Iterator> iterator()
> >>>
> >>> Do you have any comments? Any is appreciated.
> >>>
> >>> Xiaogang
> >>>
> >>
>
>


Re: Add MapState for keyed streams

2016-10-19 Thread Jark Wu
Hi Xiaogang,

I think maybe return Set> is better than 
Iterator>. 
Because users can use foreach on Set but not Iterator, and can use iterator 
access via set.iterator(). 
Maybe Map.entrySet() is a more familiar way to users.


- Jark Wu 

> 在 2016年10月19日,下午5:18,SHI Xiaogang  写道:
> 
> Agreed.
> 
> contains(K key) should be provided.
> The iterator() method should return Iterator> instead of
> Iterator>.
> 
> Besides, size() may also be provided.
> 
> With these methods, MapStates appear very similar to Java Maps. Users will
> be very happy to use them.
> 
> Regards,
> Xiaogang
> 
> 
> 2016-10-19 16:55 GMT+08:00 Till Rohrmann :
> 
>> Hi Xiaogang,
>> 
>> I really like your proposal and think that this would be a valuable
>> addition to Flink :-)
>> 
>> For convenience we could maybe add contains(K key), too.
>> 
>> Java's Map interface returns a Set of Entry when calling entrySet() (which
>> is the equivalent of iterator() in our interface). The Entry interface not
>> only allows to get access to the key and value of the map entry but also
>> allows to set a value for the respective key via setValue (even though it's
>> an optional operation). Maybe we want to offer something similar when
>> getting access to the entry set via the iterator method.
>> 
>> Cheers,
>> Till
>> 
>> On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang 
>> wrote:
>> 
>>> Hi, all. I created the JIRA https://issues.apache.org/
>>> jira/browse/FLINK-4856 to
>>> propose adding MapStates into Flink.
>>> 
>>> MapStates are very useful in our daily jobs. For example, when
>> implementing
>>> DistinctCount, we store the values into a MapState and the result of each
>>> group(key) is exactly the number of entries in the MapState.
>>> 
>>> In my opinion, the methods provided by the MapState may include:
>>> * void put(K key, V value)
>>> * V get(K key)
>>> * Iterable keys()
>>> * Iterable values()
>>> * Iterator> iterator()
>>> 
>>> Do you have any comments? Any is appreciated.
>>> 
>>> Xiaogang
>>> 
>> 



Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
Agreed.

contains(K key) should be provided.
The iterator() method should return Iterator> instead of
Iterator>.

Besides, size() may also be provided.

With these methods, MapStates appear very similar to Java Maps. Users will
be very happy to use them.

Regards,
Xiaogang


2016-10-19 16:55 GMT+08:00 Till Rohrmann :

> Hi Xiaogang,
>
> I really like your proposal and think that this would be a valuable
> addition to Flink :-)
>
> For convenience we could maybe add contains(K key), too.
>
> Java's Map interface returns a Set of Entry when calling entrySet() (which
> is the equivalent of iterator() in our interface). The Entry interface not
> only allows to get access to the key and value of the map entry but also
> allows to set a value for the respective key via setValue (even though it's
> an optional operation). Maybe we want to offer something similar when
> getting access to the entry set via the iterator method.
>
> Cheers,
> Till
>
> On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang 
> wrote:
>
> > Hi, all. I created the JIRA https://issues.apache.org/
> > jira/browse/FLINK-4856 to
> > propose adding MapStates into Flink.
> >
> > MapStates are very useful in our daily jobs. For example, when
> implementing
> > DistinctCount, we store the values into a MapState and the result of each
> > group(key) is exactly the number of entries in the MapState.
> >
> > In my opinion, the methods provided by the MapState may include:
> > * void put(K key, V value)
> > * V get(K key)
> > * Iterable keys()
> > * Iterable values()
> > * Iterator> iterator()
> >
> > Do you have any comments? Any is appreciated.
> >
> > Xiaogang
> >
>


Re: Add MapState for keyed streams

2016-10-19 Thread Till Rohrmann
Hi Xiaogang,

I really like your proposal and think that this would be a valuable
addition to Flink :-)

For convenience we could maybe add contains(K key), too.

Java's Map interface returns a Set of Entry when calling entrySet() (which
is the equivalent of iterator() in our interface). The Entry interface not
only allows to get access to the key and value of the map entry but also
allows to set a value for the respective key via setValue (even though it's
an optional operation). Maybe we want to offer something similar when
getting access to the entry set via the iterator method.

Cheers,
Till

On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang 
wrote:

> Hi, all. I created the JIRA https://issues.apache.org/
> jira/browse/FLINK-4856 to
> propose adding MapStates into Flink.
>
> MapStates are very useful in our daily jobs. For example, when implementing
> DistinctCount, we store the values into a MapState and the result of each
> group(key) is exactly the number of entries in the MapState.
>
> In my opinion, the methods provided by the MapState may include:
> * void put(K key, V value)
> * V get(K key)
> * Iterable keys()
> * Iterable values()
> * Iterator> iterator()
>
> Do you have any comments? Any is appreciated.
>
> Xiaogang
>


[jira] [Created] (FLINK-4856) Add MapState for keyed streams

2016-10-18 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-4856:
---

 Summary: Add MapState for keyed streams
 Key: FLINK-4856
 URL: https://issues.apache.org/jira/browse/FLINK-4856
 Project: Flink
  Issue Type: New Feature
  Components: State Backends, Checkpointing
Reporter: Xiaogang Shi


Many states in keyed streams are organized as key-value pairs. Currently, these 
states are implemented by storing the entire map into a ValueState or a 
ListState. The implementation however is very costly because all entries have 
to be serialized/deserialized when updating a single entry. To improve the 
efficiency of these states, MapStates are urgently needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)