This makes perfect sense to me. Thanks, Congxian and Kostas, for your input.

Gagan

On Thu, Jan 10, 2019 at 6:03 PM Kostas Kloudas <k.klou...@da-platform.com>
wrote:

> Hi Gagan,
>
> I agree with Congxian!
> In MapState, accessing the value associated with a key in the map
> de-serializes the whole value (and a put() serializes the whole value).
> Given this, it is more efficient to have many keys with small state than
> fewer keys with huge state.
>
> Cheers,
> Kostas
>
>
> On Thu, Jan 10, 2019 at 12:34 PM Congxian Qiu <qcx978132...@gmail.com>
> wrote:
>
>> Hi, Gagan Agrawal
>>
>> I would prefer the first option.
>>
>> Here is the reason.
>>
>> In the RocksDB StateBackend, we serialize the key, namespace, and user-key
>> into one byte array (key-bytes) and serialize the user-value into another
>> byte array (value-bytes), then insert the key-bytes/value-bytes pair into
>> RocksDB. When retrieving from RocksDB, we can use get (for a single
>> key/value) or an iterator (for a key range).
>>
>> If we store the four maps in a single MapState, we need to deserialize the
>> whole value-bytes (a Map) whenever we want to retrieve a single user-value.
>>
>>
>> On Thu, Jan 10, 2019 at 10:38 AM Gagan Agrawal <agrawalga...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I have a use case where 4 streams get merged (union) and grouped on a
>>> common key (keyBy), and a custom KeyedProcessFunction is called. I need
>>> to keep state (RocksDB backend) for all 4 streams in my custom
>>> KeyedProcessFunction, where each of these 4 streams would be stored as a
>>> map. So I have 2 options:
>>>
>>> 1. Create a separate MapStateDescriptor for each of these streams and
>>> store their events separately.
>>> 2. Create a single MapStateDescriptor with only 4 keys (corresponding to
>>> the 4 stream types), where each value is of type Map and holds the events
>>> from the respective stream.
>>>
>>> I want to understand whether, from a performance perspective, there is any
>>> difference between the above approaches. Will keeping 4 different MapStates
>>> cause 4 lookups in the RocksDB backend when they are accessed? Or are all of
>>> these MapStates stored internally in RocksDB in a single row corresponding
>>> to the respective key (as per the keyedStream), so that they are all
>>> fetched in a single call before the operator's processElement is called? If
>>> there is a separate RocksDB lookup for each MapStateDescriptor, then I think
>>> keeping them in a single MapStateDescriptor would be more efficient, since
>>> it would minimize RocksDB calls. Please advise.
>>>
>>> Gagan
>>>
>>
>>
>> --
>> Best,
>> Congxian
>>
>
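
For readers of the archive, here is a rough sketch of the difference
between the two layouts discussed in this thread. It is a toy model only
and uses no Flink or RocksDB API: a TreeMap of byte arrays stands in for
RocksDB, Java serialization stands in for Flink's serializers, and the
"key1|stream2|event0" key format, the class name, and the method names
are all made up for illustration. It only shows how many value bytes have
to be deserialized to read one event under each option.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a TreeMap of byte[] stands in for RocksDB, and Java
// serialization stands in for Flink's serializers. No Flink API is used.
class MapStateLayoutSketch {

    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 1: one store entry per (key, stream, event). Reading one event
    // deserializes only that event's value. Needs eventsPerStream >= 1.
    static int bytesReadWithSeparateStates(int eventsPerStream) {
        TreeMap<String, byte[]> store = new TreeMap<>();
        for (int s = 0; s < 4; s++) {
            for (int i = 0; i < eventsPerStream; i++) {
                store.put("key1|stream" + s + "|event" + i,
                          serialize("payload-" + i));
            }
        }
        byte[] valueBytes = store.get("key1|stream2|event0");
        String payload = (String) deserialize(valueBytes);
        if (!payload.equals("payload-0")) {
            throw new IllegalStateException("unexpected payload");
        }
        return valueBytes.length;
    }

    // Option 2: one store entry per stream, whose value is a whole Map.
    // Reading one event deserializes every event of that stream.
    @SuppressWarnings("unchecked")
    static int bytesReadWithNestedMap(int eventsPerStream) {
        TreeMap<String, byte[]> store = new TreeMap<>();
        for (int s = 0; s < 4; s++) {
            HashMap<String, String> events = new HashMap<>();
            for (int i = 0; i < eventsPerStream; i++) {
                events.put("event" + i, "payload-" + i);
            }
            store.put("key1|streams|stream" + s, serialize(events));
        }
        byte[] valueBytes = store.get("key1|streams|stream2");
        Map<String, String> events =
            (Map<String, String>) deserialize(valueBytes);
        if (!"payload-0".equals(events.get("event0"))) {
            throw new IllegalStateException("unexpected payload");
        }
        return valueBytes.length;
    }

    public static void main(String[] args) {
        System.out.println("separate states, bytes deserialized: "
            + bytesReadWithSeparateStates(1000));
        System.out.println("nested map, bytes deserialized:      "
            + bytesReadWithNestedMap(1000));
    }
}
```

In this toy model the number of store lookups is the same for both
options (one get per read); what grows with the nested layout is the
amount of data deserialized per read, which is the cost Congxian and
Kostas describe above.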
