OK, taking that as correct, here is an example:

If I create a keyed state named "total count by email" for the key
(project id + email), then it will create a single hash table or column
family "total count by email", and all the unique email ids will be rows of
that single hash table or column family, so I can store millions of
unique email ids in it.

Does that mean it will create only a single state object for all unique email ids?
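
To make the question concrete, here is a toy sketch in plain Java of how I
understand the description quoted below. It is not Flink's API: the class
KeyedStateSketch and its methods are hypothetical names I made up, and the
"|" separator in the key is just an assumption for illustration. Each
registered state name gets one table, and every key is one row in it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only, NOT Flink's API: it mimics the idea that each named
// keyed state is backed by one hash table (or one RocksDB column family).
class KeyedStateSketch {
    // state name -> table; exactly one table per registered state name
    private final Map<String, Map<String, Long>> tables = new HashMap<>();

    // Returns the table for a state, creating it only on first use.
    Map<String, Long> getState(String stateName) {
        return tables.computeIfAbsent(stateName, n -> new HashMap<>());
    }

    // Increments the counter stored under the given key and returns the new value.
    long increment(String stateName, String key) {
        return getState(stateName).merge(key, 1L, Long::sum);
    }

    // Number of tables, i.e. number of distinct state names registered so far.
    int tableCount() {
        return tables.size();
    }

    // Number of rows (distinct keys) stored in one state's table.
    int rowCount(String stateName) {
        return getState(stateName).size();
    }

    public static void main(String[] args) {
        KeyedStateSketch backend = new KeyedStateSketch();
        String state = "total count by email";
        for (int i = 0; i < 1000; i++) {
            // key = project id + email, as in the example above
            backend.increment(state, "project-1|user" + i + "@example.com");
        }
        System.out.println("tables: " + backend.tableCount());
        System.out.println("rows:   " + backend.rowCount(state));
    }
}
```

In this model the number of tables depends only on how many state names are
registered, while the number of rows grows with the number of unique keys.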




On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:

> Each keyed state in Flink is a hashtable or a column family in RocksDB.
> Having too many of those is not memory efficient.
>
> Having fewer states is better, if you can adapt your schema that way.
>
> I would also look into "MapState", which is an efficient way to have "sub
> keys" under a keyed state.
>
> Stephan
>
>
> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I have to compute results based on a lot of historical data: parameters
>> such as total transactions in the last month, last day, last hour, etc., by
>> email id, IP, mobile, name, address, zipcode, etc.
>>
>> So my question is: is it the right approach to create keyed state by email,
>> mobile, zipcode, etc., or should I create one big mapped state (BS) and then
>> process that BS, maybe in a process function or by applying some loop and
>> filter logic in a window or process function?
>>
>> My main worry is that I will end up with millions of states, because there
>> can be millions of unique emails, phone numbers, or zipcodes if I create
>> keyed state by email, phone, etc.
>>
>> Am I right? Does this impact performance, or is this the wrong approach?
>> Which approach would you suggest for this use case?
>>
>>
>> --
>> Thanks Regards
>>
>> SHASHANK AGARWAL
>>  ---  Trying to mobilize the things....
>>
>>
>>
>>
>>
>


-- 
Thanks Regards

SHASHANK AGARWAL
 ---  Trying to mobilize the things....
