Each keyed state in Flink is backed by its own hash table (heap state backends)
or its own column family (RocksDB state backend).
Having too many of those is not memory efficient.

Having fewer states is better, if you can adapt your schema that way.

I would also look into "MapState", which is an efficient way to have "sub
keys" under a keyed state.

Stephan


On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com>
wrote:

> Hello,
>
> I have to compute results on the basis of a lot of history data, with
> parameters like total transactions in the last 1 month, last 1 day, last
> 1 hour, etc., by email id, ip, mobile, name, address, zipcode, etc.
>
> So my question is: is it the right approach to create keyed state by
> email, mobile, zipcode, etc., or should I create 1 big mapped state (BS)
> and then process that BS, maybe in a process function or by applying some
> loop and filter logic in a window or process function?
>
> My main worry is that I will end up with millions of states, because
> there can be millions of unique emails, phone numbers, or zipcodes if I
> create keyed state by email, phone, etc.
>
> Am I right? Does this impact performance, or is this the wrong approach?
> Which approach would you suggest in this use case?
>
>
> --
> Thanks Regards
>
> SHASHANK AGARWAL
>  ---  Trying to mobilize the things....