OK, let me check that I understood this correctly with an example: if I create a keyed state named "total count by email" for the key (project ID + email), then it will create a single hashtable or column family "total count by email", all the unique email IDs will be rows of that single hashtable or column family, and then I can store millions of unique email IDs in it.
So it will create only a single state object for all unique email IDs?

On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:
> Each keyed state in Flink is a hashtable or a column family in RocksDB.
> Having too many of those is not memory efficient.
>
> Having fewer states is better, if you can adapt your schema that way.
>
> I would also look into "MapState", which is an efficient way to have "sub
> keys" under a keyed state.
>
> Stephan
>
>
> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I have to compute results on the basis of a lot of historical data, using
>> parameters like total transactions in the last month, last day, last hour,
>> etc., by email ID, IP, mobile number, name, address, zip code, etc.
>>
>> So my question is: is it the right approach to create keyed state by email,
>> mobile, zip code, etc., or should I create one big mapped state (BS) and
>> then process that BS, maybe in a process function or by applying some loop
>> and filter logic in a window or process function?
>>
>> My main worry is that I will end up with millions of states, because there
>> can be millions of unique emails, phone numbers, or zip codes if I create
>> keyed state by email, phone, etc.
>>
>> Am I right? Does this impact performance, or is this the wrong approach?
>> Which approach would you suggest for this use case?
>>
>> --
>> Thanks Regards
>>
>> SHASHANK AGARWAL
>> --- Trying to mobilize the things....
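The distinction being asked about (one state object per registered state name versus one per key) can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions, not Flink's implementation or API: the class `KeyedStateModel`, its methods, the state names, and the hour-bucket sub-keys are all hypothetical. Following Stephan's description that each keyed state is one hashtable (heap backend) or one column family (RocksDB), each state name below maps to exactly one table; more distinct keys only add rows, and the MapState-like table nests "sub keys" under each key.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a keyed state backend (hypothetical, NOT Flink's actual code).
 * Each registered state name owns exactly one table; every distinct key is
 * just another row in that table, not a new state.
 */
public class KeyedStateModel {

    // state name -> (key -> counter), mimicking a ValueState<Long> per state name
    private final Map<String, Map<String, Long>> valueTables = new HashMap<>();

    // state name -> (key -> (user key -> counter)), mimicking a MapState<String, Long>
    private final Map<String, Map<String, Map<String, Long>>> mapTables = new HashMap<>();

    /** Adds delta to the ValueState-like counter stored under (stateName, key). */
    public void addToValue(String stateName, String key, long delta) {
        valueTables.computeIfAbsent(stateName, name -> new HashMap<>())
                   .merge(key, delta, Long::sum);
    }

    /** Adds delta to the MapState-like entry stored under (stateName, key, userKey). */
    public void addToMap(String stateName, String key, String userKey, long delta) {
        mapTables.computeIfAbsent(stateName, name -> new HashMap<>())
                 .computeIfAbsent(key, k -> new HashMap<>())
                 .merge(userKey, delta, Long::sum);
    }

    /** Number of tables: one per state name, independent of how many keys exist. */
    public int tableCount() {
        return valueTables.size() + mapTables.size();
    }

    /** Number of rows (distinct keys) in a ValueState-like table. */
    public int rowCount(String stateName) {
        return valueTables.getOrDefault(stateName, Collections.emptyMap()).size();
    }

    /** Current counter for (stateName, key), or 0 if absent. */
    public long value(String stateName, String key) {
        return valueTables.getOrDefault(stateName, Collections.emptyMap())
                          .getOrDefault(key, 0L);
    }

    /** Current counter for (stateName, key, userKey), or 0 if absent. */
    public long mapValue(String stateName, String key, String userKey) {
        return mapTables.getOrDefault(stateName, Collections.emptyMap())
                        .getOrDefault(key, Collections.emptyMap())
                        .getOrDefault(userKey, 0L);
    }

    public static void main(String[] args) {
        KeyedStateModel backend = new KeyedStateModel();

        // Hypothetical input: 1,000 distinct emails in project "p1", each seen twice.
        for (int round = 0; round < 2; round++) {
            for (int i = 0; i < 1_000; i++) {
                String key = "p1|user" + i + "@example.com"; // key = project id + email
                backend.addToValue("total count by email", key, 1);
                // MapState-style sub-key: a hypothetical hour bucket per event
                backend.addToMap("count by email per hour", key, "2017-08-01T0" + round, 1);
            }
        }

        System.out.println("tables: " + backend.tableCount()); // tables: 2
        System.out.println("rows: " + backend.rowCount("total count by email")); // rows: 1000
        System.out.println("user7: " + backend.value("total count by email", "p1|user7@example.com")); // user7: 2
    }
}
```

In real Flink code the counterparts would be a `ValueState<Long>` and a `MapState<String, Long>` obtained via `getRuntimeContext().getState(...)` and `getRuntimeContext().getMapState(...)` inside a function applied to a keyed stream; the model above only mimics how their entries are grouped, not how Flink stores or serializes them.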