Thanks a lot! Got it :)

On Wed, Nov 21, 2018 at 11:40 PM Jamie Grier <jgr...@lyft.com> wrote:

> Hi Avi,
>
> The typical approach would be as you've described in #1.  #2 is not
> necessary -- #1 is already doing basically exactly that.
>
> -Jamie
>
>
> On Wed, Nov 21, 2018 at 3:36 AM Avi Levi <avi.l...@bluevoyant.com> wrote:
>
>> Hi,
>> I am very new to Flink, so please be gentle :)
>>
>> *The challenge:*
>> I have a road sensor that should scan billions of cars per day. For
>> starters, I want to recognise whether each car that passes by is new or
>> not. New cars (never seen before by that sensor) will be placed on a
>> different Kafka topic than the others (two topics in total, one for new
>> cars and one for old), under the assumption that the state will contain
>> billions of unique car ids.
>>
>> *Suggested Solutions*
>> My question is which approach is better.
>> Both approaches use RocksDB.
>>
>> 1. Use ValueState and split the stream like
>>   *val domainsSrc = env*
>> *    .addSource(consumer)*
>> *    .keyBy(car => car.id)*
>> *    .map(...)*
>> and check whether the state value is null to recognise new cars. If a
>> car is new, then I will update the state.
>> How will the persistent data be sharded among the nodes in the cluster
>> (let's say that I have 10 nodes)?
>>
>> 2. Use MapState and partition the stream into groups by some arbitrary
>> factor, e.g.
>> *val domainsSrc = env*
>> *    .addSource(consumer)*
>> *    .keyBy{ car =>*
>> *        val h = car.id.hashCode % partitionFactor*
>> *        math.abs(h)*
>> *    } .map(...)*
>> and check *mapState.keys.contains(car.id)*; if it is not there, add it
>> to the state.
>>
>> Which approach is better?
>>
>> Thanks in advance
>> Avi
>>
>
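
For the archive, here is a minimal sketch of approach #1 as discussed
above. It assumes the Flink 1.x Scala API; the Car case class with a
String id, the NewCarTagger name and the tagNewCars helper are
hypothetical and only there to make the example complete. It is one way
to do it, not necessarily what the final job looks like.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

// Hypothetical event type; the real record surely has more fields.
case class Car(id: String)

// Emits each car together with a flag: true if this key was never seen before.
class NewCarTagger extends RichMapFunction[Car, (Car, Boolean)] {

  // Keyed state: Flink keeps one value per car id (the key of the stream).
  @transient private var seen: ValueState[java.lang.Boolean] = _

  override def open(parameters: Configuration): Unit = {
    seen = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Boolean]("seen", classOf[java.lang.Boolean]))
  }

  override def map(car: Car): (Car, Boolean) = {
    // value() returns null until update() has been called for this key.
    val isNew = seen.value() == null
    if (isNew) seen.update(java.lang.Boolean.TRUE)
    (car, isNew)
  }
}

object NewCars {
  // keyBy routes every record with the same id to the same parallel subtask,
  // so the state lookup above is always local to that subtask.
  def tagNewCars(cars: DataStream[Car]): DataStream[(Car, Boolean)] =
    cars.keyBy(_.id).map(new NewCarTagger)
}
```

The tagged stream can then be split with tagged.filter(_._2) and
tagged.filter(!_._2), each going to its own Kafka sink.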

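On the sharding question, and why #2 adds nothing over #1: as far as I
understand it, keyBy already hashes each key into a "key group" and
assigns contiguous ranges of key groups to the parallel subtasks, so each
subtask owns only its slice of the state. The toy model below is plain
Scala with no Flink dependency. SeenCarsModel is a made-up name,
MurmurHash3.stringHash is a stand-in for Flink's own hash of
key.hashCode, and 128 is just an assumed max parallelism, so the exact
subtask numbers will differ from a real job.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Toy model of keyed state spread over `parallelism` subtasks.
final class SeenCarsModel(parallelism: Int, maxParallelism: Int = 128) {

  // One independent "state backend" per subtask.
  private val statePerSubtask =
    Array.fill(parallelism)(mutable.Set.empty[String])

  // key -> key group -> subtask index, mirroring the two-step routing idea.
  def subtaskFor(carId: String): Int = {
    val keyGroup = Math.floorMod(MurmurHash3.stringHash(carId), maxParallelism)
    keyGroup * parallelism / maxParallelism
  }

  // Returns true the first time a car id is observed, false afterwards.
  def observe(carId: String): Boolean =
    statePerSubtask(subtaskFor(carId)).add(carId)

  // Number of distinct ids held by each subtask.
  def stateSizes: Seq[Int] = statePerSubtask.map(_.size).toSeq
}
```

With parallelism = 10, every id always lands on the same one of the ten
subtasks, which is exactly the grouping that the manual hashCode %
partitionFactor in #2 was trying to achieve by hand.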