In your previous email you gave me two solutions:
1. Bloom filter: the problem is repopulating the Bloom filter on restarts.
2. Keeping the state of the unique IDs.

Please elaborate on 2.



On Wed, Jan 25, 2017 at 10:53 AM, Burak Yavuz <brk...@gmail.com> wrote:

> I don't have any sample code, but on a high level:
>
> My state would be: (Long, BloomFilter[UUID])
> In the update function, my value will be the UUID of the record, since the
> word itself is the key.
> I'll ask my BloomFilter if I've seen this UUID before. If not, I increase the
> count and also add the UUID to the filter.
>
> Does that make sense?
>
>
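A minimal plain-Python sketch of the per-word update logic described above, assuming each record carries a UUID. It only simulates what a mapWithState mapping function would do for one key (the word); it does not use the Spark API. The `BloomFilter` class, its size and hash count, and `update_word_count` are hypothetical names chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def update_word_count(state, record_id):
    """State for one word is (count, BloomFilter); the value is the record's UUID."""
    count, seen = state if state is not None else (0, BloomFilter())
    if not seen.might_contain(record_id):
        # Probably the first time we see this record: count it and remember it.
        count += 1
        seen.add(record_id)
    return count, seen


state = None
for rid in ["id-1", "id-2", "id-1", "id-3"]:  # "id-1" is replayed once
    state = update_word_count(state, rid)
print(state[0])  # expected 3, barring a Bloom filter false positive
```

Because a Bloom filter can report false positives, this approach may occasionally skip a genuinely new record (undercounting), but it never counts the same UUID twice.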
> On Wed, Jan 25, 2017 at 9:28 AM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>
>> Hi Burak,
>> Thanks for the response. Can you please elaborate on your idea of storing
>> the state of the unique IDs?
>> Do you have any sample code or links I can refer to?
>> Thanks
>>
>> On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> Off the top of my head (each may have its own issues):
>>>
>>> If you add a uniqueId to all your records upstream, then you can use a
>>> BloomFilter to approximate whether you've seen a row before.
>>> The problem I can see with that approach is how to repopulate the Bloom
>>> filter on restarts.
>>>
>>> If you are certain that duplicates can only arrive within a bounded
>>> window (e.g. the same data may be replayed within 2 hours, but never after
>>> that), then you can also keep the uniqueIds themselves in state and age
>>> them out once they are older than that window.
>>>
>>>
>>> Best,
>>> Burak
>>>
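A minimal plain-Python sketch of the second idea (option 2 in the question at the top), assuming each record carries a uniqueId, that the caller passes the current batch time in seconds, and that duplicates never arrive more than 2 hours apart, matching the example above. `update_with_seen_ids` and the dict-based state are hypothetical; in Spark this logic would sit inside the mapWithState mapping function, and the state would be checkpointed together with the counts.

```python
# Assumption: a replayed record always arrives within 2 hours of the original.
WINDOW_SECONDS = 2 * 60 * 60


def update_with_seen_ids(state, record_id, now):
    """State per key is (count, {uniqueId: first_seen_time}).

    A record is counted only if its id is not among the ids seen
    within the last WINDOW_SECONDS; older ids are dropped from state.
    """
    count, seen = state if state is not None else (0, {})
    # Age out ids older than the window: no replay can arrive for them anymore.
    seen = {rid: t for rid, t in seen.items() if now - t < WINDOW_SECONDS}
    if record_id not in seen:
        count += 1
        seen[record_id] = now
    return count, seen


state = None
state = update_with_seen_ids(state, "a", now=0)
state = update_with_seen_ids(state, "a", now=600)  # replay within 2 hours: ignored
state = update_with_seen_ids(state, "b", now=700)
print(state[0])  # 2
```

Unlike the Bloom filter, the ID set is exact, so nothing is wrongly skipped, but memory grows with the number of distinct IDs per key inside the window, and a replay arriving after the window would be counted again. Spark's `StateSpec.timeout(...)` removes a whole key after it has been idle, not individual IDs inside a key's state, so per-ID aging like this has to happen in the mapping function.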
>>> On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <
>>> deshpandesh...@gmail.com> wrote:
>>>
>>>> Please share your thoughts.
>>>>
>>>> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <
>>>> deshpandesh...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <
>>>>> deshpandesh...@gmail.com> wrote:
>>>>>
>>>>>> My streaming application stores a lot of aggregations using
>>>>>> mapWithState.
>>>>>>
>>>>>> I want to know all the possible ways I can make it
>>>>>> idempotent.
>>>>>>
>>>>>> Please share your views.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <
>>>>>> deshpandesh...@gmail.com> wrote:
>>>>>>
>>>>>>> I have a WordCount application that stores the count of all the words
>>>>>>> input so far using mapWithState. How do I make sure my counts are not
>>>>>>> messed up if I happen to read a line more than once?
>>>>>>>
>>>>>>> Appreciate your response.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
