In the previous email you gave me 2 solutions:
1. Bloom filter --> the problem is repopulating the Bloom filter on restarts
2. Keeping the state of the unique ids
Please elaborate on 2.

On Wed, Jan 25, 2017 at 10:53 AM, Burak Yavuz <brk...@gmail.com> wrote:

> I don't have any sample code, but on a high level:
>
> My state would be: (Long, BloomFilter[UUID])
> In the update function, my value will be the UUID of the record, since the
> word itself is the key.
> I'll ask my BloomFilter if I've seen this UUID before. If not, increase the
> count and add the UUID to the filter.
>
> Does that make sense?
>
>
> On Wed, Jan 25, 2017 at 9:28 AM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>
>> Hi Burak,
>> Thanks for the response. Can you please elaborate on your idea of storing
>> the state of the unique ids?
>> Do you have any sample code or links I can refer to?
>> Thanks
>>
>> On Wed, Jan 25, 2017 at 9:13 AM, Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> Off the top of my head... (Each may have its own issues)
>>>
>>> If upstream you add a uniqueId to all your records, then you may use a
>>> BloomFilter to approximate if you've seen a row before.
>>> The problem I can see with that approach is how to repopulate the bloom
>>> filter on restarts.
>>>
>>> If you are certain that you're not going to reprocess some data after a
>>> certain time, i.e. there is no way I'm going to get the same data in 2
>>> hours, it may only happen in the last 2 hours, then you may also keep the
>>> state of uniqueId's as well, and then age them out after a certain time.
>>>
>>>
>>> Best,
>>> Burak
>>>
>>> On Tue, Jan 24, 2017 at 9:53 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>>
>>>> Please share your thoughts.....
>>>>
>>>> On Tue, Jan 24, 2017 at 4:01 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 24, 2017 at 9:44 AM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>>>>
>>>>>> My streaming application stores a lot of aggregations using
>>>>>> mapWithState.
>>>>>>
>>>>>> I want to know what are all the possible ways I can make it
>>>>>> idempotent.
>>>>>>
>>>>>> Please share your views.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Jan 23, 2017 at 5:41 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>>>>>
>>>>>>> In a word count application that stores the count of all the words
>>>>>>> input so far using mapWithState, how do I make sure my counts are not
>>>>>>> messed up if I happen to read a line more than once?
>>>>>>>
>>>>>>> Appreciate your response.
>>>>>>>
>>>>>>> Thanks
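
For reference, the second option discussed in this thread (keep the unique ids in the per-word state and age them out after a fixed window) could look roughly like the sketch below. It is plain Python that only models the per-key update logic, not Spark code: the names WordState, update and ttl_seconds are made up for illustration, the 2-hour window is taken from the example in the thread, and wiring this into a real mapWithState function is left out. It also assumes every record carries a unique id assigned upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WordState:
    """Hypothetical per-word state: running count plus the ids already counted."""
    count: int = 0
    # unique id -> time (in seconds) at which that id was counted
    seen: Dict[str, float] = field(default_factory=dict)


def update(state: Optional[WordState], record_id: str, now: float,
           ttl_seconds: float = 2 * 60 * 60) -> WordState:
    """Count a record once; a replay of the same id within the TTL is ignored."""
    if state is None:
        state = WordState()
    # Age out ids older than the TTL so the state does not grow without bound.
    cutoff = now - ttl_seconds
    state.seen = {rid: ts for rid, ts in state.seen.items() if ts >= cutoff}
    # Only ids that have not been seen inside the window increase the count.
    if record_id not in state.seen:
        state.seen[record_id] = now
        state.count += 1
    return state


# The same word arrives three times; id "a" is a replay of the first record.
state = None
for record_id, ts in [("a", 0.0), ("b", 10.0), ("a", 20.0)]:
    state = update(state, record_id, ts)
print(state.count)  # 2
```

The trade-off is the one already mentioned above: a replay that arrives after its id has been aged out is counted again, so the window has to be at least as long as the longest gap after which data might be reprocessed. Replacing the dictionary with a Bloom filter would give the first option, at the cost of occasional false positives and no simple way to age entries out.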