Thanks Saikat.

I am not sure that sequential keys/timestamps or Kafka-like offsets would help when many data source clients and many streamer nodes are in play; depending on the checkpoint, we might still end up with duplicates (unless you're saying each client sequences its payload before sending it to the streamer; even then, duplicates are possible in the cache). The only sure way, it seems to me, is for the client that catches the exception to check the cache and resend only the diff, which makes things very complex. The other approach, if I am right, is to enable overwrite, so that the streamer dedups the data in the cache. That is costly too. I think the ideal approach would have been some type of streamer resiliency, where another streamer node could pick up the buffer from a crashed streamer and continue the work.
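
To make the "check the cache and resend only the diff" idea concrete, here is a minimal sketch. It rests on assumptions that are mine, not anything confirmed in this thread: each client derives a deterministic key from a hypothetical client ID plus a per-client sequence number, and a plain Map stands in for the Ignite cache. The class and method names (ResendDiffSketch, keyFor, keyBatch, missingFrom) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ResendDiffSketch {
    /** Deterministic key: the same (clientId, sequence) pair always yields the same UUID. */
    public static UUID keyFor(String clientId, long sequence) {
        String name = clientId + ":" + sequence;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    /** Keys a batch of raw records, numbering them from firstSequence (hypothetical scheme). */
    public static Map<UUID, String> keyBatch(String clientId, long firstSequence, List<String> records) {
        Map<UUID, String> batch = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            batch.put(keyFor(clientId, firstSequence + i), records.get(i));
        }
        return batch;
    }

    /** Returns only the entries whose keys are not yet in the cache, i.e. the diff to resend. */
    public static Map<UUID, String> missingFrom(Map<UUID, String> cache, Map<UUID, String> batch) {
        Map<UUID, String> diff = new LinkedHashMap<>();
        for (Map.Entry<UUID, String> e : batch.entrySet()) {
            if (!cache.containsKey(e.getKey())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Stand-in for the Ignite cache; an assumption made to keep the sketch self-contained.
        Map<UUID, String> cache = new ConcurrentHashMap<>();
        Map<UUID, String> batch = keyBatch("client-A", 100L, List.of("r0", "r1", "r2", "r3"));

        // Simulate a streamer crash after only the first two records were flushed.
        int flushed = 0;
        for (Map.Entry<UUID, String> e : batch.entrySet()) {
            if (flushed++ == 2) break;
            cache.put(e.getKey(), e.getValue());
        }

        // After catching the failure, the client resends only what is missing.
        Map<UUID, String> diff = missingFrom(cache, batch);
        cache.putAll(diff);
        System.out.println("resent=" + diff.size() + " cacheSize=" + cache.size());
    }
}
```

The point of the deterministic key is that a resend maps to the same cache entries as the first attempt. If each record instead gets a fresh random UUID at insert time, a resent record arrives under a new key, so neither a diff check like this nor the overwrite option has anything to match on.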

On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]> wrote:

> Hi,
>
> To minimise data loss during streamer node failure I think we can use the
> following steps:
>
> 1. Use autoFlushFrequency param to set the desired flush frequency,
> depending on desired consistency level and performance you can choose how
> frequently you would like the data to be flush to Ignite nodes.
>
> 2. Develop a automated checkpointing process to capture and store the
> source data offset, it can be something like kafka message offset or cache
> keys if keys are sequential or timestamp for last flush and depending on
> that the Ignite client can restart the data streaming process from last
> checkpoint if there are node failure.
>
> HTH
>
> Regards,
> Saikat
>
> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]> wrote:
>
>> Thanks Saikat for the feedback.
>>
>> But if I use the overwrite option set to true to avoid duplicates in case
>> I have to resend the entire payload in case of a streamer node failure,
>> then I won't get optimal performance, right?
>> What's the best practice for dealing with data streamer node failures?
>> Are there examples?
>>
>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> AFAIK, the DataStreamer check for presence of key and if it is present
>>> in the cache then it does not allow overwrite of value if allowOverwrite is
>>> set to false.
>>>
>>> Regards,
>>> Saikat
>>>
>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]> wrote:
>>>
>>>> Thanks Andrei.
>>>>
>>>> If the external data source client sending batches of 2-3 MB say via
>>>> TCP socket connection to a bunch of socket streamers (deployed as ignite
>>>> services deployed to each ignite node) and say of the streamer nodes die,
>>>> the data source client catching the exception, has to check the cache to
>>>> see how much of the 2-4MB batch has been flushed to cache and resend the
>>>> rest?
>>>> Would setting streamer with overwrite set to true work, if the data
>>>> source client resend the entire batch?
>>>> A question regarding streamer with overwrite option set to true. How
>>>> does the streamer compare the content the data in hand with the data in
>>>> cache, if each record is being assigned UUID when being inserted to cache?
>>>>
>>>>
>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Not flushed data in a data streamer will be lost. Data streamer works
>>>>> thought some Ignite node and in case if this the node failed it can't
>>>>> somehow start working with another one. So your application should think
>>>>> about how to track that all data was loaded (wait for completion of
>>>>> loading, catch the exceptions, check the cache sizes, etc) and use
>>>>> another client for data loading in case if previous one was failed.
>>>>>
>>>>> BR,
>>>>> Andrei
>>>>>
>>>>> 1/6/2020 2:37 AM, narges saleh wrote:
>>>>> > Hi All,
>>>>> >
>>>>> > Another question regarding ignite's streamer.
>>>>> > What happens to the data if the streamer node crashes before the
>>>>> > buffer's content is flushed to the cache? Is the client responsible
>>>>> > for making sure the data is persisted or ignite redirects the data to
>>>>> > another node's streamer?
>>>>> >
>>>>> > thanks.
>>>>>
>>>>
