Hello! I think you should consider using the putAll() operation if resiliency is important for you, since that operation can be recovered if the initiator node fails.
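A minimal sketch of the putAll() approach suggested above. To keep it self-contained and runnable without a cluster, the batching logic is written against a generic Map-consuming sink; in real code the sink would be `cache::putAll` on an `IgniteCache`, and all names here are illustrative:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class PutAllBatcher {
    // Splits records into fixed-size batches and hands each batch to the sink.
    // With Ignite, the sink would be cache::putAll. Because re-writing the same
    // key/value pair is idempotent, a failed putAll can simply be re-invoked
    // after the client detects a node failure.
    static <K, V> int streamInBatches(Map<K, V> records, int batchSize,
                                      Consumer<Map<K, V>> sink) {
        Map<K, V> batch = new LinkedHashMap<>();
        int batches = 0;
        for (Map.Entry<K, V> e : records.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) {
                sink.accept(new LinkedHashMap<>(batch)); // e.g. cache.putAll(batch)
                batch.clear();
                batches++;
            }
        }
        if (!batch.isEmpty()) {          // flush the final partial batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> target = new HashMap<>();
        Map<String, String> source = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) source.put("k" + i, "v" + i);

        int batches = streamInBatches(source, 4, target::putAll);
        System.out.println(batches + " " + target.size()); // 3 10
    }
}
```

The trade-off versus IgniteDataStreamer is throughput: putAll is a synchronous cache operation, so it is slower than streaming, but there is no client-side buffer to lose when a node dies.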
Regards,
--
Ilya Kasnacheev

Thu, 16 Jan 2020 at 15:48, narges saleh <[email protected]>:

> Thanks Saikat.
>
> I am not sure if sequential keys/timestamps and Kafka-like offsets would
> help if there are many data source clients and many streamer nodes in play;
> depending on the checkpoint, we might still end up with duplicates (unless
> you're saying each client sequences its payload before sending it to the
> streamer; even then, duplicates are possible in the cache). The only sure
> way, it seems to me, is for the client that catches the exception to check
> the cache and only resend the diff, which makes things very complex. The
> other approach, if I am right, is to enable overwrite, so the streamer
> would dedup the data in the cache. The latter is costly too. I think the
> ideal approach would have been if there were some type of streamer
> resiliency in place, where another streamer node could pick up the buffer
> from a crashed streamer and continue the work.
>
>
> On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]>
> wrote:
>
>> Hi,
>>
>> To minimise data loss during a streamer node failure, I think we can use
>> the following steps:
>>
>> 1. Use the autoFlushFrequency param to set the desired flush frequency;
>> depending on the desired consistency level and performance, you can choose
>> how frequently you would like the data to be flushed to Ignite nodes.
>>
>> 2. Develop an automated checkpointing process to capture and store the
>> source data offset. It can be something like a Kafka message offset, or
>> cache keys if keys are sequential, or a timestamp for the last flush;
>> based on that, the Ignite client can restart the data streaming process
>> from the last checkpoint if there is a node failure.
>>
>> HTH
>>
>> Regards,
>> Saikat
>>
>> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]>
>> wrote:
>>
>>> Thanks Saikat for the feedback.
>>>
>>> But if I use the overwrite option set to true to avoid duplicates, in
>>> case I have to resend the entire payload after a streamer node failure,
>>> then I won't get optimal performance, right?
>>> What's the best practice for dealing with data streamer node failures?
>>> Are there examples?
>>>
>>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> AFAIK, the DataStreamer checks for the presence of a key, and if it is
>>>> present in the cache, it does not allow overwriting the value if
>>>> allowOverwrite is set to false.
>>>>
>>>> Regards,
>>>> Saikat
>>>>
>>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Andrei.
>>>>>
>>>>> If the external data source client is sending batches of 2-3 MB, say
>>>>> via a TCP socket connection, to a bunch of socket streamers (deployed
>>>>> as Ignite services on each Ignite node), and say one of the streamer
>>>>> nodes dies, does the data source client catching the exception have
>>>>> to check the cache to see how much of the 2-3 MB batch has been
>>>>> flushed to the cache and resend the rest? Would setting the streamer's
>>>>> overwrite option to true work, if the data source client resends the
>>>>> entire batch?
>>>>> A question regarding the streamer with the overwrite option set to
>>>>> true: how does the streamer compare the data in hand with the data in
>>>>> the cache, if each record is assigned a UUID when being inserted into
>>>>> the cache?
>>>>>
>>>>>
>>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Unflushed data in a data streamer will be lost. A data streamer works
>>>>>> through some Ignite node, and if that node fails, the streamer can't
>>>>>> somehow start working with another one.
>>>>>> So your application should think
>>>>>> about how to track that all data was loaded (wait for completion of
>>>>>> loading, catch the exceptions, check the cache sizes, etc.) and use
>>>>>> another client for data loading in case the previous one failed.
>>>>>>
>>>>>> BR,
>>>>>> Andrei
>>>>>>
>>>>>> On 1/6/2020 2:37 AM, narges saleh wrote:
>>>>>> > Hi All,
>>>>>> >
>>>>>> > Another question regarding Ignite's streamer.
>>>>>> > What happens to the data if the streamer node crashes before the
>>>>>> > buffer's content is flushed to the cache? Is the client responsible
>>>>>> > for making sure the data is persisted, or does Ignite redirect the
>>>>>> > data to another node's streamer?
>>>>>> >
>>>>>> > Thanks.
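The checkpointing idea from step 2 of the quoted thread can be sketched as follows. This is a self-contained illustration, not Ignite or Kafka API: the offset store is an in-memory AtomicLong standing in for a durable checkpoint, the sink stands in for `streamer.addData(...)`, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CheckpointedLoader {
    // Last successfully flushed source offset. In production this could be a
    // small Ignite cache, a file, or a Kafka committed offset; an AtomicLong
    // is enough to show the resume logic end to end.
    private final AtomicLong committedOffset = new AtomicLong(-1);

    // Loads every record after the checkpoint, committing the offset after
    // each successful "flush". On restart, load() re-reads the checkpoint and
    // continues, so already-committed records are never re-sent.
    int load(List<String> records, List<String> sink, boolean failAt5) {
        int sent = 0;
        for (int i = (int) committedOffset.get() + 1; i < records.size(); i++) {
            if (failAt5 && i == 5)
                throw new RuntimeException("simulated streamer node failure");
            sink.add(records.get(i));   // stands in for streamer.addData(...)
            committedOffset.set(i);     // checkpoint after successful flush
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        CheckpointedLoader loader = new CheckpointedLoader();
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 10; i++) source.add("rec" + i);
        List<String> sink = new ArrayList<>();

        try {
            loader.load(source, sink, true);    // fails at record 5
        } catch (RuntimeException e) {
            // caller catches the failure and resumes from the checkpoint
        }
        int resumed = loader.load(source, sink, false);
        System.out.println(resumed + " " + sink.size()); // 5 10
    }
}
```

Note the caveat raised in the thread still applies: with a real streamer, entries buffered but not yet flushed at crash time are not covered by a per-record commit like this, so the checkpoint should only advance on flush boundaries, and resent in-flight entries may still need allowOverwrite(true) to dedup.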

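For reference, the two streamer settings discussed in the thread (autoFlushFrequency and allowOverwrite) are plain IgniteDataStreamer configuration. A usage sketch might look like the following; the cache name and key/value types are illustrative, and since this needs a running Ignite cluster it should be read as a configuration sketch rather than a runnable sample:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerConfigSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("myCache");     // cache name is illustrative

            try (IgniteDataStreamer<Long, String> streamer =
                     ignite.dataStreamer("myCache")) {
                // With overwrite enabled, re-sending a batch after a failure
                // overwrites existing keys instead of silently skipping the
                // ones that were already flushed (at a throughput cost).
                streamer.allowOverwrite(true);

                // Flush buffered entries at least every 500 ms, trading some
                // throughput for a smaller window of unflushed (lossable) data.
                streamer.autoFlushFrequency(500);

                for (long i = 0; i < 1_000; i++)
                    streamer.addData(i, "value-" + i);

                streamer.flush();   // force out anything still buffered
            }
        }
    }
}
```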