Hello Ilya,

If I use the putAll() operation, then I won't get the streamer's bulk performance, will I? I have a huge amount of data to persist.
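For context, the two loading paths being compared might look like this. This is only a sketch against the public Ignite Java API; the cache name "myCache", the key/value types, and the sample data are placeholders, and it assumes a reachable Ignite cluster:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class BulkLoadSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, String> cache = ignite.getOrCreateCache("myCache");

            // Path 1: putAll() -- a distributed batch put. It follows normal
            // cache semantics (so it is salvaged if the initiator node fails),
            // but each batch is a full distributed operation, so throughput is
            // lower than the streamer's.
            Map<Long, String> batch = new HashMap<>();
            for (long i = 0; i < 10_000; i++)
                batch.put(i, "value-" + i);
            cache.putAll(batch);

            // Path 2: IgniteDataStreamer -- buffers entries per node and ships
            // them in bulk. Much faster for huge loads, but entries still
            // sitting in the buffer are lost if the streamer's node dies.
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                for (long i = 10_000; i < 1_000_000; i++)
                    streamer.addData(i, "value-" + i);
            } // close() flushes any remaining buffered data
        }
    }
}
```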
thanks.

On Thu, Jan 16, 2020 at 8:43 AM Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> I think you should consider using the putAll() operation if resiliency is
> important for you, since this operation will be salvaged if the initiator
> node fails.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Thu, Jan 16, 2020 at 15:48, narges saleh <[email protected]> wrote:
>
>> Thanks Saikat.
>>
>> I am not sure if sequential keys/timestamps and Kafka-like offsets would
>> help if there are many data source clients and many streamer nodes in play;
>> depending on the checkpoint, we might still end up with duplicates (unless
>> you're saying each client sequences its payload before sending it to the
>> streamer; even then, duplicates are possible in the cache). The only sure
>> way, it seems to me, is for the client that catches the exception to check
>> the cache and only resend the diff, which makes things very complex. The
>> other approach, if I am right, is to enable overwrite, so the streamer
>> would dedup the data in the cache. The latter is costly too. I think the
>> ideal approach would have been some type of streamer resiliency where
>> another streamer node could pick up the buffer from a crashed streamer and
>> continue the work.
>>
>>
>> On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> To minimise data loss during a streamer node failure, I think we can use
>>> the following steps:
>>>
>>> 1. Use the autoFlushFrequency param to set the desired flush frequency;
>>> depending on the desired consistency level and performance, you can choose
>>> how frequently you would like the data to be flushed to Ignite nodes.
>>>
>>> 2. Develop an automated checkpointing process to capture and store the
>>> source data offset. It can be something like a Kafka message offset, or
>>> cache keys if keys are sequential, or a timestamp for the last flush;
>>> based on that, the Ignite client can restart the data streaming process
>>> from the last checkpoint if there is a node failure.
>>>
>>> HTH
>>>
>>> Regards,
>>> Saikat
>>>
>>> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]>
>>> wrote:
>>>
>>>> Thanks Saikat for the feedback.
>>>>
>>>> But if I set the overwrite option to true to avoid duplicates, in case
>>>> I have to resend the entire payload after a streamer node failure, then
>>>> I won't get optimal performance, right?
>>>> What's the best practice for dealing with data streamer node failures?
>>>> Are there examples?
>>>>
>>>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> AFAIK, the DataStreamer checks for the presence of a key, and if it is
>>>>> present in the cache then it does not allow overwriting the value if
>>>>> allowOverwrite is set to false.
>>>>>
>>>>> Regards,
>>>>> Saikat
>>>>>
>>>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Andrei.
>>>>>>
>>>>>> If an external data source client is sending batches of 2-3 MB, say
>>>>>> via a TCP socket connection, to a bunch of socket streamers (deployed
>>>>>> as Ignite services on each Ignite node), and one of the streamer nodes
>>>>>> dies, does the data source client catching the exception have to check
>>>>>> the cache to see how much of the batch has been flushed to the cache,
>>>>>> and resend the rest? Would setting the streamer's overwrite option to
>>>>>> true work, if the data source client resends the entire batch?
>>>>>> A question regarding the streamer with the overwrite option set to
>>>>>> true: how does the streamer compare the data in hand with the data in
>>>>>> the cache, if each record is assigned a UUID when being inserted into
>>>>>> the cache?
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Data that has not been flushed from a data streamer will be lost. A
>>>>>>> data streamer works through some Ignite node, and if that node fails
>>>>>>> it can't somehow continue working through another one. So your
>>>>>>> application should think about how to track that all data was loaded
>>>>>>> (wait for completion of loading, catch the exceptions, check the
>>>>>>> cache sizes, etc.) and use another client for data loading if the
>>>>>>> previous one failed.
>>>>>>>
>>>>>>> BR,
>>>>>>> Andrei
>>>>>>>
>>>>>>> On 1/6/2020 2:37 AM, narges saleh wrote:
>>>>>>> > Hi All,
>>>>>>> >
>>>>>>> > Another question regarding Ignite's streamer.
>>>>>>> > What happens to the data if the streamer node crashes before the
>>>>>>> > buffer's content is flushed to the cache? Is the client responsible
>>>>>>> > for making sure the data is persisted, or does Ignite redirect the
>>>>>>> > data to another node's streamer?
>>>>>>> >
>>>>>>> > thanks.
>>>>>>
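The "check the cache and resend only the diff" approach discussed in the thread can be sketched in plain Java. Here the cache lookup is simulated with a Set of keys known to have landed; against a real cluster that check would be something like a getAll() over the batch's keys:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ResendDiff {
    /**
     * Returns the subset of the sent batch that did not make it into the
     * cache, i.e. the entries the client must resend after a streamer failure.
     */
    static <K, V> Map<K, V> diffToResend(Map<K, V> sentBatch, Set<K> keysInCache) {
        Map<K, V> missing = new HashMap<>();
        for (Map.Entry<K, V> e : sentBatch.entrySet())
            if (!keysInCache.contains(e.getKey()))
                missing.put(e.getKey(), e.getValue());
        return missing;
    }

    public static void main(String[] args) {
        Map<Integer, String> batch = new HashMap<>();
        batch.put(1, "a");
        batch.put(2, "b");
        batch.put(3, "c");
        // Suppose only keys 1 and 3 were flushed before the node crashed.
        System.out.println(diffToResend(batch, Set.of(1, 3))); // only key 2 remains
    }
}
```

Note this only works if the cache keys are client-assigned and stable; as the thread points out, server-assigned UUID keys make this comparison impossible.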

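Saikat's checkpointing suggestion (step 2 above) can be sketched generically: durably record the last offset known to be flushed, and on restart resume from the next one. The in-memory store below is a stand-in for whatever durable store would actually be used (a file, a database row, a Kafka committed offset):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckpointSketch {
    // Stand-in for a durable checkpoint store; -1 means "never checkpointed".
    private static final AtomicLong checkpoint = new AtomicLong(-1);

    static void saveCheckpoint(long offset) { checkpoint.set(offset); }

    /** Offset to resume streaming from after a restart. */
    static long resumeOffset() { return checkpoint.get() + 1; }

    public static void main(String[] args) {
        // First run: stream records at offsets 0..2, checkpointing after each
        // confirmed flush, then "crash".
        for (long off = 0; off <= 2; off++) {
            // streamer.addData(off, record) + flush would go here
            saveCheckpoint(off);
        }

        // Restart: offsets 0..2 are not re-sent, later offsets are not lost.
        System.out.println("resuming from offset " + resumeOffset()); // 3
    }
}
```

As the thread notes, this only bounds duplicates rather than eliminating them: a crash between a flush and its checkpoint write still causes that batch to be re-sent.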