Hello!

If you use putAll() in a smart way, I would guess you can get performance very close to that of a data streamer with allowOverwrite=true. Just call it with a decent number of entries belonging to the same cache partition, from multiple threads, with non-intersecting keys of course.
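For example, something like this (just a rough sketch, not tested; the cache name "myCache", the Long/String types and the pool size are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;

public class PartitionAwarePutAll {
    /** Loads data with one putAll() per partition, issued from a small thread pool. */
    public static void load(Ignite ignite, Map<Long, String> data) throws InterruptedException {
        IgniteCache<Long, String> cache = ignite.cache("myCache");
        Affinity<Long> aff = ignite.affinity("myCache");

        // Group the entries by the partition their key maps to, so every
        // putAll() batch lands on a single partition.
        Map<Integer, Map<Long, String>> byPartition = new HashMap<>();
        for (Map.Entry<Long, String> e : data.entrySet())
            byPartition.computeIfAbsent(aff.partition(e.getKey()), p -> new HashMap<>())
                .put(e.getKey(), e.getValue());

        // Issue the per-partition batches from multiple threads. The key
        // sets of the batches never intersect, so the calls don't contend.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (Map<Long, String> batch : byPartition.values())
            pool.submit(() -> cache.putAll(batch));

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}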
Regards,
--
Ilya Kasnacheev


Thu, 16 Jan 2020 at 21:29, narges saleh <[email protected]>:

> Hello Ilya,
>
> If I use the putAll() operation, then I won't get the streamer's bulk
> performance, would I? I have a huge amount of data to persist.
>
> thanks.
>
> On Thu, Jan 16, 2020 at 8:43 AM Ilya Kasnacheev <[email protected]>
> wrote:
>
>> Hello!
>>
>> I think you should consider using the putAll() operation if resiliency
>> is important for you, since this operation will be salvaged if the
>> initiator node fails.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Thu, 16 Jan 2020 at 15:48, narges saleh <[email protected]>:
>>
>>> Thanks Saikat.
>>>
>>> I am not sure if sequential keys/timestamps and Kafka-like offsets
>>> would help if there are many data source clients and many streamer
>>> nodes in play; depending on the checkpoint, we might still end up with
>>> duplicates (unless you're saying each client sequences its payload
>>> before sending it to the streamer; even then, duplicates are possible
>>> in the cache). The only sure way, it seems to me, is for the client
>>> that catches the exception to check the cache and resend only the
>>> diff, which makes things very complex. The other approach, if I am
>>> right, is to enable overwrite, so the streamer would dedup the data in
>>> the cache. The latter is costly too. I think the ideal approach would
>>> have been some type of streamer resiliency where another streamer node
>>> could pick up the buffer from a crashed streamer and continue the
>>> work.
>>>
>>>
>>> On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> To minimise data loss during a streamer node failure, I think we can
>>>> use the following steps:
>>>>
>>>> 1. Use the autoFlushFrequency param to set the desired flush
>>>> frequency. Depending on the desired consistency level and
>>>> performance, you can choose how frequently you would like the data
>>>> to be flushed to the Ignite nodes.
>>>>
>>>> 2. Develop an automated checkpointing process to capture and store
>>>> the source data offset. It can be something like a Kafka message
>>>> offset, cache keys if the keys are sequential, or the timestamp of
>>>> the last flush. Based on that, the Ignite client can restart the
>>>> data streaming process from the last checkpoint if there is a node
>>>> failure.
>>>>
>>>> HTH
>>>>
>>>> Regards,
>>>> Saikat
>>>>
>>>> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Saikat for the feedback.
>>>>>
>>>>> But if I set the overwrite option to true to avoid duplicates, in
>>>>> case I have to resend the entire payload after a streamer node
>>>>> failure, then I won't get optimal performance, right?
>>>>> What's the best practice for dealing with data streamer node
>>>>> failures? Are there examples?
>>>>>
>>>>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> AFAIK, the DataStreamer checks for the presence of a key, and if
>>>>>> it is present in the cache then it does not allow overwriting the
>>>>>> value if allowOverwrite is set to false.
>>>>>>
>>>>>> Regards,
>>>>>> Saikat
>>>>>>
>>>>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Andrei.
>>>>>>>
>>>>>>> If the external data source client is sending batches of 2-3 MB,
>>>>>>> say via a TCP socket connection, to a bunch of socket streamers
>>>>>>> (deployed as Ignite services on each Ignite node), and one of the
>>>>>>> streamer nodes dies, does the data source client that catches the
>>>>>>> exception have to check the cache to see how much of the batch
>>>>>>> has been flushed and resend the rest? Would setting the
>>>>>>> streamer's overwrite option to true work, if the data source
>>>>>>> client resends the entire batch?
>>>>>>> A question regarding the streamer with the overwrite option set
>>>>>>> to true: how does the streamer compare the data in hand with the
>>>>>>> data in the cache, if each record is assigned a UUID when being
>>>>>>> inserted into the cache?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Data that has not been flushed from a data streamer will be
>>>>>>>> lost. A data streamer works through some Ignite node, and if
>>>>>>>> that node fails the streamer can't somehow continue working
>>>>>>>> through another one. So your application should take care of
>>>>>>>> tracking that all data was loaded (wait for completion of the
>>>>>>>> loading, catch exceptions, check the cache sizes, etc.) and use
>>>>>>>> another client for data loading if the previous one failed.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Andrei
>>>>>>>>
>>>>>>>> On 1/6/2020 2:37 AM, narges saleh wrote:
>>>>>>>> > Hi All,
>>>>>>>> >
>>>>>>>> > Another question regarding Ignite's streamer.
>>>>>>>> > What happens to the data if the streamer node crashes before
>>>>>>>> > the buffer's content is flushed to the cache? Is the client
>>>>>>>> > responsible for making sure the data is persisted, or does
>>>>>>>> > Ignite redirect the data to another node's streamer?
>>>>>>>> >
>>>>>>>> > thanks.
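P.S. To tie the quoted discussion together, here is a rough sketch of the streamer-side pattern Saikat and Andrei describe above: allowOverwrite(true) plus autoFlushFrequency, a checkpoint that is advanced only after a successful flush, and a full resend of the batch on failure. The cache names, key/value types and retry count are made up for illustration, not tested:

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;

public class ResilientStreamerClient {
    /**
     * Streams one batch and records a checkpoint once it is fully flushed.
     * On failure the whole batch is resent; allowOverwrite(true) makes the
     * resend safe with respect to duplicates.
     */
    public static void streamBatch(Ignite ignite, String sourceId,
                                   Map<Long, String> batch, long batchEndOffset) {
        // Assumption: a small cache holding the last flushed offset per source.
        IgniteCache<String, Long> checkpoints = ignite.getOrCreateCache("checkpoints");

        for (int attempt = 1; attempt <= 3; attempt++) {
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                streamer.allowOverwrite(true);      // resent duplicates simply overwrite
                streamer.autoFlushFrequency(1_000); // flush buffered entries every second

                for (Map.Entry<Long, String> e : batch.entrySet())
                    streamer.addData(e.getKey(), e.getValue());

                streamer.flush(); // push everything still buffered to the caches

                // Only checkpoint after a successful flush: on restart, the
                // client resumes streaming from the last recorded offset.
                checkpoints.put(sourceId, batchEndOffset);

                return;
            }
            catch (Exception e) {
                // The streamer node may have died mid-batch; since the
                // checkpoint was not advanced, resending the whole batch
                // from the previous checkpoint is safe.
            }
        }
        throw new IllegalStateException("Batch not streamed after 3 attempts");
    }
}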
