Hello Ilya,

If I use the putAll() operation, I won't get the streamer's bulk
performance, will I? I have a huge amount of data to persist.
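For intuition: putAll() is itself a batched call, but the streamer additionally buffers entries as they arrive and groups them per destination node before shipping. Below is a standalone sketch of that buffering effect; the class, the modulo "partition mapping", and all the numbers are illustrative stand-ins, not Ignite internals.

```java
import java.util.*;

// Toy model of the data streamer's per-node buffering (an illustration,
// not Ignite code): entries are grouped by destination node and shipped
// in batches, so N entries cost on the order of N / bufferSize bulk
// sends instead of N individual network operations.
public class BufferSketch {
    // Returns the number of bulk sends needed to stream `totalKeys` entries.
    static int streamerLoad(int totalKeys, int nodes, int bufferSize) {
        Map<Integer, List<Integer>> buffers = new HashMap<>();
        int bulkSends = 0;
        for (int key = 0; key < totalKeys; key++) {
            int node = key % nodes;  // stand-in for Ignite's partition mapping
            List<Integer> buf = buffers.computeIfAbsent(node, n -> new ArrayList<>());
            buf.add(key);
            if (buf.size() == bufferSize) {  // buffer full: one bulk send
                bulkSends++;
                buf.clear();
            }
        }
        for (List<Integer> buf : buffers.values())  // flush partial buffers
            if (!buf.isEmpty())
                bulkSends++;
        return bulkSends;
    }

    public static void main(String[] args) {
        // 10,000 entries across 4 nodes with a 512-entry buffer:
        // 20 bulk sends rather than 10,000 per-entry operations.
        System.out.println(streamerLoad(10_000, 4, 512));
    }
}
```

The trade-off in the thread is exactly this: the streamer keeps this continuous per-node batching, while putAll() batches only whatever the client hands it per call.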

thanks.

On Thu, Jan 16, 2020 at 8:43 AM Ilya Kasnacheev <[email protected]>
wrote:

> Hello!
>
> I think you should consider using the putAll() operation if resiliency is
> important to you, since this operation will be salvaged if the initiator node
> fails.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Thu, 16 Jan 2020 at 15:48, narges saleh <[email protected]>:
>
>> Thanks Saikat.
>>
>> I am not sure sequential keys/timestamps and Kafka-like offsets would
>> help if there are many data-source clients and many streamer nodes in play;
>> depending on the checkpoint, we might still end up with duplicates (unless
>> you're saying each client sequences its payload before sending it to the
>> streamer; even then, duplicates are possible in the cache). The only sure
>> way, it seems to me, is for the client that catches the exception to check
>> the cache and resend only the diff, which makes things very complex. The
>> other approach, if I am right, is to enable overwrite, so the streamer
>> would dedup the data in the cache. The latter is costly too. I think the
>> ideal approach would be some type of streamer resiliency, where another
>> streamer node could pick up the buffer from a crashed streamer and
>> continue the work.
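The diff-based resend described above can be sketched without any Ignite dependency: after a failure, the client determines which keys from its last batch actually reached the cache and resends only the rest. In this sketch the cache lookup is stubbed as a plain set of keys; the class and method names are hypothetical.

```java
import java.util.*;

public class DiffResend {
    // Given the batch the client sent and the keys that actually made it
    // into the cache before the crash, return only the entries that still
    // need to be resent.
    static Map<String, String> diffToResend(Map<String, String> sentBatch,
                                            Set<String> keysInCache) {
        Map<String, String> resend = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sentBatch.entrySet())
            if (!keysInCache.contains(e.getKey()))  // lost before the flush
                resend.put(e.getKey(), e.getValue());
        return resend;
    }

    public static void main(String[] args) {
        Map<String, String> batch = new LinkedHashMap<>();
        batch.put("k1", "v1");
        batch.put("k2", "v2");
        batch.put("k3", "v3");
        // Only k1 was flushed before the streamer node crashed.
        Set<String> flushed = Set.of("k1");
        System.out.println(diffToResend(batch, flushed).keySet()); // [k2, k3]
    }
}
```

The complexity the thread worries about lives in obtaining `keysInCache` cheaply for a multi-megabyte batch, not in the diff itself.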
>>
>>
>> On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> To minimise data loss during a streamer node failure, I think we can use
>>> the following steps:
>>>
>>> 1. Use the autoFlushFrequency param to set the desired flush frequency;
>>> depending on the desired consistency level and performance, you can
>>> choose how frequently you would like the data to be flushed to Ignite
>>> nodes.
>>>
>>> 2. Develop an automated checkpointing process to capture and store the
>>> source data offset. It can be something like a Kafka message offset,
>>> cache keys if the keys are sequential, or a timestamp of the last flush;
>>> depending on that, the Ignite client can restart the data streaming
>>> process from the last checkpoint if there is a node failure.
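The checkpoint-and-restart logic in step 2 can be sketched independently of Ignite. Everything here is a hypothetical stand-in: the flush simulation plays the role of autoFlushFrequency, and a real client would persist the offset durably (e.g. in a cache or external store) and checkpoint only after the streamer's flush completes.

```java
public class CheckpointSketch {
    // Last source offset known to be flushed to the cluster. In a real
    // loader this would live in durable storage, not a field.
    static long lastFlushedOffset = -1;

    // Stream records [from, upTo), checkpointing after each simulated flush.
    static void streamFrom(long from, long upTo, int flushEvery) {
        for (long offset = from; offset < upTo; offset++) {
            // ... addData(record at `offset`) would go here in a real loader ...
            if ((offset + 1) % flushEvery == 0)   // stand-in for autoFlushFrequency
                lastFlushedOffset = offset;       // checkpoint only after a flush
        }
    }

    public static void main(String[] args) {
        streamFrom(0, 95, 10);        // node "crashes" before offset 95 is flushed
        // On restart, resume from the checkpoint instead of the beginning;
        // offsets 90-94 are replayed, which is why dedup/overwrite matters.
        long resumeAt = lastFlushedOffset + 1;
        System.out.println(resumeAt); // 90
    }
}
```

This also shows why the approach gives at-least-once rather than exactly-once delivery: the tail between the last checkpoint and the crash is resent.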
>>>
>>> HTH
>>>
>>> Regards,
>>> Saikat
>>>
>>> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]>
>>> wrote:
>>>
>>>> Thanks Saikat for the feedback.
>>>>
>>>> But if I set the overwrite option to true to avoid duplicates, in case
>>>> I have to resend the entire payload after a streamer node failure, then
>>>> I won't get optimal performance, right?
>>>> What's the best practice for dealing with data streamer node failures?
>>>> Are there examples?
>>>>
>>>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> AFAIK, the DataStreamer checks for the presence of a key, and if it is
>>>>> present in the cache, it does not allow overwriting the value if
>>>>> allowOverwrite is set to false.
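In plain map terms, the behaviour described above for allowOverwrite=false matches putIfAbsent semantics: keys already present keep their current value, and only absent keys are inserted. A standalone analogy using a ConcurrentMap (not the actual streamer code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class OverwriteSketch {
    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("k1", "old");

        // allowOverwrite = false behaves like putIfAbsent: an existing
        // entry is left untouched, a new key is inserted.
        cache.putIfAbsent("k1", "new");   // ignored, k1 already present
        cache.putIfAbsent("k2", "new");   // inserted

        System.out.println(cache.get("k1") + " " + cache.get("k2")); // old new
    }
}
```

Under this semantic, resending a whole batch is harmless for keys that already landed, which is why the thread keeps returning to overwrite settings as the dedup mechanism.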
>>>>>
>>>>> Regards,
>>>>> Saikat
>>>>>
>>>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Andrei.
>>>>>>
>>>>>> If the external data source client is sending batches of 2-3 MB, say
>>>>>> via a TCP socket connection, to a bunch of socket streamers (deployed
>>>>>> as Ignite services on each Ignite node), and one of the streamer nodes
>>>>>> dies, does the data source client that catches the exception have to
>>>>>> check the cache to see how much of the 2-3 MB batch has been flushed
>>>>>> to the cache, and resend the rest? Would setting the streamer with
>>>>>> overwrite set to true work, if the data source client resends the
>>>>>> entire batch?
>>>>>> A question regarding the streamer with the overwrite option set to
>>>>>> true: how does the streamer compare the data in hand with the data in
>>>>>> the cache, if each record is assigned a UUID when being inserted into
>>>>>> the cache?
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Data that has not been flushed from a data streamer will be lost. A
>>>>>>> data streamer works through one Ignite node, and if that node fails,
>>>>>>> it cannot somehow continue working through another one. So your
>>>>>>> application should track that all data was loaded (wait for the
>>>>>>> loading to complete, catch exceptions, check the cache sizes, etc.)
>>>>>>> and use another client to load the data if the previous one failed.
>>>>>>>
>>>>>>> BR,
>>>>>>> Andrei
>>>>>>>
>>>>>>> On 1/6/2020 2:37 AM, narges saleh wrote:
>>>>>>> > Hi All,
>>>>>>> >
>>>>>>> > Another question regarding ignite's streamer.
>>>>>>> > What happens to the data if the streamer node crashes before the
>>>>>>> > buffer's content is flushed to the cache? Is the client responsible
>>>>>>> > for making sure the data is persisted, or does Ignite redirect the
>>>>>>> > data to another node's streamer?
>>>>>>> >
>>>>>>> > thanks.
>>>>>>>
>>>>>>
