Hello!

If you use putAll() in a smart way, I would guess you can get performance very close to that of a data streamer with allowOverwrite=true. Just call it with a decent number of entries belonging to the same cache partition, from multiple threads, with non-intersecting keys of course.
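For example, something like this (just a rough sketch, not tested; the cache name "myCache", the Long/String types and the pool size are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;

public class PartitionAwarePutAll {
    /** Loads data with one putAll() per partition, issued from a small thread pool. */
    public static void load(Ignite ignite, Map<Long, String> data) throws InterruptedException {
        IgniteCache<Long, String> cache = ignite.cache("myCache");
        Affinity<Long> aff = ignite.affinity("myCache");

        // Group the entries by the partition their key maps to, so every
        // putAll() batch lands on a single partition.
        Map<Integer, Map<Long, String>> byPartition = new HashMap<>();
        for (Map.Entry<Long, String> e : data.entrySet())
            byPartition.computeIfAbsent(aff.partition(e.getKey()), p -> new HashMap<>())
                .put(e.getKey(), e.getValue());

        // Issue the per-partition batches from multiple threads. The key
        // sets of the batches never intersect, so the calls don't contend.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (Map<Long, String> batch : byPartition.values())
            pool.submit(() -> cache.putAll(batch));

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}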
Regards,
--
Ilya Kasnacheev


Thu, 16 Jan 2020 at 21:29, narges saleh <[email protected]>:

> Hello Ilya,
>
> If I use the putAll() operation, then I won't get the streamer's bulk
> performance, would I? I have a huge amount of data to persist.
>
> thanks.
>
> On Thu, Jan 16, 2020 at 8:43 AM Ilya Kasnacheev <[email protected]>
> wrote:
>
>> Hello!
>>
>> I think you should consider using the putAll() operation if resiliency
>> is important for you, since this operation will be salvaged if the
>> initiator node fails.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Thu, 16 Jan 2020 at 15:48, narges saleh <[email protected]>:
>>
>>> Thanks Saikat.
>>>
>>> I am not sure if sequential keys/timestamps and Kafka-like offsets
>>> would help if there are many data source clients and many streamer
>>> nodes in play; depending on the checkpoint, we might still end up with
>>> duplicates (unless you're saying each client sequences its payload
>>> before sending it to the streamer; even then, duplicates are possible
>>> in the cache). The only sure way, it seems to me, is for the client
>>> that catches the exception to check the cache and resend only the
>>> diff, which makes things very complex. The other approach, if I am
>>> right, is to enable overwrite, so the streamer would dedup the data in
>>> the cache. The latter is costly too. I think the ideal approach would
>>> have been some type of streamer resiliency where another streamer node
>>> could pick up the buffer from a crashed streamer and continue the
>>> work.
>>>
>>>
>>> On Wed, Jan 15, 2020 at 9:00 PM Saikat Maitra <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> To minimise data loss during a streamer node failure, I think we can
>>>> use the following steps:
>>>>
>>>> 1. Use the autoFlushFrequency param to set the desired flush
>>>> frequency. Depending on the desired consistency level and
>>>> performance, you can choose how frequently you would like the data
>>>> to be flushed to the Ignite nodes.
>>>>
>>>> 2. Develop an automated checkpointing process to capture and store
>>>> the source data offset. It can be something like a Kafka message
>>>> offset, cache keys if the keys are sequential, or the timestamp of
>>>> the last flush. Based on that, the Ignite client can restart the
>>>> data streaming process from the last checkpoint if there is a node
>>>> failure.
>>>>
>>>> HTH
>>>>
>>>> Regards,
>>>> Saikat
>>>>
>>>> On Fri, Jan 10, 2020 at 4:34 AM narges saleh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Saikat for the feedback.
>>>>>
>>>>> But if I set the overwrite option to true to avoid duplicates, in
>>>>> case I have to resend the entire payload after a streamer node
>>>>> failure, then I won't get optimal performance, right?
>>>>> What's the best practice for dealing with data streamer node
>>>>> failures? Are there examples?
>>>>>
>>>>> On Thu, Jan 9, 2020 at 9:12 PM Saikat Maitra <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> AFAIK, the DataStreamer checks for the presence of a key, and if
>>>>>> it is present in the cache then it does not allow overwriting the
>>>>>> value if allowOverwrite is set to false.
>>>>>>
>>>>>> Regards,
>>>>>> Saikat
>>>>>>
>>>>>> On Thu, Jan 9, 2020 at 6:04 AM narges saleh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Andrei.
>>>>>>>
>>>>>>> If the external data source client is sending batches of 2-3 MB,
>>>>>>> say via a TCP socket connection, to a bunch of socket streamers
>>>>>>> (deployed as Ignite services on each Ignite node), and one of the
>>>>>>> streamer nodes dies, does the data source client that catches the
>>>>>>> exception have to check the cache to see how much of the batch
>>>>>>> has been flushed and resend the rest? Would setting the
>>>>>>> streamer's overwrite option to true work, if the data source
>>>>>>> client resends the entire batch?
>>>>>>> A question regarding the streamer with the overwrite option set
>>>>>>> to true: how does the streamer compare the data in hand with the
>>>>>>> data in the cache, if each record is assigned a UUID when being
>>>>>>> inserted into the cache?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 7, 2020 at 4:40 AM Andrei Aleksandrov <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Data that has not been flushed from a data streamer will be
>>>>>>>> lost. A data streamer works through some Ignite node, and if
>>>>>>>> that node fails the streamer can't somehow continue working
>>>>>>>> through another one. So your application should take care of
>>>>>>>> tracking that all data was loaded (wait for completion of the
>>>>>>>> loading, catch exceptions, check the cache sizes, etc.) and use
>>>>>>>> another client for data loading if the previous one failed.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Andrei
>>>>>>>>
>>>>>>>> On 1/6/2020 2:37 AM, narges saleh wrote:
>>>>>>>> > Hi All,
>>>>>>>> >
>>>>>>>> > Another question regarding Ignite's streamer.
>>>>>>>> > What happens to the data if the streamer node crashes before
>>>>>>>> > the buffer's content is flushed to the cache? Is the client
>>>>>>>> > responsible for making sure the data is persisted, or does
>>>>>>>> > Ignite redirect the data to another node's streamer?
>>>>>>>> >
>>>>>>>> > thanks.
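P.S. To tie the quoted discussion together, here is a rough sketch of the streamer-side pattern Saikat and Andrei describe above: allowOverwrite(true) plus autoFlushFrequency, a checkpoint that is advanced only after a successful flush, and a full resend of the batch on failure. The cache names, key/value types and retry count are made up for illustration, not tested:

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;

public class ResilientStreamerClient {
    /**
     * Streams one batch and records a checkpoint once it is fully flushed.
     * On failure the whole batch is resent; allowOverwrite(true) makes the
     * resend safe with respect to duplicates.
     */
    public static void streamBatch(Ignite ignite, String sourceId,
                                   Map<Long, String> batch, long batchEndOffset) {
        // Assumption: a small cache holding the last flushed offset per source.
        IgniteCache<String, Long> checkpoints = ignite.getOrCreateCache("checkpoints");

        for (int attempt = 1; attempt <= 3; attempt++) {
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                streamer.allowOverwrite(true);      // resent duplicates simply overwrite
                streamer.autoFlushFrequency(1_000); // flush buffered entries every second

                for (Map.Entry<Long, String> e : batch.entrySet())
                    streamer.addData(e.getKey(), e.getValue());

                streamer.flush(); // push everything still buffered to the caches

                // Only checkpoint after a successful flush: on restart, the
                // client resumes streaming from the last recorded offset.
                checkpoints.put(sourceId, batchEndOffset);

                return;
            }
            catch (Exception e) {
                // The streamer node may have died mid-batch; since the
                // checkpoint was not advanced, resending the whole batch
                // from the previous checkpoint is safe.
            }
        }
        throw new IllegalStateException("Batch not streamed after 3 attempts");
    }
}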
