Re: Flink: For terabytes of keyed state.

Gowri Sundaram Tue, 05 May 2020 22:53:43 -0700

Hi Congxian,
Thank you so much for your response! We will go ahead and do a POC to test
out how Flink performs at scale.


Regards,
- Gowri

On Wed, May 6, 2020 at 8:34 AM Congxian Qiu <qcx978132...@gmail.com> wrote:

> Hi
>
> From my experience, what matters is the state size of a single task (not
> the state size of the whole job). The download speed for a single thread is
> roughly 100 MB/s (this may vary between environments). I do not have much
> performance data for loading state into RocksDB (we use an internal KV
> store at my company), but in my experience loading state into RocksDB is
> not slower than downloading it.
>
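
A rough way to turn the per-task figure above into a restore-time estimate is sketched below. It only divides the state size of one task by the aggregate download rate. The 100 MB/s per-thread rate comes from the message above; the 200-task layout, the thread counts, and the assumption that throughput scales linearly with threads are hypothetical and chosen only for illustration.

```python
def restore_seconds(task_state_bytes: float,
                    threads: int = 1,
                    per_thread_mb_s: float = 100.0) -> float:
    """Estimate the time to download one task's state.

    Assumes download throughput scales linearly with the number of
    threads, which is an idealisation (storage or NIC limits may cap it).
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return task_state_bytes / (threads * per_thread_mb_s * 1e6)


TB = 1e12
# Hypothetical job: 20 TB of keyed state spread evenly over 200 tasks.
per_task_bytes = 20 * TB / 200  # 100 GB per task

for threads in (1, 4, 8):
    minutes = restore_seconds(per_task_bytes, threads) / 60
    print(f"{threads} thread(s): {minutes:.1f} min per task")
```

Because tasks restore in parallel, the slowest single task, not the total job state, bounds the download phase in this model.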
> Best,
> Congxian
>
>
> On Sun, May 3, 2020 at 11:25 PM Gowri Sundaram <gowripsunda...@gmail.com> wrote:
>
>> Hi Congxian,
>> Thank you so much for your response, that really helps!
>>
>> From your experience, how long does it take for Flink to redistribute
>> terabytes of state data on node addition / node failure?
>>
>> Thanks!
>>
>> On Sun, May 3, 2020 at 6:56 PM Congxian Qiu <qcx978132...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> 1. From my experience, Flink can support state this large if you set an
>>> appropriate parallelism for the stateful operator. With RocksDB you may
>>> also need to pay attention to disk performance.
>>> 2. Inside Flink, state is partitioned into key-groups, and each parallel
>>> task holds multiple key-groups. Flink does not need to restart when you
>>> add a node to the cluster, but every time a job restarts from a
>>> savepoint/checkpoint (or fails over), Flink needs to redistribute the
>>> checkpoint data. This can be skipped if it is a failover and local
>>> recovery[1] is enabled.
>>> 3. For uploading/downloading state, you can refer to the multi-threaded
>>> upload/download work[2][3] to speed it up.
>>>
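
The key-group idea in point 2 can be illustrated with a small, simplified model. This is a sketch, not Flink's code: the CRC32 hash is a stand-in for whatever hash Flink applies to a key, the formula `key_group * parallelism // max_parallelism` reflects my understanding of how key-groups are mapped to parallel subtasks, and the maximum parallelism of 128 is a hypothetical value.

```python
import zlib
from collections import Counter


def key_group_for(key: str, max_parallelism: int) -> int:
    """Map a key to a key-group. CRC32 is a stand-in hash, not Flink's."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def subtask_for(key_group: int, max_parallelism: int, parallelism: int) -> int:
    """Map a key-group to the index of the parallel subtask that owns it."""
    return key_group * parallelism // max_parallelism


def moved_key_groups(max_parallelism: int, old_p: int, new_p: int) -> int:
    """Count key-groups whose owning subtask index changes on rescaling."""
    return sum(
        1
        for kg in range(max_parallelism)
        if subtask_for(kg, max_parallelism, old_p)
        != subtask_for(kg, max_parallelism, new_p)
    )


MAX_PARALLELISM = 128  # hypothetical setting for this illustration

# Key-groups are spread almost evenly over the subtasks.
per_subtask = Counter(
    subtask_for(kg, MAX_PARALLELISM, 10) for kg in range(MAX_PARALLELISM)
)
print("key-groups per subtask at parallelism 10:", dict(per_subtask))

# Rescaling from 10 to 11 subtasks changes the owner of many key-groups,
# which is why checkpoint data has to be redistributed on restore.
print("key-groups that change owner (10 -> 11):",
      moved_key_groups(MAX_PARALLELISM, 10, 11))
```

The key-to-key-group mapping never changes, so rescaling only reassigns whole key-groups; individual keys are never re-hashed in this model.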
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#task-local-recovery
>>> [2] https://issues.apache.org/jira/browse/FLINK-10461
>>> [3] https://issues.apache.org/jira/browse/FLINK-11008
>>>
>>> Best,
>>> Congxian
>>>
>>>
>>> On Fri, May 1, 2020 at 6:29 PM Gowri Sundaram <gowripsunda...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> We have read in multiple
>>>> <https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html>
>>>> sources <https://flink.apache.org/usecases.html> that Flink has been
>>>> used for use cases with terabytes of application state.
>>>>
>>>> We are considering using Flink for a similar use case with *keyed
>>>> state in the range of 20 to 30 TB*. We had a few questions regarding
>>>> the same.
>>>>
>>>>
>>>>    - *Is Flink a good option for this scale of data?* We are
>>>>    considering using RocksDB as the state backend.
>>>>    - *What happens when we want to add a node to the cluster?*
>>>>       - As per our understanding, if we have 10 nodes in our cluster
>>>>       with 20TB of state, adding a node would require the entire
>>>>       20TB of data to be shipped again from the external checkpoint remote
>>>>       storage to the taskmanager nodes.
>>>>       - Assuming 1Gb/s network speed, and assuming all nodes can read
>>>>       their respective 2TB of state in parallel, this would mean a *minimum
>>>>       downtime of half an hour*. And this is assuming the throughput
>>>>       of the remote storage does not become the bottleneck.
>>>>       - Is there any way to reduce this estimated downtime?
>>>>
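
The downtime estimate above can be checked with a short calculation. The sketch below uses the figures from the message (20 TB over 10 nodes, so 2 TB per node, all nodes reading in parallel) and evaluates both readings of the network speed. The half-hour figure matches a link of 1 gigabyte per second; read literally as 1 gigabit per second, the same arithmetic gives roughly 4.4 hours. Decimal units (1 TB = 1e12 bytes) are an assumption.

```python
def transfer_minutes(state_bytes: float, bytes_per_second: float) -> float:
    """Time in minutes to move state_bytes at a sustained rate."""
    return state_bytes / bytes_per_second / 60


TB = 1e12                      # decimal terabyte (assumption)
per_node_bytes = 20 * TB / 10  # 2 TB per node, read in parallel

GIGABIT_PER_S = 1e9 / 8        # 1 Gb/s expressed in bytes per second
GIGABYTE_PER_S = 1e9           # 1 GB/s expressed in bytes per second

print(f"at 1 Gb/s: {transfer_minutes(per_node_bytes, GIGABIT_PER_S):.1f} min")
print(f"at 1 GB/s: {transfer_minutes(per_node_bytes, GIGABYTE_PER_S):.1f} min")
```

Either way this is a lower bound: it ignores remote-storage throughput and the time to load the downloaded state into RocksDB.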
>>>>
>>>> Thank you!
>>>>
>>>
