Hi,

It seems you are trying to identify bottlenecks in your job and are currently 
looking at RocksDB disk I/O as one of them. However, there are other possible 
bottlenecks (e.g. CPU, memory, network, sink throttling), and from what you 
described, it's possible that the HDFS sink is the bottleneck. Are you using 
Flink >= 1.13? If so, you can use flame graphs on the Flink dashboard to see 
what the busy operator is doing.
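
For reference, flame graphs are disabled by default in Flink 1.13 and have to 
be switched on in the cluster configuration before the dashboard shows them. A 
minimal sketch, assuming the standard flink-conf.yaml is used for the job:

```
# Flame graphs are off by default; enable them for the web UI.
rest.flamegraph.enabled: true
```

Once enabled, selecting the busy operator in the job graph should offer a 
flame graph view of where its threads spend their time.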

Regards,
Hong



From: Jing Ge <j...@ververica.com>
Date: Thursday, 21 July 2022 at 21:14
To: Yaroslav Tkachenko <yaros...@goldsky.io>
Cc: vtygoss <vtyg...@126.com>, "user@flink.apache.org" <user@flink.apache.org>
Subject: RE: [EXTERNAL]Using RocksDBStateBackend and SSD to store states, 
application runs slower..




Hi,

Using FLASH_SSD_OPTIMIZED already sets the number of threads to 4. This 
optimization can improve the source throughput and reduce the delayed write 
rate.

If this optimization doesn't fix the back pressure, could you share more 
information about your job? Could you check the metrics of the back-pressured 
operator, e.g. whether the back pressure is caused by write-heavy or 
read-heavy tasks? You could try tuning the 
state.backend.rocksdb.writebuffer.* options for write-heavy tasks.
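
As a rough illustration of the write-buffer tuning mentioned above, the 
fragment below lists the relevant flink-conf.yaml keys together with a few 
RocksDB native metrics that can help show whether writes are being stalled. 
The keys are standard Flink RocksDB options, but every value here is a 
placeholder assumption, not a recommendation for this job. With managed 
memory enabled (the default), the total write-buffer memory is still bounded 
by the managed memory budget.

```
# Illustrative values only (assumptions); tune against your own metrics.
state.backend.rocksdb.writebuffer.size: 128mb
state.backend.rocksdb.writebuffer.count: 4
state.backend.rocksdb.writebuffer.number-to-merge: 2

# Optional native metrics; each one adds some overhead when enabled.
state.backend.rocksdb.metrics.actual-delayed-write-rate: true
state.backend.rocksdb.metrics.is-write-stopped: true
state.backend.rocksdb.metrics.mem-table-flush-pending: true
state.backend.rocksdb.metrics.compaction-pending: true
```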

On Thu, Jul 21, 2022 at 5:59 PM Yaroslav Tkachenko <yaros...@goldsky.io> 
wrote:
Hi!

I'd try re-running the SSD test with the following config options:

state.backend.rocksdb.thread.num: 4
state.backend.rocksdb.predefined-options: FLASH_SSD_OPTIMIZED


On Thu, Jul 21, 2022 at 4:11 AM vtygoss <vtyg...@126.com> wrote:

Hi, community!



I am running some performance tests based on my scenario.



1. Environment

- Flink: 1.13.5
- StateBackend: RocksDB, incremental
- use case: complex SQL containing 7 joins and 2 aggregations; input is 
30,000,000 records and output is 60,000,000 records (about 80GB).
- resource: Flink on YARN. JM 2G, one TM 24G (8G on-heap, 16G off-heap), 3 
slots per TM
- only difference: the 'state.backend.rocksdb.localdir' setting, which points 
to either one SATA disk or one SSD disk.



2. Random write performance difference between SATA and SSD

   4.8M/s is achieved using SATA, while 48.2M/s is achieved using SSD.

   ```
   fio -direct=1 -iodepth 64 -thread -rw=randwrite -ioengine=sync -fsync=1 \
       -runtime=300 -group_reporting -name=xxx -size=100G \
       --allow_mounted_write=1 -bs=8k -numjobs=64 -filename=/mnt/disk11/xx
   ```



3. In my use case, the Flink SQL application finished in 41 minutes using 
SATA, and in 45 minutes using SSD.



Does this comparison suggest that using an SSD is not an effective way to 
improve RocksDB performance?

The direct downstream of the back-pressured operator is HdfsSink; does that 
mean the best target for improving application performance is HDFS?



Thanks for any replies or suggestions.



Best Regards!











