Hi, community!

I am running some performance tests based on my use case.


1. Environment
- Flink: 1.13.5
- StateBackend: RocksDB, incremental
- use case: a complex SQL job containing 7 joins and 2 aggregations; input is 
30,000,000 records and output is 60,000,000 records, about 80GB.
- resource: Flink on YARN. JM 2G, one TM with 24G (8G on-heap, 16G off-heap), 
3 slots per TM
- only difference: the value of 'state.backend.rocksdb.localdir', which points 
to either one SATA disk or one SSD disk.
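
For reference, the relevant part of the configuration looks roughly like the sketch below. The keys are standard Flink options; the directory path is a placeholder (an assumption for illustration, not my actual mount point), and memory settings are left out.

```yaml
# flink-conf.yaml (sketch of the settings relevant to this test)
state.backend: rocksdb
state.backend.incremental: true
taskmanager.numberOfTaskSlots: 3
# The only value changed between the two runs.
# Placeholder path: a directory on the SATA disk in one run, on the SSD in the other.
state.backend.rocksdb.localdir: /path/to/rocksdb-local-dir
```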


2. Random write performance difference between SATA and SSD
   4.8M/s is achieved using SATA, while 48.2M/s is achieved using SSD.
   ```
   fio -direct=1 -iodepth 64 -thread -rw=randwrite -ioengine=sync -fsync=1 \
       -runtime=300 -group_reporting -name=xxx -size=100G --allow_mounted_write=1 \
       -bs=8k -numjobs=64 -filename=/mnt/disk11/xx
   ``` 


3. In my use case, the Flink SQL application finished in 41 minutes using SATA, 
versus 45 minutes using SSD.


Does this comparison suggest that using an SSD is not an effective way to 
improve RocksDB performance?
The direct downstream of the back-pressured operator is HdfsSink. Does that 
mean the best target for improving application performance is HDFS?


Thanks for any replies or suggestions.


Best Regards!
