Re: Flink RocksDB Performance

2021-07-20 Thread Robert Metzger
Your understanding of the problem is correct -- the serialization cost is
the reason for the high CPU usage.

You can also try to optimize the serializers you are using, for example by
choosing data types that are efficient to serialize. See also this blog post:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
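
To make the point about serializer choice concrete, here is a minimal,
Flink-independent sketch. It encodes the same record twice: once through a
generic, reflection-style path, and once through a hand-written encoding that
writes only the field values. Java's built-in serialization stands in for the
generic path purely for illustration; Flink itself falls back to Kryo for
types it cannot analyze. The Event class and both method names are
hypothetical and do not come from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCostSketch {

    /** Hypothetical event type, used only for this illustration. */
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final long userId;
        final int type;
        final long timestamp;

        Event(long userId, int type, long timestamp) {
            this.userId = userId;
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    /** Generic path: Java serialization writes class metadata along with the field values. */
    static byte[] genericEncode(Event e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Schema-aware path: only the field values, 8 + 4 + 8 = 20 bytes. */
    static byte[] compactEncode(Event e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(e.userId);
            out.writeInt(e.type);
            out.writeLong(e.timestamp);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Event e = new Event(42L, 7, 1626420000000L);
        System.out.println("generic bytes: " + genericEncode(e).length);
        System.out.println("compact bytes: " + compactEncode(e).length);
    }
}
```

With RocksDB, every state read and write goes through a serializer, so fewer
bytes and less reflection per record translate directly into less CPU per
record. In Flink terms, the usual way to get the compact path is to use types
that Flink can handle with its own serializers (for example POJOs or tuples)
instead of types that fall back to a generic serializer.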



Re: Flink RocksDB Performance

2021-07-16 Thread Vijay Bhaskar
Yes, absolutely. Unless we need very large state, on the order of GBs, RocksDB
is not required. RocksDB is a good choice only because the filesystem backend
handles large state very badly. In other words, the filesystem backend
performs much better than RocksDB for state sizes up to the GB range. Beyond
that, the filesystem backend degrades compared to RocksDB. It is not that
RocksDB itself is performing better.

Regards
Bhaskar



Re: Flink RocksDB Performance

2021-07-16 Thread Zakelly Lan
Hi Li Jim,
The filesystem state backend performs much better than RocksDB (several times
faster), but it is only suitable for small state. RocksDB consumes more CPU on
background tasks, cache management, serialization/deserialization and
compression/decompression. In most cases, RocksDB's performance will still
meet the need.
For tuning, please check
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/large_state_tuning/#tuning-rocksdb
Moreover, you could use a tool such as async-profiler
(https://github.com/jvm-profiling-tools/async-profiler) to figure out which
part consumes the most CPU.



Flink RocksDB Performance

2021-07-16 Thread Li Jim
Hello everyone,
I am using the Flink 1.13.1 CEP library and running some stress tests.
My message rate is about 16000 records per second.
I find that it can't process more than 16000 records per second because CPU
usage is at 100% (or rather 800%, since I allocated 8 vcores to a TaskManager).
I tried switching to the filesystem state backend; it got faster and CPU usage
dropped.
I understand this may be because of the serialization/deserialization cost in
RocksDB, but for certain reasons we must use RocksDB as the state backend.
Any suggestions for optimizing this?
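
As a rough back-of-the-envelope reading of the figures above, the sketch below
converts "8 vcores fully busy at 16000 records per second" into CPU time spent
per record. It assumes all 8 vcores are saturated and that the load is spread
evenly across them. The class and method names are hypothetical.

```java
final class CpuBudgetSketch {

    /** CPU time spent per record, in microseconds, assuming the given number of cores is fully busy. */
    static double cpuMicrosPerRecord(int busyCores, double recordsPerSecond) {
        // Each fully busy core contributes 1,000,000 microseconds of CPU time per wall-clock second.
        return busyCores * 1_000_000.0 / recordsPerSecond;
    }

    public static void main(String[] args) {
        // 8 vcores at 100% each, 16000 records per second (figures from the message above).
        System.out.println(cpuMicrosPerRecord(8, 16_000) + " us of CPU per record");
    }
}
```

Under these assumptions each record costs about 500 microseconds of CPU time.
That is the per-record budget any serializer or RocksDB tuning change has to
bring down before throughput can rise on the same hardware.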