Re: RocksDB CPU resource usage

2021-06-17 Thread Padarn Wilson
Thanks both for the suggestions, all good ideas. I will try some of the
profiling suggestions and report back.

On Thu, Jun 17, 2021 at 4:13 PM Yun Tang  wrote:

> Hi Padarn,
>
> From my experiences, de-/serialization might not consume 3x CPU usage, and
> the background compaction could also increase the CPU usage. You could use
> async-profiler [1] to figure out what really consumed your CPU usage as it
> could also detect the native RocksDB thread stack.
>
>
> [1] https://github.com/jvm-profiling-tools/async-profiler
>
> Best
> Yun Tang
>
> --
> *From:* Robert Metzger 
> *Sent:* Thursday, June 17, 2021 14:11
> *To:* Padarn Wilson 
> *Cc:* JING ZHANG ; user 
> *Subject:* Re: RocksDB CPU resource usage
>
> If you are able to execute your job locally as well (with enough data),
> you can also run it with a profiler and see the CPU cycles spent on
> serialization (you can also use RocksDB locally)
>
> On Wed, Jun 16, 2021 at 3:51 PM Padarn Wilson  wrote:
>
> Thanks Robert. I think it would be easy enough to test this hypothesis by
> making the same comparison with some simpler state inside the aggregation
> window.
>
> On Wed, 16 Jun 2021, 7:58 pm Robert Metzger,  wrote:
>
> Depending on the datatypes you are using, seeing 3x more CPU usage seems
> realistic.
> Serialization can be quite expensive. See also:
> https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
> Maybe it makes sense to optimize there a bit.
>
> On Tue, Jun 15, 2021 at 5:23 PM JING ZHANG  wrote:
>
> Hi Padarn,
> After switch stateBackend from filesystem to rocksdb, all reads/writes
> from/to backend have to go through de-/serialization to retrieve/store the
> state objects, this may cause more cpu cost.
> But I'm not sure it is the main reason leads to 3x CPU cost in your job.
> To find out the reason, we need more profile on CPU cost, such as Flame
> Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported
> in Flink[1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/
>
> Best,
> JING ZHANG
>
> Padarn Wilson  于2021年6月15日周二 下午5:05写道:
>
> Hi all,
>
> We have a job that we just enabled rocksdb on (instead of file backend),
> and see that the CPU usage is almost 3x greater on (we had to increase
> taskmanagers 3x to get it to run.
>
> I don't really understand this, is there something we can look at to
> understand why CPU use is so high? Our state mostly consists of aggregation
> windows.
>
> Cheers,
> Padarn
>
>


Re: RocksDB CPU resource usage

2021-06-17 Thread Yun Tang
Hi Padarn,

>From my experiences, de-/serialization might not consume 3x CPU usage, and the 
>background compaction could also increase the CPU usage. You could use 
>async-profiler [1] to figure out what really consumed your CPU usage as it 
>could also detect the native RocksDB thread stack.


[1] https://github.com/jvm-profiling-tools/async-profiler

Best
Yun Tang


From: Robert Metzger 
Sent: Thursday, June 17, 2021 14:11
To: Padarn Wilson 
Cc: JING ZHANG ; user 
Subject: Re: RocksDB CPU resource usage

If you are able to execute your job locally as well (with enough data), you can 
also run it with a profiler and see the CPU cycles spent on serialization (you 
can also use RocksDB locally)

On Wed, Jun 16, 2021 at 3:51 PM Padarn Wilson 
mailto:pad...@gmail.com>> wrote:
Thanks Robert. I think it would be easy enough to test this hypothesis by 
making the same comparison with some simpler state inside the aggregation 
window.

On Wed, 16 Jun 2021, 7:58 pm Robert Metzger, 
mailto:rmetz...@apache.org>> wrote:
Depending on the datatypes you are using, seeing 3x more CPU usage seems 
realistic.
Serialization can be quite expensive. See also: 
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html 
Maybe it makes sense to optimize there a bit.

On Tue, Jun 15, 2021 at 5:23 PM JING ZHANG 
mailto:beyond1...@gmail.com>> wrote:
Hi Padarn,
After switch stateBackend from filesystem to rocksdb, all reads/writes from/to 
backend have to go through de-/serialization to retrieve/store the state 
objects, this may cause more cpu cost.
But I'm not sure it is the main reason leads to 3x CPU cost in your job.
To find out the reason, we need more profile on CPU cost, such as Flame Graphs. 
BTW, starting with Flink 1.13, Flame Graphs are natively supported in Flink[1].

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/

Best,
JING ZHANG

Padarn Wilson mailto:pad...@gmail.com>> 于2021年6月15日周二 
下午5:05写道:
Hi all,

We have a job that we just enabled rocksdb on (instead of file backend), and 
see that the CPU usage is almost 3x greater on (we had to increase taskmanagers 
3x to get it to run.

I don't really understand this, is there something we can look at to understand 
why CPU use is so high? Our state mostly consists of aggregation windows.

Cheers,
Padarn


Re: RocksDB CPU resource usage

2021-06-17 Thread Robert Metzger
If you are able to execute your job locally as well (with enough data), you
can also run it with a profiler and see the CPU cycles spent on
serialization (you can also use RocksDB locally)

On Wed, Jun 16, 2021 at 3:51 PM Padarn Wilson  wrote:

> Thanks Robert. I think it would be easy enough to test this hypothesis by
> making the same comparison with some simpler state inside the aggregation
> window.
>
> On Wed, 16 Jun 2021, 7:58 pm Robert Metzger,  wrote:
>
>> Depending on the datatypes you are using, seeing 3x more CPU usage seems
>> realistic.
>> Serialization can be quite expensive. See also:
>> https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
>> Maybe it makes sense to optimize there a bit.
>>
>> On Tue, Jun 15, 2021 at 5:23 PM JING ZHANG  wrote:
>>
>>> Hi Padarn,
>>> After switch stateBackend from filesystem to rocksdb, all reads/writes
>>> from/to backend have to go through de-/serialization to retrieve/store the
>>> state objects, this may cause more cpu cost.
>>> But I'm not sure it is the main reason leads to 3x CPU cost in your job.
>>> To find out the reason, we need more profile on CPU cost, such as Flame
>>> Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported
>>> in Flink[1].
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/
>>>
>>> Best,
>>> JING ZHANG
>>>
>>> Padarn Wilson  于2021年6月15日周二 下午5:05写道:
>>>
 Hi all,

 We have a job that we just enabled rocksdb on (instead of file
 backend), and see that the CPU usage is almost 3x greater on (we had to
 increase taskmanagers 3x to get it to run.

 I don't really understand this, is there something we can look at to
 understand why CPU use is so high? Our state mostly consists of aggregation
 windows.

 Cheers,
 Padarn

>>>


Re: RocksDB CPU resource usage

2021-06-16 Thread Padarn Wilson
Thanks Robert. I think it would be easy enough to test this hypothesis by
making the same comparison with some simpler state inside the aggregation
window.

On Wed, 16 Jun 2021, 7:58 pm Robert Metzger,  wrote:

> Depending on the datatypes you are using, seeing 3x more CPU usage seems
> realistic.
> Serialization can be quite expensive. See also:
> https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
> Maybe it makes sense to optimize there a bit.
>
> On Tue, Jun 15, 2021 at 5:23 PM JING ZHANG  wrote:
>
>> Hi Padarn,
>> After switch stateBackend from filesystem to rocksdb, all reads/writes
>> from/to backend have to go through de-/serialization to retrieve/store the
>> state objects, this may cause more cpu cost.
>> But I'm not sure it is the main reason leads to 3x CPU cost in your job.
>> To find out the reason, we need more profile on CPU cost, such as Flame
>> Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported
>> in Flink[1].
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/
>>
>> Best,
>> JING ZHANG
>>
>> Padarn Wilson  于2021年6月15日周二 下午5:05写道:
>>
>>> Hi all,
>>>
>>> We have a job that we just enabled rocksdb on (instead of file backend),
>>> and see that the CPU usage is almost 3x greater on (we had to increase
>>> taskmanagers 3x to get it to run.
>>>
>>> I don't really understand this, is there something we can look at to
>>> understand why CPU use is so high? Our state mostly consists of aggregation
>>> windows.
>>>
>>> Cheers,
>>> Padarn
>>>
>>


Re: RocksDB CPU resource usage

2021-06-16 Thread Robert Metzger
Depending on the datatypes you are using, seeing 3x more CPU usage seems
realistic.
Serialization can be quite expensive. See also:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
Maybe it makes sense to optimize there a bit.

On Tue, Jun 15, 2021 at 5:23 PM JING ZHANG  wrote:

> Hi Padarn,
> After switch stateBackend from filesystem to rocksdb, all reads/writes
> from/to backend have to go through de-/serialization to retrieve/store the
> state objects, this may cause more cpu cost.
> But I'm not sure it is the main reason leads to 3x CPU cost in your job.
> To find out the reason, we need more profile on CPU cost, such as Flame
> Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported
> in Flink[1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/
>
> Best,
> JING ZHANG
>
> Padarn Wilson  于2021年6月15日周二 下午5:05写道:
>
>> Hi all,
>>
>> We have a job that we just enabled rocksdb on (instead of file backend),
>> and see that the CPU usage is almost 3x greater on (we had to increase
>> taskmanagers 3x to get it to run.
>>
>> I don't really understand this, is there something we can look at to
>> understand why CPU use is so high? Our state mostly consists of aggregation
>> windows.
>>
>> Cheers,
>> Padarn
>>
>


Re: RocksDB CPU resource usage

2021-06-15 Thread JING ZHANG
Hi Padarn,
After switch stateBackend from filesystem to rocksdb, all reads/writes
from/to backend have to go through de-/serialization to retrieve/store the
state objects, this may cause more cpu cost.
But I'm not sure it is the main reason leads to 3x CPU cost in your job.
To find out the reason, we need more profile on CPU cost, such as Flame
Graphs. BTW, starting with Flink 1.13, Flame Graphs are natively supported
in Flink[1].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/debugging/flame_graphs/

Best,
JING ZHANG

Padarn Wilson  于2021年6月15日周二 下午5:05写道:

> Hi all,
>
> We have a job that we just enabled rocksdb on (instead of file backend),
> and see that the CPU usage is almost 3x greater on (we had to increase
> taskmanagers 3x to get it to run.
>
> I don't really understand this, is there something we can look at to
> understand why CPU use is so high? Our state mostly consists of aggregation
> windows.
>
> Cheers,
> Padarn
>


RocksDB CPU resource usage

2021-06-15 Thread Padarn Wilson
Hi all,

We have a job that we just enabled rocksdb on (instead of file backend),
and see that the CPU usage is almost 3x greater on (we had to increase
taskmanagers 3x to get it to run.

I don't really understand this, is there something we can look at to
understand why CPU use is so high? Our state mostly consists of aggregation
windows.

Cheers,
Padarn